Abusive Use of Login Nodes
What is a login node?
The login nodes on a compute cluster are a shared resource. They need to be readily available for numerous tasks and see steady use by a constant stream of ARC researchers and students. It is common to see 40-60 simultaneous login sessions on any ARC cluster login node.
A login node is sometimes referred to as a “front-end” or “head” node, to evoke the sense that it is the system users access as an entry point to the computational clusters. In contrast, the compute nodes are the computational workhorses of the clusters, but are not directly accessible to users outside the context of a running job.
Examples:
The Tinkercliffs cluster has two login nodes: tinkercliffs1.arc.vt.edu and tinkercliffs2.arc.vt.edu
The Infer cluster has one login node: infer1.arc.vt.edu
The Cascades cluster has two login nodes: cascades1.arc.vt.edu and cascades2.arc.vt.edu
Acceptable use of a login node
Normal usage of a login node includes activities like
composing or editing a job script with a text editor like nano, vi, or emacs
submitting jobs to the scheduler and monitoring the status of jobs using commands like sbatch, squeue, and sacct
organizing files for a job or viewing the output from a job
initiating an interactive job to get a shell on a compute node using interact
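As a rough illustration of the compose, submit, and monitor workflow listed above, the sketch below writes a small job script and submits it. The file name hello_job.sh, the account myaccount, and the partition mypartition are placeholders assumed for this example, not real ARC values; substitute the ones that belong to your own allocation.

```shell
#!/bin/bash
# Write a minimal Slurm job script. The account and partition names are
# hypothetical placeholders, not real ARC allocations or queues.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --account=myaccount
#SBATCH --partition=mypartition
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Everything below runs on a compute node, not on the login node.
echo "Running on $(uname -n)"
EOF

# Submit the script and list your queued jobs, but only where Slurm exists.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
    squeue -u "$USER"
fi
```

Once the job has finished, sacct -j followed by the job ID that sbatch reported shows its accounting record.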
Example activities which are sometimes okay and sometimes abusive
There is a significant “gray area” of workloads which are okay to run on login nodes in some cases, but are unacceptable in other cases. The deciding factor is always the impact they have on the login node. As a rule of thumb, if an intensive task will run for more than 2-3 minutes, it should probably be running on a compute node as part of a job.
compiling software or building python virtual environments
compressing or decompressing datasets
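One way to stay on the safe side of the 2-3 minute rule of thumb for tasks like these is to run them at low priority with a hard time limit. The sketch below is only an illustration, not an ARC-provided tool: the function name run_briefly, the variable LOGIN_LIMIT_SECONDS, and the 180-second default are all assumptions made for this example. It relies on the standard nice and timeout utilities from GNU coreutils.

```shell
#!/bin/bash
# Run a command at the lowest CPU priority and stop it once a time limit
# passes. The 180 s default mirrors the 2-3 minute rule of thumb; it is an
# illustrative choice, not a limit enforced by ARC.
run_briefly() {
    local limit="${LOGIN_LIMIT_SECONDS:-180}"
    nice -n 19 timeout "$limit" "$@"
    local status=$?
    # GNU timeout exits with status 124 when it had to stop the command.
    if [ "$status" -eq 124 ]; then
        echo "Stopped after ${limit}s: run this inside a job instead." >&2
    fi
    return "$status"
}

# Example (hypothetical paths): run_briefly tar -czf results.tar.gz results/
```

If the command is stopped by the limit, that is a good sign the task belongs on a compute node as part of a job.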
Unacceptable use of a login node
Any activity on a login node which noticeably impacts the performance, reliability, or availability of that node is considered unacceptable and may be subject to administrative termination.
genomic assembly or sequencing
simulations or models in StarCCM+, Ansys, COMSOL, Abaqus, Matlab, R, etc. which take more than 2-3 minutes to run