Slurm Overview

Slurm (originally the “Simple Linux Utility for Resource Management”) is an open-source software bundle used by ARC and many other research computing centers. It provides several critical functions:

  • Resource management - monitoring cluster resources, allocating them to user workloads, and keeping track of states

  • Scheduling workloads - accepting resource requests from users, managing a queue and dynamic job prioritization, launching jobs on compute nodes

  • Accounting - Tracking user and group usage, enforcing defined resource usage limits, reporting functionality

Cluster Terminology

  • Cluster - A set of computing resources with centralized management.

  • Login node - A publicly accessible computer which serves as an entry point to access cluster resources. As a shared resource for many users, login nodes are not suitable for intensive workloads.

  • Compute Node - A physical computer which is one of several or many identically configured computers in a cluster. Access to run workloads on compute nodes is usually controlled by a resource manager. Discrete sets of resources on compute nodes (e.g. CPU cores, memory, GPUs) are usually made available exclusively to one job at a time.

  • Partition - A set of nodes which are grouped together to define a resource pool which is a target for jobs.

  • Job - A request for resources, usually to run a workload.

  • Queue - The list of jobs waiting for resources to become available.

  • Quality of Service (QOS) - QOS determines the priority, maximum time, and resource limits assigned to a job.

Types of Jobs

There are two basic types of jobs: “batch” and “interactive”. They use the same underlying infrastructure and parameters, but differ in that batch jobs run a sequence of commands in a remote, unattended session, while interactive jobs provide direct input and output in real time.

Batch

Job scripts have four basic required parts:

  • shebang - The first line of the batch script must be #!/bin/bash.

  • options - Define the resource request with job options: #SBATCH --option=value. See section Job Configuration Parameters/Options.

  • environment - Define the context within which your job executes. This could include virtual environments (conda or pip/venv) or loading modules.

  • script - A sequence of commands to run in the allocation, just like you would from the command-line.

After preparing a batch script, you can submit it to Slurm using the sbatch command:

sbatch <jobscript>

which will return a message something like the following:

Submitted batch job 5123

You can check on the job while it waits in the queue with squeue.
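When scripting around sbatch, it is often useful to capture the job ID from that confirmation message. A minimal sketch, parsing the sample message shown above (on a live system, sbatch's --parsable option prints just the numeric ID, which is simpler):

```shell
# Parse the job ID out of sbatch's confirmation message.
# The message here is the sample from above; in a real session you would
# capture it from `sbatch <jobscript>`, or use `sbatch --parsable` instead.
msg="Submitted batch job 5123"
jobid=${msg##* }        # keep only the last whitespace-separated field
echo "$jobid"           # → 5123
```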

Basic batch script example

An example script that allocates 1 node and 1 CPU core, and runs for a maximum of 1 hour:

#!/bin/bash
#SBATCH --account=<myaccount>
#SBATCH --partition=normal_q
#SBATCH --time=0-1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
module reset
echo "Job $SLURM_JOB_ID has started on node"
hostname

A batch job writes its output to a file in the working directory named slurm-<jobid>.out by default, unless the job script specifies otherwise. The job starts as soon as all resources it requires (CPU cores, memory, and GPUs, if any) are available, and relinquishes those resources when it finishes.
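The default filename follows directly from the job ID, and can be overridden with sbatch's --output option, where the %j placeholder expands to the job ID. A small sketch using the sample job ID from above:

```shell
# The default output file for job 5123 follows the slurm-<jobid>.out pattern
jobid=5123
outfile="slurm-${jobid}.out"
echo "$outfile"          # → slurm-5123.out

# To choose your own name instead, add a directive such as:
#   #SBATCH --output=myjob-%j.out     (%j expands to the job ID)
```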

Interactive

Interactive jobs are resource allocations which allow for interactive input and output directly to the user terminal. This is useful for:

  • Developing and testing batch scripts

  • Debugging code

  • Running software that requires live input

  • Short exploratory workflows

Important: CPU cores, memory, and GPUs remain allocated to your job until you explicitly exit the session. Idle interactive jobs waste shared resources and increase wait times for other users.

interact script

For most users, interact should be the preferred entry point for interactive jobs. interact is a convenience wrapper around Slurm that:

  • Requests an allocation

  • Launches a shell on a compute node

  • Clearly places you inside the allocated resources

  • Automatically releases resources when you exit

Request a single core for 1 hour on the normal_q partition:

[user@owl1 ~]$ interact --account=arcadm --partition=normal_q --time=1:00:00 --nodes=1 --ntasks-per-node=1 --cpus-per-task=1

After the job starts, your shell prompt will change to reflect that you are now on a compute node, not the login node:

[user@owl009 ~ ]$

You can confirm this by running:

[user@owl009 ~ ]$ hostname

Example output:

owl009

This confirms that:

  • You are now running on a compute node

  • You are using the allocated resources for your interactive job

When you are finished, exit the session:

exit

You will see Slurm release the allocation:

srun: error: owl009: task 0: Exited with exit code 130

Your prompt will return to the login node:

[user@owl1 ~]$

At this point:

  • Your interactive job has ended

  • All resources have been returned to the cluster

  • You are no longer consuming compute resources

Releasing resources when finished

Always exit your interactive session as soon as you are done:

exit

Leaving an interactive session idle:

  • Blocks CPU cores, memory, and GPUs

  • Increases queue wait times for others

  • May lead to administrative intervention if resources are held unnecessarily

If you accidentally leave a job running, you can terminate it from the login node:

scancel <jobid>

You can find your job ID with:

squeue
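If you need a job ID for scancel in a script, you can pull it from squeue's output. A sketch parsing output shaped like squeue's default columns (the exact columns may vary with local configuration; on the cluster you would pipe the real command, e.g. squeue -u $USER):

```shell
# Extract job IDs from squeue-style output.
# The sample below mimics squeue's default column layout; live usage:
#   squeue -u $USER | awk 'NR>1 {print $1}'
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
5123 normal_q myjob user R 0:42 1 owl009'
ids=$(echo "$sample" | awk 'NR>1 {print $1}')   # skip the header row
echo "$ids"              # → 5123
```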

salloc + srun

While interact is recommended, you may use salloc and srun directly. Important distinctions:

  • salloc creates a resource allocation

  • Commands typed after salloc still run on the login node unless launched with srun

  • The job ends when you exit the subshell

This example demonstrates how salloc obtains a resource allocation which srun can then use. Commands that do not use a cluster task launcher like srun or mpirun execute in place on the login node. The salloc command starts a new subshell on the login node, and the job ends when the subshell is closed with the exit command:

[user@owl1 ~]$ salloc --nodes=1 --account=arcadm --ntasks=2 --partition=normal_q
salloc: Pending job allocation 80578
salloc: job 80578 queued and waiting for resources
salloc: job 80578 has been allocated resources
salloc: Granted job allocation 80578
salloc: Waiting for resource configuration
salloc: Nodes owl006 are ready for job
[user@owl1 ~]$ hostname
owl1               # Still on login node
[user@owl1 ~]$ srun hostname
owl006             # Now running on compute node
owl006
[user@owl1 ~]$ srun --ntasks=1 hostname
owl006
[user@owl1 ~]$ exit
exit
salloc: Relinquishing job allocation 80578
[user@owl1 ~]$ hostname
owl1

Best Practices for Interactive Jobs

  • Request only the resources you actually need.

  • Keep walltime short for testing.

  • Move repeatable workflows into batch scripts.

  • Do not run heavy computations on login nodes.

  • Monitor usage with:

squeue
showjobusage <jobid>

Interactive jobs are a powerful development tool, but they should be used responsibly to ensure fair access for all ARC users.

Job Status and Cluster Inspection

When you’re actively running jobs, you will often want more information about their status: when they might start, where they are running, how long they have been running, whether they are making good use of the allocated resources, and so on.

Here are some commands you can use to inspect and control your jobs:

  • scontrol show job --detail <jobid> - show full job information for a pending or running job

  • squeue - display your jobs which are currently pending or running

  • scancel <jobid> - request that Slurm immediately terminate a running job or delete a pending job from the queue

  • sacct - display accounting data for jobs in all states, but by default only today’s jobs

  • seff <jobid> - display job efficiency information for completed jobs only

  • showjobusage <jobid> - display node-level resource usage information for running jobs only

  • sstat <jobid> - display job resource status for running job steps (advanced)

  • ssh <nodename> - make a direct SSH connection to a node where you have a running job
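scontrol show job prints its details as space-separated key=value pairs, which makes individual fields easy to extract in scripts. A sketch parsing a single sample line shaped like that output (on the cluster you would substitute the real scontrol show job --detail <jobid> command):

```shell
# scontrol prints key=value pairs; extract one field from a sample line
# shaped like its output. Live usage:
#   scontrol show job --detail <jobid> | tr ' ' '\n' | awk -F= '$1=="JobState" {print $2}'
line='JobId=5123 JobName=myjob UserId=user(1001) JobState=RUNNING'
state=$(echo "$line" | tr ' ' '\n' | awk -F= '$1=="JobState" {print $2}')
echo "$state"            # → RUNNING
```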

You also might want information about our cluster partitions, QoS’s, or Slurm configurations. The following are a set of commands that can be used to do this while on a login node.

  • showqos - show the possible QoS options and their associated resource limits and billing costs

  • sinfo - view information about Slurm nodes and partitions

  • scontrol show partition <partition name> - show default settings for job configuration variables for each partition

sinfo has many options to provide different information. The -s option provides a concise list of cluster partitions and their status:

[user@owl1 ~]$ sinfo -s
PARTITION     AVAIL  TIMELIMIT   NODES(A/I/O/T) NODELIST
normal_q*        up   infinite        84/0/0/84 owl[001-084]
preemptable_q    up   infinite        84/0/0/84 owl[001-084]

This can help identify the cluster partitions and node statuses, where A/I/O/T stands for “Allocated/Idle/Other/Total”.
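The A/I/O/T field is easy to split apart in a script, for example to check how many nodes are idle before submitting. A sketch over the sample output above (live usage would pipe the real sinfo -s command):

```shell
# Split the NODES(A/I/O/T) field from sinfo -s style output.
# The sample mirrors the output shown above; live usage:
#   sinfo -s | awk 'NR>1 {split($4, n, "/"); print $1, n[2]}'
sample='PARTITION     AVAIL  TIMELIMIT   NODES(A/I/O/T) NODELIST
normal_q*        up   infinite        84/0/0/84 owl[001-084]'
idle=$(echo "$sample" | awk 'NR>1 {split($4, n, "/"); print n[2]}')
echo "$idle"             # → 0 (no idle nodes in normal_q)
```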

More information about each cluster, the node types, partitions, and usage limits can be found on cluster resource pages. You can also see real-time dashboards with information about the cluster load, utilization, and resources.

Accounting/Billing

Users will want to see information about their personal or group’s usage of time/space on ARC systems, to get a sense of how much time remains or to refine their estimates of how long jobs take to run.

Much of this information can be viewed through the account/allocation management system Coldfront, but you can also make basic and advanced inquiries using command line tools.

  • quota - ARC custom command to print summary information about all your active Slurm accounts and storage allocations

  • sacct -A <account> --start=YYYY-MM-DD -X - show all jobs run in the specified account since the specified date

  • getusage --account <slurm_account_name> - show compute hours used per account per cluster

  • getusage --pi <VT_PI_pid> - show compute hours used per PI per cluster

  • getusage --user <VT_pid> - show compute hours used per user per cluster

  • sshare -A <account> - view Slurm’s tally of usage for the account
View Slurm’s tally of usage