Slurm Overview and Quick Reference
Slurm is the “Simple Linux Utility for Resource Management”, an open-source software suite used by ARC and many other research computing centers. It provides several critical functions:
Resource management - monitoring cluster resources, allocating them to user workloads, and keeping track of states
Scheduling workloads - accepting resource requests from users, managing a queue and dynamic job prioritization, and launching jobs on compute nodes
Accounting - Tracking user and group usage, enforcing defined resource usage limits, reporting functionality
Cluster Terminology
Cluster - A set of computing resources with centralized management.
Login node - A publicly accessible computer which serves as an entry point to access cluster resources. As a shared resource for many users, login nodes are not suitable for intensive workloads.
Compute Node - A physical computer which is one of several or many identically configured computers in a cluster. Access to run workloads on compute nodes is usually controlled by a resource manager. Discrete sets of resources on compute nodes (e.g. CPUs, memory, GPUs) are usually made exclusively available to one job at a time.
Partition - A set of nodes which are grouped together to define a resource pool which is a target for jobs.
Job - A request for resources, usually to run a workload.
Queue - The list of jobs waiting for resources to become available.
Cluster Inspection and Status
command | scope
---|---
sinfo | View information about Slurm nodes and partitions
squeue | View information about jobs located in the Slurm scheduling queue
scontrol | View or modify Slurm configuration and state
sinfo has many options to provide different information. The -s option provides a concise list of cluster partitions and their status:
[user@owl1 ~]$ sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
normal_q* up infinite 81/0/1/82 owl[001-082]
dev_q up infinite 81/2/1/84 owl[001-084]
preemptable_q up infinite 81/2/1/84 owl[001-084]
largemem_q up infinite 2/0/0/2 owl-hm[001-002]
hugemem_q up infinite 1/0/0/1 owl-hm003
test_q up infinite 0/1/0/1 owltest01
interactive_q up infinite 0/4/0/4 owlmln[001-004]
This can help identify the cluster partitions and node statuses, where A/I/O/T stands for “Allocated/Idle/Other/Total”.
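To dig deeper, the other inspection commands from the table above can be combined with a partition name. For example (a sketch using the normal_q partition from the output above; substitute the partition you care about):
[user@owl1 ~]$ sinfo -N -p normal_q               # per-node state listing for one partition
[user@owl1 ~]$ squeue -p normal_q                 # jobs currently pending or running in that partition
[user@owl1 ~]$ scontrol show partition normal_q   # full configuration and limits for the partition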
More information about each cluster, the node types, partitions, and usage limits can be found on cluster resource pages.
Jobs: Requesting resources
There are two basic types of jobs: “batch” and “interactive”. They use the same underlying infrastructure and parameters, but differ in that batch jobs run a sequence of commands in a remote, unattended session, while interactive jobs provide direct input and output in real time.
Batch
Job scripts have three basic required parts:
shebang - The first line of the batch script must be #!/bin/bash.
options - Define the resource request with job options: #SBATCH --option=value
script - A sequence of commands to run in the allocation, just like you would run from the command line
After preparing a batch script, you can submit it to Slurm using the command sbatch <jobscript>.
command | scope
---|---
sbatch | submit a job to Slurm for scheduling and execution
Basic batch script example
#!/bin/bash
#SBATCH --account=<myaccount>
#SBATCH --partition=normal_q
#SBATCH --nodes=2
#SBATCH --time=0-1:00:00
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=2
module reset
module list
echo "job $SLURM_JOB_ID has started on node"
hostname
A batch job will write its output to a file in the working directory named like slurm-<jobid>.out.
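As a sketch of the typical batch workflow, assuming the example script above is saved as batch_example.sh (a hypothetical filename):
[user@owl1 ~]$ sbatch batch_example.sh    # submit; Slurm prints the assigned job ID
[user@owl1 ~]$ squeue --me                # check whether the job is pending or running
[user@owl1 ~]$ cat slurm-<jobid>.out      # view the job's output once it has started writing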
Interactive
Interactive jobs are resource allocations which allow for interactive input and output directly in the user's terminal. This is useful for developing batch scripts or for running tasks which require direct user interaction.
command | scope
---|---
salloc | request a resource allocation and wait for commands
srun | launch a process inside a resource allocation; requests a new allocation if one doesn't already exist
salloc + srun example
This example demonstrates how salloc gets a resource allocation which srun is then able to use. Commands which do not use a cluster task launcher like srun or mpirun are executed in place on the login node. The salloc command starts a new subshell on the login node, and the job ends when the subshell is closed with the exit command:
[brownm12@owl1 ~]$ salloc --nodes=1 --account=arcadm --ntasks=2 --partition=normal_q
salloc: Pending job allocation 80578
salloc: job 80578 queued and waiting for resources
salloc: job 80578 has been allocated resources
salloc: Granted job allocation 80578
salloc: Waiting for resource configuration
salloc: Nodes owl006 are ready for job
[brownm12@owl1 ~]$ hostname
owl1
[brownm12@owl1 ~]$ srun hostname
owl006
owl006
[brownm12@owl1 ~]$ srun --ntasks=1 hostname
owl006
[brownm12@owl1 ~]$ exit
exit
salloc: Relinquishing job allocation 80578
[brownm12@owl1 ~]$ hostname
owl1
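As noted in the table above, srun will request a new allocation of its own when run outside an existing one. One common pattern (a sketch; substitute your own account and partition) is to open an interactive shell directly on a compute node:
[user@owl1 ~]$ srun --account=<myaccount> --partition=normal_q --ntasks=1 --time=1:00:00 --pty bash
The shell then runs on the allocated compute node, and exiting the shell ends the job.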
Most commonly used job configuration options
The three resource request commands share a common set of options which provide a plethora of ways to set up and configure jobs. The manuals provide exhaustive information, but here are the most commonly used options with brief explanations:
short | long | default | function | notes
---|---|---|---|---
-A | --account | n/a | name of the Slurm billing account | this is the only mandatory option
-N | --nodes | 1 | how many nodes to allocate | extending jobs to multiple nodes requires software orchestration
-p | --partition | | select the partition to use |
-n | --ntasks | n/a | how many concurrent tasks you want | not recommended for multi-node jobs
n/a | --ntasks-per-node | 1 | number of concurrent tasks to expect on each node |
-c | --cpus-per-task | 1 | number of cores to allocate to each task | affects task-to-CPU binding
-t | --time | 30 min. | maximum wall time; format is D-HH:MM:SS |
n/a | --gres | n/a | request generic resources such as GPUs (e.g. --gres=gpu:1) |
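These options can be given either as #SBATCH directives inside a batch script or directly on the sbatch/salloc/srun command line, where command-line values override any matching directives in the script. A couple of illustrative (hypothetical) invocations:
[user@owl1 ~]$ sbatch --account=<myaccount> --partition=normal_q --nodes=1 --ntasks=4 --time=0-2:00:00 myjob.sh
[user@owl1 ~]$ salloc --account=<myaccount> --partition=normal_q --cpus-per-task=8 --time=2:00:00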
Open OnDemand
When you use Interactive Apps in Open OnDemand, you are triggering precomposed batch scripts. The form you fill out provides entries for the job options listed above, and the resource request is sent into the queue. When resources are available, the precomposed batch script runs on the allocated resources. For most apps, this starts a graphical user interface (GUI) which is automatically connected through to the OnDemand server so that you can interact with it in your web browser.
Job status and control
When you’re actively running jobs, you will often want more information about their status, including when they might start, where they are running, how long they have been running, whether they are making good use of the allocated resources, and so on.
Here are some commands you can use to inspect and control your jobs:
command | scope
---|---
scontrol show job <jobid> | show full job information for a pending or running job
squeue --me | display your jobs which are currently pending or running
scancel <jobid> | request that Slurm immediately terminate a running job or delete a pending job from the queue
sacct | display accounting data for jobs of all states, but by default only today's jobs
seff <jobid> | display job efficiency information for completed jobs
 | display node-level resource usage information for a running job
sstat <jobid> | display job resource status for running job steps (advanced)
ssh <nodename> | make a direct SSH connection to a node where you have a running job
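For example, a typical sequence for keeping an eye on a job (replace <jobid> with the ID reported when the job was submitted):
[user@owl1 ~]$ squeue --me                 # list your pending and running jobs
[user@owl1 ~]$ scontrol show job <jobid>   # full details, including the assigned nodes
[user@owl1 ~]$ seff <jobid>                # efficiency summary once the job has completed
[user@owl1 ~]$ scancel <jobid>             # cancel the job if it is no longer needed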
Accounting
At one time or another, most people will want to see information about their personal or group's usage of compute time and storage on ARC systems, either to get a sense of how much of an allocation remains or to refine estimates of how long jobs take to run.
Much of this information can be viewed through the account/allocation management system Coldfront, but you can also make basic and advanced inquiries using command line tools.
command | scope
---|---
 | ARC custom command to print summary information about all your active Slurm accounts and storage allocations
sacct -a -A <account> -S <date> | show all jobs run in the specified account since the specified date
 | ARC custom command to print detailed Slurm account usage
 | view Slurm's tally of usage
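For example, a sketch of an sacct query listing all jobs charged to an account since a given date (the format fields shown are just one reasonable selection):
[user@owl1 ~]$ sacct -a -A <myaccount> -S <YYYY-MM-DD> --format=JobID,User,Partition,Elapsed,State,AllocCPUS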