Slurm: Basics¶
Isambard-AI and Isambard 3 use the Slurm Workload Manager to schedule and run jobs on compute nodes. This guide covers the day-to-day commands for monitoring the queue, submitting jobs, and managing them.
For advanced topics including multi-node jobs, salloc, scheduler flexibility, complex job dependencies, QOS limits, and --exclusive across platforms, see the advanced guide.
Monitoring the queue¶
Checking jobs in the queue: squeue¶
squeue lists jobs currently in the queue, and can be combined with --me to show only your own jobs:
user.project@nid001040:~> squeue --me
JOBID USER PARTITION NAME ST TIME_LIMIT TIME TIME_LEFT NODES NODELIST(REASON)
19130 user.project workq my_job R 1:00:00 0:12 59:48 1 nid001038
19131 user.project workq my_array PD 1:00:00 0:00 1:00:00 1 (Priority)
The squeue command is highly customisable but is not covered here. Please consult the Slurm documentation for further information.
Avoid excessively polling the queue
Running squeue or sinfo in a rapid loop — using watch or the --iterate flag with a short interval — floods the Slurm scheduler with queries.
This can slow down job scheduling for every user on the system, not just your own jobs.
Instead:
- Run squeue --me or sinfo manually when you need a status update.
- Use sacct to review completed jobs (see Slurm: Advanced).
- If you must poll from a script, use an interval of at least 60 seconds and do so from a single terminal only.
Disruptive polling is a breach of the acceptable use policy. Accounts may be suspended to protect the service.
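If a script really does need to wait for a job, the guidance above could be followed along these lines. This is a minimal sketch, not a site-provided tool: the function name wait_for_job is hypothetical, and it assumes that an empty squeue listing means the job has left the queue. A transient scheduler error would look the same, so treat the result with care.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): block until a job has left the
# queue, polling no more often than once every 60 seconds.
# Usage: wait_for_job <jobid> [interval_seconds]
wait_for_job() {
    local job_id="$1"
    local interval="${2:-60}"

    # Enforce the 60-second minimum recommended above.
    if [ "$interval" -lt 60 ]; then
        interval=60
    fi

    # squeue prints one line per matching job. No output is taken to mean
    # the job is no longer pending or running (assumption: errors, which go
    # to stderr, are treated the same way).
    while squeue --me --noheader --jobs="$job_id" 2>/dev/null | grep -q .; do
        sleep "$interval"
    done
}
```

For example, `wait_for_job 19130` returns once job 19130 no longer appears in your queue listing. Run it from a single terminal only.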
Other users' jobs may not be visible
On some BriCS services, we have disabled the ability to see other users' jobs in the queue.
For details on partitions, time limits, and per-project resource limits on each system, see the job scheduling page.
Checking available resources: sinfo¶
sinfo shows the partitions available on the system and the current state of their nodes.
user.project@nid001040:~> sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
workq* up 1-00:00:00 36 idle~ nid[001001-001036]
workq* up 1-00:00:00 2 drain nid[001037-001038]
Avoid over-interpretation of sinfo
The information displayed by sinfo can be difficult to interpret and should only be used to gain a general impression of node availability.
Submitting jobs¶
At the basic level, there are two ways to submit work to the compute nodes: non-interactive 'batch' jobs using the sbatch command, and interactive jobs using srun.
Both types are submitted to the queue.
The examples below cover single-node jobs. Running a job across multiple nodes is a common requirement and is covered in the advanced guide.
Two common mistakes when specifying resources
- Resources spread across nodes: Requesting GPUs or tasks without --nodes=1 allows Slurm to draw resources from more than one node. Most single-node workloads will fail or perform poorly as a result. The examples below include --nodes=1 to prevent this.
- Accidental node reservation with --exclusive: Adding --exclusive to an sbatch script reserves an entire node regardless of how many GPUs or cores you actually use. This increases your queue wait time and reduces overall system utilisation. Avoid it unless your workload specifically requires it; see the advanced guide for details.
Batch jobs: sbatch¶
Batch jobs submitted with sbatch run whether or not you are logged in, for example overnight or at weekends. You write a batch script containing #SBATCH directives and submit it with sbatch.
The scheduler queues the script and runs it when the requested resources are available.
Always set a time limit
Use --time to set a walltime limit for your job.
The maximum walltime on all Isambard systems is 24 hours — see the job scheduling page.
If your workload needs to run for longer, see job dependencies in the advanced guide.
Setting a shorter walltime may reduce queuing time, but note that jobs are terminated when they reach their time limit.
Running a single batch job¶
To run a batch job, you must write a batch file, and then submit it using the sbatch command.
Requesting one GPU allocates one complete GH200 Superchip:
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --output=my_job.out
#SBATCH --nodes=1
#SBATCH --gpus=1
#SBATCH --time=00:05:00
hostname
nvidia-smi --list-gpus
user.project@nid001040:~> sbatch my_job.sh
Submitted batch job 19159
user.project@nid001040:~> cat my_job.out
nid001038
GPU 0: GH200 120GB (UUID: GPU-f9fc6950-574a-5310-dc7c-b8d02cc529db)
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --output=my_job.out
#SBATCH --nodes=1
#SBATCH --time=00:05:00
hostname
numactl -s
user.project@login02:~> sbatch my_job.sh
Submitted batch job 19159
user.project@login02:~> cat my_job.out
x3003c0s31b2n0
policy: default
preferred node: current
physcpubind: 0
cpubind: 0
nodebind: 0
membind: 0 1
Running multiple tasks in parallel¶
Use srun inside a batch script to run a command across multiple tasks simultaneously.
The following examples run a Python script on several tasks in parallel. The first requests four GPUs and runs one task per GPU; the second requests three tasks without GPUs:
#!/bin/bash
#SBATCH --job-name=parallel_tasks
#SBATCH --output=parallel_tasks.out
#SBATCH --nodes=1
#SBATCH --gpus=4
#SBATCH --ntasks-per-gpu=1
#SBATCH --time=00:05:00
module load cray-python
srun python3 pysrun.py
#!/bin/bash
#SBATCH --job-name=parallel_tasks
#SBATCH --output=parallel_tasks.out
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --time=00:05:00
module load cray-python
srun python3 pysrun.py
Running independent job steps concurrently
You can run several independent srun steps at the same time within a batch script by appending & to each and adding wait at the end:
srun --ntasks=1 --gpus=1 --exclusive job_step_a &
srun --ntasks=1 --gpus=1 --exclusive job_step_b &
wait
The --exclusive flag ensures each step uses only the resources it requests, allowing them to run concurrently without interfering with each other.
See the advanced guide for a full discussion of --exclusive.
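The two-step example above can be generalised so that the batch script also notices when a step fails. The sketch below is illustrative only: run_steps is a hypothetical helper, and the srun options are copied from the example above, so they assume a GPU allocation with at least one GPU per step.

```shell
#!/bin/bash
# Hypothetical helper: launch one job step per command in the background,
# then wait for all of them. Returns non-zero if any step failed.
# Usage (inside a batch script): run_steps ./job_step_a ./job_step_b
run_steps() {
    local pids=()
    local cmd
    for cmd in "$@"; do
        # One task and one GPU per step, as in the example above.
        srun --ntasks=1 --gpus=1 --exclusive "$cmd" &
        pids+=("$!")
    done

    local failed=0
    local pid
    for pid in "${pids[@]}"; do
        # "wait <pid>" returns the exit status of that background step.
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

A plain `wait` with no arguments does not report the individual exit statuses, which is why the sketch records each process ID and waits on them one by one.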
srun inside sbatch vs srun on the command line
srun appears in two contexts that can be confusing at first.
Inside a batch script, srun launches a job step within the resources already allocated by sbatch — it does not submit a new job to the queue.
On the command line, srun submits an entirely new job to the queue and waits for it to start.
The same command behaves differently depending on whether it is running inside an existing allocation or not.
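One way to tell which of the two contexts a shell is in is to check the SLURM_JOB_ID environment variable, which Slurm sets for processes running inside a job. The function names below are hypothetical and the sketch only prints a message; it does not call srun.

```shell
#!/bin/bash
# Succeeds when the current shell is running inside a Slurm allocation.
# Slurm sets SLURM_JOB_ID for processes it launches as part of a job.
in_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Hypothetical helper: describe what an srun call would do from here.
describe_srun() {
    if in_allocation; then
        echo "srun will start a job step inside job ${SLURM_JOB_ID}"
    else
        echo "srun will submit a new job to the queue"
    fi
}
```

Calling `describe_srun` on a login node should report that a new job would be submitted, while inside a batch script it should report the enclosing job ID.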
Interactive jobs: srun¶
If you want to work on a compute node interactively, use srun. Note that the job ends if you close the terminal.
Running a single command¶
user.project@nid001040:~> srun --nodes=1 --gpus=1 --time=00:02:00 nvidia-smi --list-gpus
srun: job 19164 queued and waiting for resources
srun: job 19164 has been allocated resources
GPU 0: GH200 120GB (UUID: GPU-4833ca67-f003-3dbd-1d44-a5c7645a5ae3)
user.project@login02:~> srun --nodes=1 --time=00:02:00 numactl -s
policy: default
preferred node: current
physcpubind: 0
cpubind: 0
nodebind: 0
membind: 0 1
Starting an interactive shell¶
Use --pty to start an interactive shell on a compute node:
user.project@nid001040:~> srun --nodes=1 --gpus=1 --time=00:15:00 --pty /bin/bash --login
srun: job 22874 queued and waiting for resources
srun: job 22874 has been allocated resources
user.project@nid001005:~> hostname
nid001005
user.project@nid001005:~> nvidia-smi --list-gpus
GPU 0: GH200 120GB (UUID: GPU-f00d9a03-840c-5ea2-a748-243383b6efbc)
user.project@login02:~> srun --nodes=1 --time=00:15:00 --pty /bin/bash --login
user.project@x3010c0s19b2n0:~> numactl -s
policy: default
preferred node: current
physcpubind: 0
cpubind: 0
nodebind: 0
membind: 0 1
Managing jobs¶
Job arrays¶
Job arrays allow you to submit many similar jobs with a single sbatch command.
Each task in the array receives a unique value in the SLURM_ARRAY_TASK_ID environment variable, which your script can use to vary its behaviour (for example, to process a different input file per task).
#!/bin/bash
#SBATCH --job-name=my_array
#SBATCH --output=my_array_%a.out
#SBATCH --array=1-4
#SBATCH --nodes=1
#SBATCH --gpus=1
#SBATCH --time=00:05:00
echo "Array task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"
#!/bin/bash
#SBATCH --job-name=my_array
#SBATCH --output=my_array_%a.out
#SBATCH --array=1-4
#SBATCH --nodes=1
#SBATCH --time=00:05:00
echo "Array task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"
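To process a different input file per task, one common pattern is to list the inputs in a text file, one per line, and let each task pick the line matching its task ID. The sketch below assumes such a list file exists; input_for_task and the file name inputs.txt are hypothetical names used for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print line N of a list file, where N is the array
# task ID. Falls back to 1 when run outside a job array.
# Usage: input_for_task <list_file> [task_id]
input_for_task() {
    local list_file="$1"
    local task_id="${2:-${SLURM_ARRAY_TASK_ID:-1}}"
    # sed -n "Np" prints only line N of the file.
    sed -n "${task_id}p" "$list_file"
}
```

In an array script submitted with --array=1-4, `input=$(input_for_task inputs.txt)` would give task 1 the first line of inputs.txt, task 2 the second, and so on. The array range should match the number of lines in the list.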
Array tasks appear in squeue with the format JOBID_TASKID:
user.project@login02:~> squeue --me
JOBID USER PARTITION NAME ST TIME_LIMIT TIME TIME_LEFT NODES NODELIST(REASON)
19200_1 user.project workq my_array R 5:00 0:01 4:59 1 nid001005
19200_2 user.project workq my_array R 5:00 0:01 4:59 1 nid001012
19200_3 user.project workq my_array PD 5:00 0:00 5:00 1 (Priority)
19200_4 user.project workq my_array PD 5:00 0:00 5:00 1 (Resources)
Arrays and scheduler load
Large arrays with short tasks can place extra load on the scheduler and consume allocated credits quickly.
For many similar short tasks, consider using concurrent srun job steps within a single batch job instead.
See the advanced guide for more on arrays and scheduler efficiency.
Job dependencies¶
Job dependencies allow you to control the order in which jobs run. A common use case is chaining jobs so that the next step only starts if the previous one succeeded.
Dependencies are specified using the --dependency flag on sbatch.
The singleton type ensures that only one job with a given name and user runs at a time:
user.project@login02:~> sbatch --dependency=singleton my_job.sh
Submitted batch job 52642
user.project@login02:~> sbatch --dependency=singleton my_job.sh
Submitted batch job 52643
user.project@login02:~> squeue --me --Format="JobID,Name,StateCompact:6,ReasonList,Dependency:32"
JOBID NAME ST NODELIST(REASON) DEPENDENCY
52643 my_job PD (Dependency) singleton(unfulfilled)
52642 my_job R nid001005 (null)
For more dependency types — including afterok for conditional chains and tips on scripting with job IDs — see the advanced guide.
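As a brief taste of scripting with job IDs, the sketch below chains batch scripts so that each one starts only if the previous one succeeded. It relies on sbatch --parsable, which prints just the job ID (optionally followed by a semicolon and the cluster name), and on the afterok dependency type. The function name submit_chain and the script names are hypothetical; the advanced guide remains the fuller reference.

```shell
#!/bin/bash
# Hypothetical helper: submit scripts in order, making each job depend on
# successful completion (afterok) of the one before it.
# Prints each job ID as it is submitted.
# Usage: submit_chain step1.sh step2.sh step3.sh
submit_chain() {
    local previous=""
    local script job_id
    for script in "$@"; do
        if [ -z "$previous" ]; then
            job_id=$(sbatch --parsable "$script") || return 1
        else
            job_id=$(sbatch --parsable --dependency="afterok:${previous}" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep the ID only.
        job_id="${job_id%%;*}"
        echo "$job_id"
        previous="$job_id"
    done
}
```

If an earlier job fails, the later jobs are not started; they may remain pending with an unsatisfied dependency until cancelled, depending on how the scheduler is configured.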
Cancelling jobs: scancel¶
Use scancel followed by a job ID to cancel a job, e.g. to cancel one task of the array example above:
user.project@login02:~> scancel 19200_1
To cancel all your jobs at once:
user.project@login02:~> scancel --me
To cancel all tasks in a job array:
user.project@login02:~> scancel 19200