
Installing and Running Software

Abstract

This tutorial provides an introduction to installing and running software on the Isambard supercomputers using the command line. By the end of the tutorial, users will have an understanding of how to use some common tools for installing and running software. The principles demonstrated in this tutorial are applicable across all BriCS platforms.

Prerequisites

Users will need to have accepted an invitation to a project on Isambard-AI Phase 1, Phase 2 or Isambard 3, and completed the setup tutorial. A basic knowledge of the Linux command line is also assumed.

Learning Objectives

By the end of this tutorial, users will have an understanding of how to:

  • Use modules to access software
  • Submit jobs using the Slurm workload manager
  • Install software using a range of common tools, e.g. via conda
  • Access software via containers
  • Navigate the documentation to find further information

Logging in via SSH

Use the Login Guide if needed to log in. The general format for logging in is:

ssh [PROJECT].[FACILITY].isambard
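
For example, for a project named project on Isambard-AI Phase 2 (matching the example prompts used throughout this tutorial):

ssh project.aip2.isambard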

Exploring Directories

Users have access to various storage locations. Two important ones are $HOME and $PROJECTDIR which are intended for different purposes. Use the Storage spaces information page to answer these questions:

  • What types of files should be stored in each location?
  • What is my storage quota on each?
  • How can I check my current usage?

To check your understanding:

  • Navigate to your project directory
  • Check your current usage of the project storage
user.project@login40:~> echo $PROJECTDIR
user.project@login40:~> cd $PROJECTDIR
user.project@login41:/projects/project>

On Isambard-AI Phase 2:

user.project@login40:~> lfs quota -h -u $USER /lus/lfs1aip2
Disk quotas for usr user.project (uid XXXXXXXXX):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
  /lus/lfs1aip2   1012G     50T     55T       - 1186544  104857600 110100480

Return to $HOME when complete.

Modules

Modules provide a range of basic software and libraries, and are documented in the Modules and Compilers guide.

Use module avail to see what is provided, and then module list to see what is currently loaded. A minimal set of modules is automatically loaded at login. You can restore the default set of modules at any time using module reset.

user.project@login40:~> module avail

---------------------------------- /opt/cray/pe/lmod/lmod/modulefiles/Core ----------------------------------
   lmod    settarg

------------------------------------ /opt/cray/pe/lmod/modulefiles/core -------------------------------------
   PrgEnv-cray/8.5.0          cray-dsmml/0.3.0               cuda/12.6
...[SNIP]...
   cray-cti/2.19.1     (D)    cuda/11.8               (D)

--------------------------- /opt/cray/pe/lmod/modulefiles/craype-targets/default ----------------------------
   craype-accel-amd-gfx908    craype-arm-grace        craype-hugepages4M      craype-x86-milan-x
   craype-accel-amd-gfx90a    craype-hugepages128M    craype-hugepages512M    craype-x86-milan
   craype-accel-amd-gfx940    craype-hugepages16M     craype-hugepages64M     craype-x86-rome


user.project@login40:~> module list

Currently Loaded Modules:
  1) brics/userenv/2.7   2) brics/default/1.0

user.project@login40:~> module reset
Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None

Modules Example 1- osu_hello

The OSU micro-benchmark suite is commonly used to measure MPI performance on HPC systems and serves as a useful example for demonstrating several features of the Isambard supercomputers. In this example it will be used to demonstrate how modules work.

First, return to a clean module environment using module reset:

user.project@login40:~> module reset
Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None

Try invoking the osu_hello command. The program is not available by default:

user.project@login40:~> osu_hello
-bash: osu_hello: command not found

Use module avail to see the available modules. Find the module mentioning 'osu' and load it using module load [MODULE]. Then use module list to see what has changed:

user.project@login40:~> module avail
...[SNIP]...
-------------------------------------------------------------- /tools/brics/modulefiles --------------------------------------------------------------
   brics/apptainer-multi-node/0.3.1        brics/default/1.0       (L)    brics/nccl/2.21.5-1                   brics/tmux/3.4
   brics/apptainer-multi-node/0.3.2 (D)    brics/emacs/29.4               brics/nccl/2.26.6-1            (D)    brics/userenv/2.5
   brics/aws-ofi-nccl/1.6.0                brics/nano/8.2                 brics/openmpi/4.1.7                   brics/userenv/2.6
   brics/aws-ofi-nccl/1.8.1         (D)    brics/nccl-tests/2.13.6        brics/osu-micro-benchmarks/7.5        brics/userenv/2.7 (L,D)

user.project@login40:~> module load brics/osu-micro-benchmarks
user.project@login41:~> module list

Currently Loaded Modules:
  1) brics/userenv/2.7   2) brics/default/1.0   3) brics/openmpi/4.1.7   4) brics/osu-micro-benchmarks/7.5

There are a few things to note:

  • It is not required to specify the version of a module. When multiple versions are available, the version marked (D) is loaded by default.
  • The module system will load any dependencies automatically. In this case, loading the osu-micro-benchmarks module also loaded the openmpi module.

Now invoke osu_hello again:

user.project@login40:~> osu_hello
# OSU MPI Hello World Test v7.5
This is a test with 1 processes

The command is now available on your $PATH and can be run.
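
If you need a particular version rather than the default marked (D), name it explicitly when loading. A minimal sketch, using the version shown in the module avail output above:

module load brics/osu-micro-benchmarks/7.5

A loaded module can be removed again with module unload [MODULE], or you can simply run module reset.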

Modules Example 2- Compiler versions

In this example, we will use modules to access a newer version of the GCC compiler. This can be important for the build systems of large applications, which may make hard-to-change assumptions about how the compiler is invoked. The Isambard supercomputers are based on the AArch64 architecture, and the default GCC version does not provide full support for the hardware. To achieve the best performance and compatibility, newer GCC versions are provided via modules.

First, return to a clean module environment using module reset and run module list to confirm what is loaded.

user.project@login40:~> module reset
Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None

user.project@login40:~> module list
Currently Loaded Modules:
  1) brics/userenv/2.7   2) brics/default/1.0

Then check the version of GCC available and its location on disk:

user.project@login40:~> gcc --version
gcc (SUSE Linux) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

user.project@login40:~> which gcc
/usr/bin/gcc

Now load the PrgEnv-gnu module and repeat the previous steps:

user.project@login40:~> module load PrgEnv-gnu
user.project@login40:~> module list

Currently Loaded Modules:
  1) brics/userenv/2.7   3) gcc-native/14.2   5) craype-arm-grace   7) craype-network-ofi    9) cray-mpich/8.1.32
  2) brics/default/1.0   4) craype/2.7.34     6) libfabric/1.22.0   8) cray-libsci/25.03.0  10) PrgEnv-gnu/8.6.0

user.project@login40:~> gcc --version
gcc (SUSE Linux) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

user.project@login40:~> which gcc
/opt/cray/pe/gcc-native/14/bin/gcc

As a result, a much newer version is available when invoking gcc.
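
As a quick check that the newer compiler is the one being picked up, you could compile and run a trivial program. A minimal sketch (the file name and flags are illustrative, not a recommended build configuration):

cat > hello.c << 'EOF'
#include <stdio.h>

int main(void)
{
    /* Report the GCC version this program was built with */
    printf("Built with GCC %d.%d\n", __GNUC__, __GNUC_MINOR__);
    return 0;
}
EOF
gcc -O2 -mcpu=native hello.c -o hello
./hello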

Find out more about modules in the Modules and Compilers guide.

Running jobs via Slurm

Slurm is a widely used workload manager for scheduling work on HPC systems, and is used on most BriCS platforms. It allows users to access compute resources via job submission scripts or interactive sessions. In this section, we will use both methods. The Slurm guide provides detailed information about using Slurm on BriCS systems.

Platform specific section

This section assumes you are running on a BriCS system with GPU compute nodes, such as Isambard-AI Phase 1 or Phase 2. The exercises may be adapted for CPU-only systems, but this is not shown here.

Scheduled tutorials

If you are following this tutorial as part of a scheduled event, you may be given instructions to use a Slurm reservation. Please follow any such instructions to ensure you can gain prompt access to the compute nodes.

Slurm Example 1- Launching an interactive session

In this example, we will launch an interactive session with one GH200 GPU. Look in the Slurm guide for how to run an interactive session. In that session, run nvidia-smi to see information about the GPU you have access to.

user.project@login40:~> srun --gpus=1 --time=00:15:00 --pty /bin/bash --login
srun: job 1566942 queued and waiting for resources
srun: job 1566942 has been allocated resources

user.project@nid010229:~> nvidia-smi
Thu Nov 20 14:09:44 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GH200 120GB             On  |   00000039:01:00.0 Off |                    0 |
| N/A   35C    P0             92W /  900W |       5MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

In a new terminal window, log into the same Isambard supercomputer and run squeue --me to see your running jobs:

user.project@login40:~> ssh project.aip2.isambard
 ___                _                 _      _   ___
|_ _|___ __ _ _ __ | |__  __ _ _ _ __| |___ /_\ |_ _|
 | |(_-</ _` | '  \| '_ \/ _` | '_/ _` |___/ _ \ | |
|___/__/\__,_|_|_|_|_.__/\__,_|_| \__,_|  /_/ \_\___|
--------------------- Phase 2 -----------------------

User Documentation: https://docs.isambard.ac.uk
BriCS Helpdesk:     https://support.isambard.ac.uk
Service Status:     https://status.isambard.ac.uk

Last login: Thu Nov 20 14:09:21 2025 from 10.129.104.25
user.project@login40:~> squeue --me
  JOBID         USER PARTITION                     NAME ST TIME_LIMIT       TIME  TIME_LEFT NODES NODELIST(REASON)
1566942 user.project     workq                     bash  R      15:00       1:39      13:21     1 nid010229

To exit the interactive session on the compute node, use Ctrl+D or type logout once. Doing so a second time will log you out of the login node.

From this example, there are a few things to note:

  • The command line prompt gives a clue to where you are: on the login node it shows loginXX, whereas on the compute node it shows nidXXXXXX.
  • The job ID is unique to your job; it appeared when you launched the interactive session and again in the output from squeue.
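
If you no longer need a job, for example an interactive session left running, you can cancel it by job ID using scancel (a standard Slurm command). For example, using the job ID from the session above:

scancel 1566942

Running squeue --me again should show that the job has left the queue.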

Slurm Example 2- Running a single command

It is also possible to run a single command on a compute node via srun. In fact, the previous example is a special case of this!

This time we will run nvidia-smi directly via srun on all 4 GPUs of the node. Look in the Slurm guide for how to run a single command and adapt the example given.

user.project@login40:~> srun --gpus=4 --time=00:02:00 nvidia-smi
srun: job 1566960 queued and waiting for resources
srun: job 1566960 has been allocated resources
Thu Nov 20 14:23:00 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GH200 120GB             On  |   00000009:01:00.0 Off |                    0 |
| N/A   33C    P0             77W /  900W |       9MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GH200 120GB             On  |   00000019:01:00.0 Off |                    0 |
| N/A   35C    P0             80W /  900W |       9MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GH200 120GB             On  |   00000029:01:00.0 Off |                    0 |
| N/A   36C    P0             83W /  900W |      10MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GH200 120GB             On  |   00000039:01:00.0 Off |                    0 |
| N/A   34C    P0            105W /  900W |      11MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Slurm Example 3- Submitting a multi-node job

There are many occasions when you will want to submit a job to run whenever resources become available, rather than waiting interactively. This is particularly common when running large, long-running, or many jobs at once.

In this example, we will return to the OSU MPI benchmarks used earlier. The main use of these benchmarks is to measure the performance of the interconnect between nodes using MPI communication. Therefore, we will submit a job to run one of these benchmarks across two nodes.

Create a file called test_osu.sh with the following contents:

#!/bin/bash

#SBATCH --job-name=test_osu
#SBATCH --output=test_osu.out
#SBATCH --gpus=8
#SBATCH --ntasks-per-gpu=1
#SBATCH --time=00:05:00         # Hours:Mins:Secs
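# Note: with 4 GH200 GPUs per node, requesting 8 GPUs (one task per GPU) spans 2 nodes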

# Load the module to make the OSU tests available
module load brics/osu-micro-benchmarks

# Run the OSU all-to-all latency test
srun osu_alltoall

# Sleep (for this tutorial only) to allow time to look at squeue
sleep 30

Next, submit the job using sbatch and check its status using squeue --me. Once the job has completed, check the output file test_osu.out to see the results of the benchmark.

user.project@login40:~> sbatch test_osu.sh
Submitted batch job 1567044

user.project@login40:~> squeue --me
  JOBID         USER PARTITION                     NAME ST TIME_LIMIT       TIME  TIME_LEFT NODES NODELIST(REASON)
1567044 user.project     workq                 test_osu  R       5:00       0:06       4:54     2 nid[011155,011170]

user.project@login40:~> cat test_osu.out

# OSU MPI All-to-All Personalized Exchange Latency Test v7.5
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)
1                       8.86
2                       8.83
...[SNIP]...

Slurm Example 4- Persistence

When launching work on compute nodes, note that modules loaded on the login node persist into the job.

On a login node, load a module of your choice, e.g. PrgEnv-gnu and list the loaded modules using module list.

user.project@login40:~> module load PrgEnv-gnu
user.project@login40:~> module list

Currently Loaded Modules:
  1) brics/userenv/2.7   3) gcc-native/14.2   5) craype-arm-grace   7) craype-network-ofi    9) cray-mpich/8.1.32
  2) brics/default/1.0   4) craype/2.7.34     6) libfabric/1.22.0   8) cray-libsci/25.03.0  10) PrgEnv-gnu/8.6.0

Now start an interactive session on a compute node as before and run module list to see that the module is still loaded.

user.project@login40:~> srun --gpus=1 --time=00:15:00 --pty /bin/bash --login
srun: job 1620768 queued and waiting for resources
srun: job 1620768 has been allocated resources
user.project@nid010317:~> module list

Currently Loaded Modules:
  1) brics/userenv/2.7   3) gcc-native/14.2   5) craype-arm-grace   7) craype-network-ofi    9) cray-mpich/8.1.32
  2) brics/default/1.0   4) craype/2.7.34     6) libfabric/1.22.0   8) cray-libsci/25.03.0  10) PrgEnv-gnu/8.6.0

This behaviour can be useful or confusing, depending on the situation. Therefore, it is good practice to explicitly load any required modules.
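
For example, a batch script that should not depend on whatever happens to be loaded in the submitting shell can reset and reload modules explicitly. A minimal sketch (the job name, output file and module choice are illustrative):

#!/bin/bash
#SBATCH --job-name=explicit-modules
#SBATCH --output=explicit-modules.out
#SBATCH --gpus=1
#SBATCH --time=00:05:00

# Start from the default module set, then load exactly what this job needs
module reset
module load PrgEnv-gnu

# Confirm the expected compiler version is in use
gcc --version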

Python via Conda

Conda is a popular package manager for Python and other languages, allowing fine-grained control over versions and environments. In this section we will install a package and use it in a short interactive Python session.

Follow the early steps of the Python guide to install Miniforge. When asked if you wish to initialize the shell with conda init, answer no.

Now we will activate conda. When conda is activated, it starts in the base environment by default.

user.project@login44:~> source ~/miniforge3/bin/activate
(base) user.project@login44:~>

Conda Example 1- Installing and using scipy

Now we will use Python interactively to try to import the scipy package. We will find that it is not available by default. Note that we are running on the login node, as we do not need access to lots of compute power:

(base) user.project@login44:~> python --version
Python 3.12.12
(base) user.project@login44:~> python
Python 3.12.12 | packaged by conda-forge | (main, Oct 22 2025, 23:16:53) [GCC 14.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from scipy import constants
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
>>> exit()
(base) user.project@login44:~>

We will not install scipy into the base environment: over time, the base environment can become cluttered and difficult to manage, and it is best practice to create separate environments for different projects. Therefore, we will create an environment called tutorial. In this example we choose a specific version of Python, to demonstrate some of the control that becomes useful for more advanced use cases:

(base) user.project@login44:~> conda create -n tutorial python=3.14.0

This will take a couple of minutes to complete. Once it is done, activate the new environment:

...[SNIP]...
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate tutorial
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) user.project@login44:~> conda activate tutorial
(tutorial) user.project@login44:~>

Within this environment we can install packages without them interfering with other environments. With the tutorial environment active, install the scipy package:

(tutorial) user.project@login44:~> conda install scipy

Again, it will take a few moments to complete.

Now repeat the interactive Python session to demonstrate that scipy is available. When done, use conda deactivate twice: once to leave the tutorial environment and once more to leave the base environment.

(tutorial) user.project@login44:~> python --version
Python 3.14.0
(tutorial) user.project@login44:~> python
Python 3.14.0 | packaged by conda-forge | (main, Oct 22 2025, 23:15:56) [GCC 14.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from scipy import constants
>>> print(constants.liter)
0.001
>>> exit()
(tutorial) user.project@login44:~> conda deactivate
(base) user.project@login44:~> conda deactivate
user.project@login44:~>
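
If you want to tidy up later, conda can list your environments and remove ones you no longer need. A minimal sketch (activate conda first, as above, and only remove the tutorial environment once you are finished with it):

conda env list
conda env remove -n tutorial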

Conda Example 2- Creating an environment from a yaml file

For more complex Python environments, the approach above may not be suitable. Keeping track of everything can be difficult, potentially compromising reproducibility. On some platforms, the hardware on the login and compute nodes may differ, meaning that, for example, GPU-enabled packages may not be selected if the environment is created on a login node.

In this example, we demonstrate how to manage these concerns by creating an environment for PyTorch from a YAML file on a compute node.

Start in your $HOME directory. Create a file called pytorch_conda_env.yaml with the following contents:

pytorch_conda_env.yaml
name: pytorch_env
channels:
  - conda-forge
  - nodefaults
dependencies:
  - python=3.10
  - pytorch=2.7.0
  - torchvision
  - transformers

You can also download the file: pytorch_conda_env.yaml

Now start an interactive session on a compute node with one GPU as before. Activate conda and create the environment from the yaml file:

user.project@login41:~> srun --gpus=1 --time=00:15:00 --pty /bin/bash --login
srun: job 1620646 queued and waiting for resources
srun: job 1620646 has been allocated resources
user.project@nid010547:~> source ~/miniforge3/bin/activate
(base) user.project@nid010547:~> conda env create -f pytorch_conda_env.yaml
Retrieving notices: done
Channels:
 - conda-forge
Platform: linux-aarch64
Collecting package metadata (repodata.json): done
Solving environment: done
...[SNIP]...
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate pytorch_env
#
# To deactivate an active environment, use
#
#     $ conda deactivate
(base) user.project@nid010547:~>

This command will take some time to complete. Once it is done, you can activate the environment:

(base) user.project@nid010547:~> conda activate pytorch_env
(pytorch_env) user.project@nid010547:~>
(pytorch_env) user.project@nid010547:~> exit
logout
user.project@login41:~>

Note that when starting the interactive session on the compute node we needed to activate conda again and then activate the environment. This is the case regardless of whether conda was activated on the login node, and it differs from modules, which do persist when starting work on compute nodes.
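
The same applies to batch jobs: they must activate conda and the environment themselves. A minimal sketch of a job script using the environment created above (the job name, output file and check command are illustrative):

#!/bin/bash
#SBATCH --job-name=pytorch-check
#SBATCH --output=pytorch-check.out
#SBATCH --gpus=1
#SBATCH --time=00:05:00

# Activate conda and the environment inside the job; neither carries over from the login node
source ~/miniforge3/bin/activate
conda activate pytorch_env

# Confirm PyTorch is importable and can see the allocated GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"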

Containers

Containers provide another method for packaging and running software, and are available on BriCS platforms. Here we will obtain and run a simple container using Singularity/Apptainer to demonstrate the principles involved. Examples using different container engines are provided in the Containers guide.

Containers are stored as images. It is possible to define your own but it is common to 'pull' a pre-built image from a public repository. In this example we will use the lolcow image from Docker Hub, which displays an ASCII art cow when run.

Create a fresh directory, and then pull and build the container.

user.project@login41:~> mkdir sif-images
user.project@login41:~> cd sif-images/
user.project@login41:~/sif-images> ls
user.project@login41:~/sif-images> singularity build lolcow.sif docker://sylabsio/lolcow
INFO:    Starting build...
INFO:    Fetching OCI image...
25.9MiB / 25.9MiB [===============================================================================================================] 100 % 16.5 MiB/s 0s
43.2MiB / 43.2MiB [===============================================================================================================] 100 % 16.5 MiB/s 0s
INFO:    Extracting OCI image...
INFO:    Inserting Apptainer configuration...
INFO:    Creating SIF file...
[============================================================================================================================================] 100 % 0s
INFO:    Build complete: lolcow.sif
user.project@login40:~/sif-images> ls
lolcow.sif

Now you can start using the container. There are several ways to do this, and the exact commands will depend on which container engine you are using.

Container Example 1- Running a single command

In this case you can use singularity exec to run the cowsay command within the container. We have it print the hostname of the node where it is running, in this case a login node:

user.project@login41:~/sif-images> singularity exec lolcow.sif cowsay $(hostname)
 _________
< login41 >
 ---------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Container Example 2- Interactive shell

In this case you can use singularity shell to start an interactive shell within the container.

user.project@login41:~/sif-images> singularity shell lolcow.sif
Apptainer> cowsay $(hostname)
 _________
< login41 >
 ---------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Apptainer> exit
exit
user.project@login41:~/sif-images>

Now start an interactive job on a compute node as we did in the Slurm section, and repeat the previous step to see that the container runs in the same way on compute nodes.

user.project@login41:~/sif-images> srun --gpus=1 --time=00:15:00 --pty /bin/bash --login
srun: job 1612172 queued and waiting for resources
srun: job 1612172 has been allocated resources
user.project@nid010017:~/sif-images> singularity shell lolcow.sif
Apptainer> cowsay $(hostname)
 ___________
< nid010017 >
 -----------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Apptainer> exit
exit
user.project@nid010017:~/sif-images> exit
logout
user.project@login41:~/sif-images>

Container Example 3- Submitting a batch job

Finally, you can submit a batch job to run a command within the container. Create a file called moocow-batch.sh with the following contents:

#!/usr/bin/bash
#SBATCH --job-name=moocow-test
#SBATCH --output=moocow-test.out
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00         # Hours:Mins:Secs

singularity exec lolcow.sif cowsay $(hostname)

Then submit the job using sbatch and check the output file:

user.project@login41:~/sif-images> sbatch moocow-batch.sh
Submitted batch job 1612184

user.project@login41:~/sif-images> squeue --me
  JOBID         USER PARTITION                     NAME ST TIME_LIMIT       TIME  TIME_LEFT NODES NODELIST(REASON)
1612184 user.project     workq              moocow-test CG       5:00       0:02       4:58     1 nid010022

user.project@login41:~/sif-images> ls -lrth
total 68M
-rwxr-xr-x 1 user.project user.project 68M Nov 24 09:44 lolcow.sif
-rw-r--r-- 1 user.project user.project 215 Nov 24 10:05 moocow-batch.sh
-rw-r--r-- 1 user.project user.project 163 Nov 24 10:06 moocow-test.out

user.project@login41:~/sif-images> cat moocow-test.out
 ___________
< nid010022 >
 -----------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
user.project@login41:~/sif-images>

Looking at the output from squeue, you can see that the job completed so quickly it was already in the 'CG' (completing) state. Under normal circumstances, a job that exits this quickly has likely failed due to an error, so it is worth checking the output.
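
Once a job has left the queue it no longer appears in squeue, but where Slurm job accounting is enabled, sacct can report its final state and exit code. A minimal sketch, using the job ID from the submission above:

sacct -j 1612184 --format=JobID,JobName,State,Elapsed,ExitCode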

Bonus material: Spack and beyond

Spack performs a similar role to conda, introduced in the previous sections, but is weighted more towards compiled languages. It provides another option for users assembling complex research software stacks, and illustrates the self-service approach taken on BriCS platforms. This is a more challenging extension activity for users who are already comfortable with the previous sections.

Install Spack and the Isambard buildit repository following the Spack guide. Create an environment and install osu-micro-benchmarks. Use srun to run osu_bw on two nodes to measure the inter-node bandwidth.
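
If you are unsure where to start, a generic Spack workflow looks roughly like the sketch below. This is illustrative only; take the exact package name, versions and any buildit-specific steps from the Spack guide, and adapt the srun options from the earlier examples:

spack env create osu-tutorial
spack env activate osu-tutorial
spack add osu-micro-benchmarks
spack install

# Find the installation prefix; the benchmark binaries live under it
spack location -i osu-micro-benchmarks

# Run the bandwidth benchmark with one process on each of two nodes
srun --nodes=2 --ntasks-per-node=1 --time=00:05:00 [PATH_TO_OSU_BW]/osu_bw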

Now run osu_bw using the module described earlier. Further options include the container approach in the documentation, or building from source code.

How can you verify and control which version you are running? What are the risks and benefits of the options, flexibility and complexity demonstrated in this example, and how might this affect how you manage your own software stack on BriCS or other supercomputers?