Using Containers¶
Container engines can be used to deploy software packages in lightweight, standalone, and reproducible environments.
Isambard supercomputers provide two container engines: Singularity and Podman (`podman-hpc`).
Using Containers On Multiple Nodes (MPI & NCCL)¶
To achieve optimal multi-node performance with containers on Isambard-AI and Isambard 3, the containers must be able to communicate with the Network Interface Cards (NICs). These guides explain how to set this up. MPI (Message Passing Interface) is required for CPU workloads, while both MPI and NCCL (NVIDIA Collective Communications Library) are required for GPU workloads.
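As a rough orientation, a multi-node launch typically wraps the container run command in `srun`. The image name, binary, and task counts below are placeholders, and the bind mounts and environment set-up the launch relies on are covered in the engine-specific guides linked further down.

```bash
# Illustrative two-node launch (names and task counts are placeholders).
# srun starts one process per task; each process runs the MPI binary that
# lives inside the container image.
srun --nodes=2 --ntasks-per-node=4 \
    singularity exec my_mpi_image.sif ./mpi_hello
```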
Prerequisites
These guides assume familiarity with MPI and the specifications of Isambard-AI:
- MPI documentation.
- System specifications, mainly the interconnect.
Isambard-AI and Isambard 3 are HPE Cray systems based on the Slingshot 11 (SS11) interconnect. To achieve optimal latency and bandwidth, the relevant dependencies are mounted inside the `/host/` directory of the container to avoid interfering with other software in the container. They are then added to `PATH` and `LD_LIBRARY_PATH`.
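As a rough sketch of that mechanism (the host paths below are assumptions; the engine-specific guides set the real bind mounts and variables for you), the idea is to bind a host library directory under `/host` and prepend it to the search paths inside the container:

```bash
# Hypothetical illustration only: bind a host library directory to /host/lib64
# inside the container, then prepend it to LD_LIBRARY_PATH before running a
# command. The actual paths and library set are system-specific.
singularity exec \
    --bind /opt/cray/libfabric/lib64:/host/lib64 \
    pytorch_24.06-py3.sif \
    bash -c 'export LD_LIBRARY_PATH=/host/lib64:$LD_LIBRARY_PATH; ls /host/lib64'
```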
Information for specific container engines:
Using Arm compatible container images¶
Isambard supercomputers mainly use the 64-bit Arm CPU architecture (see Specifications), also known as `aarch64`. Because of this, only container images that support this architecture can be used. For instance, if you visit the NVIDIA GPU Cloud (NGC) container website and look under the "Tags" page, you will see that images support varying architectures:
If you expand the drop-down, you will see which architectures are supported:
You can select which container to pull by specifying the `--arch` flag in singularity:
singularity pull --arch aarch64 pytorch_24.06-py3.sif docker://nvcr.io/nvidia/pytorch:24.06-py3
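To confirm that the Arm build was pulled, you can run `uname -m` inside the resulting image (file name as above); it should print `aarch64`:

```bash
# Prints the CPU architecture the image was built for.
singularity exec pytorch_24.06-py3.sif uname -m
```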
Or in podman you can use the `--arch` flag:
podman-hpc pull --arch aarch64 nvcr.io/nvidia/pytorch:24.06-py3
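A similar sanity check is possible with podman, assuming `podman-hpc` passes through the standard `podman image inspect` interface; this should report `arm64` for an aarch64 image:

```bash
# Inspect the architecture recorded in the pulled image's metadata.
podman-hpc image inspect nvcr.io/nvidia/pytorch:24.06-py3 --format '{{.Architecture}}'
```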