Machine Learning Packages
GPU Accelerated Support
For Isambard-AI, this matrix shows which GPU-accelerated Machine Learning (ML) packages are supported under pip, Conda, or inside a container for Linux Arm64 (aarch64).
We do not plan to provide pip wheels. The installation methods are detailed below.
ML Framework | Pip | Conda | Container
---|---|---|---
PyTorch | ✅ | ✅ | ✅
HuggingFace | ✅ | ✅ | ✅
TensorFlow | ❌ | ❌ | ✅
JAX | ✅ | ❌ | ✅
Flash-Attention | ❌ | ✅ | ✅
VLLM | ❌ | ❌ | ✅
Prerequisites
Please see the relevant documentation for using Conda or for running containers. The containers listed below are the Nvidia-optimised images available in Nvidia GPU Cloud. When using containers, ensure that you are using images with support for the ARM64 architecture.
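As a quick sanity check before pulling images or installing packages, the following minimal sketch reports whether the current node identifies itself as 64-bit Arm. The helper name is_arm64 and the set of accepted machine strings are illustrative assumptions, not part of any Isambard-AI tooling.

```python
import platform
from typing import Optional

# Machine strings commonly reported for 64-bit Arm hosts
# (Linux typically reports "aarch64", macOS reports "arm64").
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: Optional[str] = None) -> bool:
    """Return True if the given (or current) machine string denotes 64-bit Arm."""
    if machine is None:
        machine = platform.machine()
    return machine.lower() in ARM64_NAMES


if __name__ == "__main__":
    print(f"machine={platform.machine()} arm64={is_arm64()}")
```

If this reports an Arm64 host, any container image or package you use needs an aarch64 build.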
Tip
The Arm architecture and Hopper GPUs used in Isambard-AI typically require modern versions throughout the machine learning software stack. Where possible, prefer more recent releases of machine learning packages as this usually ensures easy installation and optimal performance for your applications.
Isambard-AI Phase 2 login nodes
Isambard-AI Phase 2 login nodes do not have GPUs. Please note that there are compatibility issues with some Python packages when a GPU is not detected. For this reason it is recommended to install on compute nodes. You can enter an interactive compute node session with srun -N 1 --gpus 4 --pty bash.
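To confirm that a GPU is visible in the current session before installing, a minimal sketch such as the following could be used. It assumes the nvidia-smi tool is on the PATH where a GPU is present; the helper name gpu_visible is hypothetical.

```python
import shutil
import subprocess


def gpu_visible(timeout: float = 10.0) -> bool:
    """Return True if `nvidia-smi -L` succeeds and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Assumption: no nvidia-smi on PATH means no usable GPU here.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Each detected device is listed on a line beginning with "GPU <index>:".
    lines = result.stdout.splitlines()
    return result.returncode == 0 and any(line.startswith("GPU ") for line in lines)


if __name__ == "__main__":
    print("GPU visible" if gpu_visible() else "No GPU visible")
```

On a login node without GPUs this is expected to report that no GPU is visible, which is a cue to move to a compute node before installing.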
Click through the tabs below to find installation instructions for the respective package.
PyTorch has provided pip support for CUDA on aarch64 since PyTorch 2.6:
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
$ srun --gpus 1 python3 -c "import torch; print(torch.cuda.is_available())"
True
Make sure you use Python >3.9. The release compatibility matrix can be found here.
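For a slightly fuller check than the one-liner above, the following sketch collects the Python version, the PyTorch version, the CUDA version PyTorch was built against, and the names of any visible devices. The function name torch_report is an illustrative assumption; the script returns a partial report rather than failing when PyTorch is not installed.

```python
import sys


def torch_report() -> dict:
    """Collect basic PyTorch/CUDA facts; safe to call without torch installed."""
    report = {"python": sys.version.split()[0], "installed": False}
    try:
        import torch
    except ImportError:
        return report
    report["installed"] = True
    report["torch"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds.
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```

Saved to a file (for example a hypothetical check_torch.py), it can be run on a compute node with srun --gpus 1 python3 check_torch.py.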
conda-forge provides a PyTorch package with aarch64, CUDA (GPU), and NumPy support. Please run on the compute nodes with srun to ensure CUDA is available during installation.
$ srun --gpus 1 --pty conda install conda-forge::pytorch
We recommend these PyTorch images from Nvidia GPU Cloud.
For example, you can pull and run a container that provides PyTorch GPU support like so:
$ singularity pull pytorch_25.05-py3.sif docker://nvcr.io/nvidia/pytorch:25.05-py3
$ srun --gpus 1 singularity run --nv pytorch_25.05-py3.sif python3 -c "import torch; print(torch.cuda.is_available())"
TensorFlow can currently be run with GPU support inside a container. The recommended container can be found here:
$ singularity pull docker://nvcr.io/nvidia/tensorflow:25.02-tf2-py3
$ srun --gpus 1 --pty singularity run --nv tensorflow_25.02-tf2-py3.sif
Singularity> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
JAX provides aarch64 and GPU compatibility out-of-the-box. You can install it with pip:
$ pip install "jax[cuda12]"
We recommend these JAX images from Nvidia GPU Cloud.
$ singularity pull docker://nvcr.io/nvidia/jax:25.04-py3
$ srun --gpus 1 --pty singularity run --nv jax_25.04-py3.sif
Singularity> python3 -c "import jax; print(jax.default_backend())"
Hugging Face can run many different frameworks as its backend. Please follow the instructions for the respective framework.
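To see which candidate backends are already present in the active environment, a minimal sketch like the one below checks whether each framework is importable without actually importing it. The list of module names covers only the frameworks discussed on this page, and the helper name available_backends is hypothetical.

```python
from importlib.util import find_spec

# Top-level module names of the frameworks covered on this page.
BACKENDS = ("torch", "tensorflow", "jax")


def available_backends() -> dict:
    """Map each backend module name to True if it can be found in this environment."""
    return {name: find_spec(name) is not None for name in BACKENDS}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name}: {'found' if present else 'not found'}")
```

This only confirms that a package is installed; GPU support still needs to be verified with the framework-specific checks.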
Flash attention is slow to build
It is recommended to build Flash-Attention on a compute node for faster builds, submitting the build as a job using sbatch or srun (see the Slurm guide). If you are building on a login node, set MAX_JOBS=2 to limit the number of parallel build jobs and avoid resource contention.
Flash Attention data types for Hopper GPUs
The versions of Flash-Attention that support sm_90 (Hopper GPUs) do not support fp8.
Please use either the bfloat16 or fp16 floating point data type.
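One way to guard against accidentally requesting fp8 in a training or inference script is a small validation helper. The sketch below is illustrative only: the function name flash_attn_dtype and the alias table are assumptions, and it works on dtype names as strings so it does not require PyTorch to be installed.

```python
# Short aliases mapped to canonical dtype names (illustrative, not exhaustive).
ALIASES = {"bf16": "bfloat16", "fp16": "float16", "half": "float16"}

# The two data types recommended above for Flash-Attention on Hopper GPUs.
SUPPORTED = {"bfloat16", "float16"}


def flash_attn_dtype(name: str) -> str:
    """Return the canonical dtype name, or raise ValueError if it is unsupported."""
    canonical = ALIASES.get(name.lower(), name.lower())
    if canonical not in SUPPORTED:
        raise ValueError(
            f"dtype '{name}' is not supported here; use bfloat16 or fp16"
        )
    return canonical


if __name__ == "__main__":
    print(flash_attn_dtype("bf16"))
```

With PyTorch installed, the returned name can be turned into a dtype object with getattr(torch, flash_attn_dtype("bf16")), which yields torch.bfloat16.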
Flash-Attention can be built in a Conda environment.
$ srun -N 1 --gpus 4 --pty bash
$ conda install conda-forge::pytorch conda-forge::flash-attn
NGC PyTorch containers have Flash-Attention pre-bundled.
$ singularity pull pytorch_25.05-py3.sif docker://nvcr.io/nvidia/pytorch:25.05-py3
$ singularity run --nv pytorch_25.05-py3.sif python3 -m pip list | grep flash
flash_attn 2.7.3
vLLM can be obtained using E4S containers. These also contain a wide selection of other AI-relevant packages.
The containers are large, so we have downloaded them to the following public directory:
/projects/public/brics/containers/e4s/e4s-cuda90-aarch64-25.06.sif
To use the container, copy it into your own project directory and run it using:
$ singularity run --nv e4s-cuda90-aarch64-25.06.sif
Singularity> vllm --version
INFO 06-19 07:06:13 [__init__.py:239] Automatically detected platform cuda.
0.8.3.dev0+g25f560a62.d20250520