
Machine Learning Packages

GPU-Accelerated Support

For Isambard-AI, the matrix below shows which GPU-accelerated Machine Learning (ML) packages are supported under pip, Conda, or inside a container for Linux Arm64 (aarch64). Where no pip entry appears, we do not plan to provide pip wheels. The installation methods are detailed below.

ML Framework      Pip    Conda    Container
PyTorch           yes    yes      yes
HuggingFace       (via the backend framework)
TensorFlow        –      –        yes
JAX               yes    –        yes
Flash-Attention   –      yes      yes
vLLM              –      –        yes

Prerequisites

Please see the relevant documentation for using Conda or for running containers. The containers listed below are the Nvidia-optimised images available from Nvidia GPU Cloud (NGC). When using containers, ensure that you are using images that support the Arm64 (aarch64) architecture.

Tip

The Arm architecture and Hopper GPUs used in Isambard-AI typically require modern versions throughout the machine learning software stack. Where possible, prefer more recent releases of machine learning packages as this usually ensures easy installation and optimal performance for your applications.

Isambard-AI Phase 2 login nodes

Isambard-AI Phase 2 login nodes do not have GPUs. Note that some Python packages have compatibility issues when no GPU is detected, so it is recommended to install packages on a compute node. You can start an interactive compute node session with srun -N 1 --gpus 4 --pty bash.

Click through the tabs below to find installation instructions for the respective package.

PyTorch has provided pip wheels with CUDA and aarch64 support since PyTorch 2.6:

$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
$ srun --gpus 1 python3 -c "import torch; print(torch.cuda.is_available())"
True

Make sure you use Python 3.9 or newer. The release compatibility matrix can be found in the PyTorch documentation.
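
Beyond checking torch.cuda.is_available(), a minimal smoke test such as the following sketch exercises the GPU directly (run it with srun --gpus 1 on a compute node; the file name is illustrative):

# smoke_test.py: small matrix multiply on the GPU
import torch

x = torch.randn(1024, 1024, device="cuda")
y = x @ x                      # runs on the GPU
torch.cuda.synchronize()       # wait for the kernel to finish
print(torch.cuda.get_device_name(0), y.shape)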

conda-forge provides a PyTorch package with aarch64, CUDA (GPU), and NumPy support. Please run the installation on a compute node with srun to ensure CUDA is available during installation.

$ srun --gpus 1 --pty conda install conda-forge::pytorch
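
To confirm that conda resolved a CUDA-enabled build rather than a CPU-only variant, a quick check along these lines can help (a sketch; run it under srun as above):

# Check that the installed PyTorch was built against CUDA
import torch

print(torch.__version__)
print(torch.version.cuda)        # None indicates a CPU-only build
print(torch.cuda.is_available())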

We recommend these PyTorch images from Nvidia GPU Cloud.

For example, you can pull and run a container that provides PyTorch GPU support like so:

$ singularity pull pytorch_25.05-py3.sif docker://nvcr.io/nvidia/pytorch:25.05-py3
$ srun --gpus 1 singularity run --nv pytorch_25.05-py3.sif python3 -c "import torch; print(torch.cuda.is_available())"

TensorFlow can currently be run with GPU support inside a container. The recommended container from Nvidia GPU Cloud can be pulled and run as follows:

$ singularity pull docker://nvcr.io/nvidia/tensorflow:25.02-tf2-py3
$ srun --gpus 1 --pty singularity run --nv tensorflow_25.02-tf2-py3.sif 
Singularity> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

JAX provides aarch64 and GPU compatibility out of the box. You can install it with pip:

$ pip install "jax[cuda12]"
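
A quick way to confirm the GPU backend is active after the pip install (a sketch; run it on a compute node with srun --gpus 1):

# Verify JAX can see the GPU and run a computation on it
import jax
import jax.numpy as jnp

print(jax.devices())            # expect CUDA devices, not CPU
x = jnp.ones((1024, 1024))
print(jnp.linalg.norm(x @ x))   # executed on the default (GPU) backend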

We recommend these JAX images from Nvidia GPU Cloud.

$ singularity pull docker://nvcr.io/nvidia/jax:25.04-py3
$ srun --gpus 1 --pty singularity run --nv jax_25.04-py3.sif 
Singularity> python3 -c "import jax; print(jax.default_backend())"

Hugging Face libraries can use several different frameworks as their backend. Please follow the installation instructions for the respective framework; a sketch follows below.
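
For example, with the PyTorch installation above plus pip install transformers, a minimal sketch (the model name is purely illustrative) runs a Hugging Face pipeline on the GPU:

# Run a small text-generation pipeline on the first GPU
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2", device=0)  # device=0 -> first GPU
print(generator("Isambard-AI is", max_new_tokens=20)[0]["generated_text"])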

Flash-Attention is slow to build

It is recommended to build Flash-Attention on a compute node for faster builds, submitting the build as a job using sbatch or srun (see the Slurm guide). If you are building on a login node, set MAX_JOBS=2 to limit parallel compile jobs and avoid resource contention with other users.

Flash Attention data types for Hopper GPUs

The versions of Flash-Attention that support sm_90 (Hopper GPUs) do not support fp8.

Please use either the bfloat16 or fp16 floating-point data types.

Flash-Attention can be installed into a Conda environment:

$ srun -N 1 --gpus 4 --pty bash
$ conda install conda-forge::pytorch conda-forge::flash-attn
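
Once installed, a minimal sketch of calling Flash-Attention directly, honouring the bfloat16/fp16 requirement noted above (tensor shapes follow the flash_attn_func convention of batch, sequence length, heads, head dimension):

# Call the Flash-Attention kernel with bfloat16 inputs on the GPU
import torch
from flash_attn import flash_attn_func

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)

out = flash_attn_func(q, k, v, causal=True)  # fp8 inputs would fail on sm_90
print(out.shape, out.dtype)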

The NGC PyTorch containers come with Flash-Attention pre-installed.

$ singularity pull pytorch_25.05-py3.sif docker://nvcr.io/nvidia/pytorch:25.05-py3
$ singularity run --nv pytorch_25.05-py3.sif python3 -m pip list | grep flash
flash_attn                    2.7.3

vLLM can be obtained using E4S containers, which also contain a wide selection of other AI-relevant packages.

The containers are large, so we have downloaded them to the following public directory:

/projects/public/brics/containers/e4s/e4s-cuda90-aarch64-25.06.sif

To use the container, copy it into your own project directory and run it with:

$ singularity run --nv e4s-cuda90-aarch64-25.06.sif 
Singularity> vllm --version
INFO 06-19 07:06:13 [__init__.py:239] Automatically detected platform cuda.
0.8.3.dev0+g25f560a62.d20250520
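
Beyond the CLI, vLLM's offline Python API can also be used inside the container; a minimal sketch (the model name is illustrative, substitute one available to you):

# Offline batch generation with vLLM's Python API
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                    # small illustrative model
params = SamplingParams(temperature=0.8, max_tokens=32)
for output in llm.generate(["Isambard-AI is"], params):
    print(output.outputs[0].text)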