
Machine Learning Packages

GPU Accelerated Support

This matrix shows which GPU-accelerated Machine Learning (ML) packages are supported under pip, Conda, or inside a container on Linux Arm64 (aarch64). We are working on providing support through Conda or a container for all of these packages. We do not plan to provide pip wheels ourselves. The installation methods are detailed below.

ML Framework      Pip   Conda   Container
PyTorch           ✗     ✓       ✓
HuggingFace       (follows the chosen backend framework)
TensorFlow        ✗     WIP     ✓
JAX               ✓     ✓       ✓
Flash-Attention   ✗     ✓       ✓

Prerequisites

Please see the relevant documentation for using Conda or for running containers. The containers listed below are the Nvidia-optimised images available from Nvidia GPU Cloud (NGC). When using containers, ensure that you are using images that support the Arm64 (aarch64) architecture.

Tip

The Arm architecture and Hopper GPUs used in Isambard-AI typically require modern versions throughout the machine learning software stack. Where possible, prefer more recent releases of machine learning packages as this usually ensures easy installation and optimal performance for your applications.

Installation instructions for each package are given in the sections below.

PyTorch

PyTorch does not currently publish GPU-enabled pip wheels for aarch64. The release compatibility matrix can be found here.

conda-forge provides a PyTorch 2.5 package with aarch64, Cuda (GPU), and Numpy support:

$ conda install conda-forge::pytorch
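
Once installed, a short script like the following (a minimal sketch, assuming a GPU is visible on the node) confirms that the CUDA build is working:

import torch

print(torch.__version__)           # e.g. 2.5.x
print(torch.version.cuda)          # CUDA version the build was compiled against
print(torch.cuda.is_available())   # True if a GPU is visible
if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).sum().item())    # runs a matmul on the GPU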

We recommend these PyTorch images from Nvidia GPU Cloud.

For example, you can pull and run a container that provides PyTorch GPU support like so:

$ singularity pull --arch aarch64 docker://nvcr.io/nvidia/pytorch:24.07-py3
$ singularity run --nv pytorch_24.07-py3.sif python3 -c "import torch; print(torch.cuda.is_available())"

TensorFlow

TensorFlow does not currently provide GPU-enabled aarch64 wheels through pip; the wheels that are available can be seen here:

Linux Arm64 builds of TensorFlow from conda-forge are currently a work in progress.

TensorFlow can currently be run with GPU support inside a container. The recommended container can be found here:

$ singularity pull --arch aarch64 docker://nvcr.io/nvidia/tensorflow:24.07-tf2-py3
$ singularity run --nv tensorflow_24.07-tf2-py3.sif 
Singularity> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
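
Inside the container, a short check along these lines (a minimal sketch, assuming the GPU is visible to TensorFlow) confirms that work is actually placed on the GPU:

import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))  # expect one entry per GPU
with tf.device("/GPU:0"):                      # place the computation explicitly
    a = tf.random.normal((1024, 1024))
    b = tf.random.normal((1024, 1024))
    print(tf.reduce_sum(tf.matmul(a, b)).numpy())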

JAX

JAX provides aarch64 and GPU compatibility out of the box. You can install it with pip:

$ pip install "jax[cuda12]"

JAX can also be installed from conda-forge; to get a GPU-enabled build, request a CUDA variant of jaxlib alongside it:

$ conda install -c conda-forge jax "jaxlib=*=*cuda*"

We recommend these JAX images from Nvidia GPU Cloud. We use the --no-home flag to ensure Python inside the container does not clash with your Conda environment, if you have one. Note that you will have to unset the XLA_FLAGS environment variable due to issues with the Nvidia container image.

$ singularity pull --arch aarch64 docker://nvcr.io/nvidia/jax:24.04-py3
$ singularity run --nv --no-home jax_24.04-py3.sif 
Singularity> unset XLA_FLAGS
Singularity> python3 -c "import jax; print(jax.default_backend())"
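
After unsetting XLA_FLAGS, a short check such as this sketch confirms that JAX dispatches to the GPU backend:

import jax
import jax.numpy as jnp

print(jax.default_backend())  # expect "gpu"
print(jax.devices())          # expect CUDA device entries

x = jnp.ones((1024, 1024))
y = jax.jit(lambda a: a @ a)(x)  # compiled for and executed on the default device
print(float(y.sum()))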

HuggingFace

Hugging Face libraries can use several of the frameworks above as their backend. Please follow the installation instructions for the respective framework.
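
As an illustration, with the PyTorch backend installed as above and the transformers package added to the same environment (pip install transformers), a pipeline can be placed on the GPU as follows; the model name here is just an example:

from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # example model
    device=0,   # first CUDA device; use device=-1 to stay on the CPU
)
print(classifier("Running inference on the GPU works."))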

Flash-Attention

Flash Attention data types for Hopper GPUs

The versions of Flash-Attention that support sm_90 (Hopper GPUs) do not support fp8.

Please use either the bfloat16 or fp16 floating-point data types.
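
For example, the core kernel exposed by the flash-attn package expects CUDA tensors in one of these data types (a minimal sketch using the public flash_attn_func API; the shapes are illustrative):

import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
# q, k, v must be fp16 or bf16 CUDA tensors; fp8 is not supported on sm_90
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)
print(out.shape, out.dtype)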

Flash-Attention does not provide pip wheels for aarch64, so installing the package with pip causes it to be built from source. We recommend installing Flash-Attention through the flash-attn pip package inside a Conda environment.

Flash-Attention's build dependencies can be provided by a Conda environment such as the following:

flash_attn_conda_env.yml
name: flash_env
channels:
  - conda-forge
  - nodefaults
dependencies:
  - python=3.10
  - pytorch=2.5.1
  - ninja
  - gcc=12.3.0
  - gxx=12.3.0
  - pip
  - pip:
      - flash-attn==2.7.0.post2

We recommend building on a compute node after setting MAX_JOBS, since the build is demanding. If MAX_JOBS is set too high, the build may exhaust the memory on the node. Please note that this is a slow build.

$ module load cudatoolkit
$ export MAX_JOBS=10
$ conda env create -f flash_attn_conda_env.yml
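
Once the environment has been created, a quick import on a GPU node verifies that the extension built correctly and reports the pinned version:

$ conda activate flash_env
$ python3 -c "import flash_attn; print(flash_attn.__version__)"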

NGC PyTorch containers come with Flash-Attention pre-installed.

$ singularity pull --arch aarch64 docker://nvcr.io/nvidia/pytorch:24.04-py3
$ singularity run --nv pytorch_24.04-py3.sif python3 -m pip list | grep flash
flash-attn                2.4.2