System Specifications
This page details the system specifications for the BriCS supercomputers Isambard-AI and Isambard 3.
Isambard-AI Phase 1 consists of 42 nodes based on the aarch64 architecture. Each node has 4 NVIDIA GH200 Grace Hopper Superchips, and each superchip contains one Grace CPU and one Hopper H100 GPU. The interconnect is based on Slingshot 11: each node has 4 x Cassini NICs, each operating at 200 Gbps.
Per-node specs:
| Superchip | Processors | Cores | CPU Memory | GPUs | GPU Memory | Internal Interconnect |
|---|---|---|---|---|---|---|
| 4 x NVIDIA GH200 Grace Hopper Superchip | 4 x Grace CPU | 4 x 72 | 4 x 120 GB | 4 x H100 Tensor Core GPU | 4 x 96 GB | NVIDIA NVLink-C2C |
Each node has 460 GB of usable CPU memory (115 GB usable per CPU) and 384 GB of GPU memory, giving 844 GB of combined CPU + GPU memory per node.
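Restated as a short Python snippet, with the figures taken directly from the table and text above, the per-node memory arithmetic is:

```python
# Per-node memory arithmetic for an Isambard-AI Phase 1 node
# (figures from the table and text above).
superchips_per_node = 4
usable_cpu_mem_gb = 115   # usable CPU memory per Grace CPU
gpu_mem_gb = 96           # HBM per H100 GPU

cpu_total = superchips_per_node * usable_cpu_mem_gb   # 460 GB
gpu_total = superchips_per_node * gpu_mem_gb          # 384 GB
print(cpu_total, gpu_total, cpu_total + gpu_total)    # 460 384 844
```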
The power management settings for Isambard-AI cap each GH200 superchip at 660 W, shared dynamically between the CPU and GPU.
Grace uses the Arm (aarch64) CPU architecture. As such, code needs to be compiled for aarch64 and container images need to support aarch64. Existing code compiled for x86_64 (including conda environments) will not work.
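A quick way to confirm you are on an aarch64 host before building anything is to check the machine architecture at runtime. The sketch below uses only the Python standard library; container images likewise need a linux/arm64 variant (for example, built with a `--platform linux/arm64` option, depending on your container tooling).

```python
import platform

machine = platform.machine()
print(f"Machine architecture: {machine}")

if machine != "aarch64":
    # Binaries, wheels and conda packages built here will not run on the
    # Grace CPUs (and aarch64 builds will not run on an x86_64 workstation).
    raise SystemExit("Not an aarch64 host: build artefacts will not match Isambard-AI.")
```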
A future expansion, Isambard-AI Phase 2, will include a further 1,320 nodes (5,280 NVIDIA GH200 Grace Hopper Superchips).
Isambard 3 consists of 384 nodes based on the aarch64 architecture. Each node has an NVIDIA Grace CPU Superchip, and each superchip contains two Grace CPUs. The interconnect is based on Slingshot 11: each node has a single Cassini NIC operating at 200 Gbps.
Per-node specs:
| Superchip | Processors | Cores | CPU Memory | Internal Interconnect |
|---|---|---|---|---|
| 1 x NVIDIA Grace CPU Superchip | 2 x Grace CPU | 2 x 72 | 2 x 120 GB | NVIDIA NVLink-C2C |
Grace uses the Arm (aarch64) CPU architecture. As such, code needs to be compiled for aarch64 and container images need to support aarch64. Existing code compiled for x86_64 (including conda environments) will not work.
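A common failure mode is copying across a conda environment whose compiled extensions were built for x86_64. The sketch below is an illustrative (unofficial) check: it scans a directory for shared objects and reads the ELF machine field to flag anything not built for aarch64. The directory passed on the command line is a placeholder for your own environment path.

```python
import pathlib
import struct
import sys

# ELF e_machine values from the ELF specification
EM_X86_64 = 62
EM_AARCH64 = 183

def elf_machine(path):
    """Return the ELF e_machine field, or None if the file is not ELF."""
    try:
        with open(path, "rb") as f:
            header = f.read(20)
    except OSError:
        return None
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # e_machine is a 16-bit field at byte offset 18 (little-endian for
    # both x86_64 and aarch64 Linux binaries).
    return struct.unpack_from("<H", header, 18)[0]

def scan(env_dir):
    """Report any shared objects in env_dir that were built for x86_64."""
    for so in pathlib.Path(env_dir).rglob("*.so*"):
        if elf_machine(so) == EM_X86_64:
            print(f"x86_64 binary (will not run on Grace): {so}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        raise SystemExit("usage: python check_arch.py <directory>")
    scan(sys.argv[1])   # e.g. the root of a conda environment
```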
The Isambard 3 MACS (Multi-Architecture Comparison System) contains nodes spanning several architectures, currently all based on x86_64. The interconnect is based on Slingshot 11: each node has a single Cassini NIC operating at 200 Gbps. The Slurm partition names are listed below because each node type differs subtly.
Per Slurm partition specs:
| Partition | Processors per Node | Number of Nodes | Cores per Node | CPU Memory per Node | GPUs per Node | GPU Memory (per GPU) | Internal Interconnect |
|---|---|---|---|---|---|---|---|
| milan | 2 x AMD EPYC 7713 (Milan) | 12 | 2 x 64 | 256 GB | 0 | 0 GB | AMD Infinity Fabric |
| genoa | 2 x AMD EPYC 9354 (Genoa) | 2 | 2 x 32 | 384 GB | 0 | 0 GB | AMD Infinity Fabric |
| berg | 1 x AMD EPYC 9754 (Bergamo) | 2 | 1 x 128 | 192 GB | 0 | 0 GB | AMD Infinity Fabric |
| spr | 2 x Intel Xeon Gold 6430 (Sapphire Rapids) | 2 | 2 x 32 | 256 GB | 0 | 0 GB | Intel Ultra Path Interconnect |
| sprhbm | 2 x Intel Xeon CPU Max 9462 (Sapphire Rapids) | 2 | 2 x 32 | 128 GB (HBM) | 0 | 0 GB | Intel Ultra Path Interconnect |
| instinct | 1 x AMD EPYC 7543P (Milan) | 2 | 1 x 32 | 256 GB | 4 x AMD Instinct MI100 | 32 GB | AMD Infinity Fabric |
| ampere | 1 x AMD EPYC 7543P (Milan) | 2 | 1 x 32 | 256 GB | 4 x NVIDIA A100 SXM4 | 40 GB | AMD Infinity Fabric |
| hopper | 1 x AMD EPYC 7543P (Milan) | 1 | 1 x 32 | 256 GB | 4 x NVIDIA H100 PCIe | 80 GB | AMD Infinity Fabric |
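As an illustration of how the partition names map onto the hardware above, the sketch below chooses a MACS partition by GPU type and prints an example `sbatch` command. The job script name and the exact resource flags are placeholders: the site's Slurm configuration may require a GRES specification rather than `--gpus`, so check the scheduler documentation before submitting.

```python
# Illustrative mapping of GPU types to MACS Slurm partition names,
# taken from the table above. The generated command is an example,
# not a confirmed site convention.
GPU_PARTITIONS = {
    "MI100": "instinct",  # 4 x AMD Instinct MI100 per node
    "A100": "ampere",     # 4 x NVIDIA A100 SXM4 per node
    "H100": "hopper",     # 4 x NVIDIA H100 PCIe per node
}

def sbatch_command(gpu_type, gpus=1, script="job.sh"):
    """Build an example sbatch command for a GPU partition."""
    partition = GPU_PARTITIONS[gpu_type]
    return f"sbatch --partition={partition} --gpus={gpus} {script}"

print(sbatch_command("A100", gpus=4))
# -> sbatch --partition=ampere --gpus=4 job.sh
```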