Known Issues
This page provides information about current known issues that may affect users of BriCS services and facilities. If you have encountered an issue that is not listed on this page, or you are unable to resolve a known issue using a provided workaround, please get in touch with the BriCS team.
BriCS Services and Facilities
Use of environment variables in Apptainer
Last updated: 2025-10-16
- Services affected: Isambard 3 MACS, Isambard-AI Phase 1, Isambard-AI Phase 2
- Description: By default, Slurm jobs inherit environment variables from the submission host (e.g. the login node). This can pass on session-specific variables such as `DBUS_SESSION_BUS_ADDRESS` and `XDG_RUNTIME_DIR`, which can trigger different logic in applications and cause unexpected behaviour, for example in Apptainer.
- Workaround: If you experience issues with Apptainer, unset these variables before running your containers, as shown in the sketch below.
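A minimal sketch of this in a Slurm batch script; the container image and command names are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=apptainer-example

# Unset session variables inherited from the login node that can
# trigger unexpected behaviour in Apptainer
unset DBUS_SESSION_BUS_ADDRESS
unset XDG_RUNTIME_DIR

# Run the container as usual (image and command are placeholders)
apptainer exec my_container.sif my_command
```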
GPU-specific environment variables
Last updated: 2025-09-18
- Services affected: Isambard 3 MACS, Isambard-AI Phase 1, Isambard-AI Phase 2
- Description: Slurm is configured to set up access to GPUs but has not been told the exact type of GPU (NVIDIA, AMD, Intel, etc.), so it sets GPU-specific environment variables for all vendors, such as `ROCR_VISIBLE_DEVICES`, `ZE_AFFINITY_MASK` and `GPU_DEVICE_ORDINAL`. Code that relies on these variables to detect the type of GPU may get confused.
- Workaround: Unset the variables before running your code (see the sketch below). This will be fixed in a future update.
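A minimal sketch of clearing these variables before launching an application; which variables are safe to unset depends on the GPU vendor your code actually targets, and the executable name is a placeholder:

```bash
# Unset GPU selection variables set by Slurm for vendors you are not using,
# so that GPU type detection in your code is not confused
unset ROCR_VISIBLE_DEVICES   # AMD ROCm
unset ZE_AFFINITY_MASK       # Intel Level Zero
unset GPU_DEVICE_ORDINAL     # AMD

./my_gpu_application         # placeholder for your own executable
```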
quota command not available on Isambard 3
Last updated: 2025-07-16
- Services affected: Isambard 3
- Description: The `quota` command is currently not available on Isambard 3 login or compute nodes. This command is used to check per-user usage and quotas for node-local scratch storage, as described in Storage spaces. The command for checking per-user project storage usage and limits, `lfs quota`, remains available. This issue will be fixed in a future node image update.
- Workaround: View usage of the per-user node-local storage space using `du -hs $LOCALDIR` (see the example below). The limits for per-user node-local storage space are listed in Storage spaces.
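For reference, a short sketch of both checks; the Lustre mount point passed to `lfs quota` is a placeholder, so substitute the project filesystem path given in Storage spaces:

```bash
# Usage of the per-user node-local scratch space
du -hs "$LOCALDIR"

# Per-user project storage usage and limits
# (the path is a placeholder; use the project filesystem mount point)
lfs quota -h -u "$USER" /path/to/project/filesystem
```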
clifton error "Could not get certificate"
Last updated: 2025-03-20
- Services affected: SSH certificate issuer/`clifton`
- Description: In some circumstances `clifton auth` will fail with an error like `Error: Could not get certificate.`, typically accompanied by a message like `Claim projects not present` or `User short name is empty`.
- Workaround: First ensure that your UNIX user name ("short name") has been set according to the Setting your UNIX username guide. If the short name has been set, try the following steps:
    1. Go to https://account.isambard.ac.uk in a web browser and log in as usual with your identity provider.
    2. When logged in, select the drop-down menu in the top right with your name and select "Sign out".
    3. Retry running `clifton auth`.
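The final step is run from your terminal once you have signed out in the browser:

```bash
# Request a fresh SSH certificate after signing out of
# https://account.isambard.ac.uk in the browser
clifton auth
```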
Third Party
Issues with third party services
Issues with products not managed by BriCS are listed below. Where possible we will provide workarounds and report issues to the service provider, but will usually need to rely on the third party service provider to resolve the underlying issue.
Cray MPI for aarch64 in early stage of support
Last updated: 2025-02-25
- Services affected: Isambard-AI Phase 1, Isambard 3
- Description: Cray MPICH 8.1.30 for aarch64 is at an early stage of support and has some known issues. See the CPE release notes.
- Workaround: The current advice from the supplier, HPE, is that setting `export MPICH_SMP_SINGLE_COPY_MODE=CMA` and `export MPICH_MALLOC_FALLBACK=1` will help circumvent most known issues. If program crashes occur, `export FI_MR_CACHE_MONITOR=disabled` or `export FI_MR_CACHE_MONITOR=memhooks` may help. A sketch of these settings in a batch script follows below.
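A minimal sketch of applying these settings in a Slurm batch script; the node count and application name are placeholders:

```bash
#!/bin/bash
#SBATCH --nodes=2            # placeholder job resources

# HPE-recommended settings for Cray MPICH 8.1.30 on aarch64
export MPICH_SMP_SINGLE_COPY_MODE=CMA
export MPICH_MALLOC_FALLBACK=1

# Try one of the following only if program crashes occur
# export FI_MR_CACHE_MONITOR=disabled
# export FI_MR_CACHE_MONITOR=memhooks

srun ./my_mpi_application    # placeholder executable
```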
Multi-node Singularity/Apptainer: PyTorch NGC container image GLIBC mismatch
Last updated: 2025-07-25
- Services affected: Isambard-AI Phase 2
- Description: There are GLIBC mismatches when running multi-node containers. When using Singularity/Apptainer with the `brics/apptainer-multi-node` module, or Podman-HPC with `--openmpi-pmi2`, the system GLIBC is incompatible with some versions of the PyTorch NGC container.
- Workaround: Use one of the following versions, which are known to work correctly (see the example after the list):
    - nvcr.io/nvidia/pytorch:24.12-py3
    - nvcr.io/nvidia/pytorch:25.01-py3
    - nvcr.io/nvidia/pytorch:25.02-py3
    - nvcr.io/nvidia/pytorch:25.03-py3
    - nvcr.io/nvidia/pytorch:25.04-py3
    - nvcr.io/nvidia/pytorch:25.05-py3
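For example, a minimal sketch of pulling one of the known-good images with Apptainer; the output filename is illustrative, and the module name is taken from the description above:

```bash
# Load the multi-node Apptainer environment (module named in the description above)
module load brics/apptainer-multi-node

# Pull a known-good PyTorch NGC image (output filename is illustrative)
apptainer pull pytorch_25.05.sif docker://nvcr.io/nvidia/pytorch:25.05-py3
```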
Archived
For information on previous known issues that are now resolved, see the Archived Issues page.