Archived issues

Note

The below issues are resolved. If you still face them please get in touch with the BriCS team.

BriCS Services and Facilities

Access issues to Isambard-AI Phase 1

Last updated: 2025-08-27 (Archived)

Services affected

Isambard-AI Phase 1

Description

Isambard-AI Phase 1 is currently inaccessible and the cause is under investigation.

Workaround

No workaround available.

Multi-node Podman-HPC error: "Permission denied"

Last updated: 2025-08-07

Services affected

Isambard-AI Phase 1, Isambard 3

Description

Podman-HPC can sometimes get into a bad configuration when used for multi-node workloads (MPI, NCCL), resulting in errors of the form Permission denied: '/local/user/<user-id>/storage/overlay/<HASH>'. This can occur due to issues with user namespace mapping, used by podman-hpc to allow containers to run rootless. See NERSC/podman-hpc issue #116.

Workaround

Use podman-hpc unshare to enter a user namespace, then delete the directory $LOCALDIR/storage/overlay/<HASH> (with <HASH> as shown in the error message), and the files $LOCALDIR/storage/overlay-images/images.json and $LOCALDIR/storage/overlay-layers/layers.json. If the image has been migrated, the corresponding directories and files under $SCRATCHDIR must also be deleted. If Permission denied or FileNotFoundError errors persist, follow the steps in the podman-hpc Troubleshooting guide to clear stored data and reset podman-hpc.
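
The deletion step can be sketched as a pair of small shell functions. This is only an illustration of the paths listed in this workaround, not an official BriCS or podman-hpc tool: the function names podman_hpc_stale_paths and podman_hpc_remove_stale are hypothetical, and using the same storage/ layout under $SCRATCHDIR is an assumption that should be checked against the actual directories before deleting anything.

```shell
# Hypothetical helper: print the three paths named in the workaround for one
# storage root (for example "$LOCALDIR") and one overlay hash.
podman_hpc_stale_paths() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        # Refuse empty arguments so a whole storage tree is never targeted.
        echo "usage: podman_hpc_stale_paths <storage-root> <hash>" >&2
        return 1
    fi
    printf '%s\n' \
        "$1/storage/overlay/$2" \
        "$1/storage/overlay-images/images.json" \
        "$1/storage/overlay-layers/layers.json"
}

# Hypothetical helper: delete those paths. Intended to be run inside the
# shell started by `podman-hpc unshare`, where the files are accessible.
podman_hpc_remove_stale() {
    paths=$(podman_hpc_stale_paths "$1" "$2") || return 1
    printf '%s\n' "$paths" | while IFS= read -r path; do
        rm -rf -- "$path"
    done
}
```

One possible way to use it: run podman-hpc unshare, paste the two definitions into the resulting shell, then run podman_hpc_remove_stale "$LOCALDIR" <HASH> with <HASH> copied from the error message, and exit the namespace shell afterwards. Running podman_hpc_stale_paths first prints what would be removed without deleting anything.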

Isambard 3 MACS "hopper" partition in extended maintenance

Last updated: 2025-06-27 (Archived)

Description

The "hopper" partition in the MACS cluster is undergoing extended maintenance due to a hardware fault with the node. It is awaiting diagnosis and replacement parts before it can return to service.

Workaround

If possible, please use an alternative GPU partition.

Third Party

Singularity internal error when pulling images from nvcr.io

Last updated: 2025-06-25 (Archived)

Description

There is a known issue with Go (a Singularity dependency) when downloading large images from the NVIDIA Container Registry (nvcr.io). When the pull fails, the error stream error: stream ID <NUM>; INTERNAL_ERROR; received from peer is reported.

Workaround

Temporarily disable HTTP/2 in the Go client by running export GODEBUG=http2client=0 before pulling the image.
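
As a minimal sketch of the workaround in a shell session: the pull command is left commented out, and its image reference <image>:<tag> is a placeholder rather than a real image name.

```shell
# Disable the HTTP/2 client in Go programs started from this shell session.
export GODEBUG=http2client=0

# Pull the image as usual (placeholder image reference, not a real one):
# singularity pull docker://nvcr.io/<image>:<tag>
```

The setting only applies to the current shell and its child processes. Running unset GODEBUG after the pull restores the default behaviour.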