Archived issues
Note
The issues below are resolved. If you still encounter them, please contact the BriCS team.
BriCS Services and Facilities
Access issues to Isambard-AI Phase 1
Last updated: 2025-08-27 (Archived)
- Services affected
-
Isambard-AI Phase 1
- Description
-
Isambard-AI Phase 1 is currently inaccessible and the cause is under investigation.
- Workaround
-
No workaround available.
Multi-node Podman-HPC error: "Permission denied"
Last updated: 2025-08-07
- Services affected
-
Isambard-AI Phase 1, Isambard 3
- Description
-
Podman-HPC can sometimes get into a bad configuration when used for multi-node workloads (MPI, NCCL), resulting in errors of the form
Permission denied: '/local/user/<user-id>/storage/overlay/<HASH>'
This can occur due to issues with user namespace mapping, which podman-hpc uses to allow containers to run rootless. See NERSC/podman-hpc issue #116.
- Workaround
-
Use podman-hpc unshare to enter a user namespace, then delete the directory $LOCALDIR/storage/overlay/<HASH> (with <HASH> as shown in the error message) and the files $LOCALDIR/storage/overlay-images/images.json and $LOCALDIR/storage/overlay-layers/layers.json. If the image has been migrated, the corresponding directories and files under $SCRATCHDIR will also need to be deleted. If Permission denied or FileNotFoundError issues continue, follow the steps in the podman-hpc Troubleshooting guide to clear stored data and reset podman-hpc.
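The deletion steps above can be sketched as a small shell helper. This is only an illustration under stated assumptions: clean_overlay_state is a hypothetical function name (it is not part of podman-hpc), and it assumes you have already entered the user namespace with podman-hpc unshare before calling it.

```shell
# Sketch only: define and call this inside the shell started by
# `podman-hpc unshare`. clean_overlay_state is a hypothetical helper name.
clean_overlay_state() {
    storage_root="$1"   # e.g. "$LOCALDIR", or "$SCRATCHDIR" for migrated images
    overlay_hash="$2"   # the <HASH> shown in the "Permission denied" error

    # Refuse empty arguments so a path such as "/storage/overlay/" is never touched.
    if [ -z "$storage_root" ] || [ -z "$overlay_hash" ]; then
        echo "usage: clean_overlay_state <storage-root> <hash>" >&2
        return 1
    fi

    # Remove the overlay directory named in the error message.
    rm -rf -- "$storage_root/storage/overlay/$overlay_hash"

    # Remove the image and layer index files.
    rm -f -- "$storage_root/storage/overlay-images/images.json" \
             "$storage_root/storage/overlay-layers/layers.json"
}
```

A call would take the form clean_overlay_state "$LOCALDIR" followed by the hash from the error message. If the image has been migrated, the same call can be repeated with "$SCRATCHDIR" as the first argument.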
Isambard 3 MACS "hopper" partition in extended maintenance
Last updated: 2025-06-27 (Archived)
- Description
-
The "hopper" partition in the MACS cluster is undergoing extended maintenance due to a hardware fault with the node. It is awaiting diagnosis and parts before it can return to service.
- Workaround
-
If possible, please use an alternative GPU partition.
Third Party
Singularity internal error when pulling images from nvcr.io
Last updated: 2025-06-25 (Archived)
- Description
-
There is a known issue with Go (a Singularity dependency) when downloading large images from the NVIDIA Container Registry, nvcr.io. When the image pull fails, the following error is reported:
stream error: stream ID <NUM>; INTERNAL_ERROR; received from peer
- Workaround
-
Temporarily disable HTTP/2 using
export GODEBUG=http2client=0
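To keep the change temporary, the variable can instead be set for a single command rather than exported for the whole session. The sketch below assumes a hypothetical wrapper named with_http2_disabled, which is not part of Singularity or Go. Note that it replaces any existing GODEBUG value for the wrapped command.

```shell
# Sketch only: with_http2_disabled is a hypothetical helper name.
# It runs the given command with Go's HTTP/2 client disabled, without
# changing GODEBUG in the calling shell.
with_http2_disabled() {
    env GODEBUG=http2client=0 "$@"
}
```

The pull would then be run as with_http2_disabled singularity pull followed by the docker://nvcr.io/ URI of the image you need. Later commands in the same shell keep the default HTTP/2 behaviour.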