Known Issues
This page provides information about current known issues that may affect users of BriCS services and facilities. If you have encountered an issue that is not listed on this page, or you are unable to resolve a known issue using a provided workaround, please get in touch with the BriCS team.
BriCS Services and Facilities
`quota` command not available on Isambard 3
Last updated: 2025-07-16
- Services affected: Isambard 3
- Description: The `quota` command is currently not available on Isambard 3 login or compute nodes. This command is used to check per-user usage and quotas for node-local scratch storage, as described in Storage spaces. The command for checking per-user project storage usage and limits, `lfs quota`, remains available. This issue will be fixed in a future node image update.
- Workaround: View usage of per-user node-local storage space using `du -hs $LOCALDIR`. The limits for per-user node-local storage space are listed in Storage spaces.
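The workaround above can be wrapped in a small guard so that an unset or missing `$LOCALDIR` produces a clear message instead of a confusing `du` error. This is a minimal sketch; `local_usage` is a hypothetical helper name, not a command provided by BriCS.

```shell
# Hypothetical helper (not a BriCS command): summarise disk usage of a directory.
# Defaults to $LOCALDIR, the per-user node-local storage space.
local_usage() {
    dir="${1:-${LOCALDIR:-}}"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "local_usage: directory not set or not found: '$dir'" >&2
        return 1
    fi
    # -s prints a single total for the directory, -h uses human-readable units
    du -hs "$dir"
}
```

Run `local_usage` with no arguments on a node where `$LOCALDIR` is set, or pass a directory explicitly, for example `local_usage "$LOCALDIR"`.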
`clifton` error "Could not get certificate"
Last updated: 2025-03-20
- Services affected: SSH certificate issuer/`clifton`
- Description: In some circumstances, `clifton auth` will fail with an error like `Error: Could not get certificate.` This is typically accompanied by a message like `Claim projects not present` or `User short name is empty`.
- Workaround: First ensure that your UNIX user name ("short name") has been set according to the Setting your UNIX username guide. If the short name has been set, try the following steps:
  - Go to https://account.isambard.ac.uk in a web browser and log in as usual with your identity provider.
  - When logged in, select the drop-down menu in the top right with your name and select "Sign out".
  - Retry running `clifton auth`.
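To make the failure easier to recognise in scripts, the error strings quoted above can be matched against captured output. This is a sketch only: `clifton_hint` is a hypothetical helper, and it assumes the text has already been captured from `clifton auth`. How `clifton` reports errors (exit status, stdout or stderr) is not stated on this page.

```shell
# Hypothetical helper: given text captured from `clifton auth`, print the
# workaround steps if it contains one of the known error messages.
# Returns 0 when a known message is found, 1 otherwise.
clifton_hint() {
    case "$1" in
        *"Could not get certificate"*|*"Claim projects not present"*|*"User short name is empty"*)
            echo "1. Check that your UNIX user name (short name) has been set."
            echo "2. Log in at https://account.isambard.ac.uk, then choose Sign out."
            echo "3. Retry: clifton auth"
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}
```

One possible use, assuming errors appear on stdout or stderr, is `output=$(clifton auth 2>&1); clifton_hint "$output"`.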
Third Party
Issues with third party services
Issues with products not managed by BriCS are listed below. Where possible we will provide workarounds and report issues to the service provider, but will usually need to rely on the third party service provider to resolve the underlying issue.
Multi-node Podman-HPC error: "Permission denied"
Last updated: 2024-10-03
- Services affected: Isambard-AI Phase 1, Isambard 3
- Description: Podman-HPC can sometimes get into a bad configuration when used for multi-node workloads (MPI, NCCL), resulting in errors of the form `Permission denied: '/local/user/<user-id>/storage/overlay/<HASH>'`. This can occur due to issues with user namespace mapping, used by `podman-hpc` to allow containers to run rootless. See NERSC/podman-hpc issue #116.
- Workaround: Use `podman-hpc unshare` to enter a user namespace, then delete the directory `$LOCALDIR/storage/overlay/<HASH>` (with `<HASH>` as shown in the error message), and the files `$LOCALDIR/storage/overlay-images/images.json` and `$LOCALDIR/storage/overlay-layers/layers.json`. If the image has been migrated, then the corresponding directories and files under `$SCRATCHDIR` will also need to be deleted. If `Permission denied` or `FileNotFoundError` issues continue, follow the steps in the podman-hpc Troubleshooting guide to clear stored data and reset `podman-hpc`.
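Because the workaround involves deleting paths built from a hash, it can help to print the exact paths first and review them before removing anything. The following is a minimal sketch; `overlay_cleanup_paths` is a hypothetical helper, and it only lists paths, it does not delete them. Passing `$SCRATCHDIR` as the base assumes that migrated images use the same `storage/` layout, which this page does not confirm.

```shell
# Hypothetical helper: list the paths named in the workaround for one overlay hash.
# Usage: overlay_cleanup_paths <HASH> [base-dir]   (base-dir defaults to $LOCALDIR)
overlay_cleanup_paths() {
    hash="$1"
    base="${2:-${LOCALDIR:-}}"
    # Refuse empty values: an empty hash would point at the whole overlay directory.
    if [ -z "$hash" ] || [ -z "$base" ]; then
        echo "usage: overlay_cleanup_paths <HASH> [base-dir]" >&2
        return 1
    fi
    printf '%s\n' \
        "$base/storage/overlay/$hash" \
        "$base/storage/overlay-images/images.json" \
        "$base/storage/overlay-layers/layers.json"
}
```

After entering the user namespace with `podman-hpc unshare` and defining the function in that shell, run `overlay_cleanup_paths <HASH>`, check the output, and delete each listed path with `rm -rf`.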
Cray MPI for aarch64 in early stage of support
Last updated: 2025-02-25
- Services affected: Isambard-AI Phase 1, Isambard 3
- Description: Cray MPICH 8.1.30 for aarch64 is in the early stage of support and has some issues. See the CPE release notes.
- Workaround: The current advice from the supplier, HPE, is that setting `export MPICH_SMP_SINGLE_COPY_MODE=CMA` and `export MPICH_MALLOC_FALLBACK=1` will help circumvent most known issues. If program crashes occur, `export FI_MR_CACHE_MONITOR=disabled` or `export FI_MR_CACHE_MONITOR=memhooks` may help.
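As a sketch, the variables from the workaround could be set near the top of a job script, before the MPI program is launched. The launch line is left as a comment because the application name and launcher options are specific to each job.

```shell
# Settings advised by HPE for Cray MPICH on aarch64 (taken from the workaround above)
export MPICH_SMP_SINGLE_COPY_MODE=CMA
export MPICH_MALLOC_FALLBACK=1

# Only if program crashes still occur: uncomment ONE of the following lines
# export FI_MR_CACHE_MONITOR=disabled
# export FI_MR_CACHE_MONITOR=memhooks

# Launch the MPI program after this point, e.g. with your usual srun command.
```

Setting the `FI_MR_CACHE_MONITOR` values only when needed keeps the default behaviour for jobs that already run correctly.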
Archived
For information on previous known issues that are now resolved, see the Archived Issues page.