Known Issues

This page provides information about current known issues that may affect users of BriCS services and facilities. If you have encountered an issue that is not listed on this page, or you are unable to resolve a known issue using a provided workaround, please get in touch with the BriCS team.

BriCS Services and Facilities

quota command not available on Isambard 3

Last updated: 2025-07-16

Services affected

Isambard 3

Description

The quota command is currently not available on Isambard 3 login or compute nodes. This command is used to check per-user usage and quotas for node-local scratch storage as described in Storage spaces.

The command for checking per-user project storage usage and limits, lfs quota, remains available.

This issue will be fixed in a future node image update.

Workaround

To view your usage of per-user node-local storage space, run:

du -hs $LOCALDIR

The limits for per-user node-local storage space are listed in Storage spaces.
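As a rough sketch of how the du workaround could be compared against a limit, the function below reports usage as a percentage. The name local_usage_report and the limit argument are hypothetical and not part of any BriCS tooling; the limit must be looked up in Storage spaces and supplied by hand.

```shell
# Hypothetical helper (not a BriCS tool): summarise usage of a directory
# against a limit supplied by the caller.
local_usage_report() {
    dir="$1"         # directory to measure, e.g. "$LOCALDIR"
    limit_gib="$2"   # limit in GiB; take the real value from Storage spaces
    used_kib=$(du -sk "$dir" | cut -f1)          # total usage in KiB
    limit_kib=$((limit_gib * 1024 * 1024))       # GiB -> KiB
    percent=$((used_kib * 100 / limit_kib))      # integer percentage, rounded down
    echo "used_kib=$used_kib limit_kib=$limit_kib percent=$percent"
}

# Example call; the 100 GiB figure is a placeholder, not the actual limit.
# local_usage_report "$LOCALDIR" 100
```

Unlike quota, this only measures what is currently on disk; it does not read or enforce any limit.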

clifton error "Could not get certificate"

Last updated: 2025-03-20

Services affected

SSH certificate issuer/clifton

Description

In some circumstances, clifton auth fails with an error like "Error: Could not get certificate.", typically accompanied by a message like "Claim projects not present" or "User short name is empty".

Workaround

First ensure that your UNIX user name ("short name") has been set according to the Setting your UNIX username guide. If the short name has been set, try the following steps:

  1. Go to https://account.isambard.ac.uk in a web browser and log in as usual with your identity provider.
  2. When logged in, select the drop down menu in the top right with your name and select "Sign out".
  3. Retry running clifton auth.

Third Party

Issues with third party services

Issues with products not managed by BriCS are listed below. Where possible we will provide workarounds and report issues to the service provider, but will usually need to rely on the third party service provider to resolve the underlying issue.

Multi-node Podman-HPC error: "Permission denied"

Last updated: 2024-10-03

Services affected

Isambard-AI Phase 1, Isambard 3

Description

Podman-HPC can sometimes get into a bad configuration when used for multi-node workloads (MPI, NCCL), resulting in errors of the form Permission denied: '/local/user/<user-id>/storage/overlay/<HASH>'. This can occur due to issues with user namespace mapping, used by podman-hpc to allow containers to run rootless. See NERSC/podman-hpc issue #116.

Workaround

Use podman-hpc unshare to enter a user namespace, then delete the directory $LOCALDIR/storage/overlay/<HASH> (with <HASH> as shown in the error message) and the files $LOCALDIR/storage/overlay-images/images.json and $LOCALDIR/storage/overlay-layers/layers.json. If the image has been migrated, the corresponding directories and files under $SCRATCHDIR will also need to be deleted. If Permission denied or FileNotFoundError issues continue, follow the steps in the podman-hpc Troubleshooting guide to clear stored data and reset podman-hpc.
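The deletion steps in this workaround could be sketched as a small shell function. The name cleanup_overlay and the DRY_RUN switch are hypothetical, and the sketch assumes that podman-hpc unshare runs a trailing command inside the user namespace in the same way podman unshare does. It only prints the commands unless DRY_RUN=0 is set.

```shell
# Hypothetical helper (not part of podman-hpc): print or run the cleanup
# commands for one storage root.
# Assumption: `podman-hpc unshare` runs a trailing command inside the user
# namespace, as `podman unshare` does.
cleanup_overlay() {
    root="$1"   # storage root, e.g. "$LOCALDIR"
    hash="$2"   # the <HASH> value copied from the error message
    # Refuse to run without both arguments, so an empty value can never
    # widen the delete to the whole overlay directory.
    if [ -z "$root" ] || [ -z "$hash" ]; then
        echo "usage: cleanup_overlay <root> <hash>" >&2
        return 1
    fi
    # Dry run by default: commands are only printed unless DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = "0" ]; then prefix=""; else prefix="echo"; fi
    $prefix podman-hpc unshare rm -rf -- "$root/storage/overlay/$hash"
    $prefix podman-hpc unshare rm -f -- \
        "$root/storage/overlay-images/images.json" \
        "$root/storage/overlay-layers/layers.json"
}

# Review the printed commands first, then rerun with DRY_RUN=0:
# cleanup_overlay "$LOCALDIR" "<HASH>"
```

For a migrated image the same function could be pointed at the matching location under $SCRATCHDIR, but check the actual paths there first, since this sketch assumes the same storage/ layout.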

Cray MPI for aarch64 in early stage of support

Last updated: 2025-02-25

Services affected

Isambard-AI Phase 1, Isambard 3

Description

Cray MPICH 8.1.30 for aarch64 is at an early stage of support and has some issues. See the CPE release notes.

Workaround

The current advice from the supplier, HPE, is that setting the following variables will help circumvent most known issues: export MPICH_SMP_SINGLE_COPY_MODE=CMA and export MPICH_MALLOC_FALLBACK=1. If program crashes occur, export FI_MR_CACHE_MONITOR=disabled or export FI_MR_CACHE_MONITOR=memhooks may help.
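As a minimal sketch, the settings could be placed near the top of a job script, before the MPI program is launched. The ordering and the decision to leave the FI_MR_CACHE_MONITOR lines commented out are illustrative choices, not supplier guidance.

```shell
# Settings advised by HPE for most known issues.
export MPICH_SMP_SINGLE_COPY_MODE=CMA
export MPICH_MALLOC_FALLBACK=1

# Only if program crashes occur: uncomment ONE of the following and retry.
# export FI_MR_CACHE_MONITOR=disabled
# export FI_MR_CACHE_MONITOR=memhooks

# Launch the MPI program after this point as usual.
```

Enabling the FI_MR_CACHE_MONITOR options one at a time makes it easier to tell which setting, if either, resolves a crash.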

Archived

For information on previous known issues that are now resolved, see the Archived Issues page.