# Known Issues
This page provides information about current known issues that may affect users of BriCS services and facilities. If you have encountered an issue that is not listed on this page, or you are unable to resolve a known issue using a provided workaround, please get in touch with the BriCS team.
## BriCS Services and Facilities
Service | Description | Workaround | Last updated |
---|---|---|---|
No known issues | | | |
## Third Party Services
Issues with services not managed by BriCS are listed below. Where possible we provide workarounds and report issues upstream, but we usually need to rely on the third-party service provider to resolve the underlying issue.
Service | Description | Workaround | Last updated |
---|---|---|---|
Podman-HPC | Podman-HPC can sometimes enter a bad configuration when used for multi-node workloads (MPI, NCCL), resulting in errors of the form `Permission denied: '/local/user/<user-id>/storage/overlay/<HASH>'`. This can occur due to issues with the user namespace mapping that podman-hpc uses to run containers rootless. See NERSC/podman-hpc issue #116. | Use `podman-hpc unshare` to enter a user namespace, then delete the directory `$LOCALDIR/storage/overlay/<HASH>` (with `<HASH>` as shown in the error message) and the files `$LOCALDIR/storage/overlay-images/images.json` and `$LOCALDIR/storage/overlay-layers/layers.json`. If the image has been migrated, the corresponding directories and files under `$SCRATCHDIR` must also be deleted. If `Permission denied` or `FileNotFoundError` errors persist, follow the steps in the podman-hpc Troubleshooting guide to clear stored data and reset podman-hpc. | 03/10/24 |
Singularity/Go | There is a known issue with Go (a Singularity dependency) when downloading large images from the NVIDIA Container Registry (nvcr.io). When the pull fails, an error of the form `stream error: stream ID <NUM>; INTERNAL_ERROR; received from peer` is raised. | Temporarily disable HTTP/2 with `export GODEBUG=http2client=0`. | 02/08/24 |
Cray MPI | Cray MPICH 8.1.28 for aarch64 is at an early stage of support and has some known issues. See: CPE release notes. | The current advice from the supplier (HPE) is that setting `export MPICH_SMP_SINGLE_COPY_MODE=CMA` and `export MPICH_MALLOC_FALLBACK=1` will circumvent most known issues. | 02/08/24 |
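The Podman-HPC cleanup steps above can be sketched as a short shell helper. This is a sketch, not an official procedure: the `cleanup_cmds` function name and the `LOCALDIR` fallback default are illustrative assumptions, and the function only prints the delete commands so they can be reviewed before anything is actually removed.

```shell
# Sketch of the Podman-HPC cleanup workaround above. Assumes $LOCALDIR is
# set by the site environment (as referenced in the table) and that the
# hash is taken from the "Permission denied" error message. The function
# only prints the commands; nothing is deleted until you run its output.
cleanup_cmds() {
    hash="$1"
    # Deletions go via `podman-hpc unshare` so they happen inside the
    # user namespace that owns the rootless container storage:
    printf 'podman-hpc unshare rm -rf %s/storage/overlay/%s\n' "$LOCALDIR" "$hash"
    printf 'podman-hpc unshare rm -f %s/storage/overlay-images/images.json\n' "$LOCALDIR"
    printf 'podman-hpc unshare rm -f %s/storage/overlay-layers/layers.json\n' "$LOCALDIR"
}

LOCALDIR="${LOCALDIR:-/local/u/$USER}"   # illustrative fallback only
cleanup_cmds "<HASH-from-error-message>"
```

Review the printed commands, then execute them (for example, `cleanup_cmds <HASH> | sh`). If the image has been migrated, repeat the deletions with `$SCRATCHDIR` in place of `$LOCALDIR`, as described in the workaround.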