Isambard-AI Phase 2 Frequently Asked Questions

This page contains FAQs specific to Isambard-AI Phase 2. Over time the content will be updated, merged with the main documentation and this page ultimately deleted.

Early Access

Please note that access to Phase 2 should be considered “early access”, as the machine is still being configured and brought up to full capacity.

Storage

Due to the continuing development of the storage systems, users should, where possible, store project data in /projects/{project name}. Home storage, i.e. /home, is subject to change, and all its contents are likely to be moved and erased at some point in the future. The contents and location of project storage are not expected to change.

Users should also consult the main documentation before raising a support request.

When will my project get access to Isambard-AI Phase 2 (AIP2)?

Access to AIP2 is being rolled out on a managed basis to both existing and new projects. Project PIs will be kept updated.

How do existing Isambard-AI Phase 1 users get access?

“Isambard-AI Phase 2” will appear as a new resource in your existing project.

If you have already logged into any of the Isambard supercomputers today, you will need to log out to trigger a fresh login. To do this, visit keycloak.isambard.ac.uk and log out by clicking on your name.

You will then need to rerun clifton auth, as you do daily, and then run clifton ssh-config write to enable access to the additional platform. To log in, use a slightly different ssh command: ssh {project name}.aip2.isambard.
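The steps above can be sketched as a pair of small shell helpers. This is a minimal sketch rather than an official tool: the function names aip2_host and aip2_login and the project name myproject are hypothetical, and only the clifton auth, clifton ssh-config write and ssh commands come from this page.

```shell
#!/bin/bash
# Build the AIP2 SSH host alias for a project.
# The "<project>.aip2.isambard" format is the one given in this FAQ.
aip2_host() {
    printf '%s.aip2.isambard\n' "$1"
}

# Hypothetical helper: refresh credentials, regenerate the SSH config,
# then connect. Each step runs only if the previous one succeeded.
aip2_login() {
    local project="$1"
    clifton auth &&
        clifton ssh-config write &&
        ssh "$(aip2_host "${project}")"
}

# Usage (replace "myproject" with your own project name):
#   aip2_login myproject
```

Chaining the commands with && means that ssh is not attempted if authentication or the config rewrite fails.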

What happens with node hour (NHR) allocations for existing projects?

Node hours may be used on either Phase 1 or Phase 2. Depending on the policy of the allocator, projects already using Phase 1 may be given an additional NHR allocation. PIs will be kept up to date.

Note that Phase 1 is being heavily utilised and users will need to migrate to Phase 2 to make use of any additional NHR.

What will be similar/different in comparison to Isambard-AI Phase 1?

Broadly, the experience on AIP1 and AIP2 will be the same. There are some differences in versions of software and libraries but we do not anticipate these will cause significant issues.

User applications, software stacks, containers, etc. should be relatively easy to port, although users should take care to verify and test them. Depending on your specific use case, a rebuild may be needed to pick up dependencies.

Note that the login nodes differ. Phase 1 uses GH200 hardware identical to the compute nodes. On Phase 2, the login nodes use Grace CPUs with no GPU, so care may be needed when building software. Specific guidance will be provided in due course.
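Until that guidance is available, one way to avoid surprises is to check whether a GPU is visible before running any build or test step that needs one. The following is a minimal sketch and not official guidance: the has_gpu function is hypothetical, and it assumes that nvidia-smi -L prints one line beginning with "GPU " for each visible device.

```shell
#!/bin/bash
# Return 0 if an NVIDIA GPU appears to be visible on this node.
# The probe command is a parameter (default: nvidia-smi) so that the
# check can be exercised on machines without the NVIDIA tools installed.
has_gpu() {
    local probe="${1:-nvidia-smi}"
    # Assumption: "nvidia-smi -L" lists devices as lines starting "GPU ".
    command -v "${probe}" >/dev/null 2>&1 &&
        "${probe}" -L 2>/dev/null | grep -q '^GPU '
}

if has_gpu; then
    echo "GPU visible: GPU-dependent build and test steps can run here."
else
    echo "No GPU visible: GPU-dependent steps may need a compute node."
fi
```

On a Phase 2 login node this check would be expected to take the second branch, since those nodes have no GPU.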

Can I start a Jupyter notebook session on a compute node?

Yes, you can follow the Jupyter Notebooks guide, but a few small changes are required.

  1. Use the following submission script which employs an alternative method to obtain the compute node's HSN IP address:

    #!/bin/bash
    #SBATCH --job-name=jupyter_user_session
    #SBATCH --gpus=1  # this also allocates 72 CPU cores and 115GB memory
    #SBATCH --time=01:00:00
    
    source ~/miniforge3/bin/activate jupyter-user-env
    
    # Add pre-installed kernelspecs to the Jupyter data search path
    export JUPYTER_PATH="/tools/brics/jupyter/jupyter_data${JUPYTER_PATH:+:}${JUPYTER_PATH:-}"
    
    LISTEN_IP="$(ip -o -4 addr list hsn0 | sed -E 's/^.*inet (([0-9]{1,3}\.){3}[0-9]{1,3}).*$/\1/')"
    LISTEN_PORT=8888
    
    set -o xtrace
    jupyter lab --no-browser --ip="${LISTEN_IP}" --port="${LISTEN_PORT}"
    
  2. When setting up an SSH tunnel from your computer, use a host name like <PROJECT>.aip2.isambard.
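As an illustration of step 2, the sketch below prints a port-forwarding command for a given project and compute node address. It is an assumption-laden example, not the command from the Jupyter Notebooks guide: the function name jupyter_tunnel_cmd, the project name myproject and the address 10.0.0.1 are hypothetical, and it uses the standard OpenSSH options -N (do not run a remote command) and -L (forward a local port). The real address is the value passed to --ip, which the xtrace setting in the submission script should echo into the job's output.

```shell
#!/bin/bash
# Print (but do not run) an ssh port-forwarding command for Jupyter.
#   $1: project name
#   $2: HSN IP address of the compute node running Jupyter
#   $3: port (defaults to 8888, matching LISTEN_PORT in the job script)
jupyter_tunnel_cmd() {
    local project="$1" node_ip="$2" port="${3:-8888}"
    printf 'ssh -N -L %s:%s:%s %s.aip2.isambard\n' \
        "${port}" "${node_ip}" "${port}" "${project}"
}

# Hypothetical values, for illustration only.
jupyter_tunnel_cmd myproject 10.0.0.1
```

Printing the command instead of running it makes it easy to check the host name and address before opening the tunnel from your own computer.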

Is JupyterHub available?

No, we have not yet completed the required technical work for this. Please continue to use Isambard-AI Phase 1.

How do I get NCCL?

Previous advice regarding the need to run module use no longer applies. Simply run:

$ module load brics/nccl
$ module load brics/aws-ofi-nccl

What if the documentation isn’t quite correct?

The User Experience team are in the process of updating the documentation to reflect the addition of AIP2, whilst onboarding of users runs in parallel. If you notice something that needs correcting or clarifying, please let us know via a helpdesk ticket (preferred) or email.