# Podman-HPC

## Introduction
Podman is a daemonless, open source, Linux native tool designed to make it easy to find, run, build, share and deploy applications using Open Containers Initiative (OCI) Containers and Container Images.
For ease of management of OCI containers in an HPC environment, Podman-HPC has been adopted for use on Isambard systems. Podman-HPC (`podman-hpc`) is a wrapper script around Podman (`podman`) which provides HPC-specific configuration and infrastructure for the Podman ecosystem.
## Podman-HPC Specific Subcommands
Podman-HPC exposes all available Podman subcommands and arguments, and additionally provides specific subcommands for managing containers and container images in an HPC environment:
```
$ podman-hpc --help
Manage pods, containers and images ... on HPC!

Description:
  The podman-hpc utility is a wrapper script around the podman container
  engine. It provides additional subcommands for ease of use and
  configuration of podman in a multi-node, multi-user high performance
  computing environment.

Usage: podman-hpc [options] COMMAND [ARGS]...

Options:
  --additional-stores TEXT  Specify other storage locations
  --squash-dir TEXT         Specify alternate squash directory location
  --help                    Show this message and exit.

Commands:
  infohpc     Dump configuration information for podman_hpc.
  migrate     Migrate an image to squashed.
  pull        Pulls an image to a local repository and makes a squashed...
  rmsqi       Removes a squashed image.
  shared-run  Launch a single container and exec many threads in it This is...
```
## Preparing Locally Built Images to Run on Compute Nodes
When Podman-HPC builds a container image, it is written out to a local filesystem. To run a container image on a different node from where it was built, it must be moved to a shared filesystem.
> **System Storage**
>
> Ensure you are familiar with the System Storage specifications to support your understanding of how a container image is migrated from the login node to the compute nodes. In particular, it is important to be aware that node-local scratch storage is not persistent.
This can be achieved with the `podman-hpc migrate` subcommand (not to be confused with the Podman `podman system migrate` command), which converts the image to a squashfs container image written out to your `$SCRATCH` directory. Please see the example below for running a container on a compute node.
To see which images you have available as squashfs on `$SCRATCH`, you can run `podman-hpc images`, which will list any such images. Images which you pulled using `podman-hpc pull` will already have a squashfs copy put onto `$SCRATCH`.
> **Don't use `podman`**
>
> Once an image has been migrated it can be run on any compute node using the `podman-hpc` wrapper. Remember to always use the `podman-hpc` wrapper and not `podman` directly, as the wrapper is needed to run containers which have been migrated.
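Putting these steps together, the typical build-and-migrate workflow from a login node can be sketched as follows (the image name here is illustrative; each command is described in the examples below):

```shell
# Build the image on the login node (stored in node-local storage)
podman-hpc build . -t my_container

# Convert it to a squashfs image on $SCRATCH so compute nodes can see it
podman-hpc migrate my_container

# Launch it on a compute node through Slurm
srun -N 1 podman-hpc run localhost/my_container:latest
```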
## Example: Pulling and Running Images
```
$ podman-hpc pull quay.io/podman/hello
Trying to pull quay.io/podman/hello:latest...
Getting image source signatures
Copying blob 1ff9adeff444 done
Copying config 83fc7ce122 done
Writing manifest to image destination
Storing signatures
83fc7ce1224f5ed3885f6aaec0bb001c0bbb2a308e3250d7408804a720c72a32
INFO: Migrating image to /scratch/brics/<user-name>/storage
```
We can now run `podman-hpc images` to list the currently available images. Since pulling an image will automatically migrate it to your `$SCRATCH` directory, we can see `true` in the `R/O` (read-only) column for the migrated copy:
```
$ podman-hpc images
REPOSITORY            TAG     IMAGE ID      CREATED       SIZE    R/O
quay.io/podman/hello  latest  83fc7ce1224f  4 months ago  580 kB  false
quay.io/podman/hello  latest  83fc7ce1224f  4 months ago  580 kB  true
```
We can now run this container, which will automatically execute the entrypoint:
```
$ podman-hpc run quay.io/podman/hello
!... Hello Podman World ...!

         .--"--.
       / -     - \
      / (O)   (O) \
   ~~~| -=(,Y,)=- |
    .---. /`  \   |~~
 ~/  o  o \~~~~.----. ~~
  | =(X)= |~  / (O (O) \
   ~~~~~~~  ~| =(Y_)=-  |
  ~~~~    ~~~|   U      |~~
```
## Example: Building Images
To build a custom container image, we can create a `Containerfile` (or `Dockerfile`). For example, here's a "Hello, World!" `Containerfile` based on the latest `ubuntu` image:
```
FROM docker.io/ubuntu:latest
ENTRYPOINT ["echo", "Hello, World!"]
```
In the directory containing your `Containerfile`, you can build and run the container as follows, using the `-t` flag to tag your container image with a name:
```
$ podman-hpc build . -t my_container
STEP 1/3: FROM ubuntu:latest
STEP 2/3: ENV "PODMANHPC_MODULES_DIR"="/etc/podman_hpc/modules.d"
--> 13a3c609ddd
STEP 3/3: ENTRYPOINT ["echo", "Hello, World!"]
COMMIT my_container
--> 4265376f573
Successfully tagged localhost/my_container:latest
4265376f5735af8eab16f06d204edb8be4d89bc1a520f3fcf13ab38569a03eb2
```
```
$ podman-hpc images
REPOSITORY                TAG     IMAGE ID      CREATED         SIZE    R/O
quay.io/podman/hello      latest  83fc7ce1224f  4 months ago    580 kB  false
quay.io/podman/hello      latest  83fc7ce1224f  4 months ago    580 kB  true
localhost/my_container    latest  4265376f5735  18 seconds ago  103 MB  false
docker.io/library/ubuntu  latest  2b1b17d5e5a2  4 weeks ago     103 MB  false

$ podman-hpc run my_container:latest
Hello, World!
```
Note that your image currently only resides on the login node and needs to be migrated before running on the compute nodes.
## Example: Running on a Compute Node
Let's try to run our recently built image on a compute node using `srun`:
```
$ srun -N 1 podman-hpc run localhost/my_container:latest
srun: job 23833 queued and waiting for resources
srun: job 23833 has been allocated resources
Resolving "my_container" using unqualified-search registries (/etc/containers/registries.conf)
Trying to pull registry.opensuse.org/my_container:latest...
Trying to pull registry.suse.com/my_container:latest...
Trying to pull docker.io/library/my_container:latest...
[...]
```
You can see that `podman-hpc` is looking for the image on remote registries, because the locally built image has not yet been migrated. Let's migrate our image first and then run our container:
```
$ podman-hpc migrate my_container
$ srun -N 1 podman-hpc run my_container:latest
srun: job 23835 queued and waiting for resources
srun: job 23835 has been allocated resources
Hello, World!
```
## Example: Using a GPU
To expose GPUs to your container, you can use the flag `--device=nvidia.com/gpu=all` to pass all the available GPUs to the container:
```
$ podman-hpc pull ubuntu:latest
$ podman-hpc run --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi --list-gpus
GPU 0: GH200 120GB (UUID: GPU-0f0087e6-4361-40a9-1238-b0315bbf3aba)
GPU 1: GH200 120GB (UUID: GPU-2d90c7dc-ef64-9248-a136-0c3e1a5510c9)
GPU 2: GH200 120GB (UUID: GPU-fecec81d-6144-b2f2-e8f9-28114a76df4a)
GPU 3: GH200 120GB (UUID: GPU-7efbc5d6-a5e2-0971-7e70-849491e05a02)
```
You can also simply use the `--gpu` argument to `podman-hpc`:
```
$ podman-hpc run --gpu ubuntu:latest nvidia-smi --list-gpus
GPU 0: GH200 120GB (UUID: GPU-0f0087e6-4361-40a9-1238-b0315bbf3aba)
GPU 1: GH200 120GB (UUID: GPU-2d90c7dc-ef64-9248-a136-0c3e1a5510c9)
GPU 2: GH200 120GB (UUID: GPU-fecec81d-6144-b2f2-e8f9-28114a76df4a)
GPU 3: GH200 120GB (UUID: GPU-7efbc5d6-a5e2-0971-7e70-849491e05a02)
```
## Multi-node Containers
See the guide on using Podman-HPC across multiple nodes for specific guidance on obtaining good performance when running containers over multiple nodes.
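As a minimal sketch of a multi-node launch (assuming the image has already been migrated; the node count, image name, and any partition or account options are placeholders for your own), a Slurm batch script might look like:

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

# srun starts one podman-hpc container per task across the allocation.
# The image must already exist as a squashfs copy on $SCRATCH
# (e.g. via `podman-hpc migrate my_container` on the login node).
srun podman-hpc run localhost/my_container:latest
```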
## Resources

- NERSC's Podman-HPC beginners tutorial.