Python¶
On BriCS supercomputers, Python and Python packages can be installed and managed using tools like pip and conda. The recommended approach is to use Conda through Miniforge, which provides a lightweight and flexible solution for managing environments and packages, including non-Python dependencies. Virtual environments such as Conda environments and Python venv are recommended for isolating dependencies regardless of your chosen Python package management method.
Conda: Installing and Using Miniforge¶
Conda is a general purpose multi-platform package manager that installs and manages software packages.
conda-forge
Our recommended installation method is Miniforge, which uses the conda-forge channel by default. This is because a license is needed to use the mainline Anaconda channel.
To install the latest version of Conda:
$ cd $HOME
$ curl --location --remote-name "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
$ bash Miniforge3-$(uname)-$(uname -m).sh
$ rm Miniforge3-$(uname)-$(uname -m).sh
We advise against initialising the shell with conda init, to avoid complications associated with modifying shell startup scripts. Instead, use the activate script provided with Miniforge (see below) to perform shell initialisation as needed.
To activate Conda:
$ source ~/miniforge3/bin/activate
You should not install packages in your base environment. Instead, create separate environments to manage your software packages.
For example, to create an environment named test with an installation of Python 3.10:
(base) $ conda create -n test python=3.10
(base) $ conda activate test
You can then install further packages in your test environment using conda install. For example, to install the Python package scipy:
(test) $ conda install scipy
It can be useful to specify the packages in a Conda environment in a Conda environment YAML file. This allows creation of a Conda environment using a single command:
$ conda env create -f environment.yml
You can list the installed packages in your environment using conda list, and you can deactivate your environment using conda deactivate.
Finding aarch64-compatible Conda packages¶
Since Isambard clusters are mainly based on the Linux Arm64 architecture (aarch64), it is important to find packages built for this architecture. To find these packages, you can search the Anaconda.org website and filter the platform to linux-aarch64.
Installing Python Packages¶
We recommend you use Conda environments to install and manage your Python packages as above.
Alternatively, Cray Python is available as a pre-installed module. It can be accessed as follows:
$ module avail # list available modules
...
$ module load cray-python
$ which python3
/opt/cray/pe/python/3.11.5/bin/python3
$ python3
Python 3.11.5 (main, Nov 29 2023, 20:19:53) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
Using Virtual Environments¶
When using Python we recommend working inside virtual environments created with the Python module venv. Virtual environments isolate the pip-installed dependencies of each project.
Let's create a venv:
$ mkdir -p ~/.virtualenvs/ # Create folder for virtual environments (no error if it already exists)
$ python3 -m venv --upgrade-deps ~/.virtualenvs/test2
To activate our test2 environment and install the package scipy:
$ source ~/.virtualenvs/test2/bin/activate
(test2) $ which python3
$HOME/.virtualenvs/test2/bin/python3
(test2) $ python3 -m pip install scipy
To list the installed packages in your environment:
(test2) $ python3 -m pip list
pip vs python3 -m pip
Note the use of the module approach when using pip, i.e. python3 -m pip, rather than pip. This helps avoid confusion around which Python installation/virtual environment pip is acting on.
You can exit your environment using deactivate.
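If you are ever unsure which Python installation a command is acting on, you can ask the interpreter itself. The following is a minimal sketch using only the standard library; the function names in_venv and describe_interpreter are illustrative and not part of any BriCS tooling. Note that this check detects venv environments only: a Conda environment is a separate Python installation rather than a venv, so it is expected to report False there.

```python
import sys


def in_venv():
    """Return True when the running interpreter belongs to a venv."""
    # In a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def describe_interpreter():
    """Summarise which Python installation this process is using."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_venv": in_venv(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this with python3 inside an activated venv should show an executable path under that environment's directory.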
Installing Python Packages for Arm 64 on Linux (aarch64)¶
Isambard supercomputers mainly use the Arm 64 CPU architecture (see Specifications). Many Python packages don't provide wheels (precompiled binaries) for aarch64. If your package isn't available through Conda or a container, you will need to build it from source. This section will help you understand the usual Python build systems.
Machine Learning Packages 🤗
Support for aarch64 by popular machine learning packages is listed in our applications page.
Compilers¶
The default compilers on Isambard supercomputers are older versions which form part of the underlying operating system build toolchain, and are typically unsuitable for building research software. It is recommended to set the CC and CXX environment variables to ensure pip uses modern C and C++ compilers. Often this can allow a package to be installed using pip on Arm.
In the following example, the available GCC C++ compilers are listed, and the CC and CXX variables are used to select a specific version when installing a Python package:
user@nid001040> ls /usr/bin/g++*
/usr/bin/g++ /usr/bin/g++-12 /usr/bin/g++-13 /usr/bin/g++-7
user@nid001040> CC=/usr/bin/gcc-12 CXX=/usr/bin/g++-12 pip install <PACKAGE_NAME>
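The same idea can be scripted when several packages need to be built with the same compilers. The sketch below is one possible approach using only the standard library; compiler_env and pip_install_command are hypothetical helper names, and the compiler paths are taken from the example command above, so they may differ on your system.

```python
import os
import sys


def compiler_env(cc, cxx, base_env=None):
    """Return a copy of the environment with CC and CXX set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = cc
    env["CXX"] = cxx
    return env


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    # Paths from the example command above; adjust to the versions you have.
    env = compiler_env("/usr/bin/gcc-12", "/usr/bin/g++-12")
    print(env["CC"], env["CXX"])
    # To actually install, pass both to subprocess.run, for example:
    # subprocess.run(pip_install_command("<PACKAGE_NAME>"), env=env, check=True)
```

Because the command is built from sys.executable, the package is installed into whichever environment runs the script.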
setup.py and pyproject.toml¶
Looking through a package's setup.py (or pyproject.toml) should be the first port of call. The setup.py file is the traditional build script for Python packages. Check for conditional logic based on sys.platform or platform. For example:
import sys
import platform
if sys.platform == "linux" and platform.uname().machine == "aarch64":
    extra_compile_args = ["-march=armv8-a"]
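For a long setup.py, it can help to locate such checks automatically rather than reading the whole file. The following sketch uses the standard library ast module to report the lines that reference sys.platform or the platform module. The helper name find_platform_checks is hypothetical, and the approach assumes the modules are imported under their usual names, so aliased imports would be missed.

```python
import ast


def find_platform_checks(source):
    """Return sorted line numbers that reference sys.platform or platform.*"""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        # Match attribute access on a bare name, e.g. sys.platform
        # or platform.uname / platform.machine.
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            is_sys_platform = node.value.id == "sys" and node.attr == "platform"
            is_platform_module = node.value.id == "platform"
            if is_sys_platform or is_platform_module:
                hits.add(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    example = (
        "import sys\n"
        "import platform\n"
        'if sys.platform == "linux" and platform.uname().machine == "aarch64":\n'
        '    extra_compile_args = ["-march=armv8-a"]\n'
    )
    print(find_platform_checks(example))
```

To scan a real file, read its contents first, for example with pathlib.Path("setup.py").read_text().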
Modern Python packages often use pyproject.toml instead of setup.py. Check that the [build-system] section specifies a compatible build backend such as setuptools or poetry.
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
Understanding sys.platform and platform.uname()¶
When building Python packages, the system's platform and architecture are often used to determine compatibility. Python provides tools like sys.platform and platform.uname() for this purpose:
- sys.platform: A string that identifies the operating system. Common values include:
  - "linux" for Linux systems.
  - "darwin" for macOS.
  - "win32" for Windows.
- platform.uname(): Provides detailed system information, including the machine architecture (e.g., x86_64, aarch64).
For example, here is the result of running these on Isambard-AI:
>>> import sys
>>> sys.platform
'linux'
>>> import platform
>>> platform.uname()
uname_result(system='Linux', node='nid001041', release='5.14.21-150500.55.31_13.0.53-cray_shasta_c_64k',
version='#1 SMP Mon Dec 4 22:56:47 UTC 2023 (03d3f83)', machine='aarch64')
>>> platform.uname().machine
'aarch64'
This is particularly useful when inspecting a package's setup.py or pyproject.toml to ensure compatibility with aarch64.
Dependencies¶
In setup.py, look for the install_requires argument or references to requirements.txt. Ensure all dependencies are available for aarch64.
In pyproject.toml, check the dependencies key under [project], the [tool.poetry.dependencies] table, or equivalent sections.
CI/CD or GitHub Actions
In a Python package's repository, GitHub Actions workflows often indicate which platforms and architectures the package is built for. Built wheels usually carry the target platform at the end of their filename, in the form
{PACKAGE_NAME}-linux_x86_64.whl
Look for .github/workflows in the repository. For example:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.8"
      - name: Build wheel
        run: python3 setup.py bdist_wheel
This can help you determine if aarch64 is supported or if modifications are needed.
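When a project publishes several wheels, the platform tag at the end of each filename shows which architecture it targets. The sketch below parses that tag following the standard wheel filename layout (name, version, optional build tag, Python tag, ABI tag, platform tag). The helper names and example filenames are illustrative, and the check is deliberately simple: it does not verify the Python version, the ABI, or the glibc requirements that pip also takes into account.

```python
import platform


def wheel_platform_tags(filename):
    """Return the platform tags encoded at the end of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    # Several platform tags may be joined with "." in one filename.
    return parts[-1].split(".")


def wheel_supports_machine(filename, machine=None):
    """Rough check: is the wheel pure Python ("any") or built for `machine`?"""
    machine = machine or platform.machine()
    tags = wheel_platform_tags(filename)
    return any(tag == "any" or tag.endswith("_" + machine) for tag in tags)


if __name__ == "__main__":
    # Illustrative filenames only, not taken from a real index listing.
    for name in (
        "example_pkg-1.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        "example_pkg-1.0-cp311-cp311-manylinux_2_17_x86_64.whl",
        "example_pkg-1.0-py3-none-any.whl",
    ):
        print(name, wheel_supports_machine(name, "aarch64"))
```

If none of a project's wheels pass this check for aarch64, pip will typically fall back to building from the source distribution, which is where the compiler settings described earlier become relevant.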