Running Singularity containers with GPUs
Overview
Teaching: 30 min
Exercises: 20 min
Questions
How do I set up and run GPU-enabled software from a Singularity container?
Objectives
Learn how GPU applications within Singularity containers can be run on HPC platforms
GPU overview
GPUs are dedicated pieces of hardware that can perform certain types of computation very quickly. On our supercomputer we have a number of GPU nodes based on Nvidia Tesla P100 and V100 cards. Access to these cards is mediated by a kernel driver that interfaces with the hardware, together with user-space libraries that allow applications to communicate with the driver.
Singularity supports both Nvidia and AMD GPUs, via the --nv and (in recent versions) --rocm flags respectively.
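Before running GPU code from a container, it is worth confirming that the host's driver stack is working. On an Nvidia system (assuming you are logged in to a node with a GPU and the driver installed), the nvidia-smi tool reports the driver version and the GPUs it can see:
$ nvidia-smi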
GPU codes with Singularity containers
We’ve already seen that building Singularity containers can be impractical without root access. Since we’re highly unlikely to have root access on a large institutional, regional or national cluster, building a container directly on the target platform is not normally an option.
Singularity attempts to make available in the container all the host files and devices required for GPU support. However, when building the image we can also install user-space GPU libraries ourselves, for example the Nvidia CUDA runtime libraries, provided we choose a version that the driver on the system we are targeting supports. Anaconda, for example, installs a CUDA runtime package alongside code that uses it.
Building and running a Singularity image with GPU support
Let us try running PyTorch with GPU support inside an Anaconda environment. The key option to supply to singularity is the --nv flag, which makes available inside the container all the files required to access the GPU. For example:
$ singularity run --nv my_image.sif
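A quick way to confirm that a GPU is visible from inside a container on an Nvidia system is to run nvidia-smi through it; with --nv, Singularity binds the host's nvidia-smi utility into the container. For example, with the image above:
$ singularity exec --nv my_image.sif nvidia-smi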
Building and testing an image
Let's create a directory and, within it, create a .def file (e.g. pytorch.def) with the following contents:
Bootstrap: docker
From: continuumio/miniconda3:4.9.2
%labels
MAINTAINER Thomas Green
%environment
%runscript
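# Activate the conda environment and start Python when the container is run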
. /etc/profile
conda activate pytorch
exec python3
%post
# Create some common mountpoints for systems without overlayfs
mkdir /scratch
mkdir /apps
. /etc/profile
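# Create a conda environment and install the CUDA 10.2 build of PyTorch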
conda create --name pytorch
conda activate pytorch
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
A quick overview of what the above definition file is doing:
- The image is bootstrapped from the miniconda3 Docker image.
- In the %post section:
  - Create some local bind locations, for backwards support of systems that do not support overlayFS.
  - Source /etc/profile to bring conda into the environment.
  - Run the conda commands to create an environment and install the CUDA-enabled build of PyTorch. Note that the cudatoolkit version requested must be one that the driver on the target system supports.
- In the %runscript section: a runscript is set up to activate the environment and run Python.
Build and test the PyTorch image
Using the above definition file, build a Singularity image named pytorch.sif. Once you have built the image, use it to run Python, import PyTorch and run some example code to check that the GPU is accessible.
Solution
You should be able to build an image from the definition file as follows:
$ singularity build --remote pytorch.sif pytorch.def
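The --remote option submits the build to a remote build service (by default the Sylabs Cloud builder), avoiding the need for root access on the local machine. Note that you may first need to configure an access token, for example with:
$ singularity remote login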
If the build succeeds, we can try running the container:
$ singularity run --nv pytorch.sif
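The runscript activates the pytorch environment and starts Python, so you should land at a Python prompt. A minimal check of GPU access from there: torch.cuda.is_available() should return True on a GPU node when the container is run with --nv, and False otherwise.
>>> import torch
>>> torch.cuda.is_available()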
This has demonstrated that we can successfully run GPU code from within a Singularity container.
Singularity wrap-up
This concludes the 5 episodes of the course covering Singularity. We hope you found this information useful and that it has inspired you to use Singularity to help enhance the way you build/work with research software.
As this is a new set of material, we appreciate that there are likely to be improvements that could be made to enhance its quality. We welcome your thoughts, suggestions and feedback on improvements that could help others making use of these lessons.
Key Points
GPU access from Singularity containers requires special handling, because the container depends on driver libraries installed on the host operating system.
The GPU runtime libraries installed in the container must be compatible with the kernel driver and libraries on the host operating system.