Singularity
This page is a simple guide to the creation of Singularity-based containers.
Containers are essential for Mufasa users, since user processes must always be run within containers. This is not only due to the way Mufasa is set up: working with containers is also a good practice worth learning for HPC applications, adopted both in academia and in industry.
In a nutshell, containers are complete, self-contained development environments that allow users to reproduce the exact same feature set they need on any machine, including a specific operating system, libraries, applications, and packages. They are, essentially, lightweight virtual machines you can deploy on any system featuring a container runtime (e.g., Singularity, Docker, or Podman) by simply bringing a definition file with you. A definition file is an intuitive script containing the instructions required to create (build) your container image, making it particularly convenient to migrate containers across machines and share them with your coworkers.
Though the syntax of definition files can change depending on the specific runtime (e.g., Docker and Podman share the same syntax for their Dockerfile, while Singularity's def file differs slightly), the principles and main sections are generally the same, meaning that learning how to create clear, good-quality Singularity def files is a valuable skill you can reuse with any other container runtime. Likewise, the time spent learning the Singularity command syntax to build, run, shell into, or exec commands on an interactive container or running instance (or service) is definitely a worthwhile investment.
On Mufasa, containers must be built and run via the SLURM scheduling system, either interactively or non-interactively, using Singularity as the only available container runtime.
Basic concepts
Images
In Singularity, an image is a single file representing the full container filesystem. Images can be:
- a compressed file (usually a .sif file, the standard Singularity format);
- an unpacked directory for development (a sandbox).
Often, an image is based on another image, with some additional customization. For example, you can instruct Singularity to build an image starting from the Ubuntu base image, adding any other applications and configurations you need to have the resulting container provide a specific service.
You might create your own images or use those created by others and published in a registry. To build your own image, you first create a definition file, using a simple syntax that defines the steps needed to create the image; then you build the image from this file and run it.
In general, it is possible to build an image on any machine (e.g., your laptop) and then move it to another machine featuring Singularity as a container runtime, including Mufasa. However, since containers you wish to complement with GPU support require access to the Nvidia libraries installed on the system where you want to run them, it is recommended to build images needing GPU resources on Mufasa itself.
Mufasa provides a specific QOS for users that need to build an image.
Containers
A Singularity container is simply a filesystem image being executed. Singularity containers:
- are unprivileged by default, i.e. they run with the privileges of the calling user (instead of those of the host's root), which makes them safe and practical in multi-user environments (more about this later);
- if run from SIF files, cannot modify the filesystem image during execution: any change is applied only to the container instance being executed (i.e., a temporary sandbox);
- can modify the filesystem image, instead, if they are created from images built as a sandbox (--sandbox), made writable (--writable), and run as container-exclusive root (--fakeroot).
These features make Singularity much better suited to an HPC environment such as Mufasa than many alternatives (such as Docker).
Mufasa does not include software libraries: to use them, users need to install the libraries in the same Singularity image where the user software runs.
Installation of software libraries in a Singularity image is possible either at building time, by including all those you need in the def file specifications, or at run time if you built the image as a sandbox and applied the above-mentioned flags when running it. The main advantage of the def file-based approach is that you can easily recreate the same image whenever you need by running a single command, while with the second approach, you can simply run commands within the container itself when it gets executed and modify it (e.g., update/install libraries and applications) interactively from the command line.
Creation of a custom Singularity image
Custom Singularity images are built using a definition file, usually stored with a .def extension (and no spaces inside the file name). This file describes how your image will be built and should be placed in an empty folder containing only the files needed to perform the building process.
For all the available specifications and options that can be provided in a def file, see the official documentation. A minimal structure comprises the following elements:
Bootstrap
- Specifies the base image source, i.e. the registry from which the base image will be pulled (e.g., docker for Docker Hub or library for the standard repository of Singularity-native images).

From
- The path to the base image in the specified repository (e.g., ubuntu:22.04 for the Ubuntu base image, available from both the docker and library repositories). Usually, you can start from a base image that already includes most of the libraries you need (e.g., PyTorch or TensorFlow). The last part of this path (i.e., the image name) usually takes the form name:version, as in ubuntu:22.04. If the version tag is omitted from the image name, it is assumed to be latest.
- Suggestion: to make your def file and builds more explicit and resilient, never omit the version tag.

%environment
- Defines environment variables set at runtime (not at build time).

%files
- Copies files from the host into the container.

%post
- Commands executed inside the container during the build (executed as root). In this section you should include all the commands needed to install the Linux and Python packages you want in the container. You can also define variables needed at build time here.
- Note: commands issued at build time must not require interaction with the user, since such interaction is not possible (e.g., apt install <package names> should be executed with the -y option).

%runscript
- Commands executed when the container is run (i.e., when using the singularity run or singularity instance run commands).
In a definition file, lines beginning with # are comments.
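As an illustration of the %environment section (which the full example in the next subsection does not use), a minimal sketch could look as follows; the variable names and values are only placeholders:

%environment
    # These variables are available inside the container at runtime (not during the %post build steps)
    export LC_ALL=C
    export PROJECT_DATA_DIR=/home/data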
Example of definition file
This is an example definition file that creates a TensorFlow-ready image.
# Base image of the container from which we start to create our custom one
Bootstrap: docker
From: tensorflow/tensorflow:2.16.1-gpu
# Note: if you do not need GPU support, you can use this alternative path instead:
# From: tensorflow/tensorflow:2.16.1

%files
    # We copy the 'requirements.txt' file contained in the host build directory to the container.
    # This file contains all the Python libraries we wish to include in our container.
    requirements.txt

%post
    # Set the desired Python version (environment variable)
    # Note: it is suggested to set the same version of Python already installed in the image pulled at the beginning.
    # For example, the image "tensorflow/tensorflow:2.16.1-gpu" runs Python 3.11
    python_version=3.11

    # Install the desired Python version and the other applications you need
    # (the current TF image is based on Ubuntu, that's why we use apt as the package manager)
    apt-get update
    apt-get install -y python${python_version}-full graphviz libgl1-mesa-glx

    # Set the default Python executable in the container, so you will not need to call it in its extended form
    # (e.g., "python3.11") when executing scripts in the container.
    # Set default version for root user - modified version of this solution: https://jcutrer.com/linux/upgrade-python37-ubuntu1810
    update-alternatives --install /usr/local/bin/python python /usr/bin/python${python_version} 1

    # Clean pip cache before updating pip
    python -m pip cache purge

    # Update pip, setuptools, and wheel
    python -m pip install --upgrade pip setuptools wheel

    # Install the Python libraries we need
    python -m pip install -r requirements.txt

    # Clean pip cache again to free up space
    python -m pip cache purge
Examples of requirements files can be found in this GitHub repo.
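For reference, a requirements file is just a plain-text list of Python packages, one per line, optionally pinned to exact versions. The entries below are purely illustrative; list the packages your project actually needs:

numpy==1.26.4
pandas==2.2.2
scikit-learn==1.4.2
matplotlib==3.8.4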
When your def file is ready, follow the instructions in the next section to build the image.
Building Singularity images
With Singularity, users can either download and run ready-to-use container images from several preset repositories or customize such images to their needs, as seen in the previous section. Unlike Docker or other OCI-based container runtimes, which can simply pull images from OCI repositories, Singularity always requires building compatible filesystem images even when users don't need to customize them. Consequently, the pull command, typical of OCI container runtimes, is always replaced by the build command in Singularity, whether users want to run preassembled images or build a custom image starting from a pre-existing one. In this section, both kinds of building operations are described.
Useful options
The most useful options available when building Singularity images are the following:
--sandbox
- builds the filesystem image as a directory that the user can modify, instead of a single SIF file
--writable
- makes the filesystem image writable
--fakeroot
- lets the user be root (i.e., have full administrator privileges) within the container
--nv
- enables the image to make use of the system GPUs
The following sections provide examples of use of these options.
Using SLURM to build Singularity images
As previously mentioned, image-building operations must be performed as SLURM jobs on Mufasa. The recommended QOS for such operations is the build one.
The general suggestion is to submit a non-interactive SLURM job with an sbatch execution script (named, for example, singularity_build.sbatch), placed in the Mufasa directory where you want to perform the build and structured as follows:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=16g
#SBATCH --partition=jobs
#SBATCH --qos=build
#SBATCH --output=./logs/build_sbatch-%j.out
#SBATCH --error=./logs/build_sbatch-%j.err
#---Load the Singularity module
module load amd/singularity

#---Build the container image
singularity build [options] <destination_image_folder_OR_file> <path_to_image_in_public_repo_OR_definition_file>
After adapting the build command to your needs, you can simply launch the building process with
sbatch singularity_build.sbatch
(make sure to adapt the name of the sbatch script to the one you used).
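A possible submission-and-monitoring sequence is sketched below; <jobid> stands for the job ID reported by squeue, and the ./logs directory referenced in the #SBATCH directives should exist before submission (SLURM does not create it):

mkdir -p logs
sbatch singularity_build.sbatch
squeue -u $USER
tail -f logs/build_sbatch-<jobid>.out logs/build_sbatch-<jobid>.err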
The build commands to adopt in the most common situations are explained in the next subsections.
Build a SIF or sandbox image from a pre-assembled image in Docker Hub (or other repos)
To build a SIF image from a readily available image in a public repository, use this command:
singularity build <image_name>.sif <path_to_image_in_public_repo>
Example:
singularity build ubuntu.sif docker://ubuntu:22.04
SIF images are only convenient when you want an immutable or easy-to-share image. If you prefer an uncompressed, fast-to-access, and easy-to-modify sandbox image, use the following form of the build command:
singularity build --sandbox <image_folder>/ <path_to_image_in_public_repo>
Example:
singularity build --sandbox ubuntu/ docker://ubuntu:22.04
Note that, although images built in sandbox mode can be interactively personalized when accessed with the appropriate flags (see Running Singularity images), leveraging def files as documented in the next section is always recommended to create custom images. In particular, in systems where storage per user is limited, as in Mufasa, def files allow recreating the exact same images when needed and deleting them when they are not useful in the short term.
If you wish to run the resulting SIF or sandbox image with GPU support, you must include the --nv flag, as shown below.
When building the image as a SIF file:
singularity build --nv <image_name>.sif <path_to_image_in_public_repo>
When building the image as a sandbox:
singularity build --nv --sandbox <image_folder>/ <path_to_image_in_public_repo>
Example:
singularity build --nv --sandbox ollama/ docker://ollama/ollama
Build a SIF or sandbox custom image from a definition file
As explained in the dedicated section above, a definition file lists all the operations that Singularity must perform to create a custom image. As many of these operations (e.g., apt install ...) need the user to have root permissions on the container, building an image from a definition file always requires the --fakeroot flag. This option ensures the user appears as root inside the container while remaining an unprivileged user outside (i.e., on the host system, Mufasa).
The following syntax allows building an image from a definition file. Of course, it can be combined with the --nv option to enable GPU support in the resulting image.
As a SIF file:
singularity build --fakeroot <image_name>.sif <path_to_def_file>
As a sandbox:
singularity build --fakeroot --sandbox <image_folder>/ <path_to_def_file>
Example (using the definition file described in the previous section):
singularity build --nv --fakeroot --sandbox tf_custom/ Singularity.def
Running Singularity images
Once a Singularity image has been built, you can access or use it with various Singularity sub-commands. The most common ones, namely exec, run, shell, and instance run, are described in this section.
Useful options
The most useful options available when running Singularity images are the following:
--nv
- This option is needed when access to GPU resources, allocated through a properly set up SLURM interactive or non-interactive session, should be granted to the container. As explained previously, only containers built with the same --nv flag are expected to work properly in this modality.
--fakeroot
- As mentioned in regard to the build process, this flag is needed only when we have to perform operations that change the container's operating system (OS) or access container resources requiring administrative privileges. Examples include interactively upgrading or installing system packages in the container, or mounting host paths on privileged container directories using the --bind option, as sometimes required by specific container configurations (e.g., --bind /home/username/appdata:/root/.ollama). The fakeroot option should be avoided in any other case, as it is usually unnecessary and, as a general security measure, any process must be run with the lowest privileges it requires. E.g., --fakeroot is not required and must be avoided when mounting a host path on an unprivileged container path, such as /home (e.g., --bind /home/username/data:/home/data). Besides, the --fakeroot option is not required to leverage host GPU support, for which the --nv flag suffices.
--writable
- This flag is needed only if we need to modify the content of paths owned by the container's root user (e.g., to install new applications interactively or modify system files), in which case it is usually combined with the --fakeroot option. The --writable option is not needed if we just need to mount a local path on a privileged directory within the container, as explained earlier. Moreover, it generally makes sense to use this flag only when running containers created from sandbox images. Limitation: the --writable and --nv options cannot be applied simultaneously (the container starts properly, but GPU resources cannot be accessed).
The following sections provide examples of use of these options.
exec
The singularity exec command executes a specific command within a container without opening a shell, hence non-interactively.
The major advantage of this modality is that when the execution of the program terminates, the SLURM job and the related container are stopped, freeing up the associated resources. This makes exec the preferred method to run Singularity images on Mufasa, i.e. the one that uses the fewest resources and thus maximises the priority of your future jobs.
exec is especially suitable to execute machine learning (ML) model training, validation, and testing, as it ensures that the resources locked for a specific job are released and made available to other Mufasa users as soon as the job completes. More generally, exec should be preferred, in combination with sbatch, to execute any job not requiring interaction with the user or real-time supervision.
Errors and outputs from your scripts can be comprehensively captured using the specific #SBATCH --output and #SBATCH --error directives documented for running a non-interactive build job.
The general syntax to use the exec sub-command is:
singularity exec <image_folder_OR_file_OR_running_instance> <command_to_be_executed_in_the_container>
This is an example using the basic Ubuntu image built previously:
singularity exec --sandbox ubuntu/ uname -a
The following is another example, making use of the TensorFlow custom container built previously, with GPU support.
Note that your home directory (including the current working directory, when it lies inside your home) is mounted by default when running Singularity containers. Thus, the following will work if a 'main.py' script is present in the same folder (or a subfolder) where the Singularity command is executed.
singularity exec --nv tf_custom/ python main.py
A third example, below, uses the TensorFlow custom container built previously, with GPU support, but mounting a selected home folder (containing code and data subfolders) on a specific unprivileged path within the container:
singularity exec --nv --bind /home/username/testProject:/home/project tf_custom/ python /home/project/code/main.py
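Before launching a long training run, it can be useful to verify that the container actually sees the GPUs allocated by SLURM. A quick sanity check, assuming the interactive session or job was granted at least one GPU, is:

singularity exec --nv tf_custom/ nvidia-smi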
Example script for model training
Below is an example of how an sbatch script for running an ML model training task should look.
In this example, we assume the Singularity image called in the script has already been built according to the instructions provided in the relevant sections above. Details on possible and suggested values for the various #SBATCH directives can be found in the related section of this Wiki.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=<number of CPU cores your container needs> #---4 cores are usually sufficient for GPU-intensive tasks: ask for more only if actually needed
#SBATCH --mem=<total amount of RAM your container needs>
#SBATCH --partition=jobs
#SBATCH --qos=<the QOS of your choice> #---a QOS with GPU complement is required if the --nv flag is applied below
#SBATCH --gres=<requested GPU resources (provided the selected QOS supports such a request)>
#SBATCH --output=./logs/train_sbatch-%j.out
#SBATCH --error=./logs/train_sbatch-%j.err
#---Load the Singularity module
module load amd/singularity

#---Run your Python training script in a container created from the previously built image WITH GPU SUPPORT.
# Notes:
# - The --bind option can also be added if needed in your specific case.
# - If you don't need GPU support, build the image and 'exec' it without the --nv flag.
singularity exec --nv tf_custom/ python main_train.py
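Assuming the script above is saved as train_model.sbatch (the name is arbitrary), a typical submission and monitoring sequence might be:

sbatch train_model.sbatch
squeue -u $USER
tail -f logs/train_sbatch-<jobid>.out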
run
The singularity run command is similar to the exec sub-command, as it executes a single command (or a sequence of them) in a container before exiting. However, with the run sub-command, only the commands indicated in the %runscript section of the definition file get executed, while it is not possible to pass additional arbitrary commands as command line arguments.
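For reference, a %runscript section in a definition file might look like the following sketch (the application path and script name are only placeholders); these are the commands that singularity run and singularity instance run execute:

%runscript
    # Executed by 'singularity run' and 'singularity instance run'
    echo "Starting the application..."
    exec python /opt/app/main.py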
For example, building and running the helloworld Docker container in a SLURM interactive session with commands
module load amd/singularity
singularity build helloworld.sif docker://hello-world
singularity run helloworld.sif
prints out the typical welcome message but doesn't allow further interaction with the container, as shown below.
gfontana@mufasa2:~$ module load amd/singularity
Loading amd/singularity
Loading requirement: amd/go/go-1.25.3
gfontana@mufasa2:~$ singularity build helloworld.sif docker://hello-world
INFO: Starting build...
INFO: Fetching OCI image...
INFO: Extracting OCI image...
INFO: Inserting Singularity configuration...
INFO: Creating SIF file...
INFO: Build complete: helloworld.sif
gfontana@mufasa2:~$ singularity run helloworld.sif
WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
gfontana@mufasa2:~$
shell
The singularity shell command starts a container image and opens an interactive shell in it. Similar to the exec sub-command, the shell sub-command allows executing commands in a container, but interactively. As a consequence, it is generally used in interactive SLURM sessions to, for example, launch interactive jobs, personalize existing containers, or connect to containers running as instances.
The general syntax for the shell sub-command closely resembles that of the exec sub-command, with the only difference being that we can't append at the end, as arguments, commands to be executed in the container:
singularity shell <image_folder_OR_file_OR_running_instance>
For example, from within an interactive SLURM session, we can start a container from a previously built sandbox image and open a terminal in it to install a desired Ubuntu package:
# Start the container in interactive mode.
# Note: remember the appropriate flags to make the container filesystem writable and to connect to the terminal as
# the container's root user.
singularity shell --fakeroot --writable ubuntu/

# From within the container, update the 'apt' package list and install a package of interest (e.g., htop)
apt-get update
apt-get install htop
Running an image as a service
The singularity instance run command executes the instructions written in the %runscript section of an image definition file, like the run sub-command. However, instead of exiting when finished, it keeps the container running in the background (i.e., detached, as a service or instance) and accessible after the last command in the sequence has been executed. This command is used to run containers providing services that should remain continuously accessible for a specific time span, even when idle (e.g., web services or LLM runners such as Ollama). This time span can be specified using the #SBATCH --time=hh:mm:ss directive when the singularity instance run command is executed as a non-interactive SLURM job through sbatch (as it should be done in production).
The general syntax to run a container as an instance is as follows (to be run through an interactive/non-interactive SLURM session/job):
singularity instance run <image_folder_OR_file> <instance_name>
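As an illustrative sketch of the non-interactive case, the sbatch script below starts an instance and keeps the job (and thus the instance) alive for the requested time span; the resource amounts, QOS, time limit, image, instance name, and sleep duration are placeholders to adapt, and --nv, --fakeroot, or --bind options can be added as required by your service:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=<total amount of RAM your service needs>
#SBATCH --partition=jobs
#SBATCH --qos=<the QOS of your choice>
#SBATCH --time=08:00:00
#SBATCH --output=./logs/instance_sbatch-%j.out
#SBATCH --error=./logs/instance_sbatch-%j.err

#---Load the Singularity module
module load amd/singularity

#---Start the instance (add --nv, --fakeroot, or --bind as needed)
singularity instance run <image_folder_OR_file> <instance_name>

#---Keep the job alive so the instance is not stopped immediately
sleep 8h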
While an instance is running under a SLURM job launched from the Mufasa login subsystem, we can open another SSH session and log in directly to the Mufasa host (through its dedicated IP). From there, we can:
# Load the Singularity module, as usual, to access Singularity commands
module load amd/singularity

# Check which Singularity instances are currently running
singularity instance list

# Attach the terminal to a specific instance to run commands in it
singularity shell instance://<instance_name>

# Send single commands to a specific instance
singularity exec instance://<instance_name> <command_to_be_executed_in_the_instance>
When we don't need the running instance anymore, we can close the SSH session on the Mufasa host and simply stop the instance by terminating the associated SLURM job from the Mufasa login subsystem:
# Check the identification number of the SLURM job supporting the running Singularity instance
squeue

# Terminate the corresponding SLURM job
scancel <slurm_job_number>
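Alternatively, while still logged in on the Mufasa host, the instance can first be stopped explicitly with Singularity's own command (the instance name below is a placeholder) before cancelling the SLURM job:

singularity instance stop <instance_name>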
IMPORTANT: This usage modality is generally not needed for typical Mufasa users' use cases. Given the limited resources available on the system, this modality must be restricted to cases that strictly require it, as it locks up resources for a certain time period, subtracting them from other potential users even if the service running is completely inactive. Whenever possible, the exec sub-command must always be preferred!
This example shows how to run a container as an instance (sometimes called detached mode) and attach to it to perform operations supported by the services running in that container. Note: if you really need this modality, first try it out in an interactive SLURM session (requesting only a few CPU cores and little memory) to get familiar with the process before launching your actual instances as non-interactive SLURM jobs.
# The following command builds the Ollama container image (it should be run only once)
singularity build --sandbox ollama_container/ docker://ollama/ollama

# The following must be run in an interactive SLURM session / non-interactive SLURM job to launch the container as an instance,
# to provide the Ollama main service (i.e., 'ollama serve') and associate a persistent storage for local LLMs.
# Notes:
# - 'ollama_data' is an empty folder located in a path of your choice within your home.
# - If launched from a non-interactive SLURM job, the 'singularity instance run' command must be followed by an appropriate
#   'sleep' instruction to ensure the container doesn't stop after the Ollama main service has been started.
#   A possible instruction of this kind: sleep 24h
module load amd/singularity
singularity instance run --fakeroot \
    --bind /home/username/singularity_ollama/ollama_data:/root/.ollama \
    ollama_container/ ollama

# The following must be run directly on the Mufasa host, while the SLURM session/job is running, to connect to the Ollama instance
module load amd/singularity
singularity shell instance://ollama

# Once we are connected to the instance, we can, for example, download a small LLM and run it:
ollama pull gemma3n:e2b
ollama run gemma3n:e2b