Docker

From Mufasa (BioHPC)

This page is a simple guide to the creation of Docker containers.

Docker containers are important for Mufasa users since user processes must always be run within Docker containers.

On Mufasa, Docker containers must be run via the SLURM scheduling system.


Precondition

To prepare Docker containers for Mufasa, you will need to install the following software packages on your own computer:

  • Docker, i.e. the package necessary to create Docker containers
  • NVIDIA enroot, i.e. the package used to store Docker containers in the .sqsh (“squash”) format that can be run via SLURM

The installation procedure varies according to the operating system. For instance, on Ubuntu Linux installation is done with the commands

sudo apt install docker.io

and

sudo apt install enroot

Basic concepts

From Docker's documentation:

Images

An image is a read-only template with instructions for creating a Docker container.

Often, an image is based on another image, with some additional customization. For example, you may build an image which is based on the ubuntu image, but installs the Apache web server and your application, as well as the configuration details needed to make your application run.

You might create your own images or you might only use those created by others and published in a registry. To build your own image, you create a Dockerfile with a simple syntax for defining the steps needed to create the image and run it. [...]


Containers

A container is a runnable instance of an image. [...]

A container is defined by its image as well as any configuration options you provide to it when you create or start it.

A Docker image is not modified when a container is created from it, nor when that container runs: the original image remains unchanged, and any change applies only to the container instance being executed.

Usually the container instance is treated as disposable: changes made to its internal filesystem are lost when the container terminates, so any data that must persist should be written to directories of the host machine that are mounted into the container when it is run.

As a consequence, installing software libraries is often better managed by running commands within the container itself when it gets executed, rather than baking the libraries into the original Docker image: this way it is possible to change the version of the libraries without having to re-create the image every time a new version is released.

Creation of a Docker image

Preparation

The first step in the preparation of a new Docker image is to create a work directory where you will put all the elements to be used to create the image.

Within this work directory, you will put:

  • subdirectories for all the things you need to create your container (e.g., a subdirectory “code” for the code of the job you will run on Mufasa)
  • a text file called Dockerfile (this is the full filename: it has no extension), where you specify how to create the container
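
The layout above can be prepared with a few shell commands; a minimal sketch (the names for_Docker and code are just examples, matching those used later on this page):

```shell
# Sketch: create the work directory layout described above.
mkdir -p for_Docker/code
touch for_Docker/Dockerfile                # build instructions (contents discussed below)
touch for_Docker/code/main.py              # the job's code
touch for_Docker/code/requirements.txt     # pinned Python libraries
touch for_Docker/code/run.sh               # script launched by the container's entrypoint
```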


The Dockerfile contains directives that Docker uses to build the Docker image. The most commonly used directives are:


FROM <image>
tells Docker that your image will be built on the basis of an already available image (created by you or by someone else; for instance, an image from Docker Hub). This is useful because you can start from a base image that already includes the libraries you need (e.g., PyTorch or TensorFlow).
<image> is the name of the base image; this usually takes the form name:version (e.g. python:3.6)
The FROM directive must be the first in the Dockerfile.
Example:
FROM python:3.6


WORKDIR <path/to/dir>
sets the working directory inside the container's filesystem: relative paths in subsequent directives (such as COPY and RUN), and the entrypoint command, are resolved with respect to this directory, which is created if it does not already exist. Note that WORKDIR does not by itself mount directories of the host machine: host directories are made available inside the container at run time (e.g., via the bind-mount options given when the container is started), and that is the mechanism used to exchange files between the host and the environment of the container.
Example:
WORKDIR /opt


COPY <sourcedir> <destdir>
copies all contents of the directory specified by <sourcedir> (a path relative to the build context, i.e. the directory containing the Dockerfile) into the container's own directory specified by <destdir>; if <destdir> is a relative path, it is resolved with respect to the directory set by WORKDIR. Note that the syntax of Docker's COPY directive is not the same as that of Linux's cp command.
Example:
COPY ./code /opt
copies all the files contained in ./code (i.e. the “code” subdirectory of the directory where the Dockerfile resides) to /opt in the container's own filesystem


For instance, if the host machine's filesystem is this (where /home/my_user/for_Docker is the work directory)

/
  home/
    my_user/
      for_Docker/
        code/
          main.py
          requirements.txt
          run.sh
        Dockerfile
  ...

the COPY directive of the example copies files main.py, requirements.txt and run.sh from /home/my_user/for_Docker/code into the /opt directory of the filesystem internal to the Docker container.


RUN <command>
where <command> is any command you could issue via a bash shell. The commands specified by RUN directives are executed while the image is being built (i.e., during docker build), and their effects are saved into the image. The commands are run by the container's root user, i.e. they enjoy full administrative privileges: they can, for instance, install software packages.
<command> should not involve interaction with a user, since no interaction is possible during the build.
Example:
RUN pip install -r requirements.txt
pip is the program used to install Python libraries: here it is used to install all the libraries specified in a file called requirements.txt (previously copied into the image, e.g. by a COPY directive) containing statements of the form
<name_of_package>==<version>
For instance, such a file may contain the following lines:
opencv-contrib-python==4.3.0.3
opencv-python==4.3.0.36


ENTRYPOINT [ "<command>", "argument1", "argument2", "argument3", ... ]
where <command> is a command and "argumentk" is the k-th command-line argument passed to it (this JSON-array syntax passes each argument as a separate string, without involving a shell).
The “entrypoint”, specified by this directive, is the command that is executed as soon as the container starts running. Typically the entrypoint command launches a bash shell and uses it to run a script.
The Docker container remains in execution only as long as the entrypoint command is running: if the entrypoint terminates or fails, the container is terminated as well.
Example:
ENTRYPOINT [ "/bin/bash", "/opt/run.sh" ]
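
The script launched by the entrypoint is typically a short bash script. A minimal sketch of what a file like the /opt/run.sh of the example might contain (hypothetical: the actual contents depend on your job); here the script is written out and syntax-checked without being executed:

```shell
# Sketch: create a minimal entrypoint script like /opt/run.sh (hypothetical contents).
cat > run.sh <<'EOF'
#!/bin/bash
set -e                              # stop (and thus end the container) on any error
cd /opt                             # the directory populated by the COPY directive
pip install -r requirements.txt     # libraries can also be installed here, at run time
python main.py "$@"                 # launch the job; the container ends when it exits
EOF
bash -n run.sh    # check the script for syntax errors without running it
```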

Creation of the Docker image

Once the Dockerfile is complete and all the material required by the image is in place in the work directory, it is time to create the image. The image describes the container and can subsequently be used to create a container file: for instance, one in the .sqsh format accepted by SLURM.

To create the container image, use command

docker build -t <name_of_image> .

where <name_of_image> is the name to be assigned to the image. This name is arbitrary, but is usually structured like

<name>:vXX

where <name> is any name and XX is a version number. The final “.” tells docker to use the current directory as the build context, i.e. the directory containing the Dockerfile and the files to be copied into the image.

An example of command for the creation of an image is the following:

docker build -t docker_test:v1 .

During image creation, all the commands specified in the Dockerfile are executed (e.g., for the installation of libraries).
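
Putting together the directives described above, a complete Dockerfile for the example work directory might look like this (a sketch using the names, paths and versions from the examples on this page):

```dockerfile
FROM python:3.6
WORKDIR /opt
COPY ./code /opt
RUN pip install -r requirements.txt
ENTRYPOINT [ "/bin/bash", "/opt/run.sh" ]
```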

Image library

Docker maintains a local repository of (compressed) images that are available on the local computer (i.e. the one used for image creation). Every time a new image is created on the machine, it gets automatically added to the local repository.

To get a list of available images in the local repository, use command

docker image list

The local repository is in a system directory managed by Docker.

In addition to local repositories, Docker allows access to remote repositories; the main one among these is Docker Hub.

Creation of a Docker container from an image

In order to be run on Mufasa, Docker containers must take the form of a single .sqsh compressed file (it's pronounced "squash").

Creation of a .sqsh container file can be done from a local or remote image. Command

enroot import docker://<remote_container_image>

creates a container file called

<remote_container_image>.sqsh

from a remote Docker image called <remote_container_image> downloaded from the Docker Hub remote image repository.

Example:

enroot import docker://python:3.6

To create a Docker container from a local Docker image (i.e., one stored in the local image library on your own computer) the command to use is instead

enroot import dockerd://<local_container_image>

(note that we are now using import dockerd instead of import docker).

Running the command above creates a container file called

<local_container_image>.sqsh

from a local image called <local_container_image>.

Example:

enroot import dockerd://docker_test:v1

Additional resources

See the Docker section of the Resources page for other resources (e.g., HOWTOs) about Docker.