Docker

From Mufasa (BioHPC)

Revision as of 14:17, 7 February 2022

[Note: this page is actively under construction]

This page intends to provide a very simple guide to the creation of Docker containers.

Docker containers are important for Mufasa users since user processes must always be run within Docker containers.

On Mufasa, Docker containers are run via the SLURM scheduling system.

Precondition

In order to run Docker containers on Mufasa, you will need to install on your own computer (not on Mufasa) the following software packages:

  • Docker, i.e. the package necessary to create docker containers
  • Nvidia Enroot, i.e. the package used to convert Docker images into container files in the .sqsh (“squash”) format that can be run via SLURM

The procedure for installation varies according to the operating system. For instance, on Ubuntu Linux installation is done with the commands

sudo apt install docker.io

and

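sudo apt install enroot

(Enroot is not included in the standard Ubuntu repositories: its .deb packages are published on the releases page of NVIDIA's enroot project on GitHub, and a downloaded package can be installed with sudo apt install ./<package>.deb)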
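As a quick check after installation, a minimal sketch such as the following can be run in a bash shell. It reports whether the docker and enroot executables can be found on the PATH; the function name check_tools is a hypothetical helper, not part of Docker or Enroot, and the check does not verify versions or whether the Docker daemon is running.

```shell
#!/bin/bash
# Minimal sketch: for each tool name given as argument, report whether an
# executable with that name is on the PATH.
# It does not check versions, nor whether the Docker daemon is running.
check_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" > /dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: NOT found"
    fi
  done
}

check_tools docker enroot
```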

Basic concepts

From Docker's own documentation:

Images

An image is a read-only template with instructions for creating a Docker container.

Often, an image is based on another image, with some additional customization. For example, you may build an image which is based on the ubuntu image, but installs the Apache web server and your application, as well as the configuration details needed to make your application run.

You might create your own images or you might only use those created by others and published in a registry. To build your own image, you create a Dockerfile with a simple syntax for defining the steps needed to create the image and run it. [...]


Containers

A container is a runnable instance of an image. [...]

A container is defined by its image as well as any configuration options you provide to it when you create or start it.

A Docker image does not get modified whenever a container is created from it. When the container is run, it too does not get modified: i.e., the original file describing the container remains the same, and any change is applied only to the container instance being executed.

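Changes made by a running container to its internal filesystem are not written back to the image. Files that must be preserved (e.g., the results of a job) should therefore be written to a directory of the host machine that is made accessible (mounted) inside the container when the container is run. Despite its name, the WORKDIR directive (see below) does not provide such access.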

As a consequence, installation of software libraries in the Docker container is better managed by running commands within the container itself when it gets executed, not by installing the libraries within the original Docker image: this way it is possible to change the version of the libraries without having to re-create the original image file every time a new version of the libraries is released.

Creation of a Docker image

The first step in the preparation of a new Docker image is to create a work directory where you will put all the elements to be used to create the image.

Within such work directory, you will put:

  • subdirectories for all the things you need to create your container (e.g., a subdirectory “code” for the code of the job you will run on Mufasa)
  • a text file called Dockerfile (this is the full filename: it has no extension), where you specify how to create the container
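The layout described above can be created from a bash shell, for instance as in the following minimal sketch. The work directory name my_work_dir and the contents of run.sh are assumptions made for illustration (run.sh simply starts a hypothetical main.py from /opt); main.py and requirements.txt are created empty, as placeholders for the actual job, and the Dockerfile is created empty, to be filled with the directives of the image.

```shell
#!/bin/bash
# Minimal sketch: create a work directory with a "code" subdirectory and a
# Dockerfile. All names below are hypothetical examples.
work_dir="my_work_dir"

mkdir -p "$work_dir/code"

# Placeholder files for the job (to be replaced with the real code and
# with the real list of required Python libraries).
touch "$work_dir/code/main.py" "$work_dir/code/requirements.txt"

# Script meant to be used as the container's entrypoint: it runs the job
# from /opt, where the code is expected to be inside the container.
cat > "$work_dir/code/run.sh" << 'EOF'
#!/bin/bash
cd /opt
python main.py
EOF
chmod +x "$work_dir/code/run.sh"

# The Dockerfile has no extension; it is created empty here.
touch "$work_dir/Dockerfile"
```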


The Dockerfile contains the directives that Docker uses to create the Docker image. The most commonly used directives are:


FROM <image>
tells Docker that your image will be created on the basis of an already available image (created by you or by someone else; for instance, an image from Docker Hub). This is useful because you can start from a base image that already includes the libraries you need (e.g., PyTorch or TensorFlow).
<image> is the name of the base image; this usually takes the form name:tag, where the tag is often a version (e.g. python:3.6)
The FROM directive must be the first directive in the Dockerfile (only ARG directives may precede it).
Example:
FROM python:3.6


WORKDIR <path/to/dir>
Sets the working directory inside the container's filesystem: the RUN, CMD, ENTRYPOINT, COPY and ADD directives that follow it are executed in this directory, and relative paths used by them are resolved against it. If the directory does not exist, it is created. Note that WORKDIR does not import or mount any directory of Mufasa into the container: directories of the host machine are made accessible inside the container when the container is run, not when the image is built.
Example:
WORKDIR /opt


COPY <sourcedir> <destdir>
where <sourcedir> is a path relative to the build context (i.e. the directory passed to the docker build command, see below) and <destdir> is a path in the container's filesystem; if <destdir> is a relative path, it is interpreted relative to the directory set by WORKDIR. This copies all the contents of <sourcedir> (but not <sourcedir> itself) into <destdir>. Note that the syntax and behaviour of Docker's COPY directive are not the same as those of Linux's cp command.
Example:
COPY ./code /opt
copies all the files contained in ./code (i.e. subdirectory “code” of the build context) to /opt in the container's own filesystem


For instance, if the host machine's filesystem is this

/
    ...
    /home
        /my_user
            /code
                main.py
                requirements.txt
                run.sh
            /for_Docker
                Dockerfile
    ...
    /opt

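the COPY directive of the example copies files main.py, requirements.txt and run.sh from /home/my_user/code into the /opt directory of the filesystem internal to the Docker container. Since COPY can only read files located inside the build context, this requires the build context to be /home/my_user (e.g., by running docker build -f for_Docker/Dockerfile . from within /home/my_user).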
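To see how COPY differs from the Linux cp command, the following minimal sketch (bash, run on the host, no Docker involved) copies a directory named code in two ways inside a temporary directory. Plain cp -r into an existing directory creates a code subdirectory there, while copying code/. reproduces what COPY does with a directory source, i.e. copying only its contents. All directory names are hypothetical.

```shell
#!/bin/bash
# Minimal sketch: contrast cp's behaviour with that of Docker's COPY
# when the source is a directory. Everything happens in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/code" "$demo/dest_cp" "$demo/dest_copy_like"
touch "$demo/code/main.py" "$demo/code/requirements.txt" "$demo/code/run.sh"

# Linux cp: copying a directory into an existing directory creates a
# subdirectory with the same name (dest_cp/code/main.py, ...).
cp -r "$demo/code" "$demo/dest_cp"

# Docker's COPY ./code /opt copies only the contents of ./code
# (/opt/main.py, ...); with cp the same result needs the trailing "/.".
cp -r "$demo/code/." "$demo/dest_copy_like"

ls "$demo/dest_cp"          # lists the single entry: code
ls "$demo/dest_copy_like"   # lists main.py, requirements.txt and run.sh
```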


RUN <command>
where <command> is any command you can issue via a shell (in this form, Docker runs it with /bin/sh -c). The commands specified by RUN directives are executed inside the container during the creation of the image (i.e., when docker build is run, not when the container is later executed), and their effects (e.g., installed files) become part of the image. Unless a USER directive specifies otherwise, the commands are run (within the container) by the root user (of the container), i.e. they enjoy full administrative privileges: they can, for instance, install software packages.
<command> must not require interaction with a user, since no user can interact with it while the image is being built (e.g., package managers must be told not to ask for confirmation).
Example:
RUN pip install -r requirements.txt
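pip is the program used to install Python libraries: here it is used to install all the libraries specified in an external file called requirements.txt (which must already be present in the container's working directory, e.g. because a previous COPY directive put it there), containing statements of the form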
<name_of_package>==<version>
For instance, such a file may contain the following lines:
opencv-contrib-python==4.3.0.3
opencv-python==4.3.0.36


ENTRYPOINT ["<command>", "argument1", "argument2", "argument3", ...]
where <command> is a command and "argumentk" is the k-th argument passed to it on the command line. This is the “exec form” of ENTRYPOINT: the list is written in JSON syntax, so each element must be enclosed in straight double quotes ("), and the command is executed directly rather than through a shell.
The “entrypoint” specified by this directive is the command that is executed as soon as the container starts. Typically the entrypoint command launches a bash shell and uses it to run a script.
The Docker container remains in execution only as long as the entrypoint command is running: when the entrypoint terminates (successfully or because of an error), the container terminates as well.
Example:
ENTRYPOINT ["/bin/bash", "/opt/run.sh"]
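Putting the example directives together, the Dockerfile of the example could look like the following. This is a minimal sketch: it is written from a bash shell with a here-document into the hypothetical work directory my_work_dir, and it uses the absolute destination /opt in COPY. The order of the directives matters: COPY comes before RUN because pip needs requirements.txt to be already present in /opt when the image is built.

```shell
#!/bin/bash
# Minimal sketch: write a complete Dockerfile combining the example
# directives into the (hypothetical) work directory my_work_dir.
work_dir="my_work_dir"
mkdir -p "$work_dir"

cat > "$work_dir/Dockerfile" << 'EOF'
# Start from an image that already provides Python 3.6
FROM python:3.6

# The following directives (and the entrypoint) work in /opt
WORKDIR /opt

# Copy the contents of ./code (relative to the build context) into /opt
COPY ./code /opt

# Executed at build time: install the libraries listed in requirements.txt
RUN pip install -r requirements.txt

# Executed when the container starts: run the job script
ENTRYPOINT ["/bin/bash", "/opt/run.sh"]
EOF
```

With the files of the job placed in my_work_dir/code, the image can then be built from within my_work_dir as described in the next section.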

Creation of the Docker container image

Once the Dockerfile is completed and all the material required to create the container is in place in the work directory, it is time to create the container image. The container image describes the container and can be subsequently used to create a container file, for instance in the .sqsh format accepted by SLURM.

To create the container image, use command

docker build -t <name_of_image> .

where <name_of_image> is arbitrary but usually has a structure like

<name>:vXX

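where <name> is any name and XX is a version number. The final “.” in the docker build command sets the build context to the current directory: docker looks for the Dockerfile there, and only files located inside the build context can be used by the directives of the Dockerfile (e.g., by COPY). An example of a name for an image created with docker may be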

docker_test:v1

During the creation of the image, the directives in the Dockerfile are executed in order; in particular, the commands specified by RUN directives are run (e.g., to install libraries).

docker maintains a local repository of (compressed) docker images that are available on the local computer (i.e. the one used for image creation). Every time a new container image is created on the machine, it is added to the local repository. To get a list of available images in the local repository, use

docker image list

The local repository is in a system directory managed by docker. Additionally, docker allows access to remote repositories.

A .sqsh container file can be created either from a remote image or from a local one. With

enroot import docker://<container_image>

you create a container file called

<container_image>.sqsh

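from a remote image called <container_image>, downloaded from the Docker Hub remote repository. (In the name of the .sqsh file, characters such as : and / of the image name are replaced by +: for instance, the example below produces python+3.6.sqsh.) Example: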

enroot import docker://python:3.6

With

enroot import dockerd://<container_image>

you create a container file called

<container_image>.sqsh

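from a local image called <container_image>, taken from the local repository managed by docker (e.g., an image created with docker build). (In the name of the .sqsh file, the : of the image name is replaced by +, so the example below produces docker_test+v1.sqsh.) Example: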

enroot import dockerd://docker_test:v1
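The whole sequence (build the image, check that it is in the local repository, convert it into a .sqsh file) can be wrapped in a small bash function, as in the following minimal sketch. The function name build_sqsh and its three arguments are hypothetical; the -o option of enroot import is used to choose the name of the output file instead of the default one.

```shell
#!/bin/bash
# Minimal sketch: build a Docker image from a work directory and convert
# it into a .sqsh file with enroot. Function name and arguments are
# hypothetical.
build_sqsh() {
  local work_dir="$1"   # directory containing the Dockerfile and the code
  local image="$2"      # image name, e.g. docker_test:v1
  local output="$3"     # name of the .sqsh file to create

  # Build the image, using the work directory as build context
  docker build -t "$image" "$work_dir" || return 1

  # Show the new image in docker's local repository
  docker image list "$image"

  # Convert the local image (dockerd://) into a .sqsh container file
  enroot import -o "$output" "dockerd://$image"
}
```

For instance, build_sqsh my_work_dir docker_test:v1 docker_test_v1.sqsh would build docker_test:v1 from the contents of my_work_dir and create docker_test_v1.sqsh in the current directory.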