Docker
[Note: this page is actively under construction]
This page provides a simple guide to the creation of Docker containers. Docker containers are important for Mufasa users, since user processes on Mufasa must always be run within Docker containers.
Precondition
Install the docker and enroot software packages. For instance, on Ubuntu Linux this is done with the commands
sudo apt install docker.io (the docker package, necessary to create docker containers)
sudo apt install enroot (to convert docker containers into the .sqsh (“squash”) files that can be run via SLURM)
Preparation
Create a work directory where you will put all the elements to be used to create the container
Within the work directory, you will put:
• subdirectories for all the things you need to create your container (e.g., a subdirectory “code” for your code)
• a text file called Dockerfile, where you specify how to create the container
The Dockerfile contains the directives that Docker uses to create the Docker container. The most commonly used directives are:
FROM <container> tells Docker that your container will be created on the basis of an already available container (created by you or by someone else; for instance, a container from Docker Hub). This is useful because you can start from a base container that already includes the libraries you need (e.g., PyTorch or TensorFlow). <container> is the name of the base container; this usually takes the form name:version (e.g., python:3.6). The FROM directive must be the first in the Dockerfile.
Example: FROM python:3.6
WORKDIR <path/to/dir> sets the working directory inside the container's filesystem: all subsequent directives (such as COPY and RUN) and the command launched when the container starts are executed from this directory. If the directory does not exist in the container, it is created. Note that WORKDIR does not mount anything from the host machine: mounting a host directory into the container, so as to exchange files between the host and the container, is done at execution time (for instance with docker's -v option), not in the Dockerfile.
Example: WORKDIR /opt
COPY <sourcedir> <destdir> copies all contents of the directory in the host machine's filesystem specified by <sourcedir> to the container's own directory specified by <destdir>. <sourcedir> is relative to the build context (the directory passed to the docker build command, normally the one containing the Dockerfile); <destdir> is an absolute path in the container's filesystem, or a path relative to the directory set by the WORKDIR directive. Note that the syntax of Docker's COPY directive is not the same as that of Linux's cp command.
Example: COPY ./code /opt copies all the files contained in ./code (i.e., the “code” subdirectory of the build context, which is normally the directory where the Dockerfile resides) to /opt in the container's own filesystem.
For instance, suppose that the work directory on the host machine is organized like this (docker build is run from /for_Docker, which therefore acts as the build context):

/for_Docker
    Dockerfile
    code
        main.py
        requirements.txt
        run.sh

Then the COPY directive of the example copies files main.py, requirements.txt and run.sh from /for_Docker/code on the host into the /opt directory of the filesystem internal to the Docker container.
RUN <command> where <command> is any command you can issue via a bash shell. The commands specified by RUN directives are executed (within the container being built) when the image is created with docker build, not each time the container is run. The commands are run by the container's root user: being executed by root, RUN directives have full access to the container and can, for instance, install software packages. <command> must not require interaction with a user, since no interaction is possible during image creation.
Example: RUN pip install -r requirements.txt
( pip is the program used to install Python libraries: here it is used to install all the libraries specified in an external file called requirements.txt, containing statements of the form
<name_of_package>==<version>
For instance, such a file may contain the following lines:
opencv-contrib-python==4.3.0.3
opencv-python==4.3.0.36 )
( NOTE. A Docker container image does not get modified when it is put into execution: the original file remains the same, and any change is applied only to the running instance. Usually the container is even kept read-only, i.e. the only writable parts of its filesystem are the host directories mounted into it at execution time. As a consequence, the installation of frequently-changing software libraries may be better managed by running the installation commands within the container when it gets executed, rather than baking the libraries into the original container image: this way it is possible to change the version of the libraries without having to re-create the image every time a new version of the libraries is released. )
ENTRYPOINT [ "<command>", "argument1", "argument2", "argument3", ... ] where <command> is a command and "argumentk" is the k-th command-line argument to be passed to it (this syntax, with the command and each argument given as separate quoted strings, is required because the directive does not pass its content through a shell, so a single space-separated string cannot be used). The “entrypoint”, specified by this directive, is the command that is executed as soon as the container starts. Typically the entrypoint command launches a bash shell and uses it to run a script. The Docker container remains in execution only as long as the entrypoint command is running: if the entrypoint terminates or fails, the container gets terminated as well.
Example: ENTRYPOINT [ "/bin/bash", "/opt/run.sh" ]
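Putting the directives above together, the following shell commands sketch how the work directory of the running example could be set up; the directory name “work” and the file contents are illustrative assumptions, not prescribed by Mufasa:

```shell
# Sketch of the work directory for the running example
# ("work" and the file contents are illustrative assumptions).
mkdir -p work/code

# A stub Python program (placeholder for your actual code).
cat > work/code/main.py <<'EOF'
print("Hello from inside the container")
EOF

# The script that the ENTRYPOINT will run inside the container.
cat > work/code/run.sh <<'EOF'
#!/bin/bash
python /opt/main.py
EOF

# The requirements file used by the RUN directive.
cat > work/code/requirements.txt <<'EOF'
opencv-contrib-python==4.3.0.3
opencv-python==4.3.0.36
EOF

# The Dockerfile combining the directives discussed above.
cat > work/Dockerfile <<'EOF'
FROM python:3.6
WORKDIR /opt
COPY ./code /opt
RUN pip install -r requirements.txt
ENTRYPOINT [ "/bin/bash", "/opt/run.sh" ]
EOF

# Show the resulting Dockerfile.
cat work/Dockerfile
```

The image can then be built from inside the work directory with the docker build command described in the next section.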
Creation of the Docker container image
Once the Dockerfile is completed and all the material required to create the container is in place in the work directory, it is time to create the container image. The container image describes the container and can be subsequently used to create a container file, for instance in the .sqsh format accepted by SLURM.
To create the container image, use command
docker build -t <name_of_image> .
where <name_of_image> is arbitrary but usually has a structure like
<name>:vXX
where <name> is any name and XX is a version number. The “.” at the end of the docker build command tells docker that the components of the container (the build context) are in the current directory. An example of a name for an image created with docker may be
docker_test:v1
During the creation, all the commands specified in the Dockerfile are executed (e.g., the RUN commands that install libraries).
docker maintains a local repository of (compressed) docker images that are available on the local computer (i.e. the one used for image creation). Every time a new container image is created on the machine, it is added to the local repository. To get a list of available images in the local repository, use
docker image list
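Its output is a table similar to the following (image IDs, creation times and sizes here are purely illustrative):

```
REPOSITORY    TAG   IMAGE ID       CREATED          SIZE
docker_test   v1    1f2e3d4c5b6a   2 minutes ago    1.04GB
python        3.6   a1b2c3d4e5f6   3 weeks ago      902MB
```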
The local repository is in a system directory managed by docker. Additionally, docker allows access to remote repositories.
A new .sqsh container file can be created either from a remote or from a local image. With
enroot import docker://<container_image>
you create a container file from a remote image called <container_image>, downloaded from the Docker Hub remote repository. The resulting .sqsh file is named after the image, with special characters such as “:” replaced by “+” (for instance, image python:3.6 yields file python+3.6.sqsh). Example:
enroot import docker://python:3.6
With
enroot import dockerd://<container_image>
you create a container file (again named after the image: for instance, docker_test:v1 yields docker_test+v1.sqsh) from a local image called <container_image>, taken from the local repository; the dockerd:// prefix refers to the images managed by the local docker daemon. Example:
enroot import dockerd://docker_test:v1
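To summarize, the whole workflow from Dockerfile to .sqsh file looks like this; it is a sketch that assumes docker and enroot are installed, that the current directory is the work directory containing the Dockerfile, and that the example image name docker_test:v1 is used:

```shell
# Recap: build a local image and convert it into a .sqsh file for SLURM.
# Assumes docker and enroot are installed; names are the examples used above.
docker build -t docker_test:v1 .         # build the image from the Dockerfile
docker image list                        # the new image appears in the local repository
enroot import dockerd://docker_test:v1   # creates docker_test+v1.sqsh in the current directory
```

The resulting .sqsh file can then be used with SLURM on Mufasa.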