Latest revision as of 16:42, 20 July 2026

Running jobs with SLURM

Users of Mufasa must use SLURM to run resource-heavy processes, i.e. computing jobs that require one or more of the following:

GPUs
multiple CPUs
powerful CPUs
a significant amount of RAM.

In fact, only processes run via SLURM have access to all the resources of Mufasa. Processes run outside SLURM are executed by the login server virtual machine, which has minimal resources and no GPUs. Using SLURM is therefore the only way to execute resource-heavy jobs on Mufasa. This is a key difference between Mufasa 1.0 and Mufasa 2.0.

`srun` and `sbatch`

SLURM provides two commands to run jobs, called srun and sbatch:

srun [options] <command_to_be_run_via_SLURM>

sbatch [options] <command_to_be_run_via_SLURM>

In both cases, <command_to_be_run_via_SLURM> can be any Linux program (including shell scripts). By using srun or sbatch, the command or script specified by <command_to_be_run_via_SLURM> (including any programs launched by it) is added to SLURM's execution queues.

The main difference between srun and sbatch is that srun locks the shell from which it has been launched, so it is only really suitable for interactive jobs, i.e. processes that use the console to interact with their user during job execution. sbatch, on the other hand, does not lock the shell: it simply adds the job to the queue, but does not allow the user to interact with the process while it is running.

sbatch provides an additional possibility: <command_to_be_run_via_SLURM> can be an execution script, i.e. a special (SLURM-specific) type of Linux shell script that includes SBATCH directives. SBATCH directives can be used to specify the values of some of the parameters that would otherwise have to be set using the [options] part of the sbatch command. This is handy because it allows users to write down the parameters in an execution script instead of typing them on the command line when launching a job, which greatly reduces the possibility of mistakes. Also, an execution script is easy to keep and reuse.

Immediately after an srun or sbatch command is launched by a user, SLURM outputs a message informing the user that the job has been queued. For srun, the output is similar to this:

srun: job 10849 queued and waiting for resources

With srun, the shell is now locked while SLURM prepares the execution of the user program (if you are using screen, you can detach from that shell and come back later).

When SLURM is ready to run the program, it prints a message similar to

srun: job 10849 has been allocated resources

and then executes the program.

Options of `srun` and `sbatch`

The [options] part of srun and sbatch commands is used to tell SLURM what resources are needed to execute the job and how much time the job will need to complete its execution.

As far as resources are concerned, the most important option is --qos <qos_name>, which specifies the SLURM QOS that the job will use. A job run with a given QOS has access to all and only the resources available to that QOS. As a consequence, options that define how many resources to assign to the job can only provide the job with resources that are available to the chosen QOS. Jobs that request resources not available to the chosen QOS do not get executed.

If the user forgets option --qos <qos_name>, the job is run on the default QOS (normal), which has access to zero resources. Therefore it is always necessary to specify option --qos <qos_name> when launching a SLURM job on Mufasa.

More generally, the most relevant among the [options] are:

--qos=<qos_name>: specifies the SLURM QOS that the job will use. Specifying a QOS is mandatory.

Important! The chosen QOS limits the resources that can be requested, since it is not allowed to request resources (type or quantity) that exceed what is available to the chosen QOS.

Important! If --qos <qos_name> is used and options that specify how many resources to assign to the job (such as --mem=<mem_resources>, --cpus-per-task=<cpu_amount> or --time=<duration>) are omitted, the job is assigned the default amount of each resource (as defined by the chosen QOS). A notable exception concerns option --gres=<gpu_resources>, which is always required (see below) if the job uses a QOS with access to GPUs.

--job-name=<jobname>: Specifies a name for the job. The specified name will appear along with the JOBID number when querying running jobs on the system with squeue. The default job name (i.e., the one assigned to the job when --job-name is not used) is the executable program's name.

--gres=<gpu_resources>: specifies which GPUs to assign to the job. <gpu_resources> is a comma-delimited list where each element has the form gpu:<Type>:<amount>, where <Type> is one of the types of GPU available on Mufasa (see gres syntax) and <amount> is an integer between 1 and the number of GPUs of that type available to the partition. For instance, <gpu_resources> may be gpu:40gb:1,gpu:3g.20gb:1, corresponding to asking for one "full" GPU and one "small" GPU.

Important! The --gres parameter is mandatory if the job is run with a QOS that allows access to the system's GPUs. Unlike other resources (where unspecified requests lead to the assignment of a default amount), GPUs must always be explicitly requested.

--mem=<mem_resources>: specifies the amount of RAM to assign to the job; for instance, <mem_resources> may be 200G.

--cpus-per-task=<cpu_amount>: specifies how many CPUs to assign to the job; for instance, <cpu_amount> may be 2.

--time=<duration>: specifies the maximum time allowed for the job to complete, in the format days-hours:minutes:seconds, where days is optional; for instance, <duration> may be 72:00:00. When the time expires, the job (if still running) is killed by SLURM.

--pty: specifies that the job will be interactive (this is necessary when <command_to_be_run_via_SLURM> is /bin/bash: see Interactive jobs).
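The meaning of each field in <duration> depends on how many fields are given, which makes --time easy to get wrong. The bash sketch below converts every form that SLURM accepts for --time (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds) into seconds. The function name slurm_time_to_seconds is hypothetical and is not part of SLURM.

```shell
#!/bin/bash
# Sketch: convert a SLURM time limit, as given to --time, into seconds.
# slurm_time_to_seconds is a hypothetical helper written for this page.
slurm_time_to_seconds() {
  local spec="$1" days=0 rest
  local -a f
  # Reject anything that is not one of the accepted forms.
  [[ "$spec" =~ ^([0-9]+-)?[0-9]+(:[0-9]+){0,2}$ ]] || return 1
  if [[ "$spec" == *-* ]]; then
    days=$((10#${spec%%-*}))
    rest=${spec#*-}
    IFS=':' read -ra f <<< "$rest"
    # After "days-", the first field is always hours.
    case ${#f[@]} in
      1) echo $(( days*86400 + 10#${f[0]}*3600 )) ;;
      2) echo $(( days*86400 + 10#${f[0]}*3600 + 10#${f[1]}*60 )) ;;
      3) echo $(( days*86400 + 10#${f[0]}*3600 + 10#${f[1]}*60 + 10#${f[2]} )) ;;
    esac
  else
    IFS=':' read -ra f <<< "$spec"
    # Without days, one field means minutes and two fields mean minutes:seconds.
    case ${#f[@]} in
      1) echo $(( 10#${f[0]}*60 )) ;;
      2) echo $(( 10#${f[0]}*60 + 10#${f[1]} )) ;;
      3) echo $(( 10#${f[0]}*3600 + 10#${f[1]}*60 + 10#${f[2]} )) ;;
    esac
  fi
}
```

For example, slurm_time_to_seconds 72:00:00 and slurm_time_to_seconds 3-00:00:00 both print 259200, so they request the same limit. slurm_time_to_seconds 90 prints 5400, because a single number is read as minutes.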

Note that GPU resources (if needed) must always be requested explicitly. For instance, in order to execute program ./my_program which needs one GPU of type 3g.20gb with QOS gpulight we can use the SLURM command

srun --qos=gpulight --gres=gpu:3g.20gb:1 ./my_program

Interactive jobs

An interactive job is a process that uses the console to interact with its user during job execution. Such a process is run manually by the user from a bash shell (i.e. a terminal session) provided by SLURM.

In order to ask SLURM to schedule the execution of a shell where the user can subsequently run the interactive job, it is necessary to use option --pty.

For instance, to ask SLURM to run a shell with QOS nogpu, the user should use command

srun --qos=nogpu --pty /bin/bash

By not specifying any other options, the user is telling SLURM that they want the shell spawned by SLURM to be provided with the default amount of resources associated with QOS nogpu. More generally, any combination of the other options of srun can be used together with --pty.

Like every other job request to SLURM, the request to run a shell must be made from the login server. As soon as possible (i.e., as soon as the necessary resources are available), SLURM will open a bash shell in the same terminal that the user used to launch the srun command, where the user will be able to run their interactive programs.

To the user, this means that the shell they were using to interact with the login server becomes a shell opened directly on Mufasa. Correspondingly, the command prompt changes from

<username>@mufasa2-login:~$

to

<username>@mufasa2:~$

Another way to know if the current shell is the “base” shell or one run via SLURM is to execute command

echo $SLURM_JOB_ID

If no number gets printed, this means that the shell is the “base” one. If a number is printed, it is the SLURM job ID of the /bin/bash process.

When the user does not need the SLURM-spawned shell anymore, they should close it with command (the same used for any other Linux shell)

exit

to make the resources reserved for the interactive shell free again.

Non-interactive jobs

srun commands can include many options, so it is easy to forget one or to make mistakes while typing them. For non-interactive jobs, there is a solution to this problem.

When the user job is non-interactive, in fact, the srun command can be substituted with a much simpler sbatch command. As already explained, sbatch can make use of an execution script to specify all the parts of the command to be run via SLURM. So the command becomes

sbatch <execution_script>

An execution script is a special type of Linux script that includes SBATCH directives. SBATCH directives are used to specify the values of the parameters that are otherwise set in the [options] part of an srun command.

Note on Linux shell scripts
A shell script is a text file that is run by the bash shell. In order to be acceptable as a bash script, a text file must:

have the “executable” flag set (see here for details);
have `#!/bin/bash` as its very first line.

Usually, a Linux shell script is given a name ending in .sh, such as my_execution_script.sh, but this is not mandatory. Within any shell script, lines beginning with `#` are comments (with the notable exception of the initial `#!/bin/bash` line). Blank lines can be used as spacers.
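The two requirements above can be checked with a short bash function. This is a minimal sketch, and check_script is a hypothetical name. The first line is compared as a plain string, so a script saved with Windows line endings, which end each line with an invisible carriage return, is also reported as faulty.

```shell
#!/bin/bash
# Sketch: verify that a file has the executable flag and starts with #!/bin/bash.
# check_script is a hypothetical helper written for this page.
check_script() {
  local file="$1" first_line
  if [[ ! -x "$file" ]]; then
    echo "$file: executable flag not set (fix with: chmod +x $file)"
    return 1
  fi
  # Read only the first line, keeping it exactly as written.
  IFS= read -r first_line < "$file"
  if [[ "$first_line" != "#!/bin/bash" ]]; then
    echo "$file: first line is not #!/bin/bash"
    return 1
  fi
  echo "$file: OK"
}
```

For example, chmod +x my_execution_script.sh sets the executable flag. After that, check_script my_execution_script.sh prints my_execution_script.sh: OK if the first line is correct.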

An execution script is a Linux shell script composed of two parts:

a preamble, composed of directives (each beginning with #SBATCH) with which the user specifies the values to be given to parameters;
[optionally] one or more srun commands that launch jobs with SLURM using the parameter values specified in the preamble

Below is an execution script template to be copied and pasted into your own execution script text file.

The template includes all the options already described above, plus a few additional useful ones (for instance, those that enable SLURM to send email messages to the user when specific events occur in the lifecycle of their job). Information about all the possible options can be found in [SLURM's own documentation].

In the template below, #SBATCH directives are requests made to SLURM. Although #SBATCH directives start with "#", they are not comments: bash ignores them, but sbatch reads and interprets them. In the same way, the #!/bin/bash line at the beginning of a shell script starts with "#" but is not a comment.

Other lines in the script that begin with # not followed by SBATCH are comments.

For directives that request a given amount of a resource (including time), if they are missing from the execution script (or commented out), the job is assigned the default amount of that resource.

#!/bin/bash

#----------------start of preamble----------------
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=jobs
#SBATCH --qos=<qos_name>
#SBATCH --gres=<gpu_resources>
# (remove the --gres line if the chosen QOS has no access to GPUs)
#SBATCH --mem=<mem_resources>
#SBATCH --cpus-per-task=<cpu_amount>
#SBATCH --time=<d-hh:mm:ss>
#SBATCH --output=./<filename>-%j.out

# the text file where the output of the job is written (i.e., standard output is redirected to the file); "%j" is replaced by the job ID.

#SBATCH --error=./<filename>-error-%j.out

# the text file where any error messages generated by the job are written (i.e., standard error is redirected to the file); "%j" is replaced by the job ID.

#SBATCH --mail-user=<email_address>
#SBATCH --mail-type=BEGIN,END,FAIL

# the email address to notify, and the job events (start, end, failure) that trigger an email.

#SBATCH --job-name=<name>
#----------------end of preamble----------------

<command_to_run>
<command_to_run>
...
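Bash treats #SBATCH lines as comments. It is sbatch that reads them, and it stops looking for directives at the first line of the script that is neither blank nor a comment, so directives placed after the first command are ignored. The hypothetical helper below mimics this rule and lists the directives that would be picked up from a script. It is a sketch and does not reproduce every detail of sbatch's parser, but it can be handy for checking an execution script before submitting it.

```shell
#!/bin/bash
# Sketch: list the #SBATCH directives that sbatch would read from a script,
# i.e. those appearing before the first line that is neither blank nor a comment.
# list_sbatch_directives is a hypothetical helper written for this page.
list_sbatch_directives() {
  local line
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ "$line" =~ ^#SBATCH[[:space:]]+(.*)$ ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"   # a directive: print the option it sets
    elif [[ -z "${line//[[:space:]]/}" || "$line" == \#* ]]; then
      continue                             # blank line, comment or #!/bin/bash
    else
      break                                # first command: later #SBATCH lines are not read
    fi
  done < "$1"
}
```

For example, list_sbatch_directives my_execution_script.sh prints options such as --qos=<qos_name>, one per line.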

Executing jobs on Mufasa

Key concept: use containers!

All computation on Mufasa must occur within containers.

A container is a “sandbox” containing the environment where the user's application operates. Parts of Mufasa's filesystem can be made visible (and writable, if the user has writing permission on them: e.g., the user's /home directory) to the environment of the container. This allows the containerized user application to read from, and write to, Mufasa's filesystem: for instance, to read data and write results.

The system used by Mufasa to create and execute containers is Singularity. This wiki includes directions on preparing containers with Singularity.

The container where a user job runs must contain all the libraries needed by the job. In fact (for maintainability and safety reasons) no software and no libraries are installed on Mufasa 2.0.

Interactive and non-interactive user jobs

This section explains how to execute a user job contained in a container. It considers two types of user jobs, i.e.:

Interactive user jobs

as already explained, these are jobs that require interaction with the user while they are running, via a bash shell running within the container. The shell is used to receive commands from the user and/or print output messages. For interactive user jobs, the job is usually launched manually by the user (with a command issued via the shell) after the container is in execution.

Non-interactive user jobs

are the most common variety. The user prepares the container in such a way that, when in execution, the container autonomously puts the user's jobs into execution. The user does not have any communication with the container while it is in execution. Executing the container and running the required programs within the container's environment is done via execution scripts.

Using SLURM to run an interactive job on Mufasa

The first step to run an interactive user job on Mufasa is to run the container where the job will take place. Each user is in charge of preparing the container(s) where the user's jobs will be executed.

In order to run a container via SLURM by hand, i.e. via an interactive shell, a user must first open the shell with command

srun [general_SLURM_options] --pty /bin/bash

where [general_SLURM_options] are those already described above.

Then the user must run the container: this is done as follows.

First, it is necessary to load the Singularity software module with

module load amd/singularity

(if needed, the list of software modules available in the system can be obtained with command module av).

Then, the user must use Singularity to run the container with command (see the section about Singularity for further details)

singularity run <repository>://<name_of_container>

which pulls the container from the specified repository and executes it. Possible values for <repository> are:

docker (Docker Hub)

library (Sylabs Container Library)

path/to/container (used in place of <repository>://<name_of_container>) if the container is local, i.e. located in the filesystem of Mufasa

As soon as the container is in execution, the terminal window used, so far, to interact with Mufasa becomes a shell in the container. This shell belongs to the software environment of the container, and the user can use it to interact with the container's own software environment and filesystem.

It is easy to tell whether a shell is open on Mufasa or in the container, because in a container shell the system prompt becomes

singularity>

Interaction between container filesystem and local filesystem

The filesystem inside the container and the local one, i.e. Mufasa's, can interact. This means that the container can access the local filesystem to read and/or write files. However, the only parts of Mufasa's filesystem that can be accessed by the container are those that the user running the container has access rights to.

By default, the user's /home/username directory on Mufasa is automatically mapped onto /home/username in the filesystem of the container. Any change made to that directory from within the container is actually applied to the local /home/username directory on Mufasa.

The mapping of the home directory does not need to be explicitly requested. However, if the user needs (in addition to the home directory) other parts of the local filesystem of Mufasa to be mapped onto the container's filesystem, this is possible by using this modified version of the singularity run command:

singularity run --bind </path/to/local/directory>:<path/to/container/directory> <repository>://<name_of_container>

If <path/to/container/directory> does not exist in the container's filesystem, it gets created by Singularity.
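Singularity's --bind option also accepts a comma-separated list of </path/to/local/directory>:<path/to/container/directory> pairs. A mistyped local path is only discovered when the container starts, so it can be useful to check the pairs beforehand. The sketch below, with the hypothetical name bind_list, joins the pairs into one --bind value and stops with an error if a local directory does not exist.

```shell
#!/bin/bash
# Sketch: build a comma-separated value for "singularity run --bind" from
# local:container pairs, failing early if a local directory is missing.
# bind_list is a hypothetical helper written for this page.
bind_list() {
  local pair local_dir out=""
  for pair in "$@"; do
    local_dir=${pair%%:*}              # the part before the first ":"
    if [[ ! -d "$local_dir" ]]; then
      echo "bind_list: local directory $local_dir not found" >&2
      return 1
    fi
    out+="${out:+,}$pair"              # add a comma only between pairs
  done
  printf '%s\n' "$out"
}
```

A possible use is binds=$(bind_list ~/datasets:/data ~/results:/results) || exit 1, followed by singularity run --bind "$binds" <repository>://<name_of_container>. Checking the exit status first prevents Singularity from being started with an empty --bind value.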

How to know if your shell is a SLURM job

The first way to know is to look at the shell prompt: for the login server (i.e., if your shell is not being run by SLURM) it should look like

your_username@mufasa2-login:~$

while for Mufasa (i.e., if the shell is being run by SLURM) it should look like

your_username@mufasa2:~$

Sometimes you may need another way to check: for instance, you may have changed your shell prompt so that it does not print the machine name. In these cases, to know if the shell you are using is run by SLURM or not you can run command

echo $SLURM_JOB_ID

If the command provides an output, your shell is a SLURM job and that output is the ID of the job. If the command doesn't provide any output, your shell is not a SLURM job.
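The same check can be wrapped in a small bash function, for instance to make a script stop when it is accidentally run outside SLURM. in_slurm_job is a hypothetical name, and the function only tests whether the SLURM_JOB_ID variable is set and non-empty.

```shell
#!/bin/bash
# Sketch: report whether the current shell runs as a SLURM job,
# based only on the SLURM_JOB_ID environment variable.
# in_slurm_job is a hypothetical helper written for this page.
in_slurm_job() {
  if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    echo "SLURM job $SLURM_JOB_ID"
  else
    echo "not a SLURM job"
    return 1
  fi
}
```

For example, placing in_slurm_job || exit 1 near the top of a script makes it exit immediately when it is started from the login server instead of via SLURM.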

Using SLURM to run a non-interactive job on Mufasa

When the user job to be executed within a container is non-interactive, the mechanism based on an execution script, already described in Non-interactive jobs, is employed. The command to run the script, which in turn runs the container where the user job takes place, is therefore

sbatch <execution_script>

The general features of a SLURM execution script and the SBATCH directives used for generic jobs have already been described. Here we focus, therefore, on the SBATCH directives specifically used when SLURM is used to run a non-interactive job within a container.

Below is an execution script template to be copied and pasted into your own execution script text file.

#!/bin/bash
#----------------start of preamble----------------
#SBATCH directives already described above
#----------------end of preamble----------------
module load amd/singularity
singularity run <repository>://<name_of_container> <command_to_run>

In the last line of the script, <command_to_run> is the command (e.g., the name of an executable script), complete with its path within the container's filesystem, of the program to be run within the container. Please refer to the section about Singularity for details about its commands.

The interactions between container filesystem and local filesystem in non-interactive jobs are exactly the same already described for interactive jobs. In particular, the user's home directory is mapped by default onto the filesystem of the container.

If, in addition, the user needs other parts of the filesystem of Mufasa to be mapped onto the container's filesystem, this is possible by using this modified version of the singularity run command at the end of the script:

singularity run --bind </path/to/local/directory>:<path/to/container/directory> <repository>://<name_of_container> <command_to_run>

If <path/to/container/directory> does not exist in the container's filesystem, it gets created by Singularity.

Job output

The whole point of running a user job is to collect its output. Usually, such output takes the form of one or more files generated within the filesystem of Mufasa by the container where the computation takes place.

As explained above, Singularity includes a mechanism to mount a part of Mufasa's own filesystem onto the container's filesystem: when the job running within the container writes to this mounted part, it actually writes to Mufasa's filesystem. This means that when the container ends its execution, its output files persist in Mufasa's filesystem (usually in a subdirectory of the user's own /home directory) and can be retrieved by the user at a later time.

The same mechanism can be used to allow user jobs running within a container to read their input data from Mufasa's filesystem (usually a subdirectory of the user's own /home directory).

Cancelling completed jobs

When a user process run via SLURM has completed its execution and is not needed anymore, it is important to close it with scancel <JOBID>, especially if much of the execution time requested for the job still remains.

Cancelling a SLURM job makes the resources reserved by SLURM free again for other users, and thus speeds up the execution of the jobs still queued.

Typically, one doesn't know in advance how long a piece of code will take to complete its work. So please check from time to time whether it has finished and, if there is still time left before your SLURM job's duration expires, just scancel the job. Other users will be grateful :-)

Please note that job priority for your user depends (also) on the overall duration of the jobs that you ran on Mufasa. Therefore, cancelling jobs that are not needed anymore improves your future jobs' priority.

Looking for unused GPUs

GPUs are usually the most limited resource on Mufasa. So, if your job requires a GPU, the best way to get it executed quickly is to use a QOS associated with a type of GPU of which at least one unit is not currently in use. This command

sinfo -O Gres:100

provides a summary of all the Gres (i.e., GPU) resources possessed by Mufasa. It provides an output similar to the following:

GRES                                                                                                
gpu:40gb:3,gpu:4g.20gb:5,gpu:3g.20gb:5

To know which of the GPUs are currently in use, use command

sinfo -O GresUsed:100

which provides an output similar to this:

GRES_USED
gpu:40gb:2(IDX:0-1),gpu:4g.20gb:2(IDX:5,8),gpu:3g.20gb:3(IDX:3-4,6)

By comparing the two lists (GRES and GRES_USED) you can easily spot unused GPUs.
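The comparison can also be automated. The bash sketch below assumes the output format shown above: a comma-separated list of gpu:<Type>:<amount> elements, where GRES_USED adds (IDX:...) annotations that may themselves contain commas. It prints the number of unused GPUs for each type. free_gpus is a hypothetical helper, not a SLURM command.

```shell
#!/bin/bash
# Sketch: count unused GPUs per type from the GRES and GRES_USED strings.
# free_gpus is a hypothetical helper written for this page.
free_gpus() {
  # Drop whitespace and "(...)" annotations such as "(IDX:3-4,6)",
  # which would otherwise break the comma-separated list.
  local gres used
  gres=$(sed 's/([^)]*)//g; s/[[:space:]]//g' <<< "$1")
  used=$(sed 's/([^)]*)//g; s/[[:space:]]//g' <<< "$2")

  local -a gres_items used_items
  IFS=',' read -ra gres_items <<< "$gres"
  IFS=',' read -ra used_items <<< "$used"

  local item u_item type total u_type u_count busy
  for item in "${gres_items[@]}"; do
    IFS=':' read -r _ type total <<< "$item"        # gpu:<Type>:<amount>
    busy=0
    for u_item in "${used_items[@]}"; do
      IFS=':' read -r _ u_type u_count <<< "$u_item"
      [[ "$u_type" == "$type" ]] && busy=$u_count
    done
    echo "$type $((total - busy))"
  done
}
```

On the login server it can be called as free_gpus "$(sinfo -h -O Gres:100)" "$(sinfo -h -O GresUsed:100)", where -h suppresses the header line. With the two example outputs above, it prints 40gb 1, 4g.20gb 3 and 3g.20gb 2 on separate lines.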

Detaching from a running job with `screen`

A consequence of the way srun operates is that if you launch an interactive user job, the shell where the command is running must remain open: if it closes, the job terminates. That shell runs in the terminal of your own PC where the SSH connection to Mufasa exists.

If you do not plan to keep the SSH connection to Mufasa open (for instance because you have to turn off or suspend your PC), there is a way to keep your interactive job alive. Namely, you should use command srun inside a screen session (often simply called "a screen"), then detach from the screen (here is one of many tutorials about screen available online).

Once you have detached from the screen session, you can close the SSH connection to Mufasa without damage. When you need to reach your (still running) job again, you can open a new SSH connection to Mufasa and then reattach to the screen.

A use case for screen is writing your program so that it prints progress messages as it goes on with its work. Then you can check its progress by periodically reattaching to the screen where the program is running and reading the messages it printed.

Basic usage of screen is explained below.

Creating a screen session, running a job in it, detaching from it

Connect to the login server with SSH
From the login server shell, run
```
screen
```
In the screen session ("screen") thus created (it has the look of an empty shell), launch your job with srun
Detach from the screen by pressing ctrl + A followed by D: you will come back to the original login server shell, while your process will go on running in the screen
You can now close the SSH connection to the login server without damaging your running job

Reattaching to an active screen session

Connect to the login server with SSH
In the login server shell, run
```
screen -r
```
You are now back to the screen where you launched your job

Closing (i.e. destroying) a screen session

When you do not need a screen session anymore:

reattach to the active screen session as explained above
destroy the screen by pressing ctrl + A followed by \ (i.e., backslash), then confirming that you really want to proceed

Of course, any program (including SLURM jobs) running within the screen gets terminated when the screen is destroyed.

Using `salloc` to reserve resources

What is `salloc`?

salloc is a SLURM command that allows a user to reserve a set of resources (e.g., a 40 GB GPU) for a given time in the future.

The typical use of salloc is to "book" an interactive session where the user enjoys complete control of a set of resources. The resources that are part of this set are chosen by the user. Within the "booked" session, any job run by the user that relies on the reserved resources is immediately put into execution by SLURM.

More precisely:

the user, using salloc, specifies what resources they need and the time when they will need them;
when the delivery time comes, SLURM creates an interactive shell session for the user;
within such session, the user can use srun and sbatch to run programs, enjoying full (i.e. not shared with anyone else) and instantaneous access to the resources.

Resource reservation using salloc is only possible if the request is made in advance with respect to the delivery time. The higher the demand for the resources that the user wants to reserve, the earlier the request should be made to ensure that SLURM is able to fulfill it.

When a user makes a request for resources with salloc, the request (called an allocation) is added to SLURM's job queue for the relevant partition as a job in pending (PD) state (job states are described here). Indeed, resource allocation is the first part of SLURM's process of executing a user job, while the second part is running the program and letting it use the allocated resources. Using salloc corresponds to having SLURM perform the first part of the process (resource allocation) while leaving the second part (running programs) to the user.

Until the delivery time specified by the user comes, the allocation remains in state PD, and other jobs requesting the same resources, even if submitted later, can be executed. While the request waits for the delivery time, however, it accumulates priority that increases over time. The longer the allocation stays in the PD state, the higher its priority becomes: so, by requesting resources with salloc well in advance of the delivery time, users can ensure that the resources they need will be ready for them at the requested delivery time, even if these resources are highly contended.

`salloc` commands

salloc commands use a syntax similar to that of srun commands. In particular, salloc lets a user specify what resources they need and, importantly, a delivery time for the requested resources (a delivery time can also be specified with srun, but in that case it is not very useful).

The typical salloc command has this form:

salloc [general_SLURM_options] --begin=<time>

where

[general_SLURM_options]: represents the options already described in Options of srun and sbatch

--begin=<time>: specifies the delivery time of the resources reserved with salloc, according to the syntax described below. The delivery time must be a future time.

Syntax of parameter `--begin`

If the allocation is for the current day, you can specify <time> as hours and minutes in the form

HH:MM

If you want to specify a time on a different day, the form is YYYY-MM-DDTHH:MM (optionally followed by :SS for seconds), where the uppercase 'T' separates date from time.

It is also possible to specify <time> relative to the current time, in one of the following forms:

now+Kminutes

now+Khours

now+Kdays

where K is a (positive) integer.

Examples:

--begin=16:00

--begin=now+1hours

--begin=now+1days

--begin=2030-01-20T12:34:00
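To catch typos in a --begin value before running salloc, the value can be checked against the forms listed above. The bash sketch below also accepts optional seconds in the date form, as in the last example. SLURM itself accepts further forms, so a value rejected here is not necessarily invalid for SLURM. valid_begin is a hypothetical name.

```shell
#!/bin/bash
# Sketch: check that a --begin value uses one of the forms described on this page.
# valid_begin is a hypothetical helper written for this page.
valid_begin() {
  local t="$1"
  # HH:MM (current day)
  [[ "$t" =~ ^([01][0-9]|2[0-3]):[0-5][0-9]$ ]] && return 0
  # YYYY-MM-DDTHH:MM, optionally followed by :SS
  [[ "$t" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}T([01][0-9]|2[0-3]):[0-5][0-9](:[0-5][0-9])?$ ]] && return 0
  # now+K followed by minutes, hours or days, with K a positive integer
  [[ "$t" =~ ^now\+[1-9][0-9]*(minutes|hours|days)$ ]] && return 0
  return 1
}
```

For example, valid_begin now+1hours succeeds, while valid_begin 25:00 fails.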

Note that Mufasa's time zone is GMT, so <time> must be expressed in GMT as well. If you want to know Mufasa's current time, use command

date

It provides an output similar to the following:

Thu Nov 10 16:43:30 UTC 2022
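Because <time> must be expressed in GMT, a local time has to be converted first. The following sketch assumes GNU date, which is standard on Linux but not on macOS. It converts a date and time with an explicit UTC offset into the YYYY-MM-DDTHH:MM:SS form accepted by --begin. to_begin_utc is a hypothetical name.

```shell
#!/bin/bash
# Sketch: convert a date/time with an explicit UTC offset into a UTC (GMT)
# timestamp in the YYYY-MM-DDTHH:MM:SS form used by --begin.
# Requires GNU date; to_begin_utc is a hypothetical helper written for this page.
to_begin_utc() {
  date -u -d "$1" +%Y-%m-%dT%H:%M:%S
}
```

For example, to_begin_utc '2030-01-20 13:34:00 +0100' prints 2030-01-20T12:34:00, which can then be passed as --begin=2030-01-20T12:34:00.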

How to use `salloc`

In the typical scenario, the user of salloc will make use of screen. Command screen creates a shell session (called "a screen") that can be left without closing it (detaching from the screen) and reached again at a later time (reattaching to the screen). This means that a user can create a screen, run salloc within it to create an allocation for time X, detach from the screen, and reattach to it just before time X to use the reserved resources from the interactive session created by salloc.

More precisely, the operations needed to do this are the following:

Connect to the login server.
From the login server shell, run
```
screen
```
In the screen session ("screen") thus created run the salloc command, specifying via its options the resources you need and the time at which you want them delivered.
SLURM will respond with a message similar to
```
salloc: Pending job allocation XXXX
```
Detach from the screen by pressing ctrl + A followed by D: you will come back to the original login server shell.
You can now close the SSH connection to the login server without damaging your resource allocation request.
At the delivery time you specified in the salloc command, connect to the login server with SSH.
Once you are in the login server shell, reattach to the screen with command
```
screen -r
```
You are now back to the screen where you used salloc. As soon as SLURM provides you with the resources you reserved, a shell prompt appears after the message "salloc: Pending job allocation XXXX".
You are now in the interactive shell session you booked with salloc. From here, you can run any programs you want, including srun and sbatch. For the whole duration of the allocation, your programs have unrestricted use of all the resources you reserved with salloc.
Important! Any job run within the shell session is subject to the time limit (i.e., maximum duration) imposed by the partition it is running on. If the job reaches the time limit, it is forcibly terminated by SLURM. Termination depends only on the time limit, so it occurs even if the end time of the allocation has not been reached yet. (Of course, the job is also terminated if the allocation ends.)
Once the interactive shell session is not needed anymore, cancel it by exiting from the session with
```
exit
```
(Note that if you get to the end of the time period you specified in your request without closing the shell session, SLURM does it for you, killing any programs still running.)
You are now back to your screen. Destroy it by pressing ctrl + A followed by \ (i.e., backslash) to get back to the login server shell.

Cancelling a resource request made with `salloc`

To cancel a request for resources made as explained in How to use salloc, follow these steps:

Connect to the login server with SSH.
Once you are in the login server shell, reattach to the screen where you used command salloc with command
```
screen -r
```
You should see the message "salloc: Pending job allocation XXXX" (if the allocation is still pending) or "salloc: job XXXX queued and waiting for resources" (if the allocation is done and waiting for its start time). Now just press Ctrl + C. This tells SLURM that you want to cancel your request for resources.
SLURM will communicate the cancellation with message
```
salloc: Job allocation XXXX has been revoked.
```
Destroy the screen by pressing ctrl + A followed by \ (i.e., backslash) to get back to the login server shell.

Monitoring and managing jobs

SLURM provides Job Users with tools to inspect and manage jobs. While a Job User is able to see all users' jobs, they are only allowed to interact with their own.

The main commands used to interact with jobs are squeue to inspect the scheduling queues and scancel to terminate queued or running jobs.

Inspecting jobs with `squeue`

Running command

squeue

provides an output similar to the following:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  520       fat     bash acasella  R 2-04:10:25      1 gn01
  523       fat     bash amarzull  R    1:30:35      1 gn01
  522       gpu     bash    clena  R   20:51:16      1 gn01

This output comprises the following information:

JOBID: numerical identifier of the job assigned by SLURM; this identifier is used to act on the job, for instance with scancel

PARTITION: the partition that the job is run on

NAME: the name assigned to the job; can be personalised using the --job-name option

USER: username of the user who launched the job

ST: job state (see Job state for further information)

TIME: time that has passed since the beginning of job execution

NODES: number of nodes where the job is being executed (for Mufasa, this is always 1 as it is a single machine)

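NODELIST (REASON): name of the node(s) where the job is being executed (for Mufasa it is always gn01, which is the name of the node corresponding to Mufasa); for pending jobs, this column instead shows, in parentheses, the reason why the job is still waiting, e.g. (Resources)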

To limit the output of squeue to the jobs owned by user <username>, use

squeue -u <username>
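If you check your own jobs often, a small shell function can save some typing. The following is a minimal sketch, assuming bash on the login server (it could go, for instance, in your ~/.bashrc); the name `myjobs` is hypothetical, and any extra arguments are simply passed on to squeue.

```bash
# Hypothetical helper: list only the current user's jobs.
# Falls back to `id -un` if $USER is not set; extra arguments go to squeue unchanged.
myjobs() {
    squeue -u "${USER:-$(id -un)}" "$@"
}
```

For instance, `myjobs -t PD` would list only your pending jobs, since `-t` (`--states`) is the squeue option that filters by job state.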

Interpreting Job state as provided by `squeue`

Jobs typically pass through several states in the course of their execution. Job state is shown in column "ST" of the output of squeue as an abbreviated code (e.g., "R" for RUNNING).

The most relevant codes and states are the following:

PD PENDING

Job is awaiting resource allocation.

R RUNNING

Job currently has an allocation.

S SUSPENDED

Job has an allocation, but execution has been suspended and CPUs have been released for other jobs.

CG COMPLETING

Job is in the process of completing. Some processes on some nodes may still be active.

CD COMPLETED

Job has terminated all processes on all nodes with an exit code of zero.

Beyond these, there are other (less frequent) job states. The SLURM doc page for squeue provides a complete list of them.
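Job state codes can also be used to summarise the output of squeue. As a sketch (assuming bash; the function name is hypothetical), the following counts your own jobs in each state, using squeue's -h option (no header line) and the format code %t (compact job state, as in column ST):

```bash
# Hypothetical helper: count the current user's jobs in each state.
# -h omits the header line; -o "%t" prints only the compact state code (PD, R, CG, ...).
count_my_jobs_by_state() {
    squeue -h -u "${USER:-$(id -un)}" -o "%t" | sort | uniq -c
}
```

Each output line holds a count followed by a state code, e.g. a line ending in "2 PD" means two of your jobs are pending.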

Knowing when jobs are expected to end or start

If you are interested in understanding when jobs are expected to start or end, use command

squeue -o "%5i %8u %10P %.2t |%19S |%.11L|"

which provides an output similar to the following:

JOBID USER     PARTITION  ST |START_TIME          |  TIME_LEFT|
5307  thuynh   fat        PD |2022-11-11T17:55:54 | 3-00:00:00|
5308  thuynh   fat        PD |2022-11-11T17:55:54 | 3-00:00:00|
5296  cziyang  fat         R |2022-11-08T16:58:03 | 1-00:48:14|
5306  thuynh   fat         R |2022-11-10T08:13:30 | 2-16:03:41|
5297  gnannini fat         R |2022-11-08T17:55:54 | 1-01:46:05|
5336  ssaitta  gpu         R |2022-11-10T08:13:00 |    6:03:11|
5358  dmilesi  gpulong     R |2022-11-10T15:11:32 | 2-23:01:43|
5338  cziyang  gpulong     R |2022-11-10T09:45:01 | 1-17:35:12|

For running jobs (state R)

column "START_TIME" tells you when the job started its execution

column "TIME_LEFT" tells you how much remains of the running time requested by the job

For pending jobs (state PD)

column "START_TIME" tells you when the job is expected to start its execution

column "TIME_LEFT" tells you how much running time has been requested by the job

Important! Start and end times are forecasts based on the features of current jobs in the queues, and may change if running jobs end prematurely and/or if new jobs with higher priority are added to the queues. So these times should never be considered as certain.
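The format string above can be combined with the other squeue options, for example to restrict it to your own jobs. A minimal sketch, assuming bash; the function name `my_job_times` is hypothetical, while the format string is exactly the one given above:

```bash
# Hypothetical helper: start time and time left, for your own jobs only.
# %i = job ID, %u = user, %P = partition, %t = compact state,
# %S = actual or expected start time, %L = time left.
my_job_times() {
    squeue -u "${USER:-$(id -un)}" -o "%5i %8u %10P %.2t |%19S |%.11L|" "$@"
}
```

As noted above, the times it prints for pending jobs are forecasts and may change.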

If you simply want to know when pending jobs (state PD) are expected to begin execution, use

squeue --start

which lists pending jobs in order of increasing START_TIME (the job on top is the one which will be run first). For each pending job the command provides an output similar to the example below:

JOBID PARTITION     NAME     USER ST          START_TIME  NODES SCHEDNODES           NODELIST(REASON)
 5090       fat training   thuynh PD 2022-10-27T09:28:01      1 (null)               (Resources)
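To get the expected start time of one specific job, squeue can be restricted to that job with -j and asked to print only the start time field (%S). The sketch below assumes bash; the function name is hypothetical, and <JOBID> is the number from the first column of squeue:

```bash
# Hypothetical helper: print only the (expected) start time of one job.
# Usage: job_start_time <JOBID>
job_start_time() {
    if [ -z "$1" ]; then
        echo "usage: job_start_time <JOBID>" >&2
        return 1
    fi
    # -h omits the header; -j selects the job; -o "%S" prints its start time
    squeue -h -j "$1" -o "%S"
}
```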

Getting detailed information about a job

If needed, complete information about a job (either pending or running) can be obtained using command

scontrol show job <JOBID>

where <JOBID> is the number from the first column of the output of squeue. The output of this command is similar to the following:

JobId=65 JobName=test_script.sh
   UserId=gfontana(10003) GroupId=gfontana(10004) MCS_label=N/A
   Priority=14208 Nice=0 Account=admin QOS=nogpu
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:55 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2025-11-06T10:31:10 EligibleTime=2025-11-06T10:31:10
   AccrueTime=2025-11-06T10:31:10
   StartTime=2025-11-06T10:31:10 EndTime=2025-11-06T11:31:10 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-11-06T10:31:10 Scheduler=Main
   Partition=jobs AllocNode:Sid=mufasa2-login:42020
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=gn01
   BatchHost=gn01
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   ReqTRES=cpu=1,mem=4G,node=1,billing=1
   AllocTRES=cpu=1,mem=4G,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=4G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) LicensesAlloc=(null) Network=(null)
   Command=./test_script.sh
   WorkDir=/home/gfontana

In particular, the line beginning with "StartTime=" provides expected times for the start and end of job execution. As explained in Knowing when jobs are expected to end or start, start time is only a prediction and subject to change.
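Since scontrol prints space-separated Key=Value pairs, standard text tools can extract just the fields you care about. A minimal sketch, assuming bash and assuming that the values of interest do not themselves contain spaces; the function name `job_times` is hypothetical:

```bash
# Hypothetical helper: print only the StartTime= and EndTime= fields of a job.
# tr turns every run of spaces into a single newline, so each Key=Value pair
# ends up on its own line; grep then keeps the two fields of interest.
job_times() {
    scontrol show job "$1" | tr -s ' ' '\n' | grep -E '^(StartTime|EndTime)='
}
```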

Cancelling a job with `scancel`

It is possible to cancel a job using command scancel, either while it is waiting for execution or when it is in execution (in this case you can choose what system signal to send the process in order to terminate it).

Please note that job priority for your user depends (also) on the overall duration of the jobs that you ran on Mufasa. Therefore, cancelling jobs that are not needed anymore improves your future jobs' priority.

The following are some examples of use of scancel adapted from SLURM's documentation.

scancel <JOBID>

cancels job <JOBID>: if the job is still queued, it is removed from the execution queue; if it is running, it is terminated.

scancel --signal=TERM <JOBID>

terminates execution of job <JOBID> with signal SIGTERM (request to stop).

scancel --signal=KILL <JOBID>

terminates execution of job <JOBID> with signal SIGKILL (force stop).

scancel --state=PENDING --user=<username> --partition=<partition_name>

cancels all pending jobs belonging to user <username> in partition <partition_name>.
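scancel filters can also be combined with a job name (the one set with --job-name and shown in the NAME column of squeue). The following sketch assumes bash; the function name is hypothetical, and it relies on scancel's --name option together with the --state and --user options shown above:

```bash
# Hypothetical helper: cancel all of your own pending jobs with a given job name.
# Usage: cancel_pending_by_name <jobname>
cancel_pending_by_name() {
    if [ -z "$1" ]; then
        echo "usage: cancel_pending_by_name <jobname>" >&2
        return 1
    fi
    scancel --state=PENDING --user="${USER:-$(id -un)}" --name="$1"
}
```

Restricting the operation to PENDING jobs leaves any running job with the same name untouched.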

Knowing what jobs you ran today

Command

sacct -X --format=User%-10,Start,End,SubmitLine%-100

provides a list of all srun or sbatch commands executed by your user since midnight (00:00) of the current day. For each job, the command provides the start and end time and the command you used to run it (if needed, you can increase the value after SubmitLine%- to show longer commands in full).
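Without further options, sacct starts its list at midnight of the current day. To look further back, the start of the period can be set with sacct's -S (--starttime) option. A minimal sketch, assuming bash and a date given in YYYY-MM-DD format; the function name is hypothetical:

```bash
# Hypothetical helper: same columns as the command above, but starting
# from a given date (YYYY-MM-DD) instead of midnight of the current day.
my_jobs_since() {
    if [ -z "$1" ]; then
        echo "usage: my_jobs_since <YYYY-MM-DD>" >&2
        return 1
    fi
    sacct -X -S "$1" --format=User%-10,Start,End,SubmitLine%-100
}
```

For instance, `my_jobs_since 2026-07-01` would cover everything from the first of that month onwards.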

Difference between revisions of "User Jobs"

Contents

Running jobs with SLURM

`srun` and `sbatch`

Options of `srun` and `sbatch`

Interactive jobs

Non-interactive jobs

Executing jobs on Mufasa

Key concept: use containers!

Interactive and non-interactive user jobs

Using SLURM to run an interactive job on Mufasa

Interaction between container filesystem and local filesystem

How to know if your shell is a SLURM job

Using SLURM to run a non-interactive job on Mufasa

Job output

Cancelling completed jobs

Looking for unused GPUs

Detaching from a running job with `screen`

Creating a screen session, running a job in it, detaching from it

Reattaching to an active screen session

Closing (i.e. destroying) a screen session

Using `salloc` to reserve resources

What is `salloc`?

`salloc` commands

Syntax of parameter `--begin`

How to use `salloc`

Cancelling a resource request made with `salloc`

Monitoring and managing jobs

Inspecting jobs with `squeue`

Interpreting Job state as provided by `squeue`

Knowing when jobs are expected to end or start

Getting detailed information about a job

Cancelling a job with `scancel`

Knowing what jobs you ran today

@@ Line 1: / Line 1: @@
-This page presents the features of Mufasa that are most relevant to Mufasa's [[Roles|Job Users]]. Job Users can submit jobs for execution, cancel their own jobs, and see other users' jobs (but not intervene on them).
+= Running jobs with SLURM =
+Users of Mufasa '''must''' use SLURM to run resource-heavy processes, i.e. computing jobs that require one or more of the following:
+* GPUs
+* multiple CPUs
+* powerful CPUs
+* a significant amount of RAM.
-= System resources subjected to limitations =
+In fact, only processes run via SLURM have access to all the resources of Mufasa. Processes run outside SLURM are executed by the [[System#Login server|login server]] virtual machine, which has minimal resources and no GPUs. Using SLURM is therefore the only way to execute resource-heavy jobs on Mufasa. This is a key difference between Mufasa 1.0 and [[System#Mufasa 2.0|Mufasa 2.0]].
-The hardware resources of Mufasa are limited. For this reason, some of them are subjected to limitations, i.e. (these are SLURM's own terms):
+== <code>srun</code> and <code>sbatch</code> ==
-; cpu
+SLURM provides two commands to run jobs, called [https://slurm.schedmd.com/srun.html srun] and [https://slurm.schedmd.com/sbatch.html sbatch]:
-: the number of processor cores that a job can use
-; mem
+<pre style="color: lightgrey; background: black;">
-: the amount of RAM that a job can use
+srun [options] <command_to_be_run_via_SLURM>
+</pre>
-;gres
+<pre style="color: lightgrey; background: black;">
-: the amount of ''generic resources'' that a job can use: in Mufasa, the only resources belonging to this set are the GPUs (the [[System#CPUs_and_GPUs|virtual GPUs defined by Nvidia MIG]], not the physical GPUs)
+sbatch [options] <command_to_be_run_via_SLURM>
+</pre>
-These are some of the TRES (Trackable RESources) defined by SLURM. From [https://slurm.schedmd.com/tres.html SLURM's documentation]: "''A TRES is a resource that can be tracked for usage or used to enforce limits against.''"
+In both cases, <code><command_to_be_run_via_SLURM></code> can be any Linux program (including shell scripts). By using <code>srun</code> or <code>sbatch</code>, the command or script specified by <code><command_to_be_run_via_SLURM></code> (including any programs launched by it) is added to SLURM's execution queues.
+The main difference between <code>srun</code> and <code>sbatch</code> is that the first locks the shell from which it has been launched, so it is only really suitable for '''interactive jobs''': i.e., processes that use the console to interact with their user during job execution. <code>sbatch</code>, on the other hand, does not lock the shell and simply adds the job to the queue, but does not allow the user to interact with the process while it is running.
+<code>sbatch</code> provides an additional possibility: <code><command_to_be_run_via_SLURM></code> can in fact be an [[#Using execution scripts to run jobs|'''execution script''']], i.e. a special (and SLURM-specific) type of Linux shell script that includes '''SBATCH directives'''. SBATCH directives can be used to specify the values of some of the parameters that would otherwise have to be set using the <code>[options]</code> part of the <code>sbatch</code> command. This is handy because it allows you to write down the parameters in an execution script instead of having to write them in the command line while launching a job, which greatly reduces the possibility of mistakes. Also, an execution script is easy to keep and reuse.
+Immediately after an <code>srun</code> command is launched by a user, SLURM outputs a message informing the user that the job has been queued (<code>sbatch</code>, instead, prints a message similar to <code>Submitted batch job 10849</code> and returns immediately). The output of <code>srun</code> is similar to this:
+<pre style="color: lightgrey; background: black;">
+srun: job 10849 queued and waiting for resources
+</pre>
-SLURM provides jobs with access to resources only for a limited time: i.e., '''execution time''' is itself a limited resource.
+The shell is now locked while SLURM prepares the execution of the user program ([[#Detaching from a running job with screen|if you are using <code>screen</code> you can detach from that shell and come back later]]).
-When a resource is limited, a job cannot use arbitrary quantities of it. On the contrary, the job must specify how much of the resource it requests. Requests are done either by running the job on a [[User Jobs#SLURM Partitions|partition]] for which a default amount of resources has been defined, or through the options of the srun command that executes the job via SLURM.
+When SLURM is ready to run the program, it prints a message similar to
-== <code>gres</code> syntax ==
+<pre style="color: lightgrey; background: black;">
+srun: job 10849 has been allocated resources
+</pre>
-Whenever it is necessary to specify the quantity of <code>gres</code>, i.e. generic resources, a special syntax must be used. In Mufasa <code>gres</code> resources are GPUs, so this syntax applies to GPUs. Number and type of Mufasa's GPUs is described [[System#CPUs and GPUs|here]].
+and then executes the program.
-The name of each GPU resource takes the form
+=== Options of <code>srun</code> and <code>sbatch</code> ===
-'''<code>Name:Type</code>'''
+The <code>[options]</code> part of <code>srun</code> and <code>sbatch</code> commands is used to tell SLURM what resources the job needs to be executed and how much time it will need to complete its execution.
-where <code>Name</code> is '''<code>gpu</code>''' and <code>Type</code> takes the following values:
+As for resources, the most important option is <code>--qos <qos_name></code>, specifying which [[#SLURM Quality of Service (QOS)|SLURM QOS]] the job will use. A job run with a given QOS has access to all and only the resources available to that QOS. As a consequence, all options that define how many resources to assign to the job will only be able to provide the job with resources that are available to the chosen QOS. Jobs that require resources that are not available to the chosen QOS do not get executed.
-* '''<code>40gb</code>''' for GPUs with 40 Gbytes of onboard RAM
+If the user forgets to use option <code>--qos <qos_name></code>, the job is run on the ''default qos'' (<code>normal</code>) which has access to ''zero'' resources. Therefore it is always necessary to specify option <code>--qos <qos_name></code> when launching a SLURM job on Mufasa.
-* '''<code>20gb</code>''' for GPUs with 20 Gbytes of onboard RAM
-* '''<code>10gb</code>''' for GPUs with 10 Gbytes of onboard RAM
-So, for instance,
+More generally, the most relevant among the <code>[options]</code> are:
-<code>gpu:20gb</code>
+:;‑‑qos=<qos_name>
+:: specifies the [[SLURM#SLURM Quality of Service (QOS)|SLURM QOS]] that the job will use. It is mandatory to specify one.
-identifies the resource corresponding to GPUs with 20 GB of RAM. Of this resource Mufasa has [[System#CPUs and GPUs|a given number]], of which a job can request to use some (or all).
+:: ''Important! The chosen QOS limits the resources that can be requested, since it is not allowed to request resources (type or quantity) that exceed what is available to the chosen QOS.''
-When asking for a <code>gres</code> resource (e.g., in an <code>srun</code> command or an <code>SBATCH</code> directive of an [[User Jobs#Using execution scripts to run jobs|execution script]]), the syntax required by SLURM is
+:: ''Important! If <code>‑‑qos <qos_name></code> is used and options that specify how many resources to assign to the job (such as <code>‑‑mem=<mem_resources></code>, <code>‑‑cpus‑per‑task=<cpu_amount></code> or <code>‑‑time=<duration></code>) are omitted, the job is assigned the default amount of the resource (as defined by the chosen QOS). A notable exception concerns option <code>‑‑gres=<gpu_resources></code>, which is always required (see below) if the job uses a QOS with access to GPUs.''
-'''<code><Name>:<Type>:<quantity></code>'''
+:; --job-name=<jobname>
+:: Specifies a name for the job. The specified name will appear along with the JOBID number when querying running jobs on the system with <code>squeue</code>. The default job name (i.e., the one assigned to the job when <code>--job-name</code> is not used) is the executable program's name.
-where <code>quantity</code> is an integer value specifying how many items of the resource are requested. So, for instance, to ask for 2 GPUs of type <code>20gb</code> the syntax is
+:;‑‑gres=<gpu_resources>
+:: specifies what GPUs to assign to the job. <code>gpu_resources</code> is a comma-delimited list where each element has the form <code>gpu:<Type>:<amount></code>, where <code><Type></code> is one of the types of GPU available on Mufasa (see [[SLURM#gres syntax|<code>gres</code> syntax]]) and <code><amount></code> is an integer between 1 and the number of GPUs of such type available to the chosen QOS. For instance, <code><gpu_resources></code> may be <code>gpu:40gb:1,gpu:3g.20gb:1</code>, corresponding to asking for one "full" GPU and one "small" GPU.
-<code>gpu:20gb:2</code>
+:: ''Important! The <code>‑‑gres</code> parameter is '''mandatory''' if the job is run with a QOS that allows access to the system's GPUs. Differently from other resources (where unspecified requests lead to the assignment of a default amount), GPUs must always be explicitly requested.''
-SLURM's ''generic resources'' are defined in <code>/etc/slurm/gres.conf</code>. In order to make GPUs available to SLURM's <code>gres</code> management, Mufasa makes use of Nvidia's [https://developer.nvidia.com/nvidia-management-library-nvml NVML library]. For additional information see [https://slurm.schedmd.com/gres.html SLURM's documentation].
+:;‑‑mem=<mem_resources>
+:: specifies the amount of RAM to assign to the job; for instance, <code><mem_resources></code> may be <code>200G</code>
-= SLURM Partitions =
+:;‑‑cpus-per-task=<cpu_amount>
+:: specifies how many CPUs to assign to the job; for instance, <code><cpu_amount></code> may be <code>2</code>
-Several execution queues for jobs have been defined on Mufasa. Such queues are called '''partitions''' in SLURM terminology. Each partition has features (in term of resources available to the jobs on that queue) that make the partition suitable for a certain category of jobs. SLURM command
+:;<nowiki>‑‑time=<duration></nowiki>
+:: specifies the maximum time allowed to the job to complete, in the format <code>days-hours:minutes:seconds</code>, where <code>days</code> is optional; for instance, <code><duration></code> may be <code>72:00:00</code> (i.e., 72 hours, the same duration as <code>3-00:00:00</code>). When the time expires, the job (if still running) gets killed by SLURM.
-<pre style="color: lightgrey; background: black;">
+:;‑‑pty
-sinfo
+:: specifies that the job will be interactive (this is necessary when <code><command_to_be_run_via_SLURM></code> is <code>/bin/bash</code>: see [[#Interactive jobs|Interactive jobs]])
-</pre>
-([https://slurm.schedmd.com/sinfo.html link to SLURM docs]) provides a list of available partitions. Its output is similar to this:
+Note that GPU resources (if needed) must always be requested explicitly. For instance, in order to execute program <code>./my_program</code> which needs one GPU of type <code>3g.20gb</code> with QOS <code>gpulight</code> we can use the SLURM command
 <pre style="color: lightgrey; background: black;">
-PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
+srun --qos=gpulight --gres=gpu:3g.20gb:1 ./my_program
-debug         up   infinite      1    mix gn01
-small*        up   12:00:00      1    mix gn01
-normal        up 1-00:00:00      1    mix gn01
-longnormal    up 3-00:00:00      1    mix gn01
-gpu           up 1-00:00:00      1    mix gn01
-gpulong       up 3-00:00:00      1    mix gn01
-fat           up 3-00:00:00      1    mix gn01
 </pre>
-In this example, available partitions are named “debug”, “small”, “normal”, “longnormal”, “gpu”, “gpulong”, “fat”. The asterisk beside "small" indicates that this is the default partition, i.e. the one that SLURM selects to run a job when no partition has been specified. (On Mufasa, partition names have been chosen to reflect the type of job that they are dedicated to.)
+== Interactive jobs ==
-The columns in the standard output of <code>sinfo</code> shown above correspond to the following information:
+An '''interactive job''' is a process that uses the console to interact with its user during job execution. Such a process is manually run by the user from a ''bash shell'' (i.e. a terminal session) provided by SLURM.
-; PARTITION
+In order to ask SLURM to schedule the execution of a shell where the user can subsequently run the interactive job, it is necessary to use option <code>--pty</code>.
-: name of the partition
-; AVAIL
+For instance, to ask SLURM to run a shell with QOS <code>nogpu</code>, the user should use command
-: state/availability of the partition: see [[User Jobs#Partition availability|below]]
-; TIMELIMIT
+<pre style="color: lightgrey; background: black;">
-: maximum runtime of a job allowed by the partition
+srun --qos=nogpu --pty /bin/bash
+</pre>
-; NODES
+By not specifying any other options, the user is telling SLURM that they want the shell spawned by SLURM to be provided with the default amount of resources associated with QOS <code>nogpu</code>. More generally, any combination of the other [[#Options of srun and sbatch|options of srun]] can be used together with <code>--pty</code>.
-: number of nodes available to jobs run on the partition: for Mufasa, this is always 1 since [[System#The SLURM job scheduling system|there is only 1 node in the computing cluster]]
-; STATE
+Like every other job request to SLURM, the request to run a shell must be made from the [[System#Login server|login server]]. As soon as possible (i.e., as soon as the necessary resources are available) SLURM will open (in the same terminal that the user used to launch the <code>srun</code> command) a bash shell, where the user will be able to run their interactive programs.
-: state of the node (using [https://slurm.schedmd.com/sinfo.html#SECTION_NODE-STATE-CODES these codes]); typical values are <code>mixed</code> - meaning that some of the resources of the node are busy executing jobs while other are free, and <code>allocated</code> - meaning that all of the resources of the node are busy
-; NODELIST
+To the user, this corresponds to the fact that the shell they were using to interact with the login server changes into a shell opened ''directly on Mufasa''. This corresponds to the command prompt changing from
-: list of nodes available to the partition: for Mufasa this field always contains <code>gn01</code> since [[System#The SLURM job scheduling system|Mufasa is the only node in the computing cluster]]
-One piece of information that the standard output of <code>sinfo</code> doesn't provide is whether there are partitions that can only be used by the root user of Mufasa. To know which partitions are root-only, you can use command
 <pre style="color: lightgrey; background: black;">
-sinfo -o "%.10P %.4r"
+<username>@mufasa2-login:~$
 </pre>
-Its output is
+to
 <pre style="color: lightgrey; background: black;">
- PARTITION ROOT
+<username>@mufasa2:~$
-     debug  yes
-    small*   no
-    normal   no
-longnormal   no
-       gpu   no
-   gpulong   no
-       fat   no
 </pre>
-and shows that partition "debug" is reserved for root.
+Another way to know if the current shell is the “base” shell or one run via SLURM is to execute command
-For what concerns hardware resources (such as CPUs, GPUs and RAM) the amounts of each resource available to Mufasa's partitions are set by SLURM's accounting system, and are not visible to <code>sinfo</code>. See [[User Jobs#Partition features|Partition features]] for a description of these amounts.
+<pre style="color: lightgrey; background: black;">
+echo $SLURM_JOB_ID
+</pre>
-== Partition features ==
+If no number gets printed, this means that the shell is the “base” one. If a number is printed, it is the SLURM job ID of the /bin/bash process.
-The output of <code>sinfo</code> ([[User Jobs#SLURM Partitions|see above]]) provides a list of available partitions, but (except for time) it does not provide information about the amount of resources that a partition makes available to the user jobs which are run on it. The amount of resources is visible through command
+When the user does not need the SLURM-spawned shell anymore, they should close it with command (the same used for any other Linux shell)
 <pre style="color: lightgrey; background: black;">
-sacctmgr list qos format=name,priority,maxtres,maxwall -p
+exit
 </pre>
-which provides the following (very badly formatted) output:
+to make the resources reserved for the interactive shell free again.
-<pre style="color: lightgrey; background: black;">
-Name|Priority|MaxTRES|MaxWall|
-normal|200|cpu=16,gres/gpu:10gb=0,gres/gpu:20gb=0,gres/gpu:40gb=0,mem=128G|1-00:00:00|
-small|500|cpu=2,gres/gpu:10gb=1,gres/gpu:20gb=0,gres/gpu:40gb=0,mem=16G|12:00:00|
-longnormal|100|cpu=16,gres/gpu:10gb=0,gres/gpu:20gb=0,gres/gpu:40gb=0,mem=128G|3-00:00:00|
-gpu|200|cpu=8,gres/gpu:10gb=2,gres/gpu:20gb=2,mem=64G|1-00:00:00|
-gpulong|100|cpu=8,gres/gpu:10gb=2,gres/gpu:20gb=2,mem=64G|3-00:00:00|
-fat|50|cpu=32,gres/gpu:10gb=2,gres/gpu:20gb=2,gres/gpu:40gb=2,mem=256G|3-00:00:00|
-</pre>
-Using the <code>sed</code> Linux utility to make the output a bit more legible, the command becomes
+== Non-interactive jobs ==
-<pre style="color: lightgrey; background: black;">
+<code>srun</code> commands can become long and complex, and it's easy to forget some option or make mistakes while using them. For non-interactive jobs, there is a solution to this problem.
-sacctmgr list qos format=name,priority,maxtres,maxwall -p | sed 's/|/\t /g'
-</pre>
-and provides an output similar to the following:
+When the user job is non-interactive, in fact, the <code>srun</code> command can be substituted with a much simpler '''<code>sbatch</code> command'''. As [[#Running jobs with SLURM|already explained]], <code>sbatch</code> can make use of an '''execution script''' to specify all the parts of the command to be run via SLURM. So the command becomes
 <pre style="color: lightgrey; background: black;">
-Name	 Priority	 MaxTRES	 MaxWall
+sbatch <execution_script>
-normal	 200	 cpu=16,gres/gpu:10gb=0,gres/gpu:20gb=0,gres/gpu:40gb=0,mem=128G	 1-00:00:00
-small	 500	 cpu=2,gres/gpu:10gb=1,gres/gpu:20gb=0,gres/gpu:40gb=0,mem=16G	 12:00:00
-longnormal	 100	 cpu=16,gres/gpu:10gb=0,gres/gpu:20gb=0,gres/gpu:40gb=0,mem=128G	 3-00:00:00
-gpu	 200	 cpu=8,gres/gpu:10gb=2,gres/gpu:20gb=2,mem=64G	 1-00:00:00
-gpulong	 100	 cpu=8,gres/gpu:10gb=2,gres/gpu:20gb=2,mem=64G	 3-00:00:00
-fat	 50	 cpu=32,gres/gpu:10gb=2,gres/gpu:20gb=2,gres/gpu:40gb=2,mem=256G	 3-00:00:00
 </pre>
-Its elements are the following (for more information, see [https://slurm.schedmd.com/qos.html SLURM's documentation]):
+An execution script is a special type of Linux script that includes '''SBATCH directives'''. SBATCH directives are used to specify the values of the parameters that are otherwise set in the [options] part of an <code>srun</code> command.
-; Name
+:{|class="wikitable"
-: name of the partition
+|'''''Note on Linux shell scripts'''''
+|-
+|''A shell script is a text file that will be run by the bash shell. In order to be acceptable as a bash script, a text file must:
-; Priority
+* ''have the “executable” flag set'' (see [[System#Changing file/directory ownership and permissions|here]] for details)
-: priority assigned to jobs run on the partition
+* ''have'' <code>#!/bin/bash</code> ''as its very first line''
-; MaxTRES
+''Usually, a Linux shell script is given a name ending in ''.sh,'' such as ''my_execution_script.sh'', but this is not mandatory.''
-: maximum amount of resources ("''Trackable RESources''") available to a job running on the partition, where
-: <code>'''cpu=''K'''''</code> means that the maximum number of processor cores is ''K''
-: <code>'''gres/''Name:Type''=''K'''''</code> means that the maximum number of GPUs of class <code>''Name:Type''</code> (see [[User Jobs#gres syntax|<code>gres</code> syntax]]) is ''K''
-: <code>'''mem=''K''G'''</code> means that the maximum amount of system RAM is ''K'' GBytes
-; MaxWall
+''Within any shell script, lines preceded by <code>#</code> are comments (with the notable exception of the initial'' <code>#!/bin/bash</code> ''line)''.
-: maximum wall clock duration of the jobs run on the partition (after which they are killed by SLURM), in format ''[days-]hours:minutes:seconds''
-== Partition availability ==
+''Use of blank lines as spacers is allowed.''
+|}
-An important information that ''sinfo'' provides (column "AVAIL") is the ''availability'' (also called ''state'') of partitions. Possible partition states are:
+An execution script is a Linux shell script composed of two parts:
-; up
+# a '''preamble''', composed of directives with which the user specifies the values to be given to parameters, each preceded by the keyword <code>#SBATCH</code>
-: The partition is available
+# [optionally] one or more '''<code>srun</code> commands''' that launch jobs with SLURM using the parameter values specified in the preamble
-: Running jobs will be completed
-: Currently queued jobs will be executed as soon as resources allow
-; drain
+Below is an '''execution script template''' to be copied and pasted into your own execution script text file.
-: The partition is in the process of becoming unavailable (''down'')
-: Running jobs will be completed
-: Queued jobs will be executed only when the partition becomes available again (''up'')
-; down
+The template includes all the options [[#Using SLURM to run a container|already described above]], plus a few additional useful ones (for instance, those that enable SLURM to send email messages to the user when events occur in the lifecycle of their job). Information about all the possible options can be found in [https://slurm.schedmd.com/sbatch.html SLURM's own documentation].
-: The partition is unavailable
-: There are no running jobs
-: Queued jobs will be executed only when the partition becomes available again (''up'')
+In the template below, '''#SBATCH directives''' are requests made to SLURM. Notice that, though #SBATCH directives have a leading "#", this does ''not'' mean that they are comments: just like the <code>#!/bin/bash</code> line at the beginning of a shell script, they start with "#" but are not comments.
-When a partition passes from ''up'' to ''drain'' no harm is done to running jobs. When a partition passes from any other state to ''down'', running jobs (if any) get killed.
+Other lines in the script that begin with <code>#</code> not followed by SBATCH are comments.
-A partition in state ''drain'' or ''down'' requires intervention by a [[Roles|Job Administrator]] to be restored to ''up''. Jobs waiting for that partition are paused until the partition becomes available again.
+As for directives that ask for a given amount of a resource (including time): if they are missing from the execution script (or commented out), the job will be assigned the default amount of the resource. GPUs are the exception, since they must always be requested explicitly with <code>‑‑gres</code>.
-== Choosing the partition on which to run a job ==
+<blockquote>
+'''<nowiki>#</nowiki>!/bin/bash'''
-When launching a job (as explained in [[User Jobs#Executing jobs on Mufasa|Executing jobs on Mufasa]]) a user should select the partition that is most suitable for it according to the job's features. Launching a job on a partition avoids the need for the user to specify explicitly all of the resources that the job requires, relying instead (for unspecified resources) on the default amounts defined for the partition. [[User Jobs#Partition features|Partition features]] explains how to find out how many of Mufasa's resources are associated to each partition.
-The fact that by selecting the right partition for their job a user can pre-define the requirements of the job without having to specify them makes partitions very handy, and avoids possible mistakes. However, users can -if needed- change the resource requested by their jobs wrt the default values associated to the chosen partition. Any element of the default assignment of resources provided by a specific partition can be overridden by specifying an option when launching the job, so users are not forced to accept the default value. However, it makes sense to choose the most suitable partition for a job in the first place, and then to specify the job's requirements only for those resources that have an unsuitable default value.
+<nowiki>#</nowiki>----------------start of preamble----------------
-Resource requests by the user launching a job can be both lower and higher than the default value of the partition for that resource. However, they cannot exceed the maximum value that the partition allows for requests of such resource, if set. If a user tries to run on a partition a job that requests a higher value of a resource than the partition‑specified maximum, the run command is refused.
+'''<nowiki>#</nowiki>SBATCH --nodes=1'''
-In general, the larger the fraction of system resources that a job asks for, the heavier the job becomes for Mufasa's limited capabilities. Since SLURM prioritises lighter jobs over heavier ones (in order to maximise the number of completed jobs) it is a very bad idea for a user to ask for their job more resources than it actually needs: this, in fact, witl have the effect of delaying (possibly for a long time) job execution.
+'''<nowiki>#</nowiki>SBATCH --ntasks=1'''
+'''<nowiki>#</nowiki>SBATCH --partition=jobs'''
-= Running jobs with SLURM: generalities =
+'''<nowiki>#</nowiki>SBATCH --qos=<qos_name>'''
-'''''Note''': these are general considerations. See [[User Jobs#Executing jobs on Mufasa|Executing jobs on Mufasa]] for instructions about running your own processing jobs on Mufasa.''
+'''<nowiki>#</nowiki>SBATCH --gres=<gpu_resources>'''
-The commands that SLURM provides to run jobs are
+'''<nowiki>#</nowiki>SBATCH --mem=<mem_resources>'''
-<pre style="color: lightgrey; background: black;">
+'''<nowiki>#</nowiki>SBATCH --cpus-per-task=<cpu_amount>'''
-srun [options] <command_to_be_run_via_SLURM>
-</pre>
-and
+'''<nowiki>#</nowiki>SBATCH --time=<d-hh:mm:ss>'''
-<pre style="color: lightgrey; background: black;">
+'''<nowiki>#</nowiki>SBATCH --output=./<filename>-%j.out'''
-sbatch [options] <command_to_be_run_via_SLURM>
-</pre>
-(see SLURM documentation: [https://slurm.schedmd.com/srun.html srun], [https://slurm.schedmd.com/sbatch.html sbatch]).
+: <nowiki>#</nowiki> the text file to which the output of the job is written (i.e., standard output is redirected to this file). SLURM replaces "%j" with the job ID.
-In both cases, <code><command_to_be_run_via_SLURM></code> can be any program or Linux shell script. By using <code>srun</code> or <code>sbatch</code>, the  command or script specified by <code><command_to_be_run_via_SLURM></code> (including any programs launched by it) are added to SLURM's execution queues.
+'''<nowiki>#</nowiki>SBATCH --error=./<filename>-error-%j.out'''
-The main difference between <code>srun</code> and <code>sbatch</code> is that the first locks the shell from which it has been launched, so it is only really suitable for processes that use the console for interaction with their user. ([[User Jobs#Detaching from a running job with screen|You can, though, detach from that shell and come back later using <code>screen</code>]].) <code>sbatch</code>, on the other side, does not lock the shell and simply adds the job to the queue, but does not allow the user to interact with the process while it is running.
+: <nowiki>#</nowiki> the text file to which any error messages generated by the job are written (i.e., standard error is redirected to this file). SLURM replaces "%j" with the job ID.
-Additionally, with <code>sbatch</code> <command_to_be_run_via_SLURM> can be an [[User Jobs#Using execution scripts to run jobs|'''execution script''']], i.e. a special (and SLURM-specific) type of Linux shell script that includes '''SBATCH directives'''. SBATCH directives can be used to specify the values of some of the parameters that would otherwise have to be set using the <code>[options]</code> part of the <code>sbatch</code> command. This is handy because it allows to write down the parameters in an execution script instead of having to write them in the command line while launching a job, which greatly reduces the possibility of mistakes. Also, an execution script is easy to keep and reuse.
+'''<nowiki>#</nowiki>SBATCH --job-name=<name>'''
-The <code>[options]</code> part of <code>srun</code> and <code>sbatch</code> commands is used to tell SLURM the conditions under which it has to execute the job; in particular, it is used to specify what system resources SLURM should reserve for the job.
+<nowiki>#</nowiki>----------------end of preamble----------------
-A quick way to define the set of resources that a program will be provided with is to use [[User Jobs#SLURM Partitions|SLURM partitions]]. This is done with option <code>-p <partition_name></code>. This option specifies that SLURM will run the program on a specific partition, and therefore that it will have access to all and only the resources available to that partition. As a consequence, all options that define how many resources to assign the job will only be able to provide the job with resources that are available to the chosen partition. Jobs that require resources that are not available to the chosen partition do not get executed.
-For instance, running
+'''<command_to_run>'''
-<pre style="color: lightgrey; background: black;">
-srun -p small ./my_program
-</pre>
-makes SLURM run <code>my_program</code> on the partition named “small”. Running the program this way means that the resources associated to this partition will be available to it for use.
+'''<command_to_run>'''
-== Running interactive jobs via SLURM ==
+'''...'''
+</blockquote>
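As an illustration of how the template might be filled in, here is a minimal sketch of a shell function that writes a complete execution script for a job that needs no GPU. The job name, file names, memory and CPU amounts and the 12-hour time limit are hypothetical example values; the <code>--qos</code> and <code>--gres</code> directives are left out and should be added if your job (or Mufasa's configuration) requires them. The helpers <code>slurm_time</code> and <code>write_exec_script</code> are not part of SLURM. All directives use plain ASCII hyphens (<code>--</code>), which is what <code>sbatch</code> recognizes.

```shell
# Hypothetical helper: format a SLURM time limit as d-hh:mm:ss.
#   $1 = days, $2 = hours, $3 = minutes, $4 = seconds
slurm_time() {
  printf '%d-%02d:%02d:%02d\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: write a filled-in execution script to the file $1.
# All values below are examples, not Mufasa defaults.
write_exec_script() {
  local script="$1" time_limit
  time_limit=$(slurm_time 0 12 0 0)   # 12 hours
  # Unquoted EOF: ${time_limit} is expanded now, \$SLURM_JOB_ID is kept
  # literally so that it is expanded later, when the job runs.
  cat > "$script" <<EOF
#!/bin/bash
#----------------start of preamble----------------
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=jobs
#SBATCH --mem=32G
#SBATCH --cpus-per-task=4
#SBATCH --time=${time_limit}
#SBATCH --output=./myjob-%j.out
#SBATCH --error=./myjob-error-%j.out
#SBATCH --job-name=myjob
#----------------end of preamble----------------
echo "Running as SLURM job \$SLURM_JOB_ID"
EOF
  chmod +x "$script"
}
```

Usage: <code>write_exec_script my_job.sh</code> creates the script, which can then be submitted with <code>sbatch my_job.sh</code>.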
-As [[User Jobs#Running jobs with SLURM: generalities|explained]], SLURM command <code>srun</code> is suitable for launching ''interactive'' user jobs, i.e. jobs that use the terminal output and the keyboard to exchange information with a human user. If a user needs this type of interaction, they must run a ''bash shell'' (i.e. a terminal session) with a command similar to
+= Executing jobs on Mufasa =
-<pre style="color: lightgrey; background: black;">
+== Key concept: use containers! ==
-srun --pty /bin/bash
-</pre>
-and subsequently use the bash shell to run the interactive program. To close the SLURM-spawned bash shell, run (as with any other shell)
+'''[[System#Containers|All computation on Mufasa must occur within containers]]'''.
-<pre style="color: lightgrey; background: black;">
+A container is a “sandbox” containing the environment where the user's application operates. Parts of Mufasa's filesystem can be made visible to the environment of the container (and writable, if the user has write permission on them: e.g., the user's <code>/home</code> directory). This allows the containerized user application to read from, and write to, Mufasa's filesystem: for instance, to read data and write results.
-exit
-</pre>
-Of course, also the “base” shell (i.e. the one that opens when an SSH connection to Mufasa is established) can be used to run programs: however, programs launched this way are not being run via SLURM and are not able to access most of the resources of the machine (in particular, there is no way to make GPUs accessible to them, and they can only access 2 CPUs). On the contrary, running programs with <code>srun</code> or <code>sbatch</code> ensures that they can access all the resources managed by SLURM.
+The system used by Mufasa to create and execute containers is '''[[System#Singularity|Singularity]]'''. This wiki includes [[Singularity|directions]] on preparing containers with Singularity.
-GPU resources (if needed) must always be requested explicitly with parameter <code>--gres=gpu:<10|20|40>gb:K</code>, where <code>K</code> is an integer between 1 and the maximum number of GPUs of that type available to the partition (see [[User Jobs#gres syntax|<code>gres</code> syntax]]). For instance, in order to run an interactive program which needs one GPU we may first run a bash shell via SLURM with command
+The container where a user job runs must contain all the libraries needed by the job. In fact (for maintainability and safety reasons) '''no software and no libraries are installed on Mufasa 2.0'''.
-<pre style="color: lightgrey; background: black;">
+== Interactive and non-interactive user jobs ==
-srun --gres=gpu:10gb:1 --pty /bin/bash
-</pre>
-an then run the interactive program from the newly opened shell.
+This section explains how to execute a user job that runs inside a container. It considers two types of user jobs:
+;: Interactive user jobs
+::: as [[#Interactive jobs|already explained]], these are jobs that require interaction with the user while they are running, via a bash shell running within the container. The shell is used to receive commands from the user and/or print output messages. For interactive user jobs, the job is usually launched manually by the user (with a command issued via the shell) after the container is in execution.
-An alternative to explicitly specifying what resources to assign to the bash shell run via SLURM is to run <code>/bin/bash</code> on one of the available partitions. For instance, to run the shell on partition “small” the command is
+;: Non-interactive user jobs
+::: are the most common variety. The user prepares the container in such a way that, when in execution, the container autonomously puts the user's jobs into execution. The user does not interact with the container while it is running. Executing the container and running the required programs within the container's environment is done via [[#Non-interactive jobs|execution scripts]].
-<pre style="color: lightgrey; background: black;">
+== Using SLURM to run an interactive job on Mufasa ==
-srun -p small --pty /bin/bash
-</pre>
-Mufasa is configured to show, as part of the command prompt of a bash shell run via SLURM, a message such as <code>(SLURM ID xx)</code> (where <code>xx</code> is the ID of the <code>/bin/bash</code> process within SLURM). When you see this message, you know that the bash shell you are interacting with is a SLURM-run one.
+The first step to run an interactive user job on Mufasa is to run the [[System#Containers|container]] where the job will take place. Each user is in charge of preparing the container(s) where the user's jobs will be executed.
-Another way to know if the current shell is the “base” shell or one run via SLURM is to execute command
+To run a container via SLURM by hand (i.e., from an interactive shell), a user must first open a SLURM-run shell with the command
 <pre style="color: lightgrey; background: black;">
-echo $SLURM_JOB_ID
+srun [general_SLURM_options] --pty /bin/bash
 </pre>
-If no number gets printed, this means that the shell is the “base” one. If a number is printed, it is the SLURM job ID of the /bin/bash process.
+where <code>[general_SLURM_options]</code> are those [[#Options of srun and sbatch|already described above]].
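For example, assuming partition <code>jobs</code> (the one used in the execution script template) and one GPU of type <code>40gb</code> (one of the GPU types in the <code>sinfo</code> output example on this page), the command could look like the one printed by this sketch. The <code>run</code> wrapper and <code>interactive_shell</code> are hypothetical helpers: <code>run</code> only prints the command unless <code>DRY_RUN=0</code> is set, so the option list can be checked before anything is launched. Depending on the partition, a <code>--qos</code> option may also be needed.

```shell
# Hypothetical wrapper: print a command instead of running it, unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Open an interactive shell via SLURM on partition "jobs" with one 40gb GPU.
# Partition and GPU type are example values; adapt them to your job.
interactive_shell() {
  run srun --partition=jobs --gres=gpu:40gb:1 --pty /bin/bash
}
```

Running <code>DRY_RUN=0 interactive_shell</code> would actually execute <code>srun</code> and open the SLURM-run shell.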
+Then the user must run the container: this is done as follows.
-= Executing jobs on Mufasa =
+First, it is necessary to load the Singularity software module with
-The main reason for a user to interact with Mufasa is to execute jobs that require resources not available to standard desktop-class machines. Therefore, launching jobs is the most important operation for Mufasa users: what follows explains how it is done.
+<pre style="color: lightgrey; background: black;">
+module load amd/singularity
+</pre>
-Considering that [[System#Docker Containers|all computation on Mufasa must occur within Docker containers]], the jobs run by Mufasa users are always containers except for menial, non-computationally intensive jobs. This wiki includes [[Docker|directions about preparing Docker containers]].
+(if needed, the list of software modules available in the system can be obtained with command <code>module av</code>).
-The process of launching a user job on Mufasa involves the following steps:
+Then, the user must use Singularity to run the container with command (see the [[Singularity|section about Singularity]] for further details)
+<pre style="color: lightgrey; background: black;">
+singularity run <repository>://<name_of_container>
+</pre>
-<big>
+which pulls the container from the specified repository and executes it. Possible values for <code><repository></code> are:
-;: Step 1 --- [[User Jobs#Using SLURM to run a Docker container|Use SLURM to run the Docker container where the job will take place]]
-::: [for interactive and non-interactive user jobs]
-;: Step 2 --- [[User Jobs#Launching a user job from within a Docker container|Manually launch the user job from within the container]]
+:: <code>docker</code> (Docker Hub)
-::: [for interactive user jobs only]
+:: <code>library</code> (the Sylabs Container Library)
-</big>
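+:: a local path, if the container image is a file located in the filesystem of Mufasa: in this case no <code><repository>://</code> prefix is used, and <code>path/to/container</code> takes the place of <code><repository>://<name_of_container></code> in the command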
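The argument given to <code>singularity run</code> therefore has a different form for remote and local containers. A minimal sketch of a hypothetical helper, <code>container_uri</code> (not a Singularity command), that builds it; the image names in the examples are placeholders:

```shell
# Hypothetical helper: build the image argument for "singularity run".
#   $1 = source: docker | library | local
#   $2 = image name (docker/library) or path to the image file (local)
container_uri() {
  case "$1" in
    docker|library) printf '%s://%s\n' "$1" "$2" ;;
    local)          printf '%s\n' "$2" ;;   # local images need no prefix
    *) echo "container_uri: unknown source '$1'" >&2; return 1 ;;
  esac
}
```

Usage: <code>singularity run "$(container_uri docker ubuntu:22.04)"</code>.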
-== Interactive and non-interactive user jobs ==
+As soon as the container is in execution, the terminal window used so far to interact with Mufasa becomes a shell ''in the container''. This shell belongs to the software environment of the container, and the user can use it to interact with the container's own software environment and filesystem.
-:; Interactive user jobs
+It is easy to tell whether a shell belongs to Mufasa or to the container, because in a container shell the prompt becomes
-:: are jobs that require interaction with the user while they are running, via a bash shell running within the Docker container. The shell is used to receive commands from the user and/or print output messages. For interactive user jobs, the job is usually launched manually by the user (with a command issued via the shell) after the Docker container is in execution.
-:; Non-interactive user jobs
+<pre style="color: lightgrey; background: black;">
-:: are the most common variety. The user prepares the Docker container in such a way that, when in execution, the container autonomously puts the user's jobs into execution. The user does not have any communication with the Docker container while it is in execution.
+singularity>
+</pre>
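Besides looking at the prompt, a script can check whether it is running inside a container through its environment. This sketch assumes that Singularity exports the variable <code>SINGULARITY_CONTAINER</code> inside containers (and that Apptainer, Singularity's successor, exports <code>APPTAINER_CONTAINER</code>); please verify this against the version installed on Mufasa. The function name is hypothetical.

```shell
# Report whether this shell runs inside a Singularity/Apptainer container.
# Assumption: these variables are set inside a container and unset on the host.
in_container() {
  if [ -n "${SINGULARITY_CONTAINER:-}" ] || [ -n "${APPTAINER_CONTAINER:-}" ]; then
    echo "container: ${SINGULARITY_CONTAINER:-$APPTAINER_CONTAINER}"
  else
    echo "host"
  fi
}
```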
-Both interactive and non-interactive user jobs can be run via a [[User Jobs#Using SLURM to run a Docker container|(quite complex) command]] directly issued from the [[System#Accessing Mufasa|terminal opened via SSH]]. To reduce the possibility of mistakes, it is usually preferable to define an [[User Jobs#Using execution scripts to run jobs|execution script]] that takes care of launching the job.
+=== Interaction between container filesystem and local filesystem ===
-== Job output ==
+The filesystem inside the container and the local one, i.e. Mufasa's, can interact. This means that the container can access the local filesystem to read and/or write files. However, the only parts of Mufasa's filesystem that can be accessed by the container are those that the user running the container has access rights to.
-The whole point of running a user job is to collect its output. Usually, such output takes the form of one or more files generated within the filesystem of the Docker container.
+By default, the user's <code>/home/username</code> directory on Mufasa is automatically mapped onto <code>/home/username</code> in the filesystem of the container. Any change made to that directory from inside the container is actually applied to the local <code>/home/username</code> directory on Mufasa.
-As [[User Jobs#Using SLURM to run a Docker container|explained below]], SLURM includes a mechanism to mount a part of Mufasa's own filesystem onto the container's filesystem: so when the job running within the container writes to this mounted part, it actually writes to Mufasa's filesystem. This means that when the Docker container ends its execution, its output files persist in Mufasa's filesystem (usually in a subdirectory of the user's own <code>/home</code> directory) and can be retrieved by the user at a later time.
+The mapping of the home directory does not need to be explicitly requested. However, if the user needs (in addition to the home directory) other parts of the local filesystem of Mufasa to be mapped onto the container's filesystem, this is possible by using this modified version of the <code>singularity run</code> command:
-The same mechanism can be used to allow user jobs running into a Docker container to read their input data from Mufasa's filesystem (usually a subdirectory of the user's own <code>/home</code> directory).
+<pre style="color: lightgrey; background: black;">
+singularity run --bind </path/to/local/directory>:<path/to/container/directory> <repository>://<name_of_container>
+</pre>
-== Using SLURM to run a Docker container ==
+If <code><path/to/container/directory></code> does not exist in the container's filesystem, it gets created by Singularity.
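When several directories have to be mapped, <code>--bind</code> accepts a comma-separated list of <code>local:container</code> pairs. The hypothetical helper below builds such a list and, as a sanity check, refuses local directories that do not exist; the directory names in the usage example are placeholders.

```shell
# Hypothetical helper: join "local_dir:container_dir" pairs into one
# comma-separated value for "singularity run --bind".
bind_list() {
  local spec
  for spec in "$@"; do
    # The part before the first ":" is the directory on Mufasa
    if [ ! -d "${spec%%:*}" ]; then
      echo "bind_list: local directory ${spec%%:*} does not exist" >&2
      return 1
    fi
  done
  local IFS=,   # "$*" joins the arguments with the first character of IFS
  printf '%s\n' "$*"
}
```

Usage: <code>singularity run --bind "$(bind_list /path/to/dataset:/data /path/to/results:/results)" <repository>://<name_of_container></code>.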
-The first step to run a user job on Mufasa is to run the [[System#Docker Containers|Docker container]] where the job will take place. A container is a “sandbox” containing the environment where the user's application operates. Parts of Mufasa's filesystem can be made visible (and writable, if they belong to the user's <code>/home</code> directory) to the environment of the container. This allows the containerized user application to read from, and write to, Mufasa's filesystem: for instance, to read data and write results. This wiki includes [[Docker|directions about preparing Docker containers]]
+=== How to know if your shell is a SLURM job ===
+The quickest way to tell is to look at the shell prompt: on the [[System#Login_server|login server]] (i.e., if your shell is not being run by SLURM) it should look like
-Each user is in charge of preparing the Docker container(s) where the user's jobs will be executed. In most situations the user can simply select a suitable ready-made container from the many which are already available for use.
+<pre style="color: lightgrey; background: black;">
+your_username@mufasa2-login:~$
+</pre>
-In order to run a Docker container via SLURM, a user must use a command similar to the following ones:
+while for Mufasa (i.e., if the shell is being run by SLURM) it should look like
-For [[User Jobs#Interactive and non-interactive user jobs|interactive user jobs]]:
 <pre style="color: lightgrey; background: black;">
-srun [‑p <partition_name>] ‑‑container-image <container_path.sqsh> [--job-name=<jobname>] [‑‑no‑container‑entrypoint] ‑‑container‑mounts=<mufasa_dir>:<docker_dir> [‑‑gres=<gpu_resources>] [‑‑mem=<mem_resources>] [‑‑cpus‑per‑task <cpu_amount>] [‑‑time=<hh:mm:ss>] ‑‑pty <command_to_run_within_container>
+your_username@mufasa2:~$
 </pre>
-For [[User Jobs#Interactive and non-interactive user jobs|non-interactive user jobs]]:
+Sometimes you may need another way to check: for instance, you may have changed your shell prompt so that it does not print the machine name. In these cases, to know if the shell you are using is run by SLURM or not you can run command
 <pre style="color: lightgrey; background: black;">
-srun [‑p <partition_name>] ‑‑container-image <container_path.sqsh> [--job-name=<jobname>] [‑‑no‑container‑entrypoint] ‑‑container‑mounts=<mufasa_dir>:<docker_dir> [‑‑gres=<gpu_resources>] [‑‑mem=<mem_resources>] [‑‑cpus‑per‑task <cpu_amount>] [‑‑time=<hh:mm:ss>] [command_to_run_within_container]
+echo $SLURM_JOB_ID
 </pre>
-The parts of the above commands within <code>[square brackets]</code> are optional.
+If the command provides an output, your shell is a SLURM job and that output is the ID of the job. If the command doesn't provide any output, your shell is not a SLURM job.
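The same check can be wrapped in a small function, for instance for use at the top of a script that should only run under SLURM. A minimal sketch (the function name is hypothetical):

```shell
# Print the SLURM job ID if this shell is a SLURM job, otherwise say so.
slurm_status() {
  if [ -n "${SLURM_JOB_ID:-}" ]; then
    echo "SLURM job $SLURM_JOB_ID"
  else
    echo "not a SLURM job (login server)"
  fi
}
```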
+== Using SLURM to run a non-interactive job on Mufasa ==
-Below, the elements of these commands are explained.
+When the user job to be executed in a container is non-interactive, the mechanism based on an ''execution script'', already described in [[#Non-interactive jobs|Non-interactive jobs]], is employed. The command to submit the script (which in turn runs the container where the user job takes place) is therefore
+<pre style="color: lightgrey; background: black;">
+sbatch <execution_script>
+</pre>
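<code>sbatch</code> returns immediately, printing a line of the form <code>Submitted batch job <job_ID></code>. Since SLURM replaces <code>%j</code> in the <code>--output</code> and <code>--error</code> file names with that job ID, knowing the ID tells you which files to look at. A minimal sketch with two hypothetical helpers; it handles only <code>%j</code>, not the other filename patterns SLURM supports. (<code>sbatch --parsable</code> is an alternative that prints the job ID without the surrounding text.)

```shell
# Hypothetical helper: extract the job ID from sbatch's confirmation line,
# "Submitted batch job <job_ID>". Prints nothing for any other input.
job_id_from_sbatch() {
  printf '%s\n' "$1" | awk '/^Submitted batch job [0-9]+$/ { print $4 }'
}

# Hypothetical helper: replace every "%j" in an --output/--error file name
# pattern with the given job ID.
output_file_for() {
  printf '%s\n' "$1" | sed "s/%j/$2/g"
}
```

Usage: <code>id=$(job_id_from_sbatch "$(sbatch my_job.sh)")</code>, then <code>tail -f "$(output_file_for './myjob-%j.out' "$id")"</code>.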
-:;‑p <partition_name>
+The general features of a SLURM execution script and the SBATCH directives used for generic jobs have [[#Non-interactive jobs|already been described]]. Here we focus, therefore, on the parts of the script that are specific to running a non-interactive job within a container.
-:: specifies the [[User Jobs#SLURM partitions|SLURM partition]] on which the job will be run.
-:: ''Important! If <code>‑‑p <partition_name></code> is used, options that specify how many resources to assign to the job (such as <code>‑‑mem=<mem_resources></code>, <code>‑‑cpus‑per‑task <cpu_number></code> or <code>‑‑time=<hh:mm:ss></code>) can be omitted, greatly simplyfying the command. If an explicit amount is not requested for a given resource, the job is assigned the default amount of the resource (as defined by the chosen partition). A notable exception concerns option <code>‑‑gres=<gpu_resources></code>, which is always required (see below) if the job needs access to GPUs.''
+Below is an '''execution script template''' to be copied and pasted into your own execution script text file.
-:; --job-name=<jobname>
+<blockquote>
-:: Specifies a name for the job. The specified name will appear along with the JOBID number when querying running jobs on the system with <code>squeue</code>. The default job name (i.e., the one assigned to the job when <code>--job-name</code> is not used) is the executable program's name.
+'''<nowiki>#</nowiki>!/bin/bash'''
-:;‑‑container-image <container_path.sqsh>
+<nowiki>#</nowiki>----------------start of preamble----------------
-:: specifies the container to be run
-:;‑‑no‑container‑entrypoint
+[[#Non-interactive jobs|#SBATCH directives already described above]]
-:: specifies that the ''entrypoint'' defined in the container image should not be executed ([[Docker#Preparation|ENTRYPOINT in the Dockerfile that defines the container]]). The entrypoint is an element of a Docker container: a command that gets executed as soon as the container is in execution. Option <code>‑‑no‑container‑entrypoint</code> is useful when -for some reason- the user does not want the entrypoint in the container to be run.
-:;<nowiki>‑‑container‑mounts=<mufasa_dir>:<docker_dir></nowiki>
+<nowiki>#</nowiki>----------------end of preamble----------------
-:: specifies what parts of Mufasa's filesystem will be available within the container's filesystem, and where they will be mounted. This is necessary to let the container [[User Jobs#Job output|get input data from Mufasa and/or write output data to Mufasa]]. For instance, if <code><mufasa_dir>:<docker_dir></code> takes the value <code>/home/mrossi:/data</code> this tells srun to mount Mufasa's directory <code>/home/mrossi</code> in position <code>/data</code> within the filesystem of the Docker container. When the docker container reads or writes files in directory <code>/data</code> of its own (internal) filesystem, what actually happens is that files in <code>/home/mrossi</code> get manipulated instead. <code>/home/mrossi</code> is the only part of the filesystem of Mufasa that is visible to, and changeable by, the Docker container.
-:;‑‑gres=<gpu_resources>
+'''module load amd/singularity'''
-:: specifies what GPUs to assign to the container. <code>gpu_resources</code> is a comma-delimited list where each element has the form <code>gpu:<Type>:<amount></code>, where <code><Type></code> is one of the types of GPU available on Mufasa (see [[User Jobs#gres syntax|<code>gres</code> syntax]]) and <code><amount></code> is an integer between 1 and the number of GPUs of such type available to the partition. For instance, <code><gpu_resources></code> may be <code>gpu:40gb:1,gpu:10gb:3</code>, corresponding to asking for one "full" GPU and 3 "small" GPUs.
-:: ''Important! The <code>‑‑gres</code> parameter is '''mandatory''' if the job needs to use the system's GPUs. Differently from other resources (where unspecified requests lead to the assignment of a default amount), GPUs must always be explicitly requested.''
+'''singularity run <repository>://<name_of_container> <command_to_run>'''
-:;‑‑mem=<mem_resources>
+</blockquote>
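For instance, a filled-in version of this template might be generated as follows. This is a sketch: the partition, GPU request, time limit, file names, image and <code>/opt/app/train.sh</code> are hypothetical, and the helper function is not part of SLURM or Singularity. It follows the template in appending the command to <code>singularity run</code>, which passes it to the container's runscript; Singularity also provides <code>singularity exec <image> <command></code>, which runs the given command directly.

```shell
# Hypothetical helper: write an execution script that runs a command inside
# a Singularity container. Arguments:
#   $1 = script file to create
#   $2 = image argument for "singularity run" (e.g. docker://ubuntu:22.04)
#   $3 = command to run, with its path inside the container
write_container_script() {
  local script="$1" image="$2" command="$3"
  cat > "$script" <<EOF
#!/bin/bash
#----------------start of preamble----------------
#SBATCH --partition=jobs
#SBATCH --gres=gpu:40gb:1
#SBATCH --time=0-06:00:00
#SBATCH --output=./container-job-%j.out
#SBATCH --error=./container-job-error-%j.out
#----------------end of preamble----------------
module load amd/singularity
singularity run ${image} ${command}
EOF
  chmod +x "$script"
}
```

Usage: <code>write_container_script train.sh docker://ubuntu:22.04 /opt/app/train.sh</code>, then <code>sbatch train.sh</code>.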
-:: specifies the amount of RAM to assign to the container; for instance, <code><mem_resources></code> may be <code>200G</code>
-:;‑‑cpus-per-task <cpu_amount>
+In the last line of the script, <code><command_to_run></code> is the command to be run inside the container (e.g., the name of an executable script), complete with its path within the container's filesystem. Please refer to the [[Singularity|section about Singularity]] for details about its commands.
-:: specifies how many CPUs to assign to the container; for instance, <code><cpu_amount></code> may be <code>2</code>
-:;<nowiki>‑‑time=<d-hh:mm:ss></nowiki>
+The interactions between container filesystem and local filesystem in non-interactive jobs are exactly the same as those [[#Interaction between container filesystem and local filesystem|already described]] for interactive jobs. In particular, the user's home directory is mapped by default onto the filesystem of the container.
-:: specifies the maximum time allowed to the job to run, in the format <code>days-hours:minutes:seconds</code>, where <code>days</code> is optional; for instance, <code><d-hh:mm:ss></code> may be <code>72:00:00</code>
-:;‑‑pty
+If the user also needs other parts of Mufasa's filesystem to be mapped onto the container's filesystem, this is possible by using this modified version of the <code>singularity run</code> command at the end of the script:
-:: specifies that the job will be interactive (this is necessary when <code><command_to_run_within_container></code> is <code>/bin/bash</code>: see [[User Jobs#Running interactive jobs via SLURM|Running interactive jobs via SLURM]])
-:;<command_to_run_within_container>, [command_to_run_within_container]
+:<code>singularity run --bind </path/to/local/directory>:<path/to/container/directory> <repository>://<name_of_container></code>
-:: the command that will be put into execution within the Docker container as soon as it is operative. Note that this is mandatory for interactive user jobs and optional for non-interactive user jobs.
-:: ''Important! This command will be run in the environment created by Docker.''
+If <code><path/to/container/directory></code> does not exist in the container's filesystem, it gets created by Singularity.
-For interactive user jobs, a typical value for <code><command_to_run_within_container></code> is <code>/bin/bash</code>. This instructs srun to open an interactive shell session (i.e. a command-line terminal interface) within the container, from which the user will then run their job. Another typical value for <code><command_to_run_within_container></code> is <code>python</code>, which launches an interactive Python session from which the user will then run their job.
+== Job output ==
-For non-interactive user jobs, using <code>[command_to_run_within_container]</code> is one of the two available methods to run the program(s) that the user wants to be executed within the Docker container. The other available method to run the user job(s) is to use the ''entrypoint'' of the container. The use of <code>[command_to_run_within_container]</code> is therefore optional.
+The whole point of running a user job is to collect its output. Usually, such output takes the form of one or more files generated within the filesystem of Mufasa by the container where the computation takes place.
-== Using execution scripts to run jobs ==
+As [[#Interaction between container filesystem and local filesystem|explained above]], parts of Mufasa's own filesystem can be mapped onto the container's filesystem: when the job running within the container writes to a mapped directory, it actually writes to Mufasa's filesystem. This means that when the container ends its execution, its output files persist in Mufasa's filesystem (usually in a subdirectory of the user's own <code>/home</code> directory) and can be retrieved by the user at a later time.
-The <code>srun</code> commands described in [[User Jobs#Using SLURM to run a Docker container|Using SLURM to run a Docker container]] are very complex, and it's easy to forget some option or make mistakes while using them. For non-interactive jobs, there is a solution to this problem.
+The same mechanism can be used to allow user jobs running in a container to read their input data from Mufasa's filesystem (usually a subdirectory of the user's own <code>/home</code> directory).
-When the user job is non-interactive, in fact, the <code>srun</code> command can be substituted with a much simpler '''<code>sbatch</code> command'''. As [[User Jobs#Running jobs with SLURM: generalities|already explained]], <code>sbatch</code> can make use of an '''execution script''' to specify all the parts of the command to be run via SLURM. So the command to run the Docker container where the user job will take place becomes
+== Cancelling completed jobs ==
-<pre style="color: lightgrey; background: black;">
+When a user process run via SLURM has completed its execution and is not needed anymore, it is important to [[#Cancelling a job with scancel|close it with scancel]], especially if much of the execution time requested by the job still remains.
-sbatch <execution_script>
-</pre>
-An execution script is a special type of Linux script that includes SBATCH directives. SBATCH directives are used to specify the values of the parameters that are otherwise set in the [options] part of an <code>srun</code> command.
+Cancelling a SLURM job makes the resources reserved by SLURM free again for other users, and thus speeds up the execution of the jobs still queued.
-:{|class="wikitable"
+Typically, one doesn't know how long a piece of code will take to complete its work. So please check from time to time whether it has finished and, if your SLURM job still has time left, just ''scancel'' the job. Other users will be grateful :-)
-|'''''Note on Linux shell scripts'''''
-|-
-|''A shell script is a text file that will be run by the bash shell. In order to be acceptable as a bash script, a text file must:
-* ''have the “executable” flag set''
+Please note that [[System#Job priority|job priority]] for your user depends, among other factors, on the overall duration of the jobs that you ran on Mufasa. Therefore, '''cancelling jobs that are not needed anymore improves your future jobs' priority'''.
-* ''have <code>#!/bin/bash</code> as its very first line''
-''Usually, a Linux shell script is given a name ending in ''.sh,'' such as ''my_execution_script.sh'', but this is not mandatory.''
+= Looking for unused GPUs =
-''Within any shell script, lines preceded by <code>#</code> are comments (with the notable exception of the initial <code>#!/bin/bash</code> line). Use of blank lines as spacers is allowed.''
+GPUs are usually the most limited resource on Mufasa. So, if your job requires a GPU, the best way to get it executed quickly is to use a QOS associated with a type of GPU of which at least one is currently unused. The command
-|}
-An execution script is a Linux shell script composed of two parts:
+<pre style="color: lightgrey; background: black;">
+sinfo -O Gres:100
+</pre>
-# a '''preamble''',  composed of directives using which the user specifies the values to be given to parameters, each preceded by the keyword <code>SBATCH</code>
+provides a summary of all the GRES (generic resources; on Mufasa, these are the GPUs) possessed by Mufasa. Its output is similar to the following:
-# [optionally] one or more '''<code>srun</code> commands''' that launch jobs with SLURM using the parameter values specified in the preamble
-The <code>srun</code> commands are optional because jobs can also be launched by the Docker container's own entrypoint.
+<pre style="color: lightgrey; background: black;">
+GRES
+gpu:40gb:3,gpu:4g.20gb:5,gpu:3g.20gb:5
+</pre>
-Below is an '''execution script template''' to be copied and pasted into your own execution script text file.
+To know which of the GPUs are currently in use, use command
-The template includes all the options [[User Jobs#Using SLURM to run a Docker container|already described above]], plus a few additional useful ones (for instance, those that enable SLURM to send email messages to the user in correspondence to events in the lifecycle of their job). Information about all the possible options can be found in [SLURM's own documentation].
+<pre style="color: lightgrey; background: black;">
+sinfo -O GresUsed:100
+</pre>
-All the SBATCH directives in the script template below are inactive because commented out. To enable a directive, just uncomment it by removing the leading "#". To make them stand out more visibly, in the template the comments corresponding to actual instructions are in bold.
+which provides an output similar to this:
-<blockquote>
+<pre style="color: lightgrey; background: black;">
-'''<nowiki>#</nowiki>!/bin/bash'''
+GRES_USED
+gpu:40gb:2(IDX:0-1),gpu:4g.20gb:2(IDX:5,8),gpu:3g.20gb:3(IDX:3-4,6)
+</pre>
-<nowiki>#</nowiki>----------------start of preamble----------------
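+By comparing the two lists (GRES and GRES_USED) you can easily spot unused GPUs. For instance, in the example outputs above 2 of the 3 <code>40gb</code> GPUs are in use, so 1 is free; likewise, 3 of the 5 <code>4g.20gb</code> GPUs and 2 of the 5 <code>3g.20gb</code> GPUs are free.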
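The comparison can also be automated. The following is a minimal Bash sketch, not an official Mufasa tool: the function name <code>free_gpus</code> is hypothetical, and it assumes that both strings have the <code>gpu:<type>:<count></code> form shown in the sample outputs above, optionally followed by an <code>(IDX:...)</code> suffix.

```shell
#!/bin/bash
# Sketch: count unused GPUs per type by comparing the GRES string
# (from `sinfo -O Gres:100`) with the GRES_USED string (from
# `sinfo -O GresUsed:100`).
# Assumption: entries look like gpu:<type>:<count>, optionally followed by
# an "(IDX:...)" suffix, as in the sample outputs on this page.
# free_gpus is a hypothetical helper name.

free_gpus() {
    local gres used entry type
    local -a entries
    local -A total=() busy=()

    # Drop "(IDX:...)" suffixes first: they can contain commas (e.g. IDX:5,8)
    # that would break the comma-based splitting below.
    # Also remove the padding whitespace added by `sinfo -O`.
    gres=$(sed 's/([^)]*)//g' <<< "$1" | tr -d '[:space:]')
    used=$(sed 's/([^)]*)//g' <<< "$2" | tr -d '[:space:]')

    # Total number of GPUs of each type
    IFS=',' read -ra entries <<< "$gres"
    for entry in "${entries[@]}"; do
        total["${entry%:*}"]="${entry##*:}"
    done

    # Number of GPUs of each type currently in use
    IFS=',' read -ra entries <<< "$used"
    for entry in "${entries[@]}"; do
        busy["${entry%:*}"]="${entry##*:}"
    done

    # Print "<type> <number of unused GPUs>", one line per type, sorted by type
    for type in "${!total[@]}"; do
        echo "$type $(( ${total[$type]} - ${busy[$type]:-0} ))"
    done | LC_ALL=C sort
}

# Example usage on Mufasa (-h suppresses the header line of sinfo):
#   free_gpus "$(sinfo -h -O Gres:100 | head -n 1)" "$(sinfo -h -O GresUsed:100 | head -n 1)"
```

With the sample outputs shown above, the function prints one line per GPU type with the number of free GPUs (1 for <code>40gb</code>, 3 for <code>4g.20gb</code>, 2 for <code>3g.20gb</code>). The <code>head -n 1</code> in the usage line assumes the single-line output shown above.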
-'''<nowiki>#</nowiki> SBATCH ‑p <partition_name>'''
+= Detaching from a running job with <code>screen</code> =
-'''<nowiki>#</nowiki> SBATCH ‑‑container-image <container_path.sqsh>'''
+A consequence of the way <code>srun</code> operates is that if you launch an [[#Interactive and non-interactive user jobs|interactive user job]], the shell where the command is running must remain open: if it closes, the job terminates. That shell runs in the terminal on your own PC through which the [[System#Accessing Mufasa|SSH connection to Mufasa]] was opened.
-'''<nowiki>#</nowiki> SBATCH --job-name=<name>'''
+If you do not plan to keep the SSH connection to Mufasa open (for instance because you have to turn off or suspend your PC), there is still a way to keep your interactive job alive: run <code>srun</code> inside a ''screen session'' (often simply called "a screen"), then ''detach'' from the ''screen'' ([https://linuxize.com/post/how-to-use-linux-screen/ here] is one of many tutorials about <code>screen</code> available online).
-'''<nowiki>#</nowiki> SBATCH ‑‑no‑container‑entrypoint'''
+Once you have detached from the screen session, you can close the SSH connection to Mufasa without affecting the job. When you need to reach your (still running) job again, you can open a new SSH connection to Mufasa and then ''reattach'' to the ''screen''.
-'''<nowiki>#</nowiki> SBATCH ‑‑container‑mounts=<mufasa_dir>:<docker_dir>'''
+A typical use case for screen is a program written so that it prints progress messages as it works: you can then check its progress by periodically reattaching to the screen where the program is running and reading the messages it has printed.
-'''<nowiki>#</nowiki> SBATCH ‑‑gres=<gpu_resources>'''
+Basic usage of <code>screen</code> is explained below.
-'''<nowiki>#</nowiki> SBATCH ‑‑mem=<mem_resources>'''
+== Creating a screen session, running a job in it, detaching from it ==
-'''<nowiki>#</nowiki> SBATCH ‑‑cpus-per-task <cpu_amount>'''
+# Connect to the [[System#Login server|login server]] with SSH
+# From the login server shell, run <pre style="color: lightgrey; background: black;">screen</pre>
+# In the ''screen session'' ("screen") thus created (it has the look of an empty shell), launch your job with <code>srun</code>
+# ''Detach'' from the screen by pressing '''''ctrl + A''''' followed by '''''D''''': you will come back to the original login server shell, while your process will go on running in the screen
+# You can now close the SSH connection to the login server without damaging your running job
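Since the whole point of these steps is that <code>srun</code> runs inside a screen, it can help to check where you are before launching a long interactive job. Below is a minimal Bash sketch; it relies on the fact that <code>screen</code> sets the environment variable <code>STY</code> (the session name) in the shells it creates. The function names <code>in_screen</code> and <code>srun_in_screen</code> are hypothetical helpers, not commands available on Mufasa.

```shell
#!/bin/bash
# Sketch: guard against launching an interactive srun job outside screen.
# Relies on screen setting the STY environment variable inside its sessions.
# in_screen and srun_in_screen are hypothetical helper names.

# Succeeds (exit status 0) only if this shell runs inside a screen session
in_screen() {
    [[ -n ${STY:-} ]]
}

# Runs srun with the given arguments only when inside a screen session,
# so that closing the SSH connection later does not kill the job
srun_in_screen() {
    if ! in_screen; then
        echo "Not inside a screen session: run 'screen' first" >&2
        return 1
    fi
    srun "$@"
}

# Example usage (replace the placeholders with your own srun options and command):
#   srun_in_screen [options] <command_to_be_run_via_SLURM>
```

The functions could be defined, for instance, in your <code>~/.bashrc</code> on the login server; this is a suggestion, not a Mufasa requirement.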
-'''<nowiki>#</nowiki> SBATCH ‑‑time=<d-hh:mm:ss>'''
+== Reattaching to an active screen session ==
-: <nowiki>#</nowiki> The following directives (not described [[User Jobs#Using SLURM to run a Docker container|so far]]) activate SLURM's email notifications:
+# Connect to the [[System#Login server|login server]] with SSH
+# In the login server shell, run <pre style="color: lightgrey; background: black;">screen -r</pre>
+# You are now back to the screen where you launched your job
-: <nowiki>#</nowiki> the first specifies where they are sent; the following 3 set up notifications start/end/failure of job execution
+== Closing (i.e. destroying) a screen session ==
-'''<nowiki>#</nowiki> SBATCH --mail-user <email_address>'''
+When you do not need a screen session anymore:
-'''<nowiki>#</nowiki> SBATCH --mail-type BEGIN'''
+# reattach to the active screen session as explained [[#Reattaching to an active screen session|above]]
+# destroy the screen by pressing '''ctrl + A''' followed by '''\''' (i.e., backslash), then confirming that you really want to proceed
-'''<nowiki>#</nowiki> SBATCH --mail-type END'''
+Of course, any program (including SLURM jobs) running within the screen gets terminated when the screen is destroyed.
-'''<nowiki>#</nowiki> SBATCH --mail-type FAIL'''
+= Using <code>salloc</code> to reserve resources =
-<nowiki>#</nowiki>----------------end of preamble----------------
+== What is <code>salloc</code>? ==
-'''<nowiki>#</nowiki> srun <command_to_run_within_container>'''
+[https://slurm.schedmd.com/salloc.html <code>salloc</code>] is a SLURM command that allows a user to reserve a set of resources (e.g., a 40 GB GPU) for a given time in the future.
-: <nowiki>#</nowiki> to run the user job, either uncomment (and personalise) the above srun command or use the [[Docker#Preparation|entrypoint]] of the Docker container
+The typical use of <code>salloc</code> is to "book" an interactive session where the user enjoys '''complete control of a set of resources''', chosen by the user. Within the "booked" session, any job run by the user that relies on the reserved resources is put into execution by SLURM immediately.
-</blockquote>
-== Nvidia Pyxis ==
+More precisely:
+* the user, using <code>salloc</code>, specifies what resources they need and the time when they will need them;
+* when the delivery time comes, SLURM creates an interactive shell session for the user;
+* within such session, the user can use <code>srun</code> and <code>sbatch</code> to run programs, enjoying full (i.e. not shared with anyone else) and immediate access to the resources.
-Some of the options described below are specifically dedicated to Docker containers: these are provided by the [https://github.com/NVIDIA/pyxis Nvidia Pyxis] package that has been installed on Mufasa as an adjunct to SLURM. Pyxis allows unprivileged users (i.e., those that are not administrators of Mufasa) to execute containers and run commands within them.
+Resource reservation with <code>salloc</code> is only possible if the request is made in advance of the delivery time. The higher the demand for the resources that the user wants to reserve, the earlier the request should be made to ensure that SLURM is able to fulfil it.
-More specifically, options <code>‑‑container-image</code>, <code>‑‑no‑container‑entrypoint</code>, <code>‑‑container-mounts</code> are provided to <code>srun</code> by Pyxis.
+When a user requests resources with <code>salloc</code>, the request (called an '''allocation''') is added to SLURM's job queue for the relevant partition as a job in <code>pending</code> (<code>PD</code>) state (job states are described [[User_Jobs#Interpreting Job state as provided by squeue|here]]). This is because, for SLURM, executing a user job is a two-step process: first the resources are allocated, then the program is run and allowed to use the allocated resources. Using <code>salloc</code> amounts to having SLURM perform only the first step (resource allocation), while leaving the second (running programs) to the user.
-See the  [https://github.com/NVIDIA/pyxis Nvidia Pyxis github page] for additional information about the options that it provides to <code>srun</code>.
+Until the delivery time specified by the user comes, the allocation remains in state <code>PD</code>, and other jobs requesting the same resources can be executed in the meantime, even if they were submitted later. While the request waits for the delivery time, however, its priority increases over time: the longer the allocation stays in the <code>PD</code> state, the higher its priority becomes. So, by requesting resources with <code>salloc</code> '''well in advance of the delivery time''', users can ensure that the resources they need will be ready for them at the requested delivery time, even if these resources are highly contended.
-== Launching a user job from within a Docker container ==
+== <code>salloc</code> commands ==
-For interactive user jobs, once the Docker container (run as [[User Jobs#Using SLURM to run a Docker container|explained here]]) is up and running, the user is dropped to the interactive environment specified by <code><command_to_run_within_container></code>. This interactive environment can be, for instance, a bash shell or an interactive Python console. Once inside the interactive environment, the user can simply run the required program in the usual way (depending on the type of environment).
+<code>salloc</code> commands use a syntax similar to that of <code>srun</code> commands. In particular, <code>salloc</code> lets a user specify the resources they need and, importantly, a '''delivery time''' for the requested resources (a delivery time can also be specified with <code>srun</code>, but in that case it is not very useful).
-Please note that the interactive environment of the Docker container does not have any relation with Mufasa's system. The only contact point is the part of Mufasa's filesystem that has been grafted to the container's filesystem via the <code>‑‑container‑mounts</code> option of <code>srun</code>. In particular, none of the software packages (such as the Nvidia drivers) installed on Mufasa are available in the container, unless they have been installed in it at preparation time (as explained in [[Docker]]), or manually after the container is put in execution.
+The typical <code>salloc</code> command has this form:
-Also note that, once a Docker container launched with <code>srun</code> is in execution, its own bash shell is completely indistinguishable from the bash shell of Mufasa where the <code>srun</code> command that put the container in execution was issued. The two shells share the same terminal window. The only clue to the fact that you now are, in fact, in the container's shell may be the command prompt, which should now show your location as <code>/opt</code>.
+<pre style="color: lightgrey; background: black;">
+salloc [general_SLURM_options] --begin=<time>
+</pre>
+where
-= Detaching from a running job with <code>screen</code> =
+:; [general_SLURM_options]
+:: represents the options already described in [[#Options of srun and sbatch|Options of srun and sbatch]]
-A consequence of the way <code>srun</code> operates is that if you launch an [[User Jobs#Interactive and non-interactive user jobs|interactive user job]], the shell where the command is running must remain open: if it closes, the job terminates. That shell runs in the terminal of your own PC where the [[System#Accessing Mufasa|SSH connection to Mufasa]] exists.
+:;<nowiki>--begin=<time></nowiki>
+:: specifies the delivery time of the resources reserved with <code>salloc</code>, according to the syntax described below. The delivery time must be a future time.
-If you do not plan to keep the SSH connection to Mufasa open (for instance because you have to turn off or suspend your PC), there is a way to keep your interactive job alive. Namely, you should use command <code>srun</code> inside a ''screen session'' (often simply called "a screen"), then ''detach'' from the ''screen'' ([https://linuxize.com/post/how-to-use-linux-screen/ here] is one of many tutorials about <code>screen</code> available online).
+=== Syntax of parameter <code>--begin</code> ===
-Once you have detached from the screen session, you can close the SSH connection to Mufasa without damage. When you need to reach your (still running) job again, you can can open a new SSH connection to Mufasa and then ''reattach'' to the ''screen''.
+If the allocation is for the current day, you can specify <nowiki><time></nowiki> as hours and minutes in the form
-A use case for screen is writing your program in such a way that it prints progress advancement messages as it goes on with its work. Then, you can check its advancement by periodically reconnecting to the screen where the program is running and reading the messages it printed.
+:<code>HH:MM</code>
-Basic usage of <code>screen</code> is explained below.
+If you want to specify a time on a different day, the form for <nowiki><time></nowiki> is <code>YYYY-MM-DDTHH:MM</code> (optionally followed by <code>:SS</code>), where the uppercase 'T' separates the date from the time.
-== Creating a screen session, running a job in it, detaching from it ==
+It is also possible to specify <nowiki><time></nowiki> relative to the current time, in one of the following forms:
-# Connect to Mufasa with SSH
+: <code>now+Kminutes</code>
-# From the Mufasa shell, run <pre style="color: lightgrey; background: black;">screen</pre>
+: <code>now+Khours</code>
-# In the ''screen session'' ("screen") thus created (it has the look of an empty shell), launch your job with <code>srun</code>
+: <code>now+Kdays</code>
-# ''Detach'' from the screen by pressing '''''ctrl + A''''' followed by '''''D''''': you will come back to the original Mufasa shell, while your process will go on running in the screen
+where K is a (positive) integer.
-# You can now close the SSH connection to Mufasa without damaging your running job
-== Reattaching to an active screen session ==
+Examples:
-# Connect to Mufasa with SSH
+: <code>--begin=16:00</code>
-# In the Mufasa shell, run <pre style="color: lightgrey; background: black;">screen -r</pre>
+: <code>--begin=now+1hours</code>
-# You are now back to the screen where you launched your job
+: <code>--begin=now+1days</code>
+: <code>--begin=2030-01-20T12:34:00</code>
-== Closing (i.e. destroying) a screen session ==
+Note that Mufasa's time zone is GMT (shown as UTC by command <code>date</code>), so <nowiki><time></nowiki> must be expressed in GMT as well. To check Mufasa's current time, use the command
-When you do not need a screen session anymore:
-# reattach to the screen as explained above
+<pre style="color: lightgrey; background: black;">
-# destroy the screen by pressing '''ctrl + A''' followed by '''\''' (i.e., backslash)
+date
+</pre>
-Of course, any job running within the screen gets terminated when the screen is destroyed.
+It provides an output similar to the following:
+<pre style="color: lightgrey; background: black;">
+Thu Nov 10 16:43:30 UTC 2022
+</pre>
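Because <nowiki><time></nowiki> must be given in GMT, it can be convenient to let <code>date</code> do the conversion from your own time zone. Here is a minimal Bash sketch, assuming GNU <code>date</code> (the version found on typical Linux systems); the function name <code>begin_utc</code> is a hypothetical helper, and its argument can be any date string that GNU <code>date -d</code> understands, including an explicit UTC offset.

```shell
#!/bin/bash
# Sketch: turn a date/time expressed in any time zone into the
# YYYY-MM-DDTHH:MM:SS form accepted by --begin, expressed in GMT/UTC.
# Assumes GNU date; begin_utc is a hypothetical helper name.

begin_utc() {
    # -d parses the given date string, -u prints the result in UTC
    date -u -d "$1" +%Y-%m-%dT%H:%M:%S
}

# Example usage: reserve resources for 13:34 Central European Time
# (UTC+1 in winter); the other salloc options are placeholders:
#   salloc [general_SLURM_options] --begin="$(begin_utc '2030-01-20 13:34 +0100')"
```

In the usage line, <code>begin_utc</code> produces <code>2030-01-20T12:34:00</code>, i.e. the same value used in the last <code>--begin</code> example above.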
-= Automatic job caching =
+== How to use <code>salloc</code> ==
-When a job is run via SLURM (with or without an execution script), Mufasa exploits a (fully tranparent) caching mechanism to speed up its execution. The speedup is obtained by removing the need for the running job to execute accesses to the (mechanical and therefore relatively slow) HDDs where <code>/home</code> partitions reside, substituting them with accesses to (solid-state and therefore much faster) SSDs.
+In the typical scenario, <code>salloc</code> is used together with [[User_Jobs#Detaching from a running job with screen|screen]]. Command <code>screen</code> creates a shell session (called "a screen") that can be left without closing it ([[#Creating_a_screen_session.2C_running_a_job_in_it.2C_detaching_from_it|detaching from the screen]]) and resumed at a later time ([[#Reattaching_to_an_active_screen_session|reattaching to the screen]]). This means that a user can create a screen, run <code>salloc</code> within it to create an allocation for time X, detach from the screen, and reattach to it just before time X to use the reserved resources from the interactive session created by <code>salloc</code>.
-Each time a job is run via SLURM, this is what happens automatically:
+More precisely, the operations needed to do this are the following:
-# Mufasa temporarily copies code and associated data from the directory where the executables are located (in the user's own <code>/home</code>) to a cache space located on system SSDs
+# Connect to the [[System#Login server|login server]].
-# Mufasa launches the cached copy of the user executables, using the cached copies of the data as its input files
+# From the login server shell, run <pre style="color: lightgrey; background: black;">screen</pre>
-# The executables create their output files in the cache space
+# In the ''screen session'' ("screen") thus created run the [[#salloc commands|<code>salloc</code> command]], specifying via its options the resources you need and the time at which you want them delivered.
-# When the user jobs end, Mufasa copies the output files from the cache space back to the user's own <code>/home</code>
+# SLURM will respond with a message similar to <pre style="color: lightgrey; background: black;">salloc: Pending job allocation XXXX</pre>
+# ''Detach'' from the screen by pressing '''''ctrl + A''''' followed by '''''D''''': you will come back to the original login server shell.
+# You can now close the SSH connection to the login server without damaging your resource allocation request.
+# At the delivery time you specified in the [[#salloc commands|<code>salloc</code> command]], connect to the login server with SSH.
+# Once you are in the login server shell, reattach to the screen with command <pre style="color: lightgrey; background: black;">screen -r</pre>
+# You are now back to the screen where you used <code>salloc</code>; as soon as SLURM provides you with the resources you reserved, a shell prompt appears after the message "''salloc: Pending job allocation XXXX''".
+# You are now in the interactive shell session you booked with <code>salloc</code>. From here, you can run any programs you want, including <code>srun</code> and <code>sbatch</code>. For the whole duration of the allocation, your programs have unrestricted use of all the resources you reserved with <code>salloc</code>.<br>'''Important!''' Any job run within the shell session is subject to the time limit (i.e., maximum duration) imposed by the partition it is running on! Therefore, if the job reaches the time limit, it is '''forcibly terminated''' by SLURM. Termination depends only on the time limit, so it occurs even if the end time of the allocation has not been reached yet. (Of course, the job is also terminated if the allocation ends.)
+# Once the interactive shell session is not needed anymore, cancel it by exiting from the session with <pre style="color: lightgrey; background: black;">exit</pre> (Note that if you get to the end of the time period you specified in your request without closing the shell session, SLURM does it for you, killing any programs still running.)
+# You are now back to your screen. Destroy it by pressing '''ctrl + A''' followed by '''\''' (i.e., backslash) to get back to the login server shell.
-The whole process is completely transparent to the user. The user simply prepares the executable (or the [[User Jobs# Using execution scripts to wrap user jobs|execution script]]) in a subdirectory of their <code>/home</code> directory and runs the job. When job execution is complete, the user finds their output data in the origin subdirectory of <code>/home</code>, exactly as if the execution actually occurred there.
+== Cancelling a resource request made with <code>salloc</code> ==
-'''Important!''' The caching mechanism requires that ''during job execution'' the user does not modify the contents of the <code>/home</code> subdirectory where executable and data were at execution time. Any such change, in fact, will be overwritten by Mufasa at the end of the execution, when files are copied back from the caching space.
+To cancel a request for resources made as explained in [[#How to use salloc|How to use <code>salloc</code>]], follow these steps:
+# Connect to the [[System#Login server|login server]] with SSH.
+# Once you are in the login server shell, reattach to the screen where you used command <code>salloc</code> with command <pre style="color: lightgrey; background: black;">screen -r</pre>
+# You should see the message "''salloc: Pending job allocation XXXX''" (if the allocation is still pending) or "''salloc: job XXXX queued and waiting for resources''" (if the allocation has been made and is waiting for its start time). Now just press '''Ctrl + C'''. This tells SLURM that you want to cancel your request for resources.
+# SLURM will communicate the cancellation with message <pre style="color: lightgrey; background: black;">salloc: Job allocation XXXX has been revoked.</pre>
+# Destroy the screen by pressing '''ctrl + A''' followed by '''\''' (i.e., backslash) to get back to the login server shell.
 = Monitoring and managing jobs =
@@ Line 556: / Line 555: @@
 This output comprises the following information:
-; JOBID
+:; JOBID
-: Numerical identifier of the job assigned by SLURM
+:: Numerical identifier of the job assigned by SLURM
-: This identifier is used to intervene on the job, for instance with <code>scancel</code>
+:: This identifier is used to intervene on the job, for instance with <code>scancel</code>
-; PARTITION
+:; PARTITION
-: the partition that the job is run on
+:: the partition that the job is run on
-; NAME
+:; NAME
-: the name assigned to the job; can be personalised using the <code>--job-name</code> option
+:: the name assigned to the job; can be personalised using the <code>--job-name</code> option
-; USER
+:; USER
-: username of the user who launched the job
+:: username of the user who launched the job
-; ST
+:; ST
-: job state (see [[User Jobs#Job state|Job state]] for further information)
+:: job state (see [[SLURM#Job state|Job state]] for further information)
-; TIME
+:; TIME
-: time that has passed since the beginning of job execution
+:: time that has passed since the beginning of job execution
-; NODES
+:; NODES
-: number of nodes where the job is being executed (for Mufasa, this is always 1 as it is a single machine)
+:: number of nodes where the job is being executed (for Mufasa, this is always 1 as it is a single machine)
-; NODELIST (REASON)
+:; NODELIST (REASON)
-: name of the nodes where the job is being executed: for Mufasa it is always <code>gn01</code>, which is the name of the node corresponding to Mufasa.
+:: name of the nodes where the job is being executed: for Mufasa it is always <code>gn01</code>, which is the name of the node corresponding to Mufasa.
@@ Line 588: / Line 587: @@
 </pre>
-=== Interpreting Job state ===
+=== Interpreting Job state as provided by <code>squeue</code> ===
 Jobs typically pass through several states in the course of their execution. Job state is shown in column "ST" of the output of <code>squeue</code> as an abbreviated code (e.g., "R" for RUNNING).
@@ Line 594: / Line 593: @@
 The most relevant codes and states are the following:
-; PD PENDING
+:'''<code>PD</code>''' PENDING
-: Job is awaiting resource allocation.
+:: Job is awaiting resource allocation.
-; R RUNNING
+:'''<code>R</code>''' RUNNING
-: Job currently has an allocation.
+:: Job currently has an allocation.
-; S SUSPENDED
+:'''<code>S</code>''' SUSPENDED
-: Job has an allocation, but execution has been suspended and CPUs have been released for other jobs.
+:: Job has an allocation, but execution has been suspended and CPUs have been released for other jobs.
-; CG COMPLETING
+:'''<code>CG</code>''' COMPLETING
-: Job is in the process of completing. Some processes on some nodes may still be active.
+:: Job is in the process of completing. Some processes on some nodes may still be active.
-; CD COMPLETED
+:'''<code>CD</code>''' COMPLETED
-: Job has terminated all processes on all nodes with an exit code of zero.
+:: Job has terminated all processes on all nodes with an exit code of zero.
 Beyond these, there are other (less frequent) job states. [https://slurm.schedmd.com/squeue.html The SLURM doc page for <code>squeue</code>] provides a complete list of them.
-== Knowing when jobs are expected to start ==
+== Knowing when jobs are expected to end or start ==
+To find out when jobs are expected to start or end, use the command
+<pre style="color: lightgrey; background: black;">
+squeue -o "%5i %8u %10P %.2t |%19S |%.11L|"
+</pre>
+which provides an output similar to the following:
+<pre style="color: lightgrey; background: black;">
+JOBID USER     PARTITION  ST |START_TIME          |  TIME_LEFT|
+  thuynh   fat        PD |2022-11-11T17:55:54 | 3-00:00:00|
+  thuynh   fat        PD |2022-11-11T17:55:54 | 3-00:00:00|
+  cziyang  fat         R |2022-11-08T16:58:03 | 1-00:48:14|
+  thuynh   fat         R |2022-11-10T08:13:30 | 2-16:03:41|
+  gnannini fat         R |2022-11-08T17:55:54 | 1-01:46:05|
+  ssaitta  gpu         R |2022-11-10T08:13:00 |    6:03:11|
+  dmilesi  gpulong     R |2022-11-10T15:11:32 | 2-23:01:43|
+  cziyang  gpulong     R |2022-11-10T09:45:01 | 1-17:35:12|
+</pre>
+;:For running jobs (state <code>R</code>):
+::column "START_TIME" tells you when the job started its execution
+::column "TIME_LEFT" tells you how much remains of the running time requested by the job
+;:For pending jobs (state <code>PD</code>):
+::column "START_TIME" tells you when the job is expected to start its execution
+::column "TIME_LEFT" tells you how much running time has been requested by the job
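Values in the TIME_LEFT column use SLURM's <code>[days-]hours:minutes:seconds</code> format (<code>minutes:seconds</code> when less than an hour is involved), which is awkward to compare by eye. The following Bash sketch converts such values to seconds so that they can be sorted numerically; <code>slurm_time_to_seconds</code> is a hypothetical helper, and it only handles the formats just described.

```shell
#!/bin/bash
# Sketch: convert a SLURM duration such as "3-00:00:00", "6:03:11" or "45:12"
# (as shown in the TIME_LEFT column of squeue) into a number of seconds.
# Handles days-hours:minutes:seconds, hours:minutes:seconds and
# minutes:seconds only. slurm_time_to_seconds is a hypothetical helper name.

slurm_time_to_seconds() {
    local t="$1" days=0 h=0 m=0 s=0
    local a b c
    local re='^([0-9]+-[0-9]+:[0-9]+:[0-9]+|[0-9]+(:[0-9]+){1,2})$'

    # Reject anything else (e.g. "UNLIMITED" or "INVALID")
    if [[ ! $t =~ $re ]]; then
        echo "unsupported time format: $t" >&2
        return 1
    fi

    # Optional "days-" prefix
    if [[ $t == *-* ]]; then
        days=${t%%-*}
        t=${t#*-}
    fi

    IFS=':' read -r a b c <<< "$t"
    if [[ -n $c ]]; then
        h=$a m=$b s=$c      # hours:minutes:seconds
    else
        m=$a s=$b           # minutes:seconds
    fi

    # 10# forces base 10, so values such as "08" are not read as octal
    echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

# Example usage: running jobs sorted by remaining time (smallest first)
#   squeue -h -t R -o "%i %L" | while read -r id left; do
#       echo "$id $(slurm_time_to_seconds "$left")"
#   done | sort -k2 -n
```

For example, <code>1-17:35:12</code> (one of the TIME_LEFT values above) becomes 149712 seconds.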
-Command
+'''Important!''' Start and end times are forecasts based on the jobs currently in the queues, and may change if running jobs end prematurely and/or if new jobs with higher priority are added to the queues. These times should therefore never be considered certain.
+If you simply want to know when pending jobs (state <code>PD</code>) are expected to begin execution, use
 <pre style="color: lightgrey; background: black;">
@@ Line 619: / Line 648: @@
 </pre>
-lists pending jobs in order of increasing start time (so the job on top is the one which will start first).
+which lists pending jobs in order of increasing START_TIME (the job on top is the one which will be run first). For each pending job the command provides an output similar to the example below:
-Its output is similar to this:
 <pre style="color: lightgrey; background: black;">
 JOBID PARTITION     NAME     USER ST          START_TIME  NODES SCHEDNODES           NODELIST(REASON)
        fat training   thuynh PD 2022-10-27T09:28:01      1 (null)               (Resources)
-[...info about other pending jobs...]
 </pre>
-where execution time is in column "START TIME".
-'''Important!''' Execution times are forecasts based on current job requests, so they should not be considered as certain.
 == Getting detailed information about a job ==
@@ Line 644: / Line 666: @@
 <pre style="color: lightgrey; background: black;">
-JobId=936 JobName=bash
+JobId=65 JobName=test_script.sh
-    UserId=acasella(1001) GroupId=acasella(1001) MCS_label=N/A
+    UserId=gfontana(10003) GroupId=gfontana(10004) MCS_label=N/A
-    Priority=7885 Nice=0 Account=research QOS=normal
+    Priority=14208 Nice=0 Account=admin QOS=nogpu
     JobState=RUNNING Reason=None Dependency=(null)
     Requeue=0 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
-    RunTime=03:21:59 TimeLimit=3-00:00:00 TimeMin=N/A
+    RunTime=00:00:55 TimeLimit=01:00:00 TimeMin=N/A
-    SubmitTime=2022-02-08T11:57:24 EligibleTime=2022-02-08T11:57:24
+    SubmitTime=2025-11-06T10:31:10 EligibleTime=2025-11-06T10:31:10
-    AccrueTime=Unknown
+    AccrueTime=2025-11-06T10:31:10
-    StartTime=2022-02-08T11:57:24 EndTime=2022-02-11T11:57:24 Deadline=N/A
+    StartTime=2025-11-06T10:31:10 EndTime=2025-11-06T11:31:10 Deadline=N/A
-   PreemptEligibleTime=2022-02-08T11:57:24 PreemptTime=None
+    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-11-06T10:31:10 Scheduler=Main
-    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-02-08T11:57:24 Scheduler=Main
+    Partition=jobs AllocNode:Sid=mufasa2-login:42020
-    Partition=fat AllocNode:Sid=rk018445:4034
     ReqNodeList=(null) ExcNodeList=(null)
     NodeList=gn01
     BatchHost=gn01
-    NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
+    NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
-    TRES=cpu=8,mem=128G,node=1,billing=8,gres/gpu:40gb=1
+    ReqTRES=cpu=1,mem=4G,node=1,billing=1
+   AllocTRES=cpu=1,mem=4G,node=1,billing=1
     Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
-    MinCPUsNode=8 MinMemoryNode=128G MinTmpDiskNode=0
+    MinCPUsNode=1 MinMemoryNode=4G MinTmpDiskNode=0
     Features=(null) DelayBoot=00:00:00
-    OverSubscribe=YES Contiguous=0 Licenses=(null) Network=(null)
+    OverSubscribe=OK Contiguous=0 Licenses=(null) LicensesAlloc=(null) Network=(null)
-    Command=/bin/bash
+    Command=./test_script.sh
-    WorkDir=/home/acasella
+    WorkDir=/home/gfontana
-   Power=
-   TresPerNode=gres:gpu:40gb:1
 </pre>
-In particular, the line beginning with ''"StartTime="'' provides expected times for the start and end of job execution. As explained in [[User_Jobs#Knowing_when_jobs_are_expected_to_start|Knowing when jobs are expected to start]], start time is only a prediction and subject to change.
+In particular, the line beginning with ''"StartTime="'' provides expected times for the start and end of job execution. As explained in [[User_Jobs#Knowing_when_jobs_are_expected_to_end_or_start|Knowing when jobs are expected to end or start]], start time is only a prediction and subject to change.
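When you only need one of these values (for instance in a script), you can extract it from the <code>Key=Value</code> pairs printed by <code>scontrol show job</code>. A minimal Bash sketch follows; <code>job_field</code> is a hypothetical helper name, and it assumes that the value of the requested field contains no spaces, which holds for fields such as <code>StartTime</code>, <code>EndTime</code> and <code>JobState</code> in the example above.

```shell
#!/bin/bash
# Sketch: print the value of one Key=Value field from the output of
# `scontrol show job <JOBID>`, read from standard input.
# Assumes the value contains no spaces; job_field is a hypothetical name.

job_field() {
    local key="$1"
    awk -v key="$key" '
        {
            for (i = 1; i <= NF; i++) {
                # Match "key=" exactly at the start of the field,
                # so that e.g. "Time" does not match "StartTime="
                if (index($i, key "=") == 1) {
                    print substr($i, length(key) + 2)
                    exit
                }
            }
        }'
}

# Example usage (replace 65 with the JOBID shown by squeue):
#   scontrol show job 65 | job_field EndTime
```

If the field is not present, nothing is printed.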
+== Cancelling a job with <code>scancel</code> ==
+It is possible to cancel a job using command <code>scancel</code>, either while it is waiting for execution or while it is running (in the latter case you can choose which system signal to send to the process in order to terminate it).
-== Canceling a job with <code>scancel</code> ==
+Please note that [[System#Job priority|job priority]] for your user depends, among other factors, on the overall duration of the jobs you have run on Mufasa. Therefore, '''cancelling jobs that are not needed anymore improves your future jobs' priority'''.
-It is possible to cancel a job using command <code>scancel</code>, either while it is waiting for execution or when it is in execution (in this case you can choose what system signal to send the process in order to terminate it). The following are some examples of use of <code>scancel</code> adapted from [https://slurm.schedmd.com/scancel.html SLURM's documentation].
+The following are some examples of use of <code>scancel</code> adapted from [https://slurm.schedmd.com/scancel.html SLURM's documentation].
 <pre style="color: lightgrey; background: black;">
@@ Line 702: / Line 726: @@
 <pre style="color: lightgrey; background: black;">
-sacct -X
+sacct -X --format=User%-10,Start,End,SubmitLine%-100
 </pre>
-provides a list of all jobs run today by your user.
+provides a list of all <code>srun</code> or <code>sbatch</code> commands executed by your user since midnight (i.e., since the beginning of the current day). For each job, the command shows your username, the start and end time, and the command you used to run it (if needed, you can increase the value after <code>SubmitLine%-</code> to show longer commands in full).

Difference between revisions of "User Jobs"

Running jobs with SLURM

srun and sbatch

Options of srun and sbatch

Interactive jobs

Non-interactive jobs

Executing jobs on Mufasa

Key concept: use containers!

Interactive and non-interactive user jobs

Using SLURM to run an interactive job on Mufasa

Interaction between container filesystem and local filesystem

How to know if your shell is a SLURM job

Using SLURM to run a non-interactive job on Mufasa

Job output

Cancelling completed jobs

Looking for unused GPUs

Detaching from a running job with screen

Creating a screen session, running a job in it, detaching from it

Reattaching to an active screen session

Closing (i.e. destroying) a screen session

Using salloc to reserve resources

What is salloc?

salloc commands

Syntax of parameter --begin

How to use salloc

Cancelling a resource request made with salloc

Monitoring and managing jobs

Inspecting jobs with squeue

Interpreting Job state as provided by squeue

Knowing when jobs are expected to end or start

Getting detailed information about a job

Cancelling a job with scancel

Knowing what jobs you ran today
