Mufasa is a Linux server located in a server room managed by the [[Roles|System Administrators]].
 
[[Roles|Job Users]] and [[Roles|Job Administrators]] can only access Mufasa remotely.  


Remote access to Mufasa is performed using the [[System#Accessing Mufasa|SSH protocol]] for the execution of commands and the [[System#File transfer|SFTP protocol]] for the exchange of files. Once logged in, a user interacts with Mufasa via a terminal (text-based) interface.


= <span style="background:#ffff00">Mufasa 2.0</span> =
At the beginning of November 2025, Mufasa underwent a comprehensive hardware and software overhaul: the new system is sometimes called '''Mufasa 2.0'''; to distinguish it from the old system, the latter is sometimes called "Mufasa 1.0".


This wiki is currently (November 2025) being updated to reflect the changes. Elements that changed significantly from Mufasa 1.0 to Mufasa 2.0 are <span style="background:#ffff00">highlighted in yellow</span>. When the title of a section is highlighted, it means either that it is a new section, or that the section's contents significantly changed.


= Hardware =


[[File:hw.png|right|320px]]
Mufasa is a server for massively parallel computation. It has been set up and configured by [https://www.e4company.com/en/ E4 Computer Engineering] with the support of the [http://www.biomech.polimi.it/ Biomechanics Group], the [http://www.cartcas.polimi.it/ CartCasLab] laboratory and the [https://nearlab.polimi.it/ NearLab] laboratory.
 
Mufasa's main hardware components are:


* Supermicro A+ Server 4124GS-TNR
* 2 AMD Epyc 7542 32-core processors (64 CPU cores total)
* 1 TB RAM
* 7 TB of SSDs <span style="background:#ffff00">(fast temporary repository for datasets actively being used - see [[#Storage spaces|Storage spaces]])</span>
* 25 TB of HDDs (for user <code>/home</code> directories)
* <span style="background:#ffff00">8 Nvidia A100 GPUs</span> [based on the ''Ampere'' architecture]
* [https://ubuntu.com/ Ubuntu Linux 24.04 LTS server] operating system
 
System resources are shared among different users and processes in order to optimise their usage and availability. This sharing is managed by [[System#The SLURM job scheduling system|SLURM]].
 
== CPUs and GPUs ==
 
Mufasa is fitted with two 32-core CPUs, so the system has a total of 64 physical CPU cores. Most of these cores are reserved for the [[System#The SLURM job scheduling system|SLURM job scheduling system]] and can only be accessed via SLURM; the remaining few are used by the [[#Login server|login server]].


As for the GPUs, some of the physical A100 GPUs have been subdivided into “virtual” GPUs with different capabilities using [https://docs.nvidia.com/datacenter/tesla/mig-user-guide/ Nvidia's MIG system]. Precisely, 5 of the A100 GPUs have each been subdivided into two virtual GPUs, each possessing half the RAM of the original device (i.e., 20 GB). Since the A100 has 7 compute units onboard, one of the two virtual GPUs built out of a single A100 has 3 compute units, while the other has 4.


All in all, the GPU complement of Mufasa comprises the following devices:


:; <span style="background:#FFFF00">5 GPUs with 20 GB of RAM and 3 compute units</span>
:; <span style="background:#FFFF00">5 GPUs with 20 GB of RAM and 4 compute units</span>
:; <span style="background:#FFFF00">3 GPUs with 40 GB of RAM</span>


Thanks to MIG, users can use all the GPUs listed above as if they were all physical devices installed on Mufasa, without having to worry (or even know) which actually are and which instead are virtual GPUs. How these devices are made available to Mufasa users is explained in [[SLURM]].


You can use command


<pre style="color: lightgrey; background: black;">
nvidia-smi -L
</pre>


to get in-depth information about the physical and virtual GPUs available to users in a system based on MIG. (On Mufasa, this command needs to be launched in a bash shell opened through SLURM in order to be able to access the GPUs.)


= Accessing Mufasa =


== <span style="background:#ffff00">Login server</span> ==
 
Unlike Mufasa 1.0, Mufasa 2.0 employs a ''login server'' to manage user connections. The login server is a Linux virtual machine with very limited resources (no GPUs, few CPUs, little RAM). Its only task is to provide users with a way to log into the system and launch [[User Jobs]] with SLURM. Jobs launched via SLURM run on Mufasa 2.0's physical hardware (not on the virtual hardware of the login server) and can therefore access Mufasa's hardware resources, such as the GPUs.
 
When you access Mufasa via SSH, the remote shell you are provided with is a shell on the login server, which cannot run computationally heavy tasks: for those you have to [[User Jobs|launch a SLURM job]]. The only tasks you can execute directly from the login server shell are simple "housekeeping" tasks on your home directory, such as deleting files you no longer need.
 
Please note that if you try to run computationally heavy processes in the login server you can easily overwhelm its scarce resources, making it unavailable to all users and thus making Mufasa unreachable by anyone. The login server has safety mechanisms to prevent processes from hogging too much of its resources... by killing such processes. However, please avoid checking if these mechanisms work.
 
== <span style="background:#ffff00">Logging into the login server</span> ==
 
User access to Mufasa is always remote and exploits the ''SSH'' (''Secure SHell'') protocol.  
 
Access to Mufasa is not direct: instead, it is managed by a [[#Login server|login server]]. Once the user has logged into the login server, they can issue commands to [[#SLURM|SLURM]] to [[User Jobs|run processing jobs]].
 
To open a remote connection to the login server, open a local terminal on your computer and, in it, run command
 
<pre style="color: lightgrey; background: black;">
ssh <username>@10.79.23.96
</pre>
 
For example, user <code>mrossi</code> may access the login server with command
 
<code>ssh mrossi@10.79.23.96</code>
 
As soon as you launch the ''ssh'' command, you will be asked to type the password (i.e., the one of your user account on Mufasa). Once you provide the password, the local terminal on your computer becomes a remote terminal (a “remote shell”) through which you interact with the login server. The remote shell sports a ''command prompt'' such as


<pre style="color: lightgrey; background: black;">
<username>@mufasa2-login:~$
</pre>


(<code>mufasa2-login</code> is the Linux hostname of the login server). For instance, user <code>mrossi</code> will see a prompt similar to this:
 
<code>mrossi@mufasa2-login:~$</code>
 
Access via SSH works with the terminals of Linux, MacOS and Windows 10 (and later). For users of older Windows versions, a handy alternative tool (which also includes an X server, required to run Linux programs with a graphical user interface on Mufasa) is [https://mobaxterm.mobatek.net/ MobaXterm].
 
If you don't have a user account on Mufasa, you first have to ask your supervisor for one. See [[Users]] for more information.
 
In the remote shell to the login server opened via SSH, you can issue commands by typing them after the prompt, then pressing the ''enter'' key. Since Mufasa is a Linux server, it responds to all the standard Linux commands, such as <code>pwd</code> (which prints the path to the current directory) or <code>cd <destination_dir></code> (which changes the current directory). On the internet you can find many tutorials about the Linux command line, such as [https://linuxcommand.org/index.php this one].
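For readers new to the command line, here is a minimal example session (purely illustrative; these commands work in any Linux shell, not just on Mufasa):

```shell
pwd          # print the path of the current working directory
cd /tmp      # move to the /tmp directory
pwd          # now prints /tmp
cd ~         # "~" is shorthand for your home directory
ls -l        # list the contents of the current directory, with details
```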
 
To close the SSH session run


<pre style="color: lightgrey; background: black;">
exit
</pre>


from the command prompt of the remote shell.
 
== <span style="background:#ffff00">Direct access to Mufasa (for users with running SLURM jobs)</span> ==
 
While a user has SLURM jobs in execution, they may also log into Mufasa directly, without passing through the login server. This allows the user to interact with their running jobs, e.g. to monitor their progress. Direct access to Mufasa is done via SSH with command


<pre style="color: lightgrey; background: black;">
ssh <username>@10.79.23.97
</pre>


where both <code>username</code> and password are the same as those used for the [[#Login server|login server]].
For example, user <code>mrossi</code> may access Mufasa with command


<code>ssh mrossi@10.79.23.97</code>


As soon as you launch the ''ssh'' command, you will be asked to type the password (i.e., the one of your user account on Mufasa). Once you provide the password, the local terminal on your computer becomes a remote terminal (a “remote shell”) through which you interact with Mufasa. The remote shell sports a ''command prompt'' such as


<pre style="color: lightgrey; background: black;">
<username>@mufasa2:~$
</pre>


(<code>mufasa2</code> is the Linux hostname of Mufasa). For instance, user <code>mrossi</code> will see a prompt similar to this:


<code>mrossi@mufasa2:~$</code>
 
To a user logged into Mufasa this way, Mufasa appears to possess ''only the resources that SLURM has allocated to the running job(s) of that user''. For instance, if the user has a running job that requested 4 CPUs and a GPU, to that user Mufasa will appear to have only 4 CPUs and a single GPU.
 
A special case of direct access to Mufasa occurs when a user, from the login server, asks SLURM to execute an [[User_Jobs#Interactive_jobs|interactive job]]. When such a job goes into execution, it opens a shell directly on Mufasa: the shell the user was using to interact with the login server becomes a shell on Mufasa itself, and the command prompt changes from
 
<pre style="color: lightgrey; background: black;">
<username>@mufasa2-login:~$
</pre>
 
to
 
<pre style="color: lightgrey; background: black;">
<username>@mufasa2:~$
</pre>
 
Another way to know if the current shell is the “base” shell or one run via SLURM is to execute command
 
<pre style="color: lightgrey; background: black;">
echo $SLURM_JOB_ID
</pre>
 
If no number gets printed, this means that the shell is the “base” one. If a number is printed, it is the SLURM job ID of the <code>/bin/bash</code> process.
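The check above can be wrapped in a tiny snippet (a sketch; it relies only on the fact, stated above, that SLURM sets <code>SLURM_JOB_ID</code> inside jobs):

```shell
# Print which kind of shell this is, based on the SLURM_JOB_ID variable.
if [ -z "$SLURM_JOB_ID" ]; then
    echo "base shell (login server)"
else
    echo "shell run via SLURM, job ID $SLURM_JOB_ID"
fi
```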
 
To close the SSH session to Mufasa run


<pre style="color: lightgrey; background: black;">
exit
</pre>


== VPN ==

To be able to connect to Mufasa, your computer must belong to Polimi's LAN. This happens either because the computer is physically located at Politecnico di Milano and connected via ethernet, or because you are using Polimi's VPN (Virtual Private Network) to connect to its LAN from somewhere else (such as your home). In particular, using the VPN is the ''only'' way to use Mufasa from outside Polimi. See [https://intranet.deib.polimi.it/ita/vpn-wifi this DEIB webpage] for instructions about how to activate VPN access.


== SSH timeout ==


SSH sessions to Mufasa may be subject to an inactivity timeout: i.e., after a given period of inactivity the SSH session gets automatically closed. Users who need to reconnect to the very same shell where they launched a program (for instance because their program is interactive or because it prints progress updates) should [[User Jobs#Detaching from a running job with screen|use the ''screen'' command]].
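A client-side mitigation for inactivity timeouts is to make your SSH client send periodic keepalive messages. This is a sketch of a <code>~/.ssh/config</code> fragment on your own computer (the alias <code>mufasa-login</code> is an arbitrary name chosen here; <code>ServerAliveInterval</code> and <code>ServerAliveCountMax</code> are standard OpenSSH client options):

```
# ~/.ssh/config on your local machine (not on Mufasa)
Host mufasa-login
    HostName 10.79.23.96
    User <your_username_on_Mufasa>
    ServerAliveInterval 60     # send a keepalive message every 60 seconds
    ServerAliveCountMax 3      # give up after 3 unanswered keepalives
```

With this in place, <code>ssh mufasa-login</code> opens the connection. Note that keepalives do not replace ''screen'': if the connection drops anyway, only ''screen'' lets you reattach to the same shell.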


== SSH and graphics ==


The standard form of the ''ssh'' command, i.e. the one described at the beginning of [[system#Accessing Mufasa|Accessing Mufasa]], should always be preferred. However, it only allows text communication with Mufasa. In special cases it may be necessary to remotely run (on Mufasa) Linux programs that have a graphical user interface. These programs require interaction with the X server of the remote user's machine (which must use Linux as well). A special mode of operation of ''ssh'' is needed to enable this. This mode is engaged by running command <code>ssh</code> like this:


<pre style="color: lightgrey; background: black;">
ssh -X <your username on Mufasa>@<Mufasa's IP address>
</pre>


== File transfer ==


Uploading files from local machine to Mufasa and downloading files from Mufasa onto local machines is done using the ''SFTP'' protocol (''Secure File Transfer Protocol'').  


Linux and MacOS users can directly use the ''sftp'' package, as explained (for instance) by [https://geekflare.com/sftp-command-examples/ this guide]. Windows users can interact with Mufasa via SFTP protocol using the [https://mobaxterm.mobatek.net/ MobaXterm] software package. MacOS users can interact with Mufasa via SFTP also with the [https://cyberduck.io/ Cyberduck] software package.


For Linux and MacOS users, file transfer to/from Mufasa occurs via an ''interactive sftp shell'', i.e. a remote shell very similar to the one described [[#Logging into the login server|above]]. The first thing to do is to open a terminal and run the following command (note the similarity to SSH connections):


<pre style="color: lightgrey; background: black;">
sftp <username>@<IP_address>
</pre>


where <code>username</code> is your username on Mufasa, and <code><IP_address></code> is [[#Logging into the login server|the IP address of Mufasa]].


You will be asked your password. Once you provide it, you access an interactive sftp shell, where the command prompt takes the form


<pre style="color: lightgrey; background: black;">
sftp>
</pre>


From this shell you can run the commands to exchange files. Most of these commands have two forms: one to act on the remote machine (in this case, Mufasa) and one to act on the local machine (i.e. your own computer). To differentiate, the “local” versions usually have names that start with the letter “l” (lowercase L).


<pre style="color: lightgrey; background: black;">
cd <path>
</pre>
to change directory to <code><path></code> on the remote machine.


<pre style="color: lightgrey; background: black;">
lcd <path>
</pre>
to change directory to <code><path></code> on the local machine.
 
<pre style="color: lightgrey; background: black;">
get <filename>
</pre>
to download (i.e. copy) <code><filename></code> from the current directory of the remote machine to the current directory of the local machine.
 
<pre style="color: lightgrey; background: black;">
put <filename>
</pre>
to upload (i.e. copy) <code><filename></code> from the current directory of the local machine to the current directory of the remote machine.
 
Naturally, a user can only upload files to directories where they have write permission (usually only their own home directory and its subdirectories). Likewise, users can only download files from directories where they have read permission. (File permissions on Mufasa follow the standard Linux rules.)
 
In addition to the terminal interface, users of Linux distributions based on Gnome (such as Ubuntu) can use a handy graphical tool to exchange files with Mufasa. In Gnome's Nautilus file manager, write


<code>sftp://<username>@<IP_address></code>

in the address bar of Nautilus, where <code>username</code> is your username on Mufasa and <code><IP_address></code> is [[#Logging into the login server|the IP address of Mufasa]]. Nautilus becomes a graphical interface to Mufasa's remote filesystem.


= Using Mufasa =


This section provides a brief guide for Mufasa users (especially those who are not experienced in the use of Linux and/or remote servers) about interacting with the system.


== <span style="background:#FFFF00">Storage spaces</span> ==


User jobs require storage of programs and data files. On Mufasa, the space available to users for data storage is the <code>/home/</code> directory. <code>/home/</code> contains two types of directories:


:; Personal directories
:: ''Location and access''
::: Personal directories are in <code>/home/</code>
::: They are dedicated to individual users of Mufasa.
::: The home directory of user <code>UserName</code> is <code>/home/UserName/</code>
:: ''Usage''
::: The home directory of a user is their own personal space on Mufasa. Space is limited (see [[#Disk quotas|Disk quotas]]), so you'll need to do some "housekeeping" to avoid filling it up.
::: The general rule is: ''keep in your home directory '''only''' the files that the work you are doing on Mufasa '''right now''' needs; remove a file as soon as it is not needed anymore for your current work''.
::: ''Mufasa is not a storage space!''


:; Shared directories
:: ''Location and access''
::: Shared directories are in <code>/home/shared/</code>
::: They are dedicated to research groups, and each group decides internally how to manage the group's directory.
::: The shared directory of research group <code>GroupName</code> is <code>/home/shared/GroupName/</code>
::: Users who belong to the research group can read from and write to the directory.
::: Directory <code>/home/shared/common/</code> is available to all research groups.
::: Any user can read from and write to it.
:: ''Usage''
::: Shared directories are used:
::: -  To '''share data'''. If multiple users are using the same data, it makes sense to put the data in a shared directory instead of having multiple copies of them in each user's home directory.
::: -  For '''faster read/write'''. Shared spaces are physically located on faster disks than the personal home directories (SSDs instead of mechanical HDDs). When a processing job requires reading or writing very large amounts of data, placing such data in a shared directory can significantly speed up the job.
::: '''Important!''' Shared directories are used by several people, so it's important to ''quickly remove from them any file that is not actively in use''.
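As a practical aid for the housekeeping described above, <code>find</code> can list the files that have not been modified for a while and are therefore candidates for removal (a sketch; the path and the 30-day threshold are arbitrary examples):

```shell
DIR=/home/shared/common/   # example path: use the directory you manage
# List regular files not modified in the last 30 days, with their sizes.
find "$DIR" -type f -mtime +30 -exec ls -lh {} +
```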


== <span style="background:#FFFF00">Disk quotas</span> ==


In Mufasa, [[#Storage spaces|storage spaces]] are subject to quotas: i.e., the files stored in them cannot occupy more than a given amount of disk space. Quotas apply both to personal directories (e.g., <code>/home/userX/</code>) and to shared directories (e.g., <code>/home/shared/ResearchGroupY/</code>).

The quota assigned to user <code>userX</code>, and how much of it is currently in use, can be inspected with command

<pre style="color: lightgrey; background: black;">
df -h /home/userX
</pre>
 
When <code>df</code> is run from the [[#Logging into the login server|login server]], its output is similar to the following:
 
<pre style="color: lightgrey; background: black;">
gfontana@mufasa2-login:~$ df -h /home/gfontana/
Filesystem        Size  Used Avail Use% Mounted on
192.168.1.1:/home  200G  161G  40G  81% /home
</pre>
 
Option <code>-h</code> provides ''human-readable'' values using measurement units such as K (KBytes), M (MBytes) and G (GBytes).
 
The data provided by <code>df</code> is the following:
 
:; Column "Filesystem"
:: the filesystem for which quota information is provided. This includes an (inconsequential) IP address, due to the way the virtual machine acting as [[#Logging into the login server|login server]] is connected to the physical machine.
 
:; Column "Size"
:: the disk quota assigned to the user
 
:; Column "Used"
:: the overall size of the files currently in the directory
 
:; Column "Avail"
:: how much space is still available in the directory before hitting the quota
 
:; Column "Use%"
:: percentage of the quota used up by the files currently in the directory
 
:; Column "Mounted on"
:: location of the directory in the filesystem of Mufasa
 
Quotas assigned by the quota system of Mufasa 2.0 are '''hard quotas'''. This means that the limit cannot be exceeded. When a user reaches their hard limit, they cannot use any more disk space: for them, the filesystem behaves as if the disks are out of space. Disk writes will fail, temporary files will fail to be created, and the user will start to see warnings and errors while performing common tasks. The only disk operation allowed is file deletion.
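Since hitting the hard limit makes writes fail, it can be useful to check your usage periodically. A sketch (assumes GNU coreutils <code>df</code>, as shipped with Ubuntu; the 90% threshold is arbitrary):

```shell
# Extract the "Use%" column for the filesystem holding the home directory,
# then warn if it is above 90%.
usage=$(df --output=pcent ~ | tail -n 1 | tr -dc '0-9')
if [ "$usage" -ge 90 ]; then
    echo "Warning: ${usage}% of the quota is in use - time for some housekeeping"
fi
```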
 
== Finding out how much disk space is used by a directory ==
 
If your user has read permission on directory <code>/path/to/dir/</code>, you can find out how much disk space the directory uses with command <code>du</code>, like this:
 
<pre style="color: lightgrey; background: black;">
du -sh /path/to/dir/
</pre>
 
The <code>-sh</code> flag is used to ask for options <code>-s</code> (which provides the overall size of the directory) and <code>-h</code> (which provides ''human-readable'' values using measurement units such as K (KBytes), M (MBytes), G (GBytes)).
 
In particular, you can find out how much disk space is used by your home directory with command
 
<pre style="color: lightgrey; background: black;">
du -sh ~
</pre>
 
In fact, in Linux the symbol <code>~</code> is shorthand for the path to the user's home directory.
 
If you want a detailed summary of how much disk space is used by each item (i.e., subdirectory or file) in a directory you own, use command
 
<pre style="color: lightgrey; background: black;">
du -h /path/to/dir/
</pre>
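To spot the largest items quickly, the output of <code>du</code> can be piped through <code>sort -h</code>, which understands the human-readable size suffixes (a sketch; <code>--max-depth=1</code> is a GNU <code>du</code> option that limits the report to the directory's immediate children):

```shell
# Sizes of the immediate children of a directory, largest last.
du -h --max-depth=1 /path/to/dir/ | sort -h
```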
 
== Hidden files and directories ==
 
In Linux, directories and files with a leading "." in their name are ''hidden''. Usually these do not appear in listings, such as the output of the <code>ls</code> command, to avoid cluttering them up: however, they still occupy disk space.
 
The output of command <code>du</code>, however, also considers hidden elements and provides their size: therefore it can help you understand why the quota system says that you are using more disk space than reported by <code>ls</code>.
 
To get a list of all the files in a directory, including hidden ones, use command
 
<pre style="color: lightgrey; background: black;">
ls -a
</pre>
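A minimal sketch of the difference, using a throwaway directory so it is safe to run anywhere: a hidden file does not show up in a plain <code>ls</code>, but <code>du</code> still counts its size.

```shell
tmp=$(mktemp -d)
head -c 524288 /dev/zero > "$tmp/.hidden_cache"  # hidden file of ~0.5 MB

ls "$tmp"       # prints nothing: hidden files are not listed
ls -a "$tmp"    # lists ".", ".." and ".hidden_cache"
du -sh "$tmp"   # the reported size includes the hidden file

rm -rf "$tmp"
```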
 
== Changing file/directory ownership and permissions ==
 
Every file or directory in a Linux system is owned by both a user and a group. User and group ownership are independent, so a file's group owner can be a group that its user owner does not belong to.
 
Being able to manipulate who owns a file and what permissions any user has on that file is often important in a multi-user system such as Mufasa. This is a recapitulation of the main Linux commands to manipulate file permissions. Key commands are
 
:'''<code>chown</code>''' to change user ownership
:'''<code>chgrp</code>''' to change group ownership
:'''<code>chmod</code>''' to change access permissions
 
Of course, they can only be employed by a user who has writing permission on the file or directory to be modified.
 
All three commands above accept option <code>-R</code> (uppercase) for recursive operation, so -if needed- you can change ownership and/or permissions of all contents of a directory and its subdirectories with a single command.
 
The syntax of <code>chown</code> commands is
 
<pre style="color: lightgrey; background: black;">
chown <new_user_owner> <path/to/file>
</pre>
 
where <code><new_user_owner></code> is the user part of the new file ownership.
 
The syntax of <code>chgrp</code> commands is
 
<pre style="color: lightgrey; background: black;">
chgrp <new_group_owner> <path/to/file>
</pre>
 
where <code><new_group_owner></code> is the group part of the new file ownership.
 
User and group ownership of a file can also both be changed at the same time with
 
<pre style="color: lightgrey; background: black;">
chown <new_user_owner>:<new_group_owner> <path/to/file>
</pre>
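To check the effect of these commands, <code>stat</code> can be used to print the current owners of a file. The sketch below is safe to run as a regular user: it operates on a temporary file, and <code>chgrp</code> to a group you already belong to does not require root. It assumes GNU <code>stat</code>, as found on Ubuntu:

```shell
tmp=$(mktemp)

# %U prints the user owner, %G the group owner (GNU stat format codes).
stat -c 'owner: %U  group: %G' "$tmp"

# Change the group owner to the current user's primary group.
# (A regular user can only chgrp to groups they are a member of.)
chgrp "$(id -gn)" "$tmp"
stat -c 'owner: %U  group: %G' "$tmp"

rm -f "$tmp"
```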
 
For what concerns <code>chmod</code>, the easiest way to use it makes use of symbolic descriptions of the permissions. The format for this is
 
<pre style="color: lightgrey; background: black;">
chmod [users]<+|-><permissions> <path/to/file>
</pre>
 
where
 
:<code><path/to/file></code> is the file or directory that the change is applied to
 
:<code>[users]</code> is '''<code>ugo</code>''' or a subset of it; the three letters correspond respectively:
:::to the '''u'''ser who owns <code><path/to/file></code>
:::to the '''g'''roup that owns <code><path/to/file></code>
:::to everyone else ('''o'''thers)
:::If <code>[users]</code> is not specified, the change is applied for all users (subject to the ''umask''), so it is best to always specify it explicitly
:'''<code>+</code>''' or '''<code>-</code>''' correspond to adding or removing permissions
:<code><permissions></code> is '''<code>rwx</code>''' or a subset, corresponding to '''r'''ead, '''w'''rite and e'''x'''ecute permissions
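A short sketch of this syntax in action, on a temporary file so it is safe to run anywhere; <code>stat -c '%A'</code> (GNU stat) prints the permission string, making the effect of each change visible:

```shell
tmp=$(mktemp)

chmod go-rwx "$tmp"   # remove all permissions for group and others
stat -c '%A' "$tmp"   # permission string is now like -rw-------

chmod g+r "$tmp"      # add read permission for the group
stat -c '%A' "$tmp"   # now like -rw-r-----

chmod u+x "$tmp"      # add execute permission for the user owner
stat -c '%A' "$tmp"   # now like -rwxr-----

rm -f "$tmp"
```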
 
Note that the <code>r</code>, <code>w</code> and <code>x</code> permissions have different meanings for files and for directories.
 
;For files:
: permission '''<code>r</code>''' allows reading the contents of the file
: permission '''<code>w</code>''' allows changing the contents of the file
: permission '''<code>x</code>''' allows executing the file (provided that it is a program: e.g., a shell script)


;For directories:
: permission '''<code>r</code>''' allows listing the files within the directory
: permission '''<code>w</code>''' allows creating, renaming, or deleting files within the directory
: permission '''<code>x</code>''' allows entering the directory (i.e., <code>cd</code> into it) and accessing its files

For instance, if the owner of file <code>myfile.txt</code> runs

<pre style="color: lightgrey; background: black;">
chmod g+rwx myfile.txt
</pre>

they are granting permission to read, write and execute <code>myfile.txt</code> to all the Linux users belonging to the same group as the user.
If the owner of directory <code>mydir</code> runs

<pre style="color: lightgrey; background: black;">
chmod go-x mydir
</pre>

they are taking away permission to enter directory <code>mydir</code> from everyone except the user who owns the directory.

If you want additional information about how file and directory permissions work in a Linux system, [https://www.redhat.com/sysadmin/linux-file-permissions-explained this is a good online guide].
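The special meaning of <code>x</code> for directories can be seen with a quick experiment on a temporary directory. This is safe to run anywhere as a regular user (note that root bypasses permission checks, so the "denied" step only fails for non-root users):

```shell
tmp=$(mktemp -d)
mkdir "$tmp/mydir"
echo "hello" > "$tmp/mydir/file.txt"

chmod u-x "$tmp/mydir"   # remove the "enter" permission for the owner
cat "$tmp/mydir/file.txt" 2>/dev/null \
  || echo "cannot access files inside mydir"   # denied for a non-root user

chmod u+x "$tmp/mydir"   # restore the permission
cat "$tmp/mydir/file.txt"                      # accessible again

rm -rf "$tmp"
```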
= <span style="background:#ffff00">Containers</span> =

[[File:Singularity.png|right|262px]]
'''As a general rule, all computation performed on Mufasa must occur within [https://en.wikipedia.org/wiki/Containerization_(computing) containers]'''. Below is the definition of a container according to Docker (the most widespread container platform):

<blockquote>
''A container is a sandboxed process on your machine that is isolated from all other processes on the host machine. When running a container, it uses an isolated filesystem [containing] everything needed to run an application - all dependencies, configuration, scripts, binaries, etc. The image also contains other configuration for the container, such as environment variables, a default command to run, and other metadata.''
</blockquote>

The container system used by [[#Mufasa 2.0|Mufasa 2.0]] is '''Singularity''', which is especially suitable for High Performance Computing environments. Singularity containers are files or -if executed in ''sandbox'' modality- directories. For details and instructions about Singularity, see [[Singularity|the section of this manual dedicated to it]].

Singularity provides a comprehensive [https://docs.sylabs.io/guides/3.0/user-guide/index.html# user guide] to its features. For basic usage, this wiki should contain all the information needed by users of Mufasa.

Using containers allows each user of Mufasa to build the software environment that their job(s) require. In particular, containers enable users to configure their own (containerized) system and install any required libraries on their own, without needing to ask the administrators to modify the configuration of Mufasa. As a consequence, users can freely experiment with their (containerized) system without risk to the work of other users or to the stability and reliability of Mufasa. In particular, containers allow users to run jobs that require multiple and/or obsolete versions of the same library.

A strong advantage of Singularity is that it allows users to be ''root'' (i.e., to have full administrator privileges) in the software environment internal to a container even if they only have the status of a normal user outside the container. This gives each user complete freedom in configuring their own containers. The tool that enables root privileges inside containers is called [https://docs.sylabs.io/guides/4.3/user-guide/fakeroot.html Fakeroot].

How to run Singularity containers on Mufasa is explained in [[User Jobs|User Jobs]]. How to create and use them is explained in the [[Singularity|Singularity section]] of this manual.
== Singularity and Docker ==

From Singularity's [https://docs.sylabs.io/guides/2.6/user-guide/singularity_and_docker.html documentation]:

<blockquote>
"''Singularity can be used with Docker images. This feature was included because developers use and really like using Docker and scientists have already put much resources into creating Docker images. Thus, one of our early goals was to support Docker. What can you do?''
* ''You don’t need Docker''
* ''You can shell into a Singularity-ized Docker image''
* ''You can run a Docker image instantly as a Singularity image''
* ''You can pull a Docker image (without sudo)''
* ''You can build images with bases from assembled Docker layers that include environment, guts, and labels''"
</blockquote>
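As an illustrative sketch of the workflow described in the quote above (the image name is only an example; the exact commands to use on Mufasa are documented in the Singularity section of this wiki), pulling a Docker image and using it as a Singularity container looks like this:

```shell
# Pull an image from Docker Hub and convert it into a local
# Singularity image file (.sif). "docker://ubuntu:24.04" is just an example.
singularity pull docker://ubuntu:24.04

# Open an interactive shell inside the resulting container image...
singularity shell ubuntu_24.04.sif

# ...or run a single command inside it, non-interactively.
singularity exec ubuntu_24.04.sif cat /etc/os-release
```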
= The SLURM job scheduling system =

[[File:262px-Slurm logo.png|right|262px]]
Mufasa uses [https://slurm.schedmd.com/overview.html SLURM] (''Slurm Workload Manager'', formerly known as ''Simple Linux Utility for Resource Management'') to manage shared access to its resources. From [https://slurm.schedmd.com/documentation.html SLURM's documentation]:
<blockquote>“''Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.''”
</blockquote>

This wiki includes a [[SLURM|section dedicated to SLURM]]. It explains how SLURM works, focusing on how it is configured on Mufasa.

'''Users of Mufasa must use SLURM to run any resource-heavy process'''. A ''resource-heavy process'' is any computing job that requires one or more of the following:
* GPUs
* multiple CPUs
* powerful CPUs
* a significant amount of RAM.

Jobs run via SLURM have access to all the resources of Mufasa. Jobs run outside SLURM are executed by the [[#Login server|login server]] virtual machine, which has minimal resources and no GPUs. Using SLURM is therefore the only way to execute resource-heavy jobs on Mufasa. This is a key difference between Mufasa 1.0 and [[#Mufasa 2.0|Mufasa 2.0]].

'''IMPORTANT!''' The wait before execution of any job run via SLURM depends on the job's '''priority'''. [[SLURM#How_to_maximise_the_priority_of_your_jobs|Learn how to maximise the priority of your jobs]].

SLURM is capable of managing complex computing systems composed of multiple ''clusters'' (i.e. sets) of servers, each comprising one ''node'' (i.e. machine) or more. The case of Mufasa is the simplest of all: Mufasa is in fact the one node of a SLURM computing cluster composed of a single machine.
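As a minimal illustration of what "running a job via SLURM" looks like in practice (the resource values below are only an example; the options and partitions actually available on Mufasa are described in the SLURM section of this wiki):

```shell
# Ask SLURM for an interactive bash shell with (for example) 4 CPUs,
# 16 GB of RAM and one GPU; the request waits in the queue until the
# resources are available, then a shell opens directly on Mufasa.
srun --cpus-per-task=4 --mem=16G --gres=gpu:1 --pty /bin/bash
```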

Latest revision as of 09:28, 27 November 2025


Mufasa's main hardware components are:

* Supermicro A+ Server 4124GS-TNR
* 2 AMD Epyc 7542 32-core processors (64 CPU cores total)
* 1 TB RAM
* 7 TB of SSDs (fast temporary repository for datasets actively being used - see [[#Storage spaces|Storage spaces]])
* 25 TB of HDDs (for user <code>/home</code> directories)
* 8 Nvidia A100 GPUs [based on the Ampere architecture]
* Ubuntu Linux 24.04 LTS server operating system

System resources are shared among different users and processes in order to optimise their usage and availability. This sharing is managed by [[SLURM|SLURM]].

== CPUs and GPUs ==

Mufasa is fitted with two 32-core CPUs, so the system has a total of 64 physical CPUs. Of the 64 CPUs, most are reserved for the SLURM job scheduling system and can only be accessed via SLURM; the remaining few are used by the [[#Login server|login server]].

For what concerns GPUs, some of the physical A100 GPUs have been subdivided into “virtual” GPUs with different capabilities using Nvidia's MIG system. Precisely, each of 5 of the A100 GPUs has been subdivided into two virtual GPUs, each possessing half the RAM of the original device (i.e., 20 GB). Since the A100 has 7 compute units onboard, one of the two virtual GPUs built out of a single A100 has 3 compute units, while the other has 4.

All in all, the GPU complement of Mufasa comprises the following devices:

* 5 GPUs with 20 GB of RAM and 3 compute units
* 5 GPUs with 20 GB of RAM and 4 compute units
* 3 GPUs with 40 GB of RAM

Thanks to MIG, users can use all the GPUs listed above as if they were all physical devices installed on Mufasa, without having to worry (or even know) which actually are physical and which instead are virtual GPUs. How these devices are made available to Mufasa users is explained in [[SLURM|SLURM]].

You can use command

<pre style="color: lightgrey; background: black;">
nvidia-smi -L
</pre>

to get in-depth information about the physical and virtual GPUs available to users in a system based on MIG. (On Mufasa, this command needs to be launched in a bash shell opened through SLURM in order to be able to access the GPUs.)

= Accessing Mufasa =

== Login server ==

Differently from Mufasa 1.0, [[#Mufasa 2.0|Mufasa 2.0]] employs a login server to manage user connections. The login server is a Linux virtual machine with very limited resources (no GPUs, few CPUs, small RAM). Its only task is to provide users with a way to log into the system and launch [[User Jobs|User Jobs]] with SLURM. Jobs launched via SLURM run on Mufasa 2.0's physical hardware (not on the virtual hardware of the login server) and can therefore access the hardware resources of Mufasa, such as the GPUs.

When you access Mufasa via SSH, the remote shell you are provided with is a shell to the login server, unable to perform computationally heavy tasks: for heavy tasks you have to launch a SLURM job. The only tasks you can execute directly from the login server shell are simple "housekeeping" tasks on your home directory, such as deleting files you do not need anymore.

Please note that if you try to run computationally heavy processes on the login server you can easily overwhelm its scarce resources, making it unavailable to all users and thus making Mufasa unreachable by anyone. The login server has safety mechanisms to prevent processes from hogging too much of its resources... by killing such processes. However, please avoid checking if these mechanisms work.

== Logging into the login server ==

User access to Mufasa is always remote and exploits the SSH (Secure SHell) protocol.

Access to Mufasa is not direct: instead, it is managed by a login server. Once the user has logged into the login server, they can issue commands to SLURM to run processing jobs.

To open a remote connection to the login server, open a local terminal on your computer and, in it, run command

<pre style="color: lightgrey; background: black;">
ssh <username>@10.79.23.96
</pre>

For example, user mrossi may access Mufasa with command

<pre style="color: lightgrey; background: black;">
ssh mrossi@10.79.23.96
</pre>

As soon as you launch the ssh command, you will be asked to type the password (i.e., the one of your user account on Mufasa). Once you provide the password, the local terminal on your computer becomes a remote terminal (a “remote shell”) through which you interact with the login server. The remote shell sports a command prompt such as

<pre style="color: lightgrey; background: black;">
<username>@mufasa2-login:~$
</pre>

(<code>mufasa2-login</code> is the Linux hostname of the login server). For instance, user mrossi will see a prompt similar to this:

<pre style="color: lightgrey; background: black;">
mrossi@mufasa2-login:~$
</pre>

Access via SSH works with Linux, MacOS and Windows 10 (and later) terminals. For other Windows users, a handy alternative tool (also including an X server, required to run on Mufasa Linux programs with a graphical user interface) is MobaXterm.

If you don't have a user account on Mufasa, you first have to ask your supervisor for one. See Users for more information.

In the remote shell to the login server opened via SSH, you can issue commands by typing them after the prompt, then pressing the enter key. Since Mufasa is a Linux server, it responds to all the standard Linux system commands such as <code>pwd</code> (which prints the path to the current directory) or <code>cd <destination_dir></code> (which changes the current directory). On the internet you can find many tutorials about the Linux command line.

To close the SSH session run

<pre style="color: lightgrey; background: black;">
exit
</pre>

from the command prompt of the remote shell.

== Direct access to Mufasa (for users with running SLURM jobs) ==

While a user has SLURM jobs in execution, that user may also log into Mufasa directly, without passing through the login server. This allows the user to interact with the running jobs, e.g. to monitor their progress. Direct access to Mufasa is done via SSH with command

<pre style="color: lightgrey; background: black;">
ssh <username>@10.79.23.97
</pre>

where both username and password are the same used for the login server. For example, user mrossi may access Mufasa with command

<pre style="color: lightgrey; background: black;">
ssh mrossi@10.79.23.97
</pre>

As soon as you launch the ssh command, you will be asked to type the password (i.e., the one of your user account on Mufasa). Once you provide the password, the local terminal on your computer becomes a remote terminal (a “remote shell”) through which you interact with Mufasa. The remote shell sports a command prompt such as

<pre style="color: lightgrey; background: black;">
<username>@mufasa2:~$
</pre>

(<code>mufasa2</code> is the Linux hostname of Mufasa). For instance, user mrossi will see a prompt similar to this:

<pre style="color: lightgrey; background: black;">
mrossi@mufasa2:~$
</pre>

To a user logged into Mufasa this way, Mufasa appears to possess only the resources that SLURM has allocated to the running job(s) of that user. For instance, if the user has a running job that requested 4 CPUs and a GPU, to that user Mufasa will appear to have only 4 CPUs and a single GPU.

A special case of direct access to Mufasa occurs when a user, from the login server, asks SLURM to execute an interactive job. Such a job, when it goes into execution, opens a shell to Mufasa. To the user, this corresponds to the fact that the shell they were using to interact with the login server changes into a shell opened directly on Mufasa. This corresponds to the command prompt changing from

<pre style="color: lightgrey; background: black;">
<username>@mufasa2-login:~$
</pre>

to

<pre style="color: lightgrey; background: black;">
<username>@mufasa2:~$
</pre>

Another way to know whether the current shell is the “base” shell or one run via SLURM is to execute command

<pre style="color: lightgrey; background: black;">
echo $SLURM_JOB_ID
</pre>

If no number gets printed, the shell is the “base” one. If a number is printed, it is the SLURM job ID of the <code>/bin/bash</code> process.

To close the SSH session to Mufasa run

<pre style="color: lightgrey; background: black;">
exit
</pre>

from the command prompt of the remote shell.

== VPN ==

To be able to connect to Mufasa, your computer must belong to Polimi's LAN. This happens either because the computer is physically located at Politecnico di Milano and connected via ethernet, or because you are using Polimi's VPN (Virtual Private Network) to connect to its LAN from somewhere else (such as your home). In particular, using the VPN is the only way to use Mufasa from outside Polimi. See this DEIB webpage for instructions about how to activate VPN access.

== SSH timeout ==

SSH sessions to Mufasa may be subjected to an inactivity timeout: i.e., after a given inactivity period the SSH session gets automatically closed. Users who need to be able to reconnect to the very same shell where they launched a program (for instance because their program is interactive or because it provides progress update messages) should use the <code>screen</code> command.

== SSH and graphics ==

The standard form of the <code>ssh</code> command, i.e. the one described at the beginning of [[#Accessing Mufasa|Accessing Mufasa]], should always be preferred. However, it only allows text communication with Mufasa. In special cases it may be necessary to remotely run (on Mufasa) Linux programs that have a graphical user interface. These programs require interaction with the X server of the remote user's machine (which must use Linux as well). A special mode of operation of ssh is needed to enable this. This mode is engaged by running command ssh like this:

<pre style="color: lightgrey; background: black;">
ssh -X <your username on Mufasa>@<Mufasa's IP address>
</pre>

= File transfer =

Uploading files from a local machine to Mufasa and downloading files from Mufasa onto a local machine is done using the SFTP protocol (Secure File Transfer Protocol).

Linux and MacOS users can directly use the <code>sftp</code> package. Windows users can interact with Mufasa via the SFTP protocol using the MobaXterm software package. MacOS users can also interact with Mufasa via SFTP with the Cyberduck software package.

For Linux and MacOS users, file transfer to/from Mufasa occurs via an interactive sftp shell, i.e. a remote shell very similar to the one described above. The first thing to do is to open a terminal and run the following command (note the similarity to SSH connections):

<pre style="color: lightgrey; background: black;">
sftp <username>@<IP_address>
</pre>

where <code><username></code> is the user's username on Mufasa and <code><IP_address></code> is the IP address of Mufasa.

You will be asked your password. Once you provide it, you access an interactive sftp shell, where the command prompt takes the form

<pre style="color: lightgrey; background: black;">
sftp>
</pre>

From this shell you can run the commands to exchange files. Most of these commands have two forms: one to act on the remote machine (in this case, Mufasa) and one to act on the local machine (i.e. your own computer). To differentiate, the “local” versions usually have names that start with the letter “l” (lowercase L). In particular, use

<pre style="color: lightgrey; background: black;">
cd <path>
</pre>

to change directory to <code><path></code> on the remote machine;

<pre style="color: lightgrey; background: black;">
lcd <path>
</pre>

to change directory to <code><path></code> on the local machine;

<pre style="color: lightgrey; background: black;">
get <filename>
</pre>

to download (i.e. copy) <code><filename></code> from the current directory of the remote machine to the current directory of the local machine;

<pre style="color: lightgrey; background: black;">
put <filename>
</pre>

to upload (i.e. copy) <code><filename></code> from the current directory of the local machine to the current directory of the remote machine.

Naturally, a user can only upload files to directories where they have write permission (usually only their own <code>/home</code> directory and its subdirectories). Also, users can only download files from directories where they have read permission. (File permissions on Mufasa follow the standard Linux rules.)

In addition to the terminal interface, users of Linux distributions based on Gnome (such as Ubuntu) can use a handy graphical tool to exchange files with Mufasa. In Gnome's Nautilus file manager, write

<pre style="color: lightgrey; background: black;">
sftp://<username>@<IP_address>
</pre>

in the address bar of Nautilus, where <code><username></code> is your username on Mufasa and <code><IP_address></code> is the IP address of Mufasa. Nautilus then becomes a graphical interface to Mufasa's remote filesystem.

= Using Mufasa =

This section provides a brief guide for Mufasa users (especially those who are not experienced in the use of Linux and/or remote servers) about interacting with the system.

== Storage spaces ==

User jobs require storage of programs and data files. On Mufasa, the space available to users for data storage is the <code>/home/</code> directory. <code>/home/</code> contains two types of directories:

;Personal directories
:'''Location and access'''
:: Personal directories are in <code>/home/</code>. They are dedicated to individual users of Mufasa. The home directory of user ''UserName'' is <code>/home/UserName/</code>.
:'''Usage'''
:: The home directory of a user is their own personal space on Mufasa. Space is limited (see [[#Disk quotas|Disk quotas]]), so you'll need to do some "housekeeping" to avoid filling it up. The general rule is: keep in your home directory only the files that the work you are doing on Mufasa right now needs; remove a file as soon as it is not needed anymore for your current work. Mufasa is not a storage space!

;Shared directories
:'''Location and access'''
:: Shared directories are in <code>/home/shared/</code>. They are dedicated to research groups, and each group decides internally how to manage the group's directory. The shared directory of research group ''GroupName'' is <code>/home/shared/GroupName/</code>; users who belong to the research group can read from and write to the directory. Directory <code>/home/shared/common/</code> is available to all research groups: any user can read from and write to it.
:'''Usage'''
:: Shared directories are used:
::* to share data: if multiple users are using the same data, it makes sense to put the data in a shared directory instead of having multiple copies of it in each user's home directory;
::* for faster read/write: shared spaces are physically located on faster disks than the personal home directories (SSDs instead of mechanical HDDs). When a processing job requires reading or writing very large amounts of data, placing such data in a shared directory can significantly speed up the job.
:: '''Important!''' Shared directories are used by several people, so it is important to quickly remove from them any file that is not actively in use.

== Disk quotas ==

On Mufasa, [[#Storage spaces|storage spaces]] are subjected to quotas: i.e., the files that are stored in them cannot occupy more than a given amount of disk space. Quotas apply both to personal directories (e.g., <code>/home/userX/</code>) and to shared directories (e.g., <code>/home/shared/ResearchGroupY/</code>).

The quota assigned to user userX, and the amount of it currently in use, can be inspected with command

<pre style="color: lightgrey; background: black;">
df -h /home/userX
</pre>

When <code>df</code> is run from the login server, its output is similar to the following:

<pre style="color: lightgrey; background: black;">
gfontana@mufasa2-login:~$ df -h /home/gfontana/
Filesystem         Size  Used Avail Use% Mounted on
192.168.1.1:/home  200G  161G   40G  81% /home
</pre>

Option <code>-h</code> provides human-readable values using measurement units such as K (KBytes), M (MBytes), G (GBytes).

The data provided by <code>df</code> is the following:

;Column "Filesystem"
: the filesystem for which quota information is provided. This includes an (inconsequential) IP address because of the way the virtual machine acting as login server is connected to the physical machine
;Column "Size"
: the disk quota assigned to the user
;Column "Used"
: the overall size of the files currently in the directory
;Column "Avail"
: how much space is still available in the directory before hitting the quota
;Column "Use%"
: percentage of the quota used up by the files currently in the directory
;Column "Mounted on"
: location of the directory in the filesystem of Mufasa
Quotas assigned by the quota system of Mufasa 2.0 are hard quotas. This means that the limit cannot be exceeded. When a user reaches their hard limit, they cannot use any more disk space: for them, the filesystem behaves as if the disks are out of space. Disk writes will fail, temporary files will fail to be created, and the user will start to see warnings and errors while performing common tasks. The only disk operation allowed is file deletion.

Finding out how much disk space is used by a directory

If your user has reading permission to directory /path/to/dir/ you can find out how much disk space is used by the directory with command du like this:

du -sh /path/to/dir/

The -sh flag is used to ask for options -s (which provides the overall size of the directory) and -h (which provides human-readable values using measurement units such as K (KBytes), M (MBytes), G (GBytes)).

In particular, you can find out how much disk space is used by your home directory with command

du -sh ~

In fact, in Linux the symbol ~ is shorthand for the path to the user's home directory.

If you want a detailed summary of how much disk space is used by each item (i.e., subdirectory or file) in a directory you own, use command

du -h /path/to/dir/

Hidden files and directories

In Linux, directories and files with a leading "." in their name are hidden. Usually these do not appear in listings, such as the output of the ls command, to avoid cluttering them up: however, they still occupy disk space.

The output of command du, on the other hand, also considers hidden elements and includes their size: it can therefore help you understand why the quota system says that you are using more disk space than a plain ls suggests.

To get a list of all the files in a directory, including hidden ones, use command

ls -a
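The points above can be verified with a small experiment (the directory name below is a throwaway example, not an existing path on Mufasa):

```shell
# Hidden files occupy disk space even though a plain ls does not show them.
# "demo_dir" is a placeholder name used only for this illustration.
mkdir -p demo_dir
dd if=/dev/zero of=demo_dir/.hidden_data bs=1M count=5 status=none   # create a 5 MB hidden file

ls demo_dir      # prints nothing: the hidden file is not listed
ls -a demo_dir   # also lists .hidden_data
du -sh demo_dir  # reports about 5M: du does count the hidden file
```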

Changing file/directory ownership and permissions

Every file or directory in a Linux system is owned by both a user and a group. User and group ownerships are not connected, so a file can have as group owner a group that its user owner does not belong to.

Being able to manipulate who owns a file and what permissions any user has on that file is often important in a multi-user system such as Mufasa. This is a recapitulation of the main Linux commands to manipulate file permissions. Key commands are

chown to change user ownership
chgrp to change group ownership
chmod to change access permissions

Of course, they can only be employed by a user with sufficient rights on the file or directory to be modified: chmod and chgrp can be run by the file's owner, while chown generally requires administrator privileges.

All three commands above accept option -R (uppercase) for recursive operation, so, if needed, you can change the ownership and/or permissions of all the contents of a directory and its subdirectories with a single command.
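As a quick illustration of recursive operation (the directory and file names are placeholders, and chmod's symbolic syntax is described later in this section), the following makes a whole directory tree readable by its group:

```shell
# Make an entire directory tree group-readable with one recursive chmod.
# "project_dir" is a placeholder name used only for this illustration.
mkdir -p project_dir/sub
touch project_dir/sub/data.txt

# Capital X grants execute only where it makes sense: on directories
# (so the group can enter them), not on plain data files.
chmod -R g+rX project_dir
```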

The syntax of chown commands is

chown <new_user_owner> <path/to/file>

where <new_user_owner> is the user part of the new file ownership.

The syntax of chgrp commands is

chgrp <new_group_owner> <path/to/file>

where <new_group_owner> is the group part of the new file ownership.

User and group ownership for a file can also be both changed at the same time with

chown <new_user_owner>:<new_group_owner> <path/to/file>

For what concerns chmod, the easiest way to use it makes use of symbolic descriptions of the permissions. The format for this is

chmod [users]<+|-><permissions> <path/to/file>

where

<path/to/file> is the file or directory that the change is applied to
[users] is ugo or a subset of it; the three letters correspond respectively:
to the user who owns <path/to/file>
to the group that owns <path/to/file>
to everyone else (others)
If [users] is not specified, the change applies to all users (as if a were given), except that permission bits set in the umask are not added
+ or - correspond to adding or removing permissions
<permissions> is rwx or a subset, corresponding to read, write and execute permissions

Note that the r, w and x permissions have different meanings for files and for directories.

For files
permission r allows reading the contents of the file
permission w allows changing the contents of the file
permission x allows executing the file (provided that it is a program: e.g., a shell script)
For directories
permission r allows listing the files within the directory
permission w allows creating, renaming, or deleting files within the directory
permission x allows entering the directory (i.e., cd into it) and accessing its files

For instance, if the owner of file myfile.txt runs

chmod g+rwx myfile.txt

they are granting permission to read, write and execute myfile.txt to all the Linux users belonging to the file's group owner.

If the owner of directory mydir runs

chmod go-x mydir

they are taking away permission to enter directory mydir from everyone except the user who owns the directory.
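The two examples above can be tried out directly; the file and directory names below are throwaway placeholders, and the chmod u=... commands merely set a known starting state so that the output of stat is predictable:

```shell
touch myfile.txt
chmod u=rw,go= myfile.txt   # known starting state: only the owner can read/write
chmod g+rwx myfile.txt      # grant read, write and execute to the file's group owner
stat -c '%A' myfile.txt     # prints -rw-rwx---: the group bits are now rwx

mkdir -p mydir
chmod u=rwx,go=rx mydir     # known starting state
chmod go-x mydir            # group and others can no longer enter mydir
stat -c '%A' mydir          # prints drwxr--r--: the x bits for group and others are gone
```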

If you want additional information about how file and directory permissions work in a Linux system, this is a good online guide.

Containers


As a general rule, all computation performed on Mufasa must occur within containers. Below is the definition of a container according to Docker (the most widespread container platform):

A container is a sandboxed process on your machine that is isolated from all other processes on the host machine. When running a container, it uses an isolated filesystem [...] containing everything needed to run an application - all dependencies, configuration, scripts, binaries, etc. The image also contains other configuration for the container, such as environment variables, a default command to run, and other metadata.

The container system used by Mufasa 2.0 is Singularity, which is especially suitable for High Performance Computing environments. Singularity containers are files or, if executed in sandbox mode, directories. For details and instructions about Singularity, see the section of this manual dedicated to it.

Singularity provides a comprehensive user guide to its features. For basic usage, this wiki should contain all the information needed by users of Mufasa.

Using containers allows each user of Mufasa to build the software environment that their job(s) require. In particular, using containers enables users to configure their own (containerized) system and install any required libraries on their own, without needing to ask administrators to modify the configuration of Mufasa. As a consequence, users can freely experiment with their (containerized) system without risk to the work of other users or to the stability and reliability of Mufasa: for instance, containers allow users to run jobs that require multiple and/or obsolete versions of the same library.

A strong advantage of Singularity is that it allows users to be root (i.e., to have full administrator privileges) in the software environment internal to a container even if they only have the status of a normal user outside the container. This gives each user complete freedom in configuring their own containers. The tool that enables root privileges inside containers is called Fakeroot.

How to run Singularity containers on Mufasa is explained in User Jobs. How to create and use them is explained in the Singularity section of this manual.

Singularity and Docker

From Singularity's documentation:

"Singularity can be used with Docker images. This feature was included because developers use and really like using Docker and scientists have already put much resources into creating Docker images. Thus, one of our early goals was to support Docker. What can you do?

  • You don’t need Docker
  • You can shell into a Singularity-ized Docker image
  • You can run a Docker image instantly as a Singularity image
  • You can pull a Docker image (without sudo)
  • You can build images with bases from assembled Docker layers that include environment, guts, and labels"
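For instance, a public Docker image can be pulled and used directly. The commands below are a sketch of standard Singularity usage; the image name is an arbitrary public example, not something specific to Mufasa:

```shell
# Pull a Docker image from Docker Hub and run it with Singularity (no sudo needed).
singularity pull docker://ubuntu:22.04                   # creates ubuntu_22.04.sif
singularity shell ubuntu_22.04.sif                       # interactive shell inside the container
singularity exec ubuntu_22.04.sif cat /etc/os-release    # run one command inside the container
```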

The SLURM job scheduling system


Mufasa uses SLURM (Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management) to manage shared access to its resources. From SLURM's documentation:

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

This wiki includes a section dedicated to SLURM. It explains how SLURM works, focusing on how it is configured on Mufasa.

Users of Mufasa must use SLURM to run any resource-heavy process. A resource-heavy process is any computing job that requires one or more of the following:

  • GPUs
  • multiple CPUs
  • powerful CPUs
  • a significant amount of RAM.

Jobs run via SLURM have access to all the resources of Mufasa. Jobs run outside SLURM are executed by the login server virtual machine, which has minimal resources and no GPUs. Using SLURM is therefore the only way to execute resource-heavy jobs on Mufasa. This is a key difference between Mufasa 1.0 and Mufasa 2.0.

IMPORTANT! The wait before execution of any job run via SLURM depends on the job's priority. Learn how to maximise the priority of your jobs.

SLURM is capable of managing complex computing systems composed of multiple clusters (i.e. sets) of servers, each comprising one or more nodes (i.e. machines). The case of Mufasa is the simplest of all: Mufasa is in fact the single node of a SLURM computing cluster composed of a single machine.
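By way of illustration, a typical interaction with SLURM from the login server might look like the following sketch. The resource values and the batch script name are generic, made-up examples, not Mufasa-specific settings; the actual partitions and options available on Mufasa are described in the SLURM section of this wiki:

```shell
sinfo                                          # list available partitions and node state
srun --cpus-per-task=4 --mem=8G --pty bash     # interactive shell with 4 CPUs and 8 GB RAM
sbatch --gres=gpu:1 myjob.sh                   # queue a batch script that needs one GPU
squeue -u "$USER"                              # show your pending and running jobs
```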