System
Revision as of 15:23, 17 January 2022
Mufasa is a Linux server located in a server room managed by the System Administrators. Job Users and Job Administrators can only access Mufasa remotely.
Remote access to Mufasa is performed using the SSH protocol for the execution of commands and the SFTP protocol for the exchange of files. Once logged in, a user interacts with Mufasa via a terminal (text-based) interface.
Hardware
Mufasa is a server for massively parallel computation. Its main hardware components are:
- 32-core, 64-thread AMD processor
- 1 TB RAM
- 9 TB of SSDs (for OS and execution cache)
- 28 TB of HDDs (for user /home directories)
- 5 Nvidia A100 GPUs [based on the Ampere architecture]
- Linux Ubuntu operating system
Usually, each of these resources (e.g., a GPU) is not fully assigned to a single user or a single job. On the contrary, resources are shared among different users and processes in order to optimise their usage and availability.
As for GPUs, the 5 physical A100 GPUs are subdivided into “virtual” GPUs with different capabilities using Nvidia's MIG (Multi-Instance GPU) system. From MIG's user guide:
“The Multi-Instance GPU (MIG) feature allows GPUs based on the NVIDIA Ampere architecture (such as NVIDIA A100) to be securely partitioned into up to seven separate GPU Instances for CUDA applications, providing multiple users with separate GPU resources for optimal GPU utilization. This feature is particularly beneficial for workloads that do not fully saturate the GPU’s compute capacity and therefore users may want to run different workloads in parallel to maximize utilization.”
In practice, MIG allows flexible partitioning of a very powerful (but single) GPU to create multiple virtual GPUs with different capabilities, that are then made available to users as if they were separate devices.
Command
(“smi” stands for System Management Interface) provides an overview of the physical and virtual GPUs available to users in a system. Note that on Mufasa this command may need to be launched via the SLURM job scheduling system (as explained in Section 2 of this document) in order to access the GPUs.
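As a sketch of how GPU information can be inspected, the hypothetical shell function below summarizes the device list printed by Nvidia's nvidia-smi tool with its -L option, which lists each physical GPU and, when MIG is enabled, the MIG devices carved out of it. The parsing is an assumption: it expects physical GPUs on lines starting with "GPU <n>:" and MIG devices on indented lines starting with "MIG". The exact layout depends on the driver version, so check it against the real output on Mufasa.

```shell
#!/bin/sh
# Summarize the output of `nvidia-smi -L`, read from standard input.
# ASSUMPTION: physical GPUs appear on lines like "GPU 0: ..." and MIG
# devices on indented lines starting with "MIG" below their parent GPU.
summarize_gpus() {
  awk '/^GPU [0-9]+:/ { gpus++ }
       /^[ \t]*MIG /  { migs++ }
       END { printf "physical GPUs: %d, MIG devices: %d\n", gpus, migs }'
}
```

Usage: nvidia-smi -L | summarize_gpus prints a single summary line of the form "physical GPUs: N, MIG devices: M". As noted above, on Mufasa nvidia-smi may have to be launched through SLURM in order to see the GPUs at all.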
Accessing Mufasa
User access to Mufasa is always remote and exploits the SSH (Secure SHell) protocol.
To open a remote connection to Mufasa, open a local terminal on your computer and, in it, run command
ssh <your_username_on_Mufasa>@<Mufasa's_IP_address>
where <Mufasa's_IP_address> is either 10.79.23.96 or 10.79.23.97
For example, user mrossi may access Mufasa with command
ssh mrossi@10.79.23.97
Access via SSH works with Linux, MacOS and Windows 10 (and later) terminals. For Windows users, a handy alternative tool is MobaXterm, which also includes an X server (required to run Linux programs with a graphical user interface on Mufasa).
If you don't have a user account on Mufasa, you first have to ask your supervisor for one. See Users and groups for more information about Mufasa's users.
As soon as you launch the ssh command, you will be asked to type your password (i.e., the password of your user account on Mufasa). Once you provide it, the local terminal on your computer becomes a remote terminal (a “remote shell”) through which you interact with Mufasa. The remote shell shows a command prompt such as
<your_username_on_Mufasa>@rk018445:~$
(rk018445 is the Linux hostname of Mufasa). For instance, user mrossi will see a prompt similar to this:
mrossi@rk018445:~$
In the remote shell, you can issue commands to Mufasa by typing them after the prompt and then pressing the Enter key. Since Mufasa is a Linux server, it responds to all the standard Linux system commands, such as pwd (which prints the path to the current directory) or cd <destination_dir> (which changes the current directory). On the internet you can find many tutorials about the Linux command line, such as this one.
To close the SSH session run
exit
from the command prompt of the remote shell.
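Typing the full username and IP address at every connection can be avoided with a host alias in the OpenSSH client configuration file (~/.ssh/config on Linux and MacOS), which both ssh and sftp read. The hypothetical function below prints such an entry. The alias name mufasa is an arbitrary choice, and the function only accepts the two addresses listed above.

```shell
#!/bin/sh
# Print an OpenSSH client configuration entry for Mufasa.
# Arguments: $1 = your username on Mufasa (required)
#            $2 = Mufasa's IP address (default 10.79.23.97)
#            $3 = host alias to use with ssh/sftp (hypothetical default: mufasa)
mufasa_ssh_config() (
  user=${1:?username required}
  addr=${2:-10.79.23.97}
  host_alias=${3:-mufasa}
  # Reject anything other than the two documented addresses.
  case $addr in
    10.79.23.96|10.79.23.97) ;;
    *) echo "not a Mufasa address: $addr" >&2; exit 1 ;;
  esac
  printf 'Host %s\n    HostName %s\n    User %s\n' "$host_alias" "$addr" "$user"
)
```

For instance, mufasa_ssh_config mrossi >> ~/.ssh/config appends an entry for user mrossi, after which ssh mufasa and sftp mufasa behave like the full commands. To make the graphical mode of ssh (ssh -X) the default for this host, the standard OpenSSH option "ForwardX11 yes" can be added to the entry.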
VPN
To be able to connect to Mufasa, your computer must belong to Polimi's LAN. This happens either because the computer is physically located at Politecnico di Milano and connected via ethernet, or because you are using Polimi's VPN to connect to its LAN from somewhere else (such as your home). In particular, using the VPN is the only way to use Mufasa from outside Polimi. See this DEIB webpage for instructions about how to activate VPN access.
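A quick way to tell whether your computer can currently reach Mufasa (for instance, to check that the VPN is active) is to test whether Mufasa's SSH port accepts connections. The hypothetical function below does this with bash's /dev/tcp redirection and the timeout command from GNU coreutils (not installed by default on MacOS). It assumes that Mufasa's SSH server listens on the default port 22, as the ssh commands above, which specify no port, imply. A successful check only shows that the port is reachable, not that your credentials work.

```shell
#!/bin/sh
# Succeed if a TCP connection to Mufasa's SSH port can be opened.
# Arguments (all optional): $1 = address (default 10.79.23.97),
#   $2 = port (default 22, ASSUMED), $3 = timeout in seconds (default 5).
# Requires bash (for /dev/tcp) and GNU coreutils' timeout.
mufasa_reachable() {
  timeout "${3:-5}" bash -c ': > "/dev/tcp/$1/$2"' _ \
    "${1:-10.79.23.97}" "${2:-22}" 2>/dev/null
}
```

Usage: if mufasa_reachable; then echo reachable; else echo "not reachable: is the VPN active?"; fi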
Timeout
SSH sessions to Mufasa may be subjected to an inactivity timeout: i.e., after a given inactivity period the ssh session gets automatically closed. Users who need to be able to reconnect to the very same shell where they launched a program (for instance because their program is interactive or because it provides progress update messages) should use the screen command, as explained in User Jobs#Using screen with srun.
Using SSH with graphics
The standard form of the ssh command, i.e. the one described above, should always be preferred. However, it only allows text communication with Mufasa. In special cases it may be necessary to remotely run (on Mufasa) Linux programs that have a graphical user interface. These programs require interaction with the X server of the remote user's machine (which must use Linux as well). A special mode of operation of ssh is needed to enable this. This mode is engaged by running
ssh -X <your_username_on_Mufasa>@<Mufasa's_IP_address>
File transfer
Uploading files from local machine to Mufasa and downloading files from Mufasa onto local machines is done using the SFTP protocol (Secure File Transfer Protocol).
Linux and MacOS users can directly use the sftp command, as explained (for instance) by this guide. Windows users can interact with Mufasa via the SFTP protocol using the MobaXterm software package.
For Linux and MacOS users, file transfer to/from Mufasa occurs via an interactive sftp shell, i.e. a remote shell very similar to the one described in Accessing Mufasa. The first thing to do is to open a terminal and run the following command (note the similarity to SSH connections):
sftp <your_username_on_Mufasa>@<Mufasa's_IP_address>
where <Mufasa's_IP_address> is either 10.79.23.96 or 10.79.23.97
You will be asked your password. Once you provide it, you access an interactive sftp shell, where the command prompt takes the form
sftp>
From this shell you can run the commands to exchange files. Most of these commands have two forms: one to act on the remote machine (in this case, Mufasa) and one to act on the local machine (i.e. your own computer). To differentiate, the “local” versions usually have names that start with the letter “l” (lowercase L).
MacOS users can interact with Mufasa via SFTP also using the Cyberduck software package.
The most basic sftp commands (to be issued from the sftp command prompt) are:
- cd <path>: change directory to <path> on the remote machine (i.e. Mufasa)
- lcd <path>: change directory to <path> on the local machine (i.e. the user's machine)
- get <file>: download (i.e. copy) <file> from the current directory of the remote machine to the current directory of the local machine
- put <file>: upload (i.e. copy) <file> from the current directory of the local machine to the current directory of the remote machine
- exit: quit sftp
Of course, a user can only upload files to directories where they have write permission (usually only their own /home directory and its subdirectories), and can only download files for which they have read permission.
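The same commands accepted by the interactive sftp shell can also be read from a batch file with sftp's -b option (using - as the file name reads them from standard input), which is handy for repeating the same transfer. The hypothetical function below writes a batch that moves to a local and a remote directory and then uploads the given files; the directory and file names in the usage example are placeholders. According to the sftp manual page, batch mode needs non-interactive authentication (such as SSH keys) because it cannot prompt for a password. Whether key-based login is available for your account on Mufasa is not covered here; with password login, the interactive shell described above remains the way to go.

```shell
#!/bin/sh
# Print an sftp batch script that uploads files, using the same commands
# (lcd, cd, put) as the interactive sftp shell. Paths are quoted so that
# names containing spaces are handled by sftp.
# Arguments: $1 = local directory, $2 = remote directory, $3... = file names
make_upload_batch() (
  local_dir=${1:?local directory required}
  remote_dir=${2:?remote directory required}
  shift 2
  printf 'lcd "%s"\n' "$local_dir"
  printf 'cd "%s"\n' "$remote_dir"
  for f in "$@"; do
    printf 'put "%s"\n' "$f"
  done
)
```

Usage: make_upload_batch ~/results /home/mrossi/experiments run1.csv run2.csv | sftp -b - mrossi@10.79.23.97. In batch mode sftp stops at the first failing command, so a missing file aborts the rest of the transfer.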
Docker containers
As a general rule, all computation performed on Mufasa must occur within Docker containers. This allows every user to configure their own execution environment without any risk of interfering with everyone else's.
From Docker's documentation:
“Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure.
Docker provides the ability to package and run an application in a loosely isolated environment called a container. The isolation and security allow you to run many containers simultaneously on a given host. Containers are lightweight and contain everything needed to run the application, so you do not need to rely on what is currently installed on the host.
A container is a sandboxed process on your machine that is isolated from all other processes on the host machine. When running a container, it uses an isolated filesystem. [containing] everything needed to run an application - all dependencies, configuration, scripts, binaries, etc. The image also contains other configuration for the container, such as environment variables, a default command to run, and other metadata.”
Using Docker allows each user of Mufasa to build the software environment that their job(s) require. In particular, Docker containers enable users to configure their own (containerized) system and install any required libraries on their own, without needing to ask administrators to modify the configuration of Mufasa. As a consequence, users can freely experiment with their (containerized) system without risk to the work of other users or to the stability and reliability of Mufasa. Containers also allow users to run jobs that require multiple and/or obsolete versions of the same library.
A large number of preconfigured Docker images are already available, so users do not usually need to start from scratch in preparing the environment where their jobs will run on Mufasa. The official Docker image repository is Docker Hub.
How to run Docker containers on Mufasa will be explained in Part 2 of this document.
The SLURM job scheduling system
Mufasa uses SLURM to manage shared access to its resources. Users of Mufasa must use SLURM to run and manage the jobs they run on the machine. (It is possible for users to run jobs without using SLURM; however, jobs run this way are only intended for “housekeeping” activities and only have access to a small subset of Mufasa's resources. For instance, jobs run outside SLURM cannot access the GPUs, can only use a few processor cores, and can only access a small portion of RAM. Using SLURM is therefore necessary for any resource-intensive job.) From SLURM's documentation:
“Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.”
The use of a job scheduling system ensures that Mufasa's resources are exploited efficiently. However, it also means that a job usually does not start executing as soon as it is launched: instead, the job is queued and executed as soon as possible, according to the availability of resources on the machine.
Useful references for SLURM users are the collected man pages and the command overview.
In order to let SLURM schedule job execution, before launching a job a user must specify which resources (such as RAM, processor cores, GPUs, ...) it requires. While managing its queues, SLURM considers such requirements and matches them with the available resources. As a consequence, resource-heavy jobs generally take longer to be executed, while less demanding jobs are usually put into execution quickly. On the other hand, processes that, while running, try to use more resources than they requested get killed by SLURM to avoid harming other jobs.
All in all, the take-away message is: consider carefully how many resources to ask for your job.
In Part 2 of this document it will be explained how resource requests can be greatly simplified by making use of predefined resource sets called SLURM partitions.
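As a generic illustration of declaring resources up front, the sketch below is a minimal SLURM batch script: its #SBATCH lines ask for processor cores, RAM and a maximum run time, which SLURM uses when queueing the job. The job name and all values are placeholders. The sketch deliberately omits GPU requests and partitions, whose Mufasa-specific names are covered in Part 2, and it does not show the Docker containers in which computation on Mufasa is expected to run.

```shell
#!/bin/bash
# Minimal SLURM batch script. All values below are placeholders.
#SBATCH --job-name=example_job
# Number of processor cores for the job
#SBATCH --cpus-per-task=4
# RAM for the job; a job that exceeds it is killed by SLURM
#SBATCH --mem=16G
# Maximum run time (hh:mm:ss)
#SBATCH --time=01:00:00
# Output file; %j is replaced by the job ID
#SBATCH --output=example_job_%j.out

# Report what SLURM actually allocated (these variables are unset outside SLURM).
report_allocation() {
  echo "Job ${SLURM_JOB_ID:-none} on $(uname -n)"
  echo "CPU cores allocated: ${SLURM_CPUS_PER_TASK:-unknown}"
}

report_allocation
# ...the actual work of the job goes here...
```

Such a script is submitted with sbatch example_job.sh; squeue -u $USER then shows whether the job is still queued or running, and scancel <job_id> cancels it.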
Users and groups
As already explained, only Mufasa users can access the machine and interact with it. Creation of new users is done by Job Administrators or by specially designated users within each research group.
Mufasa usernames have the form xyyy (all lowercase) where x is the first letter of the first name and yyy is the complete surname. For instance, user Mario Rossi will be assigned user name mrossi. If multiple users with the same surname and first letter of the name exist, those created after the first are given usernames xyyy01, xyyy02, and so on.
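The naming rule above is simple enough to express as a short function, which may help whoever creates accounts to pick the next free name. This is a hypothetical helper, not the script used on Mufasa; in particular, the handling of surnames containing spaces, apostrophes or accented letters (which are simply dropped here) is an assumption, since the rule above does not specify it.

```shell
#!/bin/sh
# Succeed if $1 appears among the remaining arguments.
is_taken() (
  name=$1
  shift
  for u in "$@"; do
    [ "$u" = "$name" ] && exit 0
  done
  exit 1
)

# Derive a username: first letter of the first name + complete surname,
# all lowercase; on collision, append 01, 02, ... (two digits).
# ASSUMPTION: characters other than ASCII letters are dropped.
# Arguments: $1 = first name, $2 = surname, $3... = usernames already taken
mufasa_username() (
  first=${1:?first name required}
  surname=${2:?surname required}
  shift 2
  initial=$(printf '%s' "$first" | cut -c1)
  base=$(printf '%s%s' "$initial" "$surname" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z')
  candidate=$base
  n=0
  while is_taken "$candidate" "$@"; do
    n=$((n + 1))
    candidate=$(printf '%s%02d' "$base" "$n")
  done
  printf '%s\n' "$candidate"
)
```

For example, mufasa_username Mario Rossi mrossi prints mrossi01, because mrossi is already taken. On a Linux machine, the existing usernames could be passed with mufasa_username Mario Rossi $(getent passwd | cut -d: -f1).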
On Linux machines such as Mufasa, users belong to groups. On Mufasa, groups are used to identify the research group that a specific user is part of. The assignment of Mufasa's users to groups follows these rules:
- All users belong to group users.
- Additionally, each user must belong to one and only one of the following (within brackets is the name of the faculty who is in charge of Mufasa for each group):
- nearmrs, i.e. Medical Robotics Section of NearLab (prof. De Momi);
- nearnes, i.e. NeuroEngineering Section of NearLab (prof. Ferrante);
- cartcas, i.e. CartCasLab (prof. Cerveri);
- biomech, i.e. Biomechanics Research Group (prof. Votta);
- bio, for BioEngineering users not belonging to the research groups listed above.
Users who are not Job Administrators but have been given the power to create users can do so with command
sudo /opt/share/sbin/add_user.sh -u <user> -g users,<group>
where <user> is the username of the new user and <group> is one of the five research groups from the list above.
For instance, in order to create a user on Mufasa for a person named Mario Rossi belonging to the NeuroEngineering Section of NearLab, the following command will be used:
sudo /opt/share/sbin/add_user.sh -u mrossi -g users,nearnes
New users are created with a predefined password, which they will be asked to change at their first login. For security reasons, it is important that this first login occurs as soon as possible.
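Since a typo in the group name would put the new user in the wrong group, a small wrapper can check the group before building the add_user.sh command shown above. The hypothetical function below only prints the command, so that it can be reviewed before being run. The allowed groups are exactly the research groups listed above; the check on the username (lowercase letters and digits only) is an assumption based on the naming rule.

```shell
#!/bin/sh
# Print the add_user.sh command for a new user after validating its inputs.
# Arguments: $1 = username, $2 = research group
mufasa_add_user_cmd() (
  user=${1:?username required}
  group=${2:?group required}
  # ASSUMPTION: valid usernames contain only lowercase letters and digits.
  case $user in
    *[!a-z0-9]*) echo "invalid username: $user" >&2; exit 1 ;;
  esac
  # Only the research groups listed above are accepted.
  case $group in
    nearmrs|nearnes|cartcas|biomech|bio) ;;
    *) echo "unknown group: $group" >&2; exit 1 ;;
  esac
  printf 'sudo /opt/share/sbin/add_user.sh -u %s -g users,%s\n' "$user" "$group"
)
```

For example, mufasa_add_user_cmd mrossi nearnes prints the same command used above for Mario Rossi, while an unknown group such as nearlab is rejected with an error message.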