Introduction to Slurm and modules - Exercises

Magali Hennion

14/10/2024


I. Setup

Connect to IFB cluster

Using SSH

ssh username@core.cluster.france-bioinformatique.fr

Or using OpenOnDemand

To make working on the cluster easier, an OnDemand instance is available. Through it you can access the cluster, modify your files, run your scripts, see your results, etc., in a simple web browser.

The launcher allows you to start a Terminal that can be used for the rest of this course.

Optional: use a file explorer

To view your files in your file manager and modify them directly with any local text editor, you can also connect via SFTP.

Please see the instructions for Windows, Mac, or Linux.

Be careful
Never use a word processor (like Microsoft Word or LibreOffice Writer) to modify your code, and never copy/paste code to or from such programs. Use only text editors and UTF-8 encoding.

Security warning
Never leave your computer unattended with your session open and the HPC server connected.

Warm-up

Where are you on the cluster?

pwd

Then explore the /shared folder

tree -L 1 /shared

The /shared/bank folder contains commonly used data and resources. Explore it by yourself with commands like ls or cd.

Can you see the first 10 lines of the mm10.fa file? (mm10.fa = mouse genomic sequence version 10)
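Hint (a small sketch, since the exact location of mm10.fa under /shared/bank may vary): locate the file first, then print its first lines with head.

find /shared/bank -maxdepth 4 -name "mm10.fa" 2>/dev/null   # locate mm10.fa (the depth limit is an assumption; this may take a moment)
head -n 10 /path/to/mm10.fa                                 # replace with the path found above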

There is a 2417_wf4bioinfo project accessible to you; navigate to this folder and list what is inside.

Then go to one of your projects and create a folder named 2417_wf4bioinfo. This is where you will do all the exercises. If you don’t have a project, you can create a folder named after your login in the 2417_wf4bioinfo project and work there.

cd /shared/projects/2417_wf4bioinfo/
mkdir -p $USER/day1-slurm_module
cd $USER/day1-slurm_module

II. Slurm basics

Get information about the cluster

sinfo

Slurm sbatch command

sbatch allows you to send an executable file to be run on a compute node.

My first sbatch script

#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo

echo ??

Exo 1: Starting from that minimal example, make a script named flatter.sh printing “What a nice training !”

Then run the script:

sbatch flatter.sh
Click to see an example solution
#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo

echo "What a nice training !"

The output that would normally appear on your screen has been redirected to slurm-xxxxx.out, but this name can be changed using SBATCH options.

Exo 2: Modify flatter.sh to change the Slurm output file name.

Hint - Use #SBATCH --output (or -o for short)

Then run it.

Click to see an example solution
#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --output flatter.out

echo "What a nice training !"

Exo 3: Use sbatch to run the hostname command in such a way that the output file is called hostname.out.

What is the output? How does it differ from typing hostname directly in the terminal, and why?

Click to see an example solution
#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --output hostname.out

# -- COMMANDS --
hostname


Useful sbatch options 1/2

Option       Flag  Function
--account    -A    account to run the job
--partition  -p    partition to run the job
--job-name   -J    give your job a name
--output     -o    output file name
--error      -e    error file name
--chdir      -D    set the working directory before running
--time       -t    limit the total run time (fast partition: 24h)
--mem              memory that your job will have access to (per node)

To find out more, see the Slurm manual (man sbatch) or https://slurm.schedmd.com/sbatch.html.
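As an illustration, here is a minimal sketch combining several of these options (the partition name, time limit, and memory value are only examples to adapt to your needs):

#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --partition fast   # example partition
#SBATCH --job-name demo
#SBATCH --output demo.out
#SBATCH --error demo.err
#SBATCH --time 00:10:00    # 10-minute limit, well below the 24h maximum of the fast partition
#SBATCH --mem 1G

# -- COMMANDS --
echo "Running on $(hostname)"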


III. Job handling and monitoring

Follow your jobs

The sleep command: do nothing (wait) for the given number of seconds.

Exo 4: Start from your previous sbatch script and launch a simple job that runs sleep 600.

Click to see an example solution
#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --job-name=sleep
#SBATCH --output %x-%j.out

# -- COMMANDS --
sleep 600


squeue

On your terminal, type

squeue

The ST column gives the status of the job:
- R = Running
- PD = Pending

To see jobs on fast partition

squeue -p fast

To see only the jobs of a given user (here untel)

squeue -u untel

To see only your jobs

squeue --me
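
You can also filter by job state with -t/--states, for example to list only your pending jobs (a small sketch; PD and R are the state codes shown above):

squeue --me -t PD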

scancel

To cancel a job that you started, use the scancel command followed by the job ID (the number given by Slurm, visible in squeue)

scancel jobID

You can stop the previous sleep job with this command.
To cancel all your jobs at once, use --me.

scancel --me
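
A couple of variants can be handy (a small sketch; the job name and IDs are placeholders):

# Cancel all your jobs named "sleep" (the -n/--name filter)
scancel --me --name=sleep

# Several job IDs can be given at once
scancel jobID1 jobID2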

Exo 5: What is the line #SBATCH --output %x-%j.out doing?

Click to see the answer

This option controls the name of the Slurm output file: %x is replaced by the job name (here sleep) and %j by the job ID (e.g. 41994442).


Filename pattern

sbatch allows for a filename pattern to contain one or more replacement symbols, which are a percent sign % followed by a letter.

Replacement symbol  Function
%j                  jobid of the running job
%J                  jobid.stepid of the running job (e.g. 128.0)
%u                  User name
%x                  Job name

Find out more in the Slurm documentation.
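
For example, several symbols can be combined in a single pattern (a small sketch):

#SBATCH --job-name=demo
#SBATCH --output=%x-%u-%j.out   # expands to demo-<your login>-<jobID>.out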

Job monitoring: sacct

Re-run sleep.sh and type

sacct

You can pass the --format option to list the information that you want to display, such as memory usage, running time, etc.
For instance

sacct --format=JobID,JobName,Start,Elapsed,CPUTime,NCPUS,NodeList,MaxRSS,ReqMem,State

To see all available fields, run sacct --helpformat. We’ll see another useful command to monitor your jobs right after the introduction to modules!
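
You can also restrict sacct to a single job or to a time window (a small sketch; replace jobID with a real job number):

# Details for one specific job
sacct -j jobID --format=JobID,JobName,Elapsed,MaxRSS,State

# All your jobs started since a given date (-S / --starttime)
sacct -S 2024-10-14 --format=JobID,JobName,Start,Elapsed,State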


IV. Modules

A lot of tools are installed on the cluster. To list them, use one of the following commands.

module available
module avail
module av

You can limit the search for a specific tool, for example look for the different versions of multiqc on the cluster using module av multiqc.

[mhennion @ clust-slurm-client 11:17]$ day1-slurm_module : module av multiqc
----------------- /shared/software/modulefiles ------------------------
multiqc/1.3  multiqc/1.6  multiqc/1.7  multiqc/1.9  multiqc/1.11  multiqc/1.12  multiqc/1.13  

To load a tool

You can specify a version of the tool.

module load tool/1.3

You can load several tools at once.

module load tool1 tool2 tool3

Note that the order of the tools can matter: if several tools depend on Python, for instance, the Python version used will be the one brought in by the last tool loaded. To avoid conflicts, you can load the first tool, use it, then unload it (see below) before loading the next one.
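For example, the load / use / unload pattern looks like this (a small sketch; the versions shown are among those listed by module av multiqc above, adapt to what is installed):

module load multiqc/1.13    # load one version
multiqc --version           # use it
module unload multiqc/1.13  # unload it before switching
module load multiqc/1.12    # load another version without conflict
multiqc --version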

To list the modules loaded

module list

To unload one module

module unload tool1

To remove all loaded modules

module purge

module example : reportseff

After a run, the reportseff command gives you information about the efficiency of one or several jobs.

Exo 6: Load the module reportseff and check the resource usage of previous jobs.

Click to see an example solution
module load reportseff
reportseff . 
reportseff --format "+Start,CPUTime,NCPUS,NodeList,MaxRSS,ReqMeM" --modified-sort 

Practical example

Module best practice: load your modules within your sbatch script for consistency.

Exo 7: Run an alignment using STAR version 2.7.5a.

Starting from the following script, write a sbatch script to align reads.

#!/bin/bash
# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --job-name=Alignment
#SBATCH --output=star-alignment-%j.out
#SBATCH --error=star-alignment-%j.err
#SBATCH  ?? # increase memory to 30G

# -- MODULES --
module purge
module load  ?? # find appropriate STAR module (2.7.5a)

# -- VARIABLES --
pathToIndex= ?? # look for the path of the index for homo sapiens (hg38) made for STAR 
pathToFastq1= ?? # look in /shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq to get the path to the R1 fastq.gz file
pathToFastq2= ?? # look in /shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq to get the path to the R2 fastq.gz file
outputFileName= ?? # choose your output file name

# -- COMMANDS --
STAR --genomeDir $pathToIndex \
--readFilesIn $pathToFastq1 $pathToFastq2 \
--outFileNamePrefix $outputFileName \
--readFilesCommand zcat

After the run

Check the resources that were used with reportseff.

Click to see an example solution
#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --job-name=Alignment
#SBATCH --output=star-alignment-%j.out
#SBATCH --error=star-alignment-%j.err
#SBATCH --mem=30G # increase memory to 30G

# -- MODULES --
module purge
module load star/2.7.5a # find appropriate STAR module (2.7.5a)

# -- VARIABLES --
pathToIndex=/shared/bank/homo_sapiens/hg38/star-2.7.5a # look for the path of the index for homo sapiens (hg38) made for STAR 
pathToFastq1=/shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq/D192red_2M_R1.fastq.gz # look in /shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq to get the path to the R1 fastq.gz file
pathToFastq2=/shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq/D192red_2M_R2.fastq.gz # look in /shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq to get the path to the R2 fastq.gz file
outputFileName=STAR_results/D192red # choose your output file name

# -- COMMANDS --
STAR --genomeDir $pathToIndex \
--readFilesIn $pathToFastq1 $pathToFastq2 \
--outFileNamePrefix $outputFileName \
--readFilesCommand zcat

V. Parallelization

Useful sbatch options 2/2

Option             Default  Function
--nodes            1        Number of nodes required (or min-max)
--nodelist                  Select one or several nodes
--ntasks-per-node  1        Number of tasks invoked on each node
--mem              2GB      Memory required per node
--cpus-per-task    1        Number of CPUs allocated to each task
--mem-per-cpu      2GB      Memory required per allocated CPU
--array                     Submit multiple jobs to be executed with identical parameters

Multi-threading

Some tools support multi-threading, i.e. the use of several CPUs to speed up a single task. This is the case for STAR with the --runThreadN option.

Exo 8: Modify the previous sbatch file to use 4 threads to align the FASTQ files on the reference. Run and check time and memory usage.

Click to see an example solution
#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --job-name=Alignment
#SBATCH --output=star-alignment-%j.out
#SBATCH --error=star-alignment-%j.err
#SBATCH --mem=30G # increase memory to 30G
#SBATCH --cpus-per-task=4


# -- MODULES --
module purge
module load star/2.7.5a # find appropriate STAR module (2.7.5a)

# -- VARIABLES --
pathToIndex=/shared/bank/homo_sapiens/hg38/star-2.7.5a # look for the path of the index for homo sapiens (hg38) made for STAR 
pathToFastq1=/shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq/D192red_2M_R1.fastq.gz # look in /shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq to get the path to the R1 fastq.gz file
pathToFastq2=/shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq/D192red_2M_R2.fastq.gz # look in /shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq to get the path to the R2 fastq.gz file
outputFileName=STAR_results/D192red # choose your output file name

# -- COMMANDS --
STAR --genomeDir $pathToIndex \
--readFilesIn $pathToFastq1 $pathToFastq2 \
--outFileNamePrefix $outputFileName \
--readFilesCommand zcat \
--runThreadN 4

 

Use Slurm variables

To save resources, we have generated a reduced genome index; you can find it at /shared/projects/2417_wf4bioinfo/Slurm-training/star-2.7.5a_hg38_chr22. Modify your script to use this index. You can now reduce the RAM to 3 GB.

The Slurm controller will set some variables in the environment of the batch script. They can be very useful. For instance, you can improve the previous script using $SLURM_CPUS_PER_TASK.

Exo 9: Modify the previous sbatch file to use the reduced index and $SLURM_CPUS_PER_TASK.

Click to see an example solution
#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --job-name=Alignment
#SBATCH --output=star-alignment-%j.out
#SBATCH --error=star-alignment-%j.err
#SBATCH --mem=3G # reduce memory to 3G
#SBATCH --cpus-per-task=4


# -- MODULES --
module purge
module load star/2.7.5a # find appropriate STAR module (2.7.5a)

# -- VARIABLES --
pathToIndex=/shared/projects/2417_wf4bioinfo/Slurm-training/star-2.7.5a_hg38_chr22 # Use the index of hg38 chr22 only
pathToFastq1=/shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq/D192red_2M_R1.fastq.gz # look in /shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq to get the path to the R1 fastq.gz file
pathToFastq2=/shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq/D192red_2M_R2.fastq.gz # look in /shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq to get the path to the R2 fastq.gz file
outputFileName=STAR_results/D192red # choose your output file name

# -- COMMANDS --
STAR --genomeDir $pathToIndex \
--readFilesIn $pathToFastq1 $pathToFastq2 \
--outFileNamePrefix $outputFileName \
--readFilesCommand zcat \
--runThreadN $SLURM_CPUS_PER_TASK

The full list of variables is available in the Slurm documentation.

Some useful ones:
- $SLURM_CPUS_PER_TASK
- $SLURM_JOB_ID
- $SLURM_JOB_ACCOUNT
- $SLURM_JOB_NAME
- $SLURM_JOB_PARTITION

Of note, Bash shell variables can also be used in the sbatch script:
- $USER
- $HOME
- $HOSTNAME
- $PWD
- $PATH
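
A quick way to see these variables in action is to echo them from a small batch job (a sketch; the output lands in the usual slurm-xxxxx.out file):

#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --job-name=variables
#SBATCH --cpus-per-task=1   # needed for $SLURM_CPUS_PER_TASK to be defined

# -- COMMANDS --
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) runs on partition $SLURM_JOB_PARTITION"
echo "Account: $SLURM_JOB_ACCOUNT, CPUs per task: $SLURM_CPUS_PER_TASK"
echo "Submitted by $USER, running on $HOSTNAME, started from $PWD"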

Job arrays

Job arrays allow you to launch the same job many times (same executable, same resources), for example on different files. If you add the following line to your script, the job will be launched 6 times (in parallel), with the variable $SLURM_ARRAY_TASK_ID taking values 0 to 5.

#SBATCH --array=0-5

Exo 10: Starting from the following draft, make a simple script launching 6 jobs in parallel.

#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --array= ?? # to adjust to select the samples to process (here all 6)
#SBATCH --output=HelloArray_%A_%a.out  # "%A" will be replaced by the job ID and "%a" by the task number
#SBATCH --job-name=ArrayExample

# -- VARIABLES --
SAMPLE_LIST=(S01 S02 S03 S04 S05 S06)
SAMPLE=${SAMPLE_LIST[$SLURM_ARRAY_TASK_ID]} # take the nth element of the list, n being the task number

# -- COMMANDS --
echo "Hello I am the task number $SLURM_ARRAY_TASK_ID from the job array $?." # $? Look for the Slurm variable for the job ID. 
sleep 20
echo "And I will process sample $SAMPLE."
Click to see an example solution
#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --array=0-5  # to adjust to the number of samples (here all 6)
#SBATCH --output=HelloArray_%A_%a.out  # "%A" will be replaced by the job ID and "%a" by the task number
#SBATCH --job-name=ArrayExample

# -- VARIABLES --
SAMPLE_LIST=(S01 S02 S03 S04 S05 S06)
SAMPLE=${SAMPLE_LIST[$SLURM_ARRAY_TASK_ID]} # take the nth element of the list, n being the task number

# -- COMMANDS --
echo "Hello I am the task number $SLURM_ARRAY_TASK_ID from the job array $SLURM_ARRAY_JOB_ID."  # $? Look for the Slurm variable for the job ID. 
sleep 20
echo "And I will process sample $SAMPLE."

It is possible to limit the number of jobs running at the same time using %max_running_jobs in the #SBATCH --array option.

Exo 11: Modify your script to run only 2 jobs at a time.

Using the squeue command, you will see that some of the tasks are pending until the others are over.

[user @ clust-slurm-client 11:28]$ star : squeue --me
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  42162738_[2-5%2]      fast ArrayExa mhennion PD       0:00      1 (JobArrayTaskLimit)
        42162738_0      fast ArrayExa mhennion  R       0:03      1 cpu-node-61
        42162738_1      fast ArrayExa mhennion  R       0:03      1 cpu-node-61
Click to see an example solution
#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --array=0-5%2  # Run at most 2 tasks at a time thanks to "%2"
#SBATCH --output=HelloArray_%A_%a.out  # "%A" will be replaced by the job ID and "%a" by the task number
#SBATCH --job-name=ArrayExample

# -- VARIABLES --
SAMPLE_LIST=(S01 S02 S03 S04 S05 S06)
SAMPLE=${SAMPLE_LIST[$SLURM_ARRAY_TASK_ID]} # take the nth element of the list, n being the task number

# -- COMMANDS --
echo "Hello I am the task number $SLURM_ARRAY_TASK_ID from the job array $SLURM_ARRAY_JOB_ID."  # $? Look for the Slurm variable for the job ID. 
sleep 20
echo "And I will process sample $SAMPLE."


Array Example: Take all files matching a pattern in a directory

#!/bin/bash

# -- SBATCH OPTIONS --
#SBATCH --account 2417_wf4bioinfo
#SBATCH --array=0-7   # if 8 files to process

FASTQFOLDER=/shared/projects/2417_wf4bioinfo/Slurm-training/test_fastq
cd $FASTQFOLDER

FQ=(*fastq.gz)  # Create a bash array with all the fastq.gz files
echo ${FQ[@]}   # Echo the array contents
INPUT=$(basename -s .fastq.gz "${FQ[$SLURM_ARRAY_TASK_ID]}") # Array elements are indexed from 0 to n-1; pick the one matching the Slurm task ID
echo $INPUT     # Echo the simplified name of the fastq file

You can alternatively use ls or find to identify the files to process and get the nth with sed (or awk).

#SBATCH --array=1-4   # If 4 files, as sed indices start at 1
INPUT=$(ls $FASTQFOLDER/*.fastq.gz | sed -n ${SLURM_ARRAY_TASK_ID}p)
echo $INPUT
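
For paired-end data, you can derive the R2 file name from the R1 file (a hedged sketch, assuming the *_R1.fastq.gz / *_R2.fastq.gz naming used in the training data, with #SBATCH --array=0-3 for 4 pairs since bash array indices start at 0):

FQ_R1=(${FASTQFOLDER}/*_R1.fastq.gz)   # bash array of R1 files only
R1=${FQ_R1[$SLURM_ARRAY_TASK_ID]}      # R1 file for this task
R2=${R1/_R1.fastq.gz/_R2.fastq.gz}     # matching R2 file
echo "Task $SLURM_ARRAY_TASK_ID will align $R1 and $R2"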

Job Array Common Mistakes

Useful resources

Thanks
