Institut Français de Bioinformatique (IFB)
Last updated: 2025-09-29
2025-09-28
Chloé QUIGNOT (BIOI2 @I2BC) - ORCID: 0000-0001-8504-232X
Source: adapted from FAIRbioinfo 2021 training material of the IFB and Snakemake introduction tutorial from BIOI2
Material under CC-BY-SA licence
Snakemake = Python (aka “snake”, a programming language) + Make (a rule-based automation tool)
Workflows are made up of blocks, each block performs a specific (set of) instruction(s)
- 1 rule = 1 instruction (ideally)
- inputs and outputs are one or multiple files
- at least 1 input and/or 1 output per rule
execution order ≠ code order => Snakemake does a pick & mix of the rules it needs at execution
Rules are linked together by Snakemake using matching filenames in their input and output directives.
At execution, Snakemake creates a DAG (directed acyclic graph) and follows it to generate the final output of your pipeline.
Below is a workflow example using 2 tools sequentially to align 2 protein sequences:
In this example, we have:
- 2 rules: fusionFasta and Mafft
- input files: *.fasta
- intermediate file: *fused.fasta
- final output: *aligned.fasta, generated by Mafft

| Snakemake (Smk) steps | running path |
|---|---|
| Smk creates the DAG from the snakefile | |
| Smk sees that the final output *aligned.fasta doesn’t exist but knows it can create it with the Mafft rule | |
| Mafft needs files matching *fused.fasta (they don’t exist) but the fusionFasta rule can generate them | |
| fusionFasta needs .fasta files | |
| Snakemake steps | running path |
|---|---|
| .fasta files exist! Smk stops backtracking | |
| Smk runs the fusionFasta rule | |
| P10415_P01308_fused.fasta exists and feeds the Mafft rule | |
| the final output (P10415_P01308_aligned.fasta) is generated, the workflow has finished | |
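The backtracking walk described in the tables above can be sketched in plain Python. This is only an illustration, not Snakemake's actual implementation: the RULES dictionary, the plan function and the existing_files set are hypothetical names, and file names are matched exactly rather than through patterns.

```python
# Hypothetical, simplified model of the two rules from the example.
RULES = {
    "fusionFasta": {
        "input": ["P10415.fasta", "P01308.fasta"],
        "output": ["P10415_P01308_fused.fasta"],
    },
    "mafft": {
        "input": ["P10415_P01308_fused.fasta"],
        "output": ["P10415_P01308_aligned.fasta"],
    },
}


def plan(target, existing, rules=RULES):
    """Return the rule names to run, in order, to produce `target`."""
    if target in existing:
        return []  # the file is already there: stop backtracking
    for name, rule in rules.items():
        if target in rule["output"]:
            steps = []
            for needed in rule["input"]:
                # Resolve each missing input first (backtracking step).
                for step in plan(needed, existing, rules):
                    if step not in steps:
                        steps.append(step)
            return steps + [name]
    raise FileNotFoundError(f"no rule produces {target!r}")


existing_files = {"P10415.fasta", "P01308.fasta"}
order = plan("P10415_P01308_aligned.fasta", existing_files)
print(order)  # ['fusionFasta', 'mafft']
```

The rules are listed in one order and run in another: the order comes from the requested target, not from the position of the rules in the file.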
Snakemake’s job is to make sure that everything is up-to-date, otherwise it (re-)runs the rules that need to be run…
Rules are run if:
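As a minimal sketch of one such check, in the spirit of Make: a rule can be considered out of date when one of its outputs is missing or older than one of its inputs. The needs_rerun function below is a hypothetical helper, not part of Snakemake, and Snakemake's real criteria are broader (it can also react to changes in the code or parameters of a rule, for instance).

```python
import os
import tempfile
import time


def needs_rerun(inputs, outputs):
    """True if an output is missing or older than the newest input."""
    if not all(os.path.exists(path) for path in outputs):
        return True
    newest_input = max((os.path.getmtime(p) for p in inputs), default=0.0)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return newest_input > oldest_output


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "myInputFile")
    dst = os.path.join(tmp, "myOutputFile")

    open(src, "w").close()
    missing_output = needs_rerun([src], [dst])  # output does not exist yet

    open(dst, "w").close()
    now = time.time()
    os.utime(src, (now - 10, now - 10))         # input older than output
    os.utime(dst, (now, now))
    up_to_date = needs_rerun([src], [dst])

    os.utime(src, (now + 10, now + 10))         # input modified afterwards
    stale_output = needs_rerun([src], [dst])

print(missing_output, up_to_date, stale_output)  # True False True
```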
Many default files constitute the “Snakemake system” & there are standards on how to organise them.
They are not all necessary for a basic pipeline execution.
The most important is the Snakefile: that’s where all the code is saved.
For more information: https://github.com/snakemake-workflows/snakemake-workflow-template
rule myRuleName:
    input: "myInputFile"
    output: "myOutputFile"
    shell: "echo {input} > {output}"

=> Rules usually have a unique name which defines them
=> input, output, shell etc. are called directives
=> "myInputFile" & "myOutputFile" specify 1 or more input & output files
=> shell specifies what to do (shell commands in this case -> alternative directives exist)
=> {input} & {output} are placeholders & are replaced by input & output file names at execution
=> code alignment (=indentations) is important
=> files and shell directives should be given within quotes (', " or """ for multi-line code)
=> additional & optional directives exist, e.g.: params:, resources:, log:, etc. (we’ll see them later)
For more information: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html
rule mafft:
    input:
        "P10415_P01308_fused.fasta",
    output:
        "P10415_P01308_aligned.fasta",
    shell:
        """
        mafft {input} > {output}
        """

rule fusionFasta:
    input:
        p1="P10415.fasta",
        p2="P01308.fasta",
    output:
        "P10415_P01308_fused.fasta",
    shell:
        """
        cat {input.p1} {input.p2} > {output}
        """
- 2 rules: fusionFasta & mafft
- fusionFasta: 2 inputs (p1 & p2) & 1 output file
- mafft: 1 input & 1 output file

NB: input & output files can be named, e.g. p1="P10415.fasta", and explicitly accessed in shell, e.g. {input.p1} or {input[0]}
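The placeholder forms mentioned in the NB ({input}, {input.p1}, {input[0]}) follow Python's string formatting syntax. The sketch below mimics the substitution with a small hypothetical Files class; it illustrates the idea and is not Snakemake's own code.

```python
class Files(list):
    """Hypothetical stand-in for a rule's input (or output) file list."""

    def __init__(self, *positional, **named):
        super().__init__(list(positional) + list(named.values()))
        self._named = dict(named)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, e.g. for "p1".
        try:
            return self.__dict__["_named"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __str__(self):
        # {input} on its own expands to all files, separated by spaces.
        return " ".join(self)


inp = Files(p1="P10415.fasta", p2="P01308.fasta")
out = Files("P10415_P01308_fused.fasta")

by_name = "cat {input.p1} {input.p2} > {output}".format(input=inp, output=out)
by_index = "cat {input[0]} {input[1]} > {output}".format(input=inp, output=out)
everything = "cat {input} > {output}".format(input=inp, output=out)

print(by_name)  # cat P10415.fasta P01308.fasta > P10415_P01308_fused.fasta
```

All three templates expand to the same command here. Named access is usually the most readable choice once a rule has more than one input.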
When Snakemake is installed (how to install):
- move to the directory containing your Snakefile
- type snakemake --cores 1 to run the pipeline (--cores specifies the number of cores to use)

When you run Snakemake, you’ll get a full report printed on the screen of its progress:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job            count
-----------  -------
fusionFasta        1
mafft              1
total              2
[...]
2 of 2 steps (100%) done
Complete log: .snakemake/log/2024-02-20T150605.574089.snakemake.log
When it’s finished, a .snakemake folder will appear in your working directory:
- use ls -a to see it

Snakemake can output, in dot language, the complete workflow (--dag), the rule dependencies (--rulegraph) or the rule dependencies with their I/O files (--filegraph). Use the dot tool of the graphviz package to create a png, pdf or other format:
snakemake --dag | dot -Tpng > dag.png
snakemake --rulegraph | dot -Tpng > rule.png
snakemake --filegraph | dot -Tpng > file.png
The --dry-run option
Using this option will perform a “dry-run”, i.e. nothing will be executed but everything that would have been run is displayed on the screen.
Other options: -p (--printshellcmds), -D
All command line options: https://snakemake.readthedocs.io/en/stable/executing/cli.html#all-options
- rules (using input and output to specify input & output files):
    rule myRuleName:
        input: "myInputFile"
        output: "myOutputFile"
        shell: "echo {input} > {output}"
- {input} & {output} placeholders within the shell directive
- the snakemake --cores 1 command (+ other options available)
- --dag, --rulegraph, --filegraph and --dry-run

Wildcards, e.g. {upid}, {sample} etc.:
- are written within {}
- match any non-empty string by default (regular expression ".+")
rule fusionFasta:
    input:
        p1="P10415.fasta",
        p2="P01308.fasta",
    output:
        "P10415_P01308_fused.fasta",
    shell:
        """
        cat {input.p1} {input.p2} > {output}
        """

rule fusionFasta:
    input:
        p1="{upid1}.fasta",
        p2="{upid2}.fasta",
    output:
        "{upid1}_{upid2}_fused.fasta",
    shell:
        """
        cat {input.p1} {input.p2} > {output}
        """
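To see how the wildcard version of fusionFasta relates a requested output to its inputs, here is a rough Python sketch. It assumes each wildcard matches the regular expression ".+" and appears only once in the output pattern. The helper names pattern_to_regex and match_wildcards are hypothetical, and Snakemake's own resolution is more elaborate.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'a_{x}.txt' into a regex where each {name} matches '.+'."""
    # re.split with one capture group alternates literal text / names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    return re.compile(regex)


def match_wildcards(pattern, filename):
    """Return a dict of wildcard values, or None if the name does not fit."""
    found = pattern_to_regex(pattern).fullmatch(filename)
    return found.groupdict() if found else None


# The requested output fixes the wildcard values...
wildcards = match_wildcards("{upid1}_{upid2}_fused.fasta",
                            "P10415_P01308_fused.fasta")
# ...which are then substituted into the input patterns.
inputs = [template.format(**wildcards)
          for template in ("{upid1}.fasta", "{upid2}.fasta")]

print(wildcards)  # {'upid1': 'P10415', 'upid2': 'P01308'}
print(inputs)     # ['P10415.fasta', 'P01308.fasta']
```

The values flow from the output file name back to the inputs, which is why a rule with wildcards needs a concrete target file to be requested.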
Objective: learn how to run an already-existing Snakefile