2025-10-30
Frédéric Jarlier, Julien Roméjon, Philippe Hupé, Laurent Jourdren, Quentin Duvert
Workflow manager
Nextflow key concepts
Channel & Operator
Process & Directive
Configuration & Profile
Debug & Metrics
Conclusion
BIO (what):
- algorithm
- bioinformatics tools
- analysis parameters: genomes, thresholds
- biological tuning

INFO (how):
- cluster
- error management
- parallel execution
- containers
- technical tuning
- log management, reproducibility

“Write the biological part and delegate the informatics one to the workflow manager”
A workflow manager adds complexity to your pipeline stack: you must get benefits from it.
Never paste pre-existing code into your workflow manager as-is: “it works!” is not a good enough reason.
Try to use as many features of your workflow manager as you can: always ask yourself about the most nextflow-ic way to do something.
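As an illustration of the “nextflow-ic way”, here is a minimal sketch (process names and commands are assumptions): instead of looping over files inside one task, let a channel emit one file per task so Nextflow parallelizes the work for you.

```
// Less nextflow-ic: one task loops over all files sequentially
process COUNT_ALL {
    input:
    path fastas

    script:
    """
    for f in ${fastas}; do grep -c '>' \$f; done
    """
}

// More nextflow-ic: one task per file, parallelized automatically
process COUNT_ONE {
    input:
    path fasta

    script:
    """
    grep -c '>' ${fasta}
    """
}

workflow {
    COUNT_ONE(Channel.fromPath('data/*.fa'))
}
```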
Operators work on Channels
Directives work on Processes
Workflows chain Processes by injecting Channels
Nextflow is a Data Flow Model programming language that helps build complex workflows.
The idea is to chain multiple tasks, like piping commands in a *nix system.
Nextflow > Groovy > Java
Main file: main.nf
Queue Channel (Dataflow channel)
asynchronous sequence of values (FIFO)
iteration, consumption
Value Channel (Dataflow value)
unique value
never empty, no consumption
Initialisation -> Factories
Manipulation -> Operators
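A minimal sketch of factories and operators (values and closures below are illustrative assumptions):

```
// Factories create channels...
ch  = Channel.of(1, 2, 3, 4)        // queue channel: values are consumed
ref = Channel.value('genome.fa')    // value channel: single value, reusable, never empty

// ...operators transform them
ch.filter { it % 2 == 0 }
  .map { it * 10 }
  .view()                           // prints 20 then 40
```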
Filesystem manipulations
Paired files, recursive search…
files = Channel.fromPath('data/**.fa')
moreFiles = Channel.fromPath('data/**/*.fa')
Channel
.fromFilePairs('/my/data/SRR*_{1,2}.fastq')
.view()
// output similar to:
// [SRR493366, [/my/data/SRR493366_1.fastq, /my/data/SRR493366_2.fastq]]
// [SRR493367, [/my/data/SRR493367_1.fastq, /my/data/SRR493367_2.fastq]]
// [SRR493368, [/my/data/SRR493368_1.fastq, /my/data/SRR493368_2.fastq]]
// [SRR493369, [/my/data/SRR493369_1.fastq, /my/data/SRR493369_2.fastq]]
// [SRR493370, [/my/data/SRR493370_1.fastq, /my/data/SRR493370_2.fastq]]
// [SRR493371, [/my/data/SRR493371_1.fastq, /my/data/SRR493371_2.fastq]]

Main operators:
.filter{...}, .first(), .unique(), .map{...}, .groupTuple(),
.collect(), .flatten(), .splitCsv(), .splitFasta(...), .join(...), .mix(...),
.concat(...), .multiMap{...}, .branch{...}, .count(), .min(), .max(), .dump(),
.set{...}, .ifEmpty(...), .view()

process <name> {
[directive]
input:
<input qualifier> <input name>
output:
<output qualifier> <output name>[, emit: <name>]
script:
<script to be executed>
}

“A process starts when all the inputs are ready”
“Path outputs must have been produced at the end of the
process”
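A concrete instance of the skeleton above (process name, directive, and command are illustrative assumptions):

```
process INDEX {
    cpus 2                          // directive

    input:
    path fasta

    output:
    path 'index.txt', emit: idx     // must exist when the script ends

    script:
    """
    grep '>' ${fasta} > index.txt
    """
}

workflow {
    INDEX(Channel.fromPath('data/genome.fa'))
    INDEX.out.idx.view()
}
```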
Directives: optional settings to customize process execution.
Different kinds exist (resources, software environment, output publishing…).
Configuration files (nextflow.config):
- $HOME/.nextflow/config
- nextflow.config in projectDir, then launchDir
- -c <config-files>: option from the command line
- -C <config-files> <=> ignore all the other config files
- includeConfig 'path/extra.config'

Configuration scopes:
- params
- process
- timeline, report, trace, workflow, dag
- conda, apptainer, docker
- aws, azure, google, tower…

Profile <=> activate specific parts of configuration files from the command line
$ nextflow -C custom.config run main.nf -profile sgu,clst --run G553 --bclDir /path

-C custom.config: nextflow option to load a unique configuration file
run: nextflow command
main.nf: main script
-profile sgu,clst: option of the run command to activate 2 profiles defined in configuration files
--run G553, --bclDir /path: pipeline parameters, overriding default values from the params scope
Channel & Operator || Queue vs Value; be sure about the content of your channels
Process & Directive || Execution order is based on input availability
Configuration & Debug || Make the work directory your friend
automatic parallelization
software dependencies: conda, docker, apptainer/singularity, etc.
high portability: scheduler (pbs, slurm, …) / cloud (google, aws, azure, etc.)
error management: error recovery with “-resume”, dry run with “-stub”, debug with the “work” directory files
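For the “-stub” dry run mentioned above, a process can carry a stub block that fakes its outputs; a minimal sketch (process name and tool are assumptions):

```
// Run with: nextflow run main.nf -stub  -> executes stub: instead of script:
process ALIGN {
    input:
    path reads

    output:
    path 'aligned.bam'

    script:
    """
    bwa mem ref.fa ${reads} > aligned.bam   # real command (bwa is an assumed tool here)
    """

    stub:
    """
    touch aligned.bam                       # fake output so pipeline wiring can be tested quickly
    """
}
```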
Seqera AI chatbot: https://seqera.io/ask-ai
nf-core: https://nf-co.re
geniac: https://geniac.readthedocs.io