Nextflow working group

2025-10-30

Frédéric Jarlier, Julien Roméjon, Philippe Hupé, Laurent Jourdren, Quentin Duvert

The pipeline

Environment
```
module load nextflow
module load mafft
```
Nextflow files
```
main.nf
nextflow.config
sample.csv
```
Goal of the pipeline:
- loadData: runs wget to fetch fasta sequences from the UniProt website
- fusionFasta: runs cat to concatenate 2 fasta files
- mafft: runs mafft to align the 2 sequences within the fused output of fusionFasta

The processes

main.nf is responsible for chaining the processes:

process loadData {...}
process fusionFasta {...}
process mafft {...}

workflow {
    main:

    loadData(...)
    fusionFasta(...)
    mafft(...)
}

In main.nf you include or write your processes.
In the workflow you initialize the inputs, chain the processes and set the outputs.

sample.csv and loadData

This tab-separated file contains:

sample  url
P10415  https://www.uniprot.org/uniprot/P10415.fasta
P01308  https://www.uniprot.org/uniprot/P01308.fasta

Load

process loadData {
  tag { sample }
  
  publishDir "${params.outdir}/data/", mode : 'copy'

  input: tuple val(sample), val(url)
  // 'wget' is a constant key: it lets groupTuple() gather all downloaded files into one list
  output: tuple val('wget'), path("${sample}.fasta"), emit: out

  script:
  """
  wget ${url} -O ${sample}.fasta
  """
}
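
The fusionFasta process is not shown here. A minimal sketch of what it could look like, assuming it receives the grouped tuple built in the workflow (a key plus the list of fasta files) and emits a single file that mafft can take as input. The names key, fastas and fusion.fasta, as well as the fusion/ publish directory, are hypothetical and not taken from the original pipeline.

```nextflow
process fusionFasta {
    // assumed publish directory, following the layout used by the other processes
    publishDir "${params.outdir}/fusion/", mode: 'copy'

    // key is the constant value emitted by loadData, fastas is the list of downloaded files
    input: tuple val(key), path(fastas)
    // hypothetical output name; it must differ from the staged input file names
    output: path('fusion.fasta')

    script:
    // a list of paths is rendered as space-separated file names in the script
    """
    cat ${fastas} > fusion.fasta
    """
}
```

Emitting only the path (and dropping the key) keeps the output compatible with the path(multifasta) input declared by mafft.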

mafft

process mafft {
    publishDir "${params.outdir}/alignment/", mode : 'copy'

    input: path(multifasta)
    output: path('mafft_align.fasta')
    
    script:
    """
    mafft ${multifasta} > mafft_align.fasta
    """
}

workflow

The workflow is the glue: it builds the input channel and connects the processes.

workflow {
  main:
    Channel.fromPath(params.input) | splitCsv(header:true, sep: '\t') 
        | map { row -> [row.sample, row.url] } | set { input_ch }

    loadData(input_ch) // runs in parallel for all samples in sample.csv

    loadData.out | groupTuple() | set { fastaList }

    /* give the list of files to concatenate to fusionFasta */
    fusionFasta(fastaList)

    /* run the multiple alignment with mafft */
    mafft(fusionFasta.out)
    
}
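
To see what the groupTuple() step hands to fusionFasta, the grouping can be tried in isolation. The following is a minimal sketch, assuming hard-coded tuples in place of real loadData outputs (the file names are plain strings here, not staged files).

```nextflow
// Stand-alone sketch: reproduce the grouping step without running loadData.
workflow {
    Channel.of(
            ['wget', 'P10415.fasta'],
            ['wget', 'P01308.fasta']
        )
        | groupTuple()   // gathers the items that share the same first element
        | view           // prints one item: the key followed by the list of grouped values
}
```

Because every loadData task emits the same key, the whole channel collapses into one item, so fusionFasta runs once. In the real pipeline the order of the files inside the list depends on which download finishes first.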

nextflow.config

/* Parameters for the pipeline  */
params { 
  input = 'sample.csv'
  outdir = "results"
}
/* Report settings */
timeline {
    enabled = true
    overwrite = true
    file    = "${params.outdir}/pipeline_info/execution_timeline.html"
}
report {
    enabled = true
    overwrite = true
    file    = "${params.outdir}/pipeline_info/execution_report.html"
}

Another way of passing params is to use the command line: nextflow run main.nf --input=sample.csv --outdir=results
Parameters given on the command line override those set in nextflow.config.

🔗 https://nextflow.io/docs/latest/reference/config.html

nextflow.config


trace {
    enabled = true
    overwrite = true
    file    = "${params.outdir}/pipeline_info/execution_trace.txt"
}
dag {
    enabled = true
    overwrite = true
    file    = "${params.outdir}/pipeline_info/pipeline_dag.html"
}

Run the pipeline

Call nextflow

(base) fjarlier@clust-slurm-client:~/TP_intro$ module load mafft
(base) fjarlier@clust-slurm-client:~/TP_intro$ nextflow run main.nf

 N E X T F L O W   ~  version 25.04.7

Launching `main.nf` [condescending_liskov] DSL2 - revision: 33d6a8bc68

executor >  local (4)
[5d/ec6219] loadData (P01308) [100%] 2 of 2 ✔
[a8/d2c870] fusionFasta (1)   [100%] 1 of 1 ✔
[5e/0caee7] mafft (1)         [100%] 1 of 1 ✔

Results

(base) fjarlier@clust-slurm-client:~/TP_intro$ ls
main.nf  nextflow.config  results  sample.csv  work

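work is the intermediate folder: each task runs in its own subdirectory, named after the hash shown in the log (e.g. work/5d/ec6219...)
results contains the published results, copied there by the publishDir directives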

Run the pipeline

(base) fjarlier@clust-slurm-client:~/TP_intro$ cd results/pipeline_info
(base) fjarlier@clust-slurm-client:~/TP_intro/results/pipeline_info$ ls
execution_report.html  execution_timeline.html  execution_trace.txt  pipeline_dag.html