Nextflow working group

2025-10-30

Frédéric Jarlier, Julien Roméjon, Philippe Hupé, Laurent Jourdren, Quentin Duvert

The pipeline

  • Environment

    module load nextflow
    module load mafft
  • Nextflow files

    main.nf
    nextflow.config
    sample.csv
  • Goal of the pipeline:

    • loadData: runs wget to fetch fasta sequences from the UniProt website

    • fusionFasta: runs cat to concatenate 2 fasta files

    • mafft: runs mafft to align the 2 sequences within the fused output of fusionFasta

The processes

  • main.nf is responsible for chaining the processes

    process loadData {...}
    process fusionFasta {...}
    process mafft {...}
    
    workflow {
        main:
    
        loadData(...)
        fusionFasta(...)
        mafft(...)
    }
  • In main.nf you either write your processes or include them from other files

  • In the workflow you initialize the inputs, chain the processes and set the output

sample.csv and loadData

  • sample.csv contains two tab-separated columns:
sample  url
P10415  https://www.uniprot.org/uniprot/P10415.fasta
P01308  https://www.uniprot.org/uniprot/P01308.fasta
  • The loadData process downloads one fasta file per row:
process loadData {
  tag { sample }
  
  publishDir "${params.outdir}/data/", mode : 'copy'

  input: tuple val(sample), val(url)
  output: tuple val('wget'), path("${sample}.fasta"), emit: out

  script:
  """
  wget ${url} -O ${sample}.fasta
  """
}
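
The slides do not show the next step, fusionFasta. Below is a minimal sketch of what it could look like, assuming it receives the grouped tuple built in the workflow (the shared 'wget' key plus the list of fasta files). The output name fusion.fasta, the publish subfolder fusion/ and the variable names key and fastas are assumptions for illustration, not taken from the original main.nf.

```groovy
process fusionFasta {
    // Assumed publish location; the slides do not specify one
    publishDir "${params.outdir}/fusion/", mode: 'copy'

    // key is the shared 'wget' value, fastas is the list of staged fasta files
    input: tuple val(key), path(fastas)
    // Assumed output name; mafft only needs a single multi-fasta file
    output: path('fusion.fasta')

    script:
    """
    cat ${fastas} > fusion.fasta
    """
}
```

When a path input holds several files, ${fastas} expands to the file names separated by spaces, so the same cat command works for 2 files or more.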

mafft

process mafft {
    publishDir "${params.outdir}/alignment/", mode : 'copy'

    input: path(multifasta)
    output: path('mafft_align.fasta')
    
    script:
    """
    mafft ${multifasta} > mafft_align.fasta
    """
}

workflow

  • The workflow is the glue: it builds the input channel from sample.csv and chains the three processes
workflow {
  main:
    Channel.fromPath(params.input) | splitCsv(header:true, sep: '\t') 
        | map { row -> [row.sample, row.url] } | set { input_ch }

    loadData(input_ch) // runs in parallel for all samples in the sample.csv file

    loadData.out | groupTuple() | set {fastaList}

    /* Give fusionFasta the list of files to concatenate */
    fusionFasta(fastaList)

    /* Run the multiple alignment with mafft */
    mafft(fusionFasta.out)
    
}
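
To see why groupTuple() hands fusionFasta a single list of files, note that every loadData output carries the same key 'wget'. The standalone sketch below reproduces that step with Channel.of and plain strings standing in for the downloaded files (nothing is fetched); it can be saved in a separate, hypothetical file such as demo.nf and run with nextflow run demo.nf.

```groovy
workflow {
    // Stand-ins for the tuples emitted by loadData: same key, one file name each.
    // groupTuple() groups on the first element by default, so view prints
    // a single item holding the key and a list with both names.
    Channel.of(
            ['wget', 'P10415.fasta'],
            ['wget', 'P01308.fasta']
        )
        | groupTuple()
        | view
}
```

Using a constant key is a simple way to collect all samples into one task; a per-sample key would instead produce one group per sample.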

nextflow.config

/* Parameters for the pipeline  */
params { 
  input = 'sample.csv'
  outdir = "results"
}
/* Report settings */
timeline {
    enabled = true
    overwrite = true
    file    = "${params.outdir}/pipeline_info/execution_timeline.html"
}
report {
    enabled = true
    overwrite = true
    file    = "${params.outdir}/pipeline_info/execution_report.html"
}
  • Another way to pass params is on the command line, which overrides the values in nextflow.config: nextflow run main.nf --input=sample.csv --outdir=results

🔗 https://nextflow.io/docs/latest/reference/config.html

nextflow.config


trace {
    enabled = true
    overwrite = true
    file    = "${params.outdir}/pipeline_info/execution_trace.txt"
}
dag {
    enabled = true
    overwrite = true
    file    = "${params.outdir}/pipeline_info/pipeline_dag.html"
}

Run the pipeline

  • Call nextflow
(base) fjarlier@clust-slurm-client:~/TP_intro$ module load mafft
(base) fjarlier@clust-slurm-client:~/TP_intro$ nextflow run main.nf

 N E X T F L O W   ~  version 25.04.7

Launching `main.nf` [condescending_liskov] DSL2 - revision: 33d6a8bc68

executor >  local (4)
[5d/ec6219] loadData (P01308) [100%] 2 of 2 ✔
[a8/d2c870] fusionFasta (1)   [100%] 1 of 1 ✔
[5e/0caee7] mafft (1)         [100%] 1 of 1 ✔
  • Results
(base) fjarlier@clust-slurm-client:~/TP_intro$ ls
main.nf  nextflow.config  results  sample.csv  work
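  • work holds the intermediate files: one subfolder per task, named after the hash shown in the log (e.g. work/5d/ec6219...)
  • results contains the published results (the publishDir outputs and pipeline_info)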

Run the pipeline

  • DAG and reports
(base) fjarlier@clust-slurm-client:~/TP_intro$ cd results/pipeline_info
(base) fjarlier@clust-slurm-client:~/TP_intro/results/pipeline_info$ ls
execution_report.html  execution_timeline.html  execution_trace.txt  pipeline_dag.html