*Starting from a bash script, the idea of this exercise is to add another branch to your previous pipeline and consolidate what you have learnt so far.*

# Where to start?

## Understand the bash script

Let's understand the bash script and identify all the steps and tools we will need. Let's split the bash script into parts:

- The first part is applied once, at the beginning of the script, to index the reference genome that the reads will be mapped onto:

```bash
# line 30
bowtie2-build --threads ${ncpus_build} ${genome} Bwt2_index/tauri > Bwt2_index/Bwt2_index.log 2>&1

# input: reference genome fasta
# tool: bowtie2-build (bowtie/1.3.1)
# aim: index the reference genome
# output: Bwt2_index/ folder with the indexed genome files
```

- In the `for` loop section, we apply a set of tools to each sample independently:

```bash
# line 43
fastqc --outdir FastQC ${rnadir}/${sample}.fastq.gz > FastQC/${sample}.log 2>&1

# input: each RNAseq fastq
# tool: fastqc (fastqc/0.12.1)
# aim: run QC analysis (per sample)
# output: a zip & html report in the FastQC/ folder
```

```bash
# line 47
bowtie2 --threads ${ncpus_map} -x Bwt2_index/tauri -U ${rnadir}/${sample}.fastq.gz -S SAMs/${sample}.sam > LOGs/${sample}_bowtie2.log 2>&1

# input: each RNAseq fastq + the indexed reference genome files
# tool: bowtie2 (bowtie/1.3.1)
# aim: map the reads onto the reference genome (per sample)
# output: mapped reads in SAM format within the SAMs/ folder
```

```bash
# lines 51-53
samtools view -b SAMs/${sample}.sam -o BAMs/${sample}.bam
samtools sort --threads ${ncpus_sort} BAMs/${sample}.bam -o BAMs/${sample}_sort.bam
samtools index -@ ${ncpus_index} BAMs/${sample}_sort.bam

# input: mapped reads in SAM format
# tool: samtools (samtools/1.18)
# aim: 1. convert the format SAM -> BAM (per sample)
#      2. sort the reads (per sample)
#      3. index the sorted BAM (per sample)
# general output: sorted and indexed reads in BAM format within the BAMs/ folder
```

```bash
# line 57
featureCounts -T ${ncpus_count} -t gene -g ID -a ${annots} -s 2 -o Counts/${sample}_ftc.txt BAMs/${sample}_sort.bam > LOGs/${sample}_ftc.log 2>&1

# input: sorted & indexed reads in BAM format + reference genome annotation
# tool: featureCounts (subread/2.0.6)
# aim: count the reads per gene of the reference genome (per sample)
# output: a gene count table within the Counts/ folder
```

- Outside the for loop, at the end of the bash pipeline, we concatenate all the count tables and aggregate all the FastQC reports:

```bash
# lines 61-62
paste Counts/*_ftc.txt > Counts/ftc_tmp.txt
awk -v nb=${nbs} -v col=7 'BEGIN {FS ="\t"}{ctmp=$1; for (i=col; i<=nb*col; i=i+col) {count=sprintf("%s\t%s", ctmp, $i); ctmp=count}; print count}' Counts/ftc_tmp.txt | sed '1d' > Counts/counts.txt

# input: the gene count tables of all samples
# tools: paste & awk (standard in bash)
# aim: concatenate the gene count tables of all samples and reformat them into a clean tab-separated file
# output: a single clean tab-separated gene count table containing all samples
```

```bash
# line 65
multiqc FastQC/ --filename report_multiqc.html --outdir FastQC/

# input: all the FastQC reports
# tool: multiqc (multiqc/1.13)
# aim: aggregate all the FastQC reports into a single report
# output: a single HTML report (report_multiqc.html) in the FastQC/ folder
```

- The final pipeline should look like this (without going into detail about the individual input/output files):

![Workflow example](images/tp_workflow_extension.png)
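Each box in the bottom branch of this diagram will become a rule in your Snakefile. To give a feel for what that translation looks like, here is a minimal sketch of the mapping step written as a Snakemake rule; the rule name, the way the index dependency is declared, and the thread count are illustrative assumptions that you should adapt to the conventions of your existing Snakefile:

```python
# Illustrative Snakefile fragment -- a sketch, not the expected solution
rule bowtie2_mapping:
    input:
        fastq = "Data/{sample}.fastq.gz",
        # bowtie2-build writes several index files sharing the "Bwt2_index/tauri" prefix;
        # listing one of them is enough to express the dependency in this sketch
        index = "Bwt2_index/tauri.1.bt2"
    output:
        sam = "SAMs/{sample}.sam"
    log:
        "LOGs/{sample}_bowtie2.log"
    threads: 4
    shell:
        "bowtie2 --threads {threads} -x Bwt2_index/tauri "
        "-U {input.fastq} -S {output.sam} > {log} 2>&1"
```

The `{sample}` wildcard plays the role of the bash `for` loop: Snakemake instantiates the rule once for every sample it needs.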
## Try running the bash script as is

If you try running the bash script, make sure you load the required modules first:

```bash
module load multiqc/1.13 fastqc/0.12.1 subread/2.0.6 bowtie/1.3.1 samtools/1.18
```

NB: subread is the package that provides featureCounts.

Execution example (copy the script to your directory first):

```bash
bash bash_pipeline.sh -g Data/O.tauri_genome.fna -a Data/O.tauri_annotation.gff -d Data/ SRR3099585_chr18 SRR3099586_chr18 SRR3099587_chr18 SRR3105697_chr18 SRR3105698_chr18 SRR3105699_chr18
```

## Your mission

The FastQC/MultiQC branch is already implemented in the Snakefile from the previous exercises. Your mission is to start from that Snakefile and add the bottom branch (bowtie2, samtools, featureCounts) to it.
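If you are unsure where to begin, one common approach is to work backwards from the final targets: extend the inputs of your top-level rule so that Snakemake also requests the outputs of the new branch, then add the missing rules one by one until the workflow graph is complete. Below is a hypothetical sketch, assuming (as in the previous exercises) that your Snakefile defines a sample list and a `rule all`; the target paths are taken from the bash script's output folders and may differ from your own naming:

```python
# Illustrative sketch -- adapt the sample list and target paths to your own Snakefile
SAMPLES = ["SRR3099585_chr18", "SRR3099586_chr18"]  # ...plus the remaining samples

rule all:
    input:
        # existing target of the FastQC/MultiQC branch (name assumed from the previous exercise)
        "FastQC/report_multiqc.html",
        # new bottom-branch targets: one sorted, indexed BAM per sample plus the merged count table
        expand("BAMs/{sample}_sort.bam.bai", sample=SAMPLES),
        "Counts/counts.txt"
```

A dry run (`snakemake -n`) after adding each rule is a quick way to check that the new dependencies resolve as expected.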