{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c18a119f-ab4b-4f73-ac3e-7e0ba72cbf6b",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "# Introduction to Snakemake workflows\n",
    "\n",
    "<a href=\"https://snakemake.readthedocs.io/en/stable/\"><img src=\"images/logo-snakemake.png\" alt=\"snakemake\" width=30%/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4d94a31-f945-4967-82cb-35170feffc3e",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "**schedule:**\n",
    "- workflow introduction\n",
    "- snakemake introduction, rule concept\n",
    "- snakemake & snakefile\n",
    "- example with a 2-step workflow"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4455298-a049-4f8b-ad6f-adbe3f956650",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Workflow definition\n",
    "\n",
    "A set of commands chained step by step, each processing the output of the previous one, from the input data to the results:\n",
    "\n",
    "<img src=\"images/FAIR_smk_data_wf.png\" alt=\"a workflow\" width=80%/>\n",
    "\n",
    "_arrow: output of tool n−1 = input for tool n_\n",
    "\n",
    "With data parallelization, several data flows can be processed at the same time:\n",
    "\n",
    "<img src=\"images/FAIR_smk_n_data_wf.png\" alt=\"a workflow\" width=80%/>\n",
    "\n",
    "On a multi-core PC or a computing cluster (e.g. 2000 cores), one (or more) cores can be assigned to each data flow."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "326231e7-0fcd-4cc4-a9d6-3a2e8f3a348a",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Workflow management systems\n",
    "\n",
    "There are many workflow management systems, in many forms:\n",
    "- command line: shell (parallelization has to be scripted by hand, which is not easy)\n",
    "- rule: <a href=\"https://snakemake.readthedocs.io/en/stable/\"><img src=\"images/logo-snakemake.png\" alt=\"snakemake\" width=16%/></a>, <a href=\"https://cmake.org\"><img src=\"images/logo-cmake.png\" alt=\"c-make\" width=10%/></a>, <a href=\"https://www.nextflow.io/\"><img src=\"images/logo-nextflow.png\" alt=\"nextflow\" width=10%/></a>, ...\n",
    "- graphical interface: <a href=\"https://usegalaxy.org\"><img src=\"images/logo-galaxy.png\" alt=\"Galaxy\" width=12%/></a>, Taverna, Kepler, ...\n",
    "\n",
    "**pros:** <br>\n",
    "- reproducibility: keep track of when and how each file was generated<br>\n",
    "- management of parallelization and error recovery\n",
    "\n",
    "**cons:** <br>\n",
    "- learning effort"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22492525-eea3-4c7a-ba3c-c05e967c6f52",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "We choose:<br>\n",
    "<a href=\"https://snakemake.readthedocs.io/en/stable/\"><img src=\"images/logo-snakemake.png\" alt=\"snakemake\" width=16%/></a>\n",
    "\n",
    "- works on **files** (rather than streams, reading/writing from databases, or passing variables in memory)<br>\n",
    "- is based on **Python** (but knowing how to code in Python is not required)<br>\n",
    "- has features for defining the **environment** of each task (running a large number of small third-party tools is common in bioinformatics)<br>\n",
    "- can easily be **scaled** from your single-core laptop to a server, cluster, grid or cloud environment without modifying the workflow (i.e. develop on a laptop using a small subset of the data, then run the real analysis on a cluster)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c05912d-7ca9-4086-aeba-81dd8b28f845",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## The Snakemake rule (1/2)\n",
    "\n",
    "Snakemake: a mix of the Python programming language (the snake) and [Make](https://www.gnu.org/software/make/manual/), a rule-based automation tool\n",
    "\n",
    "Good practice: one step, one rule\n",
    "\n",
    "<img src=\"images/FAIR_WF_rule_concept_en.png\" alt=\"snakemake\" width=40%/>\n",
    "\n",
    "A rule is defined by its name and may contain **directives**:\n",
    "- `input:` lists one or more file names\n",
    "- `output:` lists one or more file names\n",
    "- a command (`run:` for Python code; `shell:` for shell commands, which can call R scripts, etc.)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85ebbbde-82f5-4ed5-875b-4d23165391cf",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## The Snakemake rule (2/2)\n",
    "\n",
    "\n",
    "\n",
    "<img src=\"images/FAIR_WF_rule_concept_en.png\" alt=\"snakemake\" width=40%/>\n",
    "\n",
    "```\n",
    "rule myRuleName:\n",
    "    input: \"myInFile\"\n",
    "    output: \"myOutputFile\"\n",
    "    shell: \"cat < {input} > {output}\"\n",
    "```\n",
    "\n",
    "Remark: for a single command line, use a `shell:` directive; for several command lines, use a `run:` directive calling the Python `shell(\"...\")` function\n",
    "\n",
    "Optional directives can be added, e.g. `params:`, `message:`, `log:`, `threads:`, ..."
   ]
  },
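  {
   "cell_type": "markdown",
   "id": "3f9a2c1e-7b4d-4e8a-9c21-5d6e7f8a9b01",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "To make the `{input}` and `{output}` placeholders concrete, here is a minimal Python sketch, not Snakemake's actual implementation: a rule is modeled by a hypothetical `Rule` dataclass, and the hypothetical `render_shell()` helper fills the command, joining several file names with spaces as Snakemake does for multi-file directives.\n",
    "\n",
    "```python\n",
    "from dataclasses import dataclass, field\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class Rule:\n",
    "    # hypothetical, simplified stand-in for a Snakemake rule\n",
    "    name: str\n",
    "    input: list = field(default_factory=list)\n",
    "    output: list = field(default_factory=list)\n",
    "    shell: str = ''\n",
    "\n",
    "\n",
    "def render_shell(rule):\n",
    "    # replace {input} and {output} by space-separated file names\n",
    "    return rule.shell.format(input=' '.join(rule.input),\n",
    "                             output=' '.join(rule.output))\n",
    "\n",
    "\n",
    "r = Rule('myRuleName', ['myInFile'], ['myOutputFile'], 'cat < {input} > {output}')\n",
    "print(render_shell(r))  # cat < myInFile > myOutputFile\n",
    "```\n",
    "\n",
    "In this sketch, as in Snakemake, a rule with two input files and the command `cat {input} > {output}` gives `cat a b > c`: the whole file list is substituted at once."
   ]
  },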
  {
   "cell_type": "markdown",
   "id": "5a5b100e-4ab1-4311-b2ea-81e757af9331",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## The data flow linkage and rule order\n",
    "\n",
    "A Snakemake workflow links rules through the file names of their `input:` and `output:` directives: the output of one rule becomes the input of the next.\n",
    "\n",
    "<img src=\"images/FAIR_WF_rule_concept_en.png\" alt=\"output becomes\" width=40%/> <img src=\"images/signe_egal.png\" alt=\"input\" width=5% class=\"middle-img\"/> <img src=\"images/FAIR_WF_rule_concept_en.png\" alt=\"input\" width=40%/> \n",
    "\n",
    "Rule order: the first rule of the snakefile is the default target rule; it specifies the final result files\n",
    "\n",
    "Snakemake builds a **DAG** (directed acyclic graph) corresponding to the links between rules"
   ]
  },
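  {
   "cell_type": "markdown",
   "id": "3f9a2c1e-7b4d-4e8a-9c21-5d6e7f8a9b02",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "The backward linkage can be sketched in a few lines of Python. This is a simplified illustration, not Snakemake's scheduler: the hypothetical `RULES` table gives the input and output files of each rule with concrete names (no wildcards), and `build_dag()` starts from the target rule and links every input file to the rule that produces it.\n",
    "\n",
    "```python\n",
    "# hypothetical rules of a 2-sample workflow: name -> (inputs, outputs)\n",
    "RULES = {\n",
    "    'all': (['multiqc_report.html'], []),\n",
    "    'multiqc': (['s1_fastqc.zip', 's2_fastqc.zip'], ['multiqc_report.html']),\n",
    "    'fastqc_s1': (['s1.fastq.gz'], ['s1_fastqc.zip']),\n",
    "    'fastqc_s2': (['s2.fastq.gz'], ['s2_fastqc.zip']),\n",
    "}\n",
    "\n",
    "\n",
    "def producer(filename):\n",
    "    # rule whose outputs contain the file, or None for an existing input file\n",
    "    for name, (_, outputs) in RULES.items():\n",
    "        if filename in outputs:\n",
    "            return name\n",
    "    return None\n",
    "\n",
    "\n",
    "def build_dag(rule, edges=None):\n",
    "    # walk backward from the target rule, adding (producer, consumer) edges\n",
    "    edges = set() if edges is None else edges\n",
    "    for f in RULES[rule][0]:\n",
    "        parent = producer(f)\n",
    "        if parent is not None:\n",
    "            edges.add((parent, rule))\n",
    "            build_dag(parent, edges)\n",
    "    return edges\n",
    "\n",
    "\n",
    "print(sorted(build_dag('all')))\n",
    "# [('fastqc_s1', 'multiqc'), ('fastqc_s2', 'multiqc'), ('multiqc', 'all')]\n",
    "```\n",
    "\n",
    "Since the links come only from file names, an output renamed without updating the matching input breaks the chain."
   ]
  },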
  {
   "cell_type": "markdown",
   "id": "c08211e8-64ca-4d54-977e-7bd78a7d8470",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Snakemake run options\n",
    "\n",
    "- use another snakefile than the default `Snakefile`: `-s mySmk` (`--snakefile`)\n",
    "- dry-run, do not execute anything, only display what would be done: `-n` (`--dryrun`)<br>\n",
    "- print the shell commands: `-p` (`--printshellcmds`)<br>\n",
    "- print the reason for each rule execution: `-r` (`--reason`)<br>\n",
    "- print a detailed summary and the status of each output file: `-D` (`--detailed-summary`)<br>\n",
    "- limit the number of jobs running in parallel: `-j 1` (`--jobs`), or the number of cores: `-c 1` (`--cores`)<br>\n",
    "\n",
    "[all Snakemake options](https://snakemake.readthedocs.io/en/stable/executing/cli.html#all-options)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89f2540c-3afb-47ac-b30f-24a6efeff97e",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Snakemake output options\n",
    "\n",
    "- automatically create an HTML report (`--report report.html`) with runtime statistics, a visualization of the workflow topology, the software used and data provenance information (requires the `jinja2` package)<br>\n",
    "- save your project with the `--archive` option (requires git)\n",
    "- visualize the complete job graph (`--dag`) or the rule dependencies (`--rulegraph`) with the `dot` tool of the `graphviz` package:\n",
    "```\n",
    "snakemake --dag -s mySmk | dot -Tpng > mySmk_dag.png\n",
    "snakemake --rulegraph -s mySmk | dot -Tpng > mySmk_rule.png\n",
    "```\n",
    "<img src=\"images/ex1_o7_dag.png\" alt=\"DAG\" width=60%/> <img src=\"images/ex1_o7_rule.png\" alt=\"rules\" width=10%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a287263a-a1dd-4552-abfd-2def6e2d1e9e",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Snakemake environment options\n",
    "\n",
    "Snakemake supports environments on a per-rule basis (created & activated on the fly):<br>\n",
    "\n",
    "**conda:**<br>\n",
    "- add a `conda:` directive in the rule definition (_eg._ `conda: \"myCondaEnvironment.yml\"`)<br>\n",
    "- run Snakemake with the `--use-conda` option<br>\n",
    "\n",
    "**docker (run through Singularity):**<br>\n",
    "- add a `container:` directive in the rule definition (_eg._ `container: \"docker://biocontainers/fastqc\"`) <br>\n",
    "- run Snakemake with the `--use-singularity` and `--singularity-args \"-B /path/outside/container/:/path/inside/container/\"` options<br>\n",
    "\n",
    "**module:**<br>\n",
    "- add an `envmodules:` directive in the rule definition (_eg._ `envmodules: \"fastqc/0.11.9\"`)<br>\n",
    "- run Snakemake with the `--use-envmodules` option"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ecf402c-a3ce-4106-b936-27e843d97435",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Get a Snakefile\n",
    "\n",
    "The **snakefile** is the **text file** that encodes the rules, and so the workflow. <br>\n",
    "The command `snakemake` runs the workflow encoded in the `Snakefile` file.\n",
    "\n",
    "You can get a snakefile:<br>\n",
    "- from github, your colleagues, ...<br>\n",
    "- snakemake \"core\" ([nf-core](https://nf-co.re) equivalent): https://snakemake.github.io/snakemake-workflow-catalog/ (2k pipelines, 177 tested)<br>\n",
    "- compose with [snakemake wrappers](https://snakemake-wrappers.readthedocs.io/)<br>\n",
    "- by using a Nextflow workflow! (integration via snakemake-wrappers)<br>\n",
    "- create from scratch <br>\n",
    "\n",
    "To run the workflow only up to one target file: `snakemake myTargetFile`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ff4afc8-9456-4032-ac20-4ddc6f0bbb7e",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Snakefile: input (output) specifications\n",
    "enumerated:\n",
    "```\n",
    "rule all:\n",
    "  input: \"P10415.fasta\",\"P01308.fasta\"\n",
    "```\n",
    "\n",
    "python list & wildcards:\n",
    "```\n",
    "DATASETS=[\"P10415\",\"P01308\"]\n",
    "rule all:\n",
    "  input: [\"{dataset}.fasta\".format(dataset=dataset) for dataset in DATASETS]\n",
    "```\n",
    "\n",
    "expand() & wildcards:\n",
    "```\n",
    "DATASETS=[\"P10415\",\"P01308\"]\n",
    "rule all:\n",
    "  input: expand(\"{dataset}.fasta\",dataset=DATASETS)\n",
    "```"
   ]
  },
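  {
   "cell_type": "markdown",
   "id": "3f9a2c1e-7b4d-4e8a-9c21-5d6e7f8a9b03",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "The last two forms build the same list. Below, a minimal Python sketch of what `expand()` does, assuming its default behaviour (every combination of the values, i.e. a cartesian product); the real function has more options, such as a `zip` combinator, while this hypothetical `expand_sketch()` only accepts lists:\n",
    "\n",
    "```python\n",
    "from itertools import product\n",
    "\n",
    "\n",
    "def expand_sketch(pattern, **values):\n",
    "    # fill the pattern with every combination of the given value lists\n",
    "    keys = list(values)\n",
    "    return [pattern.format(**dict(zip(keys, combo)))\n",
    "            for combo in product(*(values[k] for k in keys))]\n",
    "\n",
    "\n",
    "DATASETS = ['P10415', 'P01308']\n",
    "by_hand = ['{dataset}.fasta'.format(dataset=dataset) for dataset in DATASETS]\n",
    "print(by_hand == expand_sketch('{dataset}.fasta', dataset=DATASETS))  # True\n",
    "print(expand_sketch('{sample}_{read}.fastq', sample=['A', 'B'], read=['R1', 'R2']))\n",
    "# ['A_R1.fastq', 'A_R2.fastq', 'B_R1.fastq', 'B_R2.fastq']\n",
    "```"
   ]
  },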
  {
   "cell_type": "markdown",
   "id": "6cb95bbd-966b-4810-a7d1-99653a41db16",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Snakefile: generalization with wildcards\n",
    "Snakemake _wildcards_ replace parts of file names; they:\n",
    "- reduce hardcoding: more flexible input and output directives, work on new data without modification\n",
    "- are automatically resolved (i.e. matched by the regular expression \".+\" in file names)\n",
    "- are written between braces: `{name}`\n",
    "- are specific to a rule\n",
    "\n",
    "The same file can be matched in different ways:<br>\n",
    "Ex. with the file `101/file.A.txt` :<br>\n",
    "rule one : `output : \"{set}1/file.{grp}.txt\" # set=10, grp=A`<br>\n",
    "rule two : `output : \"{set}/file.A.{ext}\" # set=101, ext=txt`<br>\n",
    "(more on [wildcards](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#wildcards) in the snakemake documentation)"
   ]
  },
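  {
   "cell_type": "markdown",
   "id": "3f9a2c1e-7b4d-4e8a-9c21-5d6e7f8a9b04",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "The two matchings above can be reproduced with a short Python sketch. It assumes the default wildcard regex `.+` and ignores the rest of Snakemake's logic (wildcard constraints, rule priorities); `wildcard_regex()` and `match()` are hypothetical helpers:\n",
    "\n",
    "```python\n",
    "import re\n",
    "from string import Formatter\n",
    "\n",
    "\n",
    "def wildcard_regex(pattern):\n",
    "    # each {name} becomes a named group matching '.+' (default wildcard regex);\n",
    "    # the constant parts of the pattern are escaped and matched literally\n",
    "    regex = ''\n",
    "    for literal, name, _, _ in Formatter().parse(pattern):\n",
    "        regex += re.escape(literal)\n",
    "        if name is not None:\n",
    "            regex += '(?P<' + name + '>.+)'\n",
    "    return re.compile(regex + '$')\n",
    "\n",
    "\n",
    "def match(pattern, filename):\n",
    "    # wildcard values inferred from the file name, or None if no match\n",
    "    m = wildcard_regex(pattern).match(filename)\n",
    "    return m.groupdict() if m else None\n",
    "\n",
    "\n",
    "f = '101/file.A.txt'\n",
    "print(match('{set}1/file.{grp}.txt', f))  # {'set': '10', 'grp': 'A'}\n",
    "print(match('{set}/file.A.{ext}', f))     # {'set': '101', 'ext': 'txt'}\n",
    "```"
   ]
  },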
  {
   "cell_type": "markdown",
   "id": "5cda3e9f-c631-47ef-ba6e-61e05fd195c7",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "### With and without wildcards example\n",
    "\n",
    "without wildcards, `uniprot_wow.smk`:\n",
    "```\n",
    "rule get_prot:\n",
    "  output: \"P10415.fasta\", \"P01308.fasta\"\n",
    "  run:\n",
    "    shell(\"wget https://www.uniprot.org/uniprot/P10415.fasta\")\n",
    "    shell(\"wget https://www.uniprot.org/uniprot/P01308.fasta\")\n",
    "```\n",
    "\n",
    "with wildcards, `uniprot_wiw.smk`:\n",
    "```\n",
    "rule all:\n",
    "  input: \"P10415.fasta\", \"P01308.fasta\"\n",
    "\n",
    "rule get_prot:\n",
    "  output: \"{prot}.fasta\"\n",
    "  shell: \"wget https://www.uniprot.org/uniprot/{wildcards.prot}.fasta\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce5c36a3-5383-4d5e-97c1-dee60969a0f7",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Snakefile: get input file names from the file system\n",
    "\n",
    "To infer the identifiers (IDs) of the files in a directory, use the built-in `glob_wildcards` function, e.g.:\n",
    "```\n",
    "IDs, = glob_wildcards(\"dirpath/{id}.txt\")\n",
    "```\n",
    "`glob_wildcards()` matches the given pattern against the files present in the system and thereby infers the values for all wildcards in the pattern (`{id}` here).\n",
    "\n",
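    "**Hint:** Don’t forget the **comma** after the name (left-hand side, `IDs` here): `glob_wildcards()` returns a named tuple with one list per wildcard, and the comma unpacks its single element."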
   ]
  },
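  {
   "cell_type": "markdown",
   "id": "3f9a2c1e-7b4d-4e8a-9c21-5d6e7f8a9b05",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "A simplified Python sketch of what `glob_wildcards()` does, for illustration only: the hypothetical `glob_wildcards_sketch()` walks the directory, matches each path against the pattern (`.+` per wildcard) and returns a named tuple with one list per wildcard. With a single wildcard, the tuple has one element, hence the comma to unpack it:\n",
    "\n",
    "```python\n",
    "import os\n",
    "import re\n",
    "import tempfile\n",
    "from collections import namedtuple\n",
    "from string import Formatter\n",
    "\n",
    "\n",
    "def glob_wildcards_sketch(pattern):\n",
    "    # build a regex: constant parts escaped, each {name} -> '.+'\n",
    "    regex, names = '', []\n",
    "    for literal, name, _, _ in Formatter().parse(pattern):\n",
    "        regex += re.escape(literal)\n",
    "        if name is not None:\n",
    "            regex += '(?P<' + name + '>.+)'\n",
    "            names.append(name)\n",
    "    compiled = re.compile(regex + '$')\n",
    "    values = {n: [] for n in names}\n",
    "    # walk from the fixed directory part of the pattern\n",
    "    root = os.path.dirname(pattern.split('{')[0]) or '.'\n",
    "    for dirpath, _, files in os.walk(root):\n",
    "        for f in sorted(files):  # sorted for a reproducible order\n",
    "            m = compiled.match(os.path.normpath(os.path.join(dirpath, f)))\n",
    "            if m:\n",
    "                for n in names:\n",
    "                    values[n].append(m.group(n))\n",
    "    Wildcards = namedtuple('Wildcards', names)\n",
    "    return Wildcards(*(values[n] for n in names))\n",
    "\n",
    "\n",
    "with tempfile.TemporaryDirectory() as d:\n",
    "    for fname in ['s1.txt', 's2.txt', 'notes.md']:\n",
    "        open(os.path.join(d, fname), 'w').close()\n",
    "    IDs, = glob_wildcards_sketch(os.path.join(d, '{id}.txt'))\n",
    "\n",
    "print(IDs)  # ['s1', 's2']\n",
    "```"
   ]
  },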
  {
   "cell_type": "markdown",
   "id": "eadec312-f6db-4146-8cda-085f167fd04c",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Snakefile: Using a snakemake config file\n",
    "\n",
    "The (optional) config file lets you parameterize the workflow without editing the snakefile (`--configfile smk_config.yml`); its values are available in the snakefile through the `config` dictionary\n",
    "\n",
    "## Subworkflows or Modules\n",
    "\n",
    "It is also possible to define external workflows as modules, from which rules can be used by explicitly “importing” them."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f1fb3a4-26d8-44ae-bbfc-ed585452da50",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## The workflow example\n",
    "\n",
    "We want to process RNA-seq data with a small 2-step workflow:<br>\n",
    "<img src=\"images/FAIR_WF_2steps.png\" alt=\"a 2 steps workflow example\" width=30%/>\n",
    "\n",
    "A _classical_ analysis with `fastq.gz` data (in the `${PWD}/Data` directory) and the creation of a `${PWD}/FastQC` directory gives:<br>\n",
    "<img src=\"images/smk_anim01_sh.png\" alt=\"a 2 steps workflow example\" width=80%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03354c4b-263a-461c-83c6-ab8f6c223142",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Translation in snakefile\n",
    "\n",
    "<img src=\"images/smk_anim01_sh.png\" alt=\"a 2 steps workflow example\" width=80%/><br>\n",
    "<img src=\"images/smk_anim02_rule.png\" alt=\"rule translation\" width=80%/><br>\n",
    "3 linked rules: fastQC, multiQC, all.<br>\n",
    "Wildcard: each fastQC job processes one file (`*` in the figure)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "452900bf-c28e-4085-a9d5-996a11e43e0b",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Running path\n",
    "\n",
    "Snakemake creates the DAG from the snakefile<br>\n",
    "<img src=\"images/smk_anim02_rule.png\" alt=\"rule translation\" width=80%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31793217-81d1-4ff6-b494-ff229b12f844",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Running path\n",
    "\n",
    "At launch, the rule all needs `multiqc_report.html`, which does not exist yet but is an output of the multiQC rule<br>\n",
    "<img src=\"images/smk_anim03_init.png\" alt=\"rule all\" width=80%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d3e437b-3426-4ace-94e8-e32c877acc54",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Running path\n",
    "\n",
    "The multiQC rule needs zip files, which do not exist yet but are outputs of the fastQC rule<br>\n",
    "<img src=\"images/smk_anim05_back2.png\" alt=\"backward fastQC\" width=80%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b447b236-bb7c-463e-90d5-07380ab54a85",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Running path\n",
    "\n",
    "The fastQC rule needs the fastq.gz files<br>\n",
    "<img src=\"images/smk_anim07_back4.png\" alt=\"rule translation\" width=80%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3468713-d958-465a-a8d9-e295b92313ce",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Running path\n",
    "\n",
    "The fastq.gz files exist, so Snakemake stops going backward and moves the flow forward, starting with the fastQC rule.<br>\n",
    "<img src=\"images/smk_anim08_backData.png\" alt=\"rule translation\" width=80%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfb2c813-f94f-427b-b317-8302e8f6663b",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Running path\n",
    "\n",
    "There are 3 sequence files, so Snakemake launches 3 fastQC jobs<br>\n",
    "<img src=\"images/smk_anim09_fwd1.png\" alt=\"rule translation\" width=80%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69500618-7c17-4942-8a8c-5441a912da77",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Running path\n",
    "\n",
    "After 3 executions of the fastQC rule, the zip files exist and feed the multiQC rule.<br>\n",
    "<img src=\"images/smk_anim10_fwd2.png\" alt=\"rule translation\" width=80%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9298c34b-bc76-4c27-adf0-c125c3dfef98",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Running path\n",
    "\n",
    "The `multiqc_report.html` file is the input file of the rule all:<br>\n",
    "<img src=\"images/smk_anim11_fwd3.png\" alt=\"rule translation\" width=80%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8838cb04-b761-45fe-a5ba-a08a7c61e0fc",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Running path\n",
    "\n",
    "So the rule all is completed, and the workflow too:<br>\n",
    "<img src=\"images/smk_anim12_fwdEnd.png\" alt=\"End of the workflow\" width=80%/>"
   ]
  },
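  {
   "cell_type": "markdown",
   "id": "3f9a2c1e-7b4d-4e8a-9c21-5d6e7f8a9b06",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "The forward phase can be mimicked with Python's standard `graphlib` module (Python 3.9+). This is only an analogy of Snakemake's scheduling, with hypothetical job names: at each step, the jobs whose inputs all exist are ready, and Snakemake may run them in parallel (within the `-c` cores limit).\n",
    "\n",
    "```python\n",
    "from graphlib import TopologicalSorter\n",
    "\n",
    "# hypothetical jobs of the example: job -> jobs whose outputs it needs\n",
    "DEPENDS_ON = {\n",
    "    'all': {'multiqc'},\n",
    "    'multiqc': {'fastqc_sample1', 'fastqc_sample2', 'fastqc_sample3'},\n",
    "}\n",
    "\n",
    "ts = TopologicalSorter(DEPENDS_ON)\n",
    "ts.prepare()\n",
    "steps = []\n",
    "while ts.is_active():\n",
    "    ready = sorted(ts.get_ready())  # jobs whose inputs all exist\n",
    "    steps.append(ready)\n",
    "    ts.done(*ready)                 # their outputs are now available\n",
    "\n",
    "for i, jobs in enumerate(steps, 1):\n",
    "    print('step', i, jobs)\n",
    "# step 1 ['fastqc_sample1', 'fastqc_sample2', 'fastqc_sample3']\n",
    "# step 2 ['multiqc']\n",
    "# step 3 ['all']\n",
    "```"
   ]
  },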
  {
   "cell_type": "markdown",
   "id": "76f0db65-52af-477d-a9aa-98be24ac806f",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "## Timestamp\n",
    "\n",
    "Snakemake automatically makes sure that everything is up to date; otherwise it launches the jobs needed to update it:\n",
    "\n",
    "- output files have to be re-created when an input file timestamp is newer than the output file timestamp<br>\n",
    "- from this point on, Snakemake goes on through the workflow and re-applies the downstream rules<br>\n",
    "\n",
    "<img src=\"images/FAIR_smk_wf_timestamps.png\" alt=\"backtracking\" width=100%/>\n",
    "\n",
    "**note:** in recent snakemake versions, _everything_ includes mtime, params, input, software-env and code; restrict these triggers with the `--rerun-triggers` option (e.g. `--rerun-triggers mtime`)"
   ]
  },
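  {
   "cell_type": "markdown",
   "id": "3f9a2c1e-7b4d-4e8a-9c21-5d6e7f8a9b07",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "The timestamp rule can be sketched in Python. The hypothetical `needs_rerun()` helper only models the mtime trigger (not params, code or software-environment changes), and the example sets fixed modification times with `os.utime` so the result does not depend on the clock resolution of the file system.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "\n",
    "def needs_rerun(inputs, outputs):\n",
    "    # a job runs if an output is missing or older than its newest input\n",
    "    if not outputs or not all(os.path.exists(o) for o in outputs):\n",
    "        return True\n",
    "    newest_input = max((os.path.getmtime(i) for i in inputs), default=0)\n",
    "    oldest_output = min(os.path.getmtime(o) for o in outputs)\n",
    "    return newest_input > oldest_output\n",
    "\n",
    "\n",
    "with tempfile.TemporaryDirectory() as d:\n",
    "    fq = os.path.join(d, 's1.fastq.gz')\n",
    "    zp = os.path.join(d, 's1_fastqc.zip')\n",
    "    open(fq, 'w').close()\n",
    "    missing = needs_rerun([fq], [zp])     # True: output does not exist\n",
    "    open(zp, 'w').close()\n",
    "    os.utime(fq, (1000, 1000))\n",
    "    os.utime(zp, (2000, 2000))\n",
    "    up_to_date = needs_rerun([fq], [zp])  # False: output newer than input\n",
    "    os.utime(fq, (3000, 3000))            # the input is modified afterwards\n",
    "    outdated = needs_rerun([fq], [zp])    # True: input newer than output\n",
    "\n",
    "print(missing, up_to_date, outdated)  # True False True\n",
    "```\n",
    "\n",
    "Once `s1_fastqc.zip` is re-created, it becomes newer than `multiqc_report.html`, so the same test then triggers the multiQC job: this is how a change propagates downstream."
   ]
  },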
  {
   "cell_type": "markdown",
   "id": "e51eace3-65c4-44b7-847b-3d8283380da3",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## A Snakefile example\n",
    "\n",
    "The final objective is to create a snakefile to manage a small workflow with 2 steps: i) fastqc ii) multiqc \n",
    "\n",
    "<img src=\"images/FAIR_WF_2steps.png\" alt=\"a 2 steps workflow example\" width=40%/>\n",
    "\n",
    "These 2 bioinformatics tools check the quality of NGS data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "647bfb62-1299-44b5-862d-1a25becaabd5",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "### prerequisite 1: data\n",
    "\n",
    "The input data of the workflow example are reduced RNA-seq read files. Download them (`wget`) from [zenodo here](https://zenodo.org/record/3997237): copy the URL of the _download_ button, then `gunzip` and `tar -x` the archive"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84c8f1a0-a950-42ea-a22f-d3de8b4bc04c",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%sh\n",
    "# on IFB cluster\n",
    "cd ${PWD}\n",
    "wget -q https://zenodo.org/record/3997237/files/FAIR_Bioinfo_data.tar.gz\n",
    "gunzip FAIR_Bioinfo_data.tar.gz\n",
    "tar -xvf FAIR_Bioinfo_data.tar\n",
    "rm FAIR_Bioinfo_data*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20051a6a-c31c-4b80-b469-d7e8928fadd5",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "### prerequisite 2: snakefile\n",
    "\n",
    "`smk_all_samples.smk`, get it from [FAIR_smk](https://github.com/clairetn/FAIR_smk)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0424e490-963c-4989-a8b3-0f14de8ad199",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%sh\n",
    "# on IFB cluster\n",
    "cd ${PWD}\n",
    "git clone https://github.com/clairetn/FAIR_smk"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b38bec5-b8dc-4a35-a38d-b84ab3512249",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "source": [
    "### prerequisite 3: conda environment\n",
    "\n",
    "Only if you use conda: create the conda environment from `envfair.yml` in the cloned `FAIR_smk` repository"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5b2a460-04fd-43f1-b461-f37636226fe8",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### prerequisite 4: Snakemake\n",
    "\n",
    "Laptop with docker:\n",
    "```\n",
    "# get the docker image archive save_jupylab_smk.tar, then:\n",
    "docker load < save_jupylab_smk.tar # create the docker image\n",
    "docker run --rm -v ${PWD}:/home/jovyan -w /home/jovyan --user \"$(id -u):$(id -g)\" -p 8888:8888 test/jupylab_smk:1.0\n",
    "```\n",
    "Laptop with conda:\n",
    "```\n",
    "conda env create -f envfair.yml\n",
    "conda activate envfair\n",
    "```\n",
    "IFB core cluster (_version 7.8.2 of the docker container is not available_):\n",
    "```\n",
    "module load snakemake/7.7.0 fastqc/0.11.9 multiqc/1.12\n",
    "```\n",
    "check with: `snakemake --version`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6bc8d354-2adf-49c5-b3ae-a787902a036f",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "### prerequisite 4: Snakemake & tools\n",
    "\n",
    "with `module load` on the IFB core cluster (_version 7.8.2 of the docker container is not available_):\n",
    "```\n",
    "module load snakemake/7.7.0 fastqc/0.11.9 multiqc/1.12\n",
    "```\n",
    "check with: `snakemake --version`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "028f1583-9f4e-40f8-8904-b0db6d167168",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%sh\n",
    "# on IFB cluster\n",
    "cd ${PWD}\n",
    "module load snakemake/7.7.0\n",
    "snakemake --version "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "692e7742-4215-4abb-b67f-e286dbc03c02",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "### run the workflow\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "855a806f-630f-4cc7-a338-a1327444ecc2",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%sh\n",
    "# on IFB cluster\n",
    "cd ${PWD}\n",
    "module load snakemake/7.7.0 fastqc/0.11.9 multiqc/1.12\n",
    "snakemake -s FAIR_smk/smk_all_samples.smk \\\n",
    "          --configfile FAIR_smk/smk_config.yml -c1 -p"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fbb58ad-39fd-46d0-9893-c375a2227f66",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Conclusion\n",
    "\n",
    "With snakemake, you can launch the same snakefile (adapting the snakemake config file) on your laptop or on a computing cluster.\n",
    "\n",
    "\n",
    "Other resources:\n",
    "- a training course to [create the workflow step-by-step](https://moodle.france-bioinformatique.fr/mod/resource/view.php?id=68)\n",
    "- the workflow composed with snakemake wrappers: see [ex1_o8_wrapper_linted.smk](https://github.com/clairetn/FAIR_smk)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}