“Code” is the code you care about. “Environment” is code you don’t care about.
The problem is that all of “the environment” influences the results produced by the code you care about ...
One solution: use an environment manager!
Conda distributions:
How to download conda packages
Some conda channels:
Channels list order:
The R interpreter is included in the r-essentials packages (200 r-packages). Add r- before the regular R package name (e.g. r-ggplot2).
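The naming rule above can be sketched as a small shell helper. The function name r_conda_name is hypothetical, and lower-casing the name is an assumption based on how conda packages are usually named; check the channel for the exact package name before installing.

```shell
# Hypothetical helper: derive the conda package name from an R package name
# by adding the "r-" prefix. Lower-casing is an assumption, not a guarantee.
r_conda_name() {
    printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

r_conda_name ggplot2   # prints: r-ggplot2
```

The resulting name is what you would pass to conda install.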
Conda enables the management (definition, creation, archiving) of software environments
but ... it still relies on the underlying OS!
The OS layer is therefore the missing piece for full reproducibility, but it can be handled by a container solution.
Other software environment managers, such as , are also fully reproducible.
conda init bash
conda create -n myenv
conda info --envs
conda activate myenv
conda list
conda install package
conda remove package
conda env remove -n myenv
conda deactivate
note: with the miniconda3 distribution, environments are installed by default in the ~/miniconda3/envs/ directory
interactive
configuration file (yml or json format), given with -f
reproducibility point of view: <channel>::<package>=<version>
You will:
1. if not yet done, initialize your conda shell
2. check that the tool is not present in your environment
3. create a config file to get the tool from conda
4. create a conda environment for the tool
5. activate the environment
6. use the tool (version)
7. quit
For example, choose the multiqc tool, which is often used in NGS analyses.
Search for the tool on the "conda hub"/Anaconda platform and identify its channel & version.
Next, you will use multiqc through conda.
Conda is so widely used that it may already be installed on your machine.
Try this: conda --version
Otherwise Conda is available:
in the jupyter/minimal-notebook docker container
via a module (module load conda)
Before creating a Conda environment, Conda needs to know which shell you use. However, the conda init bash command only takes effect once the terminal has been closed and reopened, which does not work with the underlying terminal opened at the beginning of this notebook.
So for now, open a terminal launcher and copy/paste the notebook command lines into it.
Before getting the tool, if you haven't already done so, initialize your shell for conda (choose bash):
conda init bash
Check the absence of multiqc:
%%sh
cd ${PWD}
whereis multiqc
multiqc:
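The same check can be scripted so that it gives an explicit answer instead of an empty line. This is a minimal sketch: the function name tool_status is hypothetical, and it uses the shell's command -v lookup rather than whereis, so it only reports what is reachable through the current PATH.

```shell
# Hypothetical helper: report whether a command can be found on the PATH.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "present"
    else
        echo "absent"
    fi
}

# Before the conda environment is created and activated,
# this is expected to report that multiqc is absent.
tool_status multiqc
```

Running it again after activating the environment should show the change.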
Using the channel & version you found on the conda hub, edit a yml file to guide the creation of the conda environment:
cd ${PWD}
printf 'name: env_multiqc_1.16\nchannels:\n - bioconda\ndependencies:\n - bioconda::multiqc=1.16\n' > env_multiqc_1.16.yml
more env_multiqc_1.16.yml
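As an alternative way to write the same file, a here-document avoids any dependence on how the shell handles \n escapes and keeps the yml layout readable. This is only a sketch; the file name, channel and version are the ones used in this exercise.

```shell
# Write the environment definition with a quoted here-document,
# so the content is stored exactly as typed.
cat > env_multiqc_1.16.yml <<'EOF'
name: env_multiqc_1.16
channels:
  - bioconda
dependencies:
  - bioconda::multiqc=1.16
EOF

# Display the file to check its content.
cat env_multiqc_1.16.yml
```

The resulting file can be passed to conda env create with the -f option.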
Manage the "env_multiqc_1.16" environment: 1) create 2) activate 3) use 4) quit:
conda env list # list all conda env.
conda env create -f env_multiqc_1.16.yml # create
conda env list
conda activate env_multiqc_1.16 # activate
multiqc --help # use
conda deactivate # quit
multiqc --help # check the multiqc tool is not present
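In a script, where activating an environment is less convenient, the tool can also be called with conda run, which executes a command inside a named environment without changing the current shell. The sketch below assumes the env_multiqc_1.16 environment created above; the function name run_in_env is hypothetical.

```shell
# Hypothetical helper: run a command inside a named conda environment
# without activating it in the current shell.
run_in_env() {
    env_name="$1"
    shift
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 127
    fi
    conda run -n "$env_name" "$@"
}

# Usage: print the multiqc version from the environment, if it exists.
run_in_env env_multiqc_1.16 multiqc --version || echo "multiqc could not be run"
```

Because nothing is activated, the current shell stays unchanged and there is no need to deactivate afterwards.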
You've used conda to add a tool to your environment, and you've used it!
By creating the conda environment from a configuration file that states the version of the tool, you follow a FAIR approach (within the limitations of conda).