Introduction to managing the execution environment with conda¶


Why use an environment manager?¶

Konrad Hinsen:

“Code” is the code you care about. “Environment” is code you don’t care about. 

The problem is that all of “the environment” influences the results produced by the code you care about ...

One solution: use an environment manager!

  • avoid compilation and dependency problems: the environment manager installs pre-built tools together with their dependencies
  • have several environments in parallel, each with its own set of tools (or versions)
  • useful when the dependencies of different tools are incompatible with each other

Conda definitions¶

  • Environment: a set of packages/tools in a directory (added to your PATH)
  • Conda: an open-source package manager and general-purpose environment management system (installation, execution, upgrade). It works for any programming language and is multi-platform (Windows, macOS, Linux)
  • Conda package: a compressed tarball of a tool
  • Conda channel: the location where packages are stored
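
To make the "directory added to your PATH" definition concrete, here is a minimal sketch that imitates activation without conda. The directory and the tool name mytool are hypothetical, and real conda activation does more than this (for example, it also sets environment variables).

```bash
# Simulate what activating an environment does to PATH (no conda required).
demo_env="$(mktemp -d)"            # hypothetical environment directory
mkdir -p "$demo_env/bin"

# A hypothetical tool that only exists inside the environment.
cat > "$demo_env/bin/mytool" <<'EOF'
#!/bin/sh
echo "mytool from the demo environment"
EOF
chmod +x "$demo_env/bin/mytool"

old_path="$PATH"
PATH="$demo_env/bin:$PATH"         # "activate": put the environment first on PATH
resolved="$(command -v mytool)"    # the tool now resolves inside the environment
echo "$resolved"

PATH="$old_path"                   # "deactivate": restore the previous PATH
if command -v mytool >/dev/null 2>&1; then after="found"; else after="absent"; fi
echo "after deactivation: $after"
```

Because the environment directory comes first on PATH, its tools shadow any system-wide version while the environment is active.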

Conda access¶

Conda distributions:

  • Anaconda: a distribution that ships conda together with a lot of pre-installed packages
  • Miniconda3: a minimal distribution that comes without pre-installed packages (except conda and its dependencies)
  • Mamba: a C++ reimplementation of conda (a faster, more reproducible and lightweight version of conda)
  • Micromamba: it is to mamba what miniconda is to conda
    ==> conda and mamba commands are largely interchangeable

How to download conda packages

  • Anaconda/"conda hub" (private company): a data science platform that stores conda packages for many domains (Machine Learning, Data Visualization, Dashboarding-web, Image Processing, Natural Language Processing, etc.)
  • it is made up of channels (owners); each channel contains one or more conda packages
  • be careful when downloading a package from an untrusted source: always inspect it before installation

About channels¶

Some conda channels:

  • defaults (not actually used here, may require a paid license)
  • conda-forge: many popular conda packages (python but also R, perl, C, rust, ...)
  • bioconda: bioinformaticians’ contributions
  • private

Channels list order:

  • when different channels provide the same package ⇒ collisions
  • collisions are resolved by following the order of your channels list, provided "strict channel priority" is enabled
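
As an illustration of channel order and strict priority, the sketch below writes a demo configuration file. The file path is hypothetical (conda reads ~/.condarc by default), and the conda-forge-then-bioconda order is an assumption based on the usual bioconda recommendation.

```bash
# Write a demo configuration file instead of touching the real ~/.condarc.
demo_condarc="$(mktemp)"
cat > "$demo_condarc" <<'EOF'
channels:
  - conda-forge
  - bioconda
channel_priority: strict
EOF
cat "$demo_condarc"

# Equivalent conda commands, left commented because they modify your real configuration:
# conda config --add channels bioconda
# conda config --add channels conda-forge   # --add prepends, so conda-forge ends up first
# conda config --set channel_priority strict
```

With strict priority, a package found in a higher channel is never taken from a lower one, even if the lower channel offers a newer version.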

Conda and R¶

The R interpreter is included in the r-essentials package (200 R packages). To install another R package, add r- before its regular name (e.g. r-ggplot2)
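
The naming rule can be sketched as a small helper. The function name conda_r_name is hypothetical, and lower-casing the R package name is an assumption based on the convention commonly seen on conda-forge; check the channel before relying on it.

```bash
# Hypothetical helper: map an R package name to its usual conda package name.
conda_r_name() {
    printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

conda_r_name ggplot2    # prints r-ggplot2
conda_r_name MASS       # prints r-mass

# With conda available, the resulting name could then be installed, for example:
# conda install -c conda-forge r-base "$(conda_r_name ggplot2)"
```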

Conda limitations¶

Conda enables the management (definition, creation, archiving) of software environments

but ... it still relies on the underlying OS!

The OS layer is therefore missing for full reproducibility, but it could be managed by a container solution.

Other software environment managers, such as Guix, are designed to be fully reproducible.
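
Since conda does not capture the OS, one partial workaround is to record the OS details next to the environment description. This is only a sketch: the record file is hypothetical, and it documents the OS rather than reproducing it, which a container would do.

```bash
# Record basic OS information in a file kept alongside the environment files.
os_record="$(mktemp)"              # hypothetical location for the record
{
    echo "kernel: $(uname -s)"
    echo "release: $(uname -r)"
    echo "machine: $(uname -m)"
} > "$os_record"
cat "$os_record"

# With an active conda environment, the package side could be saved as well:
# conda env export > environment_export.yml
```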

pixi

Conda commands¶

  • initialisation of the shell: conda init bash
  • creation of a conda environment: conda create -n myenv
  • list environments (* for the active one): conda info --envs
  • activate the myenv environment: conda activate myenv
  • list packages (only in an active environment): conda list
  • installation of a tool/package: conda install package
  • remove a package from the environment: conda remove package
  • remove the myenv environment: conda env remove -n myenv
  • deactivate the environment: conda deactivate

note: with the miniconda3 distribution, environments are installed by default in the ~/miniconda3/envs/ directory
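
The commands above can be chained in a script, as in the following sketch. It assumes a bash shell; the environment name demo_env_tmp is hypothetical and chosen to avoid overwriting an existing environment. The eval line loads conda's shell functions for the current session, which lets conda activate work without reopening the terminal.

```bash
# Run a full environment life cycle, only if conda is available.
if command -v conda >/dev/null 2>&1; then
    eval "$(conda shell.bash hook)"     # make "conda activate" usable in this shell
    conda create -y -n demo_env_tmp     # create an empty environment
    conda info --envs                   # list environments
    conda activate demo_env_tmp
    conda list                          # packages of the active environment (none yet)
    conda deactivate
    conda env remove -y -n demo_env_tmp # clean up
    status="conda session finished"
else
    status="conda not found, nothing done"
fi
echo "$status"
```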

2 ways of using conda¶

interactive

  • create an environment
  • activate the environment
  • install some conda packages

configuration file

  • list all conda packages in a configuration file (yml or json format)
  • create the environment based on the configuration file (option -f)
  • activate the environment

reproducibility point of view:

  • use a configuration file (with the version of the package: <channel>::<package>=<version>)
  • set channel priority to strict
  • save your conda environment files alongside your code
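
To check that a configuration file follows the first recommendation, a small script can list the dependencies that lack a version. This is a minimal sketch: the function unpinned_deps is hypothetical and only handles the simple YAML layout shown here.

```bash
# Example configuration file with a pinned version.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
name: env_multiqc_1.16
channels:
  - bioconda
dependencies:
  - bioconda::multiqc=1.16
EOF

# Hypothetical check: print the dependency lines that have no "=version" part.
unpinned_deps() {
    awk '
        /^dependencies:/ { in_deps = 1; next }   # entering the dependencies list
        /^[^ ]/          { in_deps = 0 }         # any other top-level key ends it
        in_deps && /^ *- / && !/[=]/ { print }
    ' "$1"
}

result="$(unpinned_deps "$env_file")"
if [ -z "$result" ]; then
    echo "all dependencies are pinned"
else
    echo "unpinned: $result"
fi
```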

A practical example¶

You will:

1. if not yet done, initialize your conda shell
2. check that the tool is not present in your environment
3. create a config file to get the tool from conda
4. create a conda environment for the tool
5. activate the environment
6. use the tool (version)
7. quit

For example choose the multiqc tool, which is often used in NGS analyses.

Search for the tool on the "conda hub"/Anaconda platform, and identify its channel & version.

Next, you will use multiqc through conda.

Conda access (practice)¶

Conda is so widely used that it may already be installed on your machine. Try this: conda --version

Otherwise Conda is:

  • present in the jupyter/minimal-notebook docker container
  • already available on the IFB cluster, but to set up some environment variables you need to load it (module load conda)

Before creating a conda environment, conda needs to know which shell you use. However, the conda init bash command requires closing and reopening the terminal, which does not work with the underlying terminal opened at the beginning of this notebook.

So for now, open a terminal from the launcher and copy/paste the command lines from this notebook.

Before getting the tool, if you haven't already done so, initialize your shell for conda (choose bash):

conda init bash

Check the absence of multiqc:

In [4]:
%%sh
cd ${PWD}
whereis multiqc
multiqc:
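
Since whereis is not available on every system, the same check can be sketched with the shell built-in command -v. The function name tool_status is hypothetical.

```bash
# Report whether a tool can be found on the current PATH.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "present"
    else
        echo "absent"
    fi
}

tool_status multiqc    # expected to report "absent" before the environment is created
tool_status sh         # a tool that exists on any POSIX system
```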

Using the channel & version you found on the conda hub, edit a yml file to guide the creation of the conda environment:

cd ${PWD}
printf 'name: env_multiqc_1.16\nchannels:\n  - bioconda\ndependencies:\n  - bioconda::multiqc=1.16\n' > env_multiqc_1.16.yml # printf expands \n in every shell, unlike echo in bash
more env_multiqc_1.16.yml

Manage the "env_multiqc_1.16" environment: 1) create 2) activate 3) use 4) quit:

conda env list # list all conda env.
conda env create -f env_multiqc_1.16.yml # create
conda env list
conda activate env_multiqc_1.16 # activate 
multiqc --help # use
conda deactivate # quit
multiqc --help # check the multiqc tool is not present

Conclusion¶

You've used conda to add a tool to your environment, and you've used it!

By creating the conda environment from a configuration file that specifies the version of the tool, you follow a FAIR approach (within the limitations of conda).