docker load < ./minimal-notebookVlab-3.4.2.tar
You should now have the image loaded in Docker. Check that this is the case with:
docker images
In the list, you should now see the "jupyter/minimal-notebook" image.
In order to check that it's working on your laptop, we will run a "simple" command.
docker run --rm -w /data -p 8888:8888 jupyter/minimal-notebook:lab-3.4.2
NB: A more complete command would look like this, but we will not execute it now.
docker run --rm -v ${PWD}:/data -w /data -p 8888:8888 jupyter/minimal-notebook:lab-3.4.2
We had an image and we started a container from this image. Let's check this with this command:
docker ps -a
Exit by going back to the terminal and pressing Ctrl + C twice.
During "Usecase 2 - Development" you have already discovered tools to master your development environment and to expose your work. The next step is to provide your users with a ready-to-use distribution of your tool, with all the installation details solved.
In this exercise, you will learn how to build a new image tailored to your needs. Building on top of an existing image, you will automate the installation of a piece of software and save the result directly as a stable image. Once the installation procedure is set up, all the steps can be collected into a “recipe” called a Dockerfile, which is a simple text file containing commands. Once written, this Dockerfile can be used to build an image on demand.
Goal : use a Dockerfile to create an image where samtools is installed
When you buy a new computer, you have to choose the operating system that will be installed, as well as a list of software that will be available from the start.
Creating a Dockerfile is very similar. The first step is to choose the base image FROM which you will start working.
For this exercise, we will use our good old "jupyter/minimal-notebook:lab-3.4.2" as a base. It has the advantage of being already present locally, so we don't need to pull it. We could have chosen one of the many Ubuntu versions or one of the other available Linux distributions: https://hub.docker.com/search?q=linux&image_filter=official
Instructions
The FROM instruction specifies the base image on top of which we will be building our own.
Docker's build command will execute the Dockerfile line by line and create a new layer for each executed instruction.
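A minimal sketch of this first Dockerfile, written through a shell here-document so that it can be pasted into a terminal. Only the base image named above is used; creating the file with `cat` is just one convenient option, and any text editor would do as well.

```shell
# Create a one-instruction Dockerfile in the current folder.
# The base image is the one that was loaded locally in the previous part.
cat > Dockerfile <<'EOF'
# Base image on top of which our own image is built
FROM jupyter/minimal-notebook:lab-3.4.2
EOF

# Display the file that was just written
cat Dockerfile
```

With the image name and version used in this exercise, such a file could then be built from the same folder with `docker build -t test/mysamtools:1.0 .` (the final dot tells Docker to use the current folder as build context).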
Congratulations, you have just built your very first image! Its name is "test/mysamtools" and we tagged it with version 1.0. OK, for now it is very basic, but it is a first step.
As a final step, you can use this command to list all images available in your session:
In the displayed list, you will see your very own image called test/mysamtools, with version 1.0. Notice its size compared to the minimal-notebook.
Now that we have a base, let's build on top of it.
Samtools is a widely used tool for performing simple tasks on SAM files, such as counting sequenced reads, converting formats and calculating basic statistics. Samtools is very convenient for this exercise because its installation procedures are straightforward.
Here are the commands to install Samtools in various contexts :
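As an illustration, here are two common installation routes, sketched under the assumption of a Debian/Ubuntu system or a Conda environment. The function names are hypothetical; they only wrap the commands so that nothing is installed until one of them is called.

```shell
# Debian/Ubuntu systems: Samtools is available as a distribution package.
# Run as root (or prefix each command with sudo).
install_samtools_apt() {
  apt-get update && apt-get install -y samtools
}

# Conda environments: Samtools is distributed through the Bioconda channel.
install_samtools_conda() {
  conda install -y -c conda-forge -c bioconda samtools
}
```

Compiling from the source archive is a third route; it additionally requires build tools such as make and a C compiler.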
Once the installation procedure is chosen, the instructions can be added to the Dockerfile. When choosing the procedure, check that the necessary building blocks are available in your base image. If they are not, don't forget to install them before installing your tool. In our case, for example, make is not installed in our base image.
The RUN instruction specifies a shell command that should be executed while the image is being built.
The Dockerfile should read:
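A minimal sketch of such a Dockerfile, assuming Samtools is installed from the Ubuntu package repositories rather than compiled from source (a source build would first need make and a compiler). The `USER` lines are there because this base image runs as an unprivileged user by default; `NB_UID` is an environment variable defined by the Jupyter base images.

```shell
# Write the Dockerfile in the current folder (overwrites an existing one).
cat > Dockerfile <<'EOF'
# Base image
FROM jupyter/minimal-notebook:lab-3.4.2

# The base image runs as an unprivileged user; become root to install packages
USER root

# Install Samtools from the Ubuntu repositories, then clean the apt cache
RUN apt-get update && \
    apt-get install -y --no-install-recommends samtools && \
    rm -rf /var/lib/apt/lists/*

# Switch back to the default notebook user defined by the base image
USER ${NB_UID}
EOF
```

Chaining the three commands in a single RUN instruction keeps them in one layer, so the apt cache removed at the end does not remain in the image.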
(If necessary, open a Terminal, navigate to the folder where the Dockerfile is located.)
Type:
Hopefully, the end of the text displayed in your terminal now looks like this:
Congratulations, you now have a brand new image with Samtools installed!
Is your image still listed with the others? What is its size, now?
We can run a simple command to test the installation. For instance, we can ask samtools to display its default help message using the -h option.
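Such a test could look like the sketch below. The function name is hypothetical, and the long form `--help` is used here as an assumption about the exact option; the image name and version are the ones chosen when building.

```shell
# Image built in this exercise
SAMTOOLS_IMAGE="test/mysamtools:1.0"

# Start a throw-away container (--rm) and ask Samtools for its help message
samtools_help() {
  docker run --rm "$SAMTOOLS_IMAGE" samtools --help
}
```

Calling `samtools_help` should print the list of Samtools sub-commands if the installation succeeded.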
As a bonus, we can also just connect to a container with bash.
docker run --rm -ti -w /data test/mysamtools:1.0 bash
Note that for now we don't have access to any local file...
It is usually good practice to also name a maintainer, responsible for writing and maintaining the Dockerfile. Note that the MAINTAINER instruction is deprecated in current Docker versions; a LABEL is the recommended replacement.
It is also good practice to provide a default entry point into your image, for instance by adding this at the very end of your Dockerfile:
By doing this, samtools will be run by default in a docker run command.
Specify the working directory with WORKDIR. For instance:
And of course, write as many comments as you can (# sign).
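The good practices above can be combined as in the following sketch. It makes several assumptions: Samtools is installed with apt, the maintainer contact is a placeholder, and ENTRYPOINT is used for the default program (CMD would be the other usual choice).

```shell
# Write a commented Dockerfile in the current folder (overwrites an existing one).
cat > Dockerfile <<'EOF'
# Base image
FROM jupyter/minimal-notebook:lab-3.4.2

# Who maintains this Dockerfile (placeholder contact, replace with your own)
LABEL maintainer="Your Name <you@example.org>"

# Install Samtools as root, then return to the unprivileged notebook user
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends samtools && \
    rm -rf /var/lib/apt/lists/*
USER ${NB_UID}

# Folder in which commands start inside the container
WORKDIR /data

# Program run by default when a container is started from this image
ENTRYPOINT ["samtools"]
EOF
```

With an ENTRYPOINT set this way, whatever follows the image name in `docker run` is passed to samtools as arguments.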
You can find more information on what can be included in a Dockerfile as well as best practices at this URL: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/.
Learn by examples: you can also explore repositories of Docker images and see how others build their Dockerfiles :
Just be careful with the images you use as base. Always use a trusted source.
Imagine yourself as a teacher designing a practical session for Snakemake. You would like to provide your students with a homogeneous environment where they could execute your nicely written Jupyter Notebooks. During the FAIR_Bioinfo course, you've practiced using the minimal-notebook image. Maybe you could try to extend it with the tools you need?
How would you do that?
Goal: build the extended image.
The image should contain:
Does all the software need to be installed?
What would be your Dockerfile?
With which command would you build it? What name would you choose?
How would you test the tools?
How would you distribute the image to your students in a place where the network is not optimal?
The answer to this last question is not in this tutorial...
Now let's see the image you will build!
Goal: review the Docker structure
Docker is a command-line tool for manipulating images and creating application containers.
As presented in the course, Docker consists of two elements:
a client, to receive commands from the user
a server, to execute commands and manage images and containers
Docker architecture
Typing this command will give the Client and Server versions available on your computer:
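A sketch of the version check, wrapped in two hypothetical helper functions. The first prints the full report with its Client and Server sections; the second uses the `--format` option of `docker version` to keep only the two version numbers.

```shell
# Full report: one "Client" section and one "Server" section
show_docker_versions() {
  docker version
}

# Compact form: only the two version numbers, on a single line
show_docker_versions_short() {
  docker version --format 'client={{.Client.Version}} server={{.Server.Version}}'
}
```

If the Server part is missing or an error is shown instead, the client is installed but cannot reach the Docker server.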
Goal : see the basic structure of a Docker command and how to display the related documentation.
Usage, options and a full list of available commands can be accessed through the command line in a terminal.
Type the following command (outside of JupyterLab).
The general usage of a Docker command line is as follows:
If you consider the first line, a Docker command is made of four parts.
The ‘docker’ keyword
[OPTIONS] : optional parameters to run the client
COMMAND : the name of a command to be run by the client
[arg…] : parameters for the chosen command
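To make the four parts concrete, here is a sketch that lines up the general usage with one real command. The function names are hypothetical; `--log-level` is one of the client options listed by `docker --help`, and the image is the one used earlier in this tutorial.

```shell
# docker  [OPTIONS]         COMMAND  [arg...]
# docker  --log-level warn  pull     jupyter/minimal-notebook:lab-3.4.2
anatomy_example() {
  docker --log-level warn pull jupyter/minimal-notebook:lab-3.4.2
}

# General help: usage, client options and the list of commands
docker_help() {
  docker --help
}

# Help for one specific command, e.g. docker_command_help pull
docker_command_help() {
  docker "$1" --help
}
```

For instance, `docker_command_help run` displays the documentation of the run command.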
A COMMAND is a specific task that can be executed by Docker. In this tutorial, we are particularly interested in the commands pull, run, rm/rmi, build and save. All commands have a dedicated documentation page.
Try to access the documentation of one specific Docker command
Questions :
How many arguments are absolutely required by the command ‘docker pull’ ?
Do you remember what a registry is?
Goal: On the Docker hub, find and download the ‘FastQC’ image made by the BioContainers initiative.
Exercise :
In a web browser, navigate to the DockerHub : https://hub.docker.com/
In the top search bar, type : biocontainers
The Biocontainers organization made many images available to the scientific community. We will have a look at the available images.
Question :
How many times was the image downloaded ?
Exercise :
Among the list of images, find FastQC.
Click on the name to access the description page for this image.
Note that we could also have searched for biocontainers/fastqc in the search field.
Question :
On the image description page, can you find the ‘pull’ command ?
Exercise :
Copy the pull command
Execute the command inside a terminal.
You will get an error as this image has no default tag (“latest”). So we need to specify one in the command line.
Exercise :
Go to the “Tags” tab and copy the pull command of version v0.11.9_cv8
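With the tag written explicitly after the image name, the pull command takes the form sketched below (the variable and function names are hypothetical).

```shell
# Image name and tag, separated by ':'
FASTQC_IMAGE="biocontainers/fastqc:v0.11.9_cv8"

# Download the image, layer by layer, from Docker Hub
pull_fastqc() {
  docker pull "$FASTQC_IMAGE"
}
```

Pinning a tag in this way also makes the command reproducible: the same version is fetched every time.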
Question :
How many times do you see ‘Pull complete’ displayed ? Why ?
You can find more information on layers here: https://docs.docker.com/storage/storagedriver/#images-and-layers
Note that if you execute the pull command a second time, Docker will not re-download the image (unless you force it to). Instead, you will see a message about its status.
Exercise :
Now, to be sure that the image was correctly pulled, let’s see the list of all available downloaded images inside our workspace.
Question :
What is the size of the biocontainers/fastqc image ?
Optional exercise :
Display the detailed description of the image.
What is displayed on the terminal is a description of the image in JSON format. You can find more information on the Docker website: https://docs.docker.com/engine/reference/commandline/inspect/.
Goal: Run a container from a pulled image.
Among the Docker commands, we will now use the ‘run’ command.
Question :
What are the options and parameters of the ‘run’ command ?
As displayed in the terminal, the description of the command is ‘Run a command in a new container’.
Question :
What is the difference between an image and a container ?
See the Docker documentation https://docs.docker.com/get-started/overview/#docker-objects
FastQC has been installed as a global program in the image we pulled. Consequently, it is directly accessible when interacting with the image.
Exercise :
Now, to run the application, execute the following command:
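A plausible form of this command is sketched below, assuming FastQC is simply started by its name inside a throw-away container. The function name is hypothetical; any extra arguments are passed on to FastQC.

```shell
# Start a container from the pulled image and run the fastqc program in it.
# --rm removes the container once the program has finished.
run_fastqc_bare() {
  docker run --rm biocontainers/fastqc:v0.11.9_cv8 fastqc "$@"
}
```

Calling `run_fastqc_bare` starts FastQC without parameters, while `run_fastqc_bare --help` asks it for its help message.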
Question :
What was displayed on the terminal ? Is it a message from Docker or from FastQC ?
Congratulations! You just successfully downloaded and used your first Docker image !
Goal: map a local folder and run FastQC on a provided FastQ file.
Running FastQC without parameters was interesting as a demonstration of Docker’s features. But if we really want to run FastQC, we also need to provide parameters and, most importantly, input files. Because of encapsulation, this functionality is not available by default. We need to explicitly map a local folder (a “volume”) to a corresponding one inside the container (usually /home/), binding the two separate “worlds” together.
To achieve this mapping, we use the -v option of the run command. Before trying it, you can check the command options once again.
Download and unzip the relevant files: https://zenodo.org/record/3997237 inside your current folder. You should have a folder named Data, containing fastq.gz files (among other files).
Exercise : find the paths to bind
To bind our current folder to the /data/ folder located inside a container, we first need the absolute path of the current folder, obtained with the Unix pwd command.
This path will be used in further commands through ${PWD}.
Then we need to know where to bind it, i.e. what are the available Volumes of the image. Inspect the image to know which volume is available for binding.
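Rather than reading the whole JSON description, the volumes can be extracted directly with the `--format` option of `docker inspect`. This is a sketch with a hypothetical function name; the image defaults to the FastQC image pulled earlier.

```shell
# Print the volumes declared by an image, as a small JSON object.
# $1: image name (defaults to the FastQC image used in this tutorial)
list_image_volumes() {
  docker inspect --format '{{json .Config.Volumes}}' "${1:-biocontainers/fastqc:v0.11.9_cv8}"
}
```

The keys of the printed object are the folders inside the image that were declared as volumes.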
Instead of running the fastqc command, as a first step, we will now just list the content of the /data/ folder inside the container.
If nothing appears, this is normal: the folder is empty and only serves as a “branching point”.
We now have the paths of the two folders we want to bind together.
Exercise : bind the two paths
To map the current folder to /data inside the container, the syntax is simple. The -v option takes one parameter made of the two paths we want to map, joined with the character : as separator (local path first, container path second).
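The two situations can be compared with the sketch below, which lists /data first without and then with the mapping. Function and variable names are hypothetical.

```shell
FASTQC_IMAGE="biocontainers/fastqc:v0.11.9_cv8"

# Without -v: lists /data as it exists inside the image
list_container_data() {
  docker run --rm "$FASTQC_IMAGE" ls /data
}

# With -v: the current folder (${PWD}) is mounted on /data,
# written as <local path>:<container path>
list_mounted_data() {
  docker run --rm -v "${PWD}:/data" "$FASTQC_IMAGE" ls /data
}
```

Quoting `"${PWD}:/data"` keeps the mapping in one piece even when the local path contains spaces.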
Question :
Is the displayed list the same as what is in your current folder?
Finally, we can run FastQC on a FastQ file located in the Data folder. Change the name of the file to any of the provided files.
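Putting the pieces together, a complete call could look like the following sketch. The function name is hypothetical, and the file passed to it must be replaced by the name of one of the files actually present in the Data folder.

```shell
# Run FastQC on one FastQ file of the current folder.
# $1: path relative to the current folder, e.g. Data/<name>.fastq.gz
run_fastqc_on() {
  docker run --rm \
    -v "${PWD}:/data" \
    -w /data \
    biocontainers/fastqc:v0.11.9_cv8 \
    fastqc "$1"
}
```

Here `-w /data` makes the mounted folder the working directory, so the relative path is valid inside the container. By default, FastQC writes its HTML report and zip archive next to the input file, so the results appear in the local Data folder.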
Congratulations, you now know how to run a command using a Docker image!
NB: DockerHub is the first but not the only registry available. You can also explore: