EB3I n1 2025 scRNAseq
-
Connexion at OoD
R/Rstudio : theory
-
1 PREAMBLE
1.1 Purpose of this session
This file describes the theory behind using R/RStudio for data analysis. It supports the first part of the single-cell RNAseq data analysis training course of the EBAII n1 2025 and covers these steps:
- Speaking in R
- Writing code in Rstudio
- Thinking about what you want to do, and with which tools
1.2 What should you run ?
This training presentation, written in Rmarkdown, contains many chunks ( = code blocks)
You do not need to run them all !
Chunks follow a color scheme :
3 Start Rstudio
- Create an RStudio session with the right resource requirements, using the cheat sheet.
3.1 First sight
- A : global functions tools
- B : Quick-launch functions 1
- C : Script pane
- D : Console pane
- E : Monitoring console
- F : Work folder
- G : Work session
- H : Quick-launch function 2
- I : Environment console pane
- J : Environment monitoring console
- K : Multi-widgets interface
- L : Quick-launch function 3
3.2 Console Pane (D + E)
This is a standard R console. Open your bash terminal, enter the following command: R, and you will get the same console.
Warning: Here, we are in an RStudio session powered by the IFB. Your local RStudio might differ: the version of R, the list of available packages, etc. On your local machine, the RStudio console will match the R available in your terminal.
Let’s try to enter the command print():
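The chunk that produced the output shown here is not displayed; it was presumably the following one-liner (a reconstruction from the output and the explanation that follows):

```r
# print() displays on screen whatever is given between its parentheses
print("Hello World")
```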
[1] "Hello World"
We just used a function called print. This function tries to print on screen everything provided between the parentheses ( and ). In this particular case, we gave it the character string “Hello World”, and the function print successfully printed it on screen!
Now click on Session -> Save Workspace As and save the current workspace. What happened in the R console pane? You saw it! A command has been written automatically. For me, it is:
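The exact command depends on the file name chosen in the dialog. A minimal sketch of what it may look like, assuming a hypothetical file named my_session.RData stored in a temporary folder:

```r
# Hypothetical file name; RStudio fills in the path you selected in the dialog
workspace_file <- file.path(tempdir(), "my_session.RData")

# save.image() writes every object of the current workspace to that file
save.image(file = workspace_file)

# Check that the file was written
file.exists(workspace_file)
```

Such a file can later be restored with load().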
When you need help with R, whether about a function error, a script result or anything similar, please save your workspace and send it to your favorite R developer. It contains every object you created in your session.
Info: RStudio offers syntax highlighting, good autocompletion and parameter suggestions. If I ever see anyone typing a complete command without pressing the Tab key, then I’ll have to steal their dessert. And I’m always hungry enough for desserts.
3.3 Environment/History/Connection/Git (F + G + H + I + J)
3.3.1 Environment
This pane has three tabs: Environment, History and Connections.
Environment lists every single variable, object or data set loaded in R. This includes only what you created yourself and does not include environment variables. Example: in your console pane, enter the following command:
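The original command is not shown; any assignment will do. A minimal sketch, where my_variable is a hypothetical name chosen for illustration:

```r
# Create a variable; it appears immediately in the Environment tab
my_variable <- 42

# Typing the name alone prints its value in the console
my_variable

# ls() lists, in the console, the objects of the current workspace
ls()
```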
What happened in the Environment pane ? You’re right: a variable is now available!
When a more complex object is declared in your work space, then some general information may be available. Example:
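As an illustration, here is a small data frame; the object name, the column names and the values are all invented for this sketch:

```r
# A toy data frame: one row per cell, two columns
cells <- data.frame(
  cell_id = c("cell_1", "cell_2", "cell_3"),
  n_genes = c(1200, 950, 1830),
  stringsAsFactors = FALSE
)

# str() gives in the console the same summary as the Environment tab
str(cells)
```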
You can see the data frame. Click on it to preview the data it contains, then click on the light-blue arrow to get a deeper insight into its content:
Now click on Session -> Clear Workspace and watch your work disappear. This action cannot be undone. While it is useful to clear your workspace from time to time in order to avoid name collisions, it is better to save your workspace first.
3.4 History
This tab is quite important: while you test and search in the console, your history keeps track of each command line you entered. This will definitely help you to build your scripts, to pass your command lines to your coworkers, and to recover from unfortunate errors.
Each history is related to a session. You may see many commands in your history. Some of them are not even listed in your console. RStudio writes every command there, even the ones that were hidden for the sake of your eyes (knitting commands, display commands, help commands, etc.)
Your history has a size limit. This limit is set by an environment variable called R_HISTSIZE (standing for: R History Size). It may be checked with the function Sys.getenv() and set with the function Sys.setenv():
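A minimal sketch of both calls; the value 1000 is an arbitrary choice for the example:

```r
# Read the current value (an empty string means the variable is not set)
Sys.getenv("R_HISTSIZE")

# Set a new limit; environment variables are always stored as text
Sys.setenv(R_HISTSIZE = "1000")

# Check that the change was taken into account
Sys.getenv("R_HISTSIZE")
```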
3.5 The Help/Plot/Packages/File pane (K + L)
3.5.1 Help
This is maybe the most important pane of your R Studio. THIS is the difference between R Studio and another code editor. Search for any function here and not on the internet. This pane shows you the available help for YOUR version of R, YOUR version of a given package.
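The Help pane can also be reached from the console. A short sketch, using print as the example topic:

```r
# Open the help page of a function; typing ?print does the same
help("print")

# The returned object records where the help page was found
help_page <- help("print")
length(help_page)
```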
Different versions may have different default parameters and interfaces. When you copy commands from the internet, make sure they are not harmful to your computer.
3.5.2 File
Just like in any file explorer, we can move across directories, create folders and files, delete them, etc.
Where am I ?
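The question is answered by the function getwd() (get working directory); the path printed depends on your own session:

```r
# Ask R for the current working directory
current_dir <- getwd()
print(current_dir)
```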
[1] "/media/mna_bioinfo/MNA2_Stockage/EBAII"
Or use the function dir.create():
You should change your working directory right now:
Or use setwd():
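A minimal sketch combining dir.create() and setwd(). The folder name my_analysis is hypothetical, and a temporary location is used so that nothing is written in your real folders:

```r
# Hypothetical project folder, created in a temporary location
project_dir <- file.path(tempdir(), "my_analysis")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

# setwd() changes the working directory and invisibly returns the previous one
old_dir <- setwd(project_dir)
getwd()

# Go back to where we started
setwd(old_dir)
```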
3.5.3 The script pane (C)
This is where you write your R scripts. This also accepts other languages (e.g. bash, python, …), but R Studio shines for its R integration.
Please, please! Write your commands in the Script pane, then execute them by hitting CTRL + Enter. This is very much like your lab notebook: the history pane only keeps a limited number of commands in memory, while the script keeps your commands in a file on your disk. You may share it, edit it, comment it, etc.
TLDR – Too Long Didn’t Read
Graphic interface presentation :
Write command lines in Script pane
Execute command lines by hitting CTRL + Enter from the script pane and see them in the console.
Have a look at the environment and history in the upper right pane.
Search for help in the lower right pane.
4 R
4.1 Speaking R
4.1.1 Vocabulary
A variable is a name given to a container or to its content. A container is an object; its content is made of elements. Among elements, we distinguish data from metadata. We manipulate the elements of an object with functions.
We talk to R with commands that carry out a task according to rules and best-practice guidelines. The process is not necessarily linear: control structures can alter it. Taken together, this describes an algorithm.
In good conditions, we work on a “workbench”, that is an IDE such as RStudio. Objects have properties: type, data structure, dimensions, class, functions.
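This vocabulary can be illustrated with a small sketch. The object cells and its content are invented for the example:

```r
# An object (container) holding elements (content)
cells <- data.frame(
  cell_id = c("cell_1", "cell_2"),
  n_genes = c(1200, 950),
  stringsAsFactors = FALSE
)

# Properties of the object
class(cells)    # "data.frame"
typeof(cells)   # "list"
dim(cells)      # 2 rows, 2 columns
names(cells)    # column names: metadata describing the data

# Control structures (a loop and a condition) make the process non-linear
kept <- character(0)
for (i in seq_len(nrow(cells))) {
  if (cells$n_genes[i] > 1000) {
    kept <- c(kept, cells$cell_id[i])
  }
}
kept
```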
4.1.2 Packages
4.1.2.1 … in IFB core cluster
You may install a new package on your local computer. You shall not do it on a cluster. The IFB core cluster you are working on today is shared and highly valuable; no one except the official maintainers can install anything.
The following lines are written for instructional purposes and should not be used on the IFB core cluster.
4.1.2.2 … in local
A package is a collection of functions organized to do a job in a given field. These functions are often presented in cheat sheets or tutorials.
How to call packages?
There are 2 methods to call a package: library() and require().
- library() loads and attaches the package, and stops the process with an error if the package is not installed. Perfect at the start of a script.
- require() does the same, but instead of stopping it returns TRUE or FALSE (with a warning) depending on whether the package could be loaded. Perfect inside functions.
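The difference can be sketched as follows. The helper load_or_report() and the package name aPackageThatDoesNotExist are hypothetical, introduced only for this example:

```r
# At the start of a script: fail early if the package is missing
library(stats)

# Inside a function: test the result and decide what to do
load_or_report <- function(pkg) {
  ok <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if (!ok) {
    message("Package '", pkg, "' is not installed")
  }
  ok
}

load_or_report("stats")                      # TRUE
load_or_report("aPackageThatDoesNotExist")   # FALSE, with a message
```

The argument character.only = TRUE is needed here because the package name is passed through a variable.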
4.1.3 How to install packages?
As a reminder: you may install a new package on your local computer, but not on the IFB core cluster, where only the official maintainers can install anything. The following lines are written for instructional purposes and should not be used on the IFB core cluster.
There are different functions to install a package :
- classic
This will raise a prompt asking simple questions: where to download from (choose somewhere in France), and whether to update other packages or not.
Do not be afraid of the large amount of text printed in the console and let R do the trick.
Alternatively, you can click Tools -> Install Packages in RStudio.
You can list installed packages with installed.packages(), and find packages that can be updated with old.packages(). These packages can be updated with update.packages().
While the function install.packages() searches for packages in the common R package list (CRAN), many bioinformatics packages are available in other package repositories. Just like the Apple App Store and the Google Play Store do not offer the same mobile applications, R has multiple sources for its packages. You need to know one of them, and only one: Bioconductor.
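A short sketch of how installed.packages() can be used to check for a package before installing it. The package name ggplot2 is only an example, and nothing is installed by this code:

```r
# Example package name, chosen for illustration
pkg <- "ggplot2"

# installed.packages() returns a matrix with one row per installed package
installed <- rownames(installed.packages())

if (pkg %in% installed) {
  message(pkg, " is already installed")
} else {
  message(pkg, " is missing; on a local machine, run install.packages(\"", pkg, "\")")
}
```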
- bioconductor
- devtools
- remotes
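The original commands are not shown, so the following is only a sketch of what each method typically looks like. The package names are examples, "user/repository" is a placeholder for a GitHub repository, and the switch run_install is a hypothetical safeguard that keeps the block from installing anything by accident:

```r
# Keep FALSE on the IFB core cluster; set to TRUE on your local computer only
run_install <- FALSE

if (run_install) {
  # classic: from CRAN
  install.packages("ggplot2")

  # bioconductor: through the BiocManager package
  install.packages("BiocManager")
  BiocManager::install("SingleCellExperiment")

  # devtools: from a GitHub repository (placeholder name)
  devtools::install_github("user/repository")

  # remotes: a lighter alternative offering the same function
  remotes::install_github("user/repository")
}
```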
Today, most packages have dependencies, which are themselves packages.