EB3I n1 2025 scRNAseq
-
Connection at OoD
R/RStudio: theory
-
1 PREAMBLE
1.1 Purpose of this session
This document presents the R/RStudio theory needed to perform data analysis, and more specifically the first part of the single-cell RNA-seq data analysis training course of EBAII n1 2025. It covers the following steps:
- Speaking R
- Writing code in RStudio
- Thinking about what you want to do, and with which tools
1.2 What should you run?
This training presentation, written in R Markdown, contains many chunks (i.e. code blocks).
You do not need to run them all!
Chunks follow a color scheme:
3 Start RStudio
- Create an RStudio session with the right resource requirements, using the cheat sheet.
3.1 First sight
- A: global function tools
- B: quick-launch functions 1
- C: Script pane
- D: Console pane
- E: monitoring console
- F: work folder
- G: work session
- H: quick-launch functions 2
- I: Environment console pane
- J: Environment monitoring console
- K: multi-widget interface
- L: quick-launch functions 3
3.2 Console Pane (D + E)
This is a standard R console: if you open a bash terminal and enter the command R, you will get the same console.
Warning: Here, we are in an RStudio session powered by the IFB. Your local RStudio might differ: the version of R, the list of available packages, etc. On your local machine, the RStudio console will match the R available in your terminal.
Let’s try to enter the command print():
[1] "Hello World"
We just used a function called print. This function prints on screen everything provided between parentheses ( and ). In this particular case, we gave the character string “Hello World”, and the function print successfully printed it on screen!
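A minimal sketch of the call described above. The variable name greeting is only illustrative: print() accepts a value typed directly, or a variable holding it.

```r
# Print a character string directly
print("Hello World")

# The same string, stored first in a variable (illustrative name)
greeting <- "Hello World"
print(greeting)
```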
Now click on Session -> Save Workspace As and save the current workspace. What happened in the R console pane? You saw it! A command has been automatically written. For me, it is:
When you need help with R, whether on a function error, a script result or anything alike, please save your workspace and send it to your favorite R developer. It contains every object of your session.
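Behind the menu, RStudio typically writes a save.image() call. The sketch below does the same from the console and restores the objects with load(). The file name my_workspace.RData, its location under tempdir() and the variable gene_names are assumptions for the example; in practice, save into your project directory.

```r
# Create something worth saving
gene_names <- c("Gnai3", "Pbsn", "Cdc45")

# Save every object of the global environment into one .RData file
workspace_file <- file.path(tempdir(), "my_workspace.RData")
save.image(file = workspace_file)

# Later, or on a coworker's machine: restore the saved objects
rm(gene_names)
load(workspace_file)
print(gene_names)
```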
Info: There is syntax coloring, good autocompletion and parameter suggestion. If I ever see anyone typing a complete command without using the Tab key, then I’ll have to steal their dessert. And I’m always hungry enough for desserts.
3.3 Environment/History/Connection/Git (F + G + H + I + J)
3.3.1 Environment
This pane has three tabs: Environment, History and Connections.
Environment lists every variable, object or dataset loaded in R. This only includes objects created in your session, not environment variables. Example: in your console pane, enter the following command:
What happened in the Environment pane ? You’re right: a variable is now available!
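For instance (the variable name is illustrative), the assignment below creates a variable that then shows up in the Environment pane; ls() gives the same list from the console.

```r
# Create a variable: it appears in the Environment pane
my_first_variable <- 42

# ls() lists the objects of the current environment, like the Environment pane
"my_first_variable" %in% ls()
```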
When a more complex object is created in your workspace, some general information about it is displayed. Example:
You can see the data frame. Click on it to preview the data it contains, then click on the light-blue arrow to get a deeper insight into its content:
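A sketch of such an object, with an illustrative name and a few values taken from the example table used later: a small data frame, plus str() and dim(), which print in the console roughly what the Environment pane displays.

```r
# A small data frame; the Environment pane summarizes it as
# something like "3 obs. of 2 variables"
counts_table <- data.frame(
  gene = c("Gnai3", "Pbsn", "Cdc45"),
  count = c(22, 0, 5)
)

# str() shows the columns, their types and first values
str(counts_table)
dim(counts_table)   # number of rows, number of columns
```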
Now click on Session -> Clear Workspace and watch your work disappear. This action cannot be undone. While it is useful to clear your workspace from time to time to avoid name collisions, it is better to save it first.
3.4 History
This tab is quite important: while you test and search in the console, your history keeps track of each command line you entered. This will definitely help you build your scripts, share your command lines with your coworkers, and revert possible unfortunate errors.
Each history is related to a session. You may see many commands in your history, some of which never appeared in your console: RStudio writes every command there, even the ones hidden for the sake of your eyes (knitting commands, display commands, help commands, etc.).
Your history has a size limit. This limit is set by an environment variable called R_HISTSIZE (standing for R History Size). It may be checked with the function Sys.getenv() and set with the function Sys.setenv():
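A minimal sketch; the value 1000 is arbitrary. Environment variables are stored as strings, and depending on your setup the new size may only be taken into account by the next session.

```r
# Read the current value (an empty string means it is not set)
Sys.getenv("R_HISTSIZE")

# Set it for the current R process
Sys.setenv(R_HISTSIZE = "1000")
Sys.getenv("R_HISTSIZE")
```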
3.5 The Help/Plot/Packages/File pane (K + L)
3.5.1 Help
This is maybe the most important pane of RStudio. THIS is the difference between RStudio and other code editors. Search for any function here, not on the internet: this pane shows you the help available for YOUR version of R and YOUR version of a given package.
Different versions may have different default parameters and interfaces. When copying commands from the internet, please make sure they are not harmful to your computer.
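From the console, ?read.csv or help("read.csv") opens the same Help pane. The sketch below shows two related helpers: args() prints the arguments of a function with their default values, and packageVersion() tells you which version of a package (here utils, which provides read.csv) you are reading the help for.

```r
# In the console: ?read.csv or help("read.csv") opens the Help pane

# Arguments of a function and their default values
args(read.csv)

# Version of the package the help page belongs to
packageVersion("utils")
```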
3.5.2 File
Just like any file explorer, we can move across directories, create folders and files, delete them, etc.
Where am I?
[1] "/media/mna_bioinfo/MNA2_Stockage/EBAII"
Or use the function dir.create():
You should change your working directory right now:
Or use setwd():
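A sketch of these three steps from the console. The folder name r_training_project and its location under tempdir() are assumptions for illustration; on the cluster, you would use your own project directory instead.

```r
# Where am I? Print the current working directory
old_wd <- getwd()
print(old_wd)

# Create a project folder (here under tempdir(), for illustration only)
project_dir <- file.path(tempdir(), "r_training_project")
dir.create(project_dir, showWarnings = FALSE)

# Move into it: relative paths are now resolved from this folder
setwd(project_dir)
getwd()
```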
3.5.3 The script pane (C)
This is where you write your R scripts. This also accepts other languages (e.g. bash, python, …), but R Studio shines for its R integration.
Please, please! Write your commands in the Script pane, then execute them by hitting CTRL + Enter. This is very much like your lab notebook: the history pane only keeps a limited number of commands in memory, while the script keeps your commands in a file on your disk. You may share it, edit it, comment it, etc.
TL;DR – Too Long; Didn’t Read
Graphic interface presentation:
Write command lines in the Script pane.
Execute them by hitting CTRL + Enter from the Script pane, and see them in the console.
Have a look at the environment and history in the upper-right pane if needed.
Search for help in the lower-right pane.
4 R
4.1 Speaking R
4.1.1 Vocabulary
A variable is a named container or content. A container is an object; a content is an element. Among elements, we distinguish data from metadata. We manipulate elements inside an object with functions.
We talk to R with commands that perform a task according to the language rules, following best-practice guidelines. The process is not linear: it can be modified by different control structures. The whole describes an algorithm.
Ideally, we work in an IDE (integrated development environment) such as RStudio. Objects have properties: type, data structure, dimension, class, functions.
4.1.2 Packages
4.1.2.1 … on the IFB core cluster
You may install a new package on your local computer, but you shall not do it on a cluster. The IFB core cluster you are working on today is shared and highly valuable; no one can install anything besides the official maintainers.
The following lines are written for instructional purposes and should not be run on the IFB core cluster.
4.1.2.2 … locally
A package is a collection of functions organized to do a job in a given field. These functions are often presented in cheat sheets or tutorials.
How to load packages?
There are two ways to load a package: library() and require().
- library() attaches the package to your session and stops the process with an error if the package is not installed. Perfect at the start of a script.
- require() attaches the package to your session and returns TRUE or FALSE depending on whether it could be loaded (with a warning if the package is not installed). Perfect inside functions.
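A sketch of the difference, using the base package stats (always installed) and a hypothetical package name, notInstalledPackage123, assumed not to exist:

```r
# library() stops with an error when the package is missing
library(stats)
library_result <- tryCatch(
  library(notInstalledPackage123),   # hypothetical package name
  error = function(e) e
)
inherits(library_result, "error")

# require() returns TRUE/FALSE instead (with a warning when missing)
stats_loaded <- require(stats)
fake_loaded <- suppressWarnings(require(notInstalledPackage123))
c(stats_loaded, fake_loaded)
```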
4.1.3 How to install packages?
You may install a new package on your local computer, but you shall not do it on a cluster. The IFB core cluster you are working on today is shared and highly valuable; no one can install anything besides the official maintainers.
The following lines are written for instructional purposes and should not be run on the IFB core cluster.
There are different functions to install a package:
- classic
This will raise a prompt asking simple questions: where to download from (choose somewhere in France), and whether to update other packages or not.
Do not be afraid of the large amount of text printed in the console, and let R do the trick.
Alternatively, you can click Tools -> Install Packages in RStudio.
You can list installed packages with installed.packages(), and find packages that can be updated with old.packages(). These packages can be updated with update.packages().
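A sketch of these functions. Listing installed packages is harmless; the lines that modify your library are left commented out, following the warning above. ggplot2 is only an example package name.

```r
# List the installed packages: a matrix with one row per package
installed <- installed.packages()
head(rownames(installed))
"stats" %in% rownames(installed)   # base packages are listed too

# The lines below modify your package library: run them on your own
# computer only, never on the IFB core cluster
# install.packages("ggplot2")    # install a package from CRAN
# old.packages()                 # packages with a newer version available
# update.packages(ask = FALSE)   # update them without asking each time
```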
While the function install.packages() searches packages in the main R package repository (CRAN), many bioinformatics packages are available in other package repositories. Just like the App Store and Google Play do not offer the same mobile applications, R has multiple sources for its packages. You need to know one of them, and one only: Bioconductor.
- Bioconductor
- devtools
- remotes
Nowadays, packages usually depend on other packages (their dependencies), which are installed along with them.
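A sketch for both points. The installation lines are commented out (own computer only); DESeq2 is just an example Bioconductor package and owner/repo is a placeholder, not a real repository. The first part lists the dependencies of the base package stats with tools::package_dependencies(), which needs no internet access when given the local database from installed.packages().

```r
# Dependencies (Depends, Imports, LinkingTo) of an installed package,
# read from the local package database
deps <- tools::package_dependencies("stats", db = installed.packages())
print(deps)

# Installing from other repositories (own computer only, not the cluster):
# install.packages("BiocManager")          # Bioconductor installer, from CRAN
# BiocManager::install("DESeq2")           # example Bioconductor package
# remotes::install_github("owner/repo")    # placeholder GitHub repository
# devtools::install_github("owner/repo")   # same, through devtools
```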
4.1.4 A few useful packages for general use
4.1.5 … and for single-cell/single-nucleus/spatial transcriptomics
4.2 Types
As in any language, R uses different types for different concepts:
- numbers (integer and real)
- character
- boolean (TRUE or FALSE)
- date
- factor
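class() tells you the type of a value. A minimal sketch with one illustrative value per type listed above:

```r
class(3L)                      # "integer"
class(3.14)                    # "numeric" (real number)
class("three")                 # "character"
class(TRUE)                    # "logical" (boolean)
class(as.Date("2025-11-15"))   # "Date"
class(factor("treated"))       # "factor"
```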
4.2.1 Numbers
With the code above, the number 3 is stored in a variable called “val1”. You can do this in R with anything. Literally anything. Whole files, pipelines, images, anything.
Maths in R works the same as your regular calculator:
[1] 6
[1] 6
[1] 2
[1] 12
[1] 3
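A sketch of the basic operators. val1 <- 3 comes from the text; the operations themselves are illustrative and not necessarily those that produced the outputs above.

```r
val1 <- 3

val1 + 2    # addition: 5
val1 - 1    # subtraction: 2
val1 * 4    # multiplication: 12
val1 / 2    # division: 1.5
val1 ^ 2    # power: 9
10 %% 3     # remainder of the integer division: 1
10 %/% 3    # integer division: 3
```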
4.2.2 Characters
Characters are delimited with quotes, either double " or single ' :
val2 <- "4"
val5 <- '5'
# The example below is a very good example of
# how to never ever name a variable.
シ <- "happy"
Mathematics does not work with characters at all… Try the following:
You can try to convert characters into numbers with the function as.numeric():
[1] 5
[1] 5
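A sketch of both behaviours, reusing val2 and val5 from the chunk above. tryCatch() is only used here to capture the error message instead of stopping the script; "five" is an illustrative non-numeric string.

```r
val2 <- "4"
val5 <- '5'

# Adding two character strings fails with:
# "non-numeric argument to binary operator"
error_message <- tryCatch(val2 + val5, error = function(e) conditionMessage(e))
print(error_message)

# Converting them first works
as.numeric(val2) + as.numeric(val5)

# Text that does not look like a number becomes NA (with a warning)
suppressWarnings(as.numeric("five"))
```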
A function is an R command followed by parentheses ( and ). Between these parentheses, we enter arguments. Use the Help pane to get information about the list of arguments expected and/or understood by a given function.
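For example, with the base function round() (chosen here for illustration), arguments can be named explicitly, and the arguments you omit take their default value:

```r
# round() takes the number to round (x) and the number of decimals (digits)
round(x = 3.14159, digits = 2)   # 3.14

# digits is omitted: its default value (0) is used
round(x = 3.14159)               # 3

# args() lists the expected arguments and their defaults
args(round)
```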
As said previously, you can store any of the previously typed commands in a variable:
[1] 2
Please! Please! Give your variables names that humans can understand. I don’t want to see any of you naming their variables a, b or my_var.
4.2.3 Boolean
Aside from characters and numbers, there is another very important type in R (and in computer science in general): booleans. There are two boolean values: TRUE and FALSE.
[1] FALSE
[1] FALSE
[1] TRUE
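A sketch of where booleans come from (comparisons, is.* tests) and how they combine; the values are illustrative:

```r
3 > 5              # FALSE
"ACGT" == "ACGA"   # FALSE
is.numeric(3)      # TRUE

# Combining booleans: & (and), | (or), ! (not)
TRUE & FALSE       # FALSE
TRUE | FALSE       # TRUE
!TRUE              # FALSE
```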
4.2.4 Factor
Factors are similar to vectors but… they are categorical vectors: each element belongs to one of a fixed set of categories, called levels.
factor_element <- factor(x = c("1","2","4","4","5","6","2","3"),
levels = c("1","2","3","4","5","6"))
print(factor_element)
[1] 1 2 4 4 5 6 2 3
Levels: 1 2 3 4 5 6
table(factor_element)
factor_element
1 2 3 4 5 6
1 2 1 2 1 1
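A sketch of what levels add compared with a plain vector, with made-up condition names: levels define the allowed categories, even when some are absent from the data.

```r
condition <- factor(
  x = c("control", "treated", "treated"),
  levels = c("control", "treated", "untreated")
)
levels(condition)
nlevels(condition)   # 3
table(condition)     # "untreated" is counted as 0

# A value that is not among the levels becomes NA
factor(c("control", "unknown"), levels = c("control", "treated"))
```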
4.3 Data Structure
To manipulate information, we store data in structures; the right structure depends on the type, or combination of types, of the data:
- vector
- matrix
- array
- list
- data.frame
4.3.1 Vector
You can make vectors in R. Don’t panic, there will be no maths in this presentation.
In R, vectors are created with the function c:
[1] "1" "2" "3" "4" "10" "20"
[1] TRUE
One can select an element of the vector with square brackets [ and ]:
[1] "1"
One can select multiple consecutive elements of a vector with the : operator:
[1] "2" "3" "4"
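A sketch of these selections on the same values, stored as numbers here rather than character strings; numbers is an illustrative name.

```r
numbers <- c(1, 2, 3, 4, 10, 20)
is.vector(numbers)
length(numbers)   # 6

numbers[1]        # first element: 1
numbers[2:4]      # elements 2 to 4; 2:4 is itself the vector c(2, 3, 4)
```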
So, a vector is simply data in one dimension.
Questions
Question 1: Is there a difference between these two vectors ?
Question 2: Can I include both text and numbers in a vector ?
4.3.2 Matrix/Array
A matrix is a table in two dimensions: rows and columns.
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
To access a single element, give its row and its column:
[1] 8
An array is a table in three or more dimensions.
Basically, in 3D, you have a cube; in nD, you have a hypercube.
The advantage of matrices and arrays is that you can apply a transformation to all their elements at once, because these data structures accept only one type. For example, with a log transformation:
[,1] [,2] [,3] [,4]
[1,] 0.0000000 1.386294 1.945910 2.302585
[2,] 0.6931472 1.609438 2.079442 2.397895
[3,] 1.0986123 1.791759 2.197225 2.484907
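A sketch that rebuilds a matrix with the same shape and content as the output above, extracts one element, applies log() to all elements at once, and builds a small 3D array. The variable names are illustrative.

```r
# A 3 x 4 matrix, filled column by column
numbers_matrix <- matrix(1:12, nrow = 3, ncol = 4)
dim(numbers_matrix)

numbers_matrix[2, 3]   # row 2, column 3: 8
log(numbers_matrix)    # applied to every element at once

# An array adds a third dimension: here 2 slices of 3 x 4
numbers_array <- array(1:24, dim = c(3, 4, 2))
numbers_array[2, 3, 1]  # same position, in the first slice: 8
```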
4.3.3 Data.frame
In R, tables are created with the function data.frame:
c.1..3. c.2..4.
1 1 2
2 3 4
You can rename columns and rows with the functions colnames() and rownames() respectively.
colnames(dataframe_numbers) <- c("Col_1_3", "Col_2_4")
rownames(dataframe_numbers) <- c("Row_1_2", "Row_3_4")
print(dataframe_numbers)
Col_1_3 Col_2_4
Row_1_2 1 2
Row_3_4 3 4
You can access a row or a column of the data frame using square brackets [ and ], with the following syntax: [row, column]. Use either the name of the row/column or its position.
Col_1_3 Col_2_4
Row_1_2 1 2
Col_1_3 Col_2_4
Row_1_2 1 2
[1] 1 3
[1] 1 3
[1] 1
[1] 1 3
If you like maths, you will remember the order [row, column]. If you’re not familiar with it, then you will do like 99% of all software engineers: you will write [column, row], and you will get an error (or, worse, a wrong value without any error). Trust me. 99%. Remember, an error is never a problem in computing.
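A sketch of these access patterns on the dataframe_numbers table from above, rebuilt here in one step with the row.names argument of data.frame(). The last two lines show why the order matters: swapping row and column silently returns another cell when both positions exist.

```r
dataframe_numbers <- data.frame(
  Col_1_3 = c(1, 3),
  Col_2_4 = c(2, 4),
  row.names = c("Row_1_2", "Row_3_4")
)

dataframe_numbers["Row_1_2", ]    # a row, by name
dataframe_numbers[1, ]            # the same row, by position
dataframe_numbers[, "Col_1_3"]    # a column, by name: 1 3
dataframe_numbers$Col_1_3         # the same column, with $
dataframe_numbers[1, "Col_1_3"]   # one cell: 1

dataframe_numbers[2, 1]           # row 2, column 1: 3
dataframe_numbers[1, 2]           # row 1, column 2: 2
```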
4.3.4 List
In R, a list is a collection of other data structures, which can be of different types and sizes.
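A sketch with illustrative content: a list holding a vector, a matrix and a data frame, and the usual ways to access its elements ($, [[ ]] and [ ]).

```r
my_list <- list(
  gene_names = c("Gnai3", "Pbsn"),
  counts = matrix(1:4, nrow = 2),
  metadata = data.frame(sample = c("sample_1", "sample_2"))
)

length(my_list)       # 3 elements of different structures
my_list$gene_names    # an element, by name
my_list[["counts"]]   # the same idea, with double square brackets
my_list[1]            # single brackets return a smaller list
```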
4.3.5 Question
Question: Who is what?
4.3.5.1 Read a table as a data frame
Exercise: Use the Help pane to find out how to use the function read.csv.
You can find example_table.csv in /shared/projects/2538_eb3i_n1_2025/atelier_scrnaseq/TD/
Copy this file into your project directory, then use the function read.csv to open ./example_table.csv, knowing that:
- this table has a header (TRUE).
- this table has row names in the column called “ensembl_gene_id”.
Leave all other parameters at their default values.
Save the opened table in a variable called example_table.
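To see how header and row.names behave without giving away the exercise, here is a sketch on a tiny hypothetical file written to a temporary location; its content and the names mini_file and mini_table are made up for the illustration.

```r
# Write a tiny CSV file (hypothetical content) to a temporary location
mini_file <- tempfile(fileext = ".csv")
writeLines(c(
  "gene_id,gene_name,sample_1",
  "G1,Gnai3,22",
  "G2,Pbsn,0"
), mini_file)

# header = TRUE: the first line holds the column names
# row.names = "gene_id": this column becomes the row names
mini_table <- read.csv(file = mini_file, header = TRUE, row.names = "gene_id")
print(mini_table)
```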
Now let us explore this dataset.
We can click on it in the Environment pane:
Be careful, large tables may freeze your session.
Alternatively, we can use the function head, which prints the first
lines of a table:
ensembl_gene_id gene_name sample_1 sample_2 sample_3 sample_4 sample_5
1 ENSMUSG00000000001 Gnai3 22 33 24 16 24
2 ENSMUSG00000000003 Pbsn 0 0 0 0 0
3 ENSMUSG00000000028 Cdc45 5 11 8 11 9
4 ENSMUSG00000000031 H19 5483 5912 6876 4215 3200
5 ENSMUSG00000000037 Scml2 0 1 0 2 1
6 ENSMUSG00000000049 Apoh 3 3 3 2 0
sample_6 sample_7 sample_8 sample_9 sample_10 sample_11
1 33 39 28 17 70 15
2 0 0 0 0 0 0
3 5 7 7 7 11 3
4 3187 3076 1982 1236 3282 3159
5 1 0 0 0 0 0
6 1 3 1 0 1 4
# A tibble: 6 × 13
ensembl_gene_id gene_name sample_1 sample_2 sample_3 sample_4 sample_5
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 ENSMUSG00000000001 Gnai3 22 33 24 16 24
2 ENSMUSG00000000003 Pbsn 0 0 0 0 0
3 ENSMUSG00000000028 Cdc45 5 11 8 11 9
4 ENSMUSG00000000031 H19 5483 5912 6876 4215 3200
5 ENSMUSG00000000037 Scml2 0 1 0 2 1
6 ENSMUSG00000000049 Apoh 3 3 3 2 0
# ℹ 6 more variables: sample_6 <dbl>, sample_7 <chr>, sample_8 <chr>,
# sample_9 <dbl>, sample_10 <chr>, sample_11 <chr>
The function summary describes the dataset per sample:
ensembl_gene_id gene_name sample_1 sample_2
Length:28 Length:28 Length:28 Min. : 0.00
Class :character Class :character Class :character 1st Qu.: 0.75
Mode :character Mode :character Mode :character Median : 7.50
Mean : 351.89
3rd Qu.: 38.50
Max. :5912.00
sample_3 sample_4 sample_5 sample_6
Length:28 Length:28 Min. : 0.00 Min. : 0.00
Class :character Class :character 1st Qu.: 1.75 1st Qu.: 1.00
Mode :character Mode :character Median : 11.50 Median : 7.50
Mean : 285.75 Mean : 269.43
3rd Qu.: 49.25 3rd Qu.: 49.25
Max. :3660.00 Max. :3187.00
sample_7 sample_8 sample_9 sample_10
Length:28 Length:28 Min. : 0.0 Length:28
Class :character Class :character 1st Qu.: 0.0 Class :character
Mode :character Mode :character Median : 10.5 Mode :character
Mean : 195.5
3rd Qu.: 47.0
Max. :3278.0
sample_11
Length:28
Class :character
Mode :character
Have a look at the summary of the dataset per gene, using the function t to transpose it:
[,1] [,2] [,3]
ensembl_gene_id "ENSMUSG00000000001" "ENSMUSG00000000003" "ENSMUSG00000000028"
gene_name "Gnai3" "Pbsn" "Cdc45"
sample_1 "22" "0" "5"
sample_2 " 33" " 0" " 11"
sample_3 "24" "0" "8"
sample_4 "16" "0" "11"
[,4] [,5] [,6]
ensembl_gene_id "ENSMUSG00000000031" "ENSMUSG00000000037" "ENSMUSG00000000049"
gene_name "H19" "Scml2" "Apoh"
sample_1 "5483" "0" "3"
sample_2 "5912" " 1" " 3"
sample_3 "6876" "0" "3"
sample_4 "4215" "2" "2"
[,7] [,8] [,9]
ensembl_gene_id "ENSMUSG00000000056" "ENSMUSG00000000058" "ENSMUSG00000000078"
gene_name "Narf" "Cav2" "Klf6"
sample_1 "18" "54" "184"
sample_2 " 24" " 71" " 169"
sample_3 "33" "63" "322"
sample_4 "45" "66" "223"
[,10] [,11] [,12]
ensembl_gene_id "ENSMUSG00000000085" "ENSMUSG00000000088" "ENSMUSG00000000093"
gene_name "Scmh1" "Cox5a" "Tbx2"
sample_1 "154" "2341" "4"
sample_2 " 199" "2788" " 4"
sample_3 "153" "2597" "4"
sample_4 "154" "2735" "3"
[,13] [,14] [,15]
ensembl_gene_id "ENSMUSG00000000094" "ENSMUSG00000000103" "ENSMUSG00000000120"
gene_name "Tbx4" "Zfy2" "Ngfr"
sample_1 "6" "0" "7"
sample_2 " 4" " 0" " 3"
sample_3 "11" "0" "7"
sample_4 "5" "0" "3"
[,16] [,17] [,18]
ensembl_gene_id "ENSMUSG00000000125" "ENSMUSG00000000126" "ENSMUSG00000000127"
gene_name "Wnt3" "Wnt9a" "Fer"
sample_1 "0" "30" "27"
sample_2 " 0" " 35" " 18"
sample_3 "0" "55" "59"
sample_4 "0" "44" "30.999"
[,19] [,20] [,21]
ensembl_gene_id "ENSMUSG00000000131" "ENSMUSG00000000134" "ENSMUSG00000000142"
gene_name "Xpo6" "Tfe3" "Axin2"
sample_1 "33.001" "34" "6"
sample_2 " 49" " 29" " 3"
sample_3 "60.001" "21" "11"
sample_4 "45" "17" "10"
[,22] [,23] [,24]
ensembl_gene_id "ENSMUSG00000000148" "ENSMUSG00000000149" "ENSMUSG00000000154"
gene_name "Brat1" "Gna12" "Slc22a18"
sample_1 "4" "26" "2"
sample_2 " 3" " 34" " 0"
sample_3 "3" "21" "1"
sample_4 "6" "19" "0"
[,25] [,26] [,27]
ensembl_gene_id "ENSMUSG00000000157" "ENSMUSG00000000159" "ENSMUSG00000000167"
gene_name "Itgb2l" "Igsf5" "Pih1d2"
sample_1 "0" "0" "0"
sample_2 " 0" " 0" " 0"
sample_3 "0" "0" "1"
sample_4 "0" "0" "0"
[,28]
ensembl_gene_id "ENSMUSG00000000168"
gene_name "Dlat"
sample_1 "360"
sample_2 " 460"
sample_3 "395"
sample_4 "457"
V1 V2 V3 V4
Length:13 Length:13 Length:13 Length:13
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
V5 V6 V7 V8
Length:13 Length:13 Length:13 Length:13
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
V9 V10 V11 V12
Length:13 Length:13 Length:13 Length:13
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
V13 V14 V15 V16
Length:13 Length:13 Length:13 Length:13
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
V17 V18 V19 V20
Length:13 Length:13 Length:13 Length:13
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
V21 V22 V23 V24
Length:13 Length:13 Length:13 Length:13
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
V25 V26 V27 V28
Length:13 Length:13 Length:13 Length:13
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
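A sketch on a small data frame built from the first rows shown above (genes is an illustrative name). It also illustrates why the per-gene summary reports only character columns: t() first converts the data frame into a matrix, which holds a single type, so the gene_name column turns every value into a character string.

```r
genes <- data.frame(
  gene_name = c("Gnai3", "Pbsn", "Cdc45"),
  sample_1 = c(22, 0, 5),
  sample_2 = c(33, 0, 11)
)

head(genes, n = 2)   # first 2 rows only
summary(genes)       # statistics per column (per sample)

# Transposing: the result is a matrix of character strings
transposed <- t(genes)
typeof(transposed)   # "character"
```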
4.4 Object
In R, everything is an object!
An object is a container that belongs to a class, has properties such as its dimensions or its data structure, and contains elements, each having a type and a value.
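A sketch of how to inspect these properties, on an illustrative count matrix (class() returns "matrix" "array" in R 4.0 and later):

```r
counts <- matrix(c(22, 0, 5, 33, 0, 11), nrow = 3)

class(counts)    # its class: "matrix" "array"
typeof(counts)   # the type of its elements: "double"
dim(counts)      # its dimensions: 3 rows, 2 columns
str(counts)      # a compact summary of all of the above
```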
4.5 Functions
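Functions are blocks of code created to perform specific tasks.
We have already seen several functions:
as.numeric() - print() - is.vector() - data.frame() - head() - summary() - t()
Some functions are generic: they run a different method depending on the class of the object they receive. For example, print() on a data frame runs the method print.data.frame(), whose source code is displayed by getS3method("print", "data.frame"):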
#getS3method("print", "data.frame")
print.data.frame = function (x, ..., digits = NULL, quote = FALSE, right = TRUE,
row.names = TRUE, max = NULL)
{
n <- length(row.names(x))
if (length(x) == 0L) {
cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
"data frame with 0 columns and %d rows"), n), "\n",
sep = "")
}
else if (n == 0L) {
print.default(names(x), quote = FALSE)
cat(gettext("<0 rows> (or 0-length row.names)\n"))
}
else {
if (is.null(max))
max <- getOption("max.print", 99999L)
if (!is.finite(max))
stop("invalid 'max' / getOption(\"max.print\"): ",
max)
omit <- (n0 <- max%/%length(x)) < n
m <- as.matrix(format.data.frame(if (omit)
x[seq_len(n0), , drop = FALSE]
else x, digits = digits, na.encode = FALSE))
if (!isTRUE(row.names))
dimnames(m)[[1L]] <- if (isFALSE(row.names))
rep.int("", if (omit)
n0
else n)
else row.names
print(m, ..., quote = quote, right = right, max = max)
if (omit)
cat(" [ reached 'max' / getOption(\"max.print\") -- omitted",
n - n0, "rows ]\n")
}
invisible(x)
}
We can also create our own functions:
calculate_toto <- function(x, y) {
  # Return a message when both arguments are 0, otherwise their sum
  if (x == 0 && y == x) {
    return("tete a toto !")
  } else {
    return(x + y)
  }
}
calculate_toto(x = 0, y = 0)
[1] "tete a toto !"
[1] 1
[1] 1
[1] 2
R is your best friend, but… sometimes, to save you time, it lets you omit argument names: arguments are then matched by their position in the call. BUT THIS IS NOT RECOMMENDED.
[1] "tete a toto !"
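A sketch comparing named and positional calls on calculate_toto(), redefined here so the block runs on its own (with &&, the usual operator for a single TRUE/FALSE inside if()):

```r
calculate_toto <- function(x, y) {
  if (x == 0 && y == x) {
    return("tete a toto !")
  } else {
    return(x + y)
  }
}

calculate_toto(x = 1, y = 0)   # named arguments: explicit, 1
calculate_toto(y = 0, x = 1)   # named arguments in any order: 1
calculate_toto(1, 0)           # positional: works, but x and y are implicit
calculate_toto(0, 0)           # positional: "tete a toto !"
```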
4.6 Control Structures
A script is just an execution flow.
We can modify this flow with:
- conditional structures
- loops
- internal flow-control structures
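A sketch of these three kinds of structures on made-up count values, assuming that "internal flow-control structures" refers to statements such as next and break, which change the flow inside a loop:

```r
counts <- c(22, 0, 5, 5483, 0)

# Conditional structure: if / else
if (max(counts) > 1000) {
  message("At least one highly expressed gene")
} else {
  message("No highly expressed gene")
}

# Loop: for, with next (skip) and break (stop)
non_zero <- c()
for (value in counts) {
  if (value == 0) next      # skip zeros
  if (value > 1000) break   # stop at the first very high value
  non_zero <- c(non_zero, value)
}
print(non_zero)   # 22 5
```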
4.7 Special functions
Many functions are built to perform specific and optimized tasks.
- Vectorized functions
[1] "Pass" "Fail" "Pass" "Fail" "Pass"
[1] "Pass"
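A sketch consistent with the outputs above, with made-up scores and a hypothetical pass threshold of 10 (the original chunk is not shown): ifelse() handles the whole vector at once, while if() expects a single TRUE or FALSE.

```r
scores <- c(12, 8, 15, 5, 10)   # illustrative values

# ifelse() tests every element at once and returns a vector
ifelse(scores >= 10, "Pass", "Fail")

# if() works on one value at a time
if (scores[1] >= 10) "Pass" else "Fail"
```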
- Iterative functions of the apply family
my_list <- list(a = 1:5, b = 6:10, c = 11:15)
result_lapply <- lapply(my_list, mean)
print(result_lapply)
$a
[1] 3
$b
[1] 8
$c
[1] 13
a b c
3 8 13
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[1] 2 5 8
[1] 4 5 6
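A sketch consistent with the outputs above (not necessarily the original commands): sapply() simplifies the result of lapply() into a named vector, and apply() works on the rows (MARGIN = 1) or the columns (MARGIN = 2) of a matrix.

```r
my_list <- list(a = 1:5, b = 6:10, c = 11:15)

# Same as lapply(), but the result is simplified to a named vector
sapply(my_list, mean)

my_matrix <- matrix(1:9, nrow = 3, byrow = TRUE)
apply(my_matrix, 1, mean)   # row means: 2 5 8
apply(my_matrix, 2, mean)   # column means: 4 5 6
```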
- Functional programming functions
numbers <- 1:10
# Keep only the even numbers
even_numbers <- Filter(function(x) x %% 2 == 0, numbers)
print(even_numbers)
[1] 2 4 6 8 10
list1 <- list(1, 2, 3)
list2 <- list(10, 20, 30)
# Add the corresponding elements
result <- Map(function(x, y) x + y, list1, list2)
print(result)
[[1]]
[1] 11
[[2]]
[1] 22
[[3]]
[1] 33
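A sketch going one step further, with illustrative values: since Map() always returns a list, unlist() can flatten it into a vector, and Filter() works on any vector, here gene names taken from the example table (startsWith() is a base R function).

```r
# Map() always returns a list; unlist() flattens it into a plain vector
sums <- unlist(Map(function(x, y) x + y, list(1, 2, 3), list(10, 20, 30)))
print(sums)   # 11 22 33

# Filter() also works on character vectors: keep the Wnt genes
gene_names <- c("Wnt3", "Gnai3", "Wnt9a", "Cdc45")
wnt_genes <- Filter(function(gene) startsWith(gene, "Wnt"), gene_names)
print(wnt_genes)
```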
4.8 Saves
R has two native formats: .rds and .RData
4.8.1 Save
While working on your projects and learning this week, you will process datasets in R. The results of these analyses will be stored in variables. This means that when you close RStudio, some of this work might be lost.
We already saw the function save.image() to save a complete copy of your working environment.
However, you can save only the content of a given variable. This is useful when you want to save the result of a function (or a pipeline), but not the whole 5 hours of work you’ve spent on how-to-make-that-pipeline-work-correctly.
The format is called RDS, for R Data Serialization. This is done with the function saveRDS():
4.8.2 Load
You can also load an RDS file into a variable. This is useful when you receive an RDS file from a coworker, or when you’d like to resume your work from a saved point. This is done with the function readRDS():
# A tibble: 6 × 13
ensembl_gene_id gene_name sample_1 sample_2 sample_3 sample_4 sample_5
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 ENSMUSG00000000001 Gnai3 22 33 24 16 24
2 ENSMUSG00000000003 Pbsn 0 0 0 0 0
3 ENSMUSG00000000028 Cdc45 5 11 8 11 9
4 ENSMUSG00000000031 H19 5483 5912 6876 4215 3200
5 ENSMUSG00000000037 Scml2 0 1 0 2 1
6 ENSMUSG00000000049 Apoh 3 3 3 2 0
# ℹ 6 more variables: sample_6 <dbl>, sample_7 <chr>, sample_8 <chr>,
# sample_9 <dbl>, sample_10 <chr>, sample_11 <chr>
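A sketch of a full round trip, with a small data frame made up from the table above and files written under tempdir() for illustration. The last lines show save(), which stores several named objects in a single .RData file (the other format mentioned above), to be restored later with load().

```r
example_data <- data.frame(
  gene_name = c("Gnai3", "Pbsn"),
  sample_1 = c(22, 0)
)

# Save one object to an .rds file
rds_file <- file.path(tempdir(), "example_data.rds")
saveRDS(example_data, file = rds_file)

# Read it back, under any variable name you like
restored_data <- readRDS(rds_file)
identical(example_data, restored_data)

# save() stores several named objects in one .RData file
rdata_file <- file.path(tempdir(), "two_objects.RData")
gene_ids <- c("ENSMUSG00000000001", "ENSMUSG00000000003")
save(example_data, gene_ids, file = rdata_file)
```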
But there are other solutions:
- fst, qs, feather, parquet
- csv, tsv, xlsx, yaml, json
- SQLite, DuckDB, HDF5, MonetDB Lite, Feather IPC/Arrow IPC streams, gz, bz2, xz, zip
- binary
4.10 Writing code in RStudio
Ok, it’s time to write R code. But how do you do that? Like writing a book!
Best practices:
- https://style.tidyverse.org/syntax.html
- Comment
- Script structure
###############################################################################
# Script name : analysis_pipeline.R
# Author : Your Name
# Date : 2025-11-15
# Purpose : Example of clean R script structure
###############################################################################
########################
# 0. Load packages ----
########################
########################
# 1. Define parameters ----
########################
########################
# 2. Define custom functions ----
########################
########################
# 3. Load data ----
########################
########################
# 4. Data cleaning ----
########################
########################
# 5. Analysis ----
########################
########################
# 6. Visualizations ----
########################
########################
# 7. Save outputs ----
########################
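A sketch of a few rules from the tidyverse style guide linked above (descriptive snake_case names, spaces around <- and after commas, one statement per line), with illustrative values:

```r
# Good: descriptive snake_case names, spaces around <- and after commas
gene_counts <- c(22, 0, 5)
mean_count <- mean(gene_counts)

# Avoid: valid R, but hard to read, search and maintain
a<-c(22,0,5);b<-mean(a)
```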