Connect to RStudio

RStudio is a graphical interface designed for the R programming language. Let us connect to RStudio, then explore its content.

IFB web RStudio

Click → https://rstudio.cluster.france-bioinformatique.fr

rstudio_ifb

Enter your user name and password, then sign in!

This is an RStudio web server hosted by the French Institute of Bioinformatics (IFB). You cannot install anything there; your work is stored there (in France) and must be declared to the IFB as a research project.

IFB Jupyter lab

Click → https://jupyterhub.cluster.france-bioinformatique.fr/hub/

Enter your user name and password, then sign in!

jupyter_ifb

In JupyterLab, you can do more than R, but we won’t cover that in this session. Just click on the RStudio button and enjoy. This is an RStudio web server hosted by the French Institute of Bioinformatics. You cannot install anything there; your work is stored there (in France) and must be declared to the IFB as a research project.

Use your own RStudio server

Open your favorite terminal, and simply write:

rstudio

And that’s all. This is your local RStudio. No one can easily access it, so it will be more difficult to share your work. You also rely on your (small) computer rather than a big computing cluster. However, you can install anything you want.

RStudio

First sight

rstudio

RStudio displays 4 large panes. Their positions may be changed based on your preferences. Here are mine:

Pane names and positions
        left           right
upper   Script pane    Environment/History pane
lower   Console pane   Help/Files

Info: your four panes may be blank while two of mine are filled with text. We’ll come back to that later.

The console pane (lower left)

console_pane_image

This is a simple R console. Open your bash terminal, enter the following command: R, and you will get the same console.

Warning: Here, we are in an RStudio session powered by the IFB. Your local RStudio might differ: the version of R, the list of available packages, etc. On your local machine, the RStudio console will match the R available in your terminal.

Let’s try to enter the command print():

print("Hello World")
[1] "Hello World"

We just used a function called print. This function tries to print on screen everything provided between the parentheses ( and ). In this particular case, we gave it the character string "Hello World", and the function print successfully printed it on screen!

Now click on Session -> Save Workspace As and save the current workspace. What happened in the R console pane? You saw it! A command was written automatically. For me, it is:

save.image("./SingleCell.RData")

When you need help with R, whether on a function error, a script result or anything similar, please save your workspace and send it to your favorite R developer. The file contains every object loaded in your session.

Info: The console offers syntax highlighting, good autocompletion and parameter suggestions. If I ever see anyone typing a complete command without pressing the Tab key, I’ll have to steal their dessert. And I’m always hungry enough for desserts.

The environment/history pane (upper right)

envhit_pane

Environment

This pane has three tabs: Environment, History and Connections.

Environment lists every single variable, object or dataset loaded in R. This includes only what you created yourself and does not include system environment variables. Example: in your console pane, enter the following command:

zero <- 0;  # May also be written zero = 0

What happened in the Environment pane? You’re right: a variable is now available!

env_my_var

When a more complex object is declared in your work space, then some general information may be available. Example:

small_table <- data.frame("col_a"=c(1, 3), "col_b"=c(2, 4));

You can see the data frame. Click on it to preview the data it contains, then click on the light-blue arrow to get a deeper insight into its content:

df_expanded_env

Now click on Session -> Clear Workspace and watch your work disappear. This action cannot be undone. While it is useful to clear a workspace from time to time in order to avoid name collisions, it is better to save your workspace first.

History

goto_history

This tab is quite important: while you test and search in the console, your history keeps track of each command line you entered. This will definitely help you to build your scripts, to pass your command lines to your coworkers, and to revert possible unfortunate errors.

Each history is tied to a session. You may see many commands in your history, some of which are not even listed in your console. RStudio records every command there, even the ones that were hidden for the sake of your eyes (knitting commands, display commands, help commands, etc.).

Your history has a limit. This limit is set by an environment variable called R_HISTSIZE (standing for: R History Size). It may be checked with the function Sys.getenv() and set with the function Sys.setenv():

Sys.getenv("R_HISTSIZE")
Sys.setenv(R_HISTSIZE = new_number)
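
As a minimal sketch of the two calls above: Sys.getenv() always returns text, so the value has to be converted before doing any maths with it. The value 1000 and the fallback text "not set" are arbitrary choices for illustration, not IFB recommendations.

```r
# Read the current limit; `unset` is returned when the variable is not defined.
current_limit <- Sys.getenv("R_HISTSIZE", unset = "not set")
print(current_limit)

# Set a new limit (1000 is an arbitrary example value).
Sys.setenv(R_HISTSIZE = "1000")

# Environment variables are stored as text, so convert before doing maths.
new_limit <- as.numeric(Sys.getenv("R_HISTSIZE"))
print(new_limit)
```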

The help/plot/file pane

help_file_pane

Help

This is maybe the most important pane of RStudio. THIS is the difference between RStudio and another code editor. Search for any function here and not on the internet. This pane shows you the available help for YOUR version of R and YOUR version of a given package.

Other versions may have different default parameters and different interfaces. Before you copy or type a command found on the internet, make sure it is not harmful to your computer.

Never ever copy code from the internet right to your terminal. Why? Example: https://www.wizer-training.com/blog/copy-paste

File

Just like in any file explorer, we can move across directories, create folders and files, delete them, etc.

create_working_dir_rstudio

Or use the function dir.create():

dir.create("Intro_R")

You should change your working directory right now:

setwd

Or use setwd():

setwd("Intro_R")

You can send data from your computer to a distant RStudio (e.g. on the IFB):

upload

You can delete files:

download

or use the function file.remove():

file.remove("example.txt")
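
The file functions above can be tried safely in a throwaway folder. The sketch below assumes a hypothetical folder name (Intro_R_sandbox) inside R's temporary directory, so nothing in your project is touched.

```r
# Work inside a temporary directory so nothing important is touched.
sandbox <- file.path(tempdir(), "Intro_R_sandbox")  # hypothetical folder name
dir.create(sandbox, showWarnings = FALSE)

# Create an empty file, check that it exists, then delete it.
example_file <- file.path(sandbox, "example.txt")
file.create(example_file)
exists_before <- file.exists(example_file)
file.remove(example_file)
exists_after <- file.exists(example_file)

print(exists_before)
print(exists_after)
print(list.files(sandbox))
```

file.exists() is a cheap way to check a path before reading or deleting a file.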

The script pane (upper left)

script_pane

This is where you write your R scripts. It also accepts other languages (e.g. bash, python, …), but RStudio shines for its R integration.

Please, please! Write your commands in the Script pane, then execute them by hitting CTRL + Enter. This is very much like your lab notebook: the History tab only keeps a limited number of commands in memory, while the script keeps your commands in a file on your disk. You may share it, edit it, comment it, etc.

script_pane

TLDR – Too Long Didn’t Read

Graphical interface overview:

  1. Write command lines in the Script pane (upper left).
  2. Execute command lines by hitting CTRL + Enter from the Script pane and see them in the console.
  3. Have a look at the environment and history in the upper right pane.
  4. Search for help in the lower right pane.

R – Basics

Variables and types

Numbers

Remember, a variable is the name given to a value stored in memory. For example, 3, the number three, exists in R. You can store it in a variable with the arrow operator <-:

three <- 3

With the code above, the number 3 is stored in a variable called “three”. You can do this in R with anything. Literally anything. Whole files, pipelines, images, anything.

Maths in R works the same as your regular calculator:

3 + three # Add
[1] 6
1 - 2 # Subtract
[1] -1
4 / 2 # Divide
[1] 2
3 * 4 # Multiply
[1] 12
7 %/% 2 # Floor division
[1] 3
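
A few more operators are worth knowing alongside the ones above. This is a small sketch; the variable names are only illustrative.

```r
remainder <- 7 %% 2          # remainder of the division: 1
power <- 2 ^ 3               # exponent: 8
grouped <- (1 + 2) * 3       # parentheses are evaluated first: 9
ungrouped <- 1 + 2 * 3       # multiplication comes before addition: 7

print(c(remainder, power, grouped, ungrouped))
```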

Characters

Characters are delimited with quotes, either double " or single ':

four <- "4"
five <- '5'

# The example below is a very good example of
# how to never ever name a variable.
シ <- "happy"

Mathematics does not work with characters at all … Try the following (both lines raise an error):

"4" + 1
four + 1

You can try to turn characters into numbers with the function as.numeric:

as.numeric("4") + 1
[1] 5
as.numeric(four) + 1
[1] 5
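
Conversion does not always succeed. The sketch below shows, under the assumption that the text is not a valid number (here the word "four"), what as.numeric returns, and how tryCatch can capture the error raised by "4" + 1 instead of stopping the script.

```r
# "4" + 1 stops with an error; tryCatch() captures its message instead.
error_message <- tryCatch(
  "4" + 1,
  error = function(e) conditionMessage(e)
)
print(error_message)

# as.numeric() cannot read the word "four": it returns NA and emits a warning.
not_a_number <- suppressWarnings(as.numeric("four"))
print(not_a_number)
print(is.na(not_a_number))
```

Checking for NA with is.na() after a conversion is a simple way to spot values that could not be read as numbers.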

A function is an R command that is followed by parentheses ( and ). Between these parentheses, we enter arguments. Use the Help pane to get information about the arguments expected and understood by a given function.

As said previously, you can store any of the previously typed commands in a variable:

five <- as.numeric("4") + 1
two <- 1 + (0.5 * 2)
print(five)
[1] 5
print(two)
[1] 2

Please! Please! Give your variables names that humans can understand. I don’t want to see any of you calling their variables “a”, “b”, “my_var”, …

Tricky Question:

I have two numbers: mysterious_number_7 and suspicious_number_7. When I apply the function print to them, it returns 7. They are both numeric. However, they are not equal … Why?

# Show the value of the variable mysterious_number_7
print(mysterious_number_7)
[1] 7
# Show the value of the number suspicious_number_7
print(suspicious_number_7)
[1] 7
# Check that mysterious_number_7 is a number
is.numeric(mysterious_number_7)
[1] TRUE
# Check that suspicious_number_7 is a number
is.numeric(suspicious_number_7)
[1] TRUE
# Check that values of mysterious_number_7 and suspicious_number_7 are equal
mysterious_number_7 == suspicious_number_7
[1] FALSE
# Check that values of mysterious_number_7 and suspicious_number_7 are identical
identical(mysterious_number_7, suspicious_number_7)
[1] FALSE

We will talk about the difference between equality and identity later.

Answer

This is due to the number of digits displayed by R. You are very likely to run into this issue in the future, like every (bio)informatician around the world.

mysterious_number_7 <- 7.0000001
suspicious_number_7 <- 7
print(mysterious_number_7)
[1] 7
print(suspicious_number_7)
[1] 7
mysterious_number_7 == suspicious_number_7
[1] FALSE
identical(mysterious_number_7, suspicious_number_7)
[1] FALSE

You can change the number of displayed digits with the function options(), for instance options(digits = 15). R accepts values from 1 to 22.
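
When two numbers differ only by a tiny amount, a tolerant comparison is often what you want. The sketch below uses all.equal() with a tolerance of 1e-6, an arbitrary value chosen for this illustration; pick one that makes sense for your data.

```r
mysterious_number_7 <- 7.0000001
suspicious_number_7 <- 7

# Ask for more significant digits for this call only.
print(mysterious_number_7, digits = 10)

# Exact comparison: the two values differ.
exactly_equal <- mysterious_number_7 == suspicious_number_7
print(exactly_equal)

# Tolerant comparison: equal within the chosen tolerance (arbitrary here).
nearly_equal <- isTRUE(
  all.equal(mysterious_number_7, suspicious_number_7, tolerance = 1e-6)
)
print(nearly_equal)
```

all.equal() returns a text description when the values differ, which is why it is wrapped in isTRUE().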

Boolean

Aside from characters and numerics, there is another very important type in R (and in computer science in general): booleans. There are two boolean values: TRUE and FALSE.

3 > 4
[1] FALSE
10 < 2
[1] FALSE
5 < 10
[1] TRUE
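
Booleans can be combined with logical operators. This is a minimal sketch with illustrative variable names:

```r
is_small <- 3 < 4            # TRUE
is_large <- 10 < 2           # FALSE

both <- is_small & is_large    # AND: FALSE
either <- is_small | is_large  # OR: TRUE
negated <- !is_small           # NOT: FALSE

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values.
how_many_true <- sum(c(TRUE, FALSE, TRUE))  # 2

print(c(both, either, negated))
print(how_many_true)
```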

Data structures

Vector

You can make vectors and tables in R. Don’t panic, there will be no maths in this presentation.

In R, vectors are created with the function c:

one2three <- c("1", "2", "3", "4", "10", "20")
print(one2three)
[1] "1"  "2"  "3"  "4"  "10" "20"
is.vector(one2three)
[1] TRUE

One can select an element of the vector with square brackets [ and ]. Positions start at 1:

one2three[1]
[1] "1"

One can select several consecutive elements of a vector with a range written start:end:

one2three[2:4]
[1] "2" "3" "4"
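
Square brackets accept more than a single position or a range. A short sketch of a few other selections on the same vector:

```r
one2three <- c("1", "2", "3", "4", "10", "20")

vector_length <- length(one2three)           # number of elements: 6
first_and_fifth <- one2three[c(1, 5)]        # "1" "10"
all_but_first <- one2three[-1]               # a negative position removes an element
last_element <- one2three[length(one2three)] # "20"

print(vector_length)
print(first_and_fifth)
print(all_but_first)
print(last_element)
```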

Questions

Question 1: Is there a difference between these two vectors ?

c_vector <- c("1", "2", "3", "3")
n_vector <- c( 1,   2,   3,   3 )
Answer

There is a difference indeed: c_vector contains characters, n_vector contains numbers.

print(c_vector)
[1] "1" "2" "3" "3"
print(n_vector)
[1] 1 2 3 3
print(is.numeric(c_vector))
[1] FALSE
print(is.numeric(n_vector))
[1] TRUE
identical(c_vector, n_vector)
[1] FALSE

You can always use the function identical to test equality strictly, types included.

You may have learned about the operator == for equality. It is not perfect, though. Look at our example:

c_vector == n_vector
[1] TRUE TRUE TRUE TRUE

The operator == converts both sides to a common type before comparing them, so it ignores the difference between types.

Another example, mixing numeric and boolean:

1 == TRUE
[1] TRUE
identical(1, TRUE)
[1] FALSE

In computer science, there is a reason why boolean and integers are mixed. We won’t cover this reason now. It’s out of our scope. Feel free to ask if you’re interested in history and maths.


Question 2: Can I include both text and numbers in a vector ?

mixed_vector <- c(1, "2", 3)
Answer

No. We cannot mix types in a vector. Either all of its content is made of numbers, or all of its content is made of characters.

Here, all our values have been turned into characters:

print(mixed_vector)
[1] "1" "2" "3"
print(is.numeric(mixed_vector))
[1] FALSE
print(is.character(mixed_vector))
[1] TRUE
print(all(is.numeric((mixed_vector))))
[1] FALSE
print(all(is.character((mixed_vector))))
[1] TRUE

Above, the function all returns TRUE if every value it receives is TRUE.
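
The conversion seen in this answer can be checked with the function class. The sketch below also shows a list, which is one base R structure that keeps each element's own type; lists are not covered further in this session.

```r
# Mixing types in c() converts everything to the most flexible type.
class_with_text <- class(c(1, "2", 3))      # "character"
class_with_boolean <- class(c(1, TRUE, 3))  # "numeric"

# A list keeps each element as it is.
mixed_list <- list(1, "2", 3)
element_classes <- sapply(mixed_list, class)  # "numeric" "character" "numeric"

print(class_with_text)
print(class_with_boolean)
print(element_classes)
```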


Question 3: How do you create a histogram from a vector?

Help

A simple way to visualize your data is to use a graph. The function hist may help you.

Answer
hist(c_vector)

Error in hist.default(c_vector) : ‘x’ must be numeric

Why is this command not working? The error says: “‘x’ must be numeric”. The function only accepts vectors made of numeric values.

hist(n_vector)

# worked perfectly !
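
If you only have the character vector, one possible fix is to convert it first. In this sketch, plot = FALSE is used so that hist returns the computed bins without drawing; remove that argument to get the figure.

```r
c_vector <- c("1", "2", "3", "3")

# Convert the characters to numbers first.
converted_vector <- as.numeric(c_vector)

# plot = FALSE returns the bins without drawing anything.
histogram_data <- hist(converted_vector, plot = FALSE)
print(histogram_data$breaks)
print(histogram_data$counts)
```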


Data Frame

In R, tables are created with the function data.frame:

one2three4 <- data.frame(c(1, 3), c(2, 4))
print(one2three4)
  c.1..3. c.2..4.
1       1       2
2       3       4

You can rename columns and rows with the functions colnames and rownames, respectively.

colnames(one2three4) <- c("Col_1_3", "Col_2_4")
rownames(one2three4) <- c("Row_1_2", "Row_3_4")
print(one2three4)
        Col_1_3 Col_2_4
Row_1_2       1       2
Row_3_4       3       4
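
The names can also be given when the table is created, which avoids the automatic column names seen earlier. A minimal sketch building the same table in one call:

```r
one2three4 <- data.frame(
  Col_1_3 = c(1, 3),
  Col_2_4 = c(2, 4),
  row.names = c("Row_1_2", "Row_3_4")
)

print(one2three4)
print(colnames(one2three4))
print(rownames(one2three4))
```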

You can access a column or a row of the data frame using square brackets [ and ]. Use the following syntax: [row, column]. Use either the name of the row/column or its position.

# Select a row by its name
print(one2three4["Row_1_2", ])
        Col_1_3 Col_2_4
Row_1_2       1       2
# Select a row by its index
print(one2three4[1, ])
        Col_1_3 Col_2_4
Row_1_2       1       2
 # Select a column by its name
print(one2three4[, "Col_1_3"])
[1] 1 3
 # Select a column by its index
print(one2three4[, 1])
[1] 1 3
 # Select a cell in the table
print(one2three4["Row_1_2", "Col_1_3"])
[1] 1
# Select the first two rows and the first column in the table
print(one2three4[1:2, 1]) 
[1] 1 3

If you like maths, you will remember the order [row, column]. If you’re not familiar with that, then you will do like 99% of all software engineers: you will write [column, row], and you will get an error. Trust me. 99%. Remember, an error is never a problem in informatics.
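
Here is a small sketch of what the swapped order looks like in practice, together with two other common ways to select a column. tryCatch is used only so that the example keeps running after the error.

```r
one2three4 <- data.frame(
  Col_1_3 = c(1, 3),
  Col_2_4 = c(2, 4),
  row.names = c("Row_1_2", "Row_3_4")
)

# The $ operator selects a column by name and returns a vector.
first_column <- one2three4$Col_1_3

# drop = FALSE keeps the result as a data frame instead of a vector.
first_column_table <- one2three4[, "Col_1_3", drop = FALSE]

# Swapping rows and columns: R looks for a column called "Row_1_2" and fails.
swapped_order <- tryCatch(
  one2three4["Col_1_3", "Row_1_2"],
  error = function(e) conditionMessage(e)
)

print(first_column)
print(class(first_column_table))
print(swapped_order)
```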

Questions

Question 1: Can I mix characters and numbers in a data frame row ?

Answer

Yes, it is possible:

mixed_data_frame <- data.frame(
  "Character_Column" = c("a", "b", "c"),
  "Number_Column" = c(4, 5, 6)
)
print(mixed_data_frame)
  Character_Column Number_Column
1                a             4
2                b             5
3                c             6

The function str can be used to look at the type of each element in an object.

str(mixed_data_frame)
'data.frame':   3 obs. of  2 variables:
 $ Character_Column: chr  "a" "b" "c"
 $ Number_Column   : num  4 5 6
str(one2three4)
'data.frame':   2 obs. of  2 variables:
 $ Col_1_3: num  1 3
 $ Col_2_4: num  2 4


Question 2: Can I mix characters and numbers in a data frame column ?

Answer

No:

mixed_data_frame <- data.frame(
  "Mixed_letters" = c(1, "b", "c"),
  "Mixed_numbers" = c(4, "5", 6)
)
print(mixed_data_frame)
  Mixed_letters Mixed_numbers
1             1             4
2             b             5
3             c             6
str(mixed_data_frame)
'data.frame':   3 obs. of  2 variables:
 $ Mixed_letters: chr  "1" "b" "c"
 $ Mixed_numbers: chr  "4" "5" "6"


Question 3: How can you add 2 to each cell of the data frame?

Answer
three4five6 <- one2three4 + 2
three4five6
        Col_1_3 Col_2_4
Row_1_2       3       4
Row_3_4       5       6

Read a table as data frame

Exercise: Use the Help pane to find how to use the function read.csv. You can find example_table.csv in /shared/projects/2325_ebaii/SingleCell/intro_R

Use the function read.csv to:

  1. copy and open the file ./example_table.csv in your project directory.
  2. this table has a header (TRUE).
  3. this table has row names in the column called “Gene_id”.

Leave all other parameters at their default values.

Save the opened table in a variable called example_table.

Solution
example_table <- read.csv(
  file="./example_table.csv", 
  header=TRUE, 
  row.names="Gene_id"
)
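
If you do not have access to the shared file, the same call can be tried on a tiny table. The sketch below writes a made-up CSV file (hypothetical gene names and values) to a temporary location, then reads it with the same arguments as the solution.

```r
# Build a tiny CSV file in a temporary location (made-up values).
toy_csv <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Gene_id,Sample1,Sample2",
    "GeneA,1.5,2.5",
    "GeneB,3.0,4.0"
  ),
  con = toy_csv
)

# Same arguments as in the solution: a header, and row names taken from Gene_id.
toy_table <- read.csv(
  file = toy_csv,
  header = TRUE,
  row.names = "Gene_id"
)

print(toy_table)
print(rownames(toy_table))
```

The column Gene_id disappears from the data and becomes the row names, so only the sample columns remain.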

Now let us explore this dataset.

We can click on environment pane:

see_in_the_env_pane

And if you click on it:

open_example_table

Be careful: large tables may hang your session.

Alternatively, we can use the function head which prints the first lines of a table:

head(example_table)
        Sample1   Sample2   Sample3   Sample4
Caml   9.998194 10.004116  9.172489  9.139667
Scamp5 9.995917 10.818685 11.417558 14.907892
Dgki   9.993974 13.664396 16.132275 17.420057
Mas1   9.993956 11.370854 11.233629  9.912863
Apba1  9.992540 14.253438 14.001228 13.654701
Phkg2  9.980898  8.748654  8.714821  9.146529

The function summary describes the dataset per sample:

summary(example_table)
    Sample1          Sample2           Sample3           Sample4       
 Min.   : 9.944   Min.   :  6.838   Min.   :  5.551   Min.   :  5.844  
 1st Qu.: 9.953   1st Qu.:  9.000   1st Qu.: 10.120   1st Qu.:  9.779  
 Median : 9.971   Median : 10.954   Median : 11.326   Median : 11.905  
 Mean   :18.937   Mean   : 19.836   Mean   : 20.828   Mean   : 21.412  
 3rd Qu.: 9.994   3rd Qu.: 12.647   3rd Qu.: 12.650   3rd Qu.: 13.968  
 Max.   :99.784   Max.   :105.077   Max.   :112.188   Max.   :111.820  

Have a look at the summary of the dataset per gene, using the function t to transpose:

head(t(example_table))
             Caml    Scamp5      Dgki      Mas1    Apba1    Phkg2    Timm8b
Sample1  9.998194  9.995917  9.993974  9.993956  9.99254 9.980898  99.78373
Sample2 10.004116 10.818685 13.664396 11.370854 14.25344 8.748654 105.07739
Sample3  9.172489 11.417558 16.132275 11.233629 14.00123 8.714821 112.18819
Sample4  9.139667 14.907892 17.420057  9.912863 13.65470 9.146529 109.09544
            Capn7     Yrdc    Coq10a   Gm27000    Lrrc41    Acadsb    Pdzd11
Sample1  9.976005 9.971093  9.970835  9.965511  9.960667  9.959179  9.952750
Sample2 11.314599 8.905508  8.820582  7.414795  9.961954 11.261520  9.031553
Sample3 11.452421 7.367243 10.449131  7.709008 10.435298 12.336088 10.700876
Sample4 11.692871 9.375526 10.865062 13.126211  9.137375 12.703318 10.832218
          Smarca2   Gm26079     Ptpn5    Rexo2     Ifi27   Snhg20
Sample1  9.952224  99.51466  9.947524  9.94634  9.943989 9.943724
Sample2  9.272424 103.08963 11.090058 13.36391 12.407626 6.838499
Sample3 11.194709 109.85654 11.572261 11.47744 13.591186 5.551247
Sample4 12.117571 111.82050 10.255021 12.29288 14.906542 5.843670
summary(t(example_table))
      Caml            Scamp5            Dgki             Mas1       
 Min.   : 9.140   Min.   : 9.996   Min.   : 9.994   Min.   : 9.913  
 1st Qu.: 9.164   1st Qu.:10.613   1st Qu.:12.747   1st Qu.: 9.974  
 Median : 9.585   Median :11.118   Median :14.898   Median :10.614  
 Mean   : 9.579   Mean   :11.785   Mean   :14.303   Mean   :10.628  
 3rd Qu.:10.000   3rd Qu.:12.290   3rd Qu.:16.454   3rd Qu.:11.268  
 Max.   :10.004   Max.   :14.908   Max.   :17.420   Max.   :11.371  
     Apba1            Phkg2           Timm8b           Capn7             Yrdc      
 Min.   : 9.993   Min.   :8.715   Min.   : 99.78   Min.   : 9.976   Min.   :7.367  
 1st Qu.:12.739   1st Qu.:8.740   1st Qu.:103.75   1st Qu.:10.980   1st Qu.:8.521  
 Median :13.828   Median :8.948   Median :107.09   Median :11.384   Median :9.141  
 Mean   :12.975   Mean   :9.148   Mean   :106.54   Mean   :11.109   Mean   :8.905  
 3rd Qu.:14.064   3rd Qu.:9.355   3rd Qu.:109.87   3rd Qu.:11.513   3rd Qu.:9.524  
 Max.   :14.253   Max.   :9.981   Max.   :112.19   Max.   :11.693   Max.   :9.971  
     Coq10a          Gm27000           Lrrc41           Acadsb      
 Min.   : 8.821   Min.   : 7.415   Min.   : 9.137   Min.   : 9.959  
 1st Qu.: 9.683   1st Qu.: 7.635   1st Qu.: 9.755   1st Qu.:10.936  
 Median :10.210   Median : 8.837   Median : 9.961   Median :11.799  
 Mean   :10.026   Mean   : 9.554   Mean   : 9.874   Mean   :11.565  
 3rd Qu.:10.553   3rd Qu.:10.756   3rd Qu.:10.080   3rd Qu.:12.428  
 Max.   :10.865   Max.   :13.126   Max.   :10.435   Max.   :12.703  
     Pdzd11          Smarca2          Gm26079           Ptpn5       
 Min.   : 9.032   Min.   : 9.272   Min.   : 99.51   Min.   : 9.948  
 1st Qu.: 9.722   1st Qu.: 9.782   1st Qu.:102.20   1st Qu.:10.178  
 Median :10.327   Median :10.573   Median :106.47   Median :10.673  
 Mean   :10.129   Mean   :10.634   Mean   :106.07   Mean   :10.716  
 3rd Qu.:10.734   3rd Qu.:11.425   3rd Qu.:110.35   3rd Qu.:11.211  
 Max.   :10.832   Max.   :12.118   Max.   :111.82   Max.   :11.572  
     Rexo2            Ifi27            Snhg20     
 Min.   : 9.946   Min.   : 9.944   Min.   :5.551  
 1st Qu.:11.095   1st Qu.:11.792   1st Qu.:5.771  
 Median :11.885   Median :12.999   Median :6.341  
 Mean   :11.770   Mean   :12.712   Mean   :7.044  
 3rd Qu.:12.561   3rd Qu.:13.920   3rd Qu.:7.615  
 Max.   :13.364   Max.   :14.907   Max.   :9.944  
To go further
# number of columns
ncol(example_table)
[1] 4
# number of rows
nrow(example_table)
[1] 20
# get dimensions
dim(example_table)
[1] 20  4
# type of each element
str(example_table)
'data.frame':   20 obs. of  4 variables:
 $ Sample1: num  10 10 9.99 9.99 9.99 ...
 $ Sample2: num  10 10.8 13.7 11.4 14.3 ...
 $ Sample3: num  9.17 11.42 16.13 11.23 14 ...
 $ Sample4: num  9.14 14.91 17.42 9.91 13.65 ...
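
Instead of transposing the table, per-gene and per-sample statistics can also be computed directly. This sketch uses a small made-up table (hypothetical gene names and values) rather than example_table, so that it runs on its own.

```r
toy_table <- data.frame(
  Sample1 = c(1, 3),
  Sample2 = c(2, 5),
  row.names = c("GeneA", "GeneB")
)

gene_means <- rowMeans(toy_table)        # one mean per gene (row)
sample_means <- colMeans(toy_table)      # one mean per sample (column)
gene_max <- apply(toy_table, 1, max)     # 1 means "row by row"

print(gene_means)
print(sample_means)
print(gene_max)
```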

TLDR – Too Long Didn’t Read

# Declare a variable, and store a value in it:
three <- 3

# Basic operators: + - / * work as intended:
six <- 3 + 3

# Quotes are used to delimit text:
seven <- "7"

# You cannot perform maths on text:
"7" + 8 # raises an error
seven + 8 # also raises an error
six + 8 # works fine

# You can change the type of your variable with:
as.numeric("4") # the character '4' becomes the number 4
as.character(10) # the number 10 becomes the character 10

# You can compare values with:
six < seven
six + 1 >= seven
identical(example_table, mixed_data_frame)


# You can load and save a dataframe with:
read.table(file = ..., sep = ..., header = TRUE)
write.table(x = ..., file = ...)

# Create a table with:
my_table <- data.frame(...)

# Create a vector with:
my_vector <- c(...)

# You can see the first lines of a dataframe with:
head(example_table)

# Search for help in the help pane or with:
help('function')

R – Packages

What are modules and packages

Modules and packages are considered to be the same thing in this lesson. The difference is technical and does not relate to our session.

Most of the work you are likely to do with R will require one or several packages. A package is a collection of functions or pipelines shipped under a given name. Every single function you use in R comes from one package or another.

Read the very first line of the help pane:

help(head)

It reads: head {utils}. The function head comes from the package utils.

# Call the function "head", with the arguments "example_table" and 1
head(example_table, 1)
      Sample1  Sample2  Sample3  Sample4
Caml 9.998194 10.00412 9.172489 9.139667
# Call the function "head" ***from the package utils***, with the same arguments
utils::head(example_table, 1)
      Sample1  Sample2  Sample3  Sample4
Caml 9.998194 10.00412 9.172489 9.139667

Warning: Sometimes, two packages have a function with the same name. They are most certainly not doing the same thing. IMHO, it is a good habit to always call a function with its package name to remove the ambiguity. utils::help() is better than help() alone.

Install a package

You may install a new package on your local computer. You shall not do it on a cluster. The IFB core cluster you are working on today is shared and highly valuable; no one besides the official maintainers can install anything.

The following lines are written for instruction purpose and should not be used on IFB core cluster.

Use install.packages() to install a package.

# Install a package with the following function
install.packages("tibble")

This will prompt you with a few simple questions: where to download from (choose somewhere in France), and whether to update other packages or not.

Do not be afraid of the large amount of text printed in the console, and let R do the trick.

Alternatively, you can click Tools -> Install Packages in RStudio.

You can list installed packages with installed.packages(), and look for packages that can be updated with old.packages(). These packages can be updated with update.packages().
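
Before installing anything, it can help to check whether a package is already available. The sketch below uses the package stats, which ships with R, and tibble only as an example name; neither call installs anything.

```r
# requireNamespace() returns TRUE when the package can be loaded, FALSE otherwise.
has_stats <- requireNamespace("stats", quietly = TRUE)
print(has_stats)

# installed.packages() returns a table with one row per installed package.
installed_names <- rownames(installed.packages())
has_tibble <- "tibble" %in% installed_names
print(has_tibble)
```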

While the function install.packages() searches for packages in the common R package repository (CRAN), many bioinformatics packages are available from other package repositories. Just like the App Store and the Play Store do not offer the same applications on mobile, R has multiple sources for its packages. You need to know one of them, and one only: Bioconductor.

bioconductor

Bioconductor packages are installed with the function BiocManager::install(), which first requires the BiocManager package:

# Install BiocManager, a package to use Bioconductor
install.packages("BiocManager")

Use a package

You can load a package with the function library():

library(package="Seurat")

If there is no error message, then you can try:

help(Read10X, package = "Seurat")

R – Single Cell

R for SingleCell does not differ from classic R work, except for the list of packages and functions used.

Load and save R objects

While working on your projects and learning this week, you will process datasets in R. The results of these analyses will be stored in variables. This means that when you close RStudio, some of this work might be lost.

We already saw the function save.image() to save a complete copy of your working environment.

However, you can also save only the content of a given variable. This is useful when you want to save the result of a function (or a pipeline) but not the whole 5 hours of work you’ve been spending on how-to-make-that-pipeline-work-correctly.

The format is called: RDS for R Data Serialization. This is done with the function saveRDS():

saveRDS(object = example_table, file = "example_table.RDS")

You can also load an RDS file into a variable. This is useful when you receive an RDS file from a coworker, or when you’d like to resume your work from a saved point. This is done with the function readRDS():

example_table <- readRDS(file = "example_table.RDS")
head(example_table)
        Sample1   Sample2   Sample3   Sample4
Caml   9.998194 10.004116  9.172489  9.139667
Scamp5 9.995917 10.818685 11.417558 14.907892
Dgki   9.993974 13.664396 16.132275 17.420057
Mas1   9.993956 11.370854 11.233629  9.912863
Apba1  9.992540 14.253438 14.001228 13.654701
Phkg2  9.980898  8.748654  8.714821  9.146529
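
To convince yourself that nothing is lost on the way, you can save an object, read it back, and compare both copies. This sketch uses a small made-up table and a temporary file, both hypothetical, so that it runs without example_table.

```r
toy_table <- data.frame(
  Sample1 = c(1, 3),
  Sample2 = c(2, 4),
  row.names = c("GeneA", "GeneB")
)

# Save to a temporary file, then read it back under a different name.
rds_file <- tempfile(fileext = ".RDS")
saveRDS(object = toy_table, file = rds_file)
restored_table <- readRDS(file = rds_file)

# Compare the original and the restored copy.
same_content <- isTRUE(all.equal(toy_table, restored_table))
print(same_content)
```

Unlike save.image(), readRDS() returns the object, so you choose the variable name when loading.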

Why R for EBAII SingleCell ?

No programming language is better than any other. Anyone saying the opposite is (over)-specialized in the language they are advertising. This week, we are going to use many packages written in R. You are already learning to write both bash and R scripts, let’s not add another one.

In the field of bioinformatics, languages used by the community are quite limited. While learning bash cannot be escaped nowadays, it is not enough to perform a complete analysis with publication ready figures and results. You should be interested in another programming language: R and/or Python.

Please, note that this advice is valid today, but may change. Other programming languages are used, some have lost their place on the podium, and others are trying to supersede bash, R, and Python.

Anyway Python is the best programming language in the WORLD. Don’t listen to Bastien.

Frequently asked questions (errors)

Errors

Error: object ‘xxx’ not found

The variable ‘xxx’ doesn’t exist, you should create it before using it.

Error in plot.new() : figure margins too large

The lower right pane (Help/Files, where plots are drawn) is too small to display the image. Enlarge the pane to see the plot.

Error in abs_path(input) : The file ‘xxx.R’ does not exist.

You should check:

  • whether the file exists
  • whether there is a typo in the file name or path
  • where you are running the script from (getwd()), and change it if it’s not what you want (setwd())
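
These checks can be done from the console. The sketch below uses a hypothetical script created in a temporary location; replace script_path with the path reported in your own error message.

```r
# Hypothetical script path; tempfile() only builds a name, the file does not exist yet.
script_path <- tempfile(fileext = ".R")

print(getwd())                              # where relative paths are resolved from
exists_before <- file.exists(script_path)   # FALSE: nothing has been written yet

# Create the script, then check again and run it.
writeLines('print("Hello World")', con = script_path)
exists_after <- file.exists(script_path)    # TRUE now
source(script_path)
```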

Others

On the IFB RStudio cluster, when you look for the help of the filter function, you will find it in 3 different packages: stats, dplyr and plotly. When you use it, you should write:

help(filter, package = "stats")
stats::filter()
help(filter, package = "dplyr")
dplyr::filter()
help(filter, package = "plotly")
plotly::filter()
# Linear filtering on a time series
x <- 1:100
filter(x, rep(1, 3))

Error in UseMethod(“filter”) : no applicable method for ‘filter’ applied to an object of class “c(‘integer’, ‘numeric’)”

The last package loaded was dplyr, so the filter function comes from dplyr, which does not accept a plain numeric vector.
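
Calling the function with its package name avoids the problem, whatever packages are loaded. A minimal sketch of the same linear filter, this time taken explicitly from stats:

```r
# Linear filtering on a time series, using the stats version explicitly.
x <- 1:100
moving_sum <- stats::filter(x, rep(1, 3))

# Each value is the sum of three neighbours; both ends have no complete window.
print(as.numeric(moving_sum)[1:4])
```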

Other ways to declare variables:

c123 <- c(1,2,3)
c(2,3,4) -> c234
c345 = c(3,4,5)
c123
c234
c345

A bad way to name your variables is to use the same name as a function:

c <- c(3,4,5)
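
R usually still finds the function when the name is followed by parentheses, but the code becomes confusing for human readers. A short sketch of what happens, and how to undo it with rm():

```r
c <- c(3, 4, 5)     # the name "c" now also refers to a vector
print(c)

# When "c" is called with parentheses, R skips the vector and finds the function.
still_works <- c(1, 2)
print(still_works)

# Remove the variable so that "c" only refers to the function again.
rm(c)
```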