Forewords

I need three teams for this session: team Wilcoxon, team Student, and team ROC.

I will also need your help because I can’t make DESeq2 work correctly. But I’m sure, that we will solve my issue: you’re in the best session here at EBAII.

TLDR: R command lines

In this presentation, there will be screen captures for you to follow the lesson. There will also be every single R command lines. Do not take care of the command lines if you find them too challenging. Our goal here, is to understand the main mechanism of Differential Expression Analysis. R is just a tool.

Below are the libraries we need to perform this whole session:

base::library(package = "BiocParallel")    # Optionally multithread some steps
base::library(package = "DT")              # Display nice table in HTML
base::library(package = "ggplot2")         # Draw graphs and plots
base::library(package = "ggpubr")          # Draw nicer graphs
base::library(package = "rstatix")         # Base R statistics
base::library(package = "knitr")           # Build this presentation
base::library(package = "dplyr")           # Handle big tables
base::library(package = "Seurat")          # Handle SingleCell analyses
base::library(package = "SeuratObject")    # Handle SingleCell objects
base::library(package = "SingleCellExperiment") # Handle SingleCell file formats
base::library(package = "UpSetR")          # Nice venn-like graphs
base::library(package = "EnhancedVolcano") # Draw Volcano plot

First, we load Seurat object:

sobj <- base::readRDS(
  # Path to the RDS file
  file = "Scaled_Normalized_Seurat_Object.RDS"
)

Then we perform differential expression analysis:

sobj_de <- Seurat::FindMarkers(
  # Object on which to perform DEA
  object = sobj,
  # Name of factor in condition 1
  ident.1 = "TD3A",
  # Name of factor in condition 2
  ident.2 = "TDCT"
)

And that’s all. Our goal is to understand these lines, being able to write them is a bonus.

Purpose of this session

Up to now, we have:

Identified to which cell each sequenced reads come from
Identified to which gene each read come from
Identified possible bias in gene expression for each cell
Filtered and corrected these bias as well as we can

We would like to identify the list of genes that characterize differences between cell cycle phases G1 and S groups.

At the end of this session you will know:

how to select a differential analysis method
how to select the correct normalization (if any?) that must be provided to your differential analysis method
How to read differential expression results

Load RDS dataset

You already have a dataset loaded ? Good. Keep on going with it ! You don’t have one ? Use mine:

# We store in the variable `sobj`, the result of
# the function `readRDS` from package `base`.
sobj <- base::readRDS(
  # Path to the RDS file
  file = "Scaled_Normalized_Seurat_Object.RDS"
)

The command above, uses the function readRDS from base R package. and git it the path to the RDS file we built in previous session.

Insight

We are wondering what’s in our dataset. Let’s have a look, with the function print form the package base.

base::print(
  # Object to display
  x = sobj
)

An object of class Seurat 
6941 features across 3984 samples within 1 assay 
Active assay: RNA (6941 features, 2000 variable features)

We have 6941 features (aka genes), accross 3984 samples (aka cells).

Let us have a look at the RNA counts for 10 cells and their annotation, with the function head from the package utils.

utils::head(
  # Object to visualize
  x = sobj,
  # Number of lines to display
  n = 10
)

There is no counts, normalized or not. Where are they ?

In order to explore the content of sobj, use the function str from the package utils:

utils::str(
  # Object to explore
  object = sobj
)

Alternatively, in RStudio, you can click on the object pane and explore manually the content of the object. If we explore the slot assays, then we find the counts.

You can access them with:

utils::head(
  # Object to visualize
  x = sobj@assays[["RNA"]]@counts,
  # Number of rows to display
  n = 10
)

We have one gene per line, one cell per column, and RNA counts in each row. Are these counts normalized ? Are they scaled ? Are they filtered ? Are they corrected ?

Answer

These counts are normalized, scaled, filtered. This information is available in the seurat object itself, within the slot commands. See an example below:

names(sobj@commands)

[1] "NormalizeData.RNA"        "FindVariableFeatures.RNA"
[3] "ScaleData.RNA"

However, please be aware that counts in the slot count are raw counts. Normalized counts are in the slot data and scaled data are in the slot scaled.data. And it you do not find that clear, I totally agree with you.

Is it normal that we have so many zeros ? And what about surch low counts, is it because we downsampled the sequenced reads for this session ?

Answer

The large number of null counts is completely normal. In maths/stats we talk about matrix sparcity, i.e a table with lots of zeros. If the data were to be downsampled, we had had done this by cropping over a small chromosome, and not reducing the general amount of reads.

Select a DE method

Available methods

Seurat let us use multiple differential analysis methods with its function FindMarkers.

wilcox: The wilcoxon test tests the mean of expression and looks for a difference in these means.
MAST: This tool has been built for Single Cell. It is based on a statistical model called “Hurdle Model”, which excells with data that contains lots of zeros (which is our case in Single Cell RNA-Seq: most of the genes are not expressed.)
DESeq2: This tool has originally been built for bulk RNA-Seq but now includes specific funcitons for Single Cell. It performs well when counts are highly variable or when you wand to compare a handful of cells.
t-test: The t-test uses a comparison of means to assess differential expression probability.
roc: An AUC value of 1 means that expression values for this gene alone can perfectly classify the two groupings (i.e. Each of the cells in cells.1 exhibit a higher level than each of the cells in cells.2). An AUC value of 0 also means there is perfect classification, but in the other direction. A value of 0.5 implies that the gene has no predictive power to classify the two groups.

The main question now is how to choose the right test: spoilers, there are no option better than another in all ways.

From Soneson & Robinson (2018) Nature Methods:

Here, researchers have designed an artificial dataset where they knew in advance the list of differentially expressed genes. They have used all these algorithms and consigned the results.

DESeq2, Limma seems to have a higher number of false positives (genes called differentially expressed while they were not.)
Wilcoxon seems to be better in general
Mast performs well in absolute

ANOVA was not present in this study.

Case: Jund

The question is now to guess wether this gene is differnetially expressed or not.

Cell observations

Let’s have a look at the gene named ‘Jund’, involved in senescence and apoptosis. In order to plot its expression accross all cells, we are going to use the function VlnPlot from Seurat package. The input object is obviously contained in the sobj variable, since it is our only Seurat object. In addition, we are going to select the feature Jund, and split the graph according to the cell phases we annotated earlier.

Seurat::VlnPlot(
  # The Seurat object
  object = sobj,
  # The name of the gene of interest (feature = gene)
  features = "Jund",
  # The name of the Seurat cell-cycle annotation
  split.by = "cc_seurat.Phase",
  # Change color for presentation
  cols = c("darkslategray3", "olivedrab", "orangered3")
)

Using your ‘intuition’, is this gene differentially expressed between TD3A and TDCT ?

Answer

In TD3A, the violin plot highlights two peaks of expression levels, one at zero level, the other around 1.75. There are more null counts than non-null, but we cannot deny that a large range of cells have the gene Jund expressed.

In TDCT, most of the expression is equal to zero. While some cells do have an expression between 1 and 3, the density drawn by the violin plot confirms that these are the exception rather than the majority.

IMHO, and you can disagree, the expression of the gene Jund differs between TD3A and TDCT. This is purely intuitive.

Using your ‘intuition’, is this gene differentially expressed between G1 and S phases ?

Answer

Seurat::VlnPlot(
  # The Seurat object
  object = sobj,
  # The name of the gene of interest (feature = gene)
  features = "Jund",
  # The name of the Seurat cell-cycle annotation
  group.by = "cc_seurat.Phase",
  # Change color for presentation
  cols = c("darkslategray3", "olivedrab", "orangered3")
)

Again, most of the expression is null, but G1 phase seems to have a higher count density around 2 than the other phases. It is harder to decide …

Okay, let’s have some informations about these distributions.

# Store counts in a variable
counts <- base::as.data.frame(
  # The matrix to reformat into a dataframe
  x = sobj@assays[["RNA"]]@data
)
# Rename cells with cell cycle phase
base::colnames(counts) <- paste(
  # The names of the cell cycle phases for each cell
  sobj@meta.data$cc_seurat.Phase,
  # The names of the cells themselves
  colnames(sobj@assays[["RNA"]]@data),
  sep = "_"
)

We have 2744 cells within the G1 group:

countsG1 <- select(counts, matches("^G1."))

JundG1 <- base::as.data.frame(base::t(countsG1["Jund", ]))

base::summary(JundG1)

      Jund       
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :0.0000  
 Mean   :0.7087  
 3rd Qu.:1.6899  
 Max.   :3.5905

We have 746 cells withing the S group:

countsS <- select(counts, matches("^S."))

JundS <- base::as.data.frame(base::t(countsS["Jund", ]))

base::summary(JundS)

      Jund       
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :0.0000  
 Mean   :0.9374  
 3rd Qu.:1.8696  
 Max.   :3.2949

From biology to statistics

Okay, let us resort on statistics to evaluate our chances to be guess correctly, or our risks to guess wrong.

We have lots of observations: 2744 cells within the G1 phase, and 746 cells withing the S phase.Statisticians really like to have a lot of observations! Ideally, statisticians always want to have more observation than tests. We have a total of 3490 observations and we are testing 1 gene. For them, this is a dream come true!

Are our cells supposed to be interactig with each others ? Are they independent from each others ? This is very important, and usually, it requires a discussion.

oes the expression in our cells follow a normal distribution ? It’s easy to check. Let’s draw the expression like we did above.

First, we use the function rbind from base package. Be careful, the function rbind also exists in DelayedArray, data.table, and BiocGenerics packages. We want to use the basic one. Here is an example of odd behaviours that occurs when not using the package name before a function call.

# Add column adientifiers in each count dataframes
JundG1$phase <- "G1"
JundS$phase <- "S"
# Paste the rows beneith each other
Jund <- base::rbind(
  #  variable pointing to G1 counts
  JundG1,
  #  variable pointing to S counts
  JundS,
  # A boolean, requesting that strings/characters
  # should not be casted as `logical`. It breaks graphs.
  stringsAsFactors = FALSE
)

Secondly, we use the function gghistogram from package ggpubr in order to display relative abundance of gene expression:

ggpubr::gghistogram(
  Jund,
  x = "Jund",
  y = '..density..',
  fill = "steelblue",
  bins = 15,
  add_density = TRUE
)

Our distribution seems to be binomial we will have to rely on non-parametric tests.

Let’s run a non-parametric test base on the mean of distributions, since it’s the clothest to our ‘intuitive’ approach. Let there be Wilcoxon test.

In R, it’s quite straightforward: we have the function wilcoxon_test to perform the test, then we can plot the result.

# On the expression table stored in the varialbe `Jund`,
# first apply the function `wilcox_test` from package `rstatix`,
# then we apply the function `add_significance` from package `rstatix`
stat.test <- Jund %>% rstatix::wilcox_test(Jund ~ phase) %>% rstatix::add_significance()
# While doing so, we usually also compute effect size
eff.size <- Jund %>% rstatix::wilcox_effsize(Jund ~ phase)

Wilcoxon test says: the distributions are different, with a 8.7^{-6} % of risks of being wrong. The gene Jund can safely be said differentially expressed. We can compute a fold change and conclude.

Just out of curiosity, be aware that t_test from rstatix package, and DESeq from DESeq2 package also return good confidence intervals: respectively 8.7^{-8} and 4.7e-4.

Here, all methods lead to the same conclusion.

Case: Atad2

Cell observations

Let’s have a look at the gene named ‘Atad2’, an ATPase protein associated with diverse cellular activities.

Seurat::VlnPlot(
  # The Seurat object
  object = sobj,
  # The name of the gene of interest (feature = gene)
  features = "Atad2",
  # The name of the Seurat cell-cycle annotation
  split.by = "cc_seurat.Phase",
  # Change color for presentation
  cols = c("darkslategray3", "olivedrab", "orangered3")
)

Not very clear. Let’s have some informations about these distributions.

Within the G1 group:

Atad2G1 <- base::as.data.frame(
  # We transpose the table for future
  # display
  x = base::t(countsG1["Atad2", ])
)
base::summary(Atad2G1)

     Atad2        
 Min.   :0.00000  
 1st Qu.:0.00000  
 Median :0.00000  
 Mean   :0.08845  
 3rd Qu.:0.00000  
 Max.   :2.65333

Withing the S group:

Atad2S <- base::as.data.frame(
  # We transpose the table for future
  # display
  x = base::t(countsS["Atad2", ])
)
base::summary(Atad2S)

     Atad2       
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :0.0000  
 Mean   :0.2615  
 3rd Qu.:0.0000  
 Max.   :2.6503

Not differentially expressed, do we agree ?

Statistical results

Using the same parameters as for Jund, we can run Wilcoxon method, and have the following result:

# We add cell cycle annotation to the gene counts
Atad2G1$phase <- "G1"
Atad2S$phase <- "S"
# We concatenate tables from both phases
Atad2 <- base::rbind(Atad2G1, Atad2S)
# We perform wilcoxon test just like before:
# On the expression table stored in the varialbe `Atad2`,
# first apply the function `wilcox_test` from package `rstatix`,
# then we apply the function `add_significance` from package `rstatix`
stat.test <- Atad2 %>% rstatix::wilcox_test(Atad2 ~ phase) %>% rstatix::add_significance()
eff.size <- Atad2 %>% rstatix::wilcox_effsize(Atad2 ~ phase)

It’s significative ! What do t_test and DESeq2 think of it?

# On the expression table stored in the varialbe `Atad2`,
# first apply the function `t_test` from package `rstatix`,
# then we apply the function `add_significance` from package `rstatix`
stat.test <- Atad2 %>% rstatix::t_test(Atad2 ~ phase) %>% rstatix::add_significance()

Still positive with t_test.

And DESeq2 ? Adjusted p-value is 0.06891707. Not significative.

Conclusion

In conclusion, to select your method, use the following:

If you have already done a study with one of these methods, keep using the same. This is crutial if you ever want to compare your new result with the old ones.
If you want to compare your study with a published one, use the same method.
If you have no idea, use Wilcoxon.
If you have bulk data analyzed with DESeq2/Limma, use DESeq2/Limma. It will be easier to take both works in consideration.

Please, never use a simple Wilcoxon on bulk RNA-Seq data.

Select a dataset

Dataset depends on selected method

There it is quite easier:

FindMarkers

With the function FindMarkers from package Seurat, we want to make three groups:

One using wilcoxon to perform DEA between “G1” and “S” phases.
One using t-test to perform DEA between “G1” and “S” phases.
One using ROC to perform DEA between “G1” and “S” phases.

We will observe the results and compare our gene lists.

Hey, why are you looking at me? It’s your turn to work! Use the all the notions seen above to select the right counts (slot), the right input object, and the right arguments.

10 minutes practice !

Answers

Here are the code for each team:

sobj_wilcox <- Seurat::FindMarkers(
  # The variable that contains Seurat Object
  object = sobj,
  # Name of condition 1
  ident.1 = "G1",
  # Name of condition 2
  ident.2 = "S",
  # Factor name in the Seurat Object
  group.by = "cc_seurat.Phase",
  # Differential analysis method
  test.use = "wilcox"
)

sobj_t <- Seurat::FindMarkers(
  # The variable that contains Seurat Object
  object = sobj,
  # Name of condition 1
  ident.1 = "G1",
  # Name of condition 2
  ident.2 = "S",
  # Factor name in the Seurat Object
  group.by = "cc_seurat.Phase",
  # Differential analysis method
  test.use = "t"
)

sobj_roc <- Seurat::FindMarkers(
  # The variable that contains Seurat Object
  object = sobj,
  # Name of condition 1
  ident.1 = "G1",
  # Name of condition 2
  ident.2 = "S",
  # Factor name in the Seurat Object
  group.by = "cc_seurat.Phase",
  # Differential analysis method
  test.use = "roc"
)

In the function argument, there is a FoldChange threshold. Should we filter gene expression based on FoldChange? In case of positive answer, how much should that threshold be?

Answer

About thresholds on FDR (False Discovery Rate) and Log2(FC) (Log of the Fold Change), there are many discussions.

Remember, here the threshold on Fold Change is Logged. A log2(1) =0. And keep in mind the following:

If one selects a fold change threshold above 1.7, then their study concludes that smoking is not related to lung cancer.
If one selects a fold change threshold above 1, then their study concludes that a fast-food based diet does not lead to weight gain.
If one selects a fold change threshold above 1/25 000 000, then their study concludes: it is a complete hazard that mice have featal malformation when in presence of Bisphanol.

In conclusion, there one, and only one reason to filter on fold change: in my experience, a fold change below 0.7 will be hard to see/verify on wet-lab (qRT).

If you need to reduce a too large number of differentially expressed genes, then reduce the FDR to 0.01, or even better, to 0.001. With that, you reduce your number of false claims.

Can you help me with DEseq2?

When I run the following command line, I have an error :

sobj_deseq2 <- Seurat::FindMarkers(
  # The variable that contains Seurat Object
  object = sobj,
  # Name of condition 1
  ident.1 = "G1",
  # Name of condition 2
  ident.2 = "S",
  # Factor name in the Seurat Object
  group.by = "cc_seurat.Phase",
  # Differential analysis method
  test.use = "DESeq2",
  # Use non-normalized data with DESeq2
  slot = "counts"
)

Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : every gene contains at least one zero, cannot compute log geometric means

Answer

This is normal for a single-cell dataset to have a lot of zeros. Here, all genes have at least one cell in the experiment, that do not express this particular gene. Matrix sparcity is a problem for DEseq2.

One way to fix this error consists in the addition of a small integer to the whole count matrix.

In order to do surch modification to the data, we use the function SetAssayData from package Seurat. It takes a new count matrix in input, se we have to build it first.

count_matrix_plus_one <- base::as.matrix(
  # Take the count matrix and add one,
  # This results in a data.frame, which
  # has to be casted into a matrix.
  sobj@assays[["RNA"]]@counts + 1
)

sobjp1 <- SeuratObject::SetAssayData(
  # The  variable pointing to the Seurat object
  object = sobj,
  # The slot to modify, as we use DESeq2 we modify raw counts
  slot = "counts",
  # The new matrix with +1 counts
  new.data = count_matrix_plus_one
)

sobj_deseq2 <- Seurat::FindMarkers(
  # The variable that contains Seurat Object
  object = sobjp1,
  # Name of condition 1
  ident.1 = "G1",
  # Name of condition 2
  ident.2 = "S",
  # Factor name in the Seurat Object
  group.by = "cc_seurat.Phase",
  # Differential analysis method
  test.use = "DESeq2",
  # Use non-normalized data with DESeq2
  slot = "counts"
)

Remark: by doing surch modification, some fold changes have been modified: remember the gene Atad2 with a mean expression of 0.08 in G1, and 0.2 in S phases? Mean expressions are now around 1.08 for G1, and 1.2 for S phases. This may be the reason why it was not differentially expressed in DESeq2, while Wilcoxon and T-test returned adjusted pvalues far below 0.05.

Explore results

Big tables

Let us have a look at the results:

We have the following columns:

p_val: Ignore this column. Always ignore raw p-values. Look at corrected ones, and if they are missing, then compute them.
avg_log2FC: Average Log2(FoldChange). Illustrates how much a gene is differentially expessed between samples in each condition.
pct.1: Percent of cells with gene expression in condition one, here in “G1” phase.
pct.2: Percent of cells with gene expression in condition two, here in “S” phase.
p_val_adj: The confidence we have in the result. The closer to 0, the lesser is the risk of error.

We have the following columns:

p_val: Ignore this column. Always ignore raw p-values. Look at corrected ones, and if they are missing, then compute them.
avg_log2FC: Average Log2(FoldChange). Illustrates how much a gene is differentially expessed between samples in each condition.
pct.1: Percent of cells with gene expression in condition one, here in “G1” phase.
pct.2: Percent of cells with gene expression in condition two, here in “S” phase.
p_val_adj: The confidence we have in the result. The closer to 0, the lesser is the risk of error.

We have the following columns:

myAUC: The area under the curve
avg_diff: Average difference in AUC
power: abs(AUC-0.5) * 2, usefull to sort genes based on AUC
pct.1: Percent of cells with gene expression in condition one, here in “G1” phase.
pct.2: Percent of cells with gene expression in condition two, here in “S” phase.

We have the following columns:

p_val: Ignore this column. Always ignore raw p-values. Look at corrected ones, and if they are missing, then compute them.
avg_log2FC: Average Log2(FoldChange). Illustrates how much a gene is differentially expessed between samples in each condition.
pct.1: Percent of cells with gene expression in condition one, here in “G1” phase.
pct.2: Percent of cells with gene expression in condition two, here in “S” phase.
p_val_adj: The confidence we have in the result. The closer to 0, the lesser is the risk of error.

Filter DEA results

What kind of threshold should be used to filter each results?

If we must label certain scores as good or bad, we can reference the following rule of thumb:

0.5 = No discrimination 0.5-0.7 = Poor discrimination 0.7-0.8 = Acceptable discrimination 0.8-0.9= Excellent discrimination 0.9 = Outstanding discrimination

Hosmer and Lemeshow in Applied Logistic Regression (p. 177)

Add results to Seurat objects

We’d like to store the results of differential expression analysis in the Seurat object. For that purpose, we use the function AddMetaData from Seurat package as follows:

sobj <- Seurat::AddMetaData(
  object = sobj,
  metadata = sobj_wilcox
)

# Check if the meta data are updated:
base::names(sobj@meta.data)

 [1] "orig.ident"               "nCount_RNA"              
 [3] "nFeature_RNA"             "sample"                  
 [5] "subsets_Ribo_percent"     "subsets_Mito_percent"    
 [7] "percent.top_50"           "percent.top_100"         
 [9] "percent.top_200"          "percent.top_500"         
[11] "subsets_Stress_percent"   "scDblFinder_doublet_call"
[13] "scds_hybrid_call"         "cc_seurat.S.Score"       
[15] "cc_seurat.G2M.Score"      "cc_seurat.SmG2M.Score"   
[17] "cc_seurat.Phase"          "p_val"                   
[19] "avg_log2FC"               "pct.1"                   
[21] "pct.2"                    "p_val_adj"

Common results

Now we can plot intersections in an up-set graph. It is like a venn diagramm:

UpSetR::upset(
  data = UpSetR::fromList(data),
  order.by = "freq"
)

Heatmap

We’d like to display the expression of genes identified by FindMarkers. Then we use the function DoHeatmap from the package Seurat.

In order to limit the graph to differentially expressed reads, we use the function rownames from R base package on the DEA result table. In this example, I use the results of wilcoxon, but you shall use any of the results you previously obtained.

Seurat::DoHeatmap(
  # variable pointing to seurat object
  object = sobj,
  # name of DE genes
  features = base::rownames(sobj_wilcox),
  # Cell cycle annotation
  group.by = "cc_seurat.Phase",
)

Volcano plot

A Volcano plot is usefull to identify differnetial expression analysis bias.

The package EnhancedVolcano has an eponym function for that:

EnhancedVolcano::EnhancedVolcano(
  #  variable pointing to the DEA results
  toptable = sobj_wilcox,
  # Gene names
  lab = rownames(sobj_wilcox),
  # Column in which to find Fold Change
  x = "avg_log2FC",
  # Column in which to find confidence interval
  y = "p_val_adj",
  # Lower fold-change cut-off
  FCcutoff = 0.2
)

Session Info

This list of all packages used while you work should be included in each en every R presentation:

utils::sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS/LAPACK: /home/silivren/Projects/EBAII/2023/ebaiin1/SingleCell/DEA_GSEA/ebaii/lib/libopenblasp-r0.3.24.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] EnhancedVolcano_1.18.0      ggrepel_0.9.4              
 [3] UpSetR_1.4.0                SingleCellExperiment_1.22.0
 [5] SummarizedExperiment_1.30.2 Biobase_2.60.0             
 [7] GenomicRanges_1.52.0        GenomeInfoDb_1.36.1        
 [9] IRanges_2.34.1              S4Vectors_0.38.1           
[11] BiocGenerics_0.46.0         MatrixGenerics_1.12.2      
[13] matrixStats_1.0.0           SeuratObject_4.1.4         
[15] Seurat_4.4.0                dplyr_1.1.3                
[17] knitr_1.44                  rstatix_0.7.2              
[19] ggpubr_0.6.0                ggplot2_3.4.3              
[21] DT_0.28                     BiocParallel_1.34.2        

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.21        splines_4.3.1           later_1.3.1            
  [4] bitops_1.0-7            tibble_3.2.1            polyclip_1.10-6        
  [7] lifecycle_1.0.3         globals_0.16.2          lattice_0.22-5         
 [10] MASS_7.3-60             crosstalk_1.2.0         backports_1.4.1        
 [13] magrittr_2.0.3          limma_3.56.2            plotly_4.10.3          
 [16] sass_0.4.7              rmarkdown_2.25          jquerylib_0.1.4        
 [19] yaml_2.3.7              httpuv_1.6.12           sctransform_0.4.1      
 [22] sp_2.1-1                spatstat.sparse_3.0-3   reticulate_1.34.0      
 [25] cowplot_1.1.1           pbapply_1.7-2           RColorBrewer_1.1-3     
 [28] multcomp_1.4-25         abind_1.4-5             zlibbioc_1.46.0        
 [31] Rtsne_0.16              purrr_1.0.2             RCurl_1.98-1.12        
 [34] TH.data_1.1-2           sandwich_3.0-2          GenomeInfoDbData_1.2.10
 [37] irlba_2.3.5.1           listenv_0.9.0           spatstat.utils_3.0-4   
 [40] goftest_1.2-3           spatstat.random_3.2-1   fitdistrplus_1.1-11    
 [43] parallelly_1.36.0       leiden_0.4.3            codetools_0.2-19       
 [46] coin_1.4-3              DelayedArray_0.26.6     tidyselect_1.2.0       
 [49] farver_2.1.1            spatstat.explore_3.2-5  jsonlite_1.8.7         
 [52] ellipsis_0.3.2          progressr_0.14.0        ggridges_0.5.4         
 [55] survival_3.5-7          tools_4.3.1             ica_1.0-3              
 [58] Rcpp_1.0.11             glue_1.6.2              gridExtra_2.3          
 [61] DESeq2_1.40.2           xfun_0.40               withr_2.5.1            
 [64] fastmap_1.1.1           fansi_1.0.5             digest_0.6.33          
 [67] R6_2.5.1                mime_0.12               colorspace_2.1-0       
 [70] scattermore_1.2         tensor_1.5              spatstat.data_3.0-3    
 [73] utf8_1.2.4              tidyr_1.3.0             generics_0.1.3         
 [76] data.table_1.14.8       httr_1.4.7              htmlwidgets_1.6.2      
 [79] S4Arrays_1.0.4          uwot_0.1.16             pkgconfig_2.0.3        
 [82] gtable_0.3.4            modeltools_0.2-23       lmtest_0.9-40          
 [85] XVector_0.40.0          htmltools_0.5.6.1       carData_3.0-5          
 [88] scales_1.2.1            png_0.1-8               reshape2_1.4.4         
 [91] nlme_3.1-163            cachem_1.0.8            zoo_1.8-12             
 [94] stringr_1.5.0           KernSmooth_2.23-22      parallel_4.3.1         
 [97] miniUI_0.1.1.1          libcoin_1.0-10          pillar_1.9.0           
[100] grid_4.3.1              vctrs_0.6.4             RANN_2.6.1             
[103] promises_1.2.1          car_3.1-2               xtable_1.8-4           
[106] cluster_2.1.4           evaluate_0.22           locfit_1.5-9.8         
[109] mvtnorm_1.2-3           cli_3.6.1               compiler_4.3.1         
[112] rlang_1.1.1             crayon_1.5.2            future.apply_1.11.0    
[115] ggsignif_0.6.4          labeling_0.4.3          plyr_1.8.9             
[118] stringi_1.7.12          viridisLite_0.4.2       deldir_1.0-9           
[121] munsell_0.5.0           lazyeval_0.2.2          spatstat.geom_3.2-7    
[124] Matrix_1.6-1.1          patchwork_1.1.3         future_1.33.0          
[127] shiny_1.7.5.1           ROCR_1.0-11             igraph_1.5.1           
[130] broom_1.0.5             bslib_0.5.1

Differential expression analysis

Thibault DAYRIS

Bastien JOB

2023-10-25

Forewords

TLDR: R command lines

Purpose of this session

Load RDS dataset

Insight

Select a DE method

Available methods

Case: Jund

Cell observations

From biology to statistics

Case: Atad2

Cell observations

Statistical results

Conclusion

Select a dataset

Dataset depends on selected method

FindMarkers

Explore results

Big tables

Filter DEA results

Add results to Seurat objects

Common results

Heatmap

Volcano plot

Session Info