---
title: "
EB3I n1 2025 scRNAseq
-
INTEGRATION
-
"
date: "2025-16-21.22"
author:
- name: "EB3I n1 scRNAseq Team"
- name: "Bastien JOB"
email: "bastien.job@gustaveroussy.fr"
- name: "Séb MELLA"
email: "sebastien.mella@maestro.pasteur.fr"
output:
rmdformats::readthedown:
fig_width: 8
fig_height: 6
highlight: tango ## Theme for the code chunks
embed_fonts: TRUE
number_sections: true ## Adds number to headers (sections)
theme: flatly ## CSS theme for the HTML page
collapsed: true ## By default, the TOC is folded
toc_depth: 3
smooth_scroll: true ## Smooth scroll of the HTML page
self_contained: true ## Includes all plots/images within the HTML
code_download: true ## Adds a button to download the Rmd
code_folding: show
thumbnails: false
lightbox: true
fig_caption: false
gallery: true
use_bookdown: true
always_allow_html: true ## Allow plain HTML code in the Rmd
editor_options:
markdown:
wrap: 72
---
```{r knit_setup, echo = FALSE}
knitr::opts_chunk$set(
echo = TRUE, # Print the code
eval = TRUE, # Run command lines
message = FALSE, # Print messages
prompt = FALSE, # Do not display prompt
comment = NA, # No comments on this section
warning = FALSE, # Display warnings
tidy = FALSE,
fig.align="center",
# results = 'hide',
width = 100 # Number of characters per line
)
```
```{css, echo=FALSE}
.notrun {
background-color: lightgrey !important;
border: 3px solid black !important;
}
.notruno {
background-color: lightgrey !important;
color : black !important;
}
.question {
background-color: aquamarine !important;
color : black !important;
border: 3px solid limegreen !important;
}
.questiono {
background-color: aquamarine !important;
color : black !important;
}
.answer {
background-color: navajowhite !important;
border: 3px solid brown !important;
}
.answero {
background-color: navajowhite !important;
color : black !important;
}
.beyond {
background-color: violet !important;
border: 3px solid purple !important;
}
.beyondo {
background-color: violet !important;
color : black !important;
}
```
```{r knit_hook, echo = FALSE}
hooks = knitr::knit_hooks$get()
hook_foldable = function(type) {
force(type)
function(x, options) {
res = hooks[[type]](x, options)
if (isFALSE(options[[paste0("fold.", type)]])) return(res)
paste0(
"Show ", type, "
\n\n",
res,
"\n\n "
)
}
}
knitr::knit_hooks$set(
output = hook_foldable("output"),
plot = hook_foldable("plot")
)
```
------------------------------------------------------------------------
------------------------------------------------------------------------

------------------------------------------------------------------------
------------------------------------------------------------------------
# PREAMBLE
## Purpose of this session
This file describes the different steps to perform **sixth** part of data
processing for the single cell RNAseq data analysis training course for
the **EBAII n1 2024**, covering these steps :
* **Integration** of a pair of samples
## The integration data set
- We will still use data from the [Paiva et
al.](https://febs.onlinelibrary.wiley.com/doi/full/10.1111/febs.14651)
publication
- While we focused on the **KO sample `TD3a`** until now, we will integrate
it with its **wild-type** counterpart **`TDCT`**
- `TDCT` was processed in a very **similar manner** (QC, doublers, filtering,
LogNormalization, HVGs and scaling) to what you just performed for `TD3A`
- Both samples were halved for their number of cells, for practical reasons
in the context of this training (computation speed)
- As the purpose is to integrate both samples into a single analytical entity,
the objects we will use as input are Seurat objects up to the data **scaling
and regression** (thus, there are no dimension reduction data in these)
------------------------------------------------------------------------
------------------------------------------------------------------------
# Start Rstudio
- Using the [OpenOnDemand cheat
sheet](https://ifb-elixirfr.github.io/EBAII/2023/ebaiin1/SingleCell/2024_TD_OpenOnDemand.html),
connect to the [OpenOnDemand
portal](https://ondemand.cluster.france-bioinformatique.fr) and
**create a Rstudio session** with the right resource requirements.
# Warm-up
- We set **common parameters** we will use throughout this session :
```{r setparam}
## Seed for the RNG (our sole parameter for this session)
my_seed <- 1337L
## Required for our big objects
options(future.globals.maxSize = 1E+09)
```
```{r load libraries}
library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)
```
------------------------------------------------------------------------
------------------------------------------------------------------------
# Prepare the data structure
We will do the same as for former steps, just changing the session name
:
## Main directory
```{r maindir, fold.output = FALSE}
## Setting the project name
project_name <- "ebaii_sc_teachers" # Do not copy-paste this ! It's MY project !!
## Preparing the path
TD_dir <- paste0("/shared/projects/", project_name, "/SC_TD")
## Creating the root directory
dir.create(path = TD_dir, recursive = TRUE)
## Print the root directory on-screen
print(TD_dir)
```
## Current session
```{r sessiondir, fold.output = FALSE}
## Creating the session (Preproc.1) directory
session_dir <- paste0(TD_dir, "/06_Integration")
dir.create(path = session_dir, recursive = TRUE)
## Print the session directory on-screen
print(session_dir)
```
## Input directory
```{r indir, fold.output = FALSE}
## Creating the INPUT data directory
input_dir <- paste0(session_dir, "/DATA")
dir.create(path = input_dir, recursive = TRUE)
## Print the input directory on-screen
print(input_dir)
```
## Output directory
```{r outdir, fold.output = FALSE}
## Creating the OUTPUT data directory
output_dir <- paste0(session_dir, "/RESULTS")
dir.create(path = output_dir, recursive = TRUE)
## Print the output directory on-screen
print(output_dir)
```
------------------------------------------------------------------------
------------------------------------------------------------------------
# Load input data
- The data consists into a **pair of Seurat objects** (one per sample)
as RDS, hosted in a Zenodo respository (Id :
[14081572](https://zenodo.org/records/14081572 "Zenodo Integration"){#14081572})
## Download data
- We will directly retrieve data from Zenodo to your `input_dir` :
```{r dl_int, message = TRUE, echo = TRUE, warning = FALSE, fold.output = FALSE, fold.plot = FALSE}
### Named files (will be used later on !)
TD3A_rds <- "TD3A_Filtered_12508.2018.RDS"
TDCT_rds <- "TDCT_Filtered_12254.1868.RDS"
## Filename(s) to retrieve
toget_files <- c(TD3A_rds,
TDCT_rds)
## Folder to store retrieved files
local_folder <- input_dir
## Use local backup ?
backup <- FALSE
if(backup) message("Using local backup !")
## Force download ?
force <- FALSE
if(force) message("Forcing (re)download !")
## Zenodo ID
zen_id <- '14094752'
### Define remote folder
remote_folder <- if (backup) "/shared/projects/2422_ebaii_n1/atelier_scrnaseq/TD/BACKUP/INTEGRATION/" else paste0("https://zenodo.org/records/", zen_id, "/files/")
### Reconstruct the input paths
remote_path <- paste0(remote_folder, "/", toget_files)
### Reconstruct the output paths
local_path <- paste0(local_folder, "/", toget_files)
## Retrieve files (if they don't exist), in loop
for (tg in seq_along(toget_files)) {
## If the file does not locally exist
if (!file.exists(local_path[tg]) | force) {
## Retrieve data
if(backup) {
file.copy(from = remote_path[tg],
to = local_path[tg])
} else {
download.file(url = remote_path[tg],
destfile = local_path[tg])
}
## Check if downloaded files exist locally
if(file.exists(local_path[tg])) message("\tOK")
} else message(paste0(toget_files[tg], " already downloaded !"))
}
```
## Load into R
```{r rds_load}
## Loading TD3A
TD3A_so <- readRDS(file = paste0(input_dir,
"/",
TD3A_rds))
## Loading TDCT
TDCT_so <- readRDS(file = paste0(input_dir,
"/",
TDCT_rds))
```
------------------------------------------------------------------------
------------------------------------------------------------------------
# Merge datasets
- We simply merge the two objects into a single one :
```{r merge}
TD3A_TDCT_merge <- merge(
x = TD3A_so, ## First object to merge
y = TDCT_so, ## Second object to merge
add.cell.ids = c("TD3A", "TDCT"), ## Keep track of sample of origin for cells (optional)
)
```
- We can take a look at its summary
```{r sobjm_see}
## See the object summary
TD3A_TDCT_merge
```
```{r merge_dims, warning = FALSE, fold.output = FALSE, fold.plot = FALSE}
## TD3A
dim(TD3A_so)
## TDCT
dim(TDCT_so)
## Merged
dim(TD3A_TDCT_merge)
```
```{r}
## Clean unneeded objects
rm(TD3A_so, TDCT_so)
```
## Process the merged dataset
In order to visualize the data in a 2-D space, one needs to process it the way
one learnt :
```{r mproc}
## All steps
TD3A_TDCT_merge <- NormalizeData(object = TD3A_TDCT_merge, verbose = FALSE)
TD3A_TDCT_merge <- FindVariableFeatures(object = TD3A_TDCT_merge, nfeatures = 2000, verbose = FALSE)
TD3A_TDCT_merge <- ScaleData(object = TD3A_TDCT_merge, verbose = FALSE)
TD3A_TDCT_merge <- RunPCA(object = TD3A_TDCT_merge, npcs = 50, verbose = FALSE)
```
```{r elbow}
ElbowPlot(object = TD3A_TDCT_merge, ndims = 50, reduction = "pca")
```
## Visualization of batch effect
```{r settingSampleCols, echo=FALSE}
sample_cols <- c("grey30", "indianred1")
```
```{r , message = FALSE}
DimPlot(object = TD3A_TDCT_merge,
reduction = "pca",
group.by = "orig.ident",
cols = sample_cols)
```
```{r}
TD3A_TDCT_merge <- RunUMAP(object = TD3A_TDCT_merge, dims = 1:20, verbose = FALSE)
```
```{r}
DimPlot(object = TD3A_TDCT_merge,
reduction = "umap",
group.by = "orig.ident",
cols = sample_cols)
```
CCL ??
# Integration
## Seurat : [CCA](https://satijalab.org/seurat/reference/ccaintegration)
### CCA integration
```{r}
computing_time <- data.frame()
```
```{r cca integration, eval=TRUE}
start <- Sys.time()
TD3A_TDCT_merge <- IntegrateLayers(object = TD3A_TDCT_merge,
orig.reduction = "pca",
method = CCAIntegration,
new.reduction = "integrated.cca",
normalization.method = "LogNormalize")
end <- Sys.time()
computing_time["cca","comptime"] <- end-start
```
### Seurat : [RPCA](https://satijalab.org/seurat/articles/integration_rpca)
```{r rPCA integration}
start <- Sys.time()
TD3A_TDCT_merge <- IntegrateLayers(object = TD3A_TDCT_merge,
orig.reduction = "pca",
method = RPCAIntegration,
new.reduction = "integrated.rpca",
normalization.method = "LogNormalize")
end <- Sys.time()
computing_time["rPCA","comptime"] <- end-start
```
### Seurat : [Harmony](https://satijalab.org/seurat/reference/harmonyintegration)
```{r harmony integration}
start <- Sys.time()
TD3A_TDCT_merge <- IntegrateLayers(object = TD3A_TDCT_merge,
orig.reduction = "pca",
method = HarmonyIntegration,
new.reduction = "harmony",
normalization.method = "LogNormalize")
end <- Sys.time()
computing_time["Harmony","comptime"] <- end-start
```
```{r, echo=FALSE}
knitr::kable(
computing_time
)
```
Reductions(object = TD3A_TDCT_merge)
## Computing umap from "corrected" spaces
```{r cca umap, eval=TRUE}
TD3A_TDCT_merge <- RunUMAP(object = TD3A_TDCT_merge,
reduction = "integrated.cca",
dims = 1:20,
reduction.name = "umapcca")
```
```{r rpca umap, eval=TRUE}
TD3A_TDCT_merge <- RunUMAP(object = TD3A_TDCT_merge,
reduction = "integrated.rpca",
dims = 1:20,
reduction.name = "umaprpca")
```
```{r harmony umap, eval=TRUE}
TD3A_TDCT_merge <- RunUMAP(object = TD3A_TDCT_merge,
reduction = "harmony",
dims = 1:20,
reduction.name = "umapharm")
```
```{r}
Reductions(object = TD3A_TDCT_merge)
```
## Effect of the correction/integration
## Visualization of the different methods of integration: Projecting the sample id onto the different umap spaces obtained
```{r, echo=FALSE}
no_corr <- DimPlot(object = TD3A_TDCT_merge, group.by = "orig.ident", reduction = "umap", cols = sample_cols)+
ggtitle("No Correction")+
theme_linedraw()+
theme(plot.title = element_text(face = "bold", hjust = .5))
cca_corr <- DimPlot(object = TD3A_TDCT_merge, group.by = "orig.ident", reduction = "umapcca", cols = sample_cols)+
ggtitle("CCA Integration")+
theme_linedraw()+
theme(plot.title = element_text(face = "bold", hjust = .5))
rpca_corr <- DimPlot(object = TD3A_TDCT_merge, group.by = "orig.ident", reduction = "umaprpca", cols = sample_cols)+
ggtitle("rPCA Integration")+
theme_linedraw()+
theme(plot.title = element_text(face = "bold", hjust = .5))
harm_corr <- DimPlot(object = TD3A_TDCT_merge, group.by = "orig.ident", reduction = "umapharm", cols = sample_cols)+
ggtitle("Harmony Integration")+
theme_linedraw()+
theme(plot.title = element_text(face = "bold", hjust = .5))
```
```{r, fig.width=12, fig.height=9, echo=FALSE}
no_corr + cca_corr + rpca_corr + harm_corr + plot_layout(ncol = 2)
```
### Clustering
##### Graph based on CCA Intgration
```{r, cca clustering, eval=TRUE}
TD3A_TDCT_merge <- FindNeighbors(object = TD3A_TDCT_merge,
reduction = "integrated.cca",
dims = 1:20,
graph.name = "snn_cca")
TD3A_TDCT_merge <- FindClusters(object = TD3A_TDCT_merge,
graph.name = "snn_cca",
resolution = c(.5,.8,1),
algorithm = 4)
```
##### Graph based on rPCA Intgration
```{r rpca clustering, eval=TRUE}
TD3A_TDCT_merge <- FindNeighbors(object = TD3A_TDCT_merge,
reduction = "integrated.rpca",
dims = 1:20,
graph.name = "snn_rpca")
TD3A_TDCT_merge <- FindClusters(object = TD3A_TDCT_merge,
graph.name = "snn_rpca",
resolution = c(.5,.8,1),
algorithm = 4)
```
##### Graph based on Harmony Intgration
```{r, harm clustering, eval=TRUE}
TD3A_TDCT_merge <- FindNeighbors(object = TD3A_TDCT_merge,
reduction = "harmony",
dims = 1:20,
graph.name = "snn_harm")
TD3A_TDCT_merge <- FindClusters(object = TD3A_TDCT_merge,
graph.name = "snn_harm",
resolution = c(.5,.8,1),
algorithm = 4)
```
```{r, fig.width=12, fig.height=4,echo=FALSE}
DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_cca_res.0.5",
reduction = "umapcca",
label = TRUE)+NoLegend() | DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_rpca_res.0.5",
reduction = "umaprpca",
label = TRUE)+NoLegend() | DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_harm_res.0.5",
reduction = "umapharm",
label = TRUE)+NoLegend()
```
```{r, fig.width=12, fig.height=4,echo=FALSE}
DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_cca_res.0.8",
reduction = "umapcca",
label = TRUE)+NoLegend() | DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_rpca_res.0.8",
reduction = "umaprpca",
label = TRUE)+NoLegend() | DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_harm_res.0.8",
reduction = "umapharm",
label = TRUE)+NoLegend()
```
```{r, fig.width=12, fig.height=4,echo=FALSE}
DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_cca_res.1",
reduction = "umapcca",
label = TRUE)+NoLegend() | DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_rpca_res.1",
reduction = "umaprpca",
label = TRUE)+NoLegend() | DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_harm_res.1",
reduction = "umapharm",
label = TRUE)+NoLegend()
```
```{r, fig.width=12, fig.height=5,echo=FALSE}
DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_cca_res.1",
reduction = "umapcca",
label = TRUE,
split.by = "orig.ident")+
theme_linedraw()+
NoLegend()+
theme(strip.text = element_text(size = 16, face = "bold", color = "indianred"),
plot.title = element_text(size = 18, face = "bold", color = "steelblue4", hjust = .5))
```
```{r, fig.width=12, fig.height=5,echo=FALSE}
DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_rpca_res.1",
reduction = "umaprpca",
label = TRUE,
split.by = "orig.ident")+
theme_linedraw()+
NoLegend()+
theme(strip.text = element_text(size = 16, face = "bold", color = "indianred"),
plot.title = element_text(size = 18, face = "bold", color = "steelblue4", hjust = .5))
```
```{r, fig.width=12, fig.height=5,echo=FALSE}
DimPlot(object = TD3A_TDCT_merge,
group.by = "snn_harm_res.1",
reduction = "umapharm",
label = TRUE,
split.by = "orig.ident")+
theme_linedraw()+
NoLegend()+
theme(strip.text = element_text(size = 16, face = "bold", color = "indianred"),
plot.title = element_text(size = 18, face = "bold", color = "steelblue4", hjust = .5))
```
```{r}
table("snn_harm_res.1" = TD3A_TDCT_merge$snn_harm_res.1,
"snn_rpca_res.1" = TD3A_TDCT_merge$snn_rpca_res.1)
```
```{r}
table(TD3A_TDCT_merge$orig.ident, TD3A_TDCT_merge$snn_cca_res.1)
```
```{r}
table(TD3A_TDCT_merge$orig.ident, TD3A_TDCT_merge$snn_harm_res.1)
```
```{r}
TD3A_TDCT_merge <- JoinLayers(object = TD3A_TDCT_merge, assay = "RNA")
```
If we have extra-time
```{r}
Idents(object = TD3A_TDCT_merge) <- "snn_harm_res.1"
harm_res1_markers <- FindAllMarkers(
object = TD3A_TDCT_merge,
only.pos = TRUE,
logfc.threshold = 1
)
```
```{r}
harm_res1_topmarkers <- harm_res1_markers %>%
group_by(cluster) %>%
dplyr::slice_head(n = 20)
```
```{r, fig.width=12, fig.height=24}
DoHeatmap(object = TD3A_TDCT_merge,
features = harm_res1_topmarkers$gene,
group.by = "snn_harm_res.1")
```
# Save the Seurat object
```{r saverds2, warning = FALSE, fold.output = FALSE, fold.plot = FALSE}
## A name for our Seurat object file (rich naming)
out_name <- paste0(
output_dir, "/", paste(
c("Twelve", Seurat::Project(TD3A_TDCT_merge), "S5",
"Integrated", paste(
dim(TD3A_TDCT_merge), collapse = '.'
)
), collapse = "_"),
".RDS")
## Check (I like this)
print(out_name)
```
```{r saverds2b, warning = FALSE, fold.output = FALSE, fold.plot = FALSE}
## Write on disk
saveRDS(object = TD3A_TDCT_merge,
file = out_name)
```
------------------------------------------------------------------------
------------------------------------------------------------------------
# Rsession
```{r}
utils::sessionInfo()
```