EB3I n1 2025 scRNAseq
-
INTEGRATION
-
EB3I n1 2025 scRNAseq
-
INTEGRATION
-
1 PREAMBLE
1.1 Purpose of this session
This file describes the different steps to perform sixth part of data processing for the single cell RNAseq data analysis training course for the EBAII n1 2024, covering these steps :
- Integration of a pair of samples
1.2 The integration data set
We will still use data from the Paiva et al. publication
While we focused on the KO sample
TD3auntil now, we will integrate it with its wild-type counterpartTDCTTDCTwas processed in a very similar manner (QC, doublers, filtering, LogNormalization, HVGs and scaling) to what you just performed forTD3ABoth samples were halved for their number of cells, for practical reasons in the context of this training (computation speed)
As the purpose is to integrate both samples into a single analytical entity, the objects we will use as input are Seurat objects up to the data scaling and regression (thus, there are no dimension reduction data in these)
2 Start Rstudio
- Using the OpenOnDemand cheat sheet, connect to the OpenOnDemand portal and create a Rstudio session with the right resource requirements.
3 Warm-up
- We set common parameters we will use throughout this session :
## Seed for the RNG (our sole parameter for this session)
my_seed <- 1337L
## Required for our big objects
options(future.globals.maxSize = 1E+09)4 Prepare the data structure
We will do the same as for former steps, just changing the session name :
4.1 Main directory
## Setting the project name
project_name <- "ebaii_sc_teachers" # Do not copy-paste this ! It's MY project !!
## Preparing the path
TD_dir <- paste0("/shared/projects/", project_name, "/SC_TD")
## Creating the root directory
dir.create(path = TD_dir, recursive = TRUE)
## Print the root directory on-screen
print(TD_dir)[1] "/shared/projects/ebaii_sc_teachers/SC_TD"
4.2 Current session
## Creating the session (Preproc.1) directory
session_dir <- paste0(TD_dir, "/06_Integration")
dir.create(path = session_dir, recursive = TRUE)
## Print the session directory on-screen
print(session_dir)[1] "/shared/projects/ebaii_sc_teachers/SC_TD/06_Integration"
5 Load input data
- The data consists into a pair of Seurat objects (one per sample) as RDS, hosted in a Zenodo respository (Id : 14081572)
5.1 Download data
- We will directly retrieve data from Zenodo to your
input_dir:
### Named files (will be used later on !)
TD3A_rds <- "TD3A_Filtered_12508.2018.RDS"
TDCT_rds <- "TDCT_Filtered_12254.1868.RDS"
## Filename(s) to retrieve
toget_files <- c(TD3A_rds,
TDCT_rds)
## Folder to store retrieved files
local_folder <- input_dir
## Use local backup ?
backup <- FALSE
if(backup) message("Using local backup !")
## Force download ?
force <- FALSE
if(force) message("Forcing (re)download !")
## Zenodo ID
zen_id <- '14094752'
### Define remote folder
remote_folder <- if (backup) "/shared/projects/2422_ebaii_n1/atelier_scrnaseq/TD/BACKUP/INTEGRATION/" else paste0("https://zenodo.org/records/", zen_id, "/files/")
### Reconstruct the input paths
remote_path <- paste0(remote_folder, "/", toget_files)
### Reconstruct the output paths
local_path <- paste0(local_folder, "/", toget_files)
## Retrieve files (if they don't exist), in loop
for (tg in seq_along(toget_files)) {
## If the file does not locally exist
if (!file.exists(local_path[tg]) | force) {
## Retrieve data
if(backup) {
file.copy(from = remote_path[tg],
to = local_path[tg])
} else {
download.file(url = remote_path[tg],
destfile = local_path[tg])
}
## Check if downloaded files exist locally
if(file.exists(local_path[tg])) message("\tOK")
} else message(paste0(toget_files[tg], " already downloaded !"))
}TD3A_Filtered_12508.2018.RDS already downloaded !
TDCT_Filtered_12254.1868.RDS already downloaded !
6 Merge datasets
- We simply merge the two objects into a single one :
TD3A_TDCT_merge <- merge(
x = TD3A_so, ## First object to merge
y = TDCT_so, ## Second object to merge
add.cell.ids = c("TD3A", "TDCT"), ## Keep track of sample of origin for cells (optional)
)- We can take a look at its summary
Show output
An object of class Seurat
12926 features across 3886 samples within 1 assay
Active assay: RNA (12926 features, 0 variable features)
4 layers present: counts.TD3A, counts.TDCT, data.TD3A, data.TDCT
[1] 12508 2018
[1] 12254 1868
[1] 12926 3886
6.1 Process the merged dataset
In order to visualize the data in a 2-D space, one needs to process it the way one learnt :
## All steps
TD3A_TDCT_merge <- NormalizeData(object = TD3A_TDCT_merge, verbose = FALSE)
TD3A_TDCT_merge <- FindVariableFeatures(object = TD3A_TDCT_merge, nfeatures = 2000, verbose = FALSE)
TD3A_TDCT_merge <- ScaleData(object = TD3A_TDCT_merge, verbose = FALSE)
TD3A_TDCT_merge <- RunPCA(object = TD3A_TDCT_merge, npcs = 50, verbose = FALSE)Show plot
6.2 Visualization of batch effect
Show plot
Show plot
CCL ??
7 Integration
7.1 Seurat : CCA
7.1.1 CCA integration
7.1.2 Seurat : RPCA
7.1.3 Seurat : Harmony
start <- Sys.time()
TD3A_TDCT_merge <- IntegrateLayers(object = TD3A_TDCT_merge,
orig.reduction = "pca",
method = HarmonyIntegration,
new.reduction = "harmony",
normalization.method = "LogNormalize")
end <- Sys.time()
computing_time["Harmony","comptime"] <- end-startShow output
| comptime | |
|---|---|
| cca | 6.603909 secs |
| rPCA | 8.666372 secs |
| Harmony | 2.732028 secs |
Reductions(object = TD3A_TDCT_merge)
7.2 Computing umap from “corrected” spaces
TD3A_TDCT_merge <- RunUMAP(object = TD3A_TDCT_merge,
reduction = "integrated.cca",
dims = 1:20,
reduction.name = "umapcca")TD3A_TDCT_merge <- RunUMAP(object = TD3A_TDCT_merge,
reduction = "integrated.rpca",
dims = 1:20,
reduction.name = "umaprpca")TD3A_TDCT_merge <- RunUMAP(object = TD3A_TDCT_merge,
reduction = "harmony",
dims = 1:20,
reduction.name = "umapharm")Show output
[1] "pca" "umap" "integrated.cca" "integrated.rpca"
[5] "harmony" "umapcca" "umaprpca" "umapharm"
7.3 Effect of the correction/integration
7.4 Visualization of the different methods of integration: Projecting the sample id onto the different umap spaces obtained
Show plot
7.4.1 Clustering
7.4.1.0.1 Graph based on CCA Intgration
7.4.1.0.2 Graph based on rPCA Intgration
7.4.1.0.3 Graph based on Harmony Intgration
TD3A_TDCT_merge <- FindNeighbors(object = TD3A_TDCT_merge,
reduction = "harmony",
dims = 1:20,
graph.name = "snn_harm")
TD3A_TDCT_merge <- FindClusters(object = TD3A_TDCT_merge,
graph.name = "snn_harm",
resolution = c(.5,.8,1),
algorithm = 4)Show plot
Show plot
Show plot
Show plot
Show plot
Show plot
table("snn_harm_res.1" = TD3A_TDCT_merge$snn_harm_res.1,
"snn_rpca_res.1" = TD3A_TDCT_merge$snn_rpca_res.1)Show output
snn_rpca_res.1
snn_harm_res.1 1 2 3 4 5 6 7 8 9 10 11 12 13
1 448 1 18 124 50 1 0 3 4 0 0 0 0
2 74 0 32 58 72 0 0 294 6 0 0 0 0
3 21 0 454 0 0 0 0 13 4 0 0 0 0
4 0 439 0 2 3 5 0 0 1 0 0 0 12
5 3 60 0 41 303 6 0 1 1 0 0 0 0
6 10 0 0 0 0 373 0 0 1 1 0 0 0
7 5 20 3 224 4 0 0 2 3 0 0 0 0
8 0 0 0 0 0 0 243 0 0 0 0 0 1
9 8 5 6 5 6 1 0 2 116 0 0 0 0
10 0 0 0 0 0 0 82 0 0 0 0 0 0
11 0 0 0 3 0 0 0 2 0 69 0 0 0
12 0 0 0 0 0 1 0 0 0 0 64 0 0
13 0 0 0 0 0 1 0 0 0 1 1 47 0
14 0 0 0 0 0 0 0 0 0 0 0 0 22
Show output
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
TD3A 422 184 226 193 199 239 144 75 113 69 24 59 54 7 10
TDCT 286 326 229 207 197 114 136 168 31 32 58 8 9 43 24
Show output
1 2 3 4 5 6 7 8 9 10 11 12 13 14
TD3A 423 218 303 144 142 190 231 77 130 24 67 56 6 7
TDCT 226 318 189 318 273 195 30 167 19 58 7 9 44 15
If we have extra-time
Idents(object = TD3A_TDCT_merge) <- "snn_harm_res.1"
harm_res1_markers <- FindAllMarkers(
object = TD3A_TDCT_merge,
only.pos = TRUE,
logfc.threshold = 1
)DoHeatmap(object = TD3A_TDCT_merge,
features = harm_res1_topmarkers$gene,
group.by = "snn_harm_res.1")Show plot
8 Save the Seurat object
## A name for our Seurat object file (rich naming)
out_name <- paste0(
output_dir, "/", paste(
c("Twelve", Seurat::Project(TD3A_TDCT_merge), "S5",
"Integrated", paste(
dim(TD3A_TDCT_merge), collapse = '.'
)
), collapse = "_"),
".RDS")
## Check (I like this)
print(out_name)[1] "/shared/projects/ebaii_sc_teachers/SC_TD/06_Integration/RESULTS/Twelve_SeuratProject_S5_Integrated_12926.3886.RDS"
9 Rsession
Show output
R version 4.4.1 (2024-06-14)
Platform: x86_64-conda-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.4.1/lib/libopenblasp-r0.3.29.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Paris
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] future_1.49.0 dplyr_1.1.4 patchwork_1.3.0 ggplot2_3.5.2
[5] Seurat_5.3.0 SeuratObject_5.1.0 sp_2.2-0
loaded via a namespace (and not attached):
[1] deldir_2.0-4 pbapply_1.7-2 gridExtra_2.3
[4] rlang_1.1.6 magrittr_2.0.3 RcppAnnoy_0.0.22
[7] spatstat.geom_3.4-1 matrixStats_1.5.0 ggridges_0.5.6
[10] compiler_4.4.1 png_0.1-8 vctrs_0.6.5
[13] reshape2_1.4.4 stringr_1.5.1 pkgconfig_2.0.3
[16] fastmap_1.2.0 labeling_0.4.3 rmdformats_1.0.4
[19] promises_1.3.2 rmarkdown_2.29 purrr_1.0.4
[22] xfun_0.52 cachem_1.1.0 jsonlite_2.0.0
[25] goftest_1.2-3 later_1.4.2 spatstat.utils_3.1-4
[28] irlba_2.3.5.1 parallel_4.4.1 cluster_2.1.6
[31] R6_2.6.1 ica_1.0-3 spatstat.data_3.1-6
[34] bslib_0.9.0 stringi_1.8.7 RColorBrewer_1.1-3
[37] limma_3.60.6 reticulate_1.42.0 spatstat.univar_3.1-3
[40] parallelly_1.45.0 lmtest_0.9-40 jquerylib_0.1.4
[43] scattermore_1.2 Rcpp_1.0.14 bookdown_0.39
[46] knitr_1.50 tensor_1.5 future.apply_1.11.3
[49] zoo_1.8-14 sctransform_0.4.2 httpuv_1.6.15
[52] Matrix_1.7-3 splines_4.4.1 igraph_2.1.4
[55] tidyselect_1.2.1 abind_1.4-8 rstudioapi_0.17.1
[58] dichromat_2.0-0.1 yaml_2.3.10 spatstat.random_3.4-1
[61] codetools_0.2-20 miniUI_0.1.2 spatstat.explore_3.4-3
[64] listenv_0.9.1 lattice_0.22-6 tibble_3.2.1
[67] plyr_1.8.9 withr_3.0.2 shiny_1.10.0
[70] ROCR_1.0-11 evaluate_1.0.3 Rtsne_0.17
[73] fastDummies_1.7.5 survival_3.7-0 polyclip_1.10-7
[76] fitdistrplus_1.2-2 pillar_1.10.2 leidenbase_0.1.35
[79] KernSmooth_2.23-24 plotly_4.10.4 generics_0.1.4
[82] RcppHNSW_0.6.0 scales_1.4.0 globals_0.18.0
[85] xtable_1.8-4 RhpcBLASctl_0.23-42 glue_1.8.0
[88] lazyeval_0.2.2 tools_4.4.1 data.table_1.17.4
[91] RSpectra_0.16-2 RANN_2.6.2 dotCall64_1.2
[94] cowplot_1.1.3 grid_4.4.1 tidyr_1.3.1
[97] nlme_3.1-165 presto_1.0.0 cli_3.6.5
[100] spatstat.sparse_3.1-0 spam_2.11-1 viridisLite_0.4.2
[103] uwot_0.2.3 gtable_0.3.6 sass_0.4.10
[106] digest_0.6.37 progressr_0.15.1 ggrepel_0.9.6
[109] htmlwidgets_1.6.4 farver_2.1.2 htmltools_0.5.8.1
[112] lifecycle_1.0.4 httr_1.4.7 statmod_1.5.0
[115] mime_0.13 harmony_1.2.3 MASS_7.3-65