Allows to hide the TOC by default, display it with a button, move it to the right or left of the page



Load the latest Seurat object

Start a Rstudio session


Load the latest Seurat objects saved as RDS.

data_dir <- '/shared/projects/2325_ebaii/SingleCell/'
s1 <- readRDS(paste0(data_dir, 'TD_DATA/DATA_START/INTEGRATION/TD3A_LogNorm.RDS'))
s2 <- readRDS(paste0(data_dir, 'TD_DATA/DATA_START/INTEGRATION/TDCT_LogNorm.RDS'))

Have a look at the description of the data

EBAII.n1.SC.helper::seurat4_descriptor(sobj = s1, describe = "assay")
OBJECT VERSION :    5.0.0 
PROJECT :   [TD3A] 

[ASSAYS]
   ASSAY 1 :    [RNA] [ACTIVE] 
      SLOT 1 :  [counts]    Dims:[12508 x 4038]  Range:[0.00-1588.00] 
         Counts :   14263194 
         Sparsity : 86.87712% 
      SLOT 2 :  [data]  Dims:[12508 x 4038]  Range:[0.00-7.71] 
         Sparsity : 86.87712% 
      SLOT 3 :  [scale.data]    Dims:[0 x 0] 
EBAII.n1.SC.helper::seurat4_descriptor(sobj = s2, describe = "assay")
OBJECT VERSION :    5.0.0 
PROJECT :   [TDCT] 

[ASSAYS]
   ASSAY 1 :    [RNA] [ACTIVE] 
      SLOT 1 :  [counts]    Dims:[12254 x 3671]  Range:[0.00-1010.00] 
         Counts :   15936670 
         Sparsity : 87.47972% 
      SLOT 2 :  [data]  Dims:[12254 x 3671]  Range:[0.00-7.20] 
         Sparsity : 87.47972% 
      SLOT 3 :  [scale.data]    Dims:[0 x 0] 

Evaluation of batch effect

First, we will merge (concatenate) the data.

sobj <- merge(s1, s2, merge.data = TRUE)

dim(s1)
[1] 12508  4038
dim(s2)
[1] 12254  3671
dim(sobj)
[1] 12926  7709
rm(s1, s2)

We will have a look at their low dimension representation. With the help of the quick visualization function, we will plot UMAP and color the cells according to their sample of origin.

VIZ <- EBAII.n1.SC.helper::QnD_viz(sobj = sobj, assay = 'RNA', slot = 'data', group_by = 'orig.ident', return_object = TRUE)

VIZ
An object of class Seurat 
12926 features across 7709 samples within 1 assay 
Active assay: RNA (12926 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

Cells are clearly separated depending on their sample of origin. There is a batch effect that needs to be corrected.

Integration with Harmony

Scaling and bias regression

Harmony works on normalized and scaled data. We will need to scale the new merged object. Scaling is also the good time to try to correct technical sources of biases.

Indeed, some technical factors can bias the data: they will affect the UMAP, the future clustering…

These technical factors can be the number of UMIs(nCount_RNA), the percent of mitochondrial RNA… All the factor that we see in the metadata table could be a source of bias.

head(sobj@meta.data)
                     orig.ident nCount_RNA nFeature_RNA log10_nCount_RNA
AAACCTGAGACGCTTT.1_1       TD3A       2807         1588         3.449324
AAACCTGAGGCATTGG.1_1       TD3A       2072         1365         3.316599
AAACCTGGTCAACATC.1_1       TD3A       2024         1340         3.306639
AAACCTGTCGAGGTAG.1_1       TD3A       1877         1241         3.273696
AAACCTGTCGATCCCT.1_1       TD3A       2213         1438         3.345766
AAACGGGCACTTACGA.1_1       TD3A       2445         1428         3.388456
                     percent_mt percent_rb percent_st CC_Seurat_S.Score
AAACCTGAGACGCTTT.1_1 0.01670814 0.12193388 0.02950587        -0.3206662
AAACCTGAGGCATTGG.1_1 0.03426641 0.05260618 0.02702703        -0.2089766
AAACCTGGTCAACATC.1_1 0.01975309 0.08592593 0.02419753        -0.2021102
AAACCTGTCGAGGTAG.1_1 0.02930208 0.08044752 0.03036761        -0.2546360
AAACCTGTCGATCCCT.1_1 0.03203971 0.07626354 0.02933213        -0.2495052
AAACGGGCACTTACGA.1_1 0.03312883 0.08957055 0.03353783        -0.3101915
                     CC_Seurat_G2M.Score CC_Seurat_Phase CC_Seurat_SmG2M.Score
AAACCTGAGACGCTTT.1_1         -0.14298427              G1           -0.17768197
AAACCTGAGGCATTGG.1_1         -0.15818260              G1           -0.05079398
AAACCTGGTCAACATC.1_1         -0.17575067              G1           -0.02635950
AAACCTGTCGAGGTAG.1_1         -0.01628821              G1           -0.23834777
AAACCTGTCGATCCCT.1_1         -0.13375529              G1           -0.11574990
AAACGGGCACTTACGA.1_1         -0.18062013              G1           -0.12957140
                     doublet_scds.hybrid doublet_scDblFinder doublet_union
AAACCTGAGACGCTTT.1_1               FALSE               FALSE         FALSE
AAACCTGAGGCATTGG.1_1               FALSE               FALSE         FALSE
AAACCTGGTCAACATC.1_1               FALSE               FALSE         FALSE
AAACCTGTCGAGGTAG.1_1               FALSE               FALSE         FALSE
AAACCTGTCGATCCCT.1_1               FALSE               FALSE         FALSE
AAACGGGCACTTACGA.1_1               FALSE               FALSE         FALSE
                     doublet_viz
AAACCTGAGACGCTTT.1_1     singlet
AAACCTGAGGCATTGG.1_1     singlet
AAACCTGGTCAACATC.1_1     singlet
AAACCTGTCGAGGTAG.1_1     singlet
AAACCTGTCGATCCCT.1_1     singlet
AAACGGGCACTTACGA.1_1     singlet

We will check if this is the case with a helper function

EBAII.n1.SC.helper::dimred_covar_cor(sobj = VIZ, dimred = 'pca', ncomp = 20)

The heatmap shows that the contribution of the technical factors to the PCA dimension. The strength of this contribution is indicated by the intensity of the blue color.

Ideally, if the factor is not a source of bias, its contribution to all dimensions is roughly the same. If the factor contributes a lot to a few dimensions, we should consider regressing it out.

Here we will need to remove the influence of these technical factor. We say that we regress out some variables. This is done within the ScaleData() function.


! technical issue: this step take time, we will not run the following command. We will load a scaled object instead.

## technical issue: on the cluster it take too much time, we will not run this command
## we will load a preconstructed file instead.
vtr = c('nCount_RNA', 'percent_rb')
sobj <- Seurat::ScaleData(object = sobj, assay = 'RNA', nfeatures = 2000, vars.to.regress = vtr)
sobj <- Seurat::FindVariableFeatures(sobj)

saveRDS(sobj, paste0(data_dir, 'TD_DATA/DATA_START/INTEGRATION/object_to_integrate.RDS'))
sobj <- readRDS(paste0(data_dir, 'TD_DATA/DATA_START/INTEGRATION/object_to_integrate.RDS'))
sobj <- Seurat::RunPCA(object = sobj, npcs = 50, seed.use = my_seed, verbose = FALSE)
EBAII.n1.SC.helper::dimred_covar_cor(sobj = sobj, dimred = 'pca', ncomp = 20)