EB3I n1 2025 scRNAseq
-
PRE-PROCESSING (I)
-
Load a count matrix, empty droplets & ambient RNA filtering

Question

# chunk_question
## What's the Answer to Life ? The Universe ? Everything ??

Answer 1

# chunk_answer
cat("The Answer to the Great Question... is ... Forty-two. Six by nine : forty-two. That's it. That's all there is. (I always thought something was fundamentally wrong with the universe.)")

Answer 2

# a_r10X2
## Reading the function help page
?Seurat::Read10X

Answer 3

# a_whatis
## To know the type of an R object : methods::is()
?methods::is

Answer 4

# a_struc
## To get a basic structure description : utils::str()
?utils::str
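As a minimal sketch of what these two functions report, the example below applies them to a tiny sparse matrix. The object `toy_mat` and its gene / cell names are hypothetical and only stand in for a real count matrix; it assumes the `Matrix` package (shipped with most R installations) is available.

```r
library(Matrix)

## Hypothetical 3 genes x 2 cells sparse count matrix (illustration only)
toy_mat <- Matrix::sparseMatrix(
  i = c(1, 3, 2),          # row (gene) indices of non-zero entries
  j = c(1, 1, 2),          # column (cell) indices of non-zero entries
  x = c(5, 1, 2),          # the non-zero counts themselves
  dims = c(3, 2),
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2))
)

## Type of the object : its class, followed by the classes it extends
methods::is(toy_mat)

## Basic structure description : slots, dimensions, first values
utils::str(toy_mat)
```

`methods::is()` with a single argument lists the class and its parent classes, whereas `utils::str()` shows how the object is actually stored.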

Answer 5

# a_knee
## . The "cliff" in the kneeplot is located at a
##   rank value slightly below 1,000.
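To make the notion of a "cliff" concrete, here is a rough sketch on simulated data, using base R only. The per-barcode totals are invented (800 cell-like barcodes and 5,000 empty-like barcodes), and locating the cliff as the largest drop between consecutive ranked totals is a simplification chosen for illustration, not the method used by the tools in this session.

```r
set.seed(42)

## Hypothetical total counts per barcode (simulated, illustration only) :
## 800 "cells" with high counts, 5000 "empty droplets" with low counts
totals <- c(stats::rpois(800, lambda = 5000),
            stats::rpois(5000, lambda = 20))

## Rank barcodes by decreasing total counts
ranked <- sort(totals, decreasing = TRUE)

## Simplified cliff detection : largest drop between two consecutive
## ranked totals, measured on a log10 scale
log_tot <- log10(ranked + 1)
cliff_rank <- which.max(-diff(log_tot))

## Kneeplot : rank (X) versus total counts (Y), both on a log scale
plot(seq_along(ranked), ranked + 1, log = "xy", pch = 20,
     xlab = "Barcode rank", ylab = "Total counts + 1")
abline(v = cliff_rank, lty = 2, col = "red")

cliff_rank
```

On real data the curve is much less clear-cut than in this simulation, which is one reason why a simple rank threshold is not enough.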

Answer 6

# a_kneered
## . You may observe that the selection of "true" cells (red)
##   does not strictly correspond to a "cut" in the kneeplot
##   descending curve.
##
## . This is because emptyDrops does not only "follow" the curve
##   to determine a threshold corresponding to a "minimal count"
##   value to consider a barcode as a cell, but also performs a
##   statistical analysis for each barcode, considering how much
##   its expression profile resembles that of other cells,
##   even with a very different (lower) global level of expression.

Answer 7

# a_souprate The best way to know is to try it out =D

Answer 8

# a_descdiff
## . Total counts, max value, counts per cell and number of
##   expressed features per cell, all have decreased.
##
## . The sparsity level has slightly increased (removing the soup
##   obliterated all counts for some features in some cells).
##
## . All of this is absolutely expected.
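The descriptors mentioned above can be sketched on two tiny matrices. Everything here is hypothetical: `toy_pre` and `toy_post` are invented stand-ins for the matrices before and after soup removal, and `describe_counts()` is a helper written for this illustration, not a function from the course or from any package. The `Matrix` package is assumed to be available.

```r
library(Matrix)

## Hypothetical 3 genes x 2 cells matrices (illustration only)
toy_pre  <- Matrix::Matrix(c(5, 1, 0,  2, 0, 3), nrow = 3, sparse = TRUE)
toy_post <- Matrix::Matrix(c(4, 0, 0,  2, 0, 2), nrow = 3, sparse = TRUE)

## Hypothetical helper : a few global descriptors of a count matrix
describe_counts <- function(m) {
  c(total_counts           = sum(m),
    max_value              = max(m),
    ## Sparsity : fraction of entries that are zero
    sparsity               = 1 - Matrix::nnzero(m) / prod(dim(m)),
    mean_counts_per_cell   = mean(Matrix::colSums(m)),
    ## Expressed features : entries with at least one count
    mean_features_per_cell = mean(Matrix::colSums(m > 0)))
}

## Side-by-side comparison, PRE versus POST
round(rbind(pre  = describe_counts(toy_pre),
            post = describe_counts(toy_post)), 2)
```

In this toy case every descriptor decreases except sparsity, which increases, matching the pattern described in the answer.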

Answer 9

# soupfeaturesdiff
## Compute the DIFFERENCE matrix (PRE - POST)
## NOTE : this is a subtraction, not a ratio : there is no
## divider here, so no need to increment both matrices by +1.
cont_dif <- scmat_cells - scmat_unsoup
## Mean count decrease due to soup removal, per feature (ordered)
feat_dif <- sort(sparseMatrixStats::rowMeans2(cont_dif), decreasing = TRUE)
## Display Top 5 features
utils::head(feat_dif, n = 5)
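The same computation can be followed by hand on a toy case. The matrices `toy_cells` and `toy_unsoup` and their gene / cell names are invented for illustration, and `Matrix::rowMeans()` is swapped in for `sparseMatrixStats::rowMeans2()` only so that the sketch runs with the `Matrix` package alone.

```r
library(Matrix)

genes <- c("geneA", "geneB", "geneC")
cells <- c("cell1", "cell2")

## Hypothetical counts before (toy_cells) and after (toy_unsoup) soup removal
toy_cells  <- Matrix::Matrix(c(10, 3, 0,  8, 0, 6), nrow = 3, sparse = TRUE,
                             dimnames = list(genes, cells))
toy_unsoup <- Matrix::Matrix(c( 4, 3, 0,  2, 0, 5), nrow = 3, sparse = TRUE,
                             dimnames = list(genes, cells))

## Difference matrix (PRE - POST) : counts removed, per gene and per cell
toy_dif <- toy_cells - toy_unsoup

## Mean count decrease per feature, most affected features first
toy_feat_dif <- sort(Matrix::rowMeans(toy_dif), decreasing = TRUE)

## Display the top 2 features
utils::head(toy_feat_dif, n = 2)
```

Here geneA loses 6 counts in each cell and therefore ranks first, geneC loses a single count in one cell, and geneB is untouched.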

Answer 10

## Removing objects to free some RAM
rm(cont_dif, feat_dif)

Answer 11

# a_umapsoupx
## . The level of expression measured in cells with lower expression
##   before SoupX has dropped to almost none after : the ambient
##   expression of this gene has been efficiently removed.
##
## . Hopefully, the cluster(s) with higher expression remain(s) high.
##
## . The expression scale upper bound after SoupX is higher than before ?!
##   But we REMOVED some counts ?! Although counter-intuitive, this is
##   a positive consequence of the ambient removal on the scaling of
##   barcodes expression (see later).
##
## . The range of both X and Y axes has increased, depicting a better
##   separation of / distance between some of the clusters.
##
## . The global topology of clusters remains close (but not identical).
##
## . Cluster compactness has increased.
