This file describes the different steps to perform first part of data processing for the single cell RNAseq data analysis training course for the EBAII n1 2023, covering these steps :
First, we will set some parameters
## Fixed seed
my_seed <- 1337
## Your project name on the IFB cluster (CHANGE IT FOR YOURS !)
my_project <- 'golf'You can load the latest Seurat object you saved as a RDS at the preceding step (Proc.1 : normalization with LN, and scaling using 2000 HVGs) :
This step originates from the observation that we do not want nor need to characterize each of our thousends of cells, but groups of them (clusters ? cell types ? other ?). Thus, we do no need all data, and even may benefit from such a reduction :
Here, we will use the grand-mother of all : the PCA (Principal Component Analysis)
But how ? (did you see it coming ?)
Question 1 : How many principal components (PC) will be generated by default ?
## . The answer is 50 (npcs parameter)
## . We will use this default value.
## . Warning : in some (rare) contexts, this
##   may not be enough !Question 2 : Which data type (ie, which Seurat object slot) will be used to generate the components ?
## . Data from the @scale.data will be used
## . This is unfortunately not explicit
##   from the Seurat::RunPCA help page !Perform PCA on our data
Visualization of the very first two components :
Question 1 : Give me your interpretation / feelings from this plot !
Question 2 : Should we stop at using 2 dimensions to interpret our data ?
Description :
OBJECT VERSION :    5.0.0 
PROJECT :   [TD3A] 
[DIMREDS]
   DIMRED 1 : [pca]  Dims:[4038 x 50] 
This final processing step required to finaly observe our data requires a novel dimension reduction method with a very high challeng to overcome : reduce a space of dozens of dimensions to just 2, or 3 !
We will use the UMAP method.
BUT HOW ? (I’m pretty sure you saw it coming this time)
So, we now have to choose the number of PCA dimensions to use for this UMAP reduction.
Question : Do you have an idea of this number ?
## . Impossible to guess with our current knowledge.
## . But we can get some help from the PCA data itself
## . If you said a value above the 50 components we
##   generated for our PCA, you should wear the
##   cone of shame !We will use a very simple graphical method : the observation of the amount of global variance explained by each component.
Question : Any more precise idea ?
To demonstrate the effect of the number of PC dimensions used as input to the UMAP generation, I will perform a DEMO using 4 different PC values : 3, 7, 23 and 49.
Question : Your conclusion ?
## . Very few PCs are not able to retrieve
##   a sufficiently defined structure.
## . The differences between 25 and 49 are
##   limited in the global structure.
##   This may imply that the additional
##   components above 25 do not add enough
##   noise to alter this structure.You can now perform a final UMAP with the PC dimensions of your choice.
For the next steps of the training, I used 20 dimensions.
You can now save the results of your hard work :
While by default Seurat::RunUMAP will produce 2-dimension reductions, the method can generate further components.
Despite our limited brain, this is sometimes interesting and useful to attempt a reduction to 3 dimensions instead of 2. This can be very effective when looking for trajectories.
We can generate a UMAP from 25 PCs, requesting 3 UMAP components :
## UMAP from 25 PCs, 3 components requested
sobj <- Seurat::RunUMAP(
  object = sobj, assay = 'RNA', 
  graph.name = 'RNA_snn', 
  reduction = 'pca', dims = 1:25, 
  seed.use = my_seed,
  n.components = 3)
## DimPlot of the first 2 UMAP components
Seurat::DimPlot(
  object = sobj, 
  reduction = 'umap')Question : Isn’t there something striking ?
## . The plot is not the same as when
##   using 25 PCs and requesting 2 UMAP
##   components instead of 3 here !
## . The 2 components of a 2D UMAP are not
##   the same as the two first components
##   of a dim>2 UMAP.Let’s perform a 3D representation of our UMAP
## Structure data to plot in a data.frame
df3d <- as.data.frame(
  sobj@reductions$umap@cell.embeddings)
## 3D plot
plotly::plot_ly(
  data = df3d, x = ~UMAP_1, 
  y = ~UMAP_2, z = ~UMAP_3, 
  type = 'scatter3d', 
  marker = list(size = 2, width=2))Rsession
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
time zone: Europe/Paris
tzcode source: system (glibc)
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] Biobase_2.62.0      BiocGenerics_0.48.0
loaded via a namespace (and not attached):
  [1] matrixStats_1.0.0             spatstat.sparse_3.0-3        
  [3] bitops_1.0-7                  webshot_0.5.5                
  [5] doParallel_1.0.17             httr_1.4.7                   
  [7] RColorBrewer_1.1-3            Rgraphviz_2.46.0             
  [9] tools_4.3.1                   sctransform_0.4.1            
 [11] backports_1.4.1               DT_0.30                      
 [13] utf8_1.2.4                    R6_2.5.1                     
 [15] lazyeval_0.2.2                uwot_0.1.16                  
 [17] GetoptLong_1.0.5              withr_2.5.2                  
 [19] sp_2.1-1                      prettyunits_1.2.0            
 [21] gridExtra_2.3                 progressr_0.14.0             
 [23] cli_3.6.1                     spatstat.explore_3.2-5       
 [25] TSP_1.2-4                     topGO_2.54.0                 
 [27] labeling_0.4.3                sass_0.4.7                   
 [29] Seurat_4.4.0                  spatstat.data_3.0-3          
 [31] genefilter_1.84.0             ggridges_0.5.4               
 [33] pbapply_1.7-2                 foreign_0.8-85               
 [35] AnnotationForge_1.44.0        parallelly_1.36.0            
 [37] limma_3.58.1                  rstudioapi_0.15.0            
 [39] RSQLite_2.3.2                 shape_1.4.6                  
 [41] GOstats_2.68.0                generics_0.1.3               
 [43] crosstalk_1.2.0               ica_1.0-3                    
 [45] spatstat.random_3.2-1         dendextend_1.17.1            
 [47] dplyr_1.1.3                   GO.db_3.18.0                 
 [49] Matrix_1.6-1.1                fansi_1.0.5                  
 [51] S4Vectors_0.40.1              abind_1.4-5                  
 [53] lifecycle_1.0.3               SoupX_1.6.2                  
 [55] yaml_2.3.7                    edgeR_4.0.1                  
 [57] SummarizedExperiment_1.30.2   SparseArray_1.2.0            
 [59] BiocFileCache_2.10.1          Rtsne_0.16                   
 [61] grid_4.3.1                    blob_1.2.4                   
 [63] promises_1.2.1                dqrng_0.3.1                  
 [65] ExperimentHub_2.10.0          crayon_1.5.2                 
 [67] shinydashboard_0.7.2          miniUI_0.1.1.1               
 [69] lattice_0.22-5                beachmat_2.18.0              
 [71] cowplot_1.1.1                 annotate_1.80.0              
 [73] KEGGREST_1.42.0               ComplexHeatmap_2.18.0        
 [75] pillar_1.9.0                  knitr_1.45                   
 [77] metapod_1.10.0                GenomicRanges_1.54.1         
 [79] rjson_0.2.21                  future.apply_1.11.0          
 [81] codetools_0.2-19              leiden_0.4.3                 
 [83] glue_1.6.2                    data.table_1.14.8            
 [85] vctrs_0.6.4                   png_0.1-8                    
 [87] spam_2.10-0                   gtable_0.3.4                 
 [89] assertthat_0.2.1              cachem_1.0.8                 
 [91] xfun_0.41                     pcaExplorer_2.28.0           
 [93] S4Arrays_1.2.0                mime_0.12                    
 [95] survival_3.5-7                pheatmap_1.0.12              
 [97] seriation_1.5.1               SingleCellExperiment_1.22.0  
 [99] iterators_1.0.14              statmod_1.5.0                
[101] bluster_1.12.0                interactiveDisplayBase_1.40.0
[103] ellipsis_0.3.2                fitdistrplus_1.1-11          
[105] ROCR_1.0-11                   Category_2.68.0              
[107] nlme_3.1-163                  bit64_4.0.5                  
[109] threejs_0.3.3                 progress_1.2.2               
[111] filelock_1.0.2                RcppAnnoy_0.0.21             
[113] GenomeInfoDb_1.38.0           bslib_0.5.1                  
[115] irlba_2.3.5.1                 KernSmooth_2.23-22           
[117] rpart_4.1.21                  colorspace_2.1-0             
[119] DBI_1.1.3                     Hmisc_5.1-1                  
[121] celldex_1.12.0                nnet_7.3-19                  
[123] DESeq2_1.42.0                 tidyselect_1.2.0             
[125] bit_4.0.5                     compiler_4.3.1               
[127] curl_5.1.0                    graph_1.80.0                 
[129] htmlTable_2.4.2               BiocNeighbors_1.20.0         
[131] SparseM_1.81                  xml2_1.3.5                   
[133] DelayedArray_0.28.0           plotly_4.10.3                
[135] checkmate_2.2.0               scales_1.2.1                 
[137] lmtest_0.9-40                 RBGL_1.78.0                  
[139] NMF_0.26                      rappdirs_0.3.3               
[141] stringr_1.5.0                 digest_0.6.33                
[143] goftest_1.2-3                 shinyBS_0.61.1               
[145] spatstat.utils_3.0-4          rmarkdown_2.25               
[147] ca_0.71.1                     XVector_0.42.0               
[149] htmltools_0.5.6.1             pkgconfig_2.0.3              
[151] base64enc_0.1-3               SingleR_2.4.0                
[153] sparseMatrixStats_1.14.0      MatrixGenerics_1.14.0        
[155] highr_0.10                    dbplyr_2.4.0                 
[157] fastmap_1.1.1                 GlobalOptions_0.1.2          
[159] rlang_1.1.1                   htmlwidgets_1.6.2            
[161] shiny_1.7.5.1                 DelayedMatrixStats_1.24.0    
[163] farver_2.1.1                  jquerylib_0.1.4              
[165] zoo_1.8-12                    jsonlite_1.8.7               
[167] BiocParallel_1.36.0           BiocSingular_1.18.0          
[169] RCurl_1.98-1.12               magrittr_2.0.3               
[171] Formula_1.2-5                 scuttle_1.12.0               
[173] GenomeInfoDbData_1.2.11       dotCall64_1.1-0              
[175] patchwork_1.1.3               munsell_0.5.0                
[177] Rcpp_1.0.11                   viridis_0.6.4                
[179] reticulate_1.34.0             stringi_1.7.12               
[181] zlibbioc_1.48.0               MASS_7.3-60                  
[183] AnnotationHub_3.10.0          plyr_1.8.9                   
[185] parallel_4.3.1                listenv_0.9.0                
[187] ggrepel_0.9.4                 deldir_1.0-9                 
[189] Biostrings_2.70.1             splines_4.3.1                
[191] tensor_1.5                    circlize_0.4.15              
[193] hms_1.1.3                     locfit_1.5-9.8               
[195] igraph_1.5.1                  spatstat.geom_3.2-7          
[197] rngtools_1.5.2                reshape2_1.4.4               
[199] biomaRt_2.58.0                stats4_4.3.1                 
[201] ScaledMatrix_1.10.0           BiocVersion_3.18.0           
[203] XML_3.99-0.14                 evaluate_0.23                
[205] SeuratObject_5.0.0            BiocManager_1.30.22          
[207] scran_1.30.0                  foreach_1.5.2                
[209] httpuv_1.6.12                 RANN_2.6.1                   
[211] tidyr_1.3.0                   purrr_1.0.2                  
[213] polyclip_1.10-6               clue_0.3-65                  
[215] heatmaply_1.5.0               future_1.33.0                
[217] scattermore_1.2               ggplot2_3.4.4                
[219] gridBase_0.4-7                rsvd_1.0.5                   
[221] xtable_1.8-4                  later_1.3.1                  
[223] viridisLite_0.4.2             tibble_3.2.1                 
[225] EBAII.n1.SC.helper_0.0.2      registry_0.5-1               
[227] memoise_2.0.1                 AnnotationDbi_1.64.0         
[229] IRanges_2.36.0                cluster_2.1.4                
[231] globals_0.16.2                GSEABase_1.64.0              
[233] shinyAce_0.4.2