---
title: "PROCESSING (II) Student's version"
date: "2023-11-05.10"
author:
  - name: "YOU"
    email: "you@you.you"
output:
  html_document:
    highlight: true         ## Theme for the code chunks
    number_sections: true   ## Adds number to headers (sections)
    theme: flatly           ## CSS theme for the HTML page
    toc: false              ## Adds a table of content
    self_contained: true    ## Includes all plots/images within the HTML
    code_folding: show
    thumbnails: false
    lightbox: true
    fig_caption: false
    gallery: true
    use_bookdown: true
always_allow_html: true     ## Allow plain HTML code in the Rmd
---

This file describes the steps of the second part of the data processing for the single-cell RNA-seq data analysis training course of the EBAII n1 2023, covering:

* Dimension reduction of the expression data
* Visualization targeted at a human brain

# Load the latest Seurat object

Start an **RStudio** session.

First, we will set some parameters:

```{r seed}
## Fixed seed
my_seed <- 1337

## Your project name on the IFB cluster (CHANGE IT FOR YOURS !)
my_project <- 'golf'
```

You can load the latest Seurat object you saved as an RDS file at the preceding step (Proc.1: normalization with LN, and scaling using 2000 HVGs):

```{r read_trick}
sobj <- readRDS(file = paste0('/shared/projects/', my_project, '/TD/RESULTS/TD3A_02_Scaled.2K.RDS'))
```

***OR***, _if you did not (or could not) create it earlier, you can load it from here:_

```{r read_safe}
sobj <- readRDS(file = '/shared/projects/2325_ebaii/SingleCell/TD_DATA/DATA_START/DIMENSION_REDUCTION/TD3A_02_Scaled.2K.RDS')
```

# Dimension reduction

This step originates from the observation that we neither want nor need to characterize **each** of our **thousands of cells**, but rather **groups** of them (clusters? cell types? other?).
Thus, we do not need all the data, and may even benefit from such a reduction:

* Reduce the data complexity
    * For interpretation
    * For computations
* Increase the quality of the information contained in the data
    * **Enriching** "good biological **signals**"
    * **Discarding noise** / cell-specific signals

There is a **multitude of methods** for dimension reduction. Here, we will use the grandmother of them all: PCA (Principal Component Analysis).

_But how? (Did you see it coming?)_

```{r h_RunPCA, class.source = "fold-hide"}
?Seurat::RunPCA
```

***Question 1***: How many principal components (PCs) will be generated by default?

***Question 2***: Which data type (i.e., which Seurat object ***slot***) will be used to generate the components?

Perform PCA on our data:

```{r PCA}
sobj <- Seurat::RunPCA(
  object = sobj,
  assay = 'RNA',
  seed.use = my_seed,
  verbose = FALSE)
```

Visualization of the first two components:

```{r PCAplot, fig.align='center'}
Seurat::DimPlot(
  object = sobj,
  reduction = 'pca',
  dims = c(1, 2),
  group.by = 'CC_Seurat_Phase')
```

***Question 1***: Give your interpretation of / feelings about this plot!

***Question 2***: Should we stop at using 2 dimensions to interpret our data?

Description:

```{r PCAdesc}
EBAII.n1.SC.helper::seurat4_descriptor(sobj = sobj, describe = 'dimred')
```

* Maybe we should reduce the information a tad more, just for the sake of...
* ... understanding our data...
* ... with our poooooor human brains...
* ... born and raised in a 3D Euclidean world.

# Visualization

This final processing step, required to finally **observe** our data, calls for yet another dimension reduction method, facing a tough challenge: reducing a space of dozens of dimensions to just 2 or 3!

We will use the UMAP method.

_**BUT HOW?** (I'm pretty sure you saw it coming this time)_

```{r, class.source = "fold-hide"}
?Seurat::RunUMAP
```

## Selecting dimensions

So, we now have to choose the number of PCA dimensions to use for this UMAP reduction.
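As a side illustration of what "variance explained by each component" means, here is a minimal base-R sketch on a **simulated toy matrix** (the sizes, the 3 hidden signals and the noise level are hypothetical choices, not our dataset, and this is not what Seurat does internally). `stats::prcomp()` computes the PCA; because only 3 signals drive the 100 genes, the variance concentrates in the first 3 PCs and then flattens into noise, which is the "elbow" we will look for.

```{r toy_pca_sketch, fig.align='center'}
## Toy illustration only: simulated data, NOT the course dataset
set.seed(1337)

n_cells   <- 300  # hypothetical number of cells (rows)
n_genes   <- 100  # hypothetical number of genes (columns)
n_signals <- 3    # hypothetical number of hidden "biological" signals

## Each cell gets a score on each hidden signal; each gene loads on the signals
scores   <- matrix(rnorm(n_cells * n_signals, sd = 3), nrow = n_cells)
loadings <- matrix(rnorm(n_signals * n_genes), nrow = n_signals)

## Expression = structured signal + cell/gene-specific noise
toy_expr <- scores %*% loadings +
  matrix(rnorm(n_cells * n_genes), nrow = n_cells)

## PCA on centered and scaled genes (roughly mimicking scaled data)
toy_pca <- stats::prcomp(toy_expr, center = TRUE, scale. = TRUE)

## Proportion of the total variance carried by each PC
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)

## Elbow-like plot: standard deviation of each of the first 20 PCs
plot(toy_pca$sdev[1:20], type = 'b',
     xlab = 'PC', ylab = 'Standard deviation')

## Cumulative variance explained by the first 5 PCs
print(round(cumsum(var_explained)[1:5], 3))
```

On the real data, the `Seurat::ElbowPlot()` call used just after relies on the same intuition; the toy numbers say nothing about our dataset.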
***Question***: Do you have an idea of this number?

We will use a very simple graphical method: observing the amount of global variance explained by each component (`Seurat::ElbowPlot` displays the standard deviation of each PC).

```{r h_elbow}
?Seurat::ElbowPlot
```

```{r elbow, fig.align='center'}
Seurat::ElbowPlot(
  object = sobj,
  ndims = 50)
```

***Question***: Any more precise idea?

## Assessing dimensions

To demonstrate the effect of the number of PC dimensions used as input to the UMAP generation, I will perform a DEMO using 4 different PC values: **3, 7, 25 and 49**.

**3 PCs**

```{r umap3}
sobj <- Seurat::RunUMAP(
  object = sobj,
  assay = 'RNA',
  reduction = 'pca',
  dims = 1:3,
  seed.use = my_seed)

Seurat::DimPlot(
  object = sobj,
  reduction = 'umap')
```

**7 PCs**

```{r umap7}
sobj <- Seurat::RunUMAP(
  object = sobj,
  assay = 'RNA',
  reduction = 'pca',
  dims = 1:7,
  seed.use = my_seed)

Seurat::DimPlot(
  object = sobj,
  reduction = 'umap')
```

**25 PCs**

```{r umap25}
sobj <- Seurat::RunUMAP(
  object = sobj,
  assay = 'RNA',
  reduction = 'pca',
  dims = 1:25,
  seed.use = my_seed)

Seurat::DimPlot(
  object = sobj,
  reduction = 'umap')
```

**49 PCs**

```{r umap49}
## OH NO !!
## SOMEONE DELETED THE Seurat::RunUMAP COMMAND OF THAT CHUNK :(
## IT'S UP TO YOU :)

Seurat::DimPlot(
  object = sobj,
  reduction = 'umap')
```

***Question***: Your conclusion?

You can now perform a final UMAP with the PC dimensions of your choice. For the next steps of the training, I used 20 dimensions, but for now, we will stop working on TD3A alone.

_**Towards level 2**: Try creating your own chunk for 20 dims._

# Save

You can now save the results of your hard work:

```{r save}
saveRDS(
  object = sobj,
  file = paste0('/shared/projects/', my_project, '/TD/RESULTS/TD3A_03_DimRed.RDS'),
  compress = 'bzip2')
```

_Rsession_

```{r rsession}
utils::sessionInfo()
```