Setting our variables
## Remote input file
remote.h5ad.file <- 'https://zenodo.org/record/4726927/files/Final_cell_annotated_object.h5ad?download=1'
## Working directory
user.id <- Sys.getenv('USER')
root.dir <- '/shared/projects/sincellte_2022'
data.dir <- paste(c(root.dir, 'Courses', 'Secondary_analyses', 'input', 'trajectory_analysis'), collapse = '/')
work.dir <- paste(c(root.dir, user.id, 'Courses', 'Secondary_analysis', 'trajectory_analysis'), collapse = '/')
## Seurat assay name
assay <- 'RNA'
## Contaminating cell types
contam.types <- c('Macrophages', 'RBC')
## RNG seed (for irlba PCA)
my.seed <- 1337L
## HVGs to keep
nfeatures <- 3000
## 1st reduction method
reduction <- 'pca'
## Number of PCs to produce
pca.dims <- 100
## Number of PCs to retain for 2nd reduction
maxdim <- 30
## Use scaled data instead of unscaled
use.scaled <- TRUE
## Cell type to use as root to orient the trajectory
root.group <- 'DN'
Loading source data
- First, the dataset was retrieved from Zenodo
dir.create(path = work.dir, recursive = TRUE, showWarnings = FALSE)
local.root.file <- paste0(work.dir, '/Original_scanpy_object.h5')
local.h5ad.file <- paste0(local.root.file, 'ad')
download.file(url = h5ad.file, destfile = local.h5ad.file,
method = 'wget', quiet = TRUE)
SeuratDisk::Convert(source = local.h5ad.file, dest = 'h5seurat',
assay = assay, verbose = FALSE, overwrite = TRUE)
- The converted object was loaded into R as a SeuratObject using the SeuratDisk::LoadH5Seurat() function. All optional slots (neighbors, graphs, reductions, images), won’t be loaded as we want to reprocess them.
local.h5s.file <- paste0(local.root.file, 'seurat')
sobj <- SeuratDisk::LoadH5Seurat(file = local.h5s.file, assays = assay,
reductions = FALSE, graphs = FALSE,
neighbors = FALSE, images = FALSE)
Prepping the object to our context
- The downloaded object contains cell-type annotation from the original analysis. From the publication, we learnt that two cell types are not related to T cells, thus considered as contaminants : macrophages and red blood cells (RBC). As their presence in the analysis space can modify the shape of the cells position, thus impact trajectory analysis, we decided to remove them :
sobj <- sobj[, !(sobj$cell_type %in% contam.types)]
- We then follow the Seurat4 canonical pipeline, applying log-normalization to the count data …
sobj <- Seurat::NormalizeData(object = sobj, assay = assay)
- … then scaling. As this object is a merge of 7 experiments (labelled as ‘batch’ in the object metadata), we want to regress this effect.
sobj <- Seurat::ScaleData(object = sobj, assay = assay,
vars.to.regress = c('batch'))
- … then identify HVGs. As we focus on subtypes of a single, main cell type (T-cells) in a differentiation context, we anticipate that we may have limited variation of gene expression (at least compared to, by example, multiple cell types of a differentiated organ). Thus, we increase the default number of features to output from 2,000 to 3,000.
sobj <- Seurat::FindVariableFeatures(object = sobj, assay = assay,
verbose = FALSE, nfeatures = nfeatures)
- … then reducing dimensions (through PCA). We request 100 PCs as output.
sobj <- Seurat::RunPCA(object = sobj, assay = assay, npcs = pca.dims,
reduction.name = reduction, seed.use = my.seed,
verbose = FALSE)
- … then building additional reductions for exploration / visualization (t-SNE, uMAP). After manual exploration, we decided to retain 30 PCs as input.
## TSNE
reduction2t <- paste0(reduction, '.', maxdim, '_tsne')
sobj <- Seurat::RunTSNE(object = sobj, dims = 1:maxdim, assay = assay,
reduction = reduction, seed.use = my.seed,
verbose = FALSE, reduction.name = reduction2t)
## uMAP
reduction2u <- paste0(reduction, '.', maxdim, '_umap')
sobj <- Seurat::RunUMAP(object = sobj, dims = 1:maxdim, assay = assay,
reduction = reduction, seed.use = my.seed,
verbose = FALSE, reduction.name = reduction2u)
Seurat::DimPlot(object = sobj, reduction = reduction2u, group.by = 'cell_type',
seed = my.seed)
- Lastly, we save the processed object (as a RDS file).
saveRDS(object = sobj, file = paste0(work.dir,'/Thymus_neonat_mm_Seurat_ENSEMBL.RDS'),
compress = 'bzip2')