suppressPackageStartupMessages({
library(tidyverse)
library(SingleCellExperiment)
library(scater)
library(velociraptor)
library(pheatmap)
library(ggplot2)
library(viridis)
})sincellTE 2024 - RNA velocity analysis with velociraptor
Objectives
RNA velocity is a popular single cell trajectory inference method. The objective of this session is to get an overview of the basic principles of performing RNA velocity analysis in R, using the velociraptor package, which provides a wrapper around the scVelo python package.
1. Data preparation
For this session, we will use an example data set from Hermann et al. (2018) about spermatogenesis https://doi.org/10.1016/j.celrep.2018.10.026.
For RNA velocity, spliced and unspliced RNA count matrices are needed. They can be obtained by making sure that introns are kept when running cellranger (--include-introns = TRUE) then performing velocyto on the output bam file. This will provide a .loom file containing expression matrices for both spliced and unspliced RNA.
Then, the data can be loaded into R and analysed. One problem appears: velociraptor takes SingleCellExperiment objects as input and is not compatible with Seurat object.
Just like Seurat objects, SingleCellExperiment class is a lightweight Bioconductor container for storing and manipulating single-cell genomics data. Its organization and syntax is just a little different.
In order to avoid confusion, we will not go into the details. If you need to use this class in the future, you can find documentation here: https://www.bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html
Besides, the single cell data need to be prepared for RNA velocity analysis, according to the steps we already know: filtering, normalization, identification of highly variable genes, dimensionality reduction… Several strategies are possible:
- use packages compatible with
SingleCellExperimentto prepare the data. The following code illustrates these steps but will not be run.
library(scRNAseq)
library(scuttle)
library(scater)
library(scran)
library(LoomExperiment)
# Import velocyto output
loom <- import("path/to/file.loom")
assayNames(loom)
# Extract the matrices from the appropriate assays (adjust the assay names as needed)
count_matrix <- assay(loom, "matrix") # This is typically the raw count matrix
spliced_matrix <- assay(loom, "spliced") # The spliced matrix (if present)
unspliced_matrix <- assay(loom, "unspliced") # The unspliced matrix (if present)
# Check the dimensions of the expression matrix
dim(count_matrix)
dim(spliced_matrix)
dim(unspliced_matrix)
# Create SingleCellExperiment object
sce <- SingleCellExperiment(
assays = list(
counts = count_matrix,
spliced = spliced_matrix,
unspliced = unspliced_matrix
)
)
cell_metadata <- colData(loom)
colData(sce) <- cell_metadata
gene_metadata <- rowData(loom)
rowData(sce) <- gene_metadata
# Inspect the object
sce
## Calculate log-normalized counts
sce <- scuttle::logNormCounts(sce)
## Find highly variable genes
dec <- scran::modelGeneVar(sce)
top.hvgs <- scran::getTopHVGs(dec, n = 2000)
## Perform dimensionality reduction
set.seed(100)
sce <- scater::runPCA(sce, subset_row = top.hvgs)
sce <- scater::runTSNE(sce, dimred = "PCA")
sce <- scater::runUMAP(sce, dimred = "PCA", n_neighbors = 30)let
velociraptorprepare the data automatically. The user will have less freedom regarding the option values.analyse the data with Seurat and construct a
SingleCellExperimentobject with the results.
For this session, we will load and visualize data already prepared.
user.id <- Sys.getenv("USER")
input_file <-paste("/shared/home2", user.id, "velocity/data/hermann.rds", sep = "/")
## load SingleCellExperiment object
sce <- readRDS(input_file)
## Plot tSNE representation
scater::plotTSNE(sce, colour_by = "celltype")scater::plotUMAP(sce, colour_by = "celltype")2. RNA velocity estimation
Next, we will perform the RNA velocity estimation.
In velociraptor, this is done using the scvelo() function, which provides a wrapper around the functionality of scVelo.
Take a moment to study the documentation of scvelo() - you will see for example that all the modes of scVelo (steady-state, stochastic, dynamical) can be used.
For our example, we will use the dynamical mode. It takes a bit longer to run than the steady state option, but typically gives more reliable results.
# too long, do not run
top.hvgs <- scran::getTopHVGs(dec, n = 2000)
velo.out <- velociraptor::scvelo(sce, subset.row = top.hvgs,
assay.X = "counts",
mode = "dynamical")results_file <-paste("/shared/home2", user.id, "velocity/data/velo.out.rds", sep = "/")
velo.out <- readRDS(results_file)The argument use.theirs (default = FALSE) can be set to TRUE to perform gene filtering and normalization with scVelo.
The scvelo() function performs the full RNA velocity estimation analysis, and in the process calls several functions from scVelo. Many steps are computed during the function execution:
- compute neighbors and moments
- estimate the kinetics parameters
- deduce the velocity (i.e. infer the future cell state)
- estimate velocity pseudotime and latent time. Both estimates try to provide a relative ordering of the cells, only the computation method is different