suppressPackageStartupMessages({
library(tidyverse)
library(SingleCellExperiment)
library(scater)
library(velociraptor)
library(pheatmap)
library(ggplot2)
library(viridis)
})
sincellTE 2024 - RNA velocity analysis with velociraptor
Objectives
RNA velocity is a popular single cell trajectory inference method. The objective of this session is to get an overview of the basic principles of performing RNA velocity analysis in R, using the velociraptor package, which provides a wrapper around the scVelo python package.
1. Data preparation
For this session, we will use an example data set from Hermann et al. (2018) about spermatogenesis https://doi.org/10.1016/j.celrep.2018.10.026.
For RNA velocity, spliced and unspliced RNA count matrices are needed. They can be obtained by making sure that introns are kept when running cellranger (--include-introns = TRUE
) then performing velocyto on the output bam file. This will provide a .loom
file containing expression matrices for both spliced and unspliced RNA.
Then, the data can be loaded into R and analysed. One problem appears: velociraptor
takes SingleCellExperiment
objects as input and is not compatible with Seurat
object.
Besides, the single cell data need to be prepared for RNA velocity analysis, according to the steps we already know: filtering, normalization, identification of highly variable genes, dimensionality reduction… Several strategies are possible:
- use packages compatible with
SingleCellExperiment
to prepare the data. The following code illustrates these steps but will not be run.
library(scRNAseq)
library(scuttle)
library(scater)
library(scran)
library(LoomExperiment)
# Import velocyto output
<- import("path/to/file.loom")
loom
assayNames(loom)
# Extract the matrices from the appropriate assays (adjust the assay names as needed)
<- assay(loom, "matrix") # This is typically the raw count matrix
count_matrix <- assay(loom, "spliced") # The spliced matrix (if present)
spliced_matrix <- assay(loom, "unspliced") # The unspliced matrix (if present)
unspliced_matrix
# Check the dimensions of the expression matrix
dim(count_matrix)
dim(spliced_matrix)
dim(unspliced_matrix)
# Create SingleCellExperiment object
<- SingleCellExperiment(
sce assays = list(
counts = count_matrix,
spliced = spliced_matrix,
unspliced = unspliced_matrix
)
)
<- colData(loom)
cell_metadata colData(sce) <- cell_metadata
<- rowData(loom)
gene_metadata rowData(sce) <- gene_metadata
# Inspect the object
sce
## Calculate log-normalized counts
<- scuttle::logNormCounts(sce)
sce
## Find highly variable genes
<- scran::modelGeneVar(sce)
dec <- scran::getTopHVGs(dec, n = 2000)
top.hvgs
## Perform dimensionality reduction
set.seed(100)
<- scater::runPCA(sce, subset_row = top.hvgs)
sce <- scater::runTSNE(sce, dimred = "PCA")
sce <- scater::runUMAP(sce, dimred = "PCA", n_neighbors = 30) sce
let
velociraptor
prepare the data automatically. The user will have less freedom regarding the option values.analyse the data with Seurat and construct a
SingleCellExperiment
object with the results.
For this session, we will load and visualize data already prepared.
<- Sys.getenv("USER")
user.id <-paste("/shared/home2", user.id, "velocity/data/hermann.rds", sep = "/")
input_file
## load SingleCellExperiment object
<- readRDS(input_file)
sce
## Plot tSNE representation
::plotTSNE(sce, colour_by = "celltype") scater
::plotUMAP(sce, colour_by = "celltype") scater
2. RNA velocity estimation
Next, we will perform the RNA velocity estimation.
In velociraptor
, this is done using the scvelo()
function, which provides a wrapper around the functionality of scVelo
.
Take a moment to study the documentation of scvelo()
- you will see for example that all the modes of scVelo (steady-state, stochastic, dynamical) can be used.
For our example, we will use the dynamical mode. It takes a bit longer to run than the steady state option, but typically gives more reliable results.
# too long, do not run
<- scran::getTopHVGs(dec, n = 2000)
top.hvgs
<- velociraptor::scvelo(sce, subset.row = top.hvgs,
velo.out assay.X = "counts",
mode = "dynamical")
<-paste("/shared/home2", user.id, "velocity/data/velo.out.rds", sep = "/")
results_file <- readRDS(results_file) velo.out
The scvelo()
function performs the full RNA velocity estimation analysis, and in the process calls several functions from scVelo. Many steps are computed during the function execution:
- compute neighbors and moments
- estimate the kinetics parameters
- deduce the velocity (i.e. infer the future cell state)
- estimate velocity pseudotime and latent time. Both estimates try to provide a relative ordering of the cells, only the computation method is different
Its output of scvelo()
is another SingleCellExperiment
object.
velo.out
class: SingleCellExperiment
dim: 2000 2325
metadata(5): neighbors recover_dynamics velocity_params velocity_graph
velocity_graph_neg
assays(10): X spliced ... velocity velocity_u
rownames(2000): ENSMUSG00000038015.6 ENSMUSG00000022501.6 ...
ENSMUSG00000033813.15 ENSMUSG00000029088.16
rowData names(18): fit_r2 fit_alpha ... velocity_genes varm
colnames(2325): CCCATACTCCGAAGAG AATCCAGTCATCTGCC ... ATCCACCCACCACCAG
ATTGGTGGTTACCGAT
colData names(8): velocity_self_transition root_cells ...
velocity_confidence velocity_confidence_transition
reducedDimNames(1): X_pca
mainExpName: NULL
altExpNames(0):
The results of scvelo()
are not added to the input SingleCellExperiment
object, since the two objects can have different dimensions (in particular note the difference in the number of genes). Thus, in order to use information from the velocity calculations together with information from the original object, we need to copy slots from one object to the other.
Here, we copy the reduced dimension representations from the original SingleCellExperiment
object to the output of scvelo()
, and use the latter to construct tSNE representations colored by various parameters.
## copy dimension reduction into scvelo output
reducedDims(velo.out) <- reducedDims(sce)
## copy cell annotation into scvelo output
$celltype <- sce$celltype velo.out
3. Velocity plots
3.1 Trajectory overview
The results of the velocity estimation can be plotted on the previous low-dimension representation of the data
## plot scvelo results
::plot_grid(
cowplot::plotTSNE(velo.out, colour_by = "celltype"),
scater::plotTSNE(velo.out, colour_by = "velocity_pseudotime"),
scater::plotTSNE(velo.out, colour_by = "latent_time"),
scaterncol = 1, align = "v"
)
The velociraptor package also provides plotting functions to visualize the RNA velocity results.
For example, we can embed the estimated velocities into a provided low-dimensional space and plot them as a vector field.
<- velociraptor::embedVelocity(reducedDim(velo.out, "TSNE"), velo.out) embedded
ℹ Using the 'X' assay as the X matrix
computing velocity embedding
finished (0:00:00) --> added
'velocity_target', embedded velocity vectors (adata.obsm)
<- velociraptor::gridVectors(reducedDim(velo.out, "TSNE"), embedded)
grid.df
plotTSNE(velo.out, dimred = "TSNE", colour_by = "velocity_pseudotime") +
geom_segment(data = grid.df,
mapping = aes(x = start.TSNE1, y = start.TSNE2,
xend = end.TSNE1, yend = end.TSNE2),
arrow = arrow(length = unit(0.05, "inches")))
3.2 Quality checks
Next, we will look at individual genes via their phase portraits. This is often useful in order to understand how the velocities are affected/supported by specific genes.
We can of course plot genes that we are already familiar with and that we know are related to the process of interest. We can also extract genes with particularly strong influence on the velocity results. ‘Driver genes,’ which display a strong dynamic behavior, can for example be detected via their high likelihoods in the dynamic model.
A score called likelihood value of the fit (fit_likelihood
) indicates how well a cell is described by the learned phase trajectory. Averaged across all the cells, it can be used to estimate the quality of a gene in the velocity estimation. Here, we select the genes with the largest and smallest likelihood value for the fit and plot their phase portrait.
<- rowData(velo.out) %>%
toplh as.data.frame() %>%
arrange(desc(fit_likelihood)) %>%
slice_head(n=100) %>%
rownames()
plotVelocity(velo.out, use.dimred = "TSNE", genes = toplh[1:6],
color_by = "celltype")
<- rowData(velo.out) %>%
botlh as.data.frame() %>%
arrange(fit_likelihood) %>%
slice_head(n=3) %>%
rownames()
plotVelocity(velo.out, use.dimred = "TSNE", genes = botlh,
color_by = "celltype")
Warning: Removed 10 rows containing missing values or values outside the scale range
(`geom_point()`).
We will also illustrate two summary statistics returned by scVelo:
the length of the velocity vector encodes the speed or rate of differentiation.
the ‘velocity confidence’ provides a measure of the coherence of the velocity vector field, that is, how well the velocity vector for a cell correlates with those of its neighbors.
::plot_grid(
cowplot::plotTSNE(velo.out, colour_by = "velocity_length"),
scater::plotTSNE(velo.out, colour_by = "velocity_confidence"),
scaterncol = 1, align = "v"
)
3.3 Gene expression heatmap
A classical plot provided by the python version of scVelo
is the expression of genes as a function of pseudotime. In R, it will need a little more efforts.
## add genes counts and log normalized counts to the scvelo object
assay(velo.out, "counts") <- assay(velo.out, "X")
<- scuttle::logNormCounts(velo.out)
velo.out
## order cells according to pseudo-time
<- velo.out@colData %>%
cell_order as.data.frame() %>%
arrange(velocity_pseudotime) %>%
rownames()
<- assay(velo.out, "Ms")[toplh, cell_order]
df
## smooth signal with convolution
<- 30
kernel_size <- rep(1/kernel_size, kernel_size) # Simple moving average kernel
kernel
<- apply(df, 1, function(x) {
df_convolved ::filter(x, filter = kernel, sides = 2) # Apply convolution with a centered kernel
stats
})
<- colnames(df_convolved)[order(apply(df_convolved, 2, which.max))]
genes_order
## extract ordered matrix
<- (assay(velo.out, "logcounts"))[genes_order,cell_order]
df
## prepare heatmap annotation
<- velo.out@colData[,c("celltype", "latent_time")] %>%
annot as.data.frame()
<- list(latent_time = c("Blue", "Yellow"))
annot_color
<- function(xs, n = 10) {
quantile_breaks <- quantile(xs, probs = seq(0, 1, length.out = n))
breaks !duplicated(breaks)]
breaks[
}
<- quantile_breaks(scale(df), n = 52)
mat_breaks
## plot heatmap
::pheatmap(df,
pheatmapcolor = viridis(n=51),
scale = "row",
breaks = mat_breaks,
cluster_col = F,
annotation_col = annot,
annotation_colors = annot_color,
show_colnames = F,
cluster_rows = F,
fontsize = 6)
R/python interoperability
In this tutorial, we will use the velociraptor R/Bioconductor package to run the RNA velocity analysis. This package provides a wrapper around the scVelo python package, and lets you run most of the essential analysis steps directly from your R session.
However, in other situations you may want to use scVelo or other python packages directly, despite performing the initial analysis in R.
A large number of tools from the python-based single-cell ecosystem are based around the so callled AnnData format, which is a type of python object that can hold largely similar types of information as SingleCellExperiment
or Seurat
objects. In order to use our preprocessed data in such python tools, we first need to convert our SingleCellExperiment
to AnnData
format.
The zellkonverter
package provides a convenient way to convert between these two formats. To illustrate how it works, let us convert the SingleCellExperiment
object to AnnData
format and save it to a file that can be directly read into python.
suppressPackageStartupMessages({
library(zellkonverter)
})
::writeH5AD(hermann, file = "Hermann_AnnData.h5ad",
zellkonverterX_name = "counts")
In a python session, we could read this object back and continue with any downstream analysis. More information about the AnnData
format can be found in AnnData documentation.
zellkonverter
can also be used to read an .h5ad file (e.g., that was saved from python) into an R session:
<- zellkonverter::readH5AD("Hermann_AnnData.h5ad") sce
Aknowledgement
This document is an adaptation of C. Soneson’s previous work for the school sincellTE 2022. You can consult the complete and very detailed course and practical session at this address: https://csoneson.github.io/sincellTE2022/
References and documentation
scVelo: https://scvelo.readthedocs.io/en/stable/index.html
velociraptor: https://www.bioconductor.org/packages/release/bioc/vignettes/velociraptor/inst/doc/velociraptor.html
Bergen, Volker, Marius Lange, Stefan Peidli, F Alexander Wolf, and Fabian J Theis. 2020. “Generalizing RNA Velocity to Transient Cell States Through Dynamical Modeling.” Nature Biotechnology 38: 1408–14.
Hermann, Brian P, Keren Cheng, Anukriti Singh, Lorena Roa-De La Cruz, Kazadi N Mutoji, I-Chung Chen, Heidi Gildersleeve, et al. 2018. “The Mammalian Spermatogenesis Single-Cell Transcriptome, from Spermatogonial Stem Cells to Spermatids.” Cell Rep. 25: 1650–1667.e8.
Session info
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-conda-linux-gnu
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS/LAPACK: /shared/software/miniconda/envs/r-4.4.1/lib/libopenblasp-r0.3.27.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Paris
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] viridis_0.6.5 viridisLite_0.4.2
[3] pheatmap_1.0.12 velociraptor_1.14.3
[5] scater_1.32.1 scuttle_1.14.0
[7] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
[9] Biobase_2.64.0 GenomicRanges_1.56.1
[11] GenomeInfoDb_1.40.1 IRanges_2.38.1
[13] S4Vectors_0.42.1 BiocGenerics_0.50.0
[15] MatrixGenerics_1.16.0 matrixStats_1.4.1
[17] lubridate_1.9.3 forcats_1.0.0
[19] stringr_1.5.1 dplyr_1.1.4
[21] purrr_1.0.2 readr_2.1.5
[23] tidyr_1.3.1 tibble_3.2.1
[25] ggplot2_3.5.1 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 farver_2.1.2
[3] vipor_0.4.7 filelock_1.0.3
[5] fastmap_1.2.0 digest_0.6.37
[7] rsvd_1.0.5 timechange_0.3.0
[9] lifecycle_1.0.4 magrittr_2.0.3
[11] compiler_4.4.1 rlang_1.1.4
[13] tools_4.4.1 utf8_1.2.4
[15] yaml_2.3.10 knitr_1.48
[17] labeling_0.4.3 S4Arrays_1.4.1
[19] htmlwidgets_1.6.4 reticulate_1.39.0
[21] DelayedArray_0.30.1 RColorBrewer_1.1-3
[23] abind_1.4-8 BiocParallel_1.38.0
[25] zellkonverter_1.14.1 withr_3.0.1
[27] grid_4.4.1 fansi_1.0.6
[29] beachmat_2.20.0 colorspace_2.1-1
[31] scales_1.3.0 cli_3.6.3
[33] rmarkdown_2.28 crayon_1.5.3
[35] generics_0.1.3 rstudioapi_0.16.0
[37] httr_1.4.7 tzdb_0.4.0
[39] DelayedMatrixStats_1.26.0 ggbeeswarm_0.7.2
[41] zlibbioc_1.50.0 parallel_4.4.1
[43] XVector_0.44.0 basilisk_1.16.0
[45] vctrs_0.6.5 Matrix_1.7-0
[47] dir.expiry_1.12.0 jsonlite_1.8.9
[49] BiocSingular_1.20.0 patchwork_1.3.0
[51] hms_1.1.3 BiocNeighbors_1.22.0
[53] ggrepel_0.9.6 irlba_2.3.5.1
[55] beeswarm_0.4.0 glue_1.8.0
[57] codetools_0.2-20 cowplot_1.1.3
[59] stringi_1.8.4 gtable_0.3.5
[61] UCSC.utils_1.0.0 ScaledMatrix_1.12.0
[63] munsell_0.5.1 pillar_1.9.0
[65] basilisk.utils_1.16.0 htmltools_0.5.8.1
[67] GenomeInfoDbData_1.2.12 R6_2.5.1
[69] sparseMatrixStats_1.16.0 evaluate_1.0.0
[71] lattice_0.22-6 png_0.1-8
[73] Rcpp_1.0.13 gridExtra_2.3
[75] SparseArray_1.4.8 xfun_0.48
[77] pkgconfig_2.0.3