---
title: "sincellTE 2024 - RNA velocity analysis with velociraptor"
editor: source
date: 10/22/2024
date-format: "MMM D, YYYY"
format:
  html:
    toc: true
    toc-depth: 4
    toc-expand: 4
    theme: default
    fontsize: 1em
    linestretch: 1.5
    code-fold: false
    self-contained: true
    df-print: kable
    math: mathjax
execute:
  eval: true
---

# Objectives

RNA velocity is a popular single cell trajectory inference method. The objective of this session is to get an overview of the basic principles of performing RNA velocity analysis in R, using the velociraptor package, which provides a wrapper around the scVelo python package.

# 1. Data preparation

For this session, we will use an example data set from Hermann et al. (2018) about spermatogenesis [https://doi.org/10.1016/j.celrep.2018.10.026](https://doi.org/10.1016/j.celrep.2018.10.026).

![](./resources/images/illustration_spermatogenesis.png)

For RNA velocity, spliced and unspliced RNA count matrices are needed. They can be obtained by making sure that introns are kept when running cellranger (`--include-introns = TRUE`) then performing [velocyto](https://velocyto.org/) on the output bam file. This will provide a `.loom` file containing expression matrices for both spliced and unspliced RNA.

Then, the data can be loaded into R and analysed. One problem appears: `velociraptor` takes `SingleCellExperiment` objects as input and is not compatible with `Seurat` object.

:::{.callout-note}

![](./resources/images/illustration_singlecellexperiment.png)

Just like `Seurat` objects, `SingleCellExperiment`  class is a lightweight Bioconductor container for storing and manipulating single-cell genomics data. Its organization and syntax is just a little different. 

In order to avoid confusion, we will not go into the details. If you need to use this class in the future, you can find documentation here: [https://www.bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html](https://www.bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html)

:::

Besides, the single cell data need to be prepared for RNA velocity analysis, according to the steps we already know: filtering, normalization, identification of highly variable genes, dimensionality reduction...  Several strategies are possible:

 - use packages compatible with `SingleCellExperiment` to prepare the data. The following code illustrates these steps but will not be run.

```r
library(scRNAseq)
library(scuttle)
library(scater)
library(scran)
library(LoomExperiment)

# Import velocyto output
loom <- import("path/to/file.loom")

assayNames(loom)

# Extract the matrices from the appropriate assays (adjust the assay names as needed)
count_matrix <- assay(loom, "matrix")  # This is typically the raw count matrix
spliced_matrix <- assay(loom, "spliced")  # The spliced matrix (if present)
unspliced_matrix <- assay(loom, "unspliced")  # The unspliced matrix (if present)

# Check the dimensions of the expression matrix
dim(count_matrix)
dim(spliced_matrix)
dim(unspliced_matrix)

# Create SingleCellExperiment object
sce <- SingleCellExperiment(
    assays = list(
        counts = count_matrix,
        spliced = spliced_matrix,
        unspliced = unspliced_matrix
    )
)

cell_metadata <- colData(loom)
colData(sce) <- cell_metadata

gene_metadata <- rowData(loom)
rowData(sce) <- gene_metadata

# Inspect the object
sce

## Calculate log-normalized counts
sce <- scuttle::logNormCounts(sce)

## Find highly variable genes
dec <- scran::modelGeneVar(sce)
top.hvgs <- scran::getTopHVGs(dec, n = 2000)

## Perform dimensionality reduction
set.seed(100)
sce <- scater::runPCA(sce, subset_row = top.hvgs)
sce <- scater::runTSNE(sce, dimred = "PCA")
sce <- scater::runUMAP(sce, dimred = "PCA", n_neighbors = 30)
```

- let `velociraptor` prepare the data automatically. The user will have less freedom regarding the option values.

- analyse the data with Seurat and construct a `SingleCellExperiment` object with the results.

For this session, we will load and visualize data already prepared.

```{r setup}
suppressPackageStartupMessages({
  library(tidyverse)
  library(SingleCellExperiment)
  library(scater)
  library(velociraptor)
  library(pheatmap)
  library(ggplot2)
  library(viridis)
})
```

```{r load data}
user.id <- Sys.getenv("USER")
input_file <-paste("/shared/home2", user.id, "velocity/data/hermann.rds", sep = "/")

## load SingleCellExperiment object
sce <- readRDS(input_file)

## Plot tSNE representation
scater::plotTSNE(sce, colour_by = "celltype")
scater::plotUMAP(sce, colour_by = "celltype")
```


# 2. RNA velocity estimation

Next, we will perform the RNA velocity estimation. 

In `velociraptor`, this is done using the `scvelo()` function, which provides a wrapper around the functionality of `scVelo`.

Take a moment to study the documentation of `scvelo()` - you will see for example that all the modes of scVelo (steady-state, stochastic, dynamical) can be used.

For our example, we will use the dynamical mode. It takes a bit longer to run than the steady state option, but typically gives more reliable results.

```r
# too long, do not run
velo.out <- velociraptor::scvelo(sce, subset.row = top.hvgs,
                                 assay.X = "counts",
                                 mode = "dynamical")
```

```{r load velociraptor results}
results_file <-paste("/shared/home2", user.id, "velocity/data/velo.out.rds", sep = "/")
velo.out <- readRDS(results_file)
```

:::{.callout-note}

The argument `use.theirs` (default = `FALSE`) can be set to `TRUE` to perform gene filtering and normalization with scVelo.

:::

The `scvelo()` function performs the full RNA velocity estimation analysis, and in the process calls several functions from scVelo. Many steps are computed during the function execution: 

 - compute neighbors and moments
 - estimate the kinetics parameters
 - deduce the velocity (i.e. infer the future cell state)
 - estimate velocity pseudotime and latent time. Both estimates try to provide a relative ordering of the cells, only the computation method is different

::: {.content-visible when-format="html"}

![](./resources/images/illustration_phase_portrait.gif)

:::


Its output of `scvelo()` is another `SingleCellExperiment` object.


```{r}
velo.out
```

The results of `scvelo()` are not added to the input `SingleCellExperiment` object, since the two objects can have different dimensions (in particular note the difference in the number of genes). Thus, in order to use information from the velocity calculations together with information from the original object, we need to copy slots from one object to the other. 

Here, we copy the reduced dimension representations from the original `SingleCellExperiment` object to the output of `scvelo()`, and use the latter to construct tSNE representations colored by various parameters.

```{r add information to scvelo output}
## copy dimension reduction into scvelo output
reducedDims(velo.out) <- reducedDims(sce)

## copy cell annotation into scvelo output
velo.out$celltype <- sce$celltype
```

# 3. Velocity plots

## 3.1 Trajectory overview

The results of the velocity estimation can be plotted on the previous low-dimension representation of the data

```{r plot scvelo results}
#| fig-height: 8

## plot scvelo results
cowplot::plot_grid(
    scater::plotTSNE(velo.out, colour_by = "celltype"),
    scater::plotTSNE(velo.out, colour_by = "velocity_pseudotime"),
    scater::plotTSNE(velo.out, colour_by = "latent_time"),
    ncol = 1, align = "v"
)
```

The velociraptor package also provides plotting functions to visualize the RNA velocity results. 

For example, we can embed the estimated velocities into a provided low-dimensional space and plot them as a vector field.

```r 
# plot velocity vector field
embedded <- velociraptor::embedVelocity(reducedDim(velo.out, "TSNE"), velo.out)

grid.df <- velociraptor::gridVectors(reducedDim(velo.out, "TSNE"), embedded)

plotTSNE(velo.out, dimred = "TSNE", colour_by = "velocity_pseudotime") +
    geom_segment(data = grid.df,
                 mapping = aes(x = start.TSNE1, y = start.TSNE2,
                               xend = end.TSNE1, yend = end.TSNE2),
                 arrow = arrow(length = unit(0.05, "inches")))
```
![](./resources/images/velocity_vector_field.png)

## 3.2 Quality checks

Next, we will look at individual genes via their **phase portraits**. This is often useful in order to understand how the velocities are affected/supported by specific genes. 

We can of course plot genes that we are already familiar with and that we know are related to the process of interest. We can also extract genes with particularly strong influence on the velocity results. ‘Driver genes,’ which display a strong dynamic behavior, can for example be detected via their high likelihoods in the dynamic model. 

A score called likelihood value of the fit (`fit_likelihood`) indicates how well a cell is described by the learned phase trajectory. Averaged across all the cells, it can be used to estimate the quality of a gene in the velocity estimation. Here, we select the genes with the largest and smallest likelihood value for the fit and plot their phase portrait.

```{r velocity good phase portraits}
#| fig-height: 12

toplh <- rowData(velo.out) %>% 
  as.data.frame() %>% 
  arrange(desc(fit_likelihood)) %>% 
  slice_head(n=100) %>% 
  rownames()

plotVelocity(velo.out, use.dimred = "TSNE", genes = toplh[1:6],
             color_by = "celltype")
```


```{r velocity bad phase portraits}
#| fig-height: 6
botlh <- rowData(velo.out) %>% 
  as.data.frame() %>% 
  arrange(fit_likelihood) %>% 
  slice_head(n=3) %>% 
  rownames()

plotVelocity(velo.out, use.dimred = "TSNE", genes = botlh,
             color_by = "celltype")
```

We will also illustrate two summary statistics returned by scVelo:

 - the length of the velocity vector encodes the speed or rate of differentiation.
 
  - the ‘velocity confidence’ provides a measure of the coherence of the velocity vector field, that is, how well the velocity vector for a cell correlates with those of its neighbors.

```{r plot velocity speed and confidence}
#| fig-height: 6

cowplot::plot_grid(
    scater::plotTSNE(velo.out, colour_by = "velocity_length"),
    scater::plotTSNE(velo.out, colour_by = "velocity_confidence"),
    ncol = 1, align = "v"
)
```

## 3.3 Gene expression heatmap

A classical plot provided by the python version of `scVelo` is the expression of genes as a function of pseudotime. In R, it will need a little more efforts.

```{r plot heatmap gene expression}
## add genes counts and log normalized counts to the scvelo object
assay(velo.out, "counts") <- assay(velo.out, "X")
velo.out <- scuttle::logNormCounts(velo.out)

## order cells according to pseudo-time
cell_order <- velo.out@colData %>% 
  as.data.frame() %>% 
  arrange(velocity_pseudotime) %>% 
  rownames()

df <- assay(velo.out, "Ms")[toplh, cell_order]

## smooth signal with convolution
kernel_size <- 30
kernel <- rep(1/kernel_size, kernel_size)  # Simple moving average kernel

df_convolved <- apply(df, 1, function(x) {
  stats::filter(x, filter = kernel, sides = 2)  # Apply convolution with a centered kernel
})

genes_order <- colnames(df_convolved)[order(apply(df_convolved, 2, which.max))]

## extract ordered matrix
df <- (assay(velo.out, "logcounts"))[genes_order,cell_order]

## prepare heatmap annotation
annot <- velo.out@colData[,c("celltype", "latent_time")] %>% 
  as.data.frame()

annot_color <- list(latent_time = c("Blue", "Yellow"))

quantile_breaks <- function(xs, n = 10) {
  breaks <- quantile(xs, probs = seq(0, 1, length.out = n))
  breaks[!duplicated(breaks)]
}

mat_breaks <- quantile_breaks(scale(df), n = 52)

## plot heatmap
pheatmap::pheatmap(df,
         color = viridis(n=51),
         scale = "row",
         breaks = mat_breaks,
         cluster_col = F,
         annotation_col = annot,
         annotation_colors = annot_color,
         show_colnames = F,
         cluster_rows = F,
         fontsize = 6)
```

# R/python interoperability

In this tutorial, we will use the velociraptor R/Bioconductor package to run the RNA velocity analysis. This package provides a wrapper around the scVelo python package, and lets you run most of the essential analysis steps directly from your R session. 

However, in other situations you may want to use scVelo or other python packages directly, despite performing the initial analysis in R. 

A large number of tools from the python-based single-cell ecosystem are based around the so callled AnnData format, which is a type of python object that can hold largely similar types of information as `SingleCellExperiment` or `Seurat` objects. In order to use our preprocessed data in such python tools, we first need to convert our `SingleCellExperiment` to `AnnData` format. 

The `zellkonverter` package provides a convenient way to convert between these two formats. To illustrate how it works, let us convert the `SingleCellExperiment` object to `AnnData` format and save it to a file that can be directly read into python.

```r
suppressPackageStartupMessages({
    library(zellkonverter)
})

zellkonverter::writeH5AD(hermann, file = "Hermann_AnnData.h5ad",
                         X_name = "counts")
```

In a python session, we could read this object back and continue with any downstream analysis. More information about the `AnnData` format can be found in [AnnData documentation](https://anndata.readthedocs.io/en/latest/).

`zellkonverter` can also be used to read an .h5ad file (e.g., that was saved from python) into an R session:

```r
sce <- zellkonverter::readH5AD("Hermann_AnnData.h5ad")
```

# Aknowledgement

This document is an adaptation of C. Soneson's previous work for the school sincellTE 2022. You can consult the complete and very detailed course and practical session at this address: [https://csoneson.github.io/sincellTE2022/](https://csoneson.github.io/sincellTE2022/)

# References and documentation

scVelo: [https://scvelo.readthedocs.io/en/stable/index.html](https://scvelo.readthedocs.io/en/stable/index.html)

velociraptor: [https://www.bioconductor.org/packages/release/bioc/vignettes/velociraptor/inst/doc/velociraptor.html](https://www.bioconductor.org/packages/release/bioc/vignettes/velociraptor/inst/doc/velociraptor.html)

Bergen, Volker, Marius Lange, Stefan Peidli, F Alexander Wolf, and Fabian J Theis. 2020. “Generalizing RNA Velocity to Transient Cell States Through Dynamical Modeling.” Nature Biotechnology 38: 1408–14.

Hermann, Brian P, Keren Cheng, Anukriti Singh, Lorena Roa-De La Cruz, Kazadi N Mutoji, I-Chung Chen, Heidi Gildersleeve, et al. 2018. “The Mammalian Spermatogenesis Single-Cell Transcriptome, from Spermatogonial Stem Cells to Spermatids.” Cell Rep. 25: 1650–1667.e8.

# Session info

```{r session info}
sessionInfo()
```