Despite scRNA-Seq data provides a snapshot of gene expression at a particular time point, expression levels are captured across individual cells. It captures the heterogeneity of cells within a population. If biological tissue was not a steady state, it may be possible to retrieve trajectories, such as developmental lineage or disease progression. Trajectory inferences and RNA velocity leverage the information within that snapshot to infer such cellular trajectories. Ordering cells is a first step to understand the underlying biological processes driving cell fate decisions and differentiation events.
Abbreviations :
TI : trajectory inference
kNN
: k-nearest neighbors
In section, we make a overview of two main methods to order cells : trajectory inference and RNA velocity.
Trajectory inference is based on the gene expression profile. It looks at distance between cells : cells with similar transcriptomic profile are close, and cells with distinct profile are distant. Based on transcriptomic profiles, it finds paths between cells, leading to trajectories.
Figure from Wolf et al. (2019). Partition-based graph abstraction generates a topology-preserving map of single cells. High-dimensional gene expression data is represented as a kNN graph by choosing a suitable low-dimensional representation and an associated distance metric for computing neighborhood relations. […] We use PCA-based representations and Euclidean distance. […] By discarding spurious edges with low weights, PAGA graphs reveal the denoised topology of the data at a chosen resolution and reveal its connected and disconnected regions. […] We order cells within each partition according to their distance from a root cell. A PAGA path then averages all single-cell paths that pass through the corresponding groups of cells. This allows to trace gene expression changes along complex trajectories at single-cell resolution
There are more than 50 methods to perform trajectory inference because : - many input types are possible (count matrix, normalized matrix, reduced space) - many ways to compute distance - many ways to find a path
How to choose a trajectory inference method ?
Figure from Saelens et al. (2019). Practical guidelines for method users. As the performance of a method mostly depends on the topology of the trajectory, the choice of TI method will be primarily influenced by the user’s existing knowledge about the expected topology in the data.
How to test several TI methods ?
Screenshot from https://dynverse.org/, a collection of R packages to apply trajectory inference using the same code structure.
RNA velocity requires spliced and unspliced count matrices as input. They are identified as described on the figure below :
Figure from La Manno et al. (2018). Spliced and unspliced counts are estimated by separately counting reads that incorporate the intronic sequence. Multiple reads associated with a given molecule are grouped (boxes with asterisks) for unique molecular identifier (UMI)-based protocols. Pie charts show typical fractions of unspliced molecules.
RNA velocity predicts the future transcriptional state of individual cells by estimating the ratio of unspliced and spliced RNA from sequencing data, which indicates the rate of RNA synthesis and degradation and can be used to predict the future state of a cell.
Figure from Bergen et al. (2020). Modeling transcriptional dynamics captures transcriptional induction and repression (‘on’ and ‘off’ phase) of unspliced pre-mRNAs, their conversion into mature, spliced mRNAs and their eventual degradation.
The table below summarizes the input, principle and output for the two method families.
Method | Input | Principle | Output |
---|---|---|---|
Trajectory inference | summarized* gene expression matrix | make a topology and find shortest path between cells | lineage tree, pseudotime* |
RNA velocity | spliced and unspliced count matrices | relative difference between spliced and unspliced molecules | vector field plots, latent time* |
*summarized gene expression matrix : Some TI methods do not take the full gene expression matrix as input, but either a subset of genes, or a reduced space built from the gene expression matrix (PCA, batch-effect corrected space…)
*pseudotime : “A cell’s pseudotime, then, is defined as the length of the lineage from its inception up to the projection of a cell onto that lineage. If a trajectory consists of multiple lineages, cells may have multiple pseudotime values, depending on which lineage the cell is projected upon. […] Pseudotime is based solely on transcriptional information, so it cannot be interpreted as an estimator of the true time since initial differentiation” [Weiler et al. (2023)]
Figure from Weiler et al. (2023). Introduction to trajectory inference terminology. A trajectory is a collection of (one or more) lineages. A lineage is a single differentiation path. The pseudotime of a cell is the length of the lineage from its inception up to the projection of the respective cell onto that lineage.
*latent time : “Each cell is associated with a time. As such, gene-specific times need to be pooled together into a single, gene-shared latent time for every cell” [Weiler et al. (2023)]. So there may be a gap between two latent time points if no cells were captured with the associated unspliced/spliced profile.
Figure from Bergen et al. (2020). On simulated splicing kinetics, latent time is able to reconstruct the underlying real time at near-perfect correlation and correct scale, clearly outperforming diffusion pseudo-time. In contrast to pseudo-time methods, our latent time is grounded on transcriptional dynamics and internally accounts for speed and direction of motion.
Note :
We will apply two TI methods written in R, and RNA velocity as
implemented in scvelo Python module. To access a Python environment
within Rmarkdown, we built a Conda environment containing Python
languages and all modules (packages) of interest, using the following
commands (chunk is not evaluated because of eval = FALSE
)
:
conda create --name scvelo_env -c conda-forge \
matplotlib=3.5.0 \
numpy=1.21.1 \
pandas=1.5.3 \
numba scipy
conda activate scvelo_env
pip install -U scvelo
pip install ipywidgets
conda install -c conda-forge scanpy python-igraph leidenalg
conda deactivate scvelo_env
We load the Conda environment :
reticulate::use_condaenv("scvelo_env", required = TRUE)
reticulate::py_config()
## python: /home/nf1/Tools/miniconda3/envs/scvelo_env/bin/python
## libpython: /home/nf1/Tools/miniconda3/envs/scvelo_env/lib/libpython3.9.so
## pythonhome: /home/nf1/Tools/miniconda3/envs/scvelo_env:/home/nf1/Tools/miniconda3/envs/scvelo_env
## version: 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 03:49:32) [GCC 12.3.0]
## numpy: /home/nf1/Tools/miniconda3/envs/scvelo_env/lib/python3.9/site-packages/numpy
## numpy_version: 1.21.1
##
## NOTE: Python version was forced by use_python() function
We import all the Python modules of interest :
import scvelo as scv # RNA velocity implementation
import scanpy # scRNA-Seq data handling
import scipy # sparse matrix
import numpy # table
import pandas as pd # table
import matplotlib.pyplot as plt # visualization
We can call the R libraries of interest :
library(Seurat) # scRNA-Seq data handling
library(ggplot2) # visualization
library(patchwork) # plot arrangement
library(slingshot) # trajectory inference
## Loading required package: princurve
library(TInGa) # trajectory inference
library(dynwrap) # TI
library(dynplot) # TI visualization
library(Matrix) # sparse matrix
library(viridis) # colors
## Loading required package: viridisLite
RNA velocity requires both spliced and unspliced count matrices. Since you already have aligned your reads on a reference transcriptome, you may have access to a BAM file (mapping information for each read) and the GTF file (reference transcriptome annotation). The first RNA velocity method (velocyto from La Manno et al. (2018)) implements the spliced and unspliced matrices construction.
We load a published dataset that already contains such matrices, and is stored in the scvelo module :
adata = scv.datasets.pancreas()
adata
## AnnData object with n_obs × n_vars = 3696 × 27998
## obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
## var: 'highly_variable_genes'
## uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
## obsm: 'X_pca', 'X_umap'
## layers: 'spliced', 'unspliced'
## obsp: 'distances', 'connectivities'
The dataset contains 3,696 cells. We can see the two layers corresponding to both spliced and unspliced count matrices. A UMAP is also stored in the object and enable 2D visualization of all cells. We visualize the dataset :
fig, axs = plt.subplots(1, 2, gridspec_kw = {'width_ratios': [1, 0.2]})
ax = axs[0]
scv.pl.scatter(adata, basis = "umap", color = "clusters",
legend_loc = 'right', size = 60, ax = ax, show = False)
ax.set_aspect('equal')
fig.delaxes(axs[1])
plt.subplots_adjust(left = 0.05, right = 0.9, top = 0.99, bottom = 0.05)
plt.show()
As, we will need also this object in R so save it. We save it in a
directory called pancreas
:
mkdir -p ./pancreas
We save the spliced, unspliced, and full (spliced + unspliced) matrices :
scipy.io.mmwrite(a = adata.layers['spliced'],
target = './pancreas/matrix_spliced.mtx')
scipy.io.mmwrite(a = adata.layers['unspliced'],
target = './pancreas/matrix_unspliced.mtx')
scipy.io.mmwrite(a = adata.layers['spliced'] + adata.layers['unspliced'],
target = './pancreas/matrix.mtx')
We save the 2D representation, which a UMAP in this dataset :
numpy.savetxt(X = adata.obsm['X_umap'],
fname = './pancreas/umap.csv',
delimiter=',')
We save also the PCA, which is an input from trajectory inference method :
numpy.savetxt(X = adata.obsm['X_pca'],
fname = './pancreas/pca.csv',
delimiter=',')
We save the annotations, i.e. clusters, cell barcodes and features :
pd.DataFrame(adata.obs['clusters']).to_csv('./pancreas/clusters.csv', index = False)
pd.DataFrame(adata.obs_names).to_csv('./pancreas/barcodes.csv', index = False)
pd.DataFrame(adata.var_names).to_csv('./pancreas/features.csv', index = False)
We make a Seurat object from the saved data. First, we load the matrix :
mat = Matrix::readMM('./pancreas/matrix.mtx')
barcodes = as.character(read.csv('./pancreas/barcodes.csv')[, 1])
features = as.character(read.csv('./pancreas/features.csv')[, 1])
rownames(mat) = barcodes # set matrix row names (cells)
colnames(mat) = features # set matrix column names (gene names)
dim(mat)
## [1] 3696 27998
We load the annotation :
clusters = as.character(read.csv('./pancreas/clusters.csv')[, 1])
head(clusters)
## [1] "Pre-endocrine" "Ductal" "Alpha" "Ductal"
## [5] "Ngn3 high EP" "Ductal"
We build a Seurat object from the count matrix :
sobj = Seurat::CreateSeuratObject(counts = Matrix::t(mat),
meta.data = data.frame(row.names = barcodes,
clusters = clusters))
sobj
## An object of class Seurat
## 27998 features across 3696 samples within 1 assay
## Active assay: RNA (27998 features, 0 variable features)
We add the PCA :
the_pca = read.csv('./pancreas/pca.csv',
header = FALSE)
the_pca = as.matrix(the_pca)
colnames(the_pca) = paste0("PC_", c(1:ncol(the_pca)))
rownames(the_pca) = barcodes
sobj@reductions[["pca"]] = Seurat::CreateDimReducObject(embeddings = the_pca,
key = "pca_")
## Warning: No assay specified, setting assay as RNA by default.
sobj
## An object of class Seurat
## 27998 features across 3696 samples within 1 assay
## Active assay: RNA (27998 features, 0 variable features)
## 1 dimensional reduction calculated: pca
We add the UMAP :
the_umap = read.csv('./pancreas/umap.csv',
header = FALSE)
the_umap = as.matrix(the_umap)
colnames(the_umap) = c("UMAP_1", "UMAP_2")
rownames(the_umap) = barcodes
sobj@reductions[["umap"]] = Seurat::CreateDimReducObject(embeddings = the_umap,
key = "umap_")
## Warning: No assay specified, setting assay as RNA by default.
sobj
## An object of class Seurat
## 27998 features across 3696 samples within 1 assay
## Active assay: RNA (27998 features, 0 variable features)
## 2 dimensional reductions calculated: pca, umap
We visualize the dataset :
Seurat::DimPlot(sobj, group.by = "clusters", reduction = "umap") +
ggplot2::labs(title = "Clusters annotation") +
Seurat::NoAxes() +
ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5))
Now, we have both Seurat (R) and AnnData (Python) objects.
In this section, we used to TI tools, slingshot and TInGa.
First, we run slingshot, then, we visualize results.
We infer trajectories :
slingshot_traj = slingshot::slingshot(data = sobj@reductions[["pca"]]@cell.embeddings)
## No cluster labels provided. Continuing with one cluster.
## Using full covariance matrix
slingshot_traj
## class: SlingshotDataSet
##
## Samples Dimensions
## 3696 50
##
## lineages: 1
## Lineage1: 1
##
## curves: 1
## Curve1: Length: 76,7 Samples: 3696
Slingshot outputs 1 lineage.
We add the pseudotime associated with this lineage in the Seurat object :
sobj$pseudotime_slingshot = slingshot::slingPseudotime(slingshot_traj)[, 1][colnames(sobj)]
head(sobj@meta.data)
## orig.ident nCount_RNA nFeature_RNA clusters
## AAACCTGAGAGGGATA SeuratProject 6529 3115 Pre-endocrine
## AAACCTGAGCCTTGAT SeuratProject 8049 3128 Ductal
## AAACCTGAGGCAATTA SeuratProject 5165 2552 Alpha
## AAACCTGCATCATCCC SeuratProject 10017 3670 Ductal
## AAACCTGGTAAGTGGC SeuratProject 6040 2686 Ngn3 high EP
## AAACCTGGTATTAGCC SeuratProject 9734 3530 Ductal
## pseudotime_slingshot
## AAACCTGAGAGGGATA 23,78922
## AAACCTGAGCCTTGAT 69,63663
## AAACCTGAGGCAATTA 19,90596
## AAACCTGCATCATCCC 76,70007
## AAACCTGGTAAGTGGC 40,29643
## AAACCTGGTATTAGCC 71,15815
We visualize pseudotime :
Seurat::FeaturePlot(sobj, features = "pseudotime_slingshot",
reduction = "umap",
cols = viridis::viridis(n = 100)) +
ggplot2::ggtitle("Slingshot pseudotime") +
ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5)) +
Seurat::NoAxes()
The root corresponds to the cells with pseudotime 0. We could have set manually a root. We run the method with an alternative setting.
We manually set a root using the following command :
slingshot_traj = slingshot::slingshot(data = sobj@reductions[["pca"]]@cell.embeddings,
clusterLabels = sobj$clusters,
start.clus = "Ductal")
## Using full covariance matrix
slingshot_traj
## class: SlingshotDataSet
##
## Samples Dimensions
## 3696 50
##
## lineages: 2
## Lineage1: Ductal Ngn3 low EP Ngn3 high EP Pre-endocrine Beta Alpha Epsilon
## Lineage2: Ductal Ngn3 low EP Ngn3 high EP Pre-endocrine Delta
##
## curves: 2
## Curve1: Length: 76,337 Samples: 3451,29
## Curve2: Length: 82,676 Samples: 2534,14
Now, there are two lineages. We add the two associated pseudotimes in the object :
slingshot_pdt = slingshot::slingPseudotime(slingshot_traj)
n_lineages = ncol(slingshot_pdt)
sobj@meta.data[, paste0("pseudotime_slingshot_", c(1:n_lineages))] = slingshot::slingPseudotime(slingshot_traj)[colnames(sobj), ]
head(sobj@meta.data)
## orig.ident nCount_RNA nFeature_RNA clusters
## AAACCTGAGAGGGATA SeuratProject 6529 3115 Pre-endocrine
## AAACCTGAGCCTTGAT SeuratProject 8049 3128 Ductal
## AAACCTGAGGCAATTA SeuratProject 5165 2552 Alpha
## AAACCTGCATCATCCC SeuratProject 10017 3670 Ductal
## AAACCTGGTAAGTGGC SeuratProject 6040 2686 Ngn3 high EP
## AAACCTGGTATTAGCC SeuratProject 9734 3530 Ductal
## pseudotime_slingshot pseudotime_slingshot_1
## AAACCTGAGAGGGATA 23,78922 54,2616250
## AAACCTGAGCCTTGAT 69,63663 9,2787305
## AAACCTGAGGCAATTA 19,90596 58,5088384
## AAACCTGCATCATCCC 76,70007 0,9812956
## AAACCTGGTAAGTGGC 40,29643 38,9445443
## AAACCTGGTATTAGCC 71,15815 7,8948470
## pseudotime_slingshot_2
## AAACCTGAGAGGGATA 59,2412330
## AAACCTGAGCCTTGAT 9,2804274
## AAACCTGAGGCAATTA NA
## AAACCTGCATCATCCC 0,9812002
## AAACCTGGTAAGTGGC 40,4065783
## AAACCTGGTATTAGCC 7,9017158
We visualize pseudotimes on the UMAP :
plot_list = lapply(paste0("pseudotime_slingshot_", c(1:n_lineages)), FUN = function(pdt) {
p = Seurat::FeaturePlot(sobj, features = pdt,
reduction = "umap",
cols = viridis::viridis(n = 100)) +
ggplot2::ggtitle(pdt) +
ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5)) +
Seurat::NoAxes()
return(p)
})
patchwork::wrap_plots(plot_list)
By setting a root in the Ductal population, there are two distinct lineages ending in two populations. All cells are not included in both lineages.
Now, we run TInGa, then, we visualize results.
We normalize the count matrix :
sobj = Seurat::NormalizeData(sobj)
sobj
## An object of class Seurat
## 27998 features across 3696 samples within 1 assay
## Active assay: RNA (27998 features, 0 variable features)
## 2 dimensional reductions calculated: pca, umap
We build the input from the Seurat object :
expression = Seurat::GetAssayData(sobj, assay = "RNA", slot = "data")
expression = Matrix::t(expression)
expression = expression[, order(colnames(expression))]
counts = Seurat::GetAssayData(sobj, assay = "RNA", slot = "counts")
counts = counts[colnames(expression), ]
counts = Matrix::t(counts)
dataset = dynwrap::wrap_expression(id = sobj@project.name,
expression = expression,
counts = counts)
We apply the TI method :
set.seed(1234)
tinga_traj = dynwrap::infer_trajectory(dataset = dataset,
method = TInGa::gng_param2(),
seed = 1234,
params = list(max_iter = 10000,
max_nodes = 10,
epsilon_b = 0.05,
epsilon_n = 0.001,
age_max = 200,
lambda = 200,
alpha = 0.5,
beta = 0.99))
To define a pseudotime, we need to set a root. We choose randomly a cell in the Ductal cluster :
set.seed(1234)
root_id = sample(which(sobj$clusters == "Ductal"), size = 1)
root_barcode = colnames(sobj)[root_id]
root_barcode
## [1] "CATCCACCAATGACCT"
We visualize the root cell on the UMAP :
sobj$is_root = (colnames(sobj) == root_barcode)
sobj$cell_name = colnames(sobj)
root_coord = sobj@reductions[["umap"]]@cell.embeddings[root_barcode, ]
root_plot = Seurat::DimPlot(sobj, reduction = "umap",
cells.highlight = root_barcode, sizes.highlight = 3) +
ggplot2::annotate(geom = "text", label = root_barcode,
x = root_coord[1] + 0.15*abs(root_coord[1]),
y = root_coord[2] + 0.15*abs(root_coord[2])) +
ggplot2::ggtitle("Root cell") +
ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5, face = "bold")) +
Seurat::NoLegend() + Seurat::NoAxes()
sobj$is_root = NULL
sobj$cell_name = NULL
root_plot
We add the pseudotime, starting from this root cell :
tinga_traj = dynwrap::add_root(trajectory = tinga_traj,
root_cell_id = root_barcode)
tinga_traj = dynwrap::add_pseudotime(trajectory = tinga_traj)
We add the pseudotime in the Seurat object :
sobj$pseudotime_tinga = tinga_traj$pseudotime
head(sobj@meta.data)
## orig.ident nCount_RNA nFeature_RNA clusters
## AAACCTGAGAGGGATA SeuratProject 6529 3115 Pre-endocrine
## AAACCTGAGCCTTGAT SeuratProject 8049 3128 Ductal
## AAACCTGAGGCAATTA SeuratProject 5165 2552 Alpha
## AAACCTGCATCATCCC SeuratProject 10017 3670 Ductal
## AAACCTGGTAAGTGGC SeuratProject 6040 2686 Ngn3 high EP
## AAACCTGGTATTAGCC SeuratProject 9734 3530 Ductal
## pseudotime_slingshot pseudotime_slingshot_1
## AAACCTGAGAGGGATA 23,78922 54,2616250
## AAACCTGAGCCTTGAT 69,63663 9,2787305
## AAACCTGAGGCAATTA 19,90596 58,5088384
## AAACCTGCATCATCCC 76,70007 0,9812956
## AAACCTGGTAAGTGGC 40,29643 38,9445443
## AAACCTGGTATTAGCC 71,15815 7,8948470
## pseudotime_slingshot_2 pseudotime_tinga
## AAACCTGAGAGGGATA 59,2412330 1,695568374
## AAACCTGAGCCTTGAT 9,2804274 0,266301621
## AAACCTGAGGCAATTA NA 1,862531244
## AAACCTGCATCATCCC 0,9812002 0,008093603
## AAACCTGGTAAGTGGC 40,4065783 1,211911467
## AAACCTGGTATTAGCC 7,9017158 0,251120779
We visualize pseudotime :
Seurat::FeaturePlot(sobj, features = "pseudotime_tinga",
reduction = "umap",
cols = viridis::viridis(n = 100)) +
ggplot2::ggtitle("TInGa pseudotime") +
ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5)) +
Seurat::NoAxes()
We visualize the trajectory with lineage tree :
dynplot::plot_dimred(trajectory = tinga_traj,
label_milestones = FALSE,
plot_trajectory = TRUE,
dimred = sobj[["umap"]]@cell.embeddings,
color_cells = 'pseudotime',
color_trajectory = "none")
The trajectory is mainly linear. We could have changed the input settings to make a more ramified trajectory.
In this section, we run RNA velocity method using scvelo implementation.
We visualize the number of count for spliced and unspliced transcripts :
adata.obs['total_spliced'] = adata.layers["spliced"].sum(axis = 1)
adata.obs['total_unspliced'] = adata.layers["unspliced"].sum(axis = 1)
# Plot
fig, axs = plt.subplots(1, 2)
ax = axs[0]
scv.pl.scatter(adata, basis = "umap", color = "total_spliced",
legend_loc = 'right', size = 60, ax = ax, show = False)
ax.set_aspect('equal')
ax = axs[1]
scv.pl.scatter(adata, basis = "umap", color = "total_unspliced",
legend_loc = 'right', size = 60, ax = ax, show = False)
ax.set_aspect('equal')
plt.subplots_adjust(left = 0.05, right = 0.9, top = 0.99, bottom = 0.05)
plt.show()
What are the proportions of unspliced compared to spliced transcripts ?
scv.pl.proportions(adata)
We run each step of RNA velocity :
scv.pp.filter_genes(adata)
scv.pp.moments(adata)
## Normalized count data: X, spliced, unspliced.
## computing neighbors
## finished (0:00:07) --> added
## 'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
## computing moments based on connectivities
## finished (0:00:03) --> added
## 'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
scv.tl.recover_dynamics(adata, n_jobs = 12)
## recovering dynamics (using 12/48 cores)
## 0%| | 0/4130 [00:00<?, ?gene/s]
## finished (0:05:33) --> added
## 'fit_pars', fitted parameters for splicing dynamics (adata.var)
scv.tl.velocity(adata, mode = 'dynamical')
## computing velocities
## finished (0:00:14) --> added
## 'velocity', velocity vectors for each individual cell (adata.layers)
scv.tl.velocity_graph(adata, n_jobs = 12)
## computing velocity graph (using 12/48 cores)
## 0%| | 0/3696 [00:00<?, ?cells/s]
## finished (0:00:14) --> added
## 'velocity_graph', sparse matrix with cosine correlations (adata.uns)
adata
## AnnData object with n_obs × n_vars = 3696 × 27998
## obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'total_spliced', 'total_unspliced', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition'
## var: 'highly_variable_genes', 'gene_count_corr', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
## uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_params', 'velocity_graph', 'velocity_graph_neg'
## obsm: 'X_pca', 'X_umap'
## varm: 'loss'
## layers: 'spliced', 'unspliced', 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'velocity', 'velocity_u'
## obsp: 'distances', 'connectivities'
We visualize the vector fields :
adata.obs['no_color'] = 0
fig, axs = plt.subplots(1, 2)
ax = axs[0]
scv.pl.velocity_embedding_stream(adata, basis = 'umap', color = 'no_color', show = False, ax = ax, legend_loc = 'none')
ax.set_title('Vector fields')
ax.set_aspect('equal')
ax = axs[1]
scv.pl.velocity_embedding_stream(adata, basis = 'umap', color = 'clusters', show = False, ax = ax)
ax.set_title('Vector fields and cluster annotation')
ax.set_aspect('equal')
plt.subplots_adjust(left = 0.05, right = 0.9, top = 0.99, bottom = 0.05)
plt.show()
A vector corresponds to one arrow, an oriented segment. It indicates the most probable next stage of cells. Earliest cells are not pointed by any vector. From this figure, root population corresponds to Ductal cluster. There are three possible ends : alpha, beta and epsilon.
We infer a root set and compute a pseudotime, from the directed velocity graph :
scv.tl.velocity_pseudotime(adata)
## computing terminal states
## identified 2 regions of root cells and 1 region of end points .
## finished (0:00:00) --> added
## 'root_cells', root cells of Markov diffusion process (adata.obs)
## 'end_points', end points of Markov diffusion process (adata.obs)
scv.pl.scatter(adata, color = 'velocity_pseudotime', cmap = 'gnuplot')
We compute the latent time for each cell, based only on its transcriptional dynamics.
scv.tl.latent_time(adata)
## computing latent time using root_cells as prior
## finished (0:00:06) --> added
## 'latent_time', shared time (adata.obs)
scv.pl.scatter(adata, color = 'latent_time', color_map = 'gnuplot')
Latent time is coherent with pseudotime. However, pseudotime is homogeneous spread along cells while latent time more corresponds to a cell’s internal clock :
fig, axs = plt.subplots(1, 1)
plt.scatter(x = [t for t in adata.obs["velocity_pseudotime"]],
y = [t for t in adata.obs["latent_time"]],
s = 10)
axs.plot([0, 1], [0, 1], color = 'r', label = 'y = x')
plt.xlabel('Pseudotime')
plt.ylabel('Latent time')
plt.title('Relation between velocity pseudotime and latent time')
plt.legend()
plt.show()
We save both objects :
adata.write('./pancreas/adata_velocity.h5ad', compression = 'gzip')
To load it, use :
adata = scanpy.read_h5ad('./pancreas/adata_velocity.h5ad')
.
saveRDS(sobj, file = './pancreas/sobj_traj.rds')
To load it, use : sobj = readRDS('./sobj_traj.rds')
.
Here are the references for the tools used :
This can be done after trajectory inference or RNA velocity :
R session information :
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/local/src/R-3.6.3/lib/libRblas.so
## LAPACK: /usr/local/src/R-3.6.3/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=fr_FR.UTF-8
## [9] LC_ADDRESS=fr_FR.UTF-8 LC_TELEPHONE=fr_FR.UTF-8
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=fr_FR.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dynutils_1.0.5 purrr_0.3.4 tidyr_1.1.4 dplyr_1.0.7
## [5] viridis_0.5.1 viridisLite_0.3.0 Matrix_1.2-18 dynplot_1.1.0
## [9] dynwrap_1.2.1 TInGa_0.0.0.9000 slingshot_1.4.0 princurve_2.1.4
## [13] patchwork_1.1.2 ggplot2_3.3.5 Seurat_3.1.5
##
## loaded via a namespace (and not attached):
## [1] utf8_1.1.4 reticulate_1.34.0
## [3] tidyselect_1.1.0 htmlwidgets_1.5.4
## [5] grid_3.6.3 ranger_0.12.1
## [7] BiocParallel_1.20.1 Rtsne_0.15
## [9] munsell_0.5.0 codetools_0.2-16
## [11] ica_1.0-2 future_1.22.1
## [13] gng_0.1.0 withr_2.5.0
## [15] colorspace_1.4-1 Biobase_2.46.0
## [17] highr_0.8 knitr_1.41
## [19] rstudioapi_0.13 stats4_3.6.3
## [21] SingleCellExperiment_1.8.0 ROCR_1.0-7
## [23] listenv_0.8.0 labeling_0.3
## [25] GenomeInfoDbData_1.2.2 polyclip_1.10-0
## [27] farver_2.0.3 rprojroot_2.0.2
## [29] parallelly_1.28.1 vctrs_0.3.8
## [31] generics_0.1.0 xfun_0.40
## [33] R6_2.4.1 GenomeInfoDb_1.22.1
## [35] GA_3.2 graphlayouts_0.7.0
## [37] rsvd_1.0.3 bitops_1.0-6
## [39] DelayedArray_0.12.3 assertthat_0.2.1
## [41] scales_1.1.1 ggraph_2.0.3
## [43] nnet_7.3-12 lmds_0.1.0
## [45] gtable_0.3.0 dynfeature_1.0.0
## [47] babelwhale_1.0.1 npsurv_0.4-0
## [49] globals_0.14.0 processx_3.5.2
## [51] tidygraph_1.1.2 rlang_1.0.2
## [53] randomForestSRC_2.12.1 splines_3.6.3
## [55] lazyeval_0.2.2 acepack_1.4.1
## [57] checkmate_2.0.0 yaml_2.2.1
## [59] reshape2_1.4.4 backports_1.4.1
## [61] Hmisc_4.4-0 DiagrammeR_1.0.6.1
## [63] tools_3.6.3 ellipsis_0.3.2
## [65] gplots_3.0.3 jquerylib_0.1.4
## [67] RColorBrewer_1.1-2 BiocGenerics_0.32.0
## [69] ggridges_0.5.2 Rcpp_1.0.9
## [71] plyr_1.8.6 visNetwork_2.0.9
## [73] base64enc_0.1-3 zlibbioc_1.32.0
## [75] RCurl_1.98-1.2 ps_1.6.0
## [77] rpart_4.1-15 pbapply_1.4-2
## [79] cowplot_1.0.0 S4Vectors_0.24.4
## [81] dynparam_1.0.0 zoo_1.8-10
## [83] SummarizedExperiment_1.16.1 ggrepel_0.9.1
## [85] cluster_2.1.0 magrittr_2.0.1
## [87] data.table_1.14.2 carrier_0.1.0
## [89] lmtest_0.9-38 RANN_2.6.1
## [91] fitdistrplus_1.0-14 matrixStats_0.56.0
## [93] pkgload_1.2.2 hms_1.1.1
## [95] lsei_1.2-0 evaluate_0.18
## [97] jpeg_0.1-8.1 IRanges_2.20.2
## [99] gridExtra_2.3 testthat_3.1.0
## [101] compiler_3.6.3 tibble_3.1.5
## [103] KernSmooth_2.23-16 crayon_1.3.4
## [105] htmltools_0.5.2 proxyC_0.1.5
## [107] tzdb_0.1.2 Formula_1.2-3
## [109] RcppParallel_5.1.4 DBI_1.1.1
## [111] tweenr_1.0.1 MASS_7.3-51.5
## [113] rappdirs_0.3.3 data.tree_1.0.0
## [115] readr_2.0.2 cli_3.0.1
## [117] gdata_2.18.0 parallel_3.6.3
## [119] igraph_1.2.5 GenomicRanges_1.38.0
## [121] pkgconfig_2.0.3 dyneval_0.9.9
## [123] foreign_0.8-75 plotly_4.9.2.1
## [125] foreach_1.5.0 bslib_0.3.1
## [127] XVector_0.26.0 stringr_1.4.0
## [129] digest_0.6.25 dyndimred_1.0.3
## [131] sctransform_0.2.1 RcppAnnoy_0.0.16
## [133] tsne_0.1-3 rmarkdown_2.18
## [135] leiden_0.3.3 htmlTable_1.13.3
## [137] uwot_0.1.8 gtools_3.8.2
## [139] lifecycle_1.0.1 nlme_3.1-144
## [141] jsonlite_1.7.2 desc_1.4.1
## [143] fansi_0.4.1 pillar_1.6.3
## [145] lattice_0.20-38 fastmap_1.1.0
## [147] httr_1.4.2 survival_3.1-8
## [149] waldo_0.3.1 glue_1.4.2
## [151] remotes_2.4.2 png_0.1-7
## [153] iterators_1.0.12 ggforce_0.3.1
## [155] stringi_1.4.6 sass_0.4.0
## [157] blob_1.2.1 latticeExtra_0.6-29
## [159] caTools_1.18.0 irlba_2.3.3
## [161] future.apply_1.5.0 ape_5.3
Python session information :
## -----
## anndata 0.10.1
## matplotlib 3.5.0
## numpy 1.21.1
## pandas 1.5.3
## scanpy 1.9.5
## scipy 1.10.1
## scvelo 0.2.5
## session_info 1.0.0
## -----
## Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 04:04:13) [GCC 12.3.0]
## Linux-5.15.0-87-generic-x86_64-with-glibc2.31
## -----
## Session information updated at 2023-11-04 16:14