1 Libraries and environment

1.1 Environment

This report was generated using:

R: R version 4.3.1 (2023-06-16 ucrt)
FactoMineR: 2.9
factoextra: 1.0.7
ggpubr: 0.6.0
corrplot: 0.92
pheatmap: 1.0.12
grid: 4.3.1
gridExtra: 2.3
RGCCA: 3.0.3
ggplot2: 3.4.4
MOFA2: 1.12.1
MOFAdata: 1.18.0
DT: 0.31
psych: 2.4.1
GGally: 2.2.0
NMF: 0.27
r.jive: 2.4
dplyr: 1.1.3

1.2 Load libraries

DIY: Load required libraries : ….

library("FactoMineR")
library("factoextra")
library("ggpubr")
library("corrplot")
library("pheatmap")
library("grid")
library("gridExtra")
library("RGCCA")
library("ggplot2")
library("MOFA2")
library("MOFAdata")#for CLL_data and enrichment
library("DT")      #for formatround function
library("psych")   #for correlation
library("GGally")  #for ggplot extention
library("NMF")
library("r.jive")
library("dplyr")

2 Overview of the Tutorial

General Statement:

In this tutorial, Principal Component Analysis (PCA), along with three multi-omics techniques : Regularized Generalized Canonical Correlation Analysis (RGCCA), Multi-Omics Factor Analysis (MOFA) and Joint and Individual Variance Explained (JIVE), are going to be explored. These methods have in common that they all belong to the field of multivariate statistics, but also that, in different ways, they can be seen as generalizations of PCA to more than one omic/matrix/block/table. Indeed, similarly to PCA, they all aim at estimating a reduced/latent space that describes well the different omics. Some part of the literature refer to these methods as Joint Dimension Reduction (JDR) methods because their goal is to estimate this reduced space such that it is common to all the omic data used in the analysis (this is less true for RGCCA). Actually, applying PCA separatly on different omic data in not considered as a JDR method, but applying PCA on the concatenation of multiple omic data acquired on the same set of individuals is.

Bellow is figure summarizing the key concept behind JDR methods. This figure was taken from an interesting benchmark that compares, on different multi-omic data-sets, multiple JDR methods (among which RGCCA/MOFA/JIVE).

Summarizing Joint Dimension Reduction (JDR) pinciple. A.: Multiomics are profiled from the same sample. Each omics corresponds to a different matrix Xi. jDR methods factorize the matrices $X^i$ into the product of a factor matrix $F$ and weight matrices $A^i$. These matrices can then be used to cluster samples and identify molecular processes. Figure and caption were taken from: Cantini, L., Zakeri, P., Hernandez, C. et al. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun 12, 124 (2021). https://doi.org/10.1038/s41467-020-20430-7

Figure 2.1: Summarizing Joint Dimension Reduction (JDR) pinciple. A.: Multiomics are profiled from the same sample. Each omics corresponds to a different matrix Xi. jDR methods factorize the matrices $X^i$ into the product of a factor matrix $F$ and weight matrices $A^i$. These matrices can then be used to cluster samples and identify molecular processes. Figure and caption were taken from: Cantini, L., Zakeri, P., Hernandez, C. et al. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun 12, 124 (2021). https://doi.org/10.1038/s41467-020-20430-7

The tutorial is organized as follow:

In section 3, you will have to choose the data-set your are going to work with the entire week.
In section 4, we present the biological context data-set (non available for participant) use as example in this tutorial.
In section 5, you will have to load the data-set tou have chosen.
In section 6, you will apply PCA on your data-set.
In section 7, you will apply RGCCA on your data-set.
In section 8, you will apply MOFA on your data-set.
In section 9, you will apply JIVE on your data-set.
Section 10 aims at comparing the results of the different approaches.

If you are interested going further on the subject, here are an non exhaustive list of publications that might interest you:

About RGCCA:
- Original paper of RGCCA: Tenenhaus, A., Tenenhaus, M. Regularized Generalized Canonical Correlation Analysis. Psychometrika 76, 257–284 (2011)
- Presentation of the Sparse version of RGCCA: Tenenhaus A, Philippe C, Guillemot V, Le Cao KA, Grill J, Frouin V. Variable selection for generalized canonical correlation analysis. Biostatistics. 2014
- Exploration of the links between RGCCA and several multi-block method: Tenenhaus, M., Tenenhaus, A. & Groenen, P.J.F. Regularized Generalized Canonical Correlation Analysis: A Framework for Sequential Multiblock Component Methods. Psychometrika 82, 737–777 (2017)
About MOFA:
- Original paper of MOFA: Argelaguet, R. et al. Multi-omics factor analysis-a framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 14, e8124 (2018)
About JIVE:
- Original paper of JIVE: Lock EF, Hoadley KA, Marron JS, Nobel AB. JOINT AND INDIVIDUAL VARIATION EXPLAINED (JIVE) FOR INTEGRATED ANALYSIS OF MULTIPLE DATA TYPES. Ann Appl Stat. 2013
- A supervised version of JIVE: Elise F. Palzer, Christine H. Wendt, Russell P. Bowler, Craig P. Hersh, Sandra E. Safo, Eric F. Lock, sJIVE: Supervised joint and individual variation explained, Computational Statistics & Data Analysis, Volume 175, 2022
Other references:
- The benchmark mentionned above on JDR methods: Cantini, L., Zakeri, P., Hernandez, C. et al. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun 12, 124 (2021)
- Another broader benchmark on multi-omics methods (contains JDR methods but not only): Morgane Pierre-Jean, Jean-François Deleuze, Edith Le Floch, Florence Mauger, Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration, Briefings in Bioinformatics, Volume 21, Issue 6, November 2020
- A book exploring Multiblock Data Fusion: Age K. Smilde, Tormod Næs and Kristian Hovde Liland, Multiblock Data Fusion in Statistics and Machine Learning - Applications in the Natural and Life Sciences

3 Choose your dataset and your modality

In this tutorial we use Major Depressive Disorder (MDD) dataset, however for practical session you will pick another dataset among the ones proposed below and adapt the following sections.

3.1 TCGA breast cancer dataset

This dataset includes measurements of mRNA, miRNA, and protein abundances taken from samples representing different molecular subtypes of breast cancer.

This dataset was obtained from Mibiomics

Download the dataset from the summer school’s GitHub repository

For this dataset, we will use 3 modalities:

mrna.csv, contains mRNA expression data
mirna.csv, contains miRNA expression data
protein.csv, contains protein abundance data

We will also use the file clinical.csv, which contains samples metadata, including molecular subtypes.

3.2 TARA ocean dataset

Tara Ocean is a scientific initiative that aims to study and understand marine ecosystems on a global scale. It involves the collection of ocean samples from various regions to analyze the diversity of organisms, their genetic content, and their interactions with the environment.

In this dataset, we will have access to a NOG (Non-supervised Orthologous Groups) abundance dataset, measure accros various oceans and various depths. NOGs are clusters of genes from different species that share a common evolutionary origin and similar functions. These groups are formed computationally, without relying on manual annotations, and are valuable for understanding gene relationships and functions across diverse organisms. NOGs play a key role in comparative genomics and evolutionary studies by revealing insights into the conservation of genes and their roles in different species.

NOGs have been measured in various oceans (SPO: South Pacific Ocean, NAO: North Atlantic Ocean, IO: Indian Ocean, RS: Red Sea, MS: Mediterranean Sea, NPO: North Pacific Ocean, SO: Southern Ocean, SAO: South Atlantic Ocean) and at various depths (DCM: Deep Chlorophyll Maximum, MES: Mesopelagic Zone, MIX: Mixing Layer, SRF: Surface Layer)

This dataset was obtained from Mibiomics

Download the dataset from the summer school’s GitHub repository

3.3 CLL dataset

The Chronic lymphocytic leukaemia (CLL) dataset includes mRNA expression, methylation data and viability values in response to different drugs from CLL samples.

To access this dataset, you need to install and load the MOFAdata R library.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MOFAdata")

library("MOFAdata")
data("CLL_data")
names(CLL_data)

We will use three modalities:

CLL_data$mRNA, which contains mRNA data for various CLL samples,
CLL_data$Methylation, which contains methylation data from the same samples,
CLL_data$Drugs, which contains viability response to various drugs and concentrations.

The samples metadata are not available from MOFAdata, you’ll have to download it from the summer school’s GitHub repository

Use the command ?CLL_data to know more about this dataset.

3.4 Tomato dataset

The tomato dataset was obtained from the authors of the following publication: Isma Belouah and others, Modeling Protein Destiny in Developing Fruit, Plant Physiology, Volume 180, Issue 3, July 2019, Pages 1709–1724, https://doi.org/10.1104/pp.19.00086

It contains mRNA and protein abundance data from tomato plants (Solanum lycopersicum) at various development stages. In this dataset, the mRNA and proteins are paired, meaning that corresponding gene expression levels at both the mRNA and protein levels are examined.

The dataset encompasses information gathered from tomato plants at different developmental points, representing various Days Post Anthesis (DPA) and distinct Growth Stages. ‘Days Post Anthesis’ (DPA) signifies the count of days that have elapsed since the opening of a flower, serving as a marker to track the temporal progression of plant development. On the other hand, ‘Growth Stages’ denote specific phases within the tomato plant’s lifecycle, spanning germination, vegetative growth, flowering, fruit development, and ripening.

The dataset was kindly provided to us by the authors. We have made slight adjustments to it, and made it accessible on the summer school’s GitHub repository

For this dataset, we have access to 2 modalities:

mrna.tsv, contains mRNA expression data (already normalized)
prots.csv, contains protein abundance data (already normalized)

We will also use the file samples_metadata.tsv, which contains samples metadata (DPA, Growth Stage).

3.5 WallOmics dataset

The tomato dataset was obtained from the WallomicsData R library. This dataset contains phenomics, metabolomics, proteomics and transcriptomics data collected from two organs of five ecotypes of the model plant Arabidopsis thaliana exposed to two temperature growth conditions. For more information, you can refer to the following publications:

Duruflé, Harold, et al. “A powerful framework for an integrative study with heterogeneous omics data: from univariate statistics to multi-block analysis.” Briefings in Bioinformatics 22.3 (2021): bbaa166.
Duruflé, Harold, et al. “An integrative study showing the adaptation to sub-optimal growth conditions of natural populations of Arabidopsis thaliana: A focus on cell wall changes.” Cells 9.10 (2020): 2249.

For the analysis to run faster, we will use the CellWall datasets (CW), that only consider the expression of cell wall proteins. The datasets can be accessed with the following R commands:

intall.packages("WallomicsData")

library("WallomicsData")
data("Transcriptomics_Rosettes_CW")
data("Transcriptomics_Stems_CW")
data("Proteomics_Rosettes_CW")
data("Proteomics_Stems_CW")

Use ?Transcriptomics_Rosettes_CW and ?Transcriptomics_Stems_CW for more information on the transciptomics dataset. Use ?Proteomics_Rosettes_CW and ?Proteomics_Stems_CW for more information on the proteomics dataset.

Additionally, we will use the other data available as metadata. This includes:

the Ecotype dataset
the Temperature dataset
the Genetic_Cluster dataset
the Altitude_Cluster dataset
the Genetic_Cluster dataset
the Altitude_Cluster dataset
the Metabolomics_Rosettes dataset for Rosettes
the Metabolomics_Stems dataset for Stem
the Phenomics_Rosettes dataset for Rosettes
the Phenomics_Stems dataset for Stem

3.6 ProMetIS dataset

A total of 42 mice from the Lat-/- (knock out for LAT, a gene involved in neurodevelopmental diseases) and Mx2-/- (knock out for Mx2, a gene modelling Down syndrome in mice) genotypes (as well as the wild-type controls), and from both sex, were analyzed by a series of phenomic (preclinical) measurements. Liver and plasma samples from all mice were further analyzed at the molecular level by proteomics and metabolomics (Alyssa Imbert & al. 2021).

The data-set is composed of 9 matrices: 1 preclinical, 2 proteomics (liver and plasma), 6 metabolomics (4 on the plasma and 2 on the liver). There are multiple metabolomics datasets because it depends on the chromatographic column (c18hypersil: Hypersil GOLD C18, hilic: ZIC-pHILIC, and c18acquity: Acquity HSS-T3) and the ionization mode (pos: positive, and neg: negative).

For this tutorial, we will focus only one metabolomic data-set per tissue, the one that was acquired with the same chromatographic column with the same ionization mode and that presents the fewest number of variables. In the end, 4 modalities will be explored:

preclinical
proteomics_liver
metabolomics_liver_hilic_neg
metabolomics_plasma_hilic_neg

The proteomics_plasma was put aside as it does not contains all individuals.

The following code can be used to retrieve these matrices.

devtools::install_github("IFB-ElixirFr/ProMetIS")

library(ProMetIS)

modalities = c("preclinical", "proteomics_liver", "metabolomics_liver_hilic_neg", 
               "metabolomics_plasma_hilic_neg")
data_dir.c = ProMetIS::post_processed_dir.c()


data.mds = phenomis::reading(data_dir.c,
                            subsets.vc = modalities,
                            output.c   = "set",
                            report.c   = "none")

blocks         = sapply(names(data.mds), function(x) t(Biobase::exprs(data.mds[[x]])))
knock_out_gene = factor(Biobase::pData(data.mds[["preclinical"]])[, "gene"])

For further information on how to retrieve all the data, you can visit the github page: https://github.com/IFB-ElixirFr/ProMetIS, and especially check the tutorial: https://github.com/IFB-ElixirFr/ProMetIS/blob/master/vignettes/tutorial.Rmd.

Again, in the upcoming sections, our primary focus will be on Major Depressive Disorder (MDD) dataset. However, it’s important to note that the procedure we will discuss can also be applied to other datasets.

Team organization:

Each method presented here : RGCCA, MOFA and JIVE are independent, so feel free to start with one method or another or split your group in three, each working on a different method.

4 Biological context (Major Depressive Disorder)

Major depressive disorder (MDD) is a leading cause of disability and reduced life expectancy, with a two-fold increase in prevalence in women compared to men. Over the last few years, identifying reliable molecular biomarkers of MDD has proved challenging, likely reflecting the fact that, in addition to sex-differences, a variety of environmental and genetic risk factors are implicated. Recently, epigenetic processes have been proposed as mediators of the impact of life experiences on functional regulation of the genome, with the potential to contribute to MDD biomarker development. In this context, gene expression data were characterized and integrated with two upstream mechanisms for epigenomic regulation: DNA methylation (DNAm) and microRNAs (miRNAs). The 3 molecular layers were analyzed in peripheral blood samples from a well-characterized cohort of individuals with MDD (n=80) and healthy controls (n=89) (Mokjhtari A. 2024).

Overview of the Major Depressive Disorder (MDD) data-set. Figure extracted from: Amazigh Mokhtari, `Signatures moléculaires multi-omiques issues de prélévements sanguins de patients atteints de troubles de l'humeur`, 2023, PhD manuscript. https://theses.fr/2022UNIP5226

Figure 4.1: Overview of the Major Depressive Disorder (MDD) data-set. Figure extracted from: Amazigh Mokhtari, Signatures moléculaires multi-omiques issues de prélévements sanguins de patients atteints de troubles de l'humeur, 2023, PhD manuscript. https://theses.fr/2022UNIP5226

Our objective is to build a multi-omic molecular signature discriminating MDD patients from controls

5 Input data

5.1 In summary

For RGCCA, data must be a list of matrices with specific shapes:

samples (e.g. patients, organisms …) are displayed on the rows
variables (e.g. genes, proteins …) are displayed on the columns

For MOFA and JIVE, matrices must be the other way round (samples in columns and variables in rows).

5.2 Load your dataset

Choose a dataset and load all the modalities into R, except for the meta/clinical data. You will probably need to read a csv/tsv file: look at the documentation for the function read.table:

?read.table

## starting httpd help server ... done

If your data is not stored into a csv/tsv file and you don’t know how to load it, don’t hesitate to ask us.

DIY: Load all the modalities, except for the meta/clinical data, as a list of matrices.

data_path  = "./data/MDD/"
blocks     = readRDS(file = paste0(data_path,"data.train"))
blocks$cli = NULL

DIY: Load separately the meta/clinical data.

clinical_data           = readRDS(file = paste0(data_path,"pd_mdd_bin.RDS"))
clinical_data           = clinical_data[match(rownames(blocks$mRNA), 
                                              clinical_data$Sample_ID), ]
rownames(clinical_data) = clinical_data$Sample_ID

covariates_explored = c("Sample_Group", "BMI", "BMI.bin", "Age", "Age.bin", "Sex",
                        "Array", "Slide", "CD4", "CD8", "MO", "B", "NK", "GR")
clinical_data       = clinical_data[, covariates_explored]

6 Principal Component Analysis (PCA)

6.1 In summary

Principal Component Analysis (PCA) is an unsupervised method for multivariate statistical analysis.

It is a good practice to first explore your data with PCA, especially if your plan is to perform next analysis with Joint Dimension Reduction (JDR) methods such as RGCCA, MOFA or JIVE. Indeed, all three of them can, in a sense, be seen as a generalization of PCA for more than one omic (block).

Applying a separate PCA on all your omics will eventually help you understand what information (phenotype/co-variable) drives the main variations in each omic data. If all their main variations is driven by the same information, going further with a JDR method is especially interesting as it will be able to take advantage of the redundancy of this information across omics to catch a stronger signal.

6.2 Run PCA on a first omic modality

In the example of MDD, we will go with the transcriptomic block (mrna), choose yours in the data-set you selected.

DIY: Run a PCA with the PCA function of the FactoMineR package. You will have to set:

ncp: the number of components.
scale.unit: a boolean specifying if you want or not to scale (to unit variance) all your variables separately to make them comparable. You may skip this step if you previously performed a more expert normalization.
graph: to avoid unwanted graphical output.

Important Reminder: Centering all your variables when performing PCA is mandatory, otherwize you are not extracting the components with the highest variance anymore.

mrna_pca_results <- PCA(blocks$mRNA, scale.unit = TRUE, ncp = 25, graph = F)

6.2.1 Sample/individual Space

DIY: Display the first 2 Principal Components (PCs) against each others with the function fviz_pca_ind from the factoextra package. You will have to set the :

axes : the two components you want to display.

This type of representation is also called the sample space as each dot represent a sample (individual/subject…).

fviz_pca_ind(mrna_pca_results, axes = c(1,2))

Depending on the modality and the data-set you selected, you may see a group of samples that already is separated from the other, as above. In order to understand if this group is associated with a already known information (phenotype/co-variable), you may add this information to the plot.

DIY: Select a phenotype/co-variable and color your samples depending on the category of this selected phenotype/co-variable they belong to. If you choose a continuous variable, you may want to categorize it (by making bins).

fviz_pca_ind(mrna_pca_results, axes = c(1,2)) + 
  geom_point(aes(colour = clinical_data$Sex)) +
  guides(colour = guide_legend(title = "color"))

We can notice in the figure above that there is a strong “Sex” effect. Several methods exist in the literature to correct for Sex (or other batch) effects (ex: ComBat, for more details see Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007).

For the sake of simplicity, we will limit ourselves to the gender with the highest number of samples, which corresponds to the Females.

DIY: If needed, extract a subset of your data-set as here for the female group.

idx_female    = which(clinical_data$Sex == "F")
idx_sex       = which(colnames(clinical_data) == "Sex")
clinical_data = clinical_data[idx_female, -idx_sex]
blocks        = lapply(blocks, function(x) x[idx_female, ])

DIY: You can plot again the sample space once the subset has been extracted. Do not forget to compute the PCA again. This time, samples are colored by their status (case vs. control).

mrna_pca_results <- PCA(blocks$mRNA, scale.unit = TRUE, ncp = 25, graph = F) 

fviz_pca_ind(mrna_pca_results, axes = c(1,2)) + 
  geom_point(aes(colour = clinical_data$Sample_Group)) +
  guides(colour = guide_legend(title = "color"))

6.2.2 Variable Space

Once components have been identified as bearing an interesting information, you may want to understand how the different variables contribute to the computation of these components.

DIY: Display the top 50 variables contributing to the component of your choice with the function fviz_contrib from the package factoextra. You will have to set:

choice = 'var': to specify that you are interesting in the contribution of the variables.
axes: the PC you are interested in.
top: the number of variables contributing the most to this PC.

fviz_contrib(mrna_pca_results, choice = "var", axes = 1, top = 50)

This contribution can also be plot in 2D in order to show, at the same time, the contribution of one variable to two different PCs. This is called the PCA correlation circle.

Important Reminder: In PCA, the contribution (weight/coefficient) of a variable to a PC equals the correlation between the variable and the PC. This allows to plot the contributions of variables to 2 differents PCs on a correlation circle, as all contribution will be between -1 an 1.

DIY: Display the PCA correlation circle for the top 50 variables contributing to the 2 components of your choice with the function fviz_pca_var from the package factoextra. You will have to set:

axes: the two PCs you are interested in.
select.var: to specify the number of variables contributing the most to this PC.
repel: if the names of the variables are not displayed properly (are overlapping).

fviz_pca_var(mrna_pca_results, axes = c(1, 2), repel = T, select.var = list(contrib = 50))

6.2.3 Biplot

It is also possible to display, on the same plot, the individual and the variable space. In that case, variables are not displayed inside a correlation circle anymore in order to adapt the scale of the individual space. In that case, the coordinates of the variables, representing their contribution to the PCs, is artificially changed. Though, the direction of each variable is kept, allowing still to interpret if a variable is more contributing to one PC or the other.

This type of representation is particularly usefull when a group of samples separates from the others and you want to understand if a group of variables especially bears this separation.

DIY: Display a biplot for the top 50 variables contributing to the 2 components of your choice with the function fviz_pca_biplot from the package factoextra. You will have to set:

axes: the two PCs you are interested in.
select.var: to specify the number of variables contributing the most to this PC.
repel: if the names of the variables are not displayed properly.

Furthermore, color your samples depending on the category of the selected phenotype/co-variable they belong to.

fviz_pca_biplot(mrna_pca_results, repel = T, select.var = list(contrib = 50), axes = c(1,2))+ 
    geom_point(aes(colour = clinical_data$Sample_Group)) +
    guides(colour = guide_legend(title = "color"))

6.2.4 Scree plot

Each PC explains a certain percentage of the total variance of the data-set. This explained variance decreases with the rank of the PC. The scree plot subsumes the explained variance of the top PCs. It can be interesting to visualize it in order to understand how many components bear most of the information.

DIY: Display a scree plot.

fviz_eig(mrna_pca_results)

6.2.5 Going Further

When trying to understand the information behind each PC, it might be interesting to directly test the association between them and a series of phenotypes/co-varibales.

This is the idea behind the function champ.svd in the package CHAMP where a univariate statistical test is performed between each PC and a designated variable of interest. The chosen test depends on the nature of the variable:

continuous : a F-test.
discrete : a Kruskal and Wallis test, which is a generalization to an arbitrary number of groups to the Wilcoxon-Mann-Whitney test.

Inspired from the CHAMP package, an home-made function interpret_PC was created that takes as arguments:

component_matrix: a matrix of size number of samples $\times$ number of components.
clinical_df: a dataframe of size number of samples $\times$ number of variables.
title: the title of your plot.
display_legend: If you want or not the legend for the p-values to be displayed.

# Function taken from CHAMP to help interpret factors
drawheatmap <- function(svdPV.m, title = NULL, 
                        silent = FALSE,show_rownames = TRUE, show_colnames = TRUE, show_legend = TRUE) {
  myPalette         = c("darkred", "red", "orange", "pink", "white")
  breaks.v          = c(-10000, -10, -5, -2, log10(0.05), 0)
  rownames(svdPV.m) = paste("Comp.", 1:dim(svdPV.m)[1])
  tmp_plot          = pheatmap(log10(t(svdPV.m)), 
                               cluster_rows  = F, 
                               cluster_cols  = F, 
                               breaks        = breaks.v, 
                               color         = myPalette, 
                               silent        = T, 
                               legend        = F, 
                               main          = paste("Interpretation of the Components -", title), 
                               show_rownames = show_rownames, 
                               show_colnames = show_colnames)
  if (show_legend){
    leg               = grid::legendGrob(c(expression("p < 1x" ~ 10^{-10}),
                                           expression("p < 1x" ~ 10^{-5}),
                                           "p < 0.01", 
                                           "p < 0.05", 
                                           "p > 0.05"), 
                                         nrow = 5, pch = 15, gp = grid::gpar(fontsize = 10, col = myPalette))
  }else{
    leg = NULL
  }
  final_plot = arrangeGrob(tmp_plot$gtable, leg, ncol = 2, widths = c(5,1))
  if (!silent){
    plot.new()
    grid::grid.draw(final_plot)
  }
  return(final_plot)
}

check_association_CHAMP <- function(x, y){
  if(!is.numeric(y)){
    return(kruskal.test(x ~ as.factor(y))$p.value)
  }else{
    return(summary(lm(x ~ y))$coeff[2, 4])
  }
}

interpret_PC <- function(component_matrix, clinical_df, title = NULL, 
                         silent = FALSE, show_rownames = TRUE, show_colnames = TRUE, show_legend = TRUE){
  tmp = apply(component_matrix, 2,
              function(x) 
                sapply(clinical_df, 
                       function(y) 
                         check_association_CHAMP(x, y)))
  drawheatmap(t(tmp), title = title, 
              silent = silent, show_rownames = show_rownames, show_colnames = show_colnames, show_legend = show_legend)
}

DIY: Apply this interpret_PC function on the result of your PCA in order to evaluate the associations with your clinical data. The component matrix can be retrieved through mrna_pca_results$ind$coord.

interpret_PC(component_matrix = mrna_pca_results$ind$coord, clinical_df = clinical_data, title = "mRNA")

## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name                  grob
## 1 1 (1-1,1-1) arrange        gtable[layout]
## 2 2 (1-1,2-2) arrange frame[GRID.frame.393]

This is not the case here, as Sample_Group is only composed of two categories, but sometimes, you can see a strong link with a qualitative variable but without really knowing which category is best represented. It can be interesting to add intermediary variables describing each category. More generally, if you have a qualitative variable with multiple categories, it can be interesting to try to decipher if a PC describes one or more of these categories.

DIY: Create intermediary variable for each category of the clinical/co-variable you selected. Each new intermediary variables must specify “I belong to this category or not”. Do not forget to convert these variables into factors.

# clinical_data$Sample_Group = as.factor(clinical_data$Sample_Group)
# each_subtypes_vs_ALL  = model.matrix(~ subtype - 1, data = clinical_data)
# each_subtypes_vs_ALL  = apply(each_subtypes_vs_ALL, 2, as.factor)
# clinical_data         = cbind(clinical_data, each_subtypes_vs_ALL)

DIY: Draw again the previous plot with these new intermediary variables.

# interpret_PC(component_matrix = mrna_pca_results$ind$coord, clinical_df = clinical_data, title = "mRNA")

6.3 PCA for other modalities

DIY Explore your other omic modalities with PCA

mirna_pca_results   = PCA(blocks$miRNA, scale.unit = TRUE, ncp = 25, graph = F) 
protein_pca_results = PCA(blocks$DNAm, scale.unit = TRUE, ncp = 25, graph = F)

6.3.1 Sample/individual Space

a = fviz_pca_ind(mirna_pca_results, axes = c(1,2), subtitle  = "miRNA") + 
  geom_point(aes(colour = clinical_data$Sample_Group)) +
  guides(colour = guide_legend(title = "color"))

b = fviz_pca_ind(protein_pca_results, axes = c(1,2), subtitle  = "DNAm") + 
  geom_point(aes(colour = clinical_data$Sample_Group)) +
  guides(colour = guide_legend(title = "color"))

ggarrange(a, b, ncol = 2, nrow = 1)

rm(a,b)

6.3.2 Variable Space

a = fviz_contrib(mirna_pca_results, choice = "var", axes = 1, top = 50, subtitle  = "miRNA")
b = fviz_contrib(protein_pca_results, choice = "var", axes = 1, top = 50, subtitle  = "DNAm")

ggarrange(a, b, ncol = 2, nrow = 1)

rm(a,b)

a = fviz_pca_var(mirna_pca_results, axes = c(1, 2), repel = T, select.var = list(contrib = 50), subtitle  = "miRNA")
b = fviz_pca_var(protein_pca_results, axes = c(1, 2), repel = T, select.var = list(contrib = 50), subtitle  = "DNAm")

ggarrange(a, b, ncol = 2, nrow = 1)

rm(a,b)

6.3.3 Biplot

a = fviz_pca_biplot(mirna_pca_results, repel = T, select.var = list(contrib = 50), axes = c(1,2), subtitle  = "miRNA")+ 
  geom_point(aes(colour = clinical_data$Sample_Group)) +
  guides(colour = guide_legend(title = "color"))

b = fviz_pca_biplot(protein_pca_results, repel = T, select.var = list(contrib = 50), axes = c(1,2), subtitle  = "DNAm")+ 
  geom_point(aes(colour = clinical_data$Sample_Group)) +
  guides(colour = guide_legend(title = "color"))

ggarrange(a, b, ncol = 2, nrow = 1)

rm(a,b)

6.3.4 Scree plot

a = fviz_eig(mirna_pca_results, subtitle  = "miRNA")
b = fviz_eig(protein_pca_results, subtitle  = "DNAm")

ggarrange(a, b, ncol = 2, nrow = 1)

rm(a,b)

6.3.5 Going Further

a = interpret_PC(component_matrix = mirna_pca_results$ind$coord, clinical_df = clinical_data, title = "miRNA", 
                 silent = T, show_rownames = F, show_legend = F)
b = interpret_PC(component_matrix = protein_pca_results$ind$coord, clinical_df = clinical_data, title = "DNAm", silent = T)

ggarrange(a, b, ncol = 2, nrow = 1)

rm(a,b)

7 Regularized Generalized Canonical Correlation Analysis (RGCCA)

7.1 First application of RGCCA

DIY Store all your omic tables in a list of matrices and verify that they all have the same number of samples, that samples are in rows and that sample_IDs match across matrices.

sapply(blocks, dim)

##      mRNA miRNA DNAm
## [1,]   87    87   87
## [2,] 2000   350 2000

all(rownames(blocks$DNAm) == rownames(blocks$mRNA))

## [1] TRUE

all(rownames(blocks$DNAm) == rownames(blocks$miRNA))

## [1] TRUE

DIY Apply the rgcca function from the RGCCA package and extract more than 1 component per block. Also, let us print the progression of the algorithm. You will have to set the parameters:

blocks: the list of matrices of size samples $\times$ variables you want to analyze.
ncomp: the number of component extracted per block.
verbose: if you want or not the progression of the algorithm to be displayed.

fit_RGCCA = rgcca(blocks = blocks, ncomp = 2, verbose = T)

## Computation of the RGCCA block components based on the factorial scheme 
## Shrinkage intensity parameters are chosen manually 
## Computation of the RGCCA block components #1 is under progress...
##  Iter:    1  Fit:  0.04899960  Dif:  0.03290491 
##  Iter:    2  Fit:  0.05214152  Dif:  0.00314192 
##  Iter:    3  Fit:  0.05336547  Dif:  0.00122395 
##  Iter:    4  Fit:  0.05400096  Dif:  0.00063548 
##  Iter:    5  Fit:  0.05428003  Dif:  0.00027908 
##  Iter:    6  Fit:  0.05438787  Dif:  0.00010783 
##  Iter:    7  Fit:  0.05442615  Dif:  0.00003828 
##  Iter:    8  Fit:  0.05443902  Dif:  0.00001288 
##  Iter:    9  Fit:  0.05444322  Dif:  0.00000419 
##  Iter:   10  Fit:  0.05444456  Dif:  0.00000134 
##  Iter:   11  Fit:  0.05444498  Dif:  0.00000042 
##  Iter:   12  Fit:  0.05444511  Dif:  0.00000013 
##  Iter:   13  Fit:  0.05444515  Dif:  0.00000004 
##  Iter:   14  Fit:  0.05444517  Dif:  0.00000001 
##  Iter:   15  Fit:  0.05444517  Dif:  0.00000000

## The RGCCA algorithm converged to a stationary point after 14 iterations

## Computation of the RGCCA block components #2 is under progress...
##  Iter:    1  Fit:  0.02610835  Dif:  0.01498170 
##  Iter:    2  Fit:  0.02680963  Dif:  0.00070128 
##  Iter:    3  Fit:  0.02681587  Dif:  0.00000625 
##  Iter:    4  Fit:  0.02681620  Dif:  0.00000033 
##  Iter:    5  Fit:  0.02681622  Dif:  0.00000002 
##  Iter:    6  Fit:  0.02681622  Dif:  0.00000000

## The RGCCA algorithm converged to a stationary point after 5 iterations

The mention verbose = TRUE resulted in the printing of the progression of the algorithm both as a text or as a graph. On the text, three information are displayed for each line: (i) the number of the current iteration, (ii) the value of the optimization criterion of RGCCA (that we want to maximize) and (iii) the difference between the current and the previous value of the criterion. Furthermore, we can see that the algorithm is monotonically convergent (on the plots, the criterion always increases or stay the same; on the text, Dif is always positive or null).

We can see the “sequentiallity” in the computation of the components as the first one is estimated until convergence and only then, the second component is estimated.

In the end, convergence is obtained in a few number of iterations and relatively quickly. The algorithm stops when Dif is below a certain threshold, arbitrary low (1e-8).

Let us explore the parameters of the rgcca function, their default value and what they stand for (to access to the explanation of the parameter, click on it. These explanations are mostly coming from ??rgcca, except for comments in bold):

tau = 1, 1, 1
If tau is a vector, tau[j] is used for the constraints applied to all the block weight vectors associated to block $X_j$. If tau is a matrix, tau[k, j] is associated with the constraints applied to the kth block weight vector corresponding to block $X_j$.
tol = 10^{-8}
The stopping value for the convergence of the algorithm. The lower this value, the more precise should be the estimates but the longer it takes to estimate them.
init = svd
A string giving the type of initialization to use in the RGCCA algorithm. It could be either by Singular Value Decomposition (“svd”) or by random initialization (“random”). The latter is actually quicker in situations where a block has a high number of variables and observations.
bias = TRUE
A logical value for biased $(1/n)$ or unbiased $(1/(n-1))$ estimator of the variance/covariance.
quiet = TRUE
A logical value indicating if some diagnostic messages are reported.
scale = TRUE
A logical value indicating if each variable for each block have to be standardized (actually, they are going to be at least centered whatever the value of this parameter is).
ncomp = 2, 2, 2
A numerical value or a vector of length $J$ indicating the number of components per block (meaning, a different number of component can be extracted for each block). If a single value is provided, the same number of components is extracted for every block.
scheme = factorial
A string or a function specifying the scheme function applied to covariance maximization among “horst” (the identity function), “factorial” (the square function - default value), “centroid” (the absolute value function). The scheme function can be any continuously differentiable convex function and it is possible to design explicitly the scheme function (e.g. function(x) x^4) as argument of the function.
method = rgcca
A string specifying which multiblock component method to consider. Possible values are found using RGCCA::available_methods(). When a method is chosen, the value of tau, connection and scheme are automatically defined.
verbose = TRUE
A logical value indicating if the progress of the algorithm is reported while computing.
sparsity = 1, 1, 1
only relevant when you want to use a sparse version of RGCCA.
response =
A numerical value giving the position of the response block. When the response argument is filled, the supervised mode is automatically activated. It especially means that in such situation, the connection matrix is automatically designed so that all blocks are linked to the response block and the response block only.
NA_method = na.ignore
A string indicating the method used for handling missing values (“na.ignore”: handle missing values, “na.omit”: keep only subject without any missing value accross all blocks).
comp_orth = TRUE
A logical value indicating if the deflation should lead to orthogonal block components or orthogonal block weight vectors.
n_iter_max = 1000
Integer giving the algorithm’s maximum number of iterations.
connection = 0, 1, 1, 1, 0, 1, 1, 1, 0
A $(J\times J)$ symmetric matrix describing the network of connections between blocks. Usually $c_{kl} = c_{lk} = 1$ is blocks $k$ and $l$ are connected and $0$ otherwise. Most of the time $c_{kk}=0$, otherwize, the algorithm also tries to maximize the variance of the block component (when $tau[k]\neq 0$).
superblock = FALSE
A logical value indicating if the superblock option is used. The superblock corresponds to the concatenation of all blocks provided. When it is used, all blocks are connected to the superbock and the superblock only. It is used to estimate components computed with all the variables across all the blocks. For examples using the superblock, see (Garali & al. 2018) and (Cantini & al. 2021)
scale_block = inertia
A logical value or a string indicating if each block is scaled. If TRUE or “inertia”, each block is divided by the sum of eigenvalues of its empirical covariance matrix. If “lambda1”, each block is divided by the square root of the highest eigenvalue of its empirical covariance matrix.If standardization is applied (scale = TRUE), the block scaling applies on the standardized blocks. In multi-block analysis, normalizing variables is important but normalizing blocks is also mandatory !

7.2 Explore the graphical representations of RGCCA

7.2.1 Sample space

Similarly to PCA, the sample space can be explored.

DIY Display the sample space associated with your first RGCCA analysis with the function plot. Also color the sample depending on the phenotype/co-variable you are interested in (you can select the same as for the PCA section). You will have to set the parameters:

type: to select the type of space you want to display.
response: corresponding on the phenotype/co-variable you selected.

plot(fit_RGCCA, type = "sample", response = clinical_data$Sample_Group)

You can see here that, by default, the first component of the first block is displayed against the second component of the second block.

DIY Try a different representation for your sample space. You will have to set the parameters:

block: to select the blocks you want to be displayed.
comp: to select the component you want to be displayed.

plot(fit_RGCCA, type = "sample", response = clinical_data$Sample_Group, block = c(1, 3), comp = c(1, 1))

7.2.2 Variable Space

Similarly, the variable space associated with only 1 component can be displayed.

DIY Display the variable space associated with your first RGCCA analysis with the function plot. Choose the block and the component you want to visualize and restrain the representation to the top 50 variables. You will have to set the parameters:

type: to select the type of space you want to display.
block: to select the blocks you want to be displayed.
comp: to select the component you want to be displayed.
n_mark: the number of variables contributing the most to this component.

plot(fit_RGCCA, type = "weights", block = 3, comp = 2, n_mark = 50)

If you compare the variable space between PCA and RGCCA, you can realize that for RGCCA, weights can change sign. It is because for PCA the percentage of contribution was displayed (the absolute value of the weights), whereas here, the weights with its sign is shown (and they are ordered by their absolute value).

Actually, the weights of multiple blocks can be shown at the same time.

DIY Display the variable space associated with all the blocks for the component of your choice. You will have to play with n_mark for all the blocks to really appear.

plot(fit_RGCCA, type = "weights", comp = 1, n_mark = 200)

Similarly to PCA, you can display the correlation circle. Though, RGCCA does not have the same propriety as PCA. Indeed, the variable weights are not anymore equals to the correlation between the component and the variable: 0.0466347, 0.5724352 $\neq$ 0.0050023, 0.0252807. So, what is actually displayed in the correlation circle proposed by RGCCA is the correlation between each component and the corresponding variable.

DIY Display the correlation circle for the block of your choice. You will have to set the parameters:

type: to select the type of space you want to display.
display_blocks: to select the block you want to be displayed.

Unfortunately the n_marks parameter is not implemented yet for this type of representation.

plot(fit_RGCCA, type = "cor_circle", display_blocks = 1)

DIY Display the correlation circle for all the blocks at the same time.

plot(fit_RGCCA, type = "cor_circle")

7.2.3 Biplot

Similarly to PCA, a biplot representation can be displayed.

DIY Display a biplot for the block of your choice. You will have to set the parameters:

type: to select the type of space you want to display.
response: corresponding on the phenotype/co-variable you selected.
block: to select the blocks you want to be displayed.
comp: to select the component you want to be displayed.
show_arrows: to eventually remove the arrows classically displayed on a correlation circle plot.

Unfortunately the n_marks parameter is not implemented yet for this type of representation.

plot(fit_RGCCA, type = "biplot", show_arrows = F, block = 2, response = clinical_data$Sample_Group)

7.2.4 Screeplot

Similarly to PCA, a screeplot can be displayed. An important propriety when extracting the $j^{th}$ component $\bf{y}_j$ from block $\bf{X} = \left[\bf{x}_1, \ldots, \bf{x}_p\right]$, of dimension $n\times p$, with PCA is that the variance of $\bf{y}_j$ equals to $\lambda_j^2$, the $j^{th}$ singular value of $\bf{X}$. Hence, the percentage of variance explained by this component can be computed as:

\[\frac{\lambda_j^2}{\sum_{k=1}^{min(n,p)}\lambda_k^2}\] Furthermore, as each new singular value is necessarily lower or equals to the previous one and greater or equals to the next one, it guarantees that the percentage of explained variances decreases with the level of the component. These proprieties are no longer true for RGCCA, especially because a block component no longer focuses on maximizing only the extracted variance from its block but also cares about being highly correlated with its surrounding blocks (no link between a component and the singular values of its block can be made in the general case). A new formula is used to compute the explained variance of the extracted components, which is strictly equivalent to the above formula in the case of PCA (i.e. components computed with PCA):

\[\frac{1}{\sum_{k=1}^p var(\bf{x}_k)}\sum_{l=1}^p var(\bf{x}_l) cor(\bf{x}_l, \bf{y}_j)^2\].

DIY Display the averaged variance explained by each block and each component. You will have to correctly define the parameter type in the function plot for this.

plot(fit_RGCCA, type = "ave")

Indeed, the averaged variance explained by each component is not necessarily decreasing with the level of the component.

7.2.5 Going Further

DIY Using the function interpret_PC on the result of RGCCA, try to evaluate the associations between each extracted component of each block and your clinical data. The component matrix can be retrieved through fit_RGCCA$Y$mRNA for example.

a = interpret_PC(component_matrix = fit_RGCCA$Y$mRNA, clinical_df = clinical_data, title = "mRNA", silent = T, show_rownames = F, show_legend = F)
b = interpret_PC(component_matrix = fit_RGCCA$Y$miRNA, clinical_df = clinical_data, title = "miRNA", silent = T, show_rownames = F, show_legend = F)
c = interpret_PC(component_matrix = fit_RGCCA$Y$DNAm, clinical_df = clinical_data, title = "DNAm", silent = T)

ggarrange(a, b, c, nrow = 1, ncol = 3)

rm(a,b,c)

7.3 Playing with the parameters

7.3.1 Impact of the parameter $\tau$

DIY For a higher number of component than previously, compute two RGCCA models, one with $\tau = 1$ and one with an arbitrary low value of $\tau$. You will have to play with the parameter tau.

ncomp = 5

fit_RGCCA_tau_1   = rgcca(blocks = blocks, tau = 1, ncomp = ncomp)
fit_RGCCA_tau_0.1 = rgcca(blocks = blocks, tau = 0.1, ncomp = ncomp)

DIY Apply the function interpret_PC on your two previous RGCCA models in order to see if there are any differences.

a = interpret_PC(component_matrix = fit_RGCCA_tau_1$Y$mRNA, clinical_df = clinical_data, title = "mRNA \n tau = 1", silent = T, show_rownames = F, show_legend = F)
b = interpret_PC(component_matrix = fit_RGCCA_tau_1$Y$miRNA, clinical_df = clinical_data, title = "miRNA \n tau = 1", silent = T, show_rownames = F, show_legend = F)
c = interpret_PC(component_matrix = fit_RGCCA_tau_1$Y$DNAm, clinical_df = clinical_data, title = "DNAm \n tau = 1", silent = T)

d = interpret_PC(component_matrix = fit_RGCCA_tau_0.1$Y$mRNA, clinical_df = clinical_data, title = "mRNA \n tau = 0.1", silent = T, show_rownames = F, show_legend = F)
e = interpret_PC(component_matrix = fit_RGCCA_tau_0.1$Y$miRNA, clinical_df = clinical_data, title = "miRNA \n tau = 0.1", silent = T, show_rownames = F, show_legend = F)
f = interpret_PC(component_matrix = fit_RGCCA_tau_0.1$Y$DNAm, clinical_df = clinical_data, title = "DNAm \n tau = 0.1", silent = T)

ggarrange(a, b, c, d, e, f, nrow = 2, ncol = 3)

rm(a,b,c,d,e,f)

7.3.2 Impact of the function $g$

DIY Now, compute two RGCCA models with two different values of the function $g$. You will have to play with the parameter scheme.

fit_RGCCA_tau_factorial = rgcca(blocks = blocks, scheme = "factorial", ncomp = ncomp)
fit_RGCCA_tau_horst     = rgcca(blocks = blocks, scheme = "horst", ncomp = ncomp)

DIY Apply the function interpret_PC on your two previous RGCCA models in order to see if there are any differences.

a = interpret_PC(component_matrix = fit_RGCCA_tau_factorial$Y$mRNA, clinical_df = clinical_data, title = "mRNA \n tau = 1", silent = T, show_rownames = F, show_legend = F)
b = interpret_PC(component_matrix = fit_RGCCA_tau_factorial$Y$miRNA, clinical_df = clinical_data, title = "miRNA \n tau = 1", silent = T, show_rownames = F, show_legend = F)
c = interpret_PC(component_matrix = fit_RGCCA_tau_factorial$Y$DNAm, clinical_df = clinical_data, title = "DNAm \n tau = 1", silent = T)

d = interpret_PC(component_matrix = fit_RGCCA_tau_horst$Y$mRNA, clinical_df = clinical_data, title = "mRNA \n tau = 0.1", silent = T, show_rownames = F, show_legend = F)
e = interpret_PC(component_matrix = fit_RGCCA_tau_horst$Y$miRNA, clinical_df = clinical_data, title = "miRNA \n tau = 0.1", silent = T, show_rownames = F, show_legend = F)
f = interpret_PC(component_matrix = fit_RGCCA_tau_horst$Y$DNAm, clinical_df = clinical_data, title = "DNAm \n tau = 0.1", silent = T)

ggarrange(a, b, c, d, e, f, nrow = 2, ncol = 3)

rm(a,b,c,d,e,f)

7.3.3 Impact of the connection matrix $\bf{C}$

DIY Build two connection matrices: one with all the bocks connected to each other and one with all the blocks connected to a single (common) one. Do not forget to put zeros on the diagonal of these matrices (otherwize, the variance of each component will also be maximized).

L   = length(blocks)
C_1 = matrix(1, nrow = L, ncol = L) - diag(L)

C_2       = matrix(0, nrow = L, ncol = L)
C_2[, L]  = C_2[L, ] = 1
C_2[L, L] = 0

DIY Now, compute two RGCCA models, one with each connection matrix you created earlier. You will have to play with the parameter connection.

fit_RGCCA_tau_complete     = rgcca(blocks = blocks, connection = C_1, ncomp = ncomp)
fit_RGCCA_tau_hierarchical = rgcca(blocks = blocks, connection = C_2, ncomp = ncomp)

DIY Apply the function interpret_PC on your two previous RGCCA models in order to see if there are any differences.

a = interpret_PC(component_matrix = fit_RGCCA_tau_complete$Y$mRNA, clinical_df = clinical_data, title = "mRNA \n tau = 1", silent = T, show_rownames = F, show_legend = F)
b = interpret_PC(component_matrix = fit_RGCCA_tau_complete$Y$miRNA, clinical_df = clinical_data, title = "miRNA \n tau = 1", silent = T, show_rownames = F, show_legend = F)
c = interpret_PC(component_matrix = fit_RGCCA_tau_complete$Y$DNAm, clinical_df = clinical_data, title = "DNAm \n tau = 1", silent = T)

d = interpret_PC(component_matrix = fit_RGCCA_tau_hierarchical$Y$mRNA, clinical_df = clinical_data, title = "mRNA \n tau = 0.1", silent = T, show_rownames = F, show_legend = F)
e = interpret_PC(component_matrix = fit_RGCCA_tau_hierarchical$Y$miRNA, clinical_df = clinical_data, title = "miRNA \n tau = 0.1", silent = T, show_rownames = F, show_legend = F)
f = interpret_PC(component_matrix = fit_RGCCA_tau_hierarchical$Y$DNAm, clinical_df = clinical_data, title = "DNAm \n tau = 0.1", silent = T)

ggarrange(a, b, c, d, e, f, nrow = 2, ncol = 3)

rm(a,b,c,d,e,f)

7.4 Using permutation to tune the parameters in an unsupervised setting

In an unsupervised setting, a permutation procedure is provided by the RGCCA package in order to tune the parameters.

DIY Use the function rgcca_permutation to tune the parameter $\tau$. Depending on your previous results, choose the function $g$, the number of components and the connection matrix $\bf{C}$ you want to work with. You will have to set:

ncomp: the number of components.

By default, 20 permutations are performed (use n_perms if you want to change it) to tune tau (modify par_type if you want to tune another parameter).

# This function is commented because it is taking too long
# perm_fit_RGCCA_tau = rgcca_permutation(blocks = blocks, ncomp = 10)

# Let us load the precomputed results. They were obtained with
# par_value = sapply(blocks, function(x) pracma:::logseq(x1 = 1e-4, x2 = 0.5, n = 20))
# n_perms   = 50
# ncomp     = 10
# init      = "random"
# tol       = 1e-5
perm_fit_RGCCA_tau = readRDS("data/MDD/perm_fit_RGCCA_tau.rds")

By default, $10$ values of the tuned parameter are tested and forced to be the same for all blocks and components. If needed, you can provide your own matrix of size (number of tested sets of parameters)$\times$(number of blocks) to the parameter par_value of the rgcca_permutation.

If this procedure is taking too long, you can play with:

n_cores: RGCCA is run $\left(n_{perms}+1\right)\times n_{par\_value}$, it can be interesting to run them in parallel, which is possible with rgcca_permutation. n_cores corresponds to the number of cores you want to use in parallel.
tol: the stopping value for the convergence of the algorithm. As we are mainly interested in relative differences between the value of the RGCCA criterion in permuted and unpermuted situations, as long as the value of tol is sufficiently lower than this relative difference, it should not change the conclusion of this permutation test. Thus, it might be interesting to define tol to higher values than the default one (default value is $1e-8$) in order to save some computation time.
init: the type of initialization used for the RGCCA algorithm. The block-weight vectors can either be initialized with a Singular Value Decomposition (SVD) of their block or randomly. Initializing at the SVD can take more time (when there is a high number of variables and observations).

DIY With the function plot, display the result of this permutation and identify the optimal value for $\tau$.

plot(perm_fit_RGCCA_tau)

Each line of the graph above is associated with a set of tested parameters. For each line, the boxplot corresponds to the permuted distribution of RGCCA’s criterion and the triangle to the un-permuted value of the criterion.

At the time we speak, the RGCCA package does not allow to tune several parameters at the same time (ex: tau and ncomp). This is not ideal but a workaround can be to perform this tuning sequentially. For example you can set tau with an arbitrary high number of components and then, at this best value of tau, tune the number of components by permutation.

7.5 Bootstrapping: a way of assessing the robustness of the tuned model

Now that we optimized the set of parameters for our model, we wish to evaluate its robustness. This is going to be performed through bootstrapping.

DIY Compute an RGCCA model associated with the optimal parameter values estimated through permutation. You can have access to this optimal values through perm_fit_RGCCA_tau$best_params.

optim_fit_RGCCA = rgcca(blocks = blocks, tau = perm_fit_RGCCA_tau$best_params, ncomp = 10, init = "random")

DIY Use the function rgcca_bootstrap to compute the confidence interval associated with each feature for the optimal (according to the permutation procedure) RGCCA model. You will have to set:

rgcca_res: the model you want to bootstrap.
n_boot: the number of bootstrap re-sampling.

# This function is commented beacuse it is taking too long
# bootstrap_fit_RGCCA = rgcca_bootstrap(rgcca_res = optim_fit_RGCCA, n_boot = 100)

# Let us load the precomputed results. They were obtained with
# n_boot=2000
bootstrap_fit_RGCCA = readRDS("data/MDD/bootstrap_fit_RGCCA.rds")

DIY Using the plot function, display the confidence interval associated with the block of your choice (parameter block), the component of your choice (parameter comp) and corresponding to the top n_mark features (by default, variables are ranked in decreasing order of the absolute value of their weight estimated on the whole (non-re-sampled) data-set).

plot(bootstrap_fit_RGCCA, block = 3, n_mark = 100, comp = 1)

On the figure above, the first 100 weights (black dots: they were estimated on the regular cohort, without re-sampling) with their estimated Confidence Interval (grey lines) for the first component associated with the DNAm block.

DIY You can use also the function summary, to explore in details your boostrapped results and eventually identify significant weights. You can play with the parameters:

block: the block you want to explore.
comp: the component you want to explore.
adj.method: to define your favorite method to adjust your p-values for multiple testing.

summary(bootstrap_fit_RGCCA, block = 2, comp = 1, adj.method = "fdr")

By default, variables are displayed in decreasing order of the absolute value of their estimated weights.

You have identified one or more components computed with variables whose weights are significantly different from 0 in multiple blocks ?? Congrats ! You have build your first (or not o_0) multi-omics signature !! Now, you have to understand which signal it represents. You can use again interpret_PC to identify links between this (theses) component and co-variables… or you can perform enrichment analysis !!

There is nothing significant here ? Do not worry, there is plenty of things you can try !:

Have you removed all individuals having one or more NAs ? You can keep them, RGCCA can actually handle them.
Can you re-think about the pre-processing of your data ? Maybe something went wrong there.
Parameters can be better better tuned. By default, RGCCA forced them to be the same across all components and blocks.
Try extensions of RGCCA that allows to select variables (see the parameter sparsity in ??rgcca)
Can you easily remove batch effects and/or unwanted effects ? (for example association with age, site effects…)
Can you supervise your model ? (see the parameter response in ??rgcca)
But before all that, let us try MOFA and JIVE !!

8 Multi-Omics Factor Analysis (MOFA)

Multi-Omics Factor Analysis (MOFA) is a multi-omics data integration approach giving a computational framework for unsupervised discovery of the principal axes of biological and technical variation when multiple omics assays are applied to the same samples. Its extension, MOFA+, is designed for integration of single-cell multi-modal data.

Argelaguet et al., Mol Syst Biol, 2018.

MOFA decomposes each omics into the product of a factor matrix and omics-specific weight matrices.

Link to MOFA article : https://www.embopress.org/doi/full/10.15252/msb.20178124

Link to MOFA+ article : https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02015-1

We will perform the MOFA analysis and interpret the results using an object containing various data modalities.

Refer to the MOFA2 R package documentation to get details about each functions used in this section : https://www.bioconductor.org/packages/release/bioc/manuals/MOFA2/man/MOFA2.pdf

8.1 First steps

8.1.1 Vocabulary

In this MOFA section, we will use the following vocabulary :

view or modality : omic type that will compose a block of data in the analysis (eg transcriptomic, proteomic, genomic…)
feature : variable of a specific view (eg transcipt, protein, locus…)
factor : similar to a component in PCA, a factor is a weighted vector of all available features
factor values : compose the Z matrix
weight values (or loadings) : compose the W matrix

8.1.2 Input data

You should use a multiview matrix that contains all data cleaned and normalized. Each view of the matrix is a block of data corresponding to a modality, with features as lines and samples as columns. No concordance between lines across modalities is required, but columns should correspond to the same sample across modalities (you can still have some missing data).

MOFA accept several formats for this multiview matrix: data.frame, list of matrices, multiAssayExperiments…

DIY: write the code to create such a multiview matrix for your data.

#we just have to transpose the blocks 
multiview_mat=list(methylation=t(blocks$DNAm), mrna=t(blocks$mRNA), mirna=t(blocks$miRNA))

8.2 Prepare MOFA

Before running MOFA, you should first create a MOFA object including the different data modalities and then specify options.

8.2.1 Create MOFA object

You can create a MOFA object from an artificially generated dataset with make_example_data function or using a real dataset as the multiview matrix you just created. Each sub-matrix contains one modality with sample as columns.

DIY: use the create_mofa function to create a MOFAobject containing input data, then use the plot_data_overview function over this object to check available information for each modality.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                          Create the MOFA object                            ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

MOFAobject <- create_mofa(multiview_mat)

#Plot overview of the input data, including the number of samples, views, features, and the missing assays.
plot_data_overview(MOFAobject) # everything matches here, but we could have kept non matching samples with MOFA

NB: by default no group is defined among samples, however you can specify some groups if necessary using “groups” option in the create_mofa function. Group information is useful to analyse single cell dataset by specifying groups of sample (ie cell) coming from the same biological sample or to take into account batch or experimental conditions.

8.2.2 MOFA options

We have to define MOFA options before running it, they are grouped in 3 categories :

Data options
Model options
Training options

You will have to create a list for each of these option categories.

8.2.2.1 Data options

Main data options are:

scale_views: logical indicating whether to scale views to have the same unit variance. As long as the scale differences between the views is not too high, this is not required. Default is FALSE.
scale_groups: logical indicating whether to scale groups to have the same unit variance. As long as the scale differences between the groups is not too high, this is not required. Default is FALSE.
center_groups: logical indicating whether to center groups to have the same mean. As long as the mean differences between the groups is not too high, this is not required. Default is FALSE.

You can also change the name of views and groups through the ‘data_options’ field of the MOFA object.

DIY: retrieve the list of default data options associated to your MOFAobject in a dedicated variable (eg named data_options) with the get_default_data_options function and modify any fields of this list if necessary.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                              MOFA data options                             ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

data_options <- get_default_data_options(MOFAobject)
data_options$scale_views <- TRUE
data_options

## $scale_views
## [1] TRUE
## 
## $scale_groups
## [1] FALSE
## 
## $center_groups
## [1] TRUE
## 
## $use_float32
## [1] TRUE
## 
## $views
## [1] "methylation" "mrna"        "mirna"      
## 
## $groups
## [1] "group1"

8.2.2.2 Model options

Main model options are:

likelihoods: data nature per view (possible options are “gaussian”, “poisson”, “bernoulli”). ‘gaussian’ is for continuous data, ‘bernoulli’ for binary data and ‘poisson’ for count/discrete data. By default they are inferred automatically.
num_factors: number of factors to use for the factorization

You can also specified through ‘model_options’ the use of sparsity priors on the weight and factor matrices (W and Z). And which type of prior to use (ARD, spike-and-slab, or both). In the original MOFA approach, sparsity is only applied on the W matrix (ARD + spike-and-slab). In MOFA+ sparsity prior is also applied to the factor matrix Z when considering several groups.

Corresponding options are : spikeslab_factors, spikeslab_weights, ard_factors and ard_weights.

DIY: retrieve the list of default model options associated to your MOFAobject in a dedicated variable (eg named model_options) with the get_default_model_options function and check first the components of these list and modify if necessary the sparsity levels to consider in your model. Likelihoods and number of factors will be modified in the next steps.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                              MOFA model options                            ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

model_options <- get_default_model_options(MOFAobject)
model_options$spikeslab_weights <- TRUE
model_options

## $likelihoods
## methylation        mrna       mirna 
##  "gaussian"  "gaussian"  "gaussian" 
## 
## $num_factors
## [1] 15
## 
## $spikeslab_factors
## [1] FALSE
## 
## $spikeslab_weights
## [1] TRUE
## 
## $ard_factors
## [1] FALSE
## 
## $ard_weights
## [1] TRUE

Likelihoods to use.

The model flexibly handles different data distribution, it is important to choose the likelihoods corresponding to the data (you can visually check data distribution for that). You can also let MOFA deciding for you, but it’s better if you don’t.

DIY: check visually the distribution of each data modality using basic function such as hist.

Visual inspection of mRNA data

## mRNA histogram
###################################

hist(MOFAobject@data$mrna$group1 , breaks = 100, main = "Histogram of mRNA count values stored in MOFA object")

Visual inspection of miRNA data

## miRNA histogram
###################################

hist(MOFAobject@data$mirna$group1 , breaks = 100, main = "Histogram of miRNA values stored in MOFA object")

Visual inspection of methylation data

## methylation histogram
###################################

hist(MOFAobject@data$methylation$group1 , breaks = 100, main = "Histogram of methylation values stored in MOFA object")

Open question: Should we model all these data modalities with a gaussian distribution?

DIY: if you decide it, change the corresponding likelihood for a given modality in the model_options variable.

To change this option, you can use model_options$likelihoods <- “gaussian” or “poisson” or “bernoulli”

## as an example
#model_options$likelihoods["Mutations"] <- "bernoulli"

You can also transform your data to get a distribution closer than the expected distribution (for gaussian likelihood), it’s better to anticipate this step before creating your MOFAobject, or you can modify the corresponding fields in MOFAobject@data.

## as an example
#MOFAobject@data$mrna$group1<-log(MOFAobject@data$mrna$group1)
#hist(MOFAobject@data$mrna$group1 , breaks = 100, main = "Histogram of log mRNA count values stored in MOFA object")

Number of factors

For the number of factors, you can orient your choice using factorization matrix functions that estimate an “optimal” factorization rank. For example using the nmfEstimateRank function from NMF package. Different strategies can be used to decide the optimal rank based on results produced by nmfEstimateRank (read section 2.6 in https://cran.r-project.org/web/packages/NMF/vignettes/NMF-vignette.pdf).

DIY: use the nmfEstimateRank function on each modality to decide the number of factor to use with MOFA [optional].

Warning the execution of such approach can be time consuming, so keep a reasonable range (ie less than 10 different values).

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                           Estimate matrix ranks                            ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

rankEstimate_mrna <- nmfEstimateRank(MOFAobject@data$mrna$group1[,which(apply(MOFAobject@data$mrna$group1,2,function(x)sum(is.na(x)))==0)], seq(5,10), method='brunet', nrun=5, seed=123456)
plot(rankEstimate_mrna)

rankEstimate_miRNA <- nmfEstimateRank(MOFAobject@data$mirna$group1[,which(apply(MOFAobject@data$mirna$group1,2,function(x)sum(is.na(x)))==0)], seq(5,10), method='brunet', nrun=5, seed=123456)
plot(rankEstimate_miRNA)

rankEstimate_methylation <- nmfEstimateRank(MOFAobject@data$methylation$group1[,which(apply(MOFAobject@data$methylation$group1,2,function(x)sum(is.na(x)))==0)], seq(5,10), method='brunet', nrun=5, seed=123456)
plot(rankEstimate_methylation)

Then you can decide the number of factors to use in MOFA based on rank of each data modality, for example by taking the maximum rank over all modalities. You can also take by default a number of factors ~10. MOFA can remove automatically uninformative factors during its execution (see training options).

DIY: set the number of factors in the corresponding field in your model_options variable.

## Model options
###################################

model_options$num_factors <- 10

8.2.2.3 Training options

Main training options are:

maxiter: maximum number of iterations (default is 1000).
convergence_mode: “fast”, “medium”, “slow”. For exploration, the fast mode is good enough.
startELBO: initial iteration to compute the ELBO (the objective function used to assess convergence)
freqELBO: frequency of computations of the ELBO (the objective function used to assess convergence)
drop_factor_threshold: minimum fraction of variance explained criteria to drop factors while training (eg 0.01 for 1%). Default is -1 (no dropping of factors). One can change it to discard factors that explain a low percentage of variance in all views.
stochastic: logical indicating whether to use stochastic variational inference (only required for very big data sets, default is FALSE).
seed : numeric indicating the seed for reproducibility (default is 42).

DIY: retrieve the list of default training options associated to your MOFAobject in a dedicated variable (eg named train_options) with the get_default_training_options function and modify this list if necessary.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                          MOFA training options                             ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

train_options <- get_default_training_options(MOFAobject)
train_options$convergence_mode <- "slow"
train_options$freqELBO <- 5
train_options$startELBO <- 400
train_options$seed <- 42
train_options$drop_factor_threshold <- -1

train_options

## $maxiter
## [1] 1000
## 
## $convergence_mode
## [1] "slow"
## 
## $drop_factor_threshold
## [1] -1
## 
## $verbose
## [1] FALSE
## 
## $startELBO
## [1] 400
## 
## $freqELBO
## [1] 5
## 
## $stochastic
## [1] FALSE
## 
## $gpu_mode
## [1] FALSE
## 
## $gpu_device
## NULL
## 
## $seed
## [1] 42
## 
## $outfile
## NULL
## 
## $weight_views
## [1] FALSE
## 
## $save_interrupted
## [1] FALSE

8.3 Run MOFA

8.3.1 Prepare the object and run MOFA

DIY: use the prepare_mofa function to wrap all the previous variables including MOFAobject, data_options, model_options and train_options. Then run MOFA using the run_mofa function on the MOFAobject.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                          Prepare the MOFA object                           ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

MOFAobject <- prepare_mofa(MOFAobject,
  data_options = data_options,
  model_options = model_options,
  training_options = train_options
)

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                                  Run MOFA                                  ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

MOFAobject <- run_mofa(MOFAobject, outfile="/home/vandelj/Bureau/Bilille/Formation/ETBII/2024/Git/etbii/THEMATIC-SESSIONS/Multivariate_analysis/02_TP/data/MDD/MOFA2_TCGA.hdf5", use_basilisk = TRUE)

## 
##         #########################################################
##         ###           __  __  ____  ______                    ### 
##         ###          |  \/  |/ __ \|  ____/\    _             ### 
##         ###          | \  / | |  | | |__ /  \ _| |_           ### 
##         ###          | |\/| | |  | |  __/ /\ \_   _|          ###
##         ###          | |  | | |__| | | / ____ \|_|            ###
##         ###          |_|  |_|\____/|_|/_/    \_\              ###
##         ###                                                   ### 
##         ######################################################### 
##        
##  
##         
## use_float32 set to True: replacing float64 arrays by float32 arrays to speed up computations...
## 
## Scaling views to unit variance...
## 
## Successfully loaded view='methylation' group='group1' with N=87 samples and D=2000 features...
## Successfully loaded view='mrna' group='group1' with N=87 samples and D=2000 features...
## Successfully loaded view='mirna' group='group1' with N=87 samples and D=350 features...
## 
## 
## Model options:
## - Automatic Relevance Determination prior on the factors: False
## - Automatic Relevance Determination prior on the weights: True
## - Spike-and-slab prior on the factors: False
## - Spike-and-slab prior on the weights: True
## Likelihoods:
## - View 0 (methylation): gaussian
## - View 1 (mrna): gaussian
## - View 2 (mirna): gaussian
## 
## 
## 
## 
## ######################################
## ## Training the model with seed 42 ##
## ######################################
## 
## 
## ELBO before training: -2948664.49 
## 
## Iteration 1: time=0.01, Factors=10
## Iteration 2: time=0.02, Factors=10
## Iteration 3: time=0.01, Factors=10
## Iteration 4: time=0.02, Factors=10
## Iteration 5: time=0.02, Factors=10
## Iteration 6: time=0.02, Factors=10
## Iteration 7: time=0.02, Factors=10
## Iteration 8: time=0.02, Factors=10
## Iteration 9: time=0.02, Factors=10
## Iteration 10: time=0.02, Factors=10
## Iteration 11: time=0.01, Factors=10
## Iteration 12: time=0.02, Factors=10
## Iteration 13: time=0.02, Factors=10
## Iteration 14: time=0.02, Factors=10
## Iteration 15: time=0.01, Factors=10
## Iteration 16: time=0.02, Factors=10
## Iteration 17: time=0.02, Factors=10
## Iteration 18: time=0.02, Factors=10
## Iteration 19: time=0.01, Factors=10
## Iteration 20: time=0.02, Factors=10
## Iteration 21: time=0.02, Factors=10
## Iteration 22: time=0.02, Factors=10
## Iteration 23: time=0.02, Factors=10
## Iteration 24: time=0.01, Factors=10
## Iteration 25: time=0.02, Factors=10
## Iteration 26: time=0.02, Factors=10
## Iteration 27: time=0.02, Factors=10
## Iteration 28: time=0.01, Factors=10
## Iteration 29: time=0.02, Factors=10
## Iteration 30: time=0.02, Factors=10
## Iteration 31: time=0.02, Factors=10
## Iteration 32: time=0.02, Factors=10
## Iteration 33: time=0.02, Factors=10
## Iteration 34: time=0.01, Factors=10
## Iteration 35: time=0.01, Factors=10
## Iteration 36: time=0.02, Factors=10
## Iteration 37: time=0.02, Factors=10
## Iteration 38: time=0.01, Factors=10
## Iteration 39: time=0.02, Factors=10
## Iteration 40: time=0.02, Factors=10
## Iteration 41: time=0.01, Factors=10
## Iteration 42: time=0.02, Factors=10
## Iteration 43: time=0.01, Factors=10
## Iteration 44: time=0.02, Factors=10
## Iteration 45: time=0.02, Factors=10
## Iteration 46: time=0.01, Factors=10
## Iteration 47: time=0.02, Factors=10
## Iteration 48: time=0.01, Factors=10
## Iteration 49: time=0.02, Factors=10
## Iteration 50: time=0.02, Factors=10
## Iteration 51: time=0.02, Factors=10
## Iteration 52: time=0.02, Factors=10
## Iteration 53: time=0.02, Factors=10
## Iteration 54: time=0.01, Factors=10
## Iteration 55: time=0.02, Factors=10
## Iteration 56: time=0.02, Factors=10
## Iteration 57: time=0.02, Factors=10
## Iteration 58: time=0.02, Factors=10
## Iteration 59: time=0.02, Factors=10
## Iteration 60: time=0.02, Factors=10
## Iteration 61: time=0.02, Factors=10
## Iteration 62: time=0.02, Factors=10
## Iteration 63: time=0.02, Factors=10
## Iteration 64: time=0.01, Factors=10
## Iteration 65: time=0.02, Factors=10
## Iteration 66: time=0.02, Factors=10
## Iteration 67: time=0.02, Factors=10
## Iteration 68: time=0.02, Factors=10
## Iteration 69: time=0.02, Factors=10
## Iteration 70: time=0.02, Factors=10
## Iteration 71: time=0.02, Factors=10
## Iteration 72: time=0.02, Factors=10
## Iteration 73: time=0.01, Factors=10
## Iteration 74: time=0.02, Factors=10
## Iteration 75: time=0.02, Factors=10
## Iteration 76: time=0.02, Factors=10
## Iteration 77: time=0.02, Factors=10
## Iteration 78: time=0.02, Factors=10
## Iteration 79: time=0.02, Factors=10
## Iteration 80: time=0.02, Factors=10
## Iteration 81: time=0.01, Factors=10
## Iteration 82: time=0.02, Factors=10
## Iteration 83: time=0.01, Factors=10
## Iteration 84: time=0.02, Factors=10
## Iteration 85: time=0.01, Factors=10
## Iteration 86: time=0.01, Factors=10
## Iteration 87: time=0.02, Factors=10
## Iteration 88: time=0.02, Factors=10
## Iteration 89: time=0.02, Factors=10
## Iteration 90: time=0.01, Factors=10
## Iteration 91: time=0.02, Factors=10
## Iteration 92: time=0.02, Factors=10
## Iteration 93: time=0.02, Factors=10
## Iteration 94: time=0.02, Factors=10
## Iteration 95: time=0.01, Factors=10
## Iteration 96: time=0.02, Factors=10
## Iteration 97: time=0.02, Factors=10
## Iteration 98: time=0.02, Factors=10
## Iteration 99: time=0.01, Factors=10
## Iteration 100: time=0.02, Factors=10
## Iteration 101: time=0.02, Factors=10
## Iteration 102: time=0.02, Factors=10
## Iteration 103: time=0.02, Factors=10
## Iteration 104: time=0.02, Factors=10
## Iteration 105: time=0.02, Factors=10
## Iteration 106: time=0.02, Factors=10
## Iteration 107: time=0.01, Factors=10
## Iteration 108: time=0.02, Factors=10
## Iteration 109: time=0.02, Factors=10
## Iteration 110: time=0.01, Factors=10
## Iteration 111: time=0.02, Factors=10
## Iteration 112: time=0.02, Factors=10
## Iteration 113: time=0.01, Factors=10
## Iteration 114: time=0.02, Factors=10
## Iteration 115: time=0.02, Factors=10
## Iteration 116: time=0.02, Factors=10
## Iteration 117: time=0.01, Factors=10
## Iteration 118: time=0.02, Factors=10
## Iteration 119: time=0.02, Factors=10
## Iteration 120: time=0.02, Factors=10
## Iteration 121: time=0.01, Factors=10
## Iteration 122: time=0.02, Factors=10
## Iteration 123: time=0.01, Factors=10
## Iteration 124: time=0.02, Factors=10
## Iteration 125: time=0.02, Factors=10
## Iteration 126: time=0.02, Factors=10
## Iteration 127: time=0.02, Factors=10
## Iteration 128: time=0.02, Factors=10
## Iteration 129: time=0.01, Factors=10
## Iteration 130: time=0.02, Factors=10
## Iteration 131: time=0.02, Factors=10
## Iteration 132: time=0.02, Factors=10
## Iteration 133: time=0.01, Factors=10
## Iteration 134: time=0.02, Factors=10
## Iteration 135: time=0.01, Factors=10
## Iteration 136: time=0.02, Factors=10
## Iteration 137: time=0.02, Factors=10
## Iteration 138: time=0.01, Factors=10
## Iteration 139: time=0.02, Factors=10
## Iteration 140: time=0.02, Factors=10
## Iteration 141: time=0.02, Factors=10
## Iteration 142: time=0.02, Factors=10
## Iteration 143: time=0.02, Factors=10
## Iteration 144: time=0.01, Factors=10
## Iteration 145: time=0.02, Factors=10
## Iteration 146: time=0.02, Factors=10
## Iteration 147: time=0.01, Factors=10
## Iteration 148: time=0.02, Factors=10
## Iteration 149: time=0.02, Factors=10
## Iteration 150: time=0.02, Factors=10
## Iteration 151: time=0.01, Factors=10
## Iteration 152: time=0.02, Factors=10
## Iteration 153: time=0.02, Factors=10
## Iteration 154: time=0.02, Factors=10
## Iteration 155: time=0.01, Factors=10
## Iteration 156: time=0.02, Factors=10
## Iteration 157: time=0.02, Factors=10
## Iteration 158: time=0.02, Factors=10
## Iteration 159: time=0.02, Factors=10
## Iteration 160: time=0.01, Factors=10
## Iteration 161: time=0.02, Factors=10
## Iteration 162: time=0.01, Factors=10
## Iteration 163: time=0.02, Factors=10
## Iteration 164: time=0.02, Factors=10
## Iteration 165: time=0.01, Factors=10
## Iteration 166: time=0.02, Factors=10
## Iteration 167: time=0.02, Factors=10
## Iteration 168: time=0.01, Factors=10
## Iteration 169: time=0.02, Factors=10
## Iteration 170: time=0.01, Factors=10
## Iteration 171: time=0.02, Factors=10
## Iteration 172: time=0.02, Factors=10
## Iteration 173: time=0.02, Factors=10
## Iteration 174: time=0.01, Factors=10
## Iteration 175: time=0.02, Factors=10
## Iteration 176: time=0.01, Factors=10
## Iteration 177: time=0.01, Factors=10
## Iteration 178: time=0.02, Factors=10
## Iteration 179: time=0.01, Factors=10
## Iteration 180: time=0.02, Factors=10
## Iteration 181: time=0.02, Factors=10
## Iteration 182: time=0.01, Factors=10
## Iteration 183: time=0.02, Factors=10
## Iteration 184: time=0.01, Factors=10
## Iteration 185: time=0.02, Factors=10
## Iteration 186: time=0.02, Factors=10
## Iteration 187: time=0.01, Factors=10
## Iteration 188: time=0.02, Factors=10
## Iteration 189: time=0.02, Factors=10
## Iteration 190: time=0.02, Factors=10
## Iteration 191: time=0.02, Factors=10
## Iteration 192: time=0.01, Factors=10
## Iteration 193: time=0.03, Factors=10
## Iteration 194: time=0.02, Factors=10
## Iteration 195: time=0.01, Factors=10
## Iteration 196: time=0.02, Factors=10
## Iteration 197: time=0.02, Factors=10
## Iteration 198: time=0.02, Factors=10
## Iteration 199: time=0.02, Factors=10
## Iteration 200: time=0.01, Factors=10
## Iteration 201: time=0.02, Factors=10
## Iteration 202: time=0.02, Factors=10
## Iteration 203: time=0.01, Factors=10
## Iteration 204: time=0.02, Factors=10
## Iteration 205: time=0.02, Factors=10
## Iteration 206: time=0.02, Factors=10
## Iteration 207: time=0.02, Factors=10
## Iteration 208: time=0.02, Factors=10
## Iteration 209: time=0.01, Factors=10
## Iteration 210: time=0.01, Factors=10
## Iteration 211: time=0.02, Factors=10
## Iteration 212: time=0.02, Factors=10
## Iteration 213: time=0.02, Factors=10
## Iteration 214: time=0.02, Factors=10
## Iteration 215: time=0.02, Factors=10
## Iteration 216: time=0.02, Factors=10
## Iteration 217: time=0.02, Factors=10
## Iteration 218: time=0.02, Factors=10
## Iteration 219: time=0.02, Factors=10
## Iteration 220: time=0.02, Factors=10
## Iteration 221: time=0.02, Factors=10
## Iteration 222: time=0.02, Factors=10
## Iteration 223: time=0.01, Factors=10
## Iteration 224: time=0.02, Factors=10
## Iteration 225: time=0.02, Factors=10
## Iteration 226: time=0.02, Factors=10
## Iteration 227: time=0.02, Factors=10
## Iteration 228: time=0.02, Factors=10
## Iteration 229: time=0.02, Factors=10
## Iteration 230: time=0.02, Factors=10
## Iteration 231: time=0.01, Factors=10
## Iteration 232: time=0.02, Factors=10
## Iteration 233: time=0.02, Factors=10
## Iteration 234: time=0.02, Factors=10
## Iteration 235: time=0.02, Factors=10
## Iteration 236: time=0.02, Factors=10
## Iteration 237: time=0.01, Factors=10
## Iteration 238: time=0.02, Factors=10
## Iteration 239: time=0.02, Factors=10
## Iteration 240: time=0.01, Factors=10
## Iteration 241: time=0.02, Factors=10
## Iteration 242: time=0.02, Factors=10
## Iteration 243: time=0.01, Factors=10
## Iteration 244: time=0.02, Factors=10
## Iteration 245: time=0.02, Factors=10
## Iteration 246: time=0.01, Factors=10
## Iteration 247: time=0.02, Factors=10
## Iteration 248: time=0.02, Factors=10
## Iteration 249: time=0.01, Factors=10
## Iteration 250: time=0.02, Factors=10
## Iteration 251: time=0.02, Factors=10
## Iteration 252: time=0.01, Factors=10
## Iteration 253: time=0.02, Factors=10
## Iteration 254: time=0.02, Factors=10
## Iteration 255: time=0.01, Factors=10
## Iteration 256: time=0.02, Factors=10
## Iteration 257: time=0.01, Factors=10
## Iteration 258: time=0.02, Factors=10
## Iteration 259: time=0.02, Factors=10
## Iteration 260: time=0.02, Factors=10
## Iteration 261: time=0.02, Factors=10
## Iteration 262: time=0.01, Factors=10
## Iteration 263: time=0.02, Factors=10
## Iteration 264: time=0.02, Factors=10
## Iteration 265: time=0.02, Factors=10
## Iteration 266: time=0.02, Factors=10
## Iteration 267: time=0.01, Factors=10
## Iteration 268: time=0.02, Factors=10
## Iteration 269: time=0.02, Factors=10
## Iteration 270: time=0.02, Factors=10
## Iteration 271: time=0.02, Factors=10
## Iteration 272: time=0.02, Factors=10
## Iteration 273: time=0.01, Factors=10
## Iteration 274: time=0.02, Factors=10
## Iteration 275: time=0.01, Factors=10
## Iteration 276: time=0.02, Factors=10
## Iteration 277: time=0.02, Factors=10
## Iteration 278: time=0.02, Factors=10
## Iteration 279: time=0.02, Factors=10
## Iteration 280: time=0.02, Factors=10
## Iteration 281: time=0.02, Factors=10
## Iteration 282: time=0.02, Factors=10
## Iteration 283: time=0.02, Factors=10
## Iteration 284: time=0.02, Factors=10
## Iteration 285: time=0.02, Factors=10
## Iteration 286: time=0.02, Factors=10
## Iteration 287: time=0.02, Factors=10
## Iteration 288: time=0.02, Factors=10
## Iteration 289: time=0.02, Factors=10
## Iteration 290: time=0.02, Factors=10
## Iteration 291: time=0.01, Factors=10
## Iteration 292: time=0.03, Factors=10
## Iteration 293: time=0.02, Factors=10
## Iteration 294: time=0.01, Factors=10
## Iteration 295: time=0.02, Factors=10
## Iteration 296: time=0.02, Factors=10
## Iteration 297: time=0.02, Factors=10
## Iteration 298: time=0.01, Factors=10
## Iteration 299: time=0.02, Factors=10
## Iteration 300: time=0.02, Factors=10
## Iteration 301: time=0.02, Factors=10
## Iteration 302: time=0.02, Factors=10
## Iteration 303: time=0.02, Factors=10
## Iteration 304: time=0.02, Factors=10
## Iteration 305: time=0.02, Factors=10
## Iteration 306: time=0.01, Factors=10
## Iteration 307: time=0.02, Factors=10
## Iteration 308: time=0.02, Factors=10
## Iteration 309: time=0.02, Factors=10
## Iteration 310: time=0.02, Factors=10
## Iteration 311: time=0.02, Factors=10
## Iteration 312: time=0.02, Factors=10
## Iteration 313: time=0.02, Factors=10
## Iteration 314: time=0.01, Factors=10
## Iteration 315: time=0.02, Factors=10
## Iteration 316: time=0.01, Factors=10
## Iteration 317: time=0.03, Factors=10
## Iteration 318: time=0.02, Factors=10
## Iteration 319: time=0.02, Factors=10
## Iteration 320: time=0.02, Factors=10
## Iteration 321: time=0.02, Factors=10
## Iteration 322: time=0.01, Factors=10
## Iteration 323: time=0.02, Factors=10
## Iteration 324: time=0.02, Factors=10
## Iteration 325: time=0.02, Factors=10
## Iteration 326: time=0.05, Factors=10
## Iteration 327: time=0.02, Factors=10
## Iteration 328: time=0.02, Factors=10
## Iteration 329: time=0.02, Factors=10
## Iteration 330: time=0.02, Factors=10
## Iteration 331: time=0.03, Factors=10
## Iteration 332: time=0.01, Factors=10
## Iteration 333: time=0.01, Factors=10
## Iteration 334: time=0.03, Factors=10
## Iteration 335: time=0.02, Factors=10
## Iteration 336: time=0.02, Factors=10
## Iteration 337: time=0.02, Factors=10
## Iteration 338: time=0.02, Factors=10
## Iteration 339: time=0.02, Factors=10
## Iteration 340: time=0.02, Factors=10
## Iteration 341: time=0.02, Factors=10
## Iteration 342: time=0.01, Factors=10
## Iteration 343: time=0.02, Factors=10
## Iteration 344: time=0.02, Factors=10
## Iteration 345: time=0.01, Factors=10
## Iteration 346: time=0.02, Factors=10
## Iteration 347: time=0.03, Factors=10
## Iteration 348: time=0.02, Factors=10
## Iteration 349: time=0.02, Factors=10
## Iteration 350: time=0.02, Factors=10
## Iteration 351: time=0.02, Factors=10
## Iteration 352: time=0.02, Factors=10
## Iteration 353: time=0.01, Factors=10
## Iteration 354: time=0.02, Factors=10
## Iteration 355: time=0.03, Factors=10
## Iteration 356: time=0.05, Factors=10
## Iteration 357: time=0.02, Factors=10
## Iteration 358: time=0.02, Factors=10
## Iteration 359: time=0.02, Factors=10
## Iteration 360: time=0.02, Factors=10
## Iteration 361: time=0.02, Factors=10
## Iteration 362: time=0.02, Factors=10
## Iteration 363: time=0.02, Factors=10
## Iteration 364: time=0.02, Factors=10
## Iteration 365: time=0.02, Factors=10
## Iteration 366: time=0.02, Factors=10
## Iteration 367: time=0.02, Factors=10
## Iteration 368: time=0.02, Factors=10
## Iteration 369: time=0.02, Factors=10
## Iteration 370: time=0.02, Factors=10
## Iteration 371: time=0.04, Factors=10
## Iteration 372: time=0.04, Factors=10
## Iteration 373: time=0.02, Factors=10
## Iteration 374: time=0.02, Factors=10
## Iteration 375: time=0.02, Factors=10
## Iteration 376: time=0.01, Factors=10
## Iteration 377: time=0.02, Factors=10
## Iteration 378: time=0.03, Factors=10
## Iteration 379: time=0.02, Factors=10
## Iteration 380: time=0.02, Factors=10
## Iteration 381: time=0.01, Factors=10
## Iteration 382: time=0.02, Factors=10
## Iteration 383: time=0.03, Factors=10
## Iteration 384: time=0.02, Factors=10
## Iteration 385: time=0.02, Factors=10
## Iteration 386: time=0.02, Factors=10
## Iteration 387: time=0.01, Factors=10
## Iteration 388: time=0.02, Factors=10
## Iteration 389: time=0.02, Factors=10
## Iteration 390: time=0.03, Factors=10
## Iteration 391: time=0.02, Factors=10
## Iteration 392: time=0.02, Factors=10
## Iteration 393: time=0.02, Factors=10
## Iteration 394: time=0.02, Factors=10
## Iteration 395: time=0.01, Factors=10
## Iteration 396: time=0.02, Factors=10
## Iteration 397: time=0.03, Factors=10
## Iteration 398: time=0.02, Factors=10
## Iteration 399: time=0.01, Factors=10
## Iteration 400: time=0.02, ELBO=-453091.10, deltaELBO=2495573.385 (84.63402310%), Factors=10
## Iteration 401: time=0.03, Factors=10
## Iteration 402: time=0.02, Factors=10
## Iteration 403: time=0.00, Factors=10
## Iteration 404: time=0.03, Factors=10
## Iteration 405: time=0.02, ELBO=-453089.31, deltaELBO=1.789 (0.00006068%), Factors=10
## Iteration 406: time=0.02, Factors=10
## Iteration 407: time=0.01, Factors=10
## Iteration 408: time=0.03, Factors=10
## Iteration 409: time=0.02, Factors=10
## Iteration 410: time=0.02, ELBO=-453087.65, deltaELBO=1.662 (0.00005636%), Factors=10
## Iteration 411: time=0.01, Factors=10
## Iteration 412: time=0.02, Factors=10
## Iteration 413: time=0.04, Factors=10
## Iteration 414: time=0.02, Factors=10
## Iteration 415: time=0.02, ELBO=-453086.08, deltaELBO=1.575 (0.00005341%), Factors=10
## Iteration 416: time=0.02, Factors=10
## Iteration 417: time=0.02, Factors=10
## Iteration 418: time=0.03, Factors=10
## Iteration 419: time=0.02, Factors=10
## Iteration 420: time=0.02, ELBO=-453084.49, deltaELBO=1.591 (0.00005395%), Factors=10
## Iteration 421: time=0.01, Factors=10
## Iteration 422: time=0.03, Factors=10
## Iteration 423: time=0.02, Factors=10
## Iteration 424: time=0.02, Factors=10
## Iteration 425: time=0.02, ELBO=-453083.04, deltaELBO=1.449 (0.00004914%), Factors=10
## Iteration 426: time=0.01, Factors=10
## Iteration 427: time=0.03, Factors=10
## Iteration 428: time=0.02, Factors=10
## Iteration 429: time=0.00, Factors=10
## Iteration 430: time=0.04, ELBO=-453081.61, deltaELBO=1.431 (0.00004853%), Factors=10
## Iteration 431: time=0.02, Factors=10
## Iteration 432: time=0.01, Factors=10
## Iteration 433: time=0.02, Factors=10
## Iteration 434: time=0.03, Factors=10
## Iteration 435: time=0.02, ELBO=-453080.33, deltaELBO=1.274 (0.00004320%), Factors=10
## Iteration 436: time=0.02, Factors=10
## Iteration 437: time=0.01, Factors=10
## Iteration 438: time=0.03, Factors=10
## Iteration 439: time=0.02, Factors=10
## Iteration 440: time=0.02, ELBO=-453079.01, deltaELBO=1.324 (0.00004489%), Factors=10
## Iteration 441: time=0.01, Factors=10
## Iteration 442: time=0.03, Factors=10
## Iteration 443: time=0.02, Factors=10
## Iteration 444: time=0.01, Factors=10
## Iteration 445: time=0.03, ELBO=-453077.83, deltaELBO=1.182 (0.00004010%), Factors=10
## Iteration 446: time=0.02, Factors=10
## Iteration 447: time=0.02, Factors=10
## Iteration 448: time=0.02, Factors=10
## Iteration 449: time=0.02, Factors=10
## Iteration 450: time=0.02, ELBO=-453076.63, deltaELBO=1.199 (0.00004066%), Factors=10
## Iteration 451: time=0.02, Factors=10
## Iteration 452: time=0.02, Factors=10
## Iteration 453: time=0.02, Factors=10
## Iteration 454: time=0.02, Factors=10
## Iteration 455: time=0.01, ELBO=-453075.51, deltaELBO=1.121 (0.00003801%), Factors=10
## Iteration 456: time=0.03, Factors=10
## Iteration 457: time=0.03, Factors=10
## Iteration 458: time=0.02, Factors=10
## Iteration 459: time=0.02, Factors=10
## Iteration 460: time=0.02, ELBO=-453074.42, deltaELBO=1.092 (0.00003703%), Factors=10
## Iteration 461: time=0.03, Factors=10
## Iteration 462: time=0.02, Factors=10
## Iteration 463: time=0.03, Factors=10
## Iteration 464: time=0.02, Factors=10
## Iteration 465: time=0.02, ELBO=-453073.34, deltaELBO=1.072 (0.00003635%), Factors=10
## Iteration 466: time=0.03, Factors=10
## Iteration 467: time=0.02, Factors=10
## Iteration 468: time=0.02, Factors=10
## Iteration 469: time=0.02, Factors=10
## Iteration 470: time=0.03, ELBO=-453072.27, deltaELBO=1.074 (0.00003641%), Factors=10
## Iteration 471: time=0.02, Factors=10
## Iteration 472: time=0.02, Factors=10
## Iteration 473: time=0.02, Factors=10
## Iteration 474: time=0.02, Factors=10
## Iteration 475: time=0.01, ELBO=-453071.30, deltaELBO=0.969 (0.00003287%), Factors=10
## Iteration 476: time=0.02, Factors=10
## Iteration 477: time=0.02, Factors=10
## Iteration 478: time=0.02, Factors=10
## Iteration 479: time=0.01, Factors=10
## Iteration 480: time=0.02, ELBO=-453070.40, deltaELBO=0.903 (0.00003062%), Factors=10
## Iteration 481: time=0.03, Factors=10
## Iteration 482: time=0.02, Factors=10
## Iteration 483: time=0.02, Factors=10
## Iteration 484: time=0.02, Factors=10
## Iteration 485: time=0.02, ELBO=-453069.41, deltaELBO=0.988 (0.00003351%), Factors=10
## Iteration 486: time=0.02, Factors=10
## Iteration 487: time=0.02, Factors=10
## Iteration 488: time=0.02, Factors=10
## Iteration 489: time=0.02, Factors=10
## Iteration 490: time=0.02, ELBO=-453068.50, deltaELBO=0.910 (0.00003085%), Factors=10
## Iteration 491: time=0.02, Factors=10
## Iteration 492: time=0.02, Factors=10
## Iteration 493: time=0.02, Factors=10
## Iteration 494: time=0.01, Factors=10
## Iteration 495: time=0.03, ELBO=-453067.65, deltaELBO=0.852 (0.00002888%), Factors=10
## Iteration 496: time=0.02, Factors=10
## Iteration 497: time=0.01, Factors=10
## Iteration 498: time=0.03, Factors=10
## Iteration 499: time=0.02, Factors=10
## Iteration 500: time=0.02, ELBO=-453066.87, deltaELBO=0.779 (0.00002642%), Factors=10
## Iteration 501: time=0.00, Factors=10
## Iteration 502: time=0.04, Factors=10
## Iteration 503: time=0.02, Factors=10
## Iteration 504: time=0.01, Factors=10
## Iteration 505: time=0.02, ELBO=-453065.96, deltaELBO=0.909 (0.00003084%), Factors=10
## Iteration 506: time=0.03, Factors=10
## Iteration 507: time=0.02, Factors=10
## Iteration 508: time=0.01, Factors=10
## Iteration 509: time=0.01, Factors=10
## Iteration 510: time=0.02, ELBO=-453065.22, deltaELBO=0.743 (0.00002519%), Factors=10
## Iteration 511: time=0.03, Factors=10
## Iteration 512: time=0.02, Factors=10
## Iteration 513: time=0.02, Factors=10
## Iteration 514: time=0.02, Factors=10
## Iteration 515: time=0.01, ELBO=-453064.38, deltaELBO=0.843 (0.00002858%), Factors=10
## Iteration 516: time=0.02, Factors=10
## Iteration 517: time=0.03, Factors=10
## Iteration 518: time=0.02, Factors=10
## Iteration 519: time=0.02, Factors=10
## Iteration 520: time=0.02, ELBO=-453063.61, deltaELBO=0.761 (0.00002582%), Factors=10
## Iteration 521: time=0.03, Factors=10
## Iteration 522: time=0.02, Factors=10
## Iteration 523: time=0.02, Factors=10
## Iteration 524: time=0.02, Factors=10
## Iteration 525: time=0.03, ELBO=-453062.91, deltaELBO=0.705 (0.00002391%), Factors=10
## Iteration 526: time=0.02, Factors=10
## Iteration 527: time=0.02, Factors=10
## Iteration 528: time=0.02, Factors=10
## Iteration 529: time=0.02, Factors=10
## Iteration 530: time=0.04, ELBO=-453062.16, deltaELBO=0.744 (0.00002523%), Factors=10
## Iteration 531: time=0.02, Factors=10
## Iteration 532: time=0.02, Factors=10
## Iteration 533: time=0.02, Factors=10
## Iteration 534: time=0.02, Factors=10
## Iteration 535: time=0.03, ELBO=-453061.49, deltaELBO=0.679 (0.00002304%), Factors=10
## Iteration 536: time=0.02, Factors=10
## Iteration 537: time=0.02, Factors=10
## Iteration 538: time=0.02, Factors=10
## Iteration 539: time=0.02, Factors=10
## Iteration 540: time=0.03, ELBO=-453060.82, deltaELBO=0.665 (0.00002255%), Factors=10
## Iteration 541: time=0.02, Factors=10
## Iteration 542: time=0.01, Factors=10
## Iteration 543: time=0.02, Factors=10
## Iteration 544: time=0.03, Factors=10
## Iteration 545: time=0.02, ELBO=-453060.12, deltaELBO=0.702 (0.00002381%), Factors=10
## Iteration 546: time=0.02, Factors=10
## Iteration 547: time=0.03, Factors=10
## Iteration 548: time=0.02, Factors=10
## Iteration 549: time=0.02, Factors=10
## Iteration 550: time=0.02, ELBO=-453059.50, deltaELBO=0.614 (0.00002083%), Factors=10
## Iteration 551: time=0.03, Factors=10
## Iteration 552: time=0.02, Factors=10
## Iteration 553: time=0.00, Factors=10
## Iteration 554: time=0.02, Factors=10
## Iteration 555: time=0.02, ELBO=-453058.88, deltaELBO=0.622 (0.00002111%), Factors=10
## Iteration 556: time=0.02, Factors=10
## Iteration 557: time=0.01, Factors=10
## Iteration 558: time=0.02, Factors=10
## Iteration 559: time=0.03, Factors=10
## Iteration 560: time=0.02, ELBO=-453058.21, deltaELBO=0.670 (0.00002271%), Factors=10
## Iteration 561: time=0.02, Factors=10
## Iteration 562: time=0.02, Factors=10
## Iteration 563: time=0.02, Factors=10
## Iteration 564: time=0.02, Factors=10
## Iteration 565: time=0.01, ELBO=-453057.59, deltaELBO=0.622 (0.00002108%), Factors=10
## Iteration 566: time=0.03, Factors=10
## Iteration 567: time=0.02, Factors=10
## Iteration 568: time=0.02, Factors=10
## Iteration 569: time=0.02, Factors=10
## Iteration 570: time=0.02, ELBO=-453057.00, deltaELBO=0.589 (0.00001998%), Factors=10
## Iteration 571: time=0.03, Factors=10
## Iteration 572: time=0.02, Factors=10
## Iteration 573: time=0.00, Factors=10
## Iteration 574: time=0.04, Factors=10
## Iteration 575: time=0.03, ELBO=-453056.45, deltaELBO=0.549 (0.00001861%), Factors=10
## Iteration 576: time=0.02, Factors=10
## Iteration 577: time=0.02, Factors=10
## Iteration 578: time=0.02, Factors=10
## Iteration 579: time=0.02, Factors=10
## Iteration 580: time=0.01, ELBO=-453055.84, deltaELBO=0.608 (0.00002063%), Factors=10
## Iteration 581: time=0.02, Factors=10
## Iteration 582: time=0.03, Factors=10
## Iteration 583: time=0.02, Factors=10
## Iteration 584: time=0.00, Factors=10
## Iteration 585: time=0.03, ELBO=-453055.29, deltaELBO=0.555 (0.00001882%), Factors=10
## Iteration 586: time=0.02, Factors=10
## Iteration 587: time=0.02, Factors=10
## Iteration 588: time=0.02, Factors=10
## Iteration 589: time=0.01, Factors=10
## Iteration 590: time=0.02, ELBO=-453054.73, deltaELBO=0.560 (0.00001900%), Factors=10
## Iteration 591: time=0.03, Factors=10
## Iteration 592: time=0.02, Factors=10
## Iteration 593: time=0.00, Factors=10
## Iteration 594: time=0.03, Factors=10
## Iteration 595: time=0.02, ELBO=-453054.22, deltaELBO=0.508 (0.00001723%), Factors=10
## Iteration 596: time=0.01, Factors=10
## Iteration 597: time=0.02, Factors=10
## Iteration 598: time=0.02, Factors=10
## Iteration 599: time=0.02, Factors=10
## Iteration 600: time=0.01, ELBO=-453053.72, deltaELBO=0.502 (0.00001701%), Factors=10
## Iteration 601: time=0.04, Factors=10
## Iteration 602: time=0.03, Factors=10
## Iteration 603: time=0.02, Factors=10
## Iteration 604: time=0.01, Factors=10
## Iteration 605: time=0.03, ELBO=-453053.18, deltaELBO=0.544 (0.00001845%), Factors=10
## Iteration 606: time=0.02, Factors=10
## Iteration 607: time=0.02, Factors=10
## Iteration 608: time=0.02, Factors=10
## Iteration 609: time=0.02, Factors=10
## Iteration 610: time=0.02, ELBO=-453052.64, deltaELBO=0.533 (0.00001808%), Factors=10
## Iteration 611: time=0.02, Factors=10
## Iteration 612: time=0.02, Factors=10
## Iteration 613: time=0.02, Factors=10
## Iteration 614: time=0.02, Factors=10
## Iteration 615: time=0.02, ELBO=-453052.14, deltaELBO=0.507 (0.00001720%), Factors=10
## Iteration 616: time=0.02, Factors=10
## Iteration 617: time=0.02, Factors=10
## Iteration 618: time=0.01, Factors=10
## Iteration 619: time=0.02, Factors=10
## Iteration 620: time=0.03, ELBO=-453051.63, deltaELBO=0.507 (0.00001719%), Factors=10
## Iteration 621: time=0.02, Factors=10
## Iteration 622: time=0.02, Factors=10
## Iteration 623: time=0.02, Factors=10
## Iteration 624: time=0.02, Factors=10
## Iteration 625: time=0.02, ELBO=-453051.18, deltaELBO=0.451 (0.00001529%), Factors=10
## Iteration 626: time=0.02, Factors=10
## Iteration 627: time=0.02, Factors=10
## Iteration 628: time=0.02, Factors=10
## Iteration 629: time=0.02, Factors=10
## Iteration 630: time=0.02, ELBO=-453050.73, deltaELBO=0.451 (0.00001530%), Factors=10
## Iteration 631: time=0.02, Factors=10
## Iteration 632: time=0.02, Factors=10
## Iteration 633: time=0.02, Factors=10
## Iteration 634: time=0.03, Factors=10
## Iteration 635: time=0.03, ELBO=-453050.26, deltaELBO=0.469 (0.00001590%), Factors=10
## Iteration 636: time=0.02, Factors=10
## Iteration 637: time=0.02, Factors=10
## Iteration 638: time=0.01, Factors=10
## Iteration 639: time=0.02, Factors=10
## Iteration 640: time=0.02, ELBO=-453049.78, deltaELBO=0.481 (0.00001633%), Factors=10
## Iteration 641: time=0.02, Factors=10
## Iteration 642: time=0.02, Factors=10
## Iteration 643: time=0.02, Factors=10
## Iteration 644: time=0.00, Factors=10
## Iteration 645: time=0.03, ELBO=-453049.32, deltaELBO=0.459 (0.00001556%), Factors=10
## Iteration 646: time=0.02, Factors=10
## Iteration 647: time=0.02, Factors=10
## Iteration 648: time=0.01, Factors=10
## Iteration 649: time=0.02, Factors=10
## Iteration 650: time=0.02, ELBO=-453048.91, deltaELBO=0.403 (0.00001367%), Factors=10
## Iteration 651: time=0.03, Factors=10
## Iteration 652: time=0.02, Factors=10
## Iteration 653: time=0.01, Factors=10
## Iteration 654: time=0.03, Factors=10
## Iteration 655: time=0.02, ELBO=-453048.47, deltaELBO=0.445 (0.00001510%), Factors=10
## Iteration 656: time=0.01, Factors=10
## Iteration 657: time=0.02, Factors=10
## Iteration 658: time=0.02, Factors=10
## Iteration 659: time=0.02, Factors=10
## Iteration 660: time=0.02, ELBO=-453048.05, deltaELBO=0.415 (0.00001409%), Factors=10
## Iteration 661: time=0.02, Factors=10
## Iteration 662: time=0.02, Factors=10
## Iteration 663: time=0.02, Factors=10
## Iteration 664: time=0.01, Factors=10
## Iteration 665: time=0.03, ELBO=-453047.64, deltaELBO=0.409 (0.00001388%), Factors=10
## Iteration 666: time=0.02, Factors=10
## Iteration 667: time=0.01, Factors=10
## Iteration 668: time=0.02, Factors=10
## Iteration 669: time=0.02, Factors=10
## Iteration 670: time=0.03, ELBO=-453047.19, deltaELBO=0.458 (0.00001554%), Factors=10
## Iteration 671: time=0.02, Factors=10
## Iteration 672: time=0.02, Factors=10
## Iteration 673: time=0.02, Factors=10
## Iteration 674: time=0.02, Factors=10
## Iteration 675: time=0.02, ELBO=-453046.80, deltaELBO=0.387 (0.00001312%), Factors=10
## Iteration 676: time=0.02, Factors=10
## Iteration 677: time=0.03, Factors=10
## Iteration 678: time=0.02, Factors=10
## Iteration 679: time=0.00, Factors=10
## Iteration 680: time=0.03, ELBO=-453046.40, deltaELBO=0.403 (0.00001366%), Factors=10
## Iteration 681: time=0.02, Factors=10
## Iteration 682: time=0.02, Factors=10
## Iteration 683: time=0.01, Factors=10
## Iteration 684: time=0.02, Factors=10
## Iteration 685: time=0.02, ELBO=-453046.07, deltaELBO=0.325 (0.00001101%), Factors=10
## Iteration 686: time=0.02, Factors=10
## Iteration 687: time=0.00, Factors=10
## Iteration 688: time=0.02, Factors=10
## Iteration 689: time=0.03, Factors=10
## Iteration 690: time=0.02, ELBO=-453045.73, deltaELBO=0.340 (0.00001153%), Factors=10
## Iteration 691: time=0.02, Factors=10
## Iteration 692: time=0.04, Factors=10
## Iteration 693: time=0.02, Factors=10
## Iteration 694: time=0.02, Factors=10
## Iteration 695: time=0.02, ELBO=-453045.36, deltaELBO=0.367 (0.00001244%), Factors=10
## Iteration 696: time=0.02, Factors=10
## Iteration 697: time=0.03, Factors=10
## Iteration 698: time=0.01, Factors=10
## Iteration 699: time=0.02, Factors=10
## Iteration 700: time=0.03, ELBO=-453044.97, deltaELBO=0.396 (0.00001344%), Factors=10
## Iteration 701: time=0.02, Factors=10
## Iteration 702: time=0.02, Factors=10
## Iteration 703: time=0.02, Factors=10
## Iteration 704: time=0.02, Factors=10
## Iteration 705: time=0.02, ELBO=-453044.63, deltaELBO=0.341 (0.00001155%), Factors=10
## Iteration 706: time=0.01, Factors=10
## Iteration 707: time=0.02, Factors=10
## Iteration 708: time=0.02, Factors=10
## Iteration 709: time=0.02, Factors=10
## Iteration 710: time=0.02, ELBO=-453044.32, deltaELBO=0.311 (0.00001056%), Factors=10
## Iteration 711: time=0.02, Factors=10
## Iteration 712: time=0.02, Factors=10
## Iteration 713: time=0.01, Factors=10
## Iteration 714: time=0.02, Factors=10
## Iteration 715: time=0.03, ELBO=-453043.97, deltaELBO=0.347 (0.00001176%), Factors=10
## Iteration 716: time=0.02, Factors=10
## Iteration 717: time=0.02, Factors=10
## Iteration 718: time=0.02, Factors=10
## Iteration 719: time=0.02, Factors=10
## Iteration 720: time=0.02, ELBO=-453043.68, deltaELBO=0.287 (0.00000974%), Factors=10
## Iteration 721: time=0.01, Factors=10
## Iteration 722: time=0.02, Factors=10
## Iteration 723: time=0.02, Factors=10
## Iteration 724: time=0.02, Factors=10
## Iteration 725: time=0.02, ELBO=-453043.34, deltaELBO=0.338 (0.00001146%), Factors=10
## Iteration 726: time=0.01, Factors=10
## Iteration 727: time=0.02, Factors=10
## Iteration 728: time=0.02, Factors=10
## Iteration 729: time=0.02, Factors=10
## Iteration 730: time=0.02, ELBO=-453043.02, deltaELBO=0.330 (0.00001118%), Factors=10
## Iteration 731: time=0.02, Factors=10
## Iteration 732: time=0.02, Factors=10
## Iteration 733: time=0.01, Factors=10
## Iteration 734: time=0.02, Factors=10
## Iteration 735: time=0.02, ELBO=-453042.76, deltaELBO=0.250 (0.00000849%), Factors=10
## Iteration 736: time=0.02, Factors=10
## Iteration 737: time=0.02, Factors=10
## Iteration 738: time=0.02, Factors=10
## Iteration 739: time=0.02, Factors=10
## Iteration 740: time=0.02, ELBO=-453042.42, deltaELBO=0.346 (0.00001175%), Factors=10
## Iteration 741: time=0.01, Factors=10
## Iteration 742: time=0.02, Factors=10
## Iteration 743: time=0.02, Factors=10
## Iteration 744: time=0.02, Factors=10
## Iteration 745: time=0.01, ELBO=-453042.16, deltaELBO=0.262 (0.00000888%), Factors=10
## Iteration 746: time=0.02, Factors=10
## Iteration 747: time=0.02, Factors=10
## Iteration 748: time=0.02, Factors=10
## Iteration 749: time=0.01, Factors=10
## Iteration 750: time=0.02, ELBO=-453041.82, deltaELBO=0.341 (0.00001155%), Factors=10
## Iteration 751: time=0.02, Factors=10
## Iteration 752: time=0.02, Factors=10
## Iteration 753: time=0.02, Factors=10
## Iteration 754: time=0.02, Factors=10
## Iteration 755: time=0.02, ELBO=-453041.55, deltaELBO=0.263 (0.00000893%), Factors=10
## Iteration 756: time=0.02, Factors=10
## Iteration 757: time=0.02, Factors=10
## Iteration 758: time=0.02, Factors=10
## Iteration 759: time=0.02, Factors=10
## Iteration 760: time=0.02, ELBO=-453041.26, deltaELBO=0.290 (0.00000982%), Factors=10
## Iteration 761: time=0.02, Factors=10
## Iteration 762: time=0.02, Factors=10
## Iteration 763: time=0.02, Factors=10
## Iteration 764: time=0.02, Factors=10
## Iteration 765: time=0.02, ELBO=-453040.98, deltaELBO=0.288 (0.00000976%), Factors=10
## Iteration 766: time=0.02, Factors=10
## Iteration 767: time=0.02, Factors=10
## Iteration 768: time=0.02, Factors=10
## Iteration 769: time=0.02, Factors=10
## Iteration 770: time=0.02, ELBO=-453040.71, deltaELBO=0.263 (0.00000892%), Factors=10
## Iteration 771: time=0.02, Factors=10
## Iteration 772: time=0.02, Factors=10
## Iteration 773: time=0.02, Factors=10
## Iteration 774: time=0.02, Factors=10
## Iteration 775: time=0.02, ELBO=-453040.51, deltaELBO=0.207 (0.00000700%), Factors=10
## Iteration 776: time=0.02, Factors=10
## Iteration 777: time=0.02, Factors=10
## Iteration 778: time=0.02, Factors=10
## Iteration 779: time=0.01, Factors=10
## Iteration 780: time=0.03, ELBO=-453040.21, deltaELBO=0.298 (0.00001010%), Factors=10
## Iteration 781: time=0.02, Factors=10
## Iteration 782: time=0.02, Factors=10
## Iteration 783: time=0.02, Factors=10
## Iteration 784: time=0.01, Factors=10
## Iteration 785: time=0.02, ELBO=-453039.95, deltaELBO=0.255 (0.00000865%), Factors=10
## Iteration 786: time=0.02, Factors=10
## Iteration 787: time=0.01, Factors=10
## Iteration 788: time=0.02, Factors=10
## Iteration 789: time=0.02, Factors=10
## Iteration 790: time=0.03, ELBO=-453039.70, deltaELBO=0.255 (0.00000865%), Factors=10
## Iteration 791: time=0.01, Factors=10
## Iteration 792: time=0.02, Factors=10
## Iteration 793: time=0.02, Factors=10
## Iteration 794: time=0.02, Factors=10
## Iteration 795: time=0.02, ELBO=-453039.44, deltaELBO=0.259 (0.00000880%), Factors=10
## Iteration 796: time=0.02, Factors=10
## Iteration 797: time=0.02, Factors=10
## Iteration 798: time=0.01, Factors=10
## Iteration 799: time=0.02, Factors=10
## Iteration 800: time=0.03, ELBO=-453039.23, deltaELBO=0.210 (0.00000713%), Factors=10
## Iteration 801: time=0.03, Factors=10
## Iteration 802: time=0.02, Factors=10
## Iteration 803: time=0.02, Factors=10
## Iteration 804: time=0.02, Factors=10
## Iteration 805: time=0.02, ELBO=-453039.00, deltaELBO=0.230 (0.00000781%), Factors=10
## Iteration 806: time=0.03, Factors=10
## Iteration 807: time=0.02, Factors=10
## Iteration 808: time=0.01, Factors=10
## Iteration 809: time=0.02, Factors=10
## Iteration 810: time=0.02, ELBO=-453038.72, deltaELBO=0.282 (0.00000956%), Factors=10
## Iteration 811: time=0.02, Factors=10
## Iteration 812: time=0.02, Factors=10
## Iteration 813: time=0.01, Factors=10
## Iteration 814: time=0.02, Factors=10
## Iteration 815: time=0.02, ELBO=-453038.47, deltaELBO=0.242 (0.00000822%), Factors=10
## Iteration 816: time=0.02, Factors=10
## Iteration 817: time=0.02, Factors=10
## Iteration 818: time=0.01, Factors=10
## Iteration 819: time=0.02, Factors=10
## Iteration 820: time=0.02, ELBO=-453038.25, deltaELBO=0.226 (0.00000765%), Factors=10
## Iteration 821: time=0.02, Factors=10
## Iteration 822: time=0.01, Factors=10
## Iteration 823: time=0.03, Factors=10
## Iteration 824: time=0.02, Factors=10
## Iteration 825: time=0.02, ELBO=-453038.02, deltaELBO=0.228 (0.00000772%), Factors=10
## Iteration 826: time=0.02, Factors=10
## Iteration 827: time=0.02, Factors=10
## Iteration 828: time=0.02, Factors=10
## Iteration 829: time=0.02, Factors=10
## Iteration 830: time=0.02, ELBO=-453037.81, deltaELBO=0.214 (0.00000725%), Factors=10
## Iteration 831: time=0.02, Factors=10
## Iteration 832: time=0.01, Factors=10
## Iteration 833: time=0.02, Factors=10
## Iteration 834: time=0.02, Factors=10
## Iteration 835: time=0.03, ELBO=-453037.58, deltaELBO=0.232 (0.00000785%), Factors=10
## Iteration 836: time=0.02, Factors=10
## Iteration 837: time=0.01, Factors=10
## Iteration 838: time=0.03, Factors=10
## Iteration 839: time=0.02, Factors=10
## Iteration 840: time=0.02, ELBO=-453037.42, deltaELBO=0.157 (0.00000531%), Factors=10
## Iteration 841: time=0.01, Factors=10
## Iteration 842: time=0.02, Factors=10
## Iteration 843: time=0.02, Factors=10
## Iteration 844: time=0.03, Factors=10
## Iteration 845: time=0.02, ELBO=-453037.20, deltaELBO=0.215 (0.00000728%), Factors=10
## Iteration 846: time=0.02, Factors=10
## Iteration 847: time=0.02, Factors=10
## Iteration 848: time=0.02, Factors=10
## Iteration 849: time=0.01, Factors=10
## Iteration 850: time=0.02, ELBO=-453036.97, deltaELBO=0.236 (0.00000799%), Factors=10
## Iteration 851: time=0.03, Factors=10
## Iteration 852: time=0.02, Factors=10
## Iteration 853: time=0.02, Factors=10
## Iteration 854: time=0.02, Factors=10
## Iteration 855: time=0.02, ELBO=-453036.80, deltaELBO=0.165 (0.00000558%), Factors=10
## Iteration 856: time=0.03, Factors=10
## Iteration 857: time=0.03, Factors=10
## Iteration 858: time=0.03, Factors=10
## Iteration 859: time=0.01, Factors=10
## Iteration 860: time=0.02, ELBO=-453036.59, deltaELBO=0.215 (0.00000729%), Factors=10
## Iteration 861: time=0.03, Factors=10
## Iteration 862: time=0.02, Factors=10
## Iteration 863: time=0.01, Factors=10
## Iteration 864: time=0.02, Factors=10
## Iteration 865: time=0.02, ELBO=-453036.41, deltaELBO=0.181 (0.00000613%), Factors=10
## Iteration 866: time=0.02, Factors=10
## Iteration 867: time=0.02, Factors=10
## Iteration 868: time=0.02, Factors=10
## Iteration 869: time=0.02, Factors=10
## Iteration 870: time=0.02, ELBO=-453036.25, deltaELBO=0.159 (0.00000540%), Factors=10
## Iteration 871: time=0.02, Factors=10
## Iteration 872: time=0.02, Factors=10
## Iteration 873: time=0.02, Factors=10
## Iteration 874: time=0.02, Factors=10
## Iteration 875: time=0.02, ELBO=-453036.01, deltaELBO=0.234 (0.00000792%), Factors=10
## Iteration 876: time=0.02, Factors=10
## Iteration 877: time=0.02, Factors=10
## Iteration 878: time=0.02, Factors=10
## Iteration 879: time=0.02, Factors=10
## Iteration 880: time=0.02, ELBO=-453035.90, deltaELBO=0.119 (0.00000403%), Factors=10
## Iteration 881: time=0.02, Factors=10
## Iteration 882: time=0.02, Factors=10
## Iteration 883: time=0.01, Factors=10
## Iteration 884: time=0.02, Factors=10
## Iteration 885: time=0.03, ELBO=-453035.69, deltaELBO=0.204 (0.00000693%), Factors=10
## Iteration 886: time=0.01, Factors=10
## Iteration 887: time=0.02, Factors=10
## Iteration 888: time=0.02, Factors=10
## Iteration 889: time=0.02, Factors=10
## Iteration 890: time=0.02, ELBO=-453035.52, deltaELBO=0.170 (0.00000577%), Factors=10
## Iteration 891: time=0.02, Factors=10
## Iteration 892: time=0.02, Factors=10
## Iteration 893: time=0.02, Factors=10
## Iteration 894: time=0.02, Factors=10
## Iteration 895: time=0.02, ELBO=-453035.37, deltaELBO=0.149 (0.00000505%), Factors=10
## Iteration 896: time=0.02, Factors=10
## Iteration 897: time=0.02, Factors=10
## Iteration 898: time=0.02, Factors=10
## Iteration 899: time=0.02, Factors=10
## Iteration 900: time=0.02, ELBO=-453035.15, deltaELBO=0.219 (0.00000742%), Factors=10
## Iteration 901: time=0.02, Factors=10
## Iteration 902: time=0.02, Factors=10
## Iteration 903: time=0.01, Factors=10
## Iteration 904: time=0.02, Factors=10
## Iteration 905: time=0.02, ELBO=-453035.00, deltaELBO=0.154 (0.00000521%), Factors=10
## Iteration 906: time=0.02, Factors=10
## Iteration 907: time=0.02, Factors=10
## Iteration 908: time=0.02, Factors=10
## Iteration 909: time=0.02, Factors=10
## Iteration 910: time=0.02, ELBO=-453034.84, deltaELBO=0.164 (0.00000557%), Factors=10
## Iteration 911: time=0.02, Factors=10
## Iteration 912: time=0.02, Factors=10
## Iteration 913: time=0.01, Factors=10
## Iteration 914: time=0.02, Factors=10
## Iteration 915: time=0.02, ELBO=-453034.62, deltaELBO=0.217 (0.00000735%), Factors=10
## Iteration 916: time=0.02, Factors=10
## Iteration 917: time=0.02, Factors=10
## Iteration 918: time=0.02, Factors=10
## Iteration 919: time=0.01, Factors=10
## Iteration 920: time=0.03, ELBO=-453034.54, deltaELBO=0.081 (0.00000275%), Factors=10
## Iteration 921: time=0.02, Factors=10
## Iteration 922: time=0.02, Factors=10
## Iteration 923: time=0.02, Factors=10
## Iteration 924: time=0.02, Factors=10
## Iteration 925: time=0.02, ELBO=-453034.37, deltaELBO=0.165 (0.00000559%), Factors=10
## Iteration 926: time=0.02, Factors=10
## Iteration 927: time=0.02, Factors=10
## Iteration 928: time=0.03, Factors=10
## Iteration 929: time=0.02, Factors=10
## Iteration 930: time=0.02, ELBO=-453034.21, deltaELBO=0.160 (0.00000544%), Factors=10
## Iteration 931: time=0.02, Factors=10
## Iteration 932: time=0.02, Factors=10
## Iteration 933: time=0.02, Factors=10
## Iteration 934: time=0.02, Factors=10
## Iteration 935: time=0.02, ELBO=-453034.06, deltaELBO=0.148 (0.00000503%), Factors=10
## Iteration 936: time=0.02, Factors=10
## Iteration 937: time=0.00, Factors=10
## Iteration 938: time=0.03, Factors=10
## Iteration 939: time=0.02, Factors=10
## Iteration 940: time=0.02, ELBO=-453033.92, deltaELBO=0.147 (0.00000500%), Factors=10
## Iteration 941: time=0.02, Factors=10
## Iteration 942: time=0.03, Factors=10
## Iteration 943: time=0.02, Factors=10
## Iteration 944: time=0.01, Factors=10
## Iteration 945: time=0.02, ELBO=-453033.80, deltaELBO=0.115 (0.00000391%), Factors=10
## Iteration 946: time=0.02, Factors=10
## Iteration 947: time=0.02, Factors=10
## Iteration 948: time=0.00, Factors=10
## Iteration 949: time=0.02, Factors=10
## Iteration 950: time=0.03, ELBO=-453033.64, deltaELBO=0.161 (0.00000546%), Factors=10
## Iteration 951: time=0.02, Factors=10
## Iteration 952: time=0.02, Factors=10
## Iteration 953: time=0.01, Factors=10
## Iteration 954: time=0.02, Factors=10
## Iteration 955: time=0.03, ELBO=-453033.51, deltaELBO=0.126 (0.00000427%), Factors=10
## Iteration 956: time=0.02, Factors=10
## Iteration 957: time=0.02, Factors=10
## Iteration 958: time=0.02, Factors=10
## Iteration 959: time=0.02, Factors=10
## Iteration 960: time=0.02, ELBO=-453033.37, deltaELBO=0.141 (0.00000479%), Factors=10
## 
## Converged!
## 
## 
## 
## #######################
## ## Training finished ##
## #######################
## 
## 
## Warning: Output file /home/vandelj/Bureau/Bilille/Formation/ETBII/2024/Git/etbii/THEMATIC-SESSIONS/Multivariate_analysis/02_TP/data/MDD/MOFA2_TCGA.hdf5 already exists, it will be replaced
## Saving model in /home/vandelj/Bureau/Bilille/Formation/ETBII/2024/Git/etbii/THEMATIC-SESSIONS/Multivariate_analysis/02_TP/data/MDD/MOFA2_TCGA.hdf5...

NB: You can also load a pre-computed MOFA model with the load_Model function from an hdf5 file.

8.4 Results interpretation

8.4.1 Get familiar with the MOFA object

8.4.1.1 Slots of MOFA object

Resulting MOFAobject is a S4 class used to store all relevant data to analyse a MOFA model.

DIY: display available attributes using the slotNames function.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                            MOFA object attributes                          ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

slotNames(MOFAobject)

##  [1] "data"               "covariates"         "covariates_warped" 
##  [4] "intercepts"         "imputed_data"       "interpolated_Z"    
##  [7] "samples_metadata"   "features_metadata"  "expectations"      
## [10] "training_stats"     "data_options"       "model_options"     
## [13] "training_options"   "stochastic_options" "mefisto_options"   
## [16] "dimensions"         "on_disk"            "dim_red"           
## [19] "cache"              "status"

Object attributes accessible using ‘@’ :

data : input data (either a list of matrices or a MultiAssayExperiment)
intercepts : feature intercepts
imputed_data: imputed data (filled by running impute on the object, which is not necessary)
samples_ and features_metadata (basically groups)
expectations: expectations of the factors and weights ($Z and $W matrices)
training_stats: statistics about iterative training process (time, elbo..)
data_options: data processing options (given as parameters)
model_options: model options (given as parameters)
training_options: training options (given as parameters)
stochastic_options: training options (given as parameters is stochastic option is TRUE in training_options)
dimensions : dimensions for the number of views (M), groups (G), samples (N), features per view (D) and factors (K)
on_disk : logical indicating whether data is loaded from disk.
dim_red : non-linear dimensionality reduction manifolds.
cache : variance explained by factors.
status : auxiliary variable indicating whether the model has been trained.
mefisto_options, covariates, covariates_warped and interpolated_Z are optional slot to store sample covariate only for training in MEFISTO

8.4.1.2 Functions to get infos from MOFA object

Some interesting functions to access these attributes:

factors_names(): to get or set factor names
features_names(): to get or set feature names
samples_names(): to get or set sample names
views_names(): to get or set view names
get_dimensions(): to get dimensions (number of samples, features, etc.)
get_factors(): to get model factors
get_weights(): to get model weights

DIY: try some of these functions on MOFAobject.

## Function tests
###################################
views_names(MOFAobject)

## [1] "methylation" "mrna"        "mirna"

get_dimensions(MOFAobject)

## $M
## [1] 3
## 
## $G
## [1] 1
## 
## $N
## group1 
##     87 
## 
## $D
## methylation        mrna       mirna 
##        2000        2000         350 
## 
## $K
## [1] 10

8.4.1.3 Add sample metadata to the model

DIY: use the samples_metadata function to access and rewrite (or add) metadata on the MOFAobject.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                                 Metadata                                   ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

## Add sample metadata to the model
temp_metadata <- readRDS(file = paste0(data_path,"pd_mdd_bin.RDS"))
colnames(temp_metadata)[2]="sample" #one metadata column should contains IDs of samples matching with input data and labelled as "sample"
#remove metadata for samples without omic information
temp_metadata <- temp_metadata[which(temp_metadata[,2] %in% colnames(MOFAobject@data$mrna$group1)),]
#which(temp_metadata[,2]=="302-1")#two sample with the same number but different name in men+women cohort
#temp_metadata <- temp_metadata[-95,] 
samples_metadata(MOFAobject) <- temp_metadata

8.4.2 Results

To interpret these results it is important to have information about the samples Biological context (Major Depressive Disorder).

8.4.2.1 General factors plots

First, we’ll analyze all the factors at a high level to get an overview of the main trends in decomposition.

8.4.2.1.1 Factors correlation plot

DIY: use the plot_factor_cor to display correlations between factors.

## Factor correlation plot
###################################

plot_factor_cor(MOFAobject)

Normally, factors should not be too correlated. If they are, you’ve probably chosen too many factors, creating redundancy between some of them.

8.4.2.1.2 Variance decomposition

Representing percentage of variance explained by each factor in each view is a key plot of MOFA and should always be done before inspecting factor values or weights.

DIY: use the plot_variance_explained to display variance explained by each factor for each view.

## Variance explained values
###################################

plot_variance_explained(MOFAobject, max_r2=20)

#plot_variance_explained(MOFAobject, plot_total = T)[[1]]

DIY: you can get the numerical values from the cache attribute of the MOFAobject, in the variance_explained field.

## Variance explained values
###################################

MOFAobject@cache$variance_explained$r2_per_factor

## $group1
##          methylation         mrna        mirna
## Factor1  0.159168243  0.003612041 33.562338352
## Factor2  2.511799335 18.395793438  2.739840746
## Factor3  0.068938732  0.000679493 15.183448792
## Factor4  0.005549192  0.001430511 15.165054798
## Factor5  0.005173683  8.321201801  0.004100800
## Factor6  0.094151497  2.534943819  2.761000395
## Factor7  0.437366962  2.622681856  1.598966122
## Factor8  0.005549192  3.781014681  0.034618378
## Factor9  0.006508827  2.613085508  0.260007381
## Factor10 0.013369322  2.678036690  0.004154444

We can ask whether the model provides a good fit to the data. To check it we can plot the total variance explained (using all factors). The resulting values will depend on the nature of the data set, the number of samples, the number of factors, etc….

We can identify some rules about this amount:

Noisy datasets will result in a lower amounts of variance explained (<10%).
Higher the number of samples –> smaller the total variance explained.
Higher the number of factors –> higher the total variance explained.

The objective is not to reach 100%, even with a large number of factors. Explaining ~50% for a view thanks to a linear model is already good.

DIY: use the plot_variance_explained to display total variance explained for each view.

## Variance explained histogram
###################################

plot_variance_explained(MOFAobject, plot_total = T)[[2]]

The latent factors inferred by MOFA represent the underlying principal sources of variation among the samples.

Some factor can capture a source of variability that is present across all data modalities. Thus, its etiology is likely to be something very important for the studied phenomenon. Whereas some factor can capture a very strong source of variation in only a specific modality

We can also access the variance explained per sample (only available with Gaussian data).

DIY: if your data are exclusively Gaussian (regarding model_options parameters) use the calculate_variance_explained function to display variance explained per sample.

## Variance explained per sample
###################################

#calculate_variance_explained(MOFAobject)
datatable(calculate_variance_explained_per_sample(MOFAobject)$group1) #only with pure gaussian data

8.4.2.1.3 Summary plot of factors

We can look at associations (correlations) between MOFA factors and some metadata such as Gender, Age… these associations are a good starting point to interpret factors.

DIY: use the correlate_factors_with_covariates function to search such associations with available metadata.

## Factor correlation with metadata
###################################

correlate_factors_with_covariates(MOFAobject, 
  covariates = names(clinical_data), 
  plot="log_pval"
)

We can also stay at the sample level and display factor values for each sample (Z matrix) crossed with metadata.

DIY: use the plot_factor function to display factors values and use metadata to color dots.

Using metadata variable significant in the previous correlation analysis for a given factor to color dots should give you a sharp separation for the same factor on this plot.

## Plot factor values
###################################

p <- plot_factor(MOFAobject, 
  factors = c(1:5),
  color_by = "GR",
  dot_size = 3,        # change dot size
  dodge = T,           # dodge points with different colors
  legend = T,          # remove legend
#  add_violin = T,      # add violin plots (here we have too few points to make violin, let's do boxplots)
#  violin_alpha = 0.25  # transparency of violin plots
  add_boxplot = T
)

# The output of plot_factor is a ggplot2 object that we can edit
print(p)

8.4.2.1.4 Combinations of factors

We can study pairwise combination of factors values using the plot_factors (with an s) function to detect pairwise association between factors and a metadata variable (similar to PCA representation).

DIY:try plot_factors on the MOFAobject still coloring dots using metadata.

## Plot combinations of factor values
###################################

plot_factors(MOFAobject, 
  factors = 1:5,
  color_by = "GR"
)

8.4.2.2 Explore individual factors

After this general overview about all factors, you can explore in details a specific factor of interest.

8.4.2.2.1 Useful functions

To explore loadings (W matrix)

plot_top_weights(): plot top loadings for a given factor and view
plot_weights(): plot all loadings for a given factor and view
plot_weights_heatmap(): plot a heatmap of the loadings from multiple factors of a given view

To explore data

plot_data_heatmap(): heatmap of the training data using only top features of a given factor
plot_data_scatter(): scatterplot of the data using only top features of a given factor

The use of theses functions is illustrated in the next sections.

8.4.2.2.2 Table of weights/factors

The weights measure how strong each feature relates to each factor :

no association = 0
strong association = high absolute values

The sign of the weight gives the direction of the effect (eg a positive weight indicates that the feature has higher levels in the sample with positive factor values).

DIY: access the ‘W’ field of the expectation MOFAobject attribute and display weights associated to a given modality. Use the datatable function for a nicer visualization.

## Weight values
###################################

datatable(MOFAobject@expectations$W$mrna[order(MOFAobject@expectations$W$mrna[,1], decreasing = TRUE ),]) %>% formatRound(c(1:dim(MOFAobject@expectations$W$mrna)[2]),3)

## Factor to study
###################################

factorToStudy=6

In the next section we will focus on factor 6, however feel free to choose another factor.

8.4.2.2.3 Exploration of factor 6

Factor values

We can first display factor 6 values for each sample (Z matrix).

DIY: use the plot_factor function to display factor 6 values and color dots according to this value.

You can also group the sample according to qualitative metadata variable to identify biological association.

## Plot factor values
###################################

plot_factor(MOFAobject, 
  factors =  factorToStudy, 
  color_by = paste("Factor",factorToStudy,sep=""),
  group_by = "Sample_Group"
)

Weight of associated features

Then we can search for features associated to factor 6 in each modality to give a “signature” of the factor.

DIY: visualize weight distribution for factor 6 in the different modalities using plot_weights function.

## Plot weight values distribution
###################################

plot_weights(MOFAobject,
 view = "mrna",
 factor = factorToStudy,
 nfeatures = 10,     # Top number of features to highlight
 scale = T           # Scale weights from -1 to 1
)

NB: you can specify a given number of features to display in this distribution through ‘nfeatures’ option.

DIY: use the plot_top_weights function to focus on top weights (up and down) for factor 6 in each modality.

The weights measure how strong each feature relates to each factor (no association = 0 ; strong association = high absolute values). The sign of the weight gives the direction of the effect (e.g. a positive weight indicates that the feature has higher levels in the sample with positive factor values).

## Plot top weight values
###################################

plot_top_weights(MOFAobject,
 view = "mrna",
 factor = factorToStudy,
 nfeatures = 10,     # Top number of features to highlight
 scale = T           # Scale weights from -1 to 1
)

Raw values of associated features

Instead of looking at weight values associated to each feature, we can also display raw values of features associated to factor 6.

DIY: use the plot_data_heatmap function to display raw values of the top20 features associated to factor 6 in each modality. You can also cluster features and/or sample based on these values.

## Plot feature heatmap
###################################

plot_data_heatmap(MOFAobject, 
  view = "mrna",
  factor = factorToStudy,  
  features = 25,
  cluster_rows = TRUE, cluster_cols = FALSE,
  show_rownames = TRUE, show_colnames = TRUE,
  denoise=TRUE,
  scale = "row"
)

To focus on a limited number of features you can also display raw values as a scatter plot and cross information with metadata or factors.

DIY: use the plot_data_scatter function to plot raw values of top4 features having positive factor 6 values in each modality, and color samples based on metadata.

## Plot positive weight feature scatter
###################################

plot_data_scatter(MOFAobject, 
  view = "mrna",
  factor = factorToStudy,  
  features = 4,
  sign = "positive",
  color_by = "GR",
) + labs(y="mRNA expression")

DIY: do the same for top4 features having negative factor 6 values in each modality, and color samples based on metadata.

## Plot negative weight feature scatter
###################################

plot_data_scatter(MOFAobject, 
  view = "mrna",
  factor = factorToStudy,  
  features = 4,
  sign = "negative",
  color_by = "GR",
) + labs(y="mRNA expression")

8.4.3 To go futher …

Before going any further, consider Joint and Individual Variation Explained (JIVE) and Regularized Generalized Canonical Correlation Analysis (RGCCA) if you haven’t tried them yet.

8.4.3.1 Play with data pre-processing and MOFA options

You can decide to modify the pre-processing of the data and/or the model/training options to check their effect on the factors.

8.4.3.2 Ontology enrichment on features

You can run gene set enrichment analysis using group of genes having a high weight for a given factor to get indications about the biological meaning of factors.

DIY: using the run_enrichment and plot_enrichment_heatmap functions, try to identify ontology enrichment associated to each factor.

Some pre-built gene sets are implemented in MOFAdata (See https://github.com/bioFAM/MOFAdata for details), they can be used as feature.sets in the run_enrichment function.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                                 Enrichment                                 ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

data("reactomeGS")

## perform enrichment analysis
fsea.out <- run_enrichment(
  MOFAobject,
  view = "mrna",
  feature.sets = reactomeGS,
  alpha = 0.01
)
#plot_enrichment(fsea.out,1, alpha=0.01)

## plot enrichment heatmap
plot_enrichment_heatmap(fsea.out, alpha=0.01)

## plot enrichment lollipop
plot_enrichment(fsea.out, factor=factorToStudy, alpha=0.5)

## plot enrichment with features details
plot_enrichment_detailed(fsea.out, factor=factorToStudy, alpha=0.5)

8.4.3.3 Cluster samples

MOFA is an unsupervised approach, as in PCA you can try to cluster sample based on their projection in the MOFA factor space.

DIY: using the cluster_samples function identify clusters of samples based on their projection on the first factors. Then use the plot_factors function to display the result of this clustering by coloring samples according to clusters.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                                 Clustering                                 ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

## plot sample projection on factor space
plot_factors(MOFAobject,
  factors = 1:2,
  color_by = "CD4")

## cluster samples based on factor projection
clusters <- cluster_samples(MOFAobject,
  k=2,
  factors=1:2)

## plot sample projection on factor space with cluster information
plot_factors(MOFAobject,
  factors=1:2,
  color_by=clusters$cluster)

8.4.3.4 Non-linear reduction (UMAP/tSNE)

The MOFA factors are linear (as in PCA). Nevertheless, the MOFA factors can be used as input to other methods that learn compact nonlinear manifolds (as t-SNE or UMAP).

DIY: using the run_tsne or run_umap functions to run t-SNE and UMAP on the MOFA factors, then use the plot_dimred function on the resulting object to display t-SNE/UMAP representation.

##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##
##                        UMAP/tSNE reduction                                 ##
##––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––##

## Compute non-linear dimensionality reduction
set.seed(42)
model <- run_umap(MOFAobject)
#model <- run_tsne(MOFAobject,factors = 1:5)

## Plot non-linear dimensionality reduction
plot_dimred(model,
  method = "UMAP",  # method can be either "TSNE" or "UMAP"
  color_by = "CD4"
)

9 Joint and Individual Variation Explained (JIVE)

JIVE is, like RGCCA and MOFA, a JDR method, but with some particularities. The main characteristic of JIVE is that it has the ability to extract, from the data, a joint structure (called $\mathbf J$) that contains the common signal between the blocks, and individuals structures (called $\mathbf A_k$) that contain signal that specific to block $k$.

9.1 Apply the regular version of JIVE

We first apply the regular version of JIVE (within package r.jive) to the MDD data. Because of the shift in notations (rows are variables and columns are observations), we need to transpose all the blocks.

Be aware that it takes a non-negligible amount of time to run! (more than 1 hour on my computer). If you are running out of time, run the method with rankJ = 3 and a handful of components for each block (say rankA = c(5, 5, 5)). In this case, do not forget to set the parameter method to "given".

set.seed(456) # For permutation reproducibility
res.jive <- jive(data = blocks.jive)

The permutation procedure ended up estimating 5 joint components ($r_J = 3$) and $\mathbf{r}_A = [13, 8, 36]$ individual components, based on partial results of the permutation method. These are the results that will be shown in the following. If you chose different parameters, you may obtain different results (!).

9.1.1 “Variation explained”

The explained variances (by the joint and individual components) are depicted on the following graph.

plot(res.jive)

A good portion of the variability is shared between blocks mRNA and miRNA, while block DNAm stands on its own. The following table shows the distribution of the individual block ranks, with more than 30 for DNAm.

knitr::kable(summary(res.jive)$Ranks)

Source	Rank
Joint	3
mRNA	13
miRNA	8
DNAm	36

9.1.2 Heatmaps

Then we plot the joint and individual approximations of each block. The result is impressive, and shows a beautiful joint structure, while the individual structure looks more complex to read.

r.jive:::plot.jive(res.jive, "heat")

9.1.3 Joint structure

One key characteristic of JIVE is its ability to extract joint and individual characteristics from the blocks.

Here we plot the joint structure: it needs a little bit of work, do not hesitate to take a peek at the code to help you. In a nutshell, we

extracted the components and loadings with jive.predict
plotted the joint components with ggplot2

pred.results <- jive.predict(blocks.jive, res.jive) 

J <- data.frame(
  J = t(pred.results$joint.scores),
  group = clinical_data$Sample_Group,
  row.names = rownames(blocks[[1]]))

J %>%
  ggplot(aes(J.1, J.2, color = group)) + 
  geom_point() + 
  theme_bw()

The joint structure could be related to other clinical features. Go ahead and check this hypothesis with the heatmaps that were used in the previous sections.

interpret_PC(
  component_matrix = data.frame(J = t(pred.results$joint.scores)),
  clinical_df = clinical_data,
  title = "JIVE joint dimensions")

## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name                 grob
## 1 1 (1-1,1-1) arrange       gtable[layout]
## 2 2 (1-1,2-2) arrange frame[GRID.frame.79]

The joint structure does not really look like it explains the difference between MDD versus controls. To explore the individual structures, we perform an MFA on the three blocks of individual structures, and show the individual maps low.

indivlist <- lapply(pred.results$indiv.scores, t)
names(indivlist) <- names(blocks)

A <- data.frame(
  indivlist,
  group = clinical_data$Sample_Group,
  row.names = rownames(blocks[[1]]))

res.mfa.indiv <- MFA(A, c(res.jive$rankA, 1), type = c(rep("s", 3), "n"), name.group = c(names(blocks), "group"), num.group.sup = 4, graph = FALSE)

labz <- sprintf(
  "Dim %i (%0.2f%%)",
  1:nrow(res.mfa.indiv$eig),
  res.mfa.indiv$eig[, 2])

dat.mfa <- data.frame(
  res.mfa.indiv$ind$coord,
  group = A$group)

dat.mfa %>%
  ggplot(aes(Dim.1, Dim.2, color = group)) + 
  geom_hline(yintercept = 0, color = "grey") +
  geom_vline(xintercept = 0, color = "grey") +
  geom_point() + 
  labs(x = labz[1], y = labz[2]) +
  stat_ellipse() +
  coord_equal() +
  theme_bw()

And we also apply the heatmaps previously used.

A <- data.frame(t(Reduce("rbind", pred.results$indiv.scores)))
namesA <- unlist(setNames(lapply(res.jive$rankA, seq), names(blocks)))
colnames(A) <- names(namesA)

interpret_PC(
  component_matrix = A,
  clinical_df = clinical_data,
  title = "JIVE individual dimensions")

## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name                  grob
## 1 1 (1-1,1-1) arrange        gtable[layout]
## 2 2 (1-1,2-2) arrange frame[GRID.frame.171]

Finally, we extract the top features associated with each individual components.

load2.mRNA <- pred.results$indiv.load[[1]][, 2]
load2.miRNA <- pred.results$indiv.load[[2]][, 1]
load2.DNAm <- pred.results$indiv.load[[3]][, 1]

dat.load2 <-rbind(
  tibble(
    ID = colnames(blocks[[1]]),
    sqload = load2.mRNA,
    block = "mRNA") %>% 
    slice_max(load2.mRNA, n = 10),
  tibble(
    ID = colnames(blocks[[2]]),
    sqload = load2.miRNA,
    block = "miRNA") %>% 
    slice_max(load2.miRNA, n = 10),
  tibble(
    ID = colnames(blocks[[3]]),
    sqload = load2.DNAm,
    block = "DNAm") %>% 
    slice_max(load2.DNAm, n = 10)
  )

gload_mRNA <- dat.load2 %>% 
  filter(block == "mRNA") %>% 
  ggplot(aes(x = sqload, y = reorder(ID, sqload)))  +
  geom_point() + 
  geom_segment(aes(xend = sqload, yend = reorder(ID, sqload)), x = 0) +
  theme_bw() + 
  labs(x = "Squared loading", y = "")

gload_miRNA <- dat.load2 %>% 
  filter(block == "miRNA") %>% 
  ggplot(aes(x = sqload, y = reorder(ID, sqload)))  +
  geom_point() + 
  geom_segment(aes(xend = sqload, yend = reorder(ID, sqload)), x = 0) +
  theme_bw() + 
  labs(x = "Squared loading", y = "")

gload_DNAm <- dat.load2 %>% 
  filter(block == "DNAm") %>% 
  ggplot(aes(x = sqload, y = reorder(ID, sqload)))  +
  geom_point() + 
  geom_segment(aes(xend = sqload, yend = reorder(ID, sqload)), x = 0) +
  theme_bw() + 
  labs(x = "Squared loading", y = "")

ggpubr::ggarrange(gload_mRNA, gload_miRNA, gload_DNAm)

10 Comparaison between PCA/RGCCA/MOFA/JIVE

Let’s compare the three methods, parametered to all act as “unsupervised”, in order not to favor RGCCA and JIVE.

First, extract the components from RGCCA (pick your favorite unsupervised RGCCA model, or just re-run RGCCA with a few components). Also extract the components that you want to include in this comparison from MOFAobject and pred.results. You can compare the dimensions with a number of tools: PCA, correlograms, heatmaps, hierarchical clustering and much more! We decided for a simple PCA and a correlogram on a few components.

res.rgcca.final <- rgcca(blocks = blocks, ncomp = c(10, 10, 10))
comp.rgcca <- data.frame(res.rgcca.final$Y)
names(comp.rgcca) <- paste0("RGCCA_",names(comp.rgcca))
comp.mofa <- get_factors(MOFAobject)$group1
colnames(comp.mofa) <- paste0("MOFA_", colnames(comp.mofa))
comp.jive <- cbind(J %>% select(-group), A)
names(comp.jive) <- paste0("JIVE_", names(comp.jive))

comp.all <- cbind(comp.rgcca, comp.mofa, comp.jive)

res.pca.comparison <- PCA(comp.all, graph = FALSE)

## Correlation circle
fviz_pca_var(res.pca.comparison, repel = TRUE, select.var = list(cos2 = 0.6))

## Correlogram
corrplot::corrplot(
  cor(
    comp.rgcca,
    data.frame(comp.mofa,
               comp.jive %>% 
                 select(-contains("DNAm"))
               )
    )
  )

Then, extract the loadings (aka weights) that are used by each method to compute the (joint) first component of mRNA and use a Venn diagram to compare the first 5%.

ensemlb <- colnames(blocks$mRNA)
wrgcca_mRNA <- res.rgcca.final$a$mRNA[, 1]
wmofa_mRNA <- get_weights(MOFAobject)$mrna[, 1]
wjive_mRNA <- pred.results$joint.load[1:2000, 1] 

five_percent_list <- list(
  RGCCA = ensemlb[order(-wrgcca_mRNA**2)][1:100],
  MOFA =  ensemlb[order(-wmofa_mRNA**2)][1:100],
  JIVE =  ensemlb[order(-wjive_mRNA**2)][1:100])

ggvenn::ggvenn(five_percent_list)

Joint Dimensional Reduction methods for multi-omics data analysis

Arnaud Gloaguen - Vincent Guillemot - Jimmy Vandel

mars 26, 2024

1 Libraries and environment

1.1 Environment

1.2 Load libraries

2 Overview of the Tutorial

3 Choose your dataset and your modality

3.1 TCGA breast cancer dataset

3.2 TARA ocean dataset

3.3 CLL dataset

3.4 Tomato dataset

3.5 WallOmics dataset

3.6 ProMetIS dataset

4 Biological context (Major Depressive Disorder)

5 Input data

5.1 In summary

5.2 Load your dataset

6 Principal Component Analysis (PCA)

6.1 In summary

6.2 Run PCA on a first omic modality

6.2.1 Sample/individual Space

6.2.2 Variable Space

6.2.3 Biplot

6.2.4 Scree plot

6.2.5 Going Further

6.3 PCA for other modalities

6.3.1 Sample/individual Space

6.3.2 Variable Space

6.3.3 Biplot

6.3.4 Scree plot

6.3.5 Going Further

7 Regularized Generalized Canonical Correlation Analysis (RGCCA)

7.1 First application of RGCCA

7.2 Explore the graphical representations of RGCCA

7.2.1 Sample space

7.2.2 Variable Space

7.2.3 Biplot

7.2.4 Screeplot

7.2.5 Going Further

7.3 Playing with the parameters

7.3.1 Impact of the parameter \(\tau\)

7.3.2 Impact of the function \(g\)

7.3.3 Impact of the connection matrix \(\bf{C}\)

7.4 Using permutation to tune the parameters in an unsupervised setting

7.5 Bootstrapping: a way of assessing the robustness of the tuned model

8 Multi-Omics Factor Analysis (MOFA)

8.1 First steps

8.1.1 Vocabulary

8.1.2 Input data

8.2 Prepare MOFA

8.2.1 Create MOFA object

8.2.2 MOFA options

8.2.2.1 Data options

8.2.2.2 Model options

8.2.2.3 Training options

8.3 Run MOFA

8.3.1 Prepare the object and run MOFA

8.4 Results interpretation

8.4.1 Get familiar with the MOFA object

8.4.1.1 Slots of MOFA object

8.4.1.2 Functions to get infos from MOFA object

8.4.1.3 Add sample metadata to the model

8.4.2 Results

8.4.2.1 General factors plots

8.4.2.1.1 Factors correlation plot

8.4.2.1.2 Variance decomposition

8.4.2.1.3 Summary plot of factors

8.4.2.1.4 Combinations of factors

8.4.2.2 Explore individual factors

8.4.2.2.1 Useful functions

8.4.2.2.2 Table of weights/factors

8.4.2.2.3 Exploration of factor 6

8.4.3 To go futher …

8.4.3.1 Play with data pre-processing and MOFA options

8.4.3.2 Ontology enrichment on features

8.4.3.3 Cluster samples

8.4.3.4 Non-linear reduction (UMAP/tSNE)

9 Joint and Individual Variation Explained (JIVE)

9.1 Apply the regular version of JIVE