RGCCA and SGCCA: theory, equations, and mapping to rgcca()

Scope

This note gives a compact but reasonably detailed presentation of Regularized Generalized Canonical Correlation Analysis (RGCCA) and Sparse Generalized Canonical Correlation Analysis (SGCCA), with explicit equations and a practical mapping to the arguments of the function rgcca() from the RGCCA package.

The emphasis is on:

  1. the optimization criteria,
  2. the role of each theoretical parameter,
  3. the interpretation of the regularization parameter tau,
  4. the sparsity parameterization used in SGCCA,
  5. and the direct correspondence with rgcca() arguments.

TL;DR

  • RGCCA finds one component per block at each stage by maximizing a weighted sum of pairwise block associations under block-specific quadratic constraints.
  • SGCCA keeps the same multiblock criterion but adds an \(\ell_1\) constraint to the weights, so that only a subset of variables contributes to each component.

Most important theoretical knobs:

  1. connection controls which block relationships are targeted.
  2. scheme controls how these relationships are aggregated.
  3. tau controls the correlation–covariance compromise in RGCCA.
  4. sparsity controls variable selection in SGCCA.
  5. ncomp controls how many component stages are extracted.

Notation

Assume we observe \(L\) blocks of centered variables on the same \(n\) individuals:

\[ \mathbf{X}_1,\ldots,\mathbf{X}_L, \qquad \mathbf{X}_l \in \mathbb{R}^{n \times p_l}. \]

For block \(l\), RGCCA looks for a weight vector

\[ \mathbf{w}_l \in \mathbb{R}^{p_l} \]

and the associated block component

\[ \mathbf{y}_l = \mathbf{X}_l \mathbf{w}_l \in \mathbb{R}^{n}. \]

The relationships between blocks are encoded by a symmetric design matrix

\[ C = (c_{kl}) \in \mathbb{R}^{L \times L}, \qquad c_{kl} \ge 0, \]

where usually \(c_{kl}=1\) if blocks \(k\) and \(l\) are connected, and \(0\) otherwise.

RGCCA

General optimization problem

A modern formulation of RGCCA is:

\[ \max_{\mathbf{w}_1,\ldots,\mathbf{w}_L} \sum_{k,l=1}^L c_{kl} \, g\!\left(\frac{1}{n} \mathbf{w}_k^\top \mathbf{X}_k^\top \mathbf{X}_l \mathbf{w}_l\right) \quad \text{subject to} \quad \mathbf{w}_l^\top \mathbf{M}_l \mathbf{w}_l = 1, \; l=1,\ldots,L. \]

Here:

  • \(g\) is a convex differentiable scheme function,
  • \(\mathbf{M}_l\) is a positive definite matrix defining the constraint for block \(l\).

This formulation is convenient because many multiblock methods become special cases for suitable choices of \((g, \mathbf{M}_l, C)\).

Original regularized RGCCA formulation with \(\tau_l\)

In the original paper, the block constraint is written with a shrinkage parameter \(\tau_l \in [0,1]\):

\[ \max_{\mathbf{w}_1,\ldots,\mathbf{w}_L} \sum_{k,l=1}^L c_{kl} \, g\!\left(\operatorname{Cov}(\mathbf{X}_k \mathbf{w}_k, \mathbf{X}_l \mathbf{w}_l)\right) \]

subject to

\[ \tau_l \|\mathbf{w}_l\|_2^2 + (1-\tau_l)\operatorname{Var}(\mathbf{X}_l \mathbf{w}_l) = 1, \qquad l=1,\ldots,L. \]

Since

\[ \operatorname{Var}(\mathbf{X}_l \mathbf{w}_l) = \frac{1}{n} \mathbf{w}_l^\top \mathbf{X}_l^\top \mathbf{X}_l \mathbf{w}_l, \]

the constraint is equivalent to

\[ \mathbf{w}_l^\top \left[ \tau_l \mathbf{I}_{p_l} + (1-\tau_l)\frac{1}{n}\mathbf{X}_l^\top \mathbf{X}_l \right] \mathbf{w}_l = 1. \]

Hence the link with the previous formulation is

\[ \mathbf{M}_l = \tau_l \mathbf{I}_{p_l} + (1-\tau_l)\frac{1}{n}\mathbf{X}_l^\top \mathbf{X}_l. \]
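The equivalence between the two ways of writing the constraint can be checked numerically. The following is a minimal base-R sketch on simulated data; the block `X`, the weight vector `w`, and the value of `tau` are illustrative assumptions, not package objects.

```r
set.seed(42)
n <- 30; p <- 4
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)  # centered block (simulated)
w <- rnorm(p)                                          # arbitrary weight vector
tau <- 0.3

S <- crossprod(X) / n                # (1/n) X'X, the biased within-block covariance
M <- tau * diag(p) + (1 - tau) * S   # constraint matrix M_l

# tau * ||w||^2 + (1 - tau) * Var(Xw), with the 1/n variance of the centered component
lhs <- tau * sum(w^2) + (1 - tau) * mean((X %*% w)^2)
# w' M w
rhs <- drop(t(w) %*% M %*% w)
all.equal(lhs, rhs)

# Rescale w so that it satisfies the constraint w' M w = 1
w_unit <- w / sqrt(rhs)
```

Dividing any nonzero weight vector by the square root of \(\mathbf{w}^\top \mathbf{M}_l \mathbf{w}\) is how a candidate solution is brought back onto the constraint set.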

Meaning of the design matrix \(C\)

The matrix \(C\) determines which block relationships are explicitly optimized.

  • If \(c_{kl}=0\), the pair \((k,l)\) does not contribute to the criterion.
  • If \(c_{kl}>0\), the corresponding block components are encouraged to be associated.
  • In most applications, \(c_{kl}\in\{0,1\}\).

So RGCCA is not restricted to a complete graph: one can impose a chain, a star, a supervised graph, or any other symmetric connectivity pattern.
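As a sketch, the complete, chain, and star patterns mentioned above can be built in base R. The number of blocks and the choice of the last block as the hub of the star are illustrative assumptions.

```r
L <- 4  # hypothetical number of blocks

# Complete design: every pair of distinct blocks is connected
C_complete <- 1 - diag(L)

# Chain design: block l is connected to block l + 1 only
C_chain <- matrix(0, L, L)
for (l in 1:(L - 1)) {
  C_chain[l, l + 1] <- 1
  C_chain[l + 1, l] <- 1
}

# Star design: every block is connected to the last block only
C_star <- matrix(0, L, L)
C_star[L, -L] <- 1
C_star[-L, L] <- 1

# All three matrices are symmetric with a zero diagonal
sapply(list(C_complete, C_chain, C_star), isSymmetric)
```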

Meaning of the scheme function \(g\)

The usual choices are:

Horst scheme

\[ g(x)=x \]

This leads to maximizing a sum of covariances.

Centroid scheme

\[ g(x)=|x| \]

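This leads to maximizing a sum of absolute covariances. Since \(|x|\) is not differentiable at 0, \(\operatorname{sign}(x)\) is used in place of \(g'(x)\) in the update equations.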

Factorial scheme

\[ g(x)=x^2 \]

This leads to maximizing a sum of squared covariances.

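In recent versions of the package, the default is the factorial scheme. The scheme controls how strongly large block-to-block associations are emphasized and whether signs matter: the Horst scheme rewards only positive covariances, whereas the centroid and factorial schemes ignore the sign, and the factorial scheme gives more weight to large covariances.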
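To see how the three schemes aggregate the same associations differently, here is a small base-R sketch. The covariance values between block components are made up for illustration, and the design is assumed complete.

```r
schemes <- list(
  horst     = function(x) x,
  centroid  = function(x) abs(x),
  factorial = function(x) x^2
)

# Hypothetical covariances between three block components (illustrative values)
cov_Y <- matrix(c( 1.0, 0.8, -0.5,
                   0.8, 1.0,  0.1,
                  -0.5, 0.1,  1.0), 3, 3, byrow = TRUE)
C <- 1 - diag(3)  # complete design, zero diagonal

# Criterion: sum over k, l of c_kl * g(cov(y_k, y_l))
criterion <- sapply(schemes, function(g) sum(C * g(cov_Y)))
criterion
# horst = 0.8, centroid = 2.8, factorial = 1.8
```

The negative covariance between components 1 and 3 lowers the Horst criterion but raises the centroid and factorial criteria, which is the sense in which signs matter only for the Horst scheme.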

Meaning of \(\tau_l\)

The parameter \(\tau_l\) is the key regularization parameter of RGCCA.

Extreme values

If \(\tau_l = 0\), the constraint becomes

\[ \operatorname{Var}(\mathbf{X}_l \mathbf{w}_l)=1, \]

so the block component is variance-normalized. In that case, covariance terms behave like correlation terms. This corresponds to the correlation-oriented side of RGCCA.

If \(\tau_l = 1\), the constraint becomes

\[ \|\mathbf{w}_l\|_2^2 = 1, \]

which is the covariance-oriented side.

Intermediate values

For \(0 < \tau_l < 1\), RGCCA interpolates continuously between these two regimes. In the 2011 paper, this is interpreted as a ridge-type compromise between correlation and covariance criteria.

A useful equivalent expression is:

\[ \widehat\Sigma_l(\tau_l) = \tau_l \mathbf{I}_{p_l} + (1-\tau_l)\mathbf{S}_{ll}, \qquad \mathbf{S}_{ll}=\frac{1}{n}\mathbf{X}_l^\top \mathbf{X}_l. \]

Thus \(\tau_l\) shrinks the empirical within-block covariance matrix toward the identity.

Practical interpretation

  • small \(\tau_l\): prioritize correlation with neighboring components,
  • large \(\tau_l\): prioritize stable, variance-explaining components within the block,
  • intermediate \(\tau_l\): compromise.

This is why the literature often reads \(\tau_l=0\) as a mode close to canonical correlation behavior, and \(\tau_l=1\) as a mode closer to covariance-based methods.
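The two extreme cases can be illustrated on simulated data. In this base-R sketch, the blocks, the weight directions, and the helper names `cov_n` and `normalize` are hypothetical; covariances use the \(1/n\) convention of the equations above.

```r
set.seed(7)
n <- 40
X1 <- scale(matrix(rnorm(n * 3), n, 3), scale = FALSE)
X2 <- scale(matrix(rnorm(n * 2), n, 2) + X1[, 1], scale = FALSE)  # related to X1

cov_n <- function(a, b) mean(a * b)  # biased covariance of centered vectors

# Rescale w so that tau * ||w||^2 + (1 - tau) * Var(Xw) = 1
normalize <- function(X, w, tau) {
  M <- tau * diag(ncol(X)) + (1 - tau) * crossprod(X) / nrow(X)
  w / sqrt(drop(t(w) %*% M %*% w))
}

w1 <- c(1, -1, 0.5); w2 <- c(2, 1)  # arbitrary directions

# tau = 0: components have unit variance, so covariance equals correlation
y1 <- drop(X1 %*% normalize(X1, w1, tau = 0))
y2 <- drop(X2 %*% normalize(X2, w2, tau = 0))
c(covariance = cov_n(y1, y2), correlation = cor(y1, y2))

# tau = 1: unit-norm weights, so covariance = correlation * sd(y1) * sd(y2)
y1c <- drop(X1 %*% normalize(X1, w1, tau = 1))
y2c <- drop(X2 %*% normalize(X2, w2, tau = 1))
sd_n <- function(y) sqrt(mean(y^2))
c(covariance = cov_n(y1c, y2c),
  product    = cor(y1c, y2c) * sd_n(y1c) * sd_n(y2c))
```

With \(\tau_l = 1\) the criterion can be increased either by raising the correlation or by picking high-variance directions inside each block, which is why large \(\tau_l\) favours variance-explaining components.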

Optimal choice of \(\tau_l\)

The 2011 paper discusses the Schäfer–Strimmer shrinkage estimator. At the covariance level, one considers

\[ \widehat\Sigma_l(\tau_l)= \tau_l \mathbf{I} + (1-\tau_l)\mathbf{S}_{ll}. \]

The optimal shrinkage intensity is chosen to minimize a Frobenius-risk criterion of the form

\[ \operatorname{MSE} = \mathbb{E}\big[\|\widehat\Sigma_l(\tau_l)-\Sigma_{ll}\|_F^2\big]. \]

The paper gives the corresponding analytical estimator:

\[ \hat\tau_l^* = \frac{ \sum_{k\ne m}\operatorname{Var}(s_{l,km}) + \sum_k \operatorname{Var}(s_{l,kk}) }{ \sum_{k\ne m}s_{l,km}^2 + \sum_k (s_{l,kk}-1)^2 }. \]

The current package exposes this through tau = "optimal".
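The estimator above can be sketched in base R as follows. This is an illustration of the formula under the usual Schäfer–Strimmer conventions (unbiased covariance, plug-in variance estimates for each entry, identity target), not a copy of the package code; the function name `shrinkage_tau_identity` is hypothetical, and the result is clipped to \([0,1]\).

```r
shrinkage_tau_identity <- function(X) {
  n <- nrow(X); p <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  S <- crossprod(Xc) / (n - 1)  # unbiased empirical covariance

  # Plug-in estimate of Var(s_km) for every entry of S
  var_S <- matrix(0, p, p)
  for (k in 1:p) for (m in 1:p) {
    prod_km <- Xc[, k] * Xc[, m]
    var_S[k, m] <- n / (n - 1)^3 * sum((prod_km - mean(prod_km))^2)
  }

  numerator   <- sum(var_S)            # off-diagonal plus diagonal variances
  denominator <- sum((S - diag(p))^2)  # squared distance of S to the identity
  min(1, max(0, numerator / denominator))
}

# Usage on a simulated, standardized block
set.seed(5)
X <- scale(matrix(rnorm(40 * 5), 40, 5))
tau_hat <- shrinkage_tau_identity(X)
tau_hat
```

Because the shrinkage target is the identity, the estimate is most meaningful when the variables of the block are standardized beforehand.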

Important implementation note

According to the package source, the current implementation computes tau = "optimal" block-wise at each deflation stage. Therefore:

  • for one component per block, one gets one tau per block;
  • for several components, the returned object can store a matrix of tau values, one row per component stage and one column per block.

That behavior is consistent with the package documentation, which allows tau to be scalar, vector, matrix, or the string "optimal".

Stationary equations and block updates

The inner component associated with block \(j\) is defined by

\[ \mathbf{z}_j = \sum_{k \ne j} c_{jk}\, g'\!\left(\operatorname{Cov}(\mathbf{y}_j,\mathbf{y}_k)\right) \mathbf{y}_k, \]

where \(g'(\cdot)\) is the first derivative of the scheme function \(g\).

The sample stationary equation can be written as

\[ \mathbf{w}_j \propto \left[ \tau_j \mathbf{I} + (1-\tau_j)\frac{1}{n}\mathbf{X}_j^\top \mathbf{X}_j \right]^{-1}\mathbf{X}_j^\top \mathbf{z}_j, \]

followed by normalization according to the RGCCA constraint.

This gives the core alternating algorithm:

  1. initialize weights,
  2. compute block components,
  3. compute the inner components \(\mathbf{z}_j\),
  4. update each block weight vector,
  5. iterate until the criterion stabilizes.
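The steps above can be sketched for the first component in base R. This is a simplified illustration on simulated data, not the package implementation: it assumes the factorial scheme, a complete design, fixed \(\tau_l\) values, blocks updated one after the other, and a constant starting vector. All object names are hypothetical.

```r
set.seed(1)
n <- 50
latent <- rnorm(n)  # shared signal so that the blocks are related
make_block <- function(p) {
  scale(latent %o% runif(p) + matrix(rnorm(n * p), n, p), scale = FALSE)
}
X   <- list(make_block(4), make_block(3), make_block(5))
C   <- 1 - diag(3)        # complete design
tau <- c(1, 0.5, 0.2)
g   <- function(x) x^2    # factorial scheme
dg  <- function(x) 2 * x  # its derivative
cov_n <- function(a, b) sum(a * b) / n  # covariance of centered vectors

M <- lapply(seq_along(X), function(j)
  tau[j] * diag(ncol(X[[j]])) + (1 - tau[j]) * crossprod(X[[j]]) / n)
normalize <- function(w, Mj) { w <- drop(w); w / sqrt(drop(t(w) %*% Mj %*% w)) }
criterion <- function(Y) {
  total <- 0
  for (k in seq_along(Y)) for (l in seq_along(Y))
    total <- total + C[k, l] * g(cov_n(Y[[k]], Y[[l]]))
  total
}

# Steps 1-2: initialize weights and compute block components
w <- lapply(seq_along(X), function(j) normalize(rep(1, ncol(X[[j]])), M[[j]]))
Y <- lapply(seq_along(X), function(j) drop(X[[j]] %*% w[[j]]))
crit <- criterion(Y)

for (iter in 1:100) {
  for (j in seq_along(X)) {
    # Step 3: inner component z_j
    z <- rep(0, n)
    for (k in seq_along(X)) if (k != j)
      z <- z + C[j, k] * dg(cov_n(Y[[j]], Y[[k]])) * Y[[k]]
    # Step 4: w_j proportional to M_j^{-1} X_j' z_j, then normalization
    w[[j]] <- normalize(solve(M[[j]], crossprod(X[[j]], z)), M[[j]])
    Y[[j]] <- drop(X[[j]] %*% w[[j]])
  }
  # Step 5: stop when the criterion no longer increases
  crit <- c(crit, criterion(Y))
  if (diff(tail(crit, 2)) < 1e-10) break
}
crit
```

Since \(g\) is convex and each update maximizes a linear lower bound of the criterion over the constraint set, the recorded values of `crit` should not decrease from one iteration to the next.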

Several components: sequential RGCCA

The standard RGCCA implementation computes the first component of each block, then uses deflation to obtain higher-order components.

If \(\mathbf{y}_l^{(1)}\) is the first component for block \(l\), one deflates the block and solves the same type of optimization problem again on the residualized block. This yields components that are orthogonal either at the component level or at the weight level depending on the chosen deflation rule.
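As an illustration of deflation at the component level, the sketch below removes from a block everything that is linearly explained by its first component. The data and the weight vector are simulated, and this is one common deflation rule rather than a statement about the exact package code.

```r
set.seed(3)
n <- 25
X <- scale(matrix(rnorm(n * 4), n, 4), scale = FALSE)  # centered block (simulated)
w <- c(1, 0, -1, 0.5)                                  # hypothetical first weight vector
y <- drop(X %*% w)                                     # first block component

# Regress every column of X on y and keep the residuals
loadings   <- drop(crossprod(X, y)) / sum(y^2)
X_deflated <- X - y %o% loadings

# The deflated block is orthogonal to y, so any later component is too
max(abs(crossprod(X_deflated, y)))  # numerically zero
y2 <- drop(X_deflated %*% c(0.3, 1, 1, -2))
sum(y * y2)                         # numerically zero
```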

SGCCA

From RGCCA to sparse RGCCA

SGCCA keeps the same multiblock objective but introduces an \(\ell_1\) constraint to produce sparse weight vectors and therefore variable selection.

A convenient formulation is:

\[ \max_{\mathbf{w}_1,\ldots,\mathbf{w}_L} \sum_{k,l=1}^L c_{kl} \, g\!\left(\frac{1}{n} \mathbf{w}_k^\top \mathbf{X}_k^\top \mathbf{X}_l \mathbf{w}_l\right) \quad \text{subject to} \quad \mathbf{w}_l \in \Omega_l, \]

with

\[ \Omega_l = \left\{\mathbf{w}_l \in \mathbb{R}^{p_l}: \|\mathbf{w}_l\|_2 \le 1, \; \|\mathbf{w}_l\|_1 \le s_l \right\}. \]

Meaning of the sparsity parameter \(s_l\)

The quantity \(s_l\) controls the size of the feasible set.

Using norm equivalence,

\[ \|x\|_2 \le \|x\|_1 \le \sqrt{p_l}\,\|x\|_2, \]

the meaningful range is

\[ 1 \le s_l \le \sqrt{p_l}. \]

Interpretation:

  • \(s_l \approx 1\) means strong sparsity,
  • \(s_l \approx \sqrt{p_l}\) means weak sparsity.
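The two ends of this range can be checked with a short base-R sketch; the vectors and the helper name `l1_over_l2` are illustrative.

```r
l1_over_l2 <- function(w) sum(abs(w)) / sqrt(sum(w^2))

p <- 16
one_hot <- c(1, rep(0, p - 1))  # a single active variable
dense   <- rep(1, p)            # all variables contribute equally

l1_over_l2(one_hot)  # 1, the lower end of the range
l1_over_l2(dense)    # 4 = sqrt(p), the upper end of the range
```

For a unit \(\ell_2\)-norm vector this ratio is just \(\|\mathbf{w}\|_1\), so a bound \(s_l\) close to 1 only admits vectors dominated by very few variables.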

Caution: in the implementation of SGCCA, we decided to rescale the input value to lie between 0 and 1 to make it more interpretable and easier to use. This means that, in R,

  • a sparsity parameter \(\approx 0\) means strong sparsity,
  • a sparsity parameter \(\approx 1\) means weak sparsity.

See below for more details.

SGCCA update: soft-thresholding

At each block update, one solves a linear maximization under simultaneous \(\ell_1\) and \(\ell_2\) constraints:

\[ \max_{\mathbf{x} \in \Omega} \mathbf{a}^\top \mathbf{x}, \qquad \Omega = \{\mathbf{x}: \|\mathbf{x}\|_2\le 1,\; \|\mathbf{x}\|_1\le s\}. \]

The solution has the form

\[ \mathbf{u} = \frac{S(\mathbf a,\lambda)}{\|S(\mathbf a,\lambda)\|_2}, \]

where \(S(\mathbf a,\lambda)\) is the soft-thresholding operator applied componentwise,

\[ S(a_i,\lambda)=\operatorname{sign}(a_i)\,(|a_i| - \lambda)_+. \]

The tuning parameter \(\lambda\) is chosen as follows:

  • if \(\|S(\mathbf a,0)\|_1 / \|S(\mathbf a, 0)\|_2 \le s\), then \(\lambda = 0\),
  • otherwise, \(\lambda > 0\) is chosen so that the solution satisfies \[ \|\mathbf{u}\|_1 = s. \]

This is the mathematical heart of SGCCA.
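A minimal base-R sketch of this update is given below. It finds \(\lambda\) by bisection, which is one simple way to satisfy the \(\ell_1\) condition and is not claimed to be the search used in the package; the function names and the input vector are hypothetical, and \(s \ge 1\) is assumed.

```r
soft_threshold <- function(a, lambda) sign(a) * pmax(abs(a) - lambda, 0)

l1_l2_update <- function(a, s, tol = 1e-10) {
  unit <- function(v) v / sqrt(sum(v^2))
  # If the normalized vector already satisfies the l1 bound, lambda = 0
  if (sum(abs(unit(a))) <= s) return(list(u = unit(a), lambda = 0))
  # Otherwise bisect on lambda in (0, max|a|) until ||u||_1 = s
  lo <- 0; hi <- max(abs(a))
  while (hi - lo > tol) {
    mid <- (lo + hi) / 2
    if (sum(abs(unit(soft_threshold(a, mid)))) > s) lo <- mid else hi <- mid
  }
  lambda <- (lo + hi) / 2
  list(u = unit(soft_threshold(a, lambda)), lambda = lambda)
}

a   <- c(3, -2, 1, 0.5, -0.1)  # illustrative vector
res <- l1_l2_update(a, s = 1.5)
res$u            # the two smallest entries are set exactly to zero
sum(abs(res$u))  # approximately 1.5
sum(res$u^2)     # 1
```

The bisection relies on the fact that the \(\ell_1\) norm of the normalized thresholded vector decreases as \(\lambda\) grows, from \(\|\mathbf a\|_1 / \|\mathbf a\|_2\) at \(\lambda = 0\) toward 1.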

Mapping theory to the rgcca() function

Core arguments

The direct correspondence is as follows.

| Theoretical object | Meaning | rgcca() argument |
|---|---|---|
| \(\mathbf{X}_1,\ldots,\mathbf{X}_L\) | data blocks | blocks |
| \(C=(c_{kl})\) | design / connectivity matrix | connection |
| \(g\) | scheme function | scheme |
| \(\mathbf{w}_l\) | block weight vectors | returned in $a |
| \(\mathbf{y}_l = \mathbf{X}_l \mathbf{w}_l\) | block components | returned in $Y |
| \(\tau_l\) | RGCCA shrinkage / regularization | tau |
| \(s_l\) or blockwise \(\ell_1\) bound | SGCCA sparsity control | sparsity |
| number of component stages | number of extracted components | ncomp |
| deflation orthogonality choice | orthogonal components vs orthogonal weights | comp_orth |
| block scaling before analysis | equalize block influence | scale_block |
| variable standardization | standardize columns | scale |
| algorithm initialization | SVD or random start | init |
| convergence threshold | stopping rule | tol |
| maximum iterations | computational safeguard | n_iter_max |
| covariance denominator \(n\) or \(n-1\) | biased vs unbiased covariance estimate | bias |

tau

The argument tau can be:

  • a single number: same value for all blocks and all component stages,
  • a vector of length J (the number of blocks, written \(L\) in the notation above): one value per block,
  • a matrix of size max(ncomp) x J: one value per component stage (row) and per block (column),
  • the string "optimal": analytical Schäfer–Strimmer estimation.

Typical interpretations:

  • tau = 0: correlation-oriented RGCCA,
  • tau = 1: covariance-oriented RGCCA,
  • 0 < tau < 1: ridge compromise,
  • tau = "optimal": automatic shrinkage estimation.

sparsity

The argument sparsity is the package-level parameterization of SGCCA.

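For block \(j\) with \(p_j\) variables and component stage \(h\), the package uses:

\[ \|\mathbf{a}_{j,h}\|_1 \le \texttt{sparsity}_{h,j}\sqrt{p_j}, \]

where \(\mathbf{a}_{j,h}\) is the package notation for the weight vector of block \(j\) at stage \(h\) (the \(\mathbf{w}_j\) of the theory).

So the package parameter is a normalized version of the theoretical bound \(s_j\):

\[ s_j = \texttt{sparsity}_{h,j}\sqrt{p_j}. \]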

This implies:

  • sparsity = 1 means no sparsity constraint beyond the trivial upper bound,
  • smaller values mean stronger sparsity,
  • the theoretical lower bound becomes \[ \texttt{sparsity}_{h,j} \ge \frac{1}{\sqrt{p_j}}. \]
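The conversion between the package-level parameter and the theoretical bound is a one-liner. In this sketch the block sizes are hypothetical.

```r
p        <- c(100, 25, 4)   # hypothetical numbers of variables per block
sparsity <- c(0.7, 0.5, 1)  # package-level parameter, one value per block

s     <- sparsity * sqrt(p)  # theoretical l1 bounds: 7, 2.5, 2
lower <- 1 / sqrt(p)         # smallest admissible sparsity: 0.1, 0.2, 0.5

# Every requested value must lie in [1 / sqrt(p_j), 1]
all(sparsity >= lower & sparsity <= 1)
```

The same value of sparsity therefore corresponds to very different \(\ell_1\) bounds in a block of 100 variables and in a block of 4 variables.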

In practice:

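  • if sparsity < 1 for a block, the package uses the SGCCA update for that block;
  • if sparsity = 1, the \(\ell_1\) constraint is inactive and that block behaves as an RGCCA block with \(\tau = 1\), since only the \(\ell_2\) constraint remains.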

method

The argument method is mostly a shortcut that sets a coherent group of arguments.

For the present note, the most relevant choices are:

  • method = "rgcca": use the RGCCA parameterization (tau matters),
  • method = "sgcca": use the SGCCA parameterization (sparsity matters).

Other methods in the package correspond to special cases obtained by fixing suitable combinations of:

  • scheme,
  • tau,
  • connection,
  • and sometimes superblock.

scheme

scheme can be one of:

  • "horst",
  • "centroid",
  • "factorial",

or an explicit R function such as function(x) x^4, provided it fits the theoretical assumptions used by the algorithm.

connection

This is the exact package representation of the theoretical design matrix \(C\).

For example, with three blocks in a chain \(1 - 2 - 3\):

Code
C <- matrix(c(
  0, 1, 0,
  1, 0, 1,
  0, 1, 0
), 3, 3, byrow = TRUE)

ncomp and sequential extraction

ncomp tells the package how many component stages are extracted for each block.

If ncomp = c(2,2,1), then:

  • block 1 gets two components,
  • block 2 gets two components,
  • block 3 gets one component.

The package implements this through sequential extraction and deflation.

comp_orth

This controls the deflation mode:

  • comp_orth = TRUE: orthogonal components inside each block,
  • comp_orth = FALSE: orthogonal weight vectors instead.

scale and scale_block

These are preprocessing arguments rather than theoretical parameters of the criterion itself, but they strongly affect the analysis.

  • scale = TRUE standardizes variables.
  • scale_block = "inertia" rescales each block by the square root of the total block inertia.
  • scale_block = "lambda1" rescales each block by the square root of its leading eigenvalue.

These options are especially important when blocks have very different sizes or scales.

init, tol, n_iter_max, bias

These are algorithmic parameters.

  • init: initialization strategy ("svd" or "random"),
  • tol: stopping threshold,
  • n_iter_max: maximum number of iterations,
  • bias: whether covariances use \(1/n\) (TRUE) or \(1/(n-1)\) (FALSE).
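The effect of the covariance denominator can be seen in a few lines of base R. The data are simulated, and the snippet only illustrates the two estimators, not the package internals.

```r
set.seed(11)
n <- 20
x <- rnorm(n); y <- rnorm(n)

cov_unbiased <- cov(x, y)                              # denominator n - 1
cov_biased   <- mean((x - mean(x)) * (y - mean(y)))    # denominator n

# The two estimates differ only by the factor (n - 1) / n
all.equal(cov_biased, cov_unbiased * (n - 1) / n)
```

For moderate or large \(n\) the difference is negligible, but it does change the numerical value of the criterion and of the constraint when \(\tau_l < 1\).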

Minimal examples

RGCCA example

Code
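library(RGCCA)

# `blocks` is assumed to be a named list of three numeric matrices (or data
# frames) observed on the same individuals; `C` is the 3 x 3 chain design
# matrix defined in the connection section above.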

fit_rgcca <- rgcca(
  blocks     = blocks,
  connection = C,
  tau        = c(1, 0.5, 0),
  ncomp      = c(2, 2, 1),
  scheme     = "factorial",
  scale      = TRUE,
  init       = "svd",
  tol        = 1e-8,
  method     = "rgcca"
)

Interpretation:

  • block 1 is covariance-oriented,
  • block 3 is correlation-oriented,
  • block 2 is intermediate.

RGCCA with automatic shrinkage

Code
fit_rgcca_opt <- rgcca(
  blocks     = blocks,
  connection = C,
  tau        = "optimal",
  ncomp      = 2,
  scheme     = "factorial",
  method     = "rgcca"
)

SGCCA example

Code
fit_sgcca <- rgcca(
  blocks     = blocks,
  connection = C,
  sparsity   = c(0.7, 0.5, 1),
  ncomp      = c(2, 2, 1),
  scheme     = "factorial",
  scale      = TRUE,
  method     = "sgcca"
)

Interpretation:

  • blocks 1 and 2 are sparse,
  • block 3 is not sparse.

References

  • Tenenhaus, A., & Tenenhaus, M. (2011). Regularized Generalized Canonical Correlation Analysis. Psychometrika, 76(2), 257–284.
  • Tenenhaus, A., Philippe, C., Guillemot, V., Le Cao, K. A., Grill, J., & Frouin, V. (2014). Variable selection for generalized canonical correlation analysis. Biostatistics, 15(3), 569–583.
  • Girka, F., Camenen, E., Peltier, C., Gloaguen, A., Guillemot, V., Le Brusquet, L., & Tenenhaus, A. (2023). RGCCA: Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data. R package version 3.0.3. https://CRAN.R-project.org/package=RGCCA