This note gives a compact but reasonably detailed presentation of Regularized Generalized Canonical Correlation Analysis (RGCCA) and Sparse Generalized Canonical Correlation Analysis (SGCCA), with explicit equations and a practical mapping to the arguments of the function `rgcca()` from the RGCCA package.
The emphasis is on the role of `tau` and on the `rgcca()` arguments.

Most important theoretical knobs:

- `connection` controls which block relationships are targeted.
- `scheme` controls how these relationships are aggregated.
- `tau` controls the correlation–covariance compromise in RGCCA.
- `sparsity` controls variable selection in SGCCA.
- `ncomp` controls how many component stages are extracted.

Assume we observe \(L\) blocks of centered variables on the same \(n\) individuals:
\[ \mathbf{X}_1,\ldots,\mathbf{X}_L, \qquad \mathbf{X}_l \in \mathbb{R}^{n \times p_l}. \]
For block \(l\), RGCCA looks for a weight vector
\[ \mathbf{w}_l \in \mathbb{R}^{p_l} \]
and the associated block component
\[ \mathbf{y}_l = \mathbf{X}_l \mathbf{w}_l \in \mathbb{R}^{n}. \]
The relationships between blocks are encoded by a symmetric design matrix
\[ C = (c_{kl}) \in \mathbb{R}^{L \times L}, \qquad c_{kl} \ge 0, \]
where usually \(c_{kl}=1\) if blocks \(k\) and \(l\) are connected, and \(0\) otherwise.
A modern formulation of RGCCA is:
\[ \max_{\mathbf{w}_1,\ldots,\mathbf{w}_L} \sum_{k,l=1}^L c_{kl} \, g\!\left(\frac{1}{n} \mathbf{w}_k^\top \mathbf{X}_k^\top \mathbf{X}_l \mathbf{w}_l\right) \quad \text{subject to} \quad \mathbf{w}_l^\top \mathbf{M}_l \mathbf{w}_l = 1, \; l=1,\ldots,L. \]
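Here:

- \(g\) is the scheme function,
- \(\mathbf{M}_l\) is the matrix that defines the normalization constraint of block \(l\),
- \(C\) is the design matrix introduced above.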
This formulation is convenient because many multiblock methods become special cases for suitable choices of \((g, \mathbf{M}_l, C)\).
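As a minimal sketch of the criterion above, the following base R snippet evaluates the objective for arbitrary weight vectors. The helper name `rgcca_criterion` and the simulated blocks are illustrative assumptions, not part of the RGCCA package.

```r
# Evaluate sum_{k,l} c_kl * g((1/n) * w_k' X_k' X_l w_l) for given weights.
# `rgcca_criterion` is a hypothetical helper written for this note.
rgcca_criterion <- function(blocks, weights, C, g = function(x) x^2) {
  n <- nrow(blocks[[1]])
  # Block components y_l = X_l w_l, stored as the columns of Y.
  Y <- sapply(seq_along(blocks), function(l) drop(blocks[[l]] %*% weights[[l]]))
  # crossprod(Y) / n holds all pairwise covariances (blocks are centered).
  sum(C * g(crossprod(Y) / n))
}

set.seed(1)
n <- 50
# Three simulated, column-centered blocks with 4, 3 and 5 variables.
blocks <- lapply(c(4, 3, 5), function(p) {
  scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
})
# Arbitrary unit-norm weights, only to have something to evaluate.
weights <- lapply(blocks, function(X) rep(1, ncol(X)) / sqrt(ncol(X)))
C <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), 3, 3, byrow = TRUE)
crit <- rgcca_criterion(blocks, weights, C)
```

Because \(C\) is symmetric, each connected pair of blocks contributes twice to the sum.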
In the original paper, the block constraint is written with a shrinkage parameter \(\tau_l \in [0,1]\):
\[ \max_{\mathbf{w}_1,\ldots,\mathbf{w}_L} \sum_{k,l=1}^L c_{kl} \, g\!\left(\operatorname{Cov}(\mathbf{X}_k \mathbf{w}_k, \mathbf{X}_l \mathbf{w}_l)\right) \]
subject to
\[ \tau_l \|\mathbf{w}_l\|_2^2 + (1-\tau_l)\operatorname{Var}(\mathbf{X}_l \mathbf{w}_l) = 1, \qquad l=1,\ldots,L. \]
Since
\[ \operatorname{Var}(\mathbf{X}_l \mathbf{w}_l) = \frac{1}{n} \mathbf{w}_l^\top \mathbf{X}_l^\top \mathbf{X}_l \mathbf{w}_l, \]
the constraint is equivalent to
\[ \mathbf{w}_l^\top \left[ \tau_l \mathbf{I}_{p_l} + (1-\tau_l)\frac{1}{n}\mathbf{X}_l^\top \mathbf{X}_l \right] \mathbf{w}_l = 1. \]
Hence the link with the previous formulation is
\[ \mathbf{M}_l = \tau_l \mathbf{I}_{p_l} + (1-\tau_l)\frac{1}{n}\mathbf{X}_l^\top \mathbf{X}_l. \]
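The equivalence between the two ways of writing the constraint can be checked numerically. This is a small sketch on simulated data; `constraint_matrix` is a hypothetical helper name.

```r
# Build M_l = tau * I + (1 - tau) * (1/n) X'X for a centered block X.
constraint_matrix <- function(X, tau) {
  tau * diag(ncol(X)) + (1 - tau) * crossprod(X) / nrow(X)
}

set.seed(2)
X <- scale(matrix(rnorm(40 * 3), 40, 3), center = TRUE, scale = FALSE)
w <- c(0.5, -1, 2)
tau <- 0.3

# Original form: tau * ||w||^2 + (1 - tau) * Var(X w), with a 1/n variance.
lhs <- tau * sum(w^2) + (1 - tau) * mean((X %*% w)^2)
# Quadratic form: w' M w.
rhs <- drop(t(w) %*% constraint_matrix(X, tau) %*% w)

# Rescaling w by sqrt(w' M w) makes the constraint hold exactly.
w_unit <- w / sqrt(rhs)
```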
The matrix \(C\) determines which block relationships are explicitly optimized.
So RGCCA is not restricted to a complete graph: one can impose a chain, a star, a supervised graph, or any other symmetric connectivity pattern.
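For illustration, the usual connectivity patterns can be generated with a few lines of base R. The helper names below are hypothetical and only show what such design matrices look like.

```r
# Chain 1 - 2 - ... - L: each block is connected to its neighbours.
chain_design <- function(L) {
  C <- matrix(0, L, L)
  idx <- seq_len(L - 1)
  C[cbind(idx, idx + 1)] <- 1
  C + t(C)
}

# Star: one central block connected to all the others.
star_design <- function(L, center = 1) {
  C <- matrix(0, L, L)
  C[center, -center] <- 1
  C + t(C)
}

# Complete graph: every pair of distinct blocks is connected.
complete_design <- function(L) 1 - diag(L)

# Basic sanity check: square, symmetric, non-negative entries.
is_valid_design <- function(C) {
  is.matrix(C) && nrow(C) == ncol(C) && isSymmetric(C) && all(C >= 0)
}

C_chain <- chain_design(3)
C_star <- star_design(4, center = 1)
```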
The usual choices are:

Horst scheme:
\[ g(x)=x \]
This leads to maximizing a sum of covariances.

Centroid scheme:
\[ g(x)=|x| \]
This leads to maximizing a sum of absolute covariances.

Factorial scheme:
\[ g(x)=x^2 \]
This leads to maximizing a sum of squared covariances.
The package default is usually the factorial scheme. The scheme controls how strongly large block-to-block associations are emphasized and whether signs matter.
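A small sketch makes the effect of the scheme concrete. The list `schemes` and the covariance values are made up for illustration; the derivatives are included because they appear in the inner component of the algorithm.

```r
# Scheme functions g and their derivatives g'.
schemes <- list(
  horst     = list(g = function(x) x,   dg = function(x) rep(1, length(x))),
  centroid  = list(g = abs,             dg = sign),
  factorial = list(g = function(x) x^2, dg = function(x) 2 * x)
)

# Three hypothetical block-to-block covariances, one of them negative.
covs <- c(-0.8, 0.1, 0.5)

# Aggregated criterion value under each scheme.
totals <- sapply(schemes, function(s) sum(s$g(covs)))
```

With these values the Horst total is penalized by the negative covariance, whereas the centroid and factorial totals ignore its sign, and the factorial total is dominated by the largest absolute covariance.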
The parameter \(\tau_l\) is the key regularization parameter of RGCCA.
If \(\tau_l = 0\), the constraint becomes
\[ \operatorname{Var}(\mathbf{X}_l \mathbf{w}_l)=1, \]
so the block component is variance-normalized. In that case, covariance terms behave like correlation terms. This corresponds to the correlation-oriented side of RGCCA.
If \(\tau_l = 1\), the constraint becomes
\[ \|\mathbf{w}_l\|_2^2 = 1, \]
which is the covariance-oriented side.
For \(0 < \tau_l < 1\), RGCCA interpolates continuously between these two regimes. In the paper, this is interpreted as a ridge-type compromise between correlation and covariance criteria.
A useful equivalent expression is:
\[ \widehat\Sigma_l(\tau_l) = \tau_l \mathbf{I}_{p_l} + (1-\tau_l)\mathbf{S}_{ll}, \qquad \mathbf{S}_{ll}=\frac{1}{n}\mathbf{X}_l^\top \mathbf{X}_l. \]
Thus \(\tau_l\) shrinks the empirical within-block covariance matrix toward the identity.
This is why the literature often reads \(\tau_l=0\) as a mode close to canonical correlation behavior, and \(\tau_l=1\) as a mode closer to covariance-based methods.
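One practical consequence of the shrinkage can be checked numerically: the eigenvalues of \(\widehat\Sigma_l(\tau_l)\) are \(\tau_l + (1-\tau_l)\lambda_i\), where \(\lambda_i\) are the eigenvalues of \(\mathbf{S}_{ll}\), so any \(\tau_l > 0\) makes the matrix invertible even when the block has more variables than individuals. The snippet below is a sketch on simulated data; `shrink` is a hypothetical helper.

```r
set.seed(3)
n <- 10
p <- 20  # more variables than individuals, so S is singular
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
S <- crossprod(X) / n

# Shrink the empirical covariance matrix toward the identity.
shrink <- function(S, tau) tau * diag(nrow(S)) + (1 - tau) * S

tau <- 0.2
eig_S <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
eig_shrunk <- eigen(shrink(S, tau), symmetric = TRUE, only.values = TRUE)$values

# Smallest eigenvalue before and after shrinkage.
range_before <- min(eig_S)
range_after <- min(eig_shrunk)
```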
The 2011 paper discusses the Schäfer–Strimmer shrinkage estimator. At the covariance level, one considers
\[ \widehat\Sigma_l(\tau_l)= \tau_l \mathbf{I} + (1-\tau_l)\mathbf{S}_{ll}. \]
The optimal shrinkage intensity is chosen to minimize a Frobenius-risk criterion of the form
\[ \operatorname{MSE} = \mathbb{E}\big[\|\widehat\Sigma_l(\tau_l)-\Sigma_{ll}\|_F^2\big]. \]
The paper gives the corresponding analytical estimator:
\[ \hat\tau_l^* = \frac{ \sum_{k\ne m}\operatorname{Var}(s_{l,km}) + \sum_k \operatorname{Var}(s_{l,kk}) }{ \sum_{k\ne m}s_{l,km}^2 + \sum_k (s_{l,kk}-1)^2 }. \]
The current package exposes this through tau = "optimal".
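To make the formula concrete, here is a base R sketch of the shrinkage intensity toward the identity target. It is not the package's implementation: the function name `shrinkage_intensity` is hypothetical, and the variance estimator for the entries \(s_{l,km}\) is the one I assume from Schäfer and Strimmer (unbiased covariance, with the result truncated to \([0,1]\)).

```r
# Sketch of the analytical shrinkage intensity toward the identity matrix.
# Assumption: Var(s_km) is estimated as n / (n - 1)^3 * sum_i (w_ikm - mean_i w_ikm)^2,
# where w_ikm = x_ik * x_im on the centered data.
shrinkage_intensity <- function(X) {
  n <- nrow(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  w_bar <- crossprod(Xc) / n                 # mean of the products w_ikm
  s <- n / (n - 1) * w_bar                   # unbiased covariance entries
  # sum_i (w_ikm - w_bar_km)^2 = sum_i w_ikm^2 - n * w_bar_km^2
  var_s <- n / (n - 1)^3 * (crossprod(Xc^2) - n * w_bar^2)
  # Numerator: all entries (off-diagonal and diagonal).
  # Denominator: squared distance between s and the identity target.
  tau <- sum(var_s) / sum((s - diag(ncol(X)))^2)
  min(1, max(0, tau))
}

set.seed(4)
X <- matrix(rnorm(30 * 5), 30, 5)
tau_hat <- shrinkage_intensity(X)
```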
According to the package source, the current implementation computes `tau = "optimal"` block-wise at each deflation stage. Therefore, one `tau` is estimated per block at each component stage.

That behavior is consistent with the package documentation, which allows `tau` to be scalar, vector, matrix, or the string `"optimal"`.
The inner component associated with block \(j\) is defined by
\[ \mathbf{z}_j = \sum_{k \ne j} c_{jk}\, g'\!\left(\operatorname{Cov}(\mathbf{y}_j,\mathbf{y}_k)\right) \mathbf{y}_k, \]
where \(g'(\cdot)\) is the first derivative of the weight function \(g\).
The sample stationary equation can be written as
\[ \mathbf{w}_j \propto \left[ \tau_j \mathbf{I} + (1-\tau_j)\frac{1}{n}\mathbf{X}_j^\top \mathbf{X}_j \right]^{-1}\mathbf{X}_j^\top \mathbf{z}_j, \]
followed by normalization according to the RGCCA constraint.
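This gives the core alternating algorithm:

1. initialize the weight vectors \(\mathbf{w}_j\) and normalize them according to the constraint;
2. for each block \(j\), compute the inner component \(\mathbf{z}_j\) from the current components of the connected blocks;
3. update \(\mathbf{w}_j\) with the stationary equation and normalize it;
4. repeat steps 2 and 3 until the criterion stabilizes.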
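The inner component and the stationary equation above can be turned into a short base R sketch for the first component stage. This is an illustration under simple assumptions (centered blocks, invertible constraint matrices, factorial scheme by default), not the package's code; `rgcca_first_stage` and its arguments are hypothetical names.

```r
rgcca_first_stage <- function(blocks, C, tau, g = function(x) x^2,
                              dg = function(x) 2 * x, tol = 1e-8, max_iter = 1000) {
  n <- nrow(blocks[[1]])
  L <- length(blocks)
  # Constraint matrices M_l = tau_l * I + (1 - tau_l) * (1/n) X_l' X_l.
  M <- lapply(seq_len(L), function(l) {
    tau[l] * diag(ncol(blocks[[l]])) + (1 - tau[l]) * crossprod(blocks[[l]]) / n
  })
  normalize <- function(w, l) w / sqrt(drop(crossprod(w, M[[l]] %*% w)))
  # Start from the leading right singular vector of each block.
  w <- lapply(seq_len(L), function(l) {
    normalize(svd(blocks[[l]], nu = 0, nv = 1)$v[, 1], l)
  })
  Y <- sapply(seq_len(L), function(l) drop(blocks[[l]] %*% w[[l]]))
  criterion <- function(Y) sum(C * g(crossprod(Y) / n))
  crit <- criterion(Y)
  history <- crit
  for (iter in seq_len(max_iter)) {
    for (j in seq_len(L)) {
      # Inner component z_j = sum_{k != j} c_jk * g'(Cov(y_j, y_k)) * y_k.
      coef <- C[j, ] * dg(drop(crossprod(Y, Y[, j])) / n)
      coef[j] <- 0
      z <- drop(Y %*% coef)
      # Stationary equation, then normalization under w' M w = 1.
      w[[j]] <- normalize(drop(solve(M[[j]], crossprod(blocks[[j]], z))), j)
      Y[, j] <- drop(blocks[[j]] %*% w[[j]])
    }
    new_crit <- criterion(Y)
    history <- c(history, new_crit)
    converged <- abs(new_crit - crit) < tol
    crit <- new_crit
    if (converged) break
  }
  list(weights = w, components = Y, criterion = history)
}

set.seed(5)
n <- 60
blocks <- lapply(c(4, 3, 5), function(p) {
  scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
})
C <- matrix(c(0, 1, 0, 1, 0, 1, 0, 1, 0), 3, 3, byrow = TRUE)
tau <- c(1, 0.5, 0)
fit <- rgcca_first_stage(blocks, C, tau)
```

The element `fit$criterion` records the criterion after each full sweep over the blocks, which is a convenient way to monitor convergence.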
The standard RGCCA implementation computes the first component of each block, then uses deflation to obtain higher-order components.
If \(\mathbf{y}_l^{(1)}\) is the first component for block \(l\), one deflates the block and solves the same type of optimization problem again on the residualized block. This yields components that are orthogonal either at the component level or at the weight level depending on the chosen deflation rule.
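The two deflation rules can be sketched as projections. The function names are hypothetical, and the exact formulas used by the package are not quoted here; the snippet only shows one standard way to obtain each kind of orthogonality.

```r
# Remove from X everything explained by the component y:
# the residual block is orthogonal to y, so later components are too.
deflate_components <- function(X, y) {
  y <- matrix(y, ncol = 1)
  X - y %*% (t(y) %*% X) / sum(y^2)
}

# Project the rows of X onto the orthogonal complement of w:
# the residual block satisfies X_res w = 0, which leads to orthogonal weights.
deflate_weights <- function(X, w) {
  w <- matrix(w, ncol = 1)
  X - (X %*% w) %*% t(w) / sum(w^2)
}

set.seed(6)
X <- scale(matrix(rnorm(25 * 4), 25, 4), center = TRUE, scale = FALSE)
w <- c(1, -1, 0.5, 2)
y <- drop(X %*% w)

X_res_comp <- deflate_components(X, y)
X_res_weight <- deflate_weights(X, w)
```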
SGCCA keeps the same multiblock objective but introduces an \(\ell_1\) constraint to produce sparse weight vectors and therefore variable selection.
A convenient formulation is:
\[ \max_{\mathbf{w}_1,\ldots,\mathbf{w}_L} \sum_{k,l=1}^L c_{kl} \, g\!\left(\frac{1}{n} \mathbf{w}_k^\top \mathbf{X}_k^\top \mathbf{X}_l \mathbf{w}_l\right) \quad \text{subject to} \quad \mathbf{w}_l \in \Omega_l, \]
with
\[ \Omega_l = \left\{\mathbf{w}_l \in \mathbb{R}^{p_l}: \|\mathbf{w}_l\|_2 \le 1, \; \|\mathbf{w}_l\|_1 \le s_l \right\}. \]
The quantity \(s_l\) controls the size of the feasible set.
Using norm equivalence,
\[ \|x\|_2 \le \|x\|_1 \le \sqrt{p_l}\,\|x\|_2, \]
the meaningful range is
\[ 1 \le s_l \le \sqrt{p_l}. \]
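Interpretation:

- when \(s_l = \sqrt{p_l}\), the \(\ell_1\) constraint is inactive and no sparsity is imposed;
- the closer \(s_l\) is to 1, the sparser the weight vector.

Caution: the SGCCA implementation rescales the input value to lie between 0 and 1, which makes it more interpretable and easier to use. In R, the `sparsity` argument therefore corresponds to \(s_l/\sqrt{p_l}\) rather than to \(s_l\) itself.
See below for more details.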
At each block update, one solves a linear maximization under simultaneous \(\ell_1\) and \(\ell_2\) constraints:
\[ \max_{\mathbf{x} \in \Omega} \mathbf{a}^\top \mathbf{x}, \qquad \Omega = \{\mathbf{x}: \|\mathbf{x}\|_2\le 1,\; \|\mathbf{x}\|_1\le s\}. \]
The solution has the form
\[ \mathbf{u} = \frac{S(\mathbf a,\lambda)}{\|S(\mathbf a,\lambda)\|_2}, \]
where \(S(\mathbf a,\lambda)\) is the soft-thresholding operator applied componentwise,
\[ S(a_i,\lambda)=\operatorname{sign}(a_i)\,(|a_i| - \lambda)_+. \]
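The tuning parameter \(\lambda\) is chosen as follows:

- \(\lambda = 0\) if the normalized vector \(\mathbf a/\|\mathbf a\|_2\) already satisfies \(\|\mathbf u\|_1 \le s\);
- otherwise, \(\lambda > 0\) is chosen such that \(\|\mathbf u\|_1 = s\), for example by a binary search.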
This is the mathematical heart of SGCCA.
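The soft-thresholding update can be sketched in base R with a plain binary search on \(\lambda\). This is an illustrative assumption about how \(\lambda\) can be found, not necessarily the procedure used inside the package; `soft_threshold` and `l1_l2_update` are hypothetical names, and the sketch assumes \(s \ge 1\) and a unique largest \(|a_i|\).

```r
# Componentwise soft-thresholding S(a, lambda).
soft_threshold <- function(a, lambda) sign(a) * pmax(abs(a) - lambda, 0)

# Maximize a'x subject to ||x||_2 <= 1 and ||x||_1 <= s.
l1_l2_update <- function(a, s, tol = 1e-10, max_iter = 200) {
  u <- a / sqrt(sum(a^2))
  # If the l1 constraint is already satisfied, no thresholding is needed.
  if (sum(abs(u)) <= s) return(list(u = u, lambda = 0))
  lo <- 0
  hi <- max(abs(a))
  # Feasible fallback: all the weight on the largest entry (l1 norm equal to 1).
  top <- abs(a) == hi
  best <- list(u = sign(a) * top / sqrt(sum(top)), lambda = hi)
  for (i in seq_len(max_iter)) {
    lambda <- (lo + hi) / 2
    st <- soft_threshold(a, lambda)
    u <- st / sqrt(sum(st^2))
    if (sum(abs(u)) > s) {
      lo <- lambda                 # not sparse enough: increase lambda
    } else {
      hi <- lambda                 # feasible: try a smaller lambda
      best <- list(u = u, lambda = lambda)
    }
    if (hi - lo < tol) break
  }
  best
}

a <- c(3, -2, 1, 0.5)
res_sparse <- l1_l2_update(a, s = 1.5)  # l1 constraint active
res_dense <- l1_l2_update(a, s = 2)     # l1 constraint inactive
```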
The direct correspondence between the theory and the `rgcca()` function is as follows.
| Theoretical object | Meaning | rgcca() argument |
|---|---|---|
| \(\mathbf{X}_1,\ldots,\mathbf{X}_L\) | data blocks | blocks |
| \(C=(c_{kl})\) | design / connectivity matrix | connection |
| \(g\) | scheme function | scheme |
| \(\mathbf{w}_l\) | block weight vectors | returned in $a |
| \(\mathbf{y}_l = \mathbf{X}_l \mathbf{w}_l\) | block components | returned in $Y |
| \(\tau_l\) | RGCCA shrinkage / regularization | tau |
| \(s_l\) or blockwise \(\ell_1\) bound | SGCCA sparsity control | sparsity |
| number of component stages | number of extracted components | ncomp |
| deflation orthogonality choice | orthogonal components vs orthogonal weights | comp_orth |
| block scaling before analysis | equalize block influence | scale_block |
| variable standardization | standardize columns | scale |
| algorithm initialization | SVD or random start | init |
| convergence threshold | stopping rule | tol |
| maximum iterations | computational safeguard | n_iter_max |
| covariance denominator \(n\) or \(n-1\) | biased vs unbiased covariance estimate | bias |
`tau`

The argument `tau` can be:

- a single value: the same value for all blocks,
- a vector of length `J`: one value per block,
- a matrix of dimension `max(ncomp) x J`: one value per block and per component stage,
- `"optimal"`: analytical Schäfer–Strimmer estimation.

Here `J` denotes the number of blocks (\(L\) in the notation above).

Typical interpretations:

- `tau = 0`: correlation-oriented RGCCA,
- `tau = 1`: covariance-oriented RGCCA,
- `0 < tau < 1`: ridge compromise,
- `tau = "optimal"`: automatic shrinkage estimation.

`sparsity`

The argument `sparsity` is the package-level parameterization of SGCCA.
For block \(j\) with \(p_j\) variables, the package uses:
\[ \|a_{j,h}\|_1 \le \texttt{sparsity}_{h,j}\sqrt{p_j}. \]
So the package parameter is a normalized version of \(s_l\):
\[ s_l = \texttt{sparsity}_{h,j}\sqrt{p_j}. \]
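The conversion between the package-level value and the theoretical bound is a one-liner. The helper `l1_bound` is hypothetical, and the admissible range it checks is simply the range \(1 \le s_l \le \sqrt{p_l}\) rewritten on the normalized scale.

```r
# Convert a normalized sparsity value into the l1 bound s = sparsity * sqrt(p).
l1_bound <- function(sparsity, p) {
  # 1 <= s <= sqrt(p) is equivalent to 1 / sqrt(p) <= sparsity <= 1.
  stopifnot(sparsity >= 1 / sqrt(p), sparsity <= 1)
  sparsity * sqrt(p)
}

# Three hypothetical blocks with 100, 25 and 4 variables.
p <- c(100, 25, 4)
s <- l1_bound(c(0.7, 0.5, 1), p)
```

For the third block, `sparsity = 1` gives \(s = \sqrt{4} = 2\), the largest meaningful bound, so the \(\ell_1\) constraint is inactive.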
This implies:
- `sparsity = 1` means no sparsity constraint beyond the trivial upper bound,
- smaller values of `sparsity` tighten the \(\ell_1\) bound and give sparser weight vectors.

In practice:

- if `sparsity < 1` for a block, the package uses the SGCCA update for that block;
- if `sparsity = 1`, that block behaves as an RGCCA block.

`method`

The argument `method` is mostly a shortcut that sets a coherent group of arguments.
For the present note, the most relevant choices are:

- `method = "rgcca"`: use the RGCCA parameterization (`tau` matters),
- `method = "sgcca"`: use the SGCCA parameterization (`sparsity` matters).

Other methods in the package correspond to special cases obtained by fixing suitable combinations of:

- `scheme`,
- `tau`,
- `connection`,
- `superblock`.

`scheme`

`scheme` can be one of:

- `"horst"`,
- `"centroid"`,
- `"factorial"`,

or an explicit R function such as `function(x) x^4`, provided it fits the theoretical assumptions used by the algorithm.

`connection`

This is the exact package representation of the theoretical design matrix \(C\).
For example, with three blocks in a chain \(1 - 2 - 3\):
C <- matrix(c(
0, 1, 0,
1, 0, 1,
0, 1, 0
), 3, 3, byrow = TRUE)

`ncomp` and sequential extraction

`ncomp` tells the package how many component stages are extracted for each block.
If `ncomp = c(2,2,1)`, then:

- two components are extracted for blocks 1 and 2,
- one component is extracted for block 3.

The package implements this through sequential extraction and deflation.
`comp_orth`

This controls the deflation mode:

- `comp_orth = TRUE`: orthogonal components inside each block,
- `comp_orth = FALSE`: orthogonal weight vectors instead.

`scale` and `scale_block`

These are preprocessing arguments rather than theoretical parameters of the criterion itself, but they strongly affect the analysis.

- `scale = TRUE` standardizes variables.
- `scale_block = "inertia"` rescales each block by the square root of the total block inertia.
- `scale_block = "lambda1"` rescales each block by the square root of its leading eigenvalue.

These options are especially important when blocks have very different sizes or scales.
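The two block scalings can be sketched in base R as follows. This is an illustration of the idea rather than the package's preprocessing code; `rescale_block` is a hypothetical helper, and the block is assumed to be centered already.

```r
# Divide a centered block by the square root of its total inertia
# (sum of the eigenvalues of its covariance matrix) or of its leading eigenvalue.
rescale_block <- function(X, type = c("inertia", "lambda1"), bias = TRUE) {
  type <- match.arg(type)
  denom <- if (bias) nrow(X) else nrow(X) - 1
  eig <- eigen(crossprod(X) / denom, symmetric = TRUE, only.values = TRUE)$values
  X / sqrt(if (type == "inertia") sum(eig) else eig[1])
}

set.seed(8)
# A block whose columns live on very different scales, then standardized.
X <- scale(matrix(rnorm(30 * 4), 30, 4) %*% diag(c(1, 5, 10, 0.1)))

X_inertia <- rescale_block(X, "inertia")
X_lambda1 <- rescale_block(X, "lambda1")

# After scaling: total variance equals 1, or leading eigenvalue equals 1.
total_inertia <- sum(diag(crossprod(X_inertia) / nrow(X)))
leading_eig <- eigen(crossprod(X_lambda1) / nrow(X), symmetric = TRUE,
                     only.values = TRUE)$values[1]
```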
`init`, `tol`, `n_iter_max`, `bias`

These are algorithmic parameters.

- `init`: initialization strategy (`"svd"` or `"random"`),
- `tol`: stopping threshold,
- `n_iter_max`: maximum number of iterations,
- `bias`: whether covariances use \(1/n\) (`TRUE`) or \(1/(n-1)\) (`FALSE`).

library(RGCCA)
fit_rgcca <- rgcca(
blocks = blocks,
connection = C,
tau = c(1, 0.5, 0),
ncomp = c(2, 2, 1),
scheme = "factorial",
scale = TRUE,
init = "svd",
tol = 1e-8,
method = "rgcca"
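)

Interpretation:

- block 1 uses `tau = 1` (covariance-oriented), block 2 uses `tau = 0.5` (ridge compromise), and block 3 uses `tau = 0` (correlation-oriented);
- two components are extracted for blocks 1 and 2, and one for block 3;
- block relationships are aggregated with the factorial scheme, following the design matrix `C`.
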
fit_rgcca_opt <- rgcca(
blocks = blocks,
connection = C,
tau = "optimal",
ncomp = 2,
scheme = "factorial",
method = "rgcca"
)

fit_sgcca <- rgcca(
blocks = blocks,
connection = C,
sparsity = c(0.7, 0.5, 1),
ncomp = c(2, 2, 1),
scheme = "factorial",
scale = TRUE,
method = "sgcca"
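)

Interpretation:

- blocks 1 and 2 use the SGCCA update, with normalized \(\ell_1\) bounds of 0.7 and 0.5 (block 2 is constrained more strongly);
- block 3 has `sparsity = 1` and therefore behaves as an RGCCA block;
- two components are extracted for blocks 1 and 2, and one for block 3.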