stats
PLS
PLS(regions, group1, group2, *groups, marker=None)
This class facilitates mean-centered task Partial Least Squares Correlation on brain-wide results. This statistical tool analyzes the relationship between brain activity and experimental design.
It should be used when the \(I\) observations are structured into \(N\) groups
or markers
. PLSC searches for latent variables (i.e., \(L_X\) and \(L_Y\))
that express the largest amount of information common to both \(X\) and \(Y\), respectively the brain
activity and the groups matrices.
The implementation follows a tutorial from Krishnan et al., 2011; for better understanding this method, we strongly suggest reading it.
NOTE: if a region is missing from at least one observation, it won't be taken into account in the analysis.
Parameters:
-
regions
:Sequence[str]
The acronyms of brain regions to take into account in the relationship between brain activities and experimental design.
-
group1
:AnimalGroup
The first cohort to take into account.
-
group2
:AnimalGroup
The second cohort to take into account.
-
*groups
:AnimalGroup
, default=()
Any other cohort to take into account.
-
marker
:str | Sequence[str]
, default=None
The marker whose activity has to be studied. If multiple markers are given, they'll be coupled with the respective
group
.
Raises:
-
ValueError
–If
marker
is empty, and the given groups
don't have a single marker to choose from.
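As a rough illustration of the decomposition described above, the following NumPy sketch runs a mean-centered task PLSC on a plain observations-by-regions matrix. It does not use this library's classes: the function name `mean_centered_task_plsc`, its arguments, and the choice of taking the grand mean as the mean of the group means are assumptions made for the example.

```python
import numpy as np

def mean_centered_task_plsc(X, labels):
    """Hypothetical helper: X is (observations x regions), labels has one group label per row."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    # Y: dummy-coded groups matrix (observations x groups)
    Y = (labels[:, None] == groups[None, :]).astype(float)
    # M: mean brain activity of each group (groups x regions)
    M = (Y.T @ X) / Y.sum(axis=0)[:, None]
    # R: deviations of the group means from their grand mean (assumed: mean of group means)
    R = M - M.mean(axis=0, keepdims=True)
    # Singular value decomposition R = u @ diag(s) @ v.T
    u, s, vt = np.linalg.svd(R, full_matrices=False)
    v = vt.T
    Lx = X @ v  # brain scores: projection of the observations on v
    Ly = Y @ u  # group scores: projection of the groups matrix on u
    return u, s, v, Lx, Ly
```

With only two groups the rows of R are opposite to each other, so R has rank one and only the first singular value is non-zero.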
Ly
instance-attribute
The group scores, projection of the observed groups composition on u. Depending on the experimental design (e.g. the number of cohorts), there may be projections on multiple axes. In order to choose which latent variables to keep, you should generalize the results and test the null hypothesis.
Lx
instance-attribute
The brain scores, projection of the brain activity observations on v. Depending on the experimental design (e.g. the number of cohorts), there may be projections on multiple axes. In order to choose which latent variables to keep, you should generalize the results and test the null hypothesis.
u
property
Group profiles that best characterize \(R\), the matrix of the deviations of the groups to their grand mean.
s
property
Singular values of \(R\), the matrix of the deviations of the groups to their grand mean.
v
property
Brain regions profiles that best characterize \(R\), the matrix of the deviations of the groups to their grand mean.
s_sampling_distribution
property
The sampling distribution of the singular values, obtained by randomly permuting which group each brain belongs to. Each row is a single permutation sample.
v_salience_scores
property
The normalised v scores with bootstrapping.
u_salience_scores
property
The normalised u scores with bootstrapping.
n_components
n_components()
Returns the number of components of the current PLS.
The number of components is determined by the number of group/marker comparisons of the PLS.
For example, if the current PLS
is comparing just two groups, it has only 1 component.
Returns:
-
int
–The maximum number of components of the current PLS.
random_permutation
random_permutation(n, seed=None)
Randomly shuffles which group each brain belongs to, and uses this permutation
sample to compute the mean-centered task PLSC. This process is repeated n
times.
The resulting sampling distribution
of the singular values can then be used to generalize the results (i.e. salient scores)
of current PLS as a null hypothesis test.
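A minimal sketch of how such a sampling distribution could be built with NumPy is shown here. It is not this library's implementation: `singular_values` and `permutation_distribution` are hypothetical helpers, and the data are assumed to be a plain observations-by-regions array with one group label per row.

```python
import numpy as np

def singular_values(X, labels):
    """Singular values of the group-mean deviations matrix R (hypothetical helper)."""
    groups = np.unique(labels)
    M = np.stack([X[labels == g].mean(axis=0) for g in groups])
    R = M - M.mean(axis=0, keepdims=True)
    return np.linalg.svd(R, compute_uv=False)

def permutation_distribution(X, labels, n, seed=None):
    """Recompute the singular values n times, each on shuffled group labels."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Each row of the result is one permutation sample
    return np.stack([singular_values(X, rng.permutation(labels)) for _ in range(n)])
```

Shuffling the labels keeps the group sizes fixed while breaking any real link between group membership and brain activity.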
Parameters:
bootstrap_salience_scores
bootstrap_salience_scores(n, seed=None)
Identifies the regions \(r\) and groups \(g\) that are stable by assigning them a score akin to a Z-score.
$$
\frac {v_r} {\hat \sigma(v_r)} \text { and } \frac {u_g} {\hat \sigma(u_g)}
$$
This normalization relies on a set of bootstrap samples to make an estimator of
salience standard error, \(\hat \sigma(v_r)\) and \(\hat \sigma(u_g)\).
This is achieved through repeatedly drawing samples with replacement from the original dataset.
Within each sample, the brains' group remains unchanged while the composition of such groups may change.
Mean-centered task PLS is computed on each sample, effectively creating a large number
of \(u\) and \(v\) salience scores samples.
NOTE: any interpretation of the resulting normalized scores should be coupled with the result of a permutation test.
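The bootstrap procedure can be sketched as follows for the region saliences of the first latent variable (the group saliences would be handled the same way). This is an illustration under stated assumptions, not the library's code: `first_saliences` and `bootstrap_ratios` are hypothetical names, and flipping the sign of each bootstrap vector to match the original one is a simplification chosen here to deal with the sign ambiguity of the SVD.

```python
import numpy as np

def first_saliences(X, labels):
    """Group (u) and region (v) saliences of the first latent variable (hypothetical helper)."""
    groups = np.unique(labels)
    M = np.stack([X[labels == g].mean(axis=0) for g in groups])
    R = M - M.mean(axis=0, keepdims=True)
    u, _, vt = np.linalg.svd(R, full_matrices=False)
    return u[:, 0], vt[0]

def bootstrap_ratios(X, labels, n, seed=None):
    """Region saliences divided by their bootstrap standard error."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    _, v0 = first_saliences(X, labels)
    # Row indices of the members of each group
    members = [np.flatnonzero(labels == g) for g in np.unique(labels)]
    vs = np.empty((n, v0.size))
    for b in range(n):
        # Resample with replacement inside each group, so group membership is preserved
        idx = np.concatenate([rng.choice(m, size=m.size, replace=True) for m in members])
        _, v = first_saliences(X[idx], labels[idx])
        # Assumption: align the arbitrary SVD sign with the original saliences
        vs[b] = -v if v @ v0 < 0 else v
    return v0 / vs.std(axis=0, ddof=1)
```

Regions whose salience barely moves across bootstrap samples get a large absolute ratio, which is what makes the score comparable to a Z-score.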
Parameters:
test_null_hypothesis
test_null_hypothesis()
Tests the null hypothesis on the sampling distribution of the singular values.
Returns:
-
float
–A p-value for each latent variable/component.
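One common way to turn a permutation sampling distribution into p-values is to count how often a permuted singular value is at least as large as the observed one. The sketch below assumes that reading; `permutation_p_values` is a hypothetical function and the library may compute its p-values differently.

```python
import numpy as np

def permutation_p_values(s_observed, s_sampling_distribution):
    """Fraction of permutation samples whose singular value reaches the observed one."""
    s_observed = np.asarray(s_observed, dtype=float)
    # One row per permutation sample, one column per latent variable
    dist = np.asarray(s_sampling_distribution, dtype=float)
    return (dist >= s_observed).mean(axis=0)
```

A small p-value for a component suggests that its singular value is unlikely to arise when group membership is random.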
above_threshold
above_threshold(threshold, component=1)
Get the component
-th regions salience scores that are above threshold
.
Parameters:
-
threshold
:float
A Z-score value. See
to_zscore
-
component
:int
, default=1
The n-th component (or latent variable) of the salience scores on which to apply the filter. It cannot be less than 1.
Returns:
to_zscore
staticmethod
to_zscore(p, two_tailed=True)
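The signature suggests a conversion from a p-value to the matching Z-score threshold of a standard normal distribution. The sketch below shows that conversion with the standard library only; `p_to_zscore` is a hypothetical stand-in, and whether `to_zscore` behaves exactly like this is an assumption.

```python
from statistics import NormalDist

def p_to_zscore(p, two_tailed=True):
    """Z-score whose upper tail (or both tails, if two_tailed) has total probability p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    # For a two-tailed test, the probability p is split between both tails
    tail = p / 2 if two_tailed else p
    return NormalDist().inv_cdf(1 - tail)
```

Under this reading, a two-tailed p of 0.05 maps to a threshold of roughly 1.96, which could then be passed as `threshold` to `above_threshold`.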
density
density(brain)
For each region \(r\) of brain
, it computes the density \(D(m)\) for each marker \(m\):
$$
D(m_r) : \frac {m_r} {size_r}
$$
with \(m_r\) being the raw number of \(m\) detections in region \(r\).
Parameters:
-
brain
:AnimalBrain
The brain to compute density on.
Returns:
-
AnimalBrain
–A new brain with density data.
Raises:
-
ValueError
–If
brain
does not contain raw marker counts.
percentage
percentage(brain)
For each region \(r\) of brain
, it computes the percentage \(P(m)\) for each marker \(m\)
detection compared to brain-wide \(m\) counts:
$$
P(m_r) : \frac {m_r} {m_{root}}
$$
with \(m_r\) being the raw number of \(m\) detections in region \(r\).
Parameters:
-
brain
:AnimalBrain
The brain to compute percentage on.
Returns:
-
AnimalBrain
–A new brain with percentage data.
Raises:
-
ValueError
–If
brain
does not contain raw marker counts.
relative_density
relative_density(brain)
For each region \(r\) of brain
, it computes the density fold change
of each marker \(m\) compared to brain-wide marker density:
$$
RD(m_r) : \frac {m_r/size_r} {m_{root}/size_{root}}
$$
with \(m_r\) being the raw number of \(m\) detections in region \(r\).
Parameters:
-
brain
:AnimalBrain
The brain to compute relative density on.
Returns:
-
AnimalBrain
–A new brain with relative density data.
Raises:
-
ValueError
–If
brain
does not contain raw marker counts.
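The density, percentage and relative density formulas can be compared on a toy example. The sketch below works on plain dictionaries rather than on `AnimalBrain` objects: `normalise_counts`, the dictionary layout and the `"root"` key for the whole brain are assumptions made for illustration.

```python
def normalise_counts(counts, sizes, root="root"):
    """counts and sizes map a region acronym to its raw detections and its size."""
    # D(m_r) = m_r / size_r
    density = {r: counts[r] / sizes[r] for r in counts}
    # P(m_r) = m_r / m_root
    percentage = {r: counts[r] / counts[root] for r in counts}
    # RD(m_r) = (m_r / size_r) / (m_root / size_root)
    relative_density = {r: density[r] / density[root] for r in counts}
    return density, percentage, relative_density

# Toy data for a single marker: two regions plus the whole brain ("root")
counts = {"root": 200, "A": 50, "B": 150}
sizes = {"root": 10.0, "A": 5.0, "B": 5.0}
density, percentage, relative_density = normalise_counts(counts, sizes)
```

A relative density below 1 means the region is less dense than the brain as a whole, and a value above 1 means it is denser.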
fold_change
fold_change(brain, group)
For each brain region in brain
, compute the
fold change of its markers with respect to group
's mean.
Parameters:
-
brain
:AnimalBrain
The brain for which to compute the fold change.
-
group
:AnimalGroup
The group whose mean is the basis of the fold change.
Returns:
-
AnimalBrain
–A new brain with fold change data.
See also
diff_change
diff_change(brain, group)
For each brain region in brain
, compute the difference between its markers and the group
's mean.
Parameters:
-
brain
:AnimalBrain
The brain for which to compute the difference.
-
group
:AnimalGroup
The group whose mean is subtracted.
Returns:
-
AnimalBrain
–A new brain of the difference of
brain
from group
's mean.
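The two comparisons against a group's mean, fold change and difference, can be sketched on plain dictionaries. `fold_and_diff_change` is a hypothetical helper, and reading the fold change as a simple ratio to the group mean (with no logarithm) is an assumption.

```python
from statistics import fmean

def fold_and_diff_change(brain, group):
    """brain maps region -> value; group is a list of such mappings, one per animal."""
    fold, diff = {}, {}
    for region, value in brain.items():
        # Mean of the region across all animals of the reference group
        mean = fmean([animal[region] for animal in group])
        fold[region] = value / mean   # ratio to the group mean
        diff[region] = value - mean   # difference from the group mean
    return fold, diff

brain = {"A": 30.0, "B": 5.0}
group = [{"A": 10.0, "B": 4.0}, {"A": 20.0, "B": 6.0}]
fold, diff = fold_and_diff_change(brain, group)
```

A region equal to the group mean gets a fold change of 1 and a difference of 0.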
See also
markers_overlap
markers_overlap(brain, marker1, marker2)
For each region, it computes the ratio of positive cells to double positive counts,
for both marker1
and marker2
:
$$
O(m_1,m_{1,2}) : \frac {m_1} {m_{1,2}}
$$
$$
O(m_2,m_{1,2}) : \frac {m_2} {m_{1,2}}
$$
with \(m_{1,2}\) being the number of detections that are both marker1
and marker2
positive.
Parameters:
-
brain
:AnimalBrain
The brain for which to compute the markers overlapping rate.
-
marker1
:str
The first overlapping marker.
-
marker2
:str
The second overlapping marker.
Returns:
-
AnimalBrain
–A new brain of the overlapping rate for
marker1
and marker2
.
Raises:
-
ValueError
–If
brain
does not contain raw marker counts.
-
ValueError
–If
brain
does not have any raw count data for marker1
or marker2
.
-
ValueError
–If
brain
does not contain any raw count data for the marker1
and marker2
double positives.
markers_jaccard_index
markers_jaccard_index(brain, marker1, marker2)
For each region, it computes the Jaccard index
measuring the similarity between two markers activity and the respective double positivity.
$$
J(m_1,m_2) : \frac {m_{1,2}} {m_1+m_2-m_{1,2}}
$$
with \(m_{1,2}\) being the number of detections that are both marker1
and marker2
positive.
Parameters:
-
brain
:AnimalBrain
The brain for which to compute the markers Jaccard index.
-
marker1
:str
The first overlapping marker.
-
marker2
:str
The second overlapping marker.
Returns:
-
AnimalBrain
–A new brain of the Jaccard index for
marker1
and marker2
.
Raises:
-
ValueError
–If
brain
does not contain raw marker counts.
-
ValueError
–If
brain
does not have any raw count data for marker1
or marker2
.
-
ValueError
–If
brain
does not contain any raw count data for the marker1
and marker2
double positives.
markers_similarity_index
markers_similarity_index(brain, marker1, marker2)
For each region, it computes an index of similarity between two markers activity
and the respective double positivity; it is defined as:
$$
S(m_1,m_2) : \frac {m_{1,2}^2} {m_1 \cdot m_2}
$$
with \(m_{1,2}\) being the number of detections that are both marker1
and marker2
positive.
NOTE: \(S(m_1,m_2) = 1 \iff m_1 = m_2 = m_{1,2}\).
Additionally, if either \(m_1\) or \(m_2\) is zero, it goes to infinity.
Parameters:
-
brain
:AnimalBrain
The brain for which to compute the markers similarity index.
-
marker1
:str
The first overlapping marker.
-
marker2
:str
The second overlapping marker.
Returns:
-
AnimalBrain
–A new brain of the similarity index between
marker1
and marker2
.
Raises:
-
ValueError
–If
brain
does not contain raw marker counts.
-
ValueError
–If
brain
does not have any raw count data for marker1
or marker2
.
-
ValueError
–If
brain
does not contain any raw count data for the marker1
and marker2
double positives.
markers_overlap_coefficient
markers_overlap_coefficient(brain, marker1, marker2)
For each region, it computes the overlapping coefficient (or
Szymkiewicz–Simpson coefficient),
an index of similarity between two markers activity and the respective double positivity;
it is defined as:
$$
S(m_1,m_2) : \frac {m_{1,2}} {\min({m_1}, {m_2})}
$$
with \(m_{1,2}\) being the number of detections that are both marker1
and marker2
positive.
Parameters:
-
brain
:AnimalBrain
The brain for which to compute the markers overlapping coefficient.
-
marker1
:str
The first overlapping marker.
-
marker2
:str
The second overlapping marker.
Returns:
-
AnimalBrain
–A new brain of the overlapping coefficient between
marker1
and marker2
.
Raises:
-
ValueError
–If
brain
does not contain raw marker counts.
-
ValueError
–If
brain
does not have any raw count data for marker1
or marker2
.
-
ValueError
–If
brain
does not contain any raw count data for the marker1
and marker2
double positives.
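To compare how the double-positivity indices defined above (overlap ratios, Jaccard index, similarity index and overlap coefficient) behave on the same counts, here is a small sketch for a single region. `double_positivity_indices` is a hypothetical function working on raw numbers, not on `AnimalBrain` objects, and it does not guard against zero counts.

```python
def double_positivity_indices(m1, m2, m12):
    """m1, m2: detections positive for each marker; m12: double positive detections."""
    return {
        "overlap_marker1": m1 / m12,              # O(m_1, m_{1,2})
        "overlap_marker2": m2 / m12,              # O(m_2, m_{1,2})
        "jaccard": m12 / (m1 + m2 - m12),         # J(m_1, m_2)
        "similarity": m12 ** 2 / (m1 * m2),       # S(m_1, m_2)
        "overlap_coefficient": m12 / min(m1, m2), # Szymkiewicz-Simpson coefficient
    }

indices = double_positivity_indices(m1=40, m2=20, m12=10)
```

When every detection is double positive (m1 = m2 = m12), the Jaccard index, the similarity index and the overlap coefficient all equal 1.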
markers_difference
markers_difference(brain, marker1, marker2)
For each brain region in brain
, compute the difference between two markers.
Parameters:
-
brain
:AnimalBrain
The brain for which to compute the difference.
-
marker1
:str
The first marker to subtract.
-
marker2
:str
The second marker to subtract.
Returns:
-
AnimalBrain
–A new brain of the difference between
marker1
and marker2
.
markers_correlation
markers_correlation(marker1, marker2, group, other=None, method='pearson')
For each brain region in group
, compute the correlation between two markers
within all animals in the cohort.
Parameters:
-
marker1
:str
The first marker to correlate.
-
marker2
:str
The second marker to correlate.
-
group
:AnimalGroup
The group from which all animals are taken to compute the correlation.
-
other
:AnimalGroup
, default=None
If specified, it uses data from
other
's marker2
.
-
method
:str
, default='pearson'
Any method accepted by
DataFrame.corrwith
.
Returns:
-
BrainData
–Brain data of the correlation between
marker1
and marker2
.
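Since `method` is forwarded to `DataFrame.corrwith`, the per-region correlation can be pictured with two pandas tables that hold one row per animal and one column per region. The table layout, the sample values and the `regions_correlation` helper are assumptions for illustration, not the library's internals.

```python
import pandas as pd

def regions_correlation(marker1_df, marker2_df, method="pearson"):
    """Correlate same-named region columns of two animals x regions tables."""
    return marker1_df.corrwith(marker2_df, method=method)

animals = ["animal1", "animal2", "animal3", "animal4"]
# Hypothetical marker values, one row per animal and one column per region
marker1_df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [1.0, 2.0, 3.0, 4.0]}, index=animals)
marker2_df = pd.DataFrame({"A": [2.0, 4.0, 6.0, 8.0], "B": [4.0, 3.0, 2.0, 1.0]}, index=animals)
correlation = regions_correlation(marker1_df, marker2_df)
```

The result is a Series indexed by region: here region A is perfectly correlated across animals and region B is perfectly anti-correlated.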
pls_regions_salience
pls_regions_salience(group1, group2, selected_regions, marker=None, n_bootstrap=5000, component=1, fill_nan=True, seed=None, test_h0=True, p_value=0.05, n_permutation=5000)
Computes PLS between two groups with the same markers.
It estimates the standard error of the regions' saliences by bootstrap.
NOTE: it assumes that the component
-th latent variable is generalisable
by permutation test. If it were not,
the resulting salience scores would not be reliable.
Parameters:
-
group1
:AnimalGroup
The first cohort to analyze.
-
group2
:AnimalGroup
The second cohort to analyze.
-
selected_regions
:list[str]
The acronyms of brain regions to take into account in the relationship between brain activities and experimental design.
-
marker
:str
, default=None
The marker whose activity has to be studied. If
None
, a separate PLS
will be computed on each marker in the groups independently.
-
n_bootstrap
:int
, default=5000
The
n
parameter in bootstrap_salience_scores
.
-
component
:int
, default=1
The n-th component (or latent variable) of the salience scores.
-
fill_nan
Whether to fill with
NA
the scores of those regions for which the salience is not computable (e.g. if brain data is missing in at least one brain of the groups).
-
seed
A random seed.
Returns: