stats

PLS

PLS(regions, group1, group2, *groups, marker=None)

This class facilitates mean-centered task Partial Least Squares Correlation on brain-wide results. This statistical tool analyzes the relationship between brain activity and experimental design. It should be used when the $I$ observations are structured into $N$ groups or markers. PLSC searches for latent variables (i.e., $L_X$ and $L_Y$) that express the largest amount of information common to both $X$ and $Y$, respectively the brain activity and the groups matrices.

The implementation follows the tutorial by Krishnan et al., 2011; we strongly suggest reading it for a better understanding of this method.

NOTE: if a region is missing from at least one observation, it won't be taken into account in the analysis.
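As a rough illustration of the method described above, the following NumPy sketch computes a mean-centered task PLSC on made-up data. It is not this class's implementation: the data, the `labels` array and the variable names are assumptions chosen to mirror the symbols $X$, $Y$, $R$, $u$, $s$, $v$, $L_X$ and $L_Y$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 6 brains (rows) x 4 regions (columns), split into 2 groups.
X = rng.normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])

# Y: dummy-coded group membership (I observations x N groups).
Y = np.eye(labels.max() + 1)[labels]

# Group means (N x J), then their deviations from the grand mean.
M = (Y.T @ X) / Y.sum(axis=0)[:, None]
R = M - M.mean(axis=0)

# SVD of R: u holds the group profiles, v the brain region profiles.
u, s, vt = np.linalg.svd(R, full_matrices=False)
v = vt.T

Lx = X @ v  # brain scores
Ly = Y @ u  # group scores
```

With only two groups the rows of `R` are opposite to each other, so the second singular value is numerically zero and a single component carries all the information.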

Parameters:

regions : Sequence[str]

The acronyms of brain regions to take into account in the relationship between brain activities and experimental design.
group1 : AnimalGroup

The first cohort to take into account.
group2 : AnimalGroup

The second cohort to take into account.
*groups : AnimalGroup, default= ()

Any other cohort to take into account.
marker : str | Sequence[str], default= None

The marker whose activity has to be studied. If multiple markers are given, they'll be coupled with the respective group.

Raises:

ValueError –

If marker is empty, and the given groups don't have a single marker to choose from.

Ly `instance-attribute`

The group scores, projection of the observed groups composition on u. Depending on the experimental design (e.g. the number of cohorts), there may be projections on multiple axes. In order to choose which latent variables to keep, you should generalize the results and test the null hypothesis.

Lx `instance-attribute`

The brain scores, projection of the brain activity observations on v. Depending on the experimental design (e.g. the number of cohorts), there may be projections on multiple axes. In order to choose which latent variables to keep, you should generalize the results and test the null hypothesis.

u `property`

Group profiles that best characterize $R$, the matrix of the deviations of the groups to their grand mean.

s `property`

Singular values of $R$, the matrix of the deviations of the groups to their grand mean.

v `property`

Brain regions profiles that best characterize $R$, the matrix of the deviations of the groups to their grand mean.

s_sampling_distribution `property`

The sampling distribution of the singular values, obtained by randomly permuting the brains' group membership (see random_permutation). Each row is a single permutation sample.

v_salience_scores `property`

The normalised v scores with bootstrapping.

u_salience_scores `property`

The normalised u scores with bootstrapping.

n_components

n_components()

Returns the number of components of the current PLS. The number of components is determined by the number of group/marker comparisons of the PLS. For example, if the current PLS is comparing just two groups, it has only 1 component.

Returns:

int –

The maximum number of components of the current PLS.

random_permutation

random_permutation(n, seed=None)

Randomly shuffles the group each brain belongs to and computes the mean-centered task PLSC on this permutation sample. This process is repeated n times. The resulting sampling distribution of the singular values can then be used as a null-hypothesis test to check whether the results of the current PLS (i.e. its salience scores) generalize.

Parameters:

n : int

The number of permutations done to create the sampling distribution.
seed : Number, default= None

A random seed.
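The permutation procedure can be sketched with NumPy as follows. This is only an illustration under assumptions: `singular_values` is a hypothetical helper, and the data are random numbers rather than real brains.

```python
import numpy as np

def singular_values(X, labels):
    """Singular values of R, the mean-centered matrix of group means."""
    Y = np.eye(labels.max() + 1)[labels]
    M = (Y.T @ X) / Y.sum(axis=0)[:, None]
    R = M - M.mean(axis=0)
    return np.linalg.svd(R, compute_uv=False)

rng = np.random.default_rng(42)
X = rng.normal(size=(8, 5))                  # hypothetical: 8 brains x 5 regions
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # hypothetical group membership

n = 200
# Shuffle the group labels n times; each row is one permutation sample.
s_sampling = np.array(
    [singular_values(X, rng.permutation(labels)) for _ in range(n)]
)
```

Shuffling the labels breaks any real link between brains and groups, so the rows of `s_sampling` describe which singular values arise by chance alone.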

bootstrap_salience_scores

bootstrap_salience_scores(n, seed=None)

Identifies the regions $r$ and groups $g$ that are stable by assigning them a score akin to a Z-score: $$ \frac {v_r} {\hat \sigma(v_r)} \text { and } \frac {u_g} {\hat \sigma(u_g)} $$ This normalization relies on a set of bootstrap samples to estimate the standard errors of the saliences, $\hat \sigma(v_r)$ and $\hat \sigma(u_g)$.
This is achieved through repeatedly drawing samples with replacement from the original dataset. Within each sample, the brains' group remains unchanged while the composition of such groups may change. Mean-centered task PLS is computed on each sample, effectively creating a large number of $u$ and $v$ salience scores samples.

NOTE: any interpretation of the resulting normalized scores should be coupled with the result of a permutation test.

Parameters:

n : int

The size of the set of bootstrap samples, used to compute the standard error.
seed : Number, default= None

A random seed.
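A minimal sketch of the bootstrap normalization for the region saliences of the first component, assuming random data and a hypothetical helper `region_saliences`. The sign alignment used here is a simplification chosen for this example, not necessarily what the library does.

```python
import numpy as np

def region_saliences(X, labels):
    """First-component region saliences (v) of a mean-centered task PLSC."""
    Y = np.eye(labels.max() + 1)[labels]
    M = (Y.T @ X) / Y.sum(axis=0)[:, None]
    R = M - M.mean(axis=0)
    _, _, vt = np.linalg.svd(R, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)   # hypothetical: 2 groups of 10 brains
X = rng.normal(size=(20, 4))     # hypothetical: 20 brains x 4 regions
X[labels == 1, 0] += 3.0         # region 0 differs strongly between groups

v = region_saliences(X, labels)

n = 500
samples = np.empty((n, X.shape[1]))
for i in range(n):
    # Resample brains with replacement within each group: the group of a
    # brain never changes, but the composition of each group does.
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == g), size=(labels == g).sum())
        for g in np.unique(labels)
    ])
    v_boot = region_saliences(X[idx], labels[idx])
    # SVD signs are arbitrary: align each sample with the original saliences.
    samples[i] = v_boot * np.sign(v_boot @ v)

# Salience divided by its bootstrap standard error, akin to a Z-score.
v_salience = v / samples.std(axis=0, ddof=1)
```

In this toy dataset only region 0 carries a group difference, so its normalized salience stands out in absolute value from the others.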

test_null_hypothesis

test_null_hypothesis()

Tests the null hypothesis on the sampling distribution of the singular values.

Returns:

float –

A p-value for each latent variable/component.
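One common way to turn a sampling distribution into p-values is to count how often a permuted singular value is at least as large as the observed one. The sketch below assumes that convention and uses made-up numbers; the exact rule applied by this method is not stated here.

```python
import numpy as np

# Hypothetical observed singular values, one per component.
s_observed = np.array([4.0, 0.5])

# Hypothetical sampling distribution: one row per permutation sample.
s_sampling = np.array([
    [1.0, 0.2],
    [5.0, 0.6],
    [2.0, 0.7],
    [3.0, 0.1],
])

# Fraction of permutation samples at least as extreme as the observation.
p_values = (s_sampling >= s_observed).mean(axis=0)
print(p_values)  # [0.25 0.5 ]
```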

above_threshold

above_threshold(threshold, component=1)

Get the component-th regions salience scores that are above threshold.

Parameters:

threshold : float

A Z-score value. See to_zscore.
component : int, default= 1

The n-th component (or latent variable) of the salience scores on which to apply the filter. It cannot be less than 1.

Returns:

pd.DataFrame –

The list of brain regions, along with the relative score, that have a salience above threshold.

to_zscore `staticmethod`

to_zscore(p, two_tailed=True)

Given a probability in null-hypothesis significance testing, it computes the equivalent Z-score.

Parameters:

p : float

The probability.
two_tailed : bool, default= True

Whether the p corresponds to a two-tailed or one-tailed test.

Returns:

float –

The Z-score equivalent to p.
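The conversion from a probability to a Z-score can be reproduced with the standard library. `p_to_zscore` is a hypothetical stand-in written for this example, assuming the usual inverse normal CDF definition.

```python
from statistics import NormalDist

def p_to_zscore(p: float, two_tailed: bool = True) -> float:
    """Z-score of a standard normal whose tail probability equals p."""
    # A two-tailed test splits p evenly between the two tails.
    tail = p / 2 if two_tailed else p
    return NormalDist().inv_cdf(1 - tail)

z_two = p_to_zscore(0.05)                    # about 1.96
z_one = p_to_zscore(0.05, two_tailed=False)  # about 1.645
```

Such a value is what one would typically pass as threshold to above_threshold.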

density

density(brain)

For each region $r$ of brain, it computes the density $D(m)$ for each marker $m$: $$ D(m_r) : \frac {m_r} {size_r} $$ with $m_r$ being the raw number of $m$ detections in region $r$.

Parameters:

brain : AnimalBrain

The brain to compute density on.

Returns:

AnimalBrain –

A new brain with density data.

Raises:

ValueError –

If brain does not contain raw marker counts.

percentage

percentage(brain)

For each region $r$ of brain, it computes the percentage $P(m)$ for each marker $m$ detection compared to brain-wide $m$ counts: $$ P(m_r) : \frac {m_r} {m_{root}} $$ with $m_r$ being the raw number of $m$ detections in region $r$.

Parameters:

brain : AnimalBrain

The brain to compute percentage on.

Returns:

AnimalBrain –

A new brain with percentage data.

Raises:

ValueError –

If brain does not contain raw marker counts.

relative_density

relative_density(brain)

For each region $r$ of brain, it computes the density fold change of each marker $m$ compared to brain-wide marker density: $$ RD(m_r) : \frac {m_r/size_r} {m_{root}/size_{root}} $$ with $m_r$ being the raw number of $m$ detections in region $r$.

Parameters:

brain : AnimalBrain

The brain to compute relative density on.

Returns:

AnimalBrain –

A new brain with relative density data.

Raises:

ValueError –

If brain does not contain raw marker counts.
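The three normalizations above (density, percentage and relative density) can be compared on a toy example. The counts and sizes below are invented, and plain NumPy arrays stand in for an AnimalBrain.

```python
import numpy as np

# Hypothetical raw counts of one marker and region sizes.
m_root, size_root = 200.0, 10.0
m_r = np.array([150.0, 50.0])   # counts in two regions
size_r = np.array([2.5, 7.5])   # sizes of the same regions

density = m_r / size_r                                      # D(m_r)
percentage = m_r / m_root                                   # P(m_r)
relative_density = (m_r / size_r) / (m_root / size_root)    # RD(m_r)
```

Here the first region holds 75% of the detections in a quarter of the volume, which gives a relative density of 3: three times the brain-wide density.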

fold_change

fold_change(brain, group)

For each brain region in brain, compute the fold change of its markers with respect to group's mean.

Parameters:

brain : AnimalBrain

The brain for which to compute the fold change.
group : AnimalGroup

The group whose mean is the basis of the fold change.

Returns:

AnimalBrain –

A new brain with fold change data.

diff_change

diff_change(brain, group)

For each brain region in brain, compute the difference between its markers and the group's mean.

Parameters:

brain : AnimalBrain

The brain for which to compute the difference.
group : AnimalGroup

The group whose mean is subtracted.

Returns:

AnimalBrain –

A new brain of the difference of brain from group's mean.
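Both comparisons against a group mean, fold change and difference, reduce to element-wise arithmetic per region. The sketch below uses invented values, and whether the group includes the brain itself is left out of scope.

```python
import numpy as np

# Hypothetical marker values: 3 animals (rows) x 2 regions (columns).
group = np.array([
    [10.0, 4.0],
    [14.0, 6.0],
    [12.0, 5.0],
])
brain = np.array([18.0, 5.0])    # the brain being compared

group_mean = group.mean(axis=0)  # per-region mean of the group

fold = brain / group_mean        # ratio to the group's mean
diff = brain - group_mean        # difference from the group's mean
```

A fold change of 1 and a difference of 0 both mean that the brain matches the group's mean in that region.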

markers_overlap

markers_overlap(brain, marker1, marker2)

For each region, it computes the ratio of double-positive counts to the positive counts of each marker, marker1 and marker2: $$ O(m_1,m_{1,2}) : \frac {m_{1,2}} {m_1} $$ $$ O(m_2,m_{1,2}) : \frac {m_{1,2}} {m_2} $$ with $m_{1,2}$ being the number of detections being marker1 and marker2 positive.

Parameters:

brain : AnimalBrain

The brain for which to compute the markers overlapping rate.
marker1 : str

The first overlapping marker.
marker2 : str

The second overlapping marker.

Returns:

AnimalBrain –

A new brain of the overlapping rate for marker1 and marker2.

Raises:

ValueError –

If brain does not contain raw marker counts.
ValueError –

If brain does not contain raw counts for marker1 or marker2.
ValueError –

If brain does not contain raw double-positive counts for marker1 and marker2.
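The two overlap ratios can be checked by hand on a small example. The counts are hypothetical and plain arrays replace the AnimalBrain structure.

```python
import numpy as np

# Hypothetical counts in two regions.
m1 = np.array([40.0, 5.0])   # marker1-positive detections
m2 = np.array([10.0, 5.0])   # marker2-positive detections
m12 = np.array([8.0, 5.0])   # double-positive detections

overlap_m1 = m12 / m1        # share of marker1 cells that are also marker2
overlap_m2 = m12 / m2        # share of marker2 cells that are also marker1
```

In the first region only 20% of the marker1 cells are double positive, while 80% of the marker2 cells are, which shows why the two ratios are reported separately.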

markers_jaccard_index

markers_jaccard_index(brain, marker1, marker2)

For each region, it computes the Jaccard index measuring the similarity between two markers activity and the respective double positivity. $$ J(m_1,m_2) : \frac {m_{1,2}} {m_1+m_2-m_{1,2}} $$ with $m_{1,2}$ being the number of detections being marker1 and marker2 positive.

Parameters:

brain : AnimalBrain

The brain for which to compute the markers Jaccard index.
marker1 : str

The first overlapping marker.
marker2 : str

The second overlapping marker.

Returns:

AnimalBrain –

A new brain of the Jaccard index for marker1 and marker2.

Raises:

ValueError –

If brain does not contain raw marker counts.
ValueError –

If brain does not contain raw counts for marker1 or marker2.
ValueError –

If brain does not contain raw double-positive counts for marker1 and marker2.

markers_similarity_index

markers_similarity_index(brain, marker1, marker2)

For each region, it computes an index of similarity between two markers activity and the respective double positivity; it is defined as: $$ S(m_1,m_2) : \frac {m_{1,2}^2} {m_1 \cdot m_2} $$ with $m_{1,2}$ being the number of detections being marker1 and marker2 positive.

NOTE: $S(m_1,m_2) = 1 \iff m_1 = m_2 = m_{1,2}$.
Additionally, if either $m_1$ or $m_2$ is zero, it goes to infinity.

Parameters:

brain : AnimalBrain

The brain for which to compute the markers similarity index.
marker1 : str

The first overlapping marker.
marker2 : str

The second overlapping marker.

Returns:

AnimalBrain –

A new brain of the similarity index between marker1 and marker2.

Raises:

ValueError –

If brain does not contain raw marker counts.
ValueError –

If brain does not contain raw counts for marker1 or marker2.
ValueError –

If brain does not contain raw double-positive counts for marker1 and marker2.

markers_overlap_coefficient

markers_overlap_coefficient(brain, marker1, marker2)

For each region, it computes the overlapping coefficient (or Szymkiewicz–Simpson coefficient), an index of similarity between two markers activity and the respective double positivity; it is defined as: $$ S(m_1,m_2) : \frac {m_{1,2}} {\min({m_1}, {m_2})} $$ with $m_{1,2}$ being the number of detections being marker1 and marker2 positive.

Parameters:

brain : AnimalBrain

The brain for which to compute the markers overlapping coefficient.
marker1 : str

The first overlapping marker.
marker2 : str

The second overlapping marker.

Returns:

AnimalBrain –

A new brain of the overlapping coefficient between marker1 and marker2.

Raises:

ValueError –

If brain does not contain raw marker counts.
ValueError –

If brain does not contain raw counts for marker1 or marker2.
ValueError –

If brain does not contain raw double-positive counts for marker1 and marker2.
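To see how the Jaccard index, the similarity index and the overlap coefficient differ, it helps to evaluate them on the same counts. The numbers below are invented and plain arrays stand in for an AnimalBrain.

```python
import numpy as np

# Hypothetical counts in two regions.
m1 = np.array([40.0, 5.0])   # marker1-positive detections
m2 = np.array([10.0, 5.0])   # marker2-positive detections
m12 = np.array([8.0, 5.0])   # double-positive detections

jaccard = m12 / (m1 + m2 - m12)               # J(m_1, m_2)
similarity = m12**2 / (m1 * m2)               # S(m_1, m_2)
overlap_coefficient = m12 / np.minimum(m1, m2)  # Szymkiewicz-Simpson
```

All three indices equal 1 in the second region, where $m_1 = m_2 = m_{1,2}$. In the first region the overlap coefficient (0.8) is much higher than the Jaccard (about 0.19) and similarity (0.16) indices, because it only looks at the smaller of the two populations.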

markers_difference

markers_difference(brain, marker1, marker2)

For each brain region in brain, compute the difference between two markers.

Parameters:

brain : AnimalBrain

The brain for which to compute the difference.
marker1 : str

The first marker to subtract.
marker2 : str

The second marker to subtract.

Returns:

AnimalBrain –

A new brain of the difference between marker1 and marker2.

markers_correlation

markers_correlation(marker1, marker2, group, other=None, method='pearson')

For each brain region in group, compute the correlation between two markers within all animals in the cohort.

Parameters:

marker1 : str

The first marker to correlate.
marker2 : str

The second marker to correlate.
group : AnimalGroup

The group from which all animals are taken to compute the correlation.
other : AnimalGroup, default= None

If specified, it uses data from other's marker2.
method : str, default= 'pearson'

Any method accepted by DataFrame.corrwith.

Returns:

BrainData –

Brain data of the correlation between marker1 and marker2.
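Since the method argument is forwarded to pandas, the per-region correlation can be pictured with `DataFrame.corrwith` directly. The tables below are hypothetical: rows are animals, columns are region acronyms picked for illustration.

```python
import pandas as pd

# Hypothetical cohort of 4 animals: one table per marker.
marker1 = pd.DataFrame({"CA1": [1.0, 2.0, 3.0, 4.0], "DG": [2.0, 1.0, 4.0, 3.0]})
marker2 = pd.DataFrame({"CA1": [2.0, 4.0, 6.0, 8.0], "DG": [3.0, 4.0, 1.0, 2.0]})

# Column-wise correlation: one coefficient per region, across animals.
r = marker1.corrwith(marker2, method="pearson")
```

In this toy cohort the two markers are perfectly correlated in CA1 and perfectly anti-correlated in DG.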

pls_regions_salience

pls_regions_salience(group1, group2, selected_regions, marker=None, n_bootstrap=5000, component=1, fill_nan=True, seed=None, test_h0=True, p_value=0.05, n_permutation=5000)

Computes PLS between two groups with the same markers.
It estimates the standard error of the regions' saliences by bootstrap.

NOTE: it assumes that the component-th latent variable is generalisable by permutation test. If it were not, the resulting salience scores would not be reliable.

Parameters:

group1 : AnimalGroup

The first cohort to analyze.
group2 : AnimalGroup

The second cohort to analyze.
selected_regions : list[str]

The acronyms of brain regions to take into account in the relationship between brain activities and experimental design.
marker : str, default= None

The marker whose activity has to be studied. If None, a separate PLS will be computed on each marker in the groups independently.
n_bootstrap : int, default= 5000

The n parameter in bootstrap_salience_scores.
component : int, default= 1

The n-th component (or latent variable) of the salience scores.
fill_nan : bool, default= True

Whether to fill with NA the scores of those regions for which the salience is not computable (e.g. if brain data is missing in at least one brain of the groups).
seed : Number, default= None

A random seed.

Returns:

BrainData | dict[str, BrainData] –

A BrainData of the regions salience scores based on marker activity. If marker=None and the groups have multiple markers, it returns a dictionary mapping each marker into the respective regions salience.
