stats

PLS

PLS(regions, group1, group2, *groups, marker=None)

This class facilitates mean-centered task Partial Least Squares Correlation on brain-wide results. This statistical tool analyzes the relationship between brain activity and experimental design. It should be used when the \(I\) observations are structured into \(N\) groups or markers. PLSC searches for latent variables (i.e., \(L_X\) and \(L_Y\)) that express the largest amount of information common to both \(X\) and \(Y\), respectively the brain activity and the groups matrices.

The implementation follows the tutorial by Krishnan et al. (2011); we strongly suggest reading it to better understand the method.

NOTE: if a region is missing from at least one observation, it won't be taken into account in the analysis.
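
The decomposition described above can be sketched with plain NumPy. This is a minimal illustration of mean-centered task PLSC in the spirit of Krishnan et al., 2011, not the code of this class: the function name `mean_centered_task_plsc`, the integer group labels, and the random data are all assumptions made for the example.

```python
import numpy as np

def mean_centered_task_plsc(X, groups):
    """Hypothetical helper. X: (I, J) observations x regions.
    groups: length-I integer labels in 0..N-1."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n_groups = groups.max() + 1
    Y = np.eye(n_groups)[groups]             # (I, N) dummy-coded group matrix
    M = (Y.T @ X) / Y.sum(axis=0)[:, None]   # (N, J) mean of each region per group
    R = M - M.mean(axis=0)                   # deviations of the groups from their grand mean
    u, s, vt = np.linalg.svd(R, full_matrices=False)
    v = vt.T
    Lx = X @ v                               # brain scores: observations projected on v
    Ly = Y @ u                               # group scores: design matrix projected on u
    return u, s, v, Lx, Ly

# Made-up data: 6 brains, 4 regions, 2 groups of 3 brains each.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])
u, s, v, Lx, Ly = mean_centered_task_plsc(X, labels)
```

With two groups the second singular value is numerically zero, because centering the group means removes one degree of freedom; only the first latent variable carries information.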

Parameters:

  • regions : Sequence[str]

    The acronyms of brain regions to take into account in the relationship between brain activities and experimental design.

  • group1 : AnimalGroup

    The first cohort to take into account.

  • group2 : AnimalGroup

    The second cohort to take into account.

  • *groups : AnimalGroup, default= ()

    Any other cohort to take into account.

  • marker : str | Sequence[str], default= None

    The marker whose activity has to be studied. If multiple markers are given, they'll be coupled with the respective group.

Raises:

  • ValueError

    If marker is empty, and the given groups don't have a single marker to choose from.

Ly instance-attribute

The group scores, projection of the observed groups composition on u. Depending on the experimental design (e.g. the number of cohorts), there may be projections on multiple axes. In order to choose which latent variables to keep, you should generalize the results and test the null hypothesis.

Lx instance-attribute

The brain scores, projection of the brain activity observations on v. Depending on the experimental design (e.g. the number of cohorts), there may be projections on multiple axes. In order to choose which latent variables to keep, you should generalize the results and test the null hypothesis.

u property

Group profiles that best characterize \(R\), the matrix of the deviations of the groups to their grand mean.

s property

Singular values of \(R\), the matrix of the deviations of the groups to their grand mean.

v property

Brain regions profiles that best characterize \(R\), the matrix of the deviations of the groups to their grand mean.

s_sampling_distribution property

The sampling distribution of the singular values, obtained by randomly permuting the group each brain belongs to (see random_permutation). Each row is a single permutation sample.

v_salience_scores property

The normalised v scores with bootstrapping.

u_salience_scores property

The normalised u scores with bootstrapping.

n_components

n_components()

Returns the number of components of the current PLS. The number of components is determined by the number of group/marker comparisons of the PLS. For example, if the current PLS is comparing just two groups, it has only 1 component.

Returns:

  • int

    The maximum number of components of the current PLS.

random_permutation

random_permutation(n, seed=None)

Randomly shuffles the group each brain belongs to, and uses this permutation sample to compute the mean-centered task PLSC. This process is repeated n times. The resulting sampling distribution of the singular values can then be used to generalize the results (i.e. the salience scores) of the current PLS through a null hypothesis test.

Parameters:

  • n : int

    The number of permutations done to create the sampling distribution.

  • seed : Number, default= None

    A random seed.
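
The permutation procedure can be sketched as follows. This is an illustration under assumptions, not the method's actual code: `singular_values` and `permutation_distribution` are hypothetical helpers, and the data is random.

```python
import numpy as np

def singular_values(X, labels):
    """Singular values of the mean-centered matrix of group means."""
    means = np.stack([X[labels == g].mean(axis=0) for g in np.unique(labels)])
    R = means - means.mean(axis=0)
    return np.linalg.svd(R, compute_uv=False)

def permutation_distribution(X, labels, n, seed=None):
    """Shuffle group membership n times; each row is one permutation sample."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return np.stack([singular_values(X, rng.permutation(labels)) for _ in range(n)])

# Made-up data: 6 brains, 4 regions, 2 groups.
X = np.random.default_rng(0).normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])
dist = permutation_distribution(X, labels, n=50, seed=1)
```

Shuffling the labels keeps the group sizes unchanged and only breaks the link between each brain and its group, which is what the null hypothesis assumes.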

bootstrap_salience_scores

bootstrap_salience_scores(n, seed=None)

Identifies the regions \(r\) and groups \(g\) that are stable by assigning them a score akin to a Z-score: $$ \frac {v_r} {\hat \sigma(v_r)} \text { and } \frac {u_g} {\hat \sigma(u_g)} $$ This normalization relies on a set of bootstrap samples to estimate the standard errors of the saliences, \(\hat \sigma(v_r)\) and \(\hat \sigma(u_g)\).
This is achieved through repeatedly drawing samples with replacement from the original dataset. Within each sample, the brains' group remains unchanged while the composition of such groups may change. Mean-centered task PLS is computed on each sample, effectively creating a large number of \(u\) and \(v\) salience scores samples.

NOTE: any interpretation of the resulting normalized scores should be coupled with the result of a permutation test.

Parameters:

  • n : int

    The size of the set of bootstrap samples, used to compute the standard error.

  • seed : Number, default= None

    A random seed.
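
A minimal sketch of the bootstrap normalization for the brain saliences \(v\), assuming NumPy and random data. The helper names are hypothetical, and the sign alignment step is an assumption of this example (singular vectors are only defined up to sign); the actual implementation may handle it differently.

```python
import numpy as np

def plsc_v(X, labels):
    """Brain saliences v of mean-centered task PLSC (one column per component)."""
    means = np.stack([X[labels == g].mean(axis=0) for g in np.unique(labels)])
    R = means - means.mean(axis=0)
    return np.linalg.svd(R, full_matrices=False)[2].T

def bootstrap_v_salience(X, labels, n, seed=None):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    k = len(groups) - 1                      # components that carry information
    v = plsc_v(X, labels)[:, :k]
    samples = []
    for _ in range(n):
        # Resample brains with replacement inside each group:
        # group membership is fixed, group composition changes.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(labels == g), size=np.sum(labels == g))
            for g in groups
        ])
        v_b = plsc_v(X[idx], labels[idx])[:, :k]
        # Assumption of this sketch: flip each component to match the original sign.
        signs = np.where(np.sum(v_b * v, axis=0) < 0, -1.0, 1.0)
        samples.append(v_b * signs)
    return v / np.std(samples, axis=0)       # v_r / sigma_hat(v_r)

# Made-up data: 8 brains, 4 regions, with a shift added to the first group.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
X[:4] += 2.0
labels = np.array([0] * 4 + [1] * 4)
scores = bootstrap_v_salience(X, labels, n=200, seed=1)
```

The same ratio can be computed for the group saliences \(u\) by collecting the left singular vectors of each bootstrap sample.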

test_null_hypothesis

test_null_hypothesis()

Tests the null hypothesis on the sampling distribution of the singular values.

Returns:

  • float

    A p-value for each latent variable/component.
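
One common way to turn a permutation sampling distribution into p-values is to take, for each component, the fraction of permuted singular values that are at least as large as the observed one. The sketch below assumes that convention and uses made-up numbers; `permutation_p_values` is a hypothetical name, and the method documented here may use a different estimator.

```python
import numpy as np

def permutation_p_values(s_observed, s_sampling_distribution):
    """Fraction of permutation samples (rows) reaching the observed singular value."""
    s_observed = np.asarray(s_observed, dtype=float)
    dist = np.asarray(s_sampling_distribution, dtype=float)
    return (dist >= s_observed).mean(axis=0)

s_obs = np.array([3.0, 0.5])
dist = np.array([[1.0, 0.4],
                 [2.0, 0.6],
                 [3.5, 0.7],
                 [0.5, 0.2]])
p = permutation_p_values(s_obs, dist)   # array([0.25, 0.5])
```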

above_threshold

above_threshold(threshold, component=1)

Get the component-th regions salience scores that are above threshold.

Parameters:

  • threshold : float

    A Z-score value. See to_zscore

  • component : int, default= 1

    The n-th component (or latent variable) of the salience scores on which to apply the filter. It cannot be less than 1.

Returns:

  • pd.DataFrame

    The list of brain regions, along with the relative score, that have a salience above threshold.

to_zscore staticmethod

to_zscore(p, two_tailed=True)

Given a probability in null-hypothesis significance testing, it computes the equivalent Z-score.

Parameters:

  • p : float

    The probability.

  • two_tailed : bool, default= True

    Whether the p corresponds to a two-tailed or one-tailed test.

Returns:
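
The conversion from a probability to a Z-score can be illustrated with the standard library alone. This is a sketch of the usual definition (inverse CDF of the standard normal), with the hypothetical name `p_to_zscore`; it is not taken from the library's source.

```python
from statistics import NormalDist

def p_to_zscore(p, two_tailed=True):
    """Z-score whose upper tail (or both tails together) has probability p."""
    tail = p / 2 if two_tailed else p
    return NormalDist().inv_cdf(1 - tail)

z_two = p_to_zscore(0.05)                    # about 1.96
z_one = p_to_zscore(0.05, two_tailed=False)  # about 1.64
```

A two-tailed p of 0.05 therefore corresponds to a salience threshold of roughly 1.96.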

density

density(brain)

For each region \(r\) of brain, it computes the density \(D(m)\) for each marker \(m\): $$ D(m_r) : \frac {m_r} {size_r} $$ with \(m_r\) being the raw number of \(m\) detections in region \(r\).

Parameters:

Returns:

Raises:

  • ValueError

    If brain does not contain raw data of marker countings.

percentage

percentage(brain)

For each region \(r\) of brain, it computes the percentage \(P(m)\) for each marker \(m\) detection compared to brain-wide \(m\) counts: $$ P(m_r) : \frac {m_r} {m_{root}} $$ with \(m_r\) being the raw number of \(m\) detections in region \(r\).

Parameters:

  • brain : AnimalBrain

    The brain to compute percentage on.

Returns:

Raises:

  • ValueError

    If brain does not contain raw data of marker countings.

relative_density

relative_density(brain)

For each region \(r\) of brain, it computes the density fold change of each marker \(m\) compared to brain-wide marker density: $$ RD(m_r) : \frac {m_r/size_r} {m_{root}/size_{root}} $$ with \(m_r\) being the raw number of \(m\) detections in region \(r\).

Parameters:

  • brain : AnimalBrain

    The brain to compute relative density on.

Returns:

  • AnimalBrain

    A new brain with relative density data.

Raises:

  • ValueError

    If brain does not contain raw data of marker countings.
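
The three normalizations above (density, percentage, relative density) differ only in the denominator. The sketch below applies the formulas to plain dictionaries with made-up counts and sizes; it does not use AnimalBrain, and the function names with the `_of` suffix are hypothetical.

```python
# Made-up raw counts of one marker and region sizes; "root" is the whole brain.
counts = {"root": 200, "HPF": 50, "TH": 30}
sizes = {"root": 100.0, "HPF": 10.0, "TH": 20.0}

def density_of(counts, sizes):
    # D(m_r) = m_r / size_r
    return {r: counts[r] / sizes[r] for r in counts}

def percentage_of(counts):
    # P(m_r) = m_r / m_root (a fraction of the brain-wide count, as in the formula)
    return {r: counts[r] / counts["root"] for r in counts}

def relative_density_of(counts, sizes):
    # RD(m_r) = (m_r / size_r) / (m_root / size_root)
    root_density = counts["root"] / sizes["root"]
    return {r: (counts[r] / sizes[r]) / root_density for r in counts}

d = density_of(counts, sizes)             # HPF: 5.0, TH: 1.5, root: 2.0
p = percentage_of(counts)                 # HPF: 0.25, TH: 0.15, root: 1.0
rd = relative_density_of(counts, sizes)   # HPF: 2.5, TH: 0.75, root: 1.0
```

A relative density above 1 means the region is denser in detections than the brain as a whole.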

fold_change

fold_change(brain, group)

For each brain region in brain, compute the fold change of its markers with respect to group's mean.

Parameters:

  • brain : AnimalBrain

    The brain for which to compute the fold change.

  • group : AnimalGroup

    The group whose mean is the basis of the fold change.

Returns:

See also

diff_change

diff_change

diff_change(brain, group)

For each brain region in brain, compute the difference between its markers and the group's mean.

Parameters:

  • brain : AnimalBrain

    The brain for which to compute the difference.

  • group : AnimalGroup

    The group whose mean is subtracted.

Returns:

  • AnimalBrain

    A new brain of the difference of brain from group's mean.

See also

fold_change
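
To contrast fold_change and diff_change, here is a small sketch on plain dictionaries. It assumes that the fold change is the ratio between the brain's value and the group's mean, which is the usual meaning of the term but is not spelled out above; the function names and numbers are made up.

```python
# Made-up marker values per region for one brain, and the group's mean.
brain = {"HPF": 6.0, "TH": 3.0}
group_mean = {"HPF": 4.0, "TH": 6.0}

def fold_change_of(brain, group_mean):
    # Assumed definition: value divided by the group's mean
    return {r: brain[r] / group_mean[r] for r in brain}

def diff_change_of(brain, group_mean):
    # Value minus the group's mean
    return {r: brain[r] - group_mean[r] for r in brain}

fc = fold_change_of(brain, group_mean)   # {"HPF": 1.5, "TH": 0.5}
dc = diff_change_of(brain, group_mean)   # {"HPF": 2.0, "TH": -3.0}
```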

markers_overlap

markers_overlap(brain, marker1, marker2)

For each region, it computes the ratio between the positive cells and the double positive counts, for both marker1 and marker2: $$ O(m_1,m_{1,2}) : \frac {m_1} {m_{1,2}} $$ $$ O(m_2,m_{1,2}) : \frac {m_2} {m_{1,2}} $$ with \(m_{1,2}\) being the number of detections that are positive for both marker1 and marker2.

Parameters:

  • brain : AnimalBrain

    The brain for which to compute the markers overlapping rate.

  • marker1 : str

    The first overlapping marker.

  • marker2 : str

    The second overlapping marker.

Returns:

  • AnimalBrain

    A new brain of the overlapping rate for marker1 and marker2.

Raises:

  • ValueError

    If brain does not contain raw data of marker countings.

  • ValueError

    If brain does not have any raw data of marker1 or marker2 countings.

  • ValueError

    If brain does not contain any raw data of the marker1 and marker2 double positive countings.
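
As a worked instance of the two ratios, the sketch below applies the formulas exactly as written above to made-up counts for a single region. `overlap_ratios` is a hypothetical name and no AnimalBrain is involved.

```python
def overlap_ratios(m1, m2, m12):
    """Ratios O(m1, m12) and O(m2, m12) as written in the formulas above."""
    return m1 / m12, m2 / m12

# Made-up region: 40 marker1 cells, 20 marker2 cells, 10 double positive.
o1, o2 = overlap_ratios(40, 20, 10)   # (4.0, 2.0)
```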

markers_jaccard_index

markers_jaccard_index(brain, marker1, marker2)

For each region, it computes the Jaccard index, measuring the similarity between the activity of two markers and their double positivity: $$ J(m_1,m_2) : \frac {m_{1,2}} {m_1+m_2-m_{1,2}} $$ with \(m_{1,2}\) being the number of detections that are positive for both marker1 and marker2.

Parameters:

  • brain : AnimalBrain

    The brain for which to compute the markers Jaccard index.

  • marker1 : str

    The first overlapping marker.

  • marker2 : str

    The second overlapping marker.

Returns:

  • AnimalBrain

    A new brain of the Jaccard index for marker1 and marker2.

Raises:

  • ValueError

    If brain does not contain raw data of marker countings.

  • ValueError

    If brain does not have any raw data of marker1 or marker2 countings.

  • ValueError

    If brain does not contain any raw data of the marker1 and marker2 double positive countings.
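
The Jaccard formula can be checked on made-up counts per region. The dictionary layout and the name `jaccard` are assumptions of this sketch.

```python
def jaccard(m1, m2, m12):
    """J = m12 / (m1 + m2 - m12): double positives over cells positive for either marker."""
    return m12 / (m1 + m2 - m12)

# Made-up (m1, m2, m12) counts per region.
regions = {"HPF": (40, 20, 10), "TH": (12, 12, 12)}
j = {r: jaccard(*c) for r, c in regions.items()}   # {"HPF": 0.2, "TH": 1.0}
```

The index reaches 1 only when every detection is double positive, as in the second region.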

markers_similarity_index

markers_similarity_index(brain, marker1, marker2)

For each region, it computes an index of similarity between the activity of two markers and their double positivity; it is defined as: $$ S(m_1,m_2) : \frac {m_{1,2}^2} {m_1 \cdot m_2} $$ with \(m_{1,2}\) being the number of detections that are positive for both marker1 and marker2.

NOTE: \(S(m_1,m_2) = 1 \iff m_1 = m_2 = m_{1,2}\).
Additionally, if either \(m_1\) or \(m_2\) is zero, it goes to infinity.

Parameters:

  • brain : AnimalBrain

    The brain for which to compute the markers similarity index.

  • marker1 : str

    The first overlapping marker.

  • marker2 : str

    The second overlapping marker.

Returns:

  • AnimalBrain

    A new brain of the similarity index between marker1 and marker2.

Raises:

  • ValueError

    If brain does not contain raw data of marker countings.

  • ValueError

    If brain does not have any raw data of marker1 or marker2 countings.

  • ValueError

    If brain does not contain any raw data of the marker1 and marker2 double positive countings.
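
A short numeric check of the similarity index on made-up counts. The name `similarity` is hypothetical, and the zero-denominator case is deliberately left out of this sketch.

```python
def similarity(m1, m2, m12):
    """S = m12^2 / (m1 * m2); assumes m1 and m2 are both non-zero."""
    return m12 ** 2 / (m1 * m2)

s_partial = similarity(40, 20, 10)   # 100 / 800 = 0.125
s_full = similarity(12, 12, 12)      # 1.0, since m1 = m2 = m12
```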

markers_overlap_coefficient

markers_overlap_coefficient(brain, marker1, marker2)

For each region, it computes the overlap coefficient (or Szymkiewicz–Simpson coefficient), an index of similarity between the activity of two markers and their double positivity; it is defined as: $$ S(m_1,m_2) : \frac {m_{1,2}} {\min({m_1}, {m_2})} $$ with \(m_{1,2}\) being the number of detections that are positive for both marker1 and marker2.

Parameters:

  • brain : AnimalBrain

    The brain for which to compute the markers overlapping coefficient.

  • marker1 : str

    The first overlapping marker.

  • marker2 : str

    The second overlapping marker.

Returns:

  • AnimalBrain

    A new brain of the overlapping coefficient between marker1 and marker2.

Raises:

  • ValueError

    If brain does not contain raw data of marker countings.

  • ValueError

    If brain does not have any raw data of marker1 or marker2 countings.

  • ValueError

    If brain does not contain any raw data of the marker1 and marker2 double positive countings.
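
Unlike the Jaccard index, the overlap coefficient reaches 1 as soon as the rarer marker is entirely double positive. A sketch on made-up counts, with the hypothetical name `overlap_coefficient`:

```python
def overlap_coefficient(m1, m2, m12):
    """Szymkiewicz-Simpson coefficient: m12 / min(m1, m2)."""
    return m12 / min(m1, m2)

oc_partial = overlap_coefficient(40, 20, 10)   # 10 / 20 = 0.5
oc_nested = overlap_coefficient(40, 20, 20)    # 1.0: all marker2 cells are double positive
```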

markers_difference

markers_difference(brain, marker1, marker2)

For each brain region in brain, compute the difference between two markers.

Parameters:

  • brain : AnimalBrain

    The brain for which to compute the difference.

  • marker1 : str

    The first marker to subtract.

  • marker2 : str

    The second marker to subtract.

Returns:

  • AnimalBrain

    A new brain of the difference between marker1 and marker2.

markers_correlation

markers_correlation(marker1, marker2, group, other=None, method='pearson')

For each brain region in group, compute the correlation between two markers within all animals in the cohort.

Parameters:

  • marker1 : str

    The first marker to correlate.

  • marker2 : str

    The second marker to correlate.

  • group : AnimalGroup

    The group from which all animals are taken to compute the correlation.

  • other : AnimalGroup, default= None

    If specified, it uses data from other's marker2.

  • method : str, default= 'pearson'

    Any method accepted by DataFrame.corrwith.

Returns:

  • BrainData

    Brain data of the correlation between marker1 and marker2.
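
Since method is forwarded to DataFrame.corrwith, the per-region correlation can be pictured with two pandas tables that have one row per animal and one column per region. The tables, animal names, and values below are made up for illustration; how the function builds such tables from an AnimalGroup is not shown here.

```python
import pandas as pd

animals = ["a1", "a2", "a3", "a4"]
# Made-up values of each marker, per animal (rows) and region (columns).
marker1 = pd.DataFrame({"HPF": [1.0, 2.0, 3.0, 4.0],
                        "TH":  [2.0, 1.0, 4.0, 3.0]}, index=animals)
marker2 = pd.DataFrame({"HPF": [2.0, 4.0, 6.0, 8.0],
                        "TH":  [3.0, 4.0, 1.0, 2.0]}, index=animals)

# Column-wise correlation across animals: one coefficient per region.
corr = marker1.corrwith(marker2, method="pearson")
# HPF is perfectly correlated (1.0), TH perfectly anti-correlated (-1.0).
```

Passing `method="spearman"` or `method="kendall"` switches to a rank-based correlation.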

pls_regions_salience

pls_regions_salience(group1, group2, selected_regions, marker=None, n_bootstrap=5000, component=1, fill_nan=True, seed=None, test_h0=True, p_value=0.05, n_permutation=5000)

Computes PLS between two groups with the same markers.
It estimates the standard error of the regions' saliences by bootstrap.

NOTE: it assumes that the component-th latent variable is generalisable by permutation test. If it is not, the resulting salience scores are not reliable.

Parameters:

  • group1 : AnimalGroup

    The first cohort to analyze.

  • group2 : AnimalGroup

    The second cohort to analyze.

  • selected_regions : list[str]

    The acronyms of brain regions to take into account in the relationship between brain activities and experimental design.

  • marker : str, default= None

    The marker whose activity has to be studied. If None, a separate PLS will be computed on each marker in the groups independently.

  • n_bootstrap : int, default= 5000

    The n parameter in bootstrap_salience_scores.

  • component : int, default= 1

    The n-th component (or latent variable) of the salience scores.

  • fill_nan

    Whether to fill with NA the scores of those regions for which the salience is not computable (e.g. if brain data is missing in at least one brain of the groups).

  • seed

    A random seed.

Returns:

  • BrainData | dict[str, BrainData]

    A BrainData of the regions salience scores based on marker activity. If marker=None and the groups have multiple markers, it returns a dictionary mapping each marker into the respective regions salience.