Data preparation¶
Here we show how you can use braian
python library to prepare the data and perform a preliminary analysis.
In this example we focus specifically on how to read data exported with BraiAn on QuPath.
It assumes the following:
- The brain sections were aligned with ABBA and the registrations imported to a QuPath project.
Remember: 1 QP project == 1 brain; do as many QuPath projects as many animals you have. - Automatic image analysis for objects segmentation (automatic cell detection) was performed in QuPath using BraiAn extension. If you used it, the detection counts are saved into
.csv
/.tsv
files, one per section. - Brain regions that you wish to exclude from further analysis for each project are exported to
.txt
files, one per section.
Remember: this is done within BraiAn on QuPath.
Before we start¶
We want to import all braian
packages and subpackages.
import braian
import braian.config
import braian.plot as bap
import braian.stats as bas
import plotly.io as pio
from pathlib import Path
# This ensures BraiAn's figures works in multiple places:
pio.renderers.default = "plotly_mimetype+notebook"
root_dir = Path.cwd().absolute().parent # braian experiment root
config_file = root_dir/"config_example.yml" # configuration path
Let's first see the content of the configuration file
with config_file.open() as f:
print(f.read())
# SPDX-FileCopyrightText: 2024 Carlo Castoldi <carlo.castoldi@outlook.com> # # SPDX-License-Identifier: CC0-1.0 experiment: name: "example" output_dir: "data/BraiAn_output" groups: HC: ["287HC", "342HC", "343HC", "346HC", "371HC"] CTX: ["329CTX", "331CTX", "355CTX", "400CTX", "401CTX", "402CTX"] FC: ["367FC", "368FC", "369FC", "426FC", "427FC", "428FC"] atlas: version: "v3" excluded-branches: ["retina", "VS", "grv", "fiber tracts", "CB"] brains: raw-metric: "sum" qupath: files: dirs: output: "data/QuPath_output" results_subdir: "results" exclusions_subdir: "regions_to_exclude" suffix: results: "_regions.txt" exclusions: "_regions_to_exclude.txt" markers: AF568: "cFos" AF647: "Arc" exclude-parents: true min-slices: 0
So, this configuration file is meant for you to specify all important information about your experiment (e.g., in experiment>name
you can indicate the name of the experiment, in experiment>output_dir
you can indicate the location where you wish your output data to be saved, ... ). For detailed instructions on how to fill each field in the YAML file, see below.
config = braian.config.BraiAnConfig(config_file, "/tmp") # we instantiate the config
Since we intend on saving some normalized data and outputs of the analysis, we make sure that the directory defined in experiment>output_dir
exists. The path to such directory can be both absolute o relative to the position of the config file.
config.output_dir.mkdir(parents=True, exist_ok=True)
The Allen Institute brain ontology¶
We start by importing the brain atlas ontology defined in atlas>version
in the YAML file.
The ontology stores information about relationships between brain regions (e.g. the Ammon's horn is a subregion of the hippocampal formation, and has Field CA1, CA2 and CA3 as subregions) as well as metadata for each region (e.g. the acronym, the full name, the distance from the root region,...).
This simple organization is at the basis of braian
and ensures that the data is correctly positioned within the brain atlas ontology. It also helps analysing and visualising the data in a coherent manner, with no regions overlapping between each other.
BraiAnConfig
uses the following options from the YAML file to instantiate an ontology:
atlas>version
: the version of the brain ontology against which the data was aligned. Currently it only supports Allen Institute mouse brain ontology throughbraian.AllenBrainOntology
, from v1 to v4.atlas>excluded-branches
: the list of regions that you would like to exclude from the analysis. As if they never existed. All corresponding subregions will also be excluded. Each brain region is identified by its acronym.
You can easily navigate the Allen ontology with an interactive plot showing each region metadata, through bap.hierarchy()
.
atlas_ontology = config.read_atlas_ontology()
bap.hierarchy(atlas_ontology).update_layout(margin_l=15)
If desired, we can list all the regions in the above tree, in depth-first order
atlas_ontology.list_all_subregions("root", mode="depth")[:12] # depth-first order
['root', 'grey', 'CH', 'CTX', 'CTXpl', 'Isocortex', 'FRP', 'FRP1', 'FRP2/3', 'FRP5', 'FRP6a', 'FRP6b']
But most importantly, braian
offers ways to select only regions that don't overlap between each other.
Both manually...
atlas_ontology.add_to_selection(["CTXpl", "STR", "PAL", "IB", "MB", "BS", "CB"])
# IB and MB are both within the brain stem. They are not selected.
# CB was excluded in the config file.
atlas_ontology.get_selected_regions()
['CTXpl', 'STR', 'PAL', 'BS']
or using pre-defined sets of regions at different parcellations levels that ensure whole-brain coverage as well as no overlap between regions (i.e. major divisions, summary structures and leaves as defined by Wang et al. 2020) or aided sets (i.e. depth in the ontology and Allen's structural level)
print("#summary structures:", len(atlas_ontology.get_regions("summary structures")))
print("#structural level 9:", len(atlas_ontology.get_regions("structural level 9")))
#summary structures: 295 #structural level 9: 110
Reading from QuPath¶
Now, we're ready to read the .csv
/.tsv
files with previously exported cell counts—e.g. using BraiAn's extension in QuPath (BraiAnDetect).
BraiAnConfig
uses the following options from the YAML file to instantiate a sliced brain from QuPath:
qupath>files>dirs>output
: the directory where QuPath exported the output files.qupath>files>dirs>results_subdir
: the sub-directory in which the cell counts files for each image are stored. Can be nullqupath>files>dirs>exclusions_subdir
: the sub-directory in which the region exclusions files for each image are stored. Can be nullqupath>files>suffix
: cell counts files for each image should follow this naming:<IMAGE_ID><SUFFIX>.<FILE_EXTENSION>
.results
: anything after<IMAGE_ID>
to identify cell counts files.exclusions
: anything after<IMAGE_ID>
to identify region exclusions files.
qupath>files>markers
: a list of key-values, where the key is the name of an image channel in QuPath, and the value is the name of the marker visible on that same image channel.qupath>exclude-parents
: whether the cell counts of the parent regions should be excluded from the image (=true
) or subtracted (=false
).
braian
organises whole-brain data by slice, by animal, by group and by experiment.
For further data analysis data from brain sections are reduced into a comprehensive dataset of a single brain. braian
offers ways to reduce a SlicedBrain
to an AnimalBrain
, where each region has a single value instead of one for each section. The function used to make the reduction can be chosen between the following SliceMetrics
:
SUM
MEAN
STD
CVAR
Before aggregating, however, we are interested in keeping it sliced=True
in order to perform some checks on the cell countings computed in QuPath, capitalizing on between slices variability.
experiment_sliced = config.experiment_from_qupath(sliced=True)
Animal '287HC' - The corresponding regions_to_exclude excludes everything: SILVA_n287_cFos Arc HC_2023_12_22__3976.czi - Scene #08 Animal '287HC' - The corresponding regions_to_exclude excludes everything: SILVA_n287_cFos Arc HC_2023_12_22__3979.czi - Scene #07 Animal '287HC' - The corresponding regions_to_exclude excludes everything: SILVA_n287_cFos Arc HC_2023_12_22__3979.czi - Scene #09 Animal '287HC' - The corresponding regions_to_exclude excludes everything: SILVA_n287_cFos Arc HC_2023_12_22__3979.czi - Scene #11 Animal '346HC' - The corresponding regions_to_exclude excludes everything: SILVA_n346_cFos Arc HC_2023_12_22__3969.czi - Scene #03 Animal '346HC' - The corresponding regions_to_exclude excludes everything: SILVA_n346_cFos Arc HC_2023_12_22__3972.czi - Scene #05 Animal '346HC' - The corresponding regions_to_exclude excludes everything: SILVA_n346_cFos Arc HC_2023_12_22__3974.czi - Scene #2 Animal '346HC' - The corresponding regions_to_exclude excludes everything: SILVA_n346_cFos Arc HC_2023_12_22__3974.czi - Scene #3 Animal '371HC' - The corresponding regions_to_exclude excludes everything: SILVA_n371_cFos Arc HC_2023_12_22__3952.czi - Scene #03 Animal '371HC' - The corresponding regions_to_exclude excludes everything: SILVA_n371_cFos Arc HC_2023_12_22__3956.czi - Scene #08 Animal '400CTX' - The corresponding regions_to_exclude excludes everything: SILVA_400.5 CTX.czi - Scene #05 Animal '427FC' - The corresponding regions_to_exclude excludes everything: SILVA_427.4 FC.czi - Scene #06 Animal '427FC' - The corresponding regions_to_exclude excludes everything: SILVA_427.5 FC.czi - Scene #09 Animal '427FC' - The corresponding regions_to_exclude excludes everything: SILVA_427.6 FC.czi - Scene #05 Animal '427FC' - The corresponding regions_to_exclude excludes everything: SILVA_427.6 FC.czi - Scene #06 Animal '427FC' - The corresponding regions_to_exclude excludes everything: SILVA_427.6 FC.czi - Scene #07 Animal '427FC' - The corresponding regions_to_exclude excludes everything: SILVA_427.6 FC.czi - Scene #08 Animal '428FC' - The corresponding regions_to_exclude excludes everything: SILVA_428.5 FC.czi - Scene #7
If we check which markers the first animal of the Home Cage group has, we indeed see that it confirms what's written in he configuration file: we have positive cell counts for "cFos", "Arc" and the double positive (i.e. "cFos+Arc")
experiment_sliced.hc.animals[0].markers
['cFos', 'Arc', 'cFos+Arc']
Quality control of automatic cell detection results¶
To have an idea of the variability of our data we can plot the global (root's) cell density (number of detected cells/mm²) for each of them.
In the example below, we can notice that some animals (e.g. 400CTX) have very few images. This is due to exclude-parents=true
parameter we chose in the config file and applied when reading the QuPath files.
bap.slice_density(experiment_sliced, ["Left: root", "Right: root"], width=None, height=500)\
.update_layout(
margin=dict(l=40, r=0, t=60, b=120),
legend=dict(
orientation="h",
yanchor="bottom",
y=1,
xanchor="left",
x=0,
))
The global density throughout each brain section of each marker is useful to give us a general idea and identify possible outlier sections that could derive from artifacts. However, it doesn't show the whole picture: are there regions whose density changes a lot between sections of the same animal? If so, we might want to go back to the cell segmentation of that image and check that it was accurate.
For this, it is very convenient to plot the standard deviation between sections of the markers density of each region. STD, however, is meaningful only when coupled with its mean...
Introducing: the coefficient of variation:
$$
CV = \frac {\sigma(\bar d)} {\mu(\bar d)}
$$
That's what we use to analyse the cell density variability in all slices of all brains, of all group of the experiment.
experiment_cvar = experiment_sliced.to_experiment(braian.SliceMetrics.CVAR, min_slices=0, densities=True, hemisphere_distinction=False, validate=False)
We are interested in identifying the regions that have a coefficient of variation $CV > 1.5$, meaning that we are selecting those regions whose standard deviation of the cell density is at least $\times 1.5$ times larger than the mean.
CVAR_THRESHOLD = 1.5
print("%regions > CV threshold")
for group_cvar in experiment_cvar.groups:
print(f"Group: {group_cvar.name}")
for marker in group_cvar.markers:
n = sum([len(brain_cvar[marker]) for brain_cvar in group_cvar.animals])
n_above_avg = sum([(brain_cvar[marker].data > CVAR_THRESHOLD).sum() for brain_cvar in group_cvar.animals])
print(f"\t{marker}: {n_above_avg/n*100:.2f}%")
%regions > CV threshold Group: HC cFos: 1.77% Arc: 4.46% cFos+Arc: 19.13% Group: CTX cFos: 1.74% Arc: 11.50% cFos+Arc: 17.27% Group: FC cFos: 1.55% Arc: 12.59% cFos+Arc: 16.57%
braian.plot
also comes in handy as it allows to plot, for each animal, how many (bars) and which (scatter) brain region are above a certain threshold.
When applied to the coefficient of variation of the experiment, we expect to spot inconsistencies if there was some problem in the slice registration or segmentation steps.
bap.above_threshold(experiment_cvar, CVAR_THRESHOLD, atlas_ontology.get_regions("summary structures"), width=None, height=500)\
.update_layout(
margin=dict(l=40, r=0, t=100, b=120),
legend=dict(
orientation="h",
yanchor="bottom",
y=1,
xanchor="left",
x=0,
))
If we spot regions with a high CV between sections of the same animal, we may want to go back to the corresponding image files and segmentation results to manually check that everything is ok.
This can be done thanks to these two functions:
def print_region_stats(brain: braian.SlicedBrain, region_acronym: str, marker=None):
brain = braian.SlicedBrain.merge_hemispheres(brain)
slice_count = brain.count()
if region_acronym not in slice_count:
print(f"Can't find region '{region_acronym}' for animal '{brain.name}'")
return
markers = brain.markers if marker is None else [marker]
brain_avg = braian.AnimalBrain.from_slices(brain, metric=braian.SliceMetrics.MEAN, densities=True)
brain_std = braian.AnimalBrain.from_slices(brain, metric=braian.SliceMetrics.STD, densities=True)
brain_cvar = braian.AnimalBrain.from_slices(brain, metric=braian.SliceMetrics.CVAR, densities=True)
for m in markers:
print(f"""Summary for brain region '{region_acronym}' of marker '{m}':
- N slices: {slice_count[region_acronym]}
- Mean: {brain_avg[m][region_acronym]:.2f} {m}/mm²),
- S.D.: {brain_std[m][region_acronym]:.2f} {m}/mm²,
- Coefficient of Variation: {brain_cvar[m][region_acronym]}
""")
import pandas as pd
def check_slices(brain: braian.SlicedBrain, region_acronym: str):
slices = []
brain = braian.SlicedBrain.merge_hemispheres(brain)
for slice in brain.slices:
if region_acronym not in slice.markers_density.index:
continue
region_densities = slice.markers_density.loc[region_acronym].copy()
region_densities.index += " density"
region_densities.name = slice.name
slices.append(region_densities)
return pd.concat(slices, axis=1) if len(slices) != 0 else None
animal_name = "343HC"
region_acronym = "ACAd"
from IPython.display import display
if animal_name in experiment_sliced:
brain = experiment_sliced[animal_name]
print_region_stats(brain, region_acronym, marker="cFos")
display(check_slices(brain, region_acronym))
else:
print(f"Can't find region '{region_acronym}' for animal '{animal_name}'")
Summary for brain region 'ACAd' of marker 'cFos': - N slices: 17 - Mean: 826.71 cFos/mm²), - S.D.: 3021.21 cFos/mm², - Coefficient of Variation: 3.6544789214730993
LINDA_343.1_HC_cFos_Arc1.czi - Scene #01 | LINDA_343.1_HC_cFos_Arc1.czi - Scene #02 | LINDA_343.1_HC_cFos_Arc1.czi - Scene #03 | LINDA_343.1_HC_cFos_Arc1.czi - Scene #04 | LINDA_343.1_HC_cFos_Arc1.czi - Scene #05 | LINDA_343.1_HC_cFos_Arc1.czi - Scene #06 | LINDA_343.1_HC_cFos_Arc1.czi - Scene #07 | LINDA_343.1_HC_cFos_Arc1.czi - Scene #08 | LINDA_343.2_HC_cFos_Arc1.czi - Scene #02 | LINDA_343.2_HC_cFos_Arc1.czi - Scene #03 | LINDA_343.2_HC_cFos_Arc1.czi - Scene #04 | LINDA_343.2_HC_cFos_Arc1.czi - Scene #05 | LINDA_343.2_HC_cFos_Arc1.czi - Scene #06 | LINDA_343.2_HC_cFos_Arc1.czi - Scene #07 | LINDA_343.2_HC_cFos_Arc1.czi - Scene #08 | LINDA_343.3_HC_cFos_Arc1.czi - Scene #01 | LINDA_343.3_HC_cFos_Arc1.czi - Scene #02 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
cFos density | 177.015134 | 112.805284 | 115.307008 | 186.715314 | 18.178486 | 52.624729 | 139.047501 | 84.003661 | 52.465409 | 127.947189 | 61.052466 | 127.208744 | 103.218842 | 67.188914 | 80.270744 | 0.000000 | 12549.098347 |
Arc density | 145.860470 | 186.833752 | 121.375798 | 243.164130 | 68.169324 | 251.960823 | 202.453161 | 205.587906 | 180.714188 | 189.551391 | 223.859043 | 146.054483 | 250.674330 | 204.766214 | 508.381376 | 615.336398 | 0.000000 |
cFos+Arc density | 14.161211 | 17.625826 | 14.160510 | 28.224408 | 4.544622 | 9.568133 | 13.348560 | 13.263736 | 11.658980 | 9.477570 | 11.871313 | 9.422870 | 23.592878 | 3.199472 | 53.513829 | 0.000000 | 0.000000 |
This allows to easily expose the specific segmentation results for the region of interest so that we can go back to QuPath, and we check each of these images to see if there was any problem with the segmentation or registration steps. If so, we fix the issues, export the data from QuPath and restart the analysis! This can be re-iterated for each region showing a high CV.
Data aggregation¶
Once we are satisfied with the quality of the data, we can save the data. This can be done for each single brain, as well as for the collective aggregated data from a group.
BraiAnConfig
uses the following options from the YAML file to apply the reduction slices→brain:
brains/raw-metric
: one of "raw", "sum" or "mean"/"avg". It specifies the metric used for the reduction. It should be a metric that retains the same units of measurements—i.e. the raw/absolute number of detections-per-region.qupath/min-slices
: the minimum number of sections needed in order for the reduction value of a brain region to be reliable. If a region has less sections thanmin-slices
, its data will be discarded.
# NOTE: groups are being written WITH Left/Right discrimination. The single brains are not.
experiment = config.experiment_from_sliced(experiment_sliced, validate=True)
for group in experiment.groups:
group.to_csv(config.output_dir, overwrite=True)
for animal in group.animals:
animal = animal.merge_hemispheres()
output = animal.to_csv(config.output_dir, overwrite=True)
print(f"{animal} saved to {output}")
AnimalBrain(name='287HC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/287HC_sum.csv AnimalBrain(name='342HC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/342HC_sum.csv AnimalBrain(name='343HC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/343HC_sum.csv AnimalBrain(name='346HC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/346HC_sum.csv AnimalBrain(name='371HC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/371HC_sum.csv AnimalBrain(name='329CTX', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/329CTX_sum.csv AnimalBrain(name='331CTX', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/331CTX_sum.csv AnimalBrain(name='355CTX', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/355CTX_sum.csv AnimalBrain(name='400CTX', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/400CTX_sum.csv AnimalBrain(name='401CTX', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/401CTX_sum.csv AnimalBrain(name='402CTX', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/402CTX_sum.csv AnimalBrain(name='367FC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/367FC_sum.csv AnimalBrain(name='368FC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/368FC_sum.csv AnimalBrain(name='369FC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/369FC_sum.csv AnimalBrain(name='426FC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/426FC_sum.csv AnimalBrain(name='427FC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/427FC_sum.csv AnimalBrain(name='428FC', mode=sum, markers=['cFos', 'Arc', 'cFos+Arc']) saved to /home/castoldi/Projects/BraiAn/data/BraiAn_output/428FC_sum.csv