Classifying Neuropsychiatric Disorder Diagnoses Using Resting State BOLD fMRI Connectivity Data

By Peter Brotherwood
Published on August 6, 2022

"Can functional connectivity data be used as a predictor for neuropsychiatric diagnosis? This project explores the usefulness of connectivity data in predicting ADHD, Bipolar Disorder, and Schizophrenia diagnoses using machine learning classification methods."

About Me


I am a first-year PhD student at the University of Montreal studying computational neuroscience. I come from a fairly different background, with a BSc in Genetics and an MSci in Bioinformatics from the University of Birmingham. Much of my current work is in perception, using machine learning based approaches to model representational spaces in individual subjects. My hope is that BHS will introduce me to the tools and best practices I need to learn in order to fully integrate into the field of computational cognitive neuroscience.

Project Summary

Introduction

Brain regions with correlated temporal activity form functional networks of varying scale and distribution. Regions whose activity is correlated at rest form resting state networks. Aberrant functional connectivity of resting state networks has been observed in multiple populations suffering from neuropsychiatric disorders, including ADHD (Sudre et al., 2017), Bipolar Disorder (Syan et al., 2018), and Schizophrenia (Sheffield & Barch, 2016). Given these observed differences in resting state network connectivity, this project applies machine learning methods to investigate whether aberrations in resting state functional connectivity can be used to identify neuropsychiatric diagnoses.

Main Objectives

Provide a full neuroimaging workflow from preprocessing of raw data to visualisation of results.
Emphasize reproducibility, making all elements of the project as reproducible as possible.
Investigate the ability of machine learning algorithms to predict phenotype from connectivity data.

Personal Objectives

Learn more about open data and project reproducibility.
Gain an understanding of fMRI and neuroimaging database structures and best practices.
Develop skills in preprocessing and analysis of fMRI data.
Apply knowledge of machine learning to neuroscientific studies.

Tools

Compute Canada for Job Submission
Git and GitHub for Version Control
DataLad for Reproducibility
Singularity for Reproducibility
fMRIPrep for data preprocessing
Python Packages: matplotlib, seaborn, scikit-learn, nilearn

Data

The dataset used in this study comes from the UCLA Consortium for Neuropsychiatric Phenomics LA5c Study (Poldrack et al., 2016). It comprises fMRI data for 122 healthy individuals and 139 individuals with neuropsychiatric disorders. Of these 139 individuals, 50 are diagnosed with schizophrenia, 49 with bipolar disorder, and 40 with ADHD. The dataset contains fMRI data collected at rest and during a series of attentional tasks. The fMRI data is in NIfTI format and the dataset is provided in BIDS format. More information on this dataset can be found at https://openneuro.org/datasets/ds000030/versions/1.0.0.

A summary of the dataset is as follows:

	Participants	Male	Female	Average Age	Age Std
Control	122	65	57	32.05	10.28
Schizophrenia	50	38	21	35.29	8.94
Bipolar	49	28	19	31.59	8.77
ADHD	40	21	12	36.46	8.79
Total	261	152	109	33.29	9.29

Project Deliverables

Reproducible project workflow, documented in the git repo and DataLad logs and rerunnable via containers.
Executable Python scripts for data preparation and machine learning
Markdown file introducing the project and detailing project results

Results

Preprocessing using `fMRIPrep`

Preprocessing of raw fMRI data was done using fMRIPrep (Esteban et al., 2018) and executed via the preprocessing.sh script on Alliance Canada’s Beluga HPC Cluster. Because resting state data was missing for some subjects, 260 preprocessed resting state BOLD fMRI runs and their associated confound files were returned. fMRIPrep was run using Singularity 3.8 with the following steps:

Brain masking and tissue segmentation of T1w image
Spatial normalization of the anatomical T1w reference
Surface reconstruction using FreeSurfer
Alignment of functional and anatomical MRI data
Brain masking and confound extraction

Getting Connectivity Data using `nilearn`

Brain masking and connectivity data retrieval were done using nilearn (Abraham et al., 2014). The BASC multiscale deterministic atlas (Bellec et al., 2009) with 64 regions of interest (ROIs) was used to mask the preprocessed voxel-wise BOLD activity data, reducing each subject's data to one timeseries per ROI and so reducing the complexity of the features. A plot of this atlas can be seen below:
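The idea behind masking with a deterministic atlas can be sketched with plain NumPy: every voxel carries one integer label, and the timeseries of all voxels sharing a label are averaged. This is a simplified illustration, not the project's code; in practice a labels masker such as nilearn's NiftiLabelsMasker would typically fill this role. The function name, array sizes, and random data below are all hypothetical.

```python
import numpy as np


def roi_mean_timeseries(voxel_ts, labels):
    """Average voxel timeseries within each atlas label.

    voxel_ts : (n_timepoints, n_voxels) array of BOLD signal
    labels   : (n_voxels,) integer array, 0 marks background
    Returns an (n_timepoints, n_rois) array, ordered by ascending label.
    """
    roi_ids = np.unique(labels[labels > 0])
    return np.column_stack(
        [voxel_ts[:, labels == roi].mean(axis=1) for roi in roi_ids]
    )


# Hypothetical sizes and synthetic data, for illustration only.
rng = np.random.default_rng(0)
n_timepoints, n_voxels, n_rois = 152, 5000, 64
labels = rng.integers(0, n_rois + 1, size=n_voxels)  # values 0..64
labels[:n_rois] = np.arange(1, n_rois + 1)  # make sure every ROI is present
voxel_ts = rng.standard_normal((n_timepoints, n_voxels))

roi_ts = roi_mean_timeseries(voxel_ts, labels)
print(roi_ts.shape)  # (152, 64)
```

Going from thousands of voxels to 64 ROI timeseries is what keeps the later feature matrix at a manageable size.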

Confounds estimated by fMRIPrep were loaded using nilearn’s load_confounds_strategy function and regressed out during masking. Following this, connectivity matrices showing correlations in BOLD timeseries activity between ROIs were generated for each subject. The impact of confound removal strategy on the output connectivity matrices can be seen below:
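To make the two operations in this step concrete, here is a minimal NumPy sketch of confound regression followed by a correlation matrix. It assumes ordinary least squares with an intercept and Pearson correlation; nilearn's own cleaning involves further options (filtering, standardisation, scrubbing) that are not reproduced here. The helper names `regress_out` and `correlation_matrix`, the sizes, and the synthetic data are hypothetical.

```python
import numpy as np


def regress_out(signals, confounds):
    """Remove the least-squares fit of the confounds (plus intercept) from each column."""
    design = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(design, signals, rcond=None)
    return signals - design @ beta


def correlation_matrix(roi_ts):
    """Pearson correlation between ROI timeseries stored as columns."""
    return np.corrcoef(roi_ts, rowvar=False)


# Synthetic example: a shared nuisance signal inflates every correlation.
rng = np.random.default_rng(0)
n_timepoints, n_rois, n_confounds = 152, 64, 6  # hypothetical sizes
confounds = rng.standard_normal((n_timepoints, n_confounds))
shared = confounds @ rng.standard_normal((n_confounds, 1))
roi_ts = rng.standard_normal((n_timepoints, n_rois)) + 2.0 * shared

raw = correlation_matrix(roi_ts)
cleaned = correlation_matrix(regress_out(roi_ts, confounds))
```

In this toy setting the off-diagonal correlations in `raw` are dominated by the shared nuisance signal, while those in `cleaned` shrink towards zero, which is the kind of difference a comparison of confound strategies is meant to expose.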

The upper triangular vector (UTV) of each subject’s connectivity matrix forms a set of features which will serve as input to the machine learning step. The UTVs of all matrices can be combined to form a feature matrix:
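A short sketch of the vectorisation step, assuming the diagonal (always 1 for a correlation matrix) is excluded; the source does not say whether it was. Under that assumption a 64-ROI matrix yields 64 × 63 / 2 = 2016 features per subject. The function name, subject count, and random timeseries are hypothetical.

```python
import numpy as np


def upper_triangular_vector(conn, k=1):
    """Flatten the upper triangle of a square matrix; k=1 skips the diagonal."""
    rows, cols = np.triu_indices(conn.shape[0], k=k)
    return conn[rows, cols]


# Synthetic connectivity matrices for a handful of hypothetical subjects.
rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_rois = 5, 152, 64
matrices = [
    np.corrcoef(rng.standard_normal((n_timepoints, n_rois)), rowvar=False)
    for _ in range(n_subjects)
]

# One row per subject, one column per ROI pair.
X = np.vstack([upper_triangular_vector(m) for m in matrices])
print(X.shape)  # (5, 2016)
```

Because the matrix is symmetric, the upper triangle carries all of the pairwise information without duplicating any feature.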

Machine Learning using `scikit-learn`

Phenotype was learned and predicted using a support vector classifier with a linear kernel from scikit-learn (Pedregosa et al., 2011). 30% of individuals were held out as unseen test data, and a grid search with 5-fold cross validation was performed on the remaining 70% of subjects to find the optimum value of the regularization parameter C. This process was done once with the full dataset, and once with a stratified dataset, as seen below:
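The procedure described here could look roughly like the following scikit-learn sketch. The 70/30 split, linear kernel, and 5-fold grid search follow the text; the candidate values of C, the use of `stratify=y` in the split, the random seed, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix: 4 hypothetical classes, 30 subjects each.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 30)
X = rng.standard_normal((len(y), 200))

# Hold out 30% of subjects as unseen test data (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Grid search over C with 5-fold cross validation on the training portion only.
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10]}  # hypothetical candidate values
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X_train, y_train)

# The best model is refit on all training data and scored once on the test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the test subjects out of the grid search means the reported test accuracy is not biased by the choice of C.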

Using each model to predict the unseen test data, the following multi-class confusion matrices were obtained:

With the corresponding classification summaries:

All Subjects

Accuracy: 0.456

	Control	ADHD	Bipolar Disorder	Schizophrenia	Macro Average	Weighted Average
Precision	0.528	0.0	0.083	0.583	0.299	0.374
Recall	0.757	0.0	0.067	0.467	0.323	0.456
F1 Score	0.622	0.0	0.074	0.519	0.304	0.404
Support	37	12	15	15

Stratified Subjects

Accuracy: 0.316

	Control	ADHD	Bipolar Disorder	Schizophrenia	Macro Average	Weighted Average
Precision	0.384	0.176	0.294	0.500	0.338	0.347
Recall	0.333	0.250	0.333	0.333	0.313	0.316
F1 Score	0.357	0.207	0.313	0.400	0.319	0.325
Support	15	12	15	15
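As a consistency check on the two summaries above, overall accuracy equals the support-weighted mean of per-class recall, since recall × support is the number of correctly classified subjects in each class. The sketch below recomputes both accuracies from the rounded table values; the function name is hypothetical.

```python
def accuracy_from_recall(recall, support):
    """Overall accuracy as the support-weighted mean of per-class recall."""
    correct = sum(r * n for r, n in zip(recall, support))
    return correct / sum(support)


# Values copied from the Recall and Support rows
# (order: Control, ADHD, Bipolar Disorder, Schizophrenia).
all_subjects = accuracy_from_recall([0.757, 0.0, 0.067, 0.467], [37, 12, 15, 15])
stratified = accuracy_from_recall([0.333, 0.250, 0.333, 0.333], [15, 12, 15, 15])

print(round(all_subjects, 3), round(stratified, 3))  # 0.456 0.316
```

This is also why the Weighted Average recall in each table matches the reported accuracy.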

Following this, SVC coefficients were extracted to identify those features which had the most extreme weights when predicting phenotype:
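One way such weights could be extracted is sketched below. For a linear-kernel `SVC` fitted on more than two classes, scikit-learn exposes `coef_` with one row per pair of classes (one-vs-one); the row order is assumed here to be 0 vs 1, 0 vs 2, and so on. The class encoding, the small 8-ROI atlas, the planted signal, and the choice of the five largest absolute weights are all hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Small hypothetical atlas so that features map back to ROI pairs.
n_rois = 8
rows, cols = np.triu_indices(n_rois, k=1)
n_features = len(rows)  # 28 ROI pairs

# Synthetic data: 4 hypothetical classes, with a signal planted in one feature.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 20)
X = rng.standard_normal((len(y), n_features))
X[y == 3, 5] += 3.0

clf = SVC(kernel="linear").fit(X, y)

# One row of coef_ per class pair (assumed order: 0v1, 0v2, 0v3, 1v2, 1v3, 2v3).
pairs = [(a, b) for a in range(4) for b in range(a + 1, 4)]
weights = clf.coef_[pairs.index((0, 3))]  # e.g. class 0 vs class 3

# Rank features by absolute weight and map each back to its ROI pair.
top = np.argsort(np.abs(weights))[::-1][:5]
top_edges = [(int(rows[i]), int(cols[i]), float(weights[i])) for i in top]
for roi_a, roi_b, w in top_edges:
    print(f"ROI {roi_a} - ROI {roi_b}: weight {w:+.3f}")
```

Mapping feature indices back through the same `np.triu_indices` call used for vectorisation is what lets an extreme weight be read as a specific connection between two regions.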

Control vs Schizophrenia