Editor's note: This is a guest post by Ani Eloyan. She is an Assistant Professor of Biostatistics at Brown University. Dr. Eloyan’s work focuses on semi-parametric likelihood-based methods for matrix decompositions, statistical analyses of brain images, and the integration of various types of complex data structures for analyzing health care data. She received her PhD in statistics from North Carolina State University and subsequently completed a postdoctoral fellowship in the Department of Biostatistics at Johns Hopkins University. Dr. Eloyan and her team won the ADHD200 Competition discussed in this article. She tweets @eloyan_ani.
Neuroscience is one of the exciting new fields for biostatisticians interested in real-world applications where they can contribute novel statistical approaches. Most research in brain imaging has historically consisted of studies of small numbers of patients. While such sample sizes are justified by the costs of data collection, claims based on analyzing data from so few subjects often do not hold for our populations of interest. As discussed in this article, there is a huge demand for biostatisticians in the field of quantitative neuroscience: so-called neuroquants or neurostatisticians. However, while more statisticians are becoming interested in the field, it is still far from competing with other substantive domains. For instance, a quick search of the online program of the upcoming JSM2015 conference for the abstract keywords “brain imaging” and “neuroscience” returns 15 records, while a search for “genomics” and “genetics” returns 76 records.
Assuming you are trained in statistics and an aspiring neuroquant, how would you go about working with brain imaging data? As a graduate student in the Department of Statistics at NCSU several years ago, I was very interested in working on statistical methods that would be directly applicable to problems in neuroscience. But I had this same question: “Where do I find the data?” I soon learned that to really approach substantial, relevant problems I also needed to learn about the subject matter underlying these complex data structures.
In recent years, several leading groups have uploaded their lab data with the common goal of fostering the collection of high-dimensional brain imaging data to build powerful models that can give generalizable results. The Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC), founded in 2006, is a platform for public data sharing that facilitates streamlining data processing pipelines and compiling high-dimensional imaging datasets for crowdsourcing the analyses. It includes data for people with neurological diseases and for neurotypical children and adults. If you are interested in Alzheimer’s disease, you can check out ADNI. ABIDE provides data for people with Autism Spectrum Disorder and neurotypical peers. ADHD200 was released in 2011 as part of a competition to motivate building methods that use functional magnetic resonance imaging (fMRI) together with demographic information to predict whether a child has attention deficit hyperactivity disorder (ADHD). While the competition ended in 2011, the dataset has been widely used since then in studies of ADHD. According to Google Scholar, the paper introducing the ABIDE set has been cited 129 times since 2013, while the paper discussing the ADHD200 has been cited 51 times since 2012. These are only a few examples from the list of open-access datasets that could be utilized by statisticians.
Anyone can download these datasets (you may need to register and complete some paperwork in some cases). However, there are several data processing and cleaning steps to perform before the final statistical analyses. These preprocessing steps can be daunting for a statistician new to the field, especially as the tools used for preprocessing may not be available in R. This discussion makes the case for why statisticians need to be involved in every step of preprocessing the data, while this R package contains new tools linking R to FSL, a commonly used platform. As a newcomer, though, it can be easier to start with data that are already processed. This excellent overview by Dr. Martin Lindquist provides an introduction to the different types of analyses for brain imaging data from a statistician’s point of view, while our paper provides tools in R and example datasets for implementing some of these methods. At least one course on Coursera can help you get started with functional MRI data. Talking to biostatisticians working in quantitative neuroscience and to scientists in neuroscience, and reading their papers, is key.