Inside the black box: Unlocking the potential of big data in neurodegenerative diseases

Big data, artificial intelligence and machine learning offer huge opportunities for the collection and assimilation of medical information on a large scale. In a keynote lecture at the 11th CTAD Congress, Professor Cristina Sampaio described how machine learning is currently being employed for problem solving in neurodegenerative diseases.  

In Alzheimer’s disease, a machine learning algorithm may help to identify prodromal Alzheimer’s disease patients in the general population, promoting timely referral for assessment and treatment.

Understanding how machine learning works is key to maximizing the potential of big data and machine learning in neurodegenerative diseases, said Professor Cristina Sampaio, CHDI Foundation, Princeton, USA, in her keynote lecture at the 11th CTAD Congress.

Regulatory bodies are currently cautious about the evidentiary value of big data and machine learning technologies. Consequently, it is important to build trust with regulatory agencies to foster acceptance of big data, advised Professor Sampaio. The ‘black box’ approach – not understanding the mechanism, that is, why and how the results are generated – should be minimized. Using good quality data analyzed with machine learning might be a stepping stone to building that trust, she added.

Big data is typically associated with large unstructured data sets, whereas most clinical research analyses are based on classical structured medical databases. Currently and in the near term, researchers are mostly using structured databases, i.e. quality data, to deploy artificial intelligence and machine learning technologies.

For an in-depth article on Big Data, read the Lundbeck Institute Campus feature: Big Data in Healthcare

Machine learning may be best applied to three areas of problem-solving:

  • Classification: providing the best possible assignment of a set of observations to a set of labels representing the classes
  • Clustering: where the focus is to identify subgroups
  • Regression: identifying the effect of a feature on the response variable, with a strong emphasis on prediction.
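
To make the three problem types above concrete, here is a minimal sketch in plain Python. It is an illustration only, not a method described in the lecture: the data are made up, the single feature is a hypothetical test score, and the function names (nearest_centroid_classify, two_means_cluster, fit_line) are assumptions chosen for this example. Real analyses would use far richer features and validated tooling.

```python
from statistics import mean

# Toy, made-up data: one feature per subject (a hypothetical test score).
scores = [1.0, 1.2, 0.8, 3.9, 4.1, 4.0]
labels = ["control", "control", "control", "case", "case", "case"]


def nearest_centroid_classify(x, xs, ys):
    """Classification: assign x to the label whose mean feature value is closest."""
    centroids = {
        lab: mean(v for v, l in zip(xs, ys) if l == lab) for lab in set(ys)
    }
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))


def two_means_cluster(xs, iterations=10):
    """Clustering: split unlabeled values into two subgroups (1-D k-means, k=2).

    Assumes xs contains at least two distinct values.
    """
    lo, hi = min(xs), max(xs)  # initial cluster centres
    for _ in range(iterations):
        groups = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in xs]
        lo = mean(x for x, g in zip(xs, groups) if g == 0)
        hi = mean(x for x, g in zip(xs, groups) if g == 1)
    return groups


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


if __name__ == "__main__":
    # Classification uses the known labels to label a new observation.
    print(nearest_centroid_classify(3.5, scores, labels))
    # Clustering ignores the labels and looks for subgroups on its own.
    print(two_means_cluster(scores))
    # Regression estimates how a response changes with a feature.
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

The contrast worth noting is in the inputs: classification needs labeled examples, clustering works without labels, and regression models a numeric response that can then be predicted for unseen feature values.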

Where machine learning currently excels is in predicting the future, said Professor Sampaio. A good example of the use of machine learning for prediction is the IBM-CHDI progression and staging model for Huntington disease.

The IBM-CHDI progression and staging model is a fully integrated, data-driven approach to a problem that is usually addressed by expert consensus methods. Its success may change the paradigm for how disease staging systems are developed in future.

Machine learning currently excels in predicting the future

Where are the big data in Alzheimer’s disease?

In Alzheimer’s disease, there are numerous national and international network databases, including the Alzheimer’s Disease Neuroimaging Initiative (ADNI), Real world Outcomes across the Alzheimer’s Disease spectrum for better care: Multi-modal data Access Platform (ROADMAP), the Global Alzheimer’s Association Interactive Network (GAAIN) and the National Alzheimer’s Coordinating Center (NACC) database, as well as more than 400 clinical trials conducted since 2000.

A machine learning algorithm may help to identify prodromal Alzheimer’s disease patients in the general population

In an oral communication at CTAD 2018, Chatiyana Alamuri, IQVIA Analytics Center of Excellence, described a machine learning algorithm that helps identify prodromal Alzheimer’s disease patients in the general population [4]. Using data from an initial cohort of 405 million subjects, the investigators identified over 660,000 subjects with a family history of Alzheimer’s disease, along with features such as diagnostic procedures, medical interventions, concomitant pathologies and other characteristics that differentiated those with Alzheimer’s disease from controls. The proposed predictive algorithm may allow for more accurate and earlier diagnosis of prodromal Alzheimer’s disease at the primary care level, with timely referral for inclusion in clinical trials and earlier assessment and treatment.

Further reading: Making the most of big data in Alzheimer’s disease.


  1. Jack CR Jr et al. Alzheimers Dement. 2018;14:535-62.
  2. Ross CA et al. Nat Rev Neurol. 2014;10:204-16.
  3. Wilson H et al. Front Neurol. 2017;8:11.
  4. Uspenskaya-Cadoz O et al. CTAD October 24-27 2018; Abstract OC17.