From group-level significance to single-subject prediction: Using machine learning to classify and predict outcomes for individual patients

Identifying which risk factors for schizophrenia can serve as biomarkers would make it possible to predict who is at risk of disease and who will transition from an ‘at-risk mental state’ to psychosis [1], to identify those at familial risk, and to determine who will respond to particular treatments. Machine learning may be the key to using group-level data to make individual patient predictions.

Need for biomarkers for schizophrenia


Despite many years of psychiatric research, diagnosis can still be ambiguous, prognosis unpredictable, and response to treatment uncertain. This is especially true for heterogeneous disorders like schizophrenia, which involve a complex interaction of multiple risk factors. Howes and Murray proposed an integrated sociodevelopmental-cognitive model of schizophrenia [2], involving genetic, developmental and social adversity risk factors. Translating these into biomarkers would facilitate prevention, early intervention, and effective targeting of therapies and resources.

Translating risk factors into biomarkers would facilitate prevention, early intervention, and effective targeting of therapies

How can multimodal machine learning help?

Machine learning (ML) is ‘the scientific discipline that focuses on how computers learn from data’ [3]. It builds statistical models from large data sets, so patterns of risk factors observed across many individuals can be used to classify or predict outcomes for an individual patient.

Combining data sources to help diagnosis and prediction

Multimodal machine learning outperformed unimodal algorithms by combining different data sources to model real-world scenarios

Linda Antonucci (University of Bari Aldo Moro, Italy) discussed three applications of multimodal ML:

  • Diagnosis of schizophrenia is unclear – can we discriminate between individuals in different groups?
  • Familial risk for schizophrenia – can we identify commonalities and differences between disease-related and risk-related brain signatures?
  • Clinical high risk for schizophrenia – can we predict an individual's prognostic outcome over time?

Her group's experiments showed that cognitive factors alone were the most informative for schizophrenia classification, but gene-environment stratification modulated cognitive performance in schizophrenia relative to healthy controls. Some functional connectivity abnormalities were shared by patients with schizophrenia and their healthy siblings, but there was also evidence for a potential sibling-specific functional connectivity signature, suggesting that familial risk-related compensatory neurofunctional mechanisms exist [4]. In those at risk of schizophrenia, occupational functioning was more strongly determined by environmental and clinical factors than social functioning was. Multimodal ML outperformed unimodal algorithms by combining different data sources, which could help produce models that are closer to real-world scenarios for translation into clinical practice.
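
As a rough illustration of why combining data sources can help, the sketch below compares a classifier trained on one modality at a time with the same classifier trained on both modalities joined into a single feature vector per subject. Everything in it is an assumption made for illustration: the six invented subjects, the feature names `cognitive` and `connectivity`, and the nearest-centroid classifier with leave-one-out validation. None of it reproduces the methods or data presented at the symposium.

```python
# Toy comparison of unimodal vs multimodal (concatenated-feature) classification.
# All values are invented; this is not the speakers' data or pipeline.

def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y):
    """Leave-one-out accuracy: each subject is predicted by a model
    trained on all the other subjects."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

# Hypothetical scores for six subjects: 0 = control, 1 = patient.
labels = [0, 0, 0, 1, 1, 1]
cognitive = [[0], [2], [0], [3], [1], [3]]     # modality 1 (invented)
connectivity = [[0], [0], [2], [3], [3], [1]]  # modality 2 (invented)

# Multimodal input: concatenate the two modalities for each subject.
multimodal = [c + f for c, f in zip(cognitive, connectivity)]

print(f"{loo_accuracy(cognitive, labels):.2f}")     # 0.67
print(f"{loo_accuracy(connectivity, labels):.2f}")  # 0.67
print(f"{loo_accuracy(multimodal, labels):.2f}")    # 1.00
```

In this constructed example each modality misclassifies a different pair of subjects, so the combined feature vector separates the groups where neither modality does alone. Real data sets are much larger and noisier, and combining modalities is not guaranteed to improve performance.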

Personalizing interventions

Predictors of treatment response could allow more personalized interventions

Meta-analysis has shown that cognitive training improves cognitive dysfunction in schizophrenia [5], but results are heterogeneous. Lana Kambeitz-Ilankovic (University of Cologne, Germany) described how her group is using multivariate pattern analysis [6] to predict response to cognitive training in recent-onset psychosis. They identified neuromarkers of treatment response, based on changes in resting-state functional connectivity related to sensory processing, which could allow more personalized interventions.

Predicting transition to psychosis

Diana Prata (University of Lisbon, Portugal) concluded the session by discussing her work to predict individual transition from at-risk mental state to psychosis using ML with genetic, environmental and structural magnetic resonance imaging data. Median balanced accuracy ranged from 50% to 66%, at or only modestly above the 50% expected by chance for a two-class problem, which suggests that this is not yet an effective tool. Further work is needed with larger sample sizes and more homogeneous populations.
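
Balanced accuracy is the mean of the per-class recall values (sensitivity and specificity when there are two classes), which is why it is often reported when one outcome is much rarer than the other. The minimal sketch below uses a hypothetical cohort in which 20 of 100 at-risk individuals transition. These numbers are assumptions chosen for illustration and are not taken from the work presented.

```python
# Balanced accuracy on a hypothetical, imbalanced at-risk cohort.
# The cohort size and transition rate are invented for illustration.

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (sensitivity and specificity for two classes)."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        correct = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# 1 = transitioned to psychosis, 0 = did not transition.
y_true = [1] * 20 + [0] * 80

# A trivial model that predicts "no transition" for everyone.
always_negative = [0] * 100

plain_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
print(plain_accuracy)                              # 0.8
print(balanced_accuracy(y_true, always_negative))  # 0.5
```

The trivial model looks good on plain accuracy because most individuals do not transition, but its balanced accuracy is exactly 50%, so a value of 50% carries no predictive information.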

Our correspondent’s highlights from the symposium are meant as a fair representation of the scientific content presented. The views and opinions expressed on this page do not necessarily reflect those of Lundbeck.


  1. Mechelli A, et al. Drug Discov Today 2015;20:924-7
  2. Howes OD, Murray RM. Lancet 2014;383:1677-87
  3. Deo RC. Circulation 2015;132:1920-30
  4. Antonucci LA, et al. Neuropsychopharmacology 2020;45:613-21
  5. Kambeitz-Ilankovic L, et al. Neurosci Biobehav Rev 2019;107:828-45
  6. Kambeitz J, et al. Neuropsychopharmacology 2015;40:1742-51