Developing multifactorial dementia prediction models using clinical variables from cohorts in the US and Australia



ABSTRACT

Existing dementia prediction models using non-neuroimaging clinical measures have been limited in their ability to identify disease. This study used machine learning to re-examine


the diagnostic potential of clinical measures for dementia. Data was sourced from the Australian Imaging, Biomarkers, and Lifestyle Flagship Study of Ageing (AIBL) and the Alzheimer’s


Disease Neuroimaging Initiative (ADNI). Clinical variables included 21 measures across medical history, hematological and other blood tests, and APOE genotype. Tree-based machine learning


algorithms and artificial neural networks were used. APOE genotype was the best predictor of dementia cases and healthy controls. Our results, however, demonstrated that there are


limitations when using publicly accessible cohort data that may limit the generalizability and interpretability of such predictive models. Future research should examine the use of routine


APOE genetic testing for dementia diagnostics. It should also focus on clearly unifying data across clinical cohorts.

INTRODUCTION

More than 55 million people


worldwide have dementia, a number that is rapidly rising by an additional 10 million cases yearly [1]. Dementia is the leading cause of disability among older adults over the age of 65 and


costs the global economy more than $1.3 trillion USD annually [1]. The timely diagnosis of dementia is essential to ensure that patients and their families can access early support and


interventions, thereby improving quality of life, prolonging independence, and reducing the healthcare burden [2]. Improving dementia diagnostic capabilities and rates, therefore, is an


urgent priority [1, 3]. The recent advent of increased computational power and of ‘big data’ from large clinical cohorts has led to an increase in machine learning models for developing


diagnostic tools to identify dementia. Most current models have relied on neuroimaging from magnetic resonance imaging (MRI) and positron emission tomography (PET) scans [4, 5]. Despite


showing a high level of performance and predictive power, practicality constraints limit the day-to-day clinical usefulness of such models. Importantly, neuroimaging is difficult for many


patients to access due to factors including health insurance status, out-of-pocket costs that limit affordability, and whether the patient lives rurally or in a major metropolitan area [2,


6, 7]. Additionally, waitlists for neuroimaging can be upwards of several months and require specialist services for processing and interpretation, leading to delays in dementia diagnosis


[7]. These issues were further highlighted by the recent Alzheimer’s Association Primary Care Physician Dementia Care Training Survey that found more than half of primary care physicians


feel they do not have the local specialist resources to meet patient demand [8]. In line with these practical constraints, machine learning models based on diagnostic imaging have had


limited utility in real world clinical applications [6]. Primary care physicians are essential for patient triage, diagnosis, and management [2]. Therefore, from a practical perspective,


machine learning-based dementia diagnostic models should focus on the predictive power of easy-to-obtain clinical measures. Indeed, previous studies have used machine learning to look at the


diagnostic utility of routine clinical measures including, for example, the Cardiovascular Risk Factors, Aging, and Dementia (CAIDE) [9], Study on Aging, Cognition and Dementia (AgeCoDe)


[10], Australian National University Alzheimer’s Disease Risk Index (ANU-ADRI) [11], Rapid Assessment of Dementia Risk (RADaR) for older adults [12], and Brief Dementia Screening Indicator


(BDSI) [13]. There are limitations, however, to these models. First, some models used clinical variables that were used to define a dementia patient relative to a healthy control, including


the Mini-Mental State Exam (MMSE) and Clinical Dementia Rating (CDR) scores [14, 15]. In machine learning models, this constitutes “data leakage” where an experimental group label (e.g.


dementia defined by MMSE < 24) is also included as a characteristic (MMSE scores per person) in the model. This leads to an artificial improvement in model performance while


significantly limiting generalizability and interpretability. Second, although these models report a high specificity (identification of true healthy controls) and negative predictive value


(NPV; ratio of true healthy controls to all healthy controls identified), they report very low sensitivity (identification of true dementia patients) and positive predictive value (PPV;


ratio of true dementia patients to all dementia patients identified) [9, 14, 16, 17]. Reported sensitivities and PPVs ranged from 0.1 to 0.47, indicating that models using clinical measures can


readily identify a healthy person but are unable to identify someone with dementia. In direct support of this, a recent study confirmed that these existing dementia prediction models missed


84–91% of patients with incident dementia, therefore demonstrating little, if any, clinical utility for dementia diagnostics [18]. This underscores the need for the development of sensitive


prediction models that identify patients with dementia. In this context, using two large cohorts from Australia (the Australian Imaging Biomarkers and Lifestyle Flagship Study of Ageing;


AIBL) and the US (Alzheimer’s Disease Neuroimaging Initiative; ADNI), the primary aim of this study was to use several different machine learning models to re-examine the diagnostic


potential of easy-to-obtain clinical measures for dementia in older adults over age 65.

METHODS

DATA AND PARTICIPANTS

The data used in the present study was obtained from two publicly


available databases, ADNI and AIBL (https://ida.loni.usc.edu/), and was downloaded on October 10, 2023. All participants provided informed consent. Participants included those


enrolled in ADNI and AIBL who were identified as either a healthy control or diagnosed dementia patient (probable Alzheimer’s disease (AD)) at their baseline visit and assessments. ADNI


participant characteristics have been described elsewhere [19]. In brief, those with a dementia diagnosis were identified as having subjective memory complaints, an MMSE range of 20–26, and


CDR of >0.5 [19]. AIBL participant characteristics have also been described elsewhere [20]. Here, participants were identified as having dementia (probable AD) as defined by National


Institute of Neurological and Communicative Diseases and Stroke/Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) criteria [21]. AIBL study methodology has been reported


previously [20]. Unlike ADNI, AIBL included participants who expressed a concern about their memory function or had memory complaints in response to being asked “Do you have difficulties


with your memory” in the healthy control group. This resulted in approximately 50% of the healthy control group made up of those who had subjective concerns about their memory [20].


Participants in both the ADNI and AIBL cohorts who were diagnosed with mild cognitive impairment (MCI) were excluded from the current study, as MCI does not necessarily progress to dementia.


Prior to undertaking any analysis, we limited both datasets to participants over age 65 to capture older adults who were more likely to have age-associated dementia, rather than early onset.


We found a statistically significant difference in age between the ADNI and AIBL cohorts, with average participant ages of 76 and 74 years, respectively (Wilcoxon test; W = 346983, p = 3.17e-11).


To remove this age bias between the cohorts, we randomly sampled 150 ADNI participants aged 80–96 and removed them from the dataset.


After doing this, the average age was 74 in both the ADNI and AIBL cohorts. Table 1 shows the demographic and clinical characteristics of the


participants included in this study after controlling for age.

FEATURE SELECTION (PREDICTORS)

We first ensured that the features (or predictive variables) used in our models matched across


the ADNI and AIBL cohorts. Common features across both cohorts included apolipoprotein E (APOE) genotype, nine medical history questions, and test results from six hematological and five


blood tests. These features along with the median and interquartile range (IQR) across participants and cohorts are shown in Table 1. Although both cohorts had data for MMSE and CDR, these


were not used as features due to the potential for data leakage and artificial inflation of model performance, as discussed in the Introduction. We also had to remove two variables from our


analyses: thyroid stimulating hormone test (AXT117) and a health history of smoking (MH16SMOK). Both variables were removed because too many participants had missing data. Thyroid


stimulating hormone test results were missing for 45% of the ADNI cohort and smoking history was missing for 42% of the AIBL cohort. For the remaining variables, we imputed the missing


values where required using the group median value. Of note, only ADNI specified the units of measurement of the hematological and other blood tests. It is not clear, therefore, if the units


of measurement are the same across the two cohorts. We therefore treated these variables as reported in their respective publicly available datasets (i.e. without unit conversions).

STATISTICAL ANALYSES

To


thoroughly evaluate the diagnostic potential of easy-to-obtain clinical measures for dementia, we used several machine learning algorithms including tree-based algorithms (classification and


regression trees (CART) [22], random forest [23], gradient boosting machines (GBM) [24], and extreme gradient boosting (XGBoost) [25]) and artificial neural networks. Datasets (ADNI only, AIBL


only, or a combined ADNI and AIBL) were split into a 70% training dataset and a 30% held-out testing dataset. Machine learning models were built, fine-tuned, and validated on the training


datasets using five-fold cross-validation repeated five times. Where required due to class imbalances of the output variable, an oversampling technique was used. Here, the underrepresented


class was randomly resampled to ensure that the algorithms received approximately the same number of examples from each class. For all algorithms, a grid search was used for fine-tuning, where we evaluated all


possible combinations of parameters within the predetermined ranges (see Supplementary Table 1 for hyperparameters). All analyses were done in R (version 4.3.1, R Core Team, 2023) using


the ‘caret’ package [26]. Code is available through A.S.’s GitHub at https://github.com/Art83.

MODEL EVALUATION

The performance of the machine learning models was evaluated using a 30% held-out


dataset. For each model, we report several metrics: sensitivity (proportion of dementia cases correctly identified), positive predictive value (PPV, also known as precision; number of correctly predicted dementia


cases / total number of predicted dementia cases (true and false)), specificity (proportion of healthy controls correctly identified), negative predictive value (NPV; number of correctly predicted healthy controls / total


number of predicted healthy controls (true and false)), and area under the receiver operating characteristic curve (AUC; ability to distinguish dementia cases and healthy controls). The main performance indicators we used in the present study


were sensitivity and PPV. Although AUC is a commonly used metric for evaluating the performance of machine learning models, it only provides limited insight into model performance [27].
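The four count-based metrics defined above can be illustrated with a minimal Python sketch. This is not the authors' R code: the names `ConfusionCounts`, `count_outcomes`, and `diagnostic_metrics`, and the label strings `"dementia"` and `"control"`, are hypothetical and chosen only for this example. It assumes dementia is treated as the positive class.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # dementia cases predicted as dementia
    fp: int  # healthy controls predicted as dementia
    tn: int  # healthy controls predicted as healthy controls
    fn: int  # dementia cases predicted as healthy controls


def count_outcomes(y_true, y_pred, positive="dementia"):
    """Tally the four confusion-matrix cells from true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def diagnostic_metrics(c):
    """Return sensitivity, specificity, PPV and NPV (NaN if a denominator is 0)."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),  # share of true cases found
        "specificity": ratio(c.tn, c.tn + c.fp),  # share of true controls found
        "ppv": ratio(c.tp, c.tp + c.fp),          # share of positive calls that are right
        "npv": ratio(c.tn, c.tn + c.fn),          # share of negative calls that are right
    }
```

Note that the proportion of dementia cases a model misses is 1 − sensitivity, whereas 1 − PPV is the proportion of predicted dementia cases that are actually healthy controls.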


Given that previous dementia prediction models have been unable to identify incident dementia cases [18], we chose to focus specifically on sensitivity and PPV as performance metrics in our


study. Further, using these reduces the likelihood of a dementia case being identified as a healthy control (false negative) and provides the probability that a person with a positive result


indeed has dementia [28]. Where sensitivities and PPVs were similar (<0.2 difference) between two models, we also considered specificity and NPV when identifying the top performing


model, as we acknowledge that dementia prediction models still need reasonable metrics for identifying true healthy controls. To evaluate the relative contribution of features to overall


performance of our models, we performed a SHAP (SHapley Additive exPlanations) analysis. This feature attribution method allowed us to assign an importance value for each input


variable (feature) for our predictions [29], thus demonstrating which variables are the most important for dementia prediction. The SHAP analysis was done in R using the ‘shapviz’ package.

RESULTS

APOE GENOTYPE SHOWS THE HIGHEST UTILITY FOR A DEMENTIA DIAGNOSIS USING A MERGED ADNI-AIBL COHORT

We first merged the ADNI and AIBL cohorts into a single large dataset to test the diagnostic


potential of 21 easy-to-obtain clinical measures for dementia. This merged dataset was then randomly split into a 70% training and validation set and a 30% unseen testing dataset. With APOE


genotype included as a feature, model sensitivity ranged from 0.63 to 0.83 and PPV ranged from 0.69 to 0.82 (Table 2). The neural network had the highest sensitivity of 0.83 and a PPV of


0.73 with a specificity of 0.66 and NPV of 0.78. Specificity and NPV were generally higher than sensitivity and PPV, ranging from 0.66 to 0.86 and 0.73 to 0.78, respectively (Table 2). To


identify which of the 21 features (clinical measures) were contributing the most to our models’ performance, we used a SHAP analysis. This showed that APOE genotype had the highest


contribution to our models’ ability to predict a dementia diagnosis (Fig. 1A, B). Further, the high positive SHAP value indicated that APOE genotype was the most valuable for identifying a


dementia patient (sensitivity and PPV; Fig. 1B). The urea nitrogen test (RCT6) was shown to be a second, albeit lesser, contributor to model performance (Fig. 1A, B). Unlike APOE genotype,


urea nitrogen was more important for identifying a healthy control (specificity and NPV; Fig. 1B). Importantly, the urea nitrogen finding may be a consequence of differences in the median


between ADNI and AIBL (approximately 18 and 35, respectively). We were unable to determine if this may be reflective of unit of measure differences between ADNI and AIBL, as these data were


not reported. To further demonstrate that APOE genotype is essential for the models’ predictive performance, we repeated all analyses but excluded APOE genotype from our selected features


(20 features instead of 21). As expected, performance significantly decreased across all models (Table 2). Without APOE genotype, sensitivity and PPV ranged from 0.57 to 0.73 and 0.64 to


0.77, respectively. Although specificity remained relatively high, ranging from 0.60 to 0.81, NPV generally decreased across the models, with a range of 0.61 to 0.79. Urea nitrogen took the


place of APOE genotype as the top contributor to model performance (Fig. 1C, D). As before, urea nitrogen was more important for identifying a healthy control (specificity and NPV; Fig. 1D),


suggesting that this may contribute to the higher specificity of our models without APOE genotype. Though again, it’s unclear if this is due to median differences between ADNI and AIBL.
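The SHAP attributions discussed in this section build on Shapley values. As a rough illustration of the idea only, the Python sketch below computes exact Shapley values for a tiny, hypothetical scoring function by enumerating feature orderings. It is not the authors' analysis (which used the ‘shapviz’ package in R), and it makes simplifying assumptions: `toy_risk_score`, its coefficients, and the feature names `apoe4_carrier` and `urea_nitrogen_high` are invented for this example, and a single all-zero baseline stands in for the background data that SHAP implementations typically average over.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one instance by enumerating feature orderings.

    Features start at their baseline values and are switched to the instance
    values one at a time; each feature is credited with the average change in
    model output it causes across all orderings.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk_score(features):
    """Hypothetical score with an interaction term (illustrative numbers only)."""
    score = 0.1
    score += 0.5 * features["apoe4_carrier"]
    score += 0.1 * features["urea_nitrogen_high"]
    score += 0.2 * features["apoe4_carrier"] * features["urea_nitrogen_high"]
    return score


instance = {"apoe4_carrier": 1, "urea_nitrogen_high": 1}
baseline = {"apoe4_carrier": 0, "urea_nitrogen_high": 0}
attributions = shapley_values(toy_risk_score, instance, baseline)
```

The attributions always sum to the difference between the model output for the instance and for the baseline, which is what lets a positive value be read as pushing the prediction towards the dementia class. Exhaustive enumeration is only feasible for a handful of features; with 21 features, practical tools rely on approximations or model-specific algorithms.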


MULTIVARIATE DEMENTIA PREDICTION MODELS PERFORM POORLY WHEN USING EITHER THE ADNI OR AIBL DATASETS AS TRAINING AND TESTING

To further identify if our multivariate prediction models of


dementia are generalizable, we separated the dataset back into the ADNI and AIBL cohorts and re-tested our models. In the first experiment, we trained and validated our 21-feature models


(including APOE genotype) on ADNI and then tested them on AIBL and, in the second, we performed the reverse. When we trained our models using the ADNI dataset, they lost dementia diagnostic


utility. Here, our sensitivity substantially decreased to a range of 0.25 to 0.36; however, our PPV remained higher and ranged from 0.62 to 0.78 (Table 3). Specificity, on the other hand, was


very high with a range of 0.90 to 0.93, similar to existing dementia diagnostic models. NPV was lower and highly variable, ranging from 0.47 to 0.97 (Table 3). There were two top performing


models: CART and GBM. Both had sensitivities of 0.35, PPVs of 0.72, specificities of 0.93 and NPVs of 0.73. Interestingly, training our models on the AIBL dataset and testing them on the


ADNI dataset improved their predictive power. In this instance, sensitivity and PPV improved to a range of 0.61 to 0.89 and 0.45 to 0.99, respectively (Table 3). Our specificity and NPV,


however, decreased to a range of 0.37 to 0.61 and 0.47 to 0.97, respectively (Table 3). CART and GBM were again equally the top performing models, with sensitivities of 0.83,


specificities of 0.61, PPVs of 0.71, and NPVs of 0.76. Combined, this suggests that almost all the models, except CART and GBM, have a poor predictive performance when separating the ADNI


and AIBL cohorts.

THE RELATIVE DISTRIBUTIONS OF APOE GENOTYPE ACROSS ADNI AND AIBL DRIVE POOR MODEL PERFORMANCE

We next sought to better understand why our CART and GBM models had an


acceptable level of performance when training and testing on the opposite datasets. To do this, we chose to look at CART performance, specifically, as we can use decision trees to better


understand the features used for class identification. The CART decision tree for both experiments (train ADNI, test AIBL and train AIBL, test ADNI) was identical and showed APOE genotype


was the only feature that it was using to predict whether a participant was a healthy control or a dementia patient (Fig. 2). If the APOE genotype contained at least one APOE4 allele (e4,e2


or e4,e3 or e4,e4) then the models categorized that person as having dementia and, if not (i.e. an APOE genotype of e2,e2 or e2,e3 or e3,e3), then as a healthy control. Despite both of our


CART models using only APOE genotype to determine a healthy control from a dementia patient, it was unclear why performance differed dramatically depending on if we used ADNI or AIBL as the


training or testing dataset (Table 3). Specifically, our metrics in Table 3 showed that the AIBL tested model had a high rate of false positives and the ADNI tested model had a higher rate


of false negatives. This suggested that a difference in the relative distribution of APOE genotypes in healthy controls and dementia patients across the ADNI and AIBL cohorts may be driving


performance differences. To confirm that this was the case, we examined the relative distribution of APOE genotypes in healthy controls and dementia patients in ADNI and AIBL. As shown in


Fig. 3A, AIBL indeed has a high rate of false positives – i.e. the CART misclassified APOE4 allele carriers as having dementia when they were instead healthy controls. ADNI, on the other


hand, had a slightly higher rate of false negatives than AIBL (Fig. 3B), but only for the e3,e2 genotype. Here, the CART misclassified these as being healthy controls when they had dementia.
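The single-split rule described above (at least one APOE4 allele means dementia, otherwise healthy control) can be sketched in Python to show how one fixed rule produces different error profiles when the genotype distribution differs between cohorts. This is an illustration, not the fitted CART model: the function names are hypothetical, and the two small cohorts are invented for the example and do not reproduce ADNI or AIBL counts.

```python
def carries_apoe4(genotype):
    """True if a genotype string such as "e3,e4" contains at least one e4 allele."""
    return "e4" in [allele.strip() for allele in genotype.split(",")]


def predict_from_apoe(genotype):
    """The one-split rule: APOE4 carrier -> dementia, non-carrier -> control."""
    return "dementia" if carries_apoe4(genotype) else "control"


def rule_performance(records):
    """Confusion counts for the rule over (genotype, diagnosis) pairs."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for genotype, diagnosis in records:
        predicted = predict_from_apoe(genotype)
        if predicted == "dementia":
            counts["tp" if diagnosis == "dementia" else "fp"] += 1
        else:
            counts["fn" if diagnosis == "dementia" else "tn"] += 1
    return counts


# Invented cohort in which few healthy controls carry APOE4.
cohort_a = (
    [("e3,e4", "dementia")] * 7 + [("e3,e3", "dementia")] * 3
    + [("e3,e3", "control")] * 8 + [("e3,e4", "control")] * 2
)
# Invented cohort in which many healthy controls carry APOE4.
cohort_b = (
    [("e4,e4", "dementia")] * 4 + [("e2,e3", "dementia")] * 1
    + [("e3,e3", "control")] * 9 + [("e3,e4", "control")] * 6
)

counts_a = rule_performance(cohort_a)
counts_b = rule_performance(cohort_b)
```

In this toy setup the rule itself never changes; only the share of APOE4 carriers among healthy controls and of non-carriers among dementia cases does, and that alone shifts the balance between false positives and false negatives.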


We further confirmed this using the CART confusion matrices. Indeed, when our CART models were tested on AIBL there was a high rate of false positives (i.e. low sensitivity; Fig. 3C). When


they were tested on ADNI, however, there was a higher rate of false negatives (i.e. lower specificity; Fig. 3D).

APOE GENOTYPE PREDICTS DEMENTIA CASES IN ADNI WHEREAS IT PREDICTS HEALTHY CONTROLS IN AIBL

To further investigate the basis for our models’ poor predictive performance when training and testing on opposite datasets, we examined the predictive capabilities within


ADNI and AIBL independently. First, we examined the performance of our 21-feature (including APOE genotype) models when we trained and validated on a 70% split of the ADNI dataset and tested


on a withheld 30% of the ADNI dataset. Our models showed strong dementia predictive performance. Here, depending on the algorithm, our models showed a very high sensitivity ranging from


0.80 to 0.86 and a PPV ranging from 0.65 to 0.75 (Table 4). The two models with the highest performance were the CART and XGBoost with an equal sensitivity of 0.86 and PPV of 0.84 (Table 4).


Our SHAP analysis again demonstrated that APOE genetic testing was the main feature driving our predictive power for identifying incident dementia cases (Fig. 4A, B). We then performed the


same experiment, however this time using 70% of the AIBL dataset to train and validate our models and a withheld 30% of AIBL to test them. These models were unable to predict dementia cases.


Here, the sensitivity ranged from 0.26 to 0.44 (Table 4), suggesting that the models miss most dementia cases. The PPV, however, was higher, ranging from 0.56 to 0.85 (Table 4). Despite


this, and unlike the ADNI dataset, our models had very high specificity that ranged from 0.86 to 0.95 (Table 4). The NPV ranged from 0.45 to 0.75 (Table 4). Again, APOE genotype remained the


strongest predictor that drove model performance (Fig. 4C, D) and urea nitrogen testing came in as a second predictor, albeit to a lesser extent than APOE genotype. Combined, these findings


indicate that APOE genotype is predictive of dementia cases in the ADNI cohort (high sensitivity) whereas it’s predictive of healthy controls in the AIBL cohort (high specificity).


DISCUSSION

Developing diagnostic models of dementia that rely on “easy-to-obtain” clinical measures rather than neuroimages is important for timely diagnoses by primary care physicians.


Using cohort data from the ADNI and AIBL datasets, this study used machine learning to examine the diagnostic potential of 21 clinical measures, including APOE genotype, medical history,


hematological and other blood tests. Using a combined ADNI and AIBL cohort dataset, we showed that artificial neural networks have the best predictive performance. Our sensitivity was 0.83


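and PPV was 0.73, suggesting that our neural network would miss only approximately 17% of incident dementia cases (1 − sensitivity), while approximately 27% of its predicted dementia cases would be false positives (1 − PPV). These metrics showed that our neural network outperformed existing models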


[9, 14, 16, 17, 18] and that it may hold practical utility for dementia diagnostics. APOE genotype was the top performing variable with urea nitrogen the second, albeit lesser, performing


variable. Interestingly, when we separated the ADNI and AIBL cohorts and used them either as training or testing datasets, respectively, our models lost predictive power. When we trained our


models on ADNI and tested them on AIBL, our models’ sensitivity decreased to a clinically unhelpful range of 0.25–0.36. When we performed the reverse, training on AIBL and testing on ADNI,


our sensitivity metrics improved; however, our specificity fell to 0.37–0.61. We identified that these discrepancies were due to differences in the relative distribution of APOE genotype


across ADNI and AIBL. ADNI had a small increase in the number of false negatives due to APOE4 non-carriers being identified as healthy controls when they were instead dementia cases. AIBL,


on the other hand, had a very high number of false positives because APOE4 carriers were being classified as having dementia when they were healthy controls. Therefore, although APOE


genotype was the main predictive variable across all our models, APOE genotype better predicted dementia cases in ADNI whereas it was predictive of healthy controls in AIBL. The reasons for


this discrepancy in APOE genotype are largely unclear. It may be due to class imbalances across the datasets: ADNI had more dementia cases whereas AIBL had more healthy controls. This may


have led to higher sensitivity using ADNI and a higher specificity when using AIBL. It may also be due to differences in participant group allocations between ADNI and AIBL. ADNI reported


that participants with a subjective memory complaint were placed into the dementia group whereas AIBL reported that these participants went into the healthy control group [19, 20]. As


reported in the baseline and methodology characteristics study, this resulted in approximately 50% of the healthy control group in AIBL made up of participants that had a subjective memory


complaint – i.e. said yes in response to being asked “Do you have difficulties with your memory” [20]. Subjective memory complaints are an important consideration in the context of APOE


genotype. Numerous studies have identified that APOE4 carriers with subjective memory complaints or cognitive dysfunction have worse baseline memory [30], memory-guided attention [31], and


episodic memory [32]. Age has also been shown to play a significant role in the relationship between APOE4 and a faster rate of memory decline [30, 31]. Further, some studies have shown that


APOE4 carriers with subjective memory complaints and cognitive dysfunction are similar to people with early mild cognitive impairment (MCI) [33] and are more likely to develop clinical MCI


[34, 35]. People with subjective memory complaints and an APOE4 allele also have changes in the brain that are indicative of MCI and dementia including hippocampal volume changes [32, 36].


One study even found evidence to suggest that subjective memory complaints may be a realistic appraisal of cognitive decline-associated brain changes in people with an APOE4 allele [36]. It


may be the case, therefore, that AIBL contains healthy controls, especially those with an APOE4 allele, that are in the early stages of MCI or dementia. It’s important to note that the


finding that our AIBL-derived models have a high rate of false positives may be due to the data that has been made open source and publicly available. For example, a study using a subgroup


of participants from the AIBL cohort found that subjective memory complaints were indicative of a higher amyloid β burden and APOE4 carrier status [37]. The subjective memory complaint


status of the participants in the publicly available AIBL cohort is not available. We were unable, therefore, to further investigate if re-coding the subjective memory complainers as


dementia cases led to improved model performance in line with what we found in our ADNI-based models. Future research would benefit from further examining this. Large open source and


publicly available dementia datasets are increasingly becoming available and, in line with this, it is becoming easier to pool participants to increase study sample sizes and improve


statistical power. This is particularly important for studies like ours that use machine learning or other artificial intelligence-based methods to unravel complex relationships between


variables for dementia diagnostic and treatment modelling. Our results, however, highlight several limitations of using such cohort data. First, there needs to be increased effort in


ensuring that the data obtained between cohorts is the same. For example, although the test results of the hematological and other blood tests largely looked to be similar across ADNI and


AIBL, there were clear differences in the urea nitrogen results. This is likely due to differences in the units of measurement used (e.g. ng/dL vs. nmol/L); however, without this information


it’s not possible to confirm. Unclear units of measurement also may limit the clinical implementation of these types of models and their predictive features. On the other hand, it might be


the case that urea nitrogen is indeed a potential diagnostic biomarker for dementia. For example, recent work has shown that high blood urea nitrogen was associated with an increased


incidence of dementia [38, 39]. Future work should continue to examine this. There were also many variables missing between the cohorts that may be important. For example, ADNI included


measures like ethnicity, educational attainment, and type of residence that were not included in AIBL, thus potentially limiting the generalizability of our models (e.g. to multiple


ethnicities). Second, our results highlight that it’s important to independently examine the behaviour of different dementia cohorts prior to undertaking studies using merged or unified


datasets. Here, our merged ADNI-AIBL models performed very well, despite there being clear and opposing differences when the cohorts were examined individually. It’s not clear, therefore,


whether the merging of the two datasets produced a cohort that is truly generalizable to the population. For example, merging datasets may lead to the unintentional washing out of important


variables and effects that are relevant to dementia. This may limit the interpretability and generalizability of studies that only use merged cohorts and hold important implications for


future research on predictive dementia models. Irrespective of the differences between the ADNI and AIBL cohorts, APOE genotype remained the strongest predictor of both dementia cases and


healthy controls. In line with the growing body of research, this finding highlights the importance of considering APOE genotype in the context of dementia diagnostics. APOE genetic testing


is a low-cost and readily available assay. In the US, an APOE genetic screen costs $100–$125 USD and can even be done from home and mailed to a diagnostic testing lab. Similarly, APOE


testing in Australia costs $150–$200 AUD and can be done through routine pathology labs. Although the UK’s National Health Service (NHS) doesn’t offer APOE testing, there are services


available offering APOE genetic screens in the context of cardiovascular disease for similar price ranges (£180). Further, APOE genetic results are easy to interpret in the absence of


specialist resources. This highlights that it’s a practical diagnostic tool that primary care physicians can readily use to identify patients with subjective memory complaints who are at a


high risk of dementia. Future research, therefore, should examine the use of routine APOE genetic testing in primary care clinics for the purposes of early identification and diagnosis of


dementia. In conclusion, we have identified that across several easy-to-obtain clinical measures, APOE genotype remains the best predictor of dementia cases relative to healthy controls.


APOE genotype remains a relatively widely available test and consideration should be given to its utility as a routine diagnostic test for dementia. We also highlighted that there are


limitations associated with using publicly available cohort data to generate multivariate dementia prediction models. These limitations warrant further efforts to unify existing and future


dementia cohorts.

DATA AVAILABILITY

Data used in preparation of this article was obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database and the Australian Imaging,


Biomarkers, and Lifestyle Flagship Study of Ageing (AIBL) funded by the Commonwealth Scientific and Industrial Research Organization (CSIRO). Both were made available at the ADNI database


(www.loni.usc.edu/ADNI). REFERENCES
* World Health Organization. Dementia. https://www.who.int/news-room/fact-sheets/detail/dementia. Accessed July 1 2024.
* Liss JL, Assuncao SS, Cummings J, Atri A, Geldmacher DS, Candela SF, et al. Practical recommendations for timely, accurate diagnosis of symptomatic Alzheimer’s disease (MCI and dementia) in primary care: a review and synthesis. J Intern Med. 2021;290:310–34.
* Parker M, Barlow S, Hoe J, Aitken L. Persistent barriers and facilitators to seeking help for a dementia diagnosis: a systematic review of 30 years of the perspectives of carers and people with dementia. Int Psychogeriatr. 2020;32:611–34.
* Pellegrini E, Ballerini L, Hernandez MCV, Chappell FM, Gonzalez-Castro V, Anblagan D, et al. Machine learning of neuroimaging for assisted diagnosis of cognitive impairment and dementia: a systematic review. Alzheimer’s Dementia: Diagnosis, Assess Dis Monit. 2022;10:519–35.
* Javeed A, Dallora AL, Berglund JS, Ali A, Ali L, Anderberg P. Machine learning for dementia prediction: a systematic review and future research directions. J Med Syst. 2023;47:17.
* Leming MJ, Bron EE, Bruffaerts R, Ou Y, Iglesias JE, Gollub RL, et al. Challenges of implementing computer-aided diagnostic models for neuroimages in a clinical setting. npj Digital Med. 2023;6:129.
* Mansfield E, Noble N, Sanson-Fisher R, Mazza D, Bryant J. Primary care physicians’ perceived barriers to optimal dementia care: a systematic review. Gerontologist. 2018;59:697–708.
* Alzheimer’s Association. Alzheimer’s association facts and figures. _Alzheimer’s & Dementia_ 2020:391–460. https://doi.org/10.1002/alz.12068.
* Kivipelto M, Ngandu T, Laatikainen T, Winblad B, Soininen H, Tuomilehto J. Risk score for the prediction of dementia risk in 20 years among middle aged people: a longitudinal, population-based study. Lancet Neurol. 2006;5:735–41.
* Luck T, Riedel-Heller SG, Luppa M, Wiese B, Wollny A, Wagner M, et al. Risk factors for incident mild cognitive impairment - results from the German study on ageing, cognition and dementia in primary care patients (AgeCoDe). Acta Psychiatr Scand. 2010;121:260–72.
* Anstey KJ, Cherbuin N, Herath PM, Qiu C, Kuller LH, Lopez OL, et al. A self-report risk index to predict occurrence of dementia in three independent cohorts of older adults: the ANU-ADRI. PLoS One. 2014;9:e86141.
* Capuano AW, Shah RC, Blanche P, Wilson RS, Barnes LL, Bennett D, et al. Derivation and validation of the rapid assessment of dementia risk (RADaR) for older adults. PLoS One. 2022;17:e0265379.
* Barnes DE, Beiser AS, Lee A, Langa KM, Koyama A, Preis SR, et al. Development and validation of a brief dementia screening indicator for primary care. Alzheimer’s Dement. 2014;10:656–65.
* James C, Ranson JM, Everson R, Llewellyn DJ. Performance of machine learning algorithms for predicting progression to dementia in memory clinic patients. JAMA Netw Open. 2021;4:e2136553.
* So A, Hooshyar D, Park KW, Lim HS. Early diagnosis of dementia from clinical data by machine learning techniques. Appl Sci. 2017;7:651.
* Jessen F, Wiese B, Bickel H, Eifflander-Gorfer S, Fuchs A, Kaduszkiewicz H, et al. Prediction of dementia in primary care patients. PLoS One. 2011;6:e16852.
* Walters K, Hardoon S, Petersen I, Iliffe S, Omar RZ, Nazareth I, et al. Predicting dementia risk in primary care: development and validation of the Dementia Risk Score using routinely collected data. _BMC Med_ 2016;14. https://doi.org/10.1186/s12916-016-0549-y.
* Kivimaki M, Livingston G, Singh-Manoux A, Mars N, Lindbohm JV, Pentti J, et al. Estimating dementia risk using multifactorial prediction models. JAMA Netw Open. 2023;6:e2318132.
* Petersen RC, Aisen PS, Beckett LA, Donohue MC, Gamst AC, Harvey DJ, et al. Alzheimer’s disease neuroimaging initiative (ADNI): clinical characterization. Neurology. 2009;74:201–9.
* Ellis KA, Bush AI, Darby D, De Fazio D, Foster J, Hudson P, et al. The Australian imaging, biomarkers and lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. Int Psychogeriatr. 2009;21:672–87.
* McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA work group under the auspices of department of health and human services task force on Alzheimer’s disease. Neurology. 1984;34:939–44.
* Breiman L, Friedman JH, Olshen RA, Stone CJ. _Classification and Regression Trees_. Wadsworth & Brooks/Cole Advanced Books & Software; 1984. https://doi.org/10.1201/9781315139470.
* Breiman L. Random forests. Mach Learn. 2001;45:5–32.
* Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29:1189–232.
* Chen T, Guestrin C. XGBoost: a scalable tree boosting system. _Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining_ 2016:785–94. https://doi.org/10.1145/2939672.2939785.
* Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28:1–26.
* Roberts M, Hazan A, Dittmer S, Rudd JHF, Schonlieb C-B. The curious case of the test set AUROC. Nat Mach Intell. 2024;6:373–6.
* Trevethan R. Sensitivity, specificity, and predictive values: foundations, pliabilities, and pitfalls in research and practice. Front Public Health. 2017;5:307.
* Lundberg S, Lee S-I. A unified approach to interpreting model predictions. _Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS)_. Long Beach, California; 2017. https://doi.org/10.48550/arXiv.1705.07874.
* Samieri C, Proust-Lima C, Glymour MM, Okereke OI, Amariglio RE, Sperling RA, et al. Subjective cognitive concerns, episodic memory, and the APOE ε4 allele. Alzheimer’s Dement. 2014;10:752–9.
* Zimmerman J, Alain C, Butler C. Impaired memory-guided attention in asymptomatic APOE4 carriers. Sci Rep. 2019;9:8138.
* Striepens N, Scheef L, Wind A, Meiberth D, Popp J, Spottke A, et al. Interaction effects of subjective memory impairment and APOE4 genotype on episodic memory and hippocampal volume. Psychol Med. 2011;41:1997–2006.
* Cho H, Kim Y-E, Chae W, Kim KW, Kim J-W, Kim HJ, et al. Distribution and clinical impact of apolipoprotein E4 in subjective memory impairment and early mild cognitive impairment. Sci Rep. 2020;10:13365.
* Muller-Gerards D, Weimar C, Abramowski J, Tebrugge S, Jokisch M, Dragano N, et al. Subjective cognitive decline, APOE ε4, and incident mild cognitive impairment in men and women. Alzheimer’s Dement. 2019;11:221–30.
* Ali JI, Smart CM, Gawryluk JR. Subjective cognitive decline and APOE ε4: a systematic review. J Alzheimer’s Dis. 2018;65:303–20.
* Stewart R, Godin O, Crivello F, Maillard P, Mazoyer B, Tzourio C, et al. Longitudinal neuroimaging correlates of subjective memory impairment: 4-year prospective community study. Br J Psychiatry. 2011;198:199–205.
* Zwan MD, Villemagne V, Dore V, Buckley R, Bourgeat P, Veljanoski R, et al. Subjective memory complaints in APOE ε4 carriers are associated with high amyloid-β burden. J Alzheimer’s Dis. 2015;49:1115–22.
* Huang Y-Y, Wang H-F, Wu B-S, Ou Y-N, Ma L-Z, Yang L, et al. Clinical laboratory tests and dementia incidence: a prospective cohort study. J Affect Disord. 2024;351:1–7.
* He X-Y, Kuo K, Yang L, Zhang Y-R, Wu B-S, Chen S-D, et al. Serum clinical laboratory tests and risk of incident dementia: a prospective cohort study of 407,190 individuals. Transl Psychiatry. 2022;12:312.


ACKNOWLEDGEMENTS The authors are grateful to ADNI and AIBL for providing the data. AUTHOR INFORMATION Author notes * These authors contributed equally:


Caitlin A. Finney, Artur Shvetcov. AUTHORS AND AFFILIATIONS * Translational Dementia Research Group, Centre for Immunology and Allergy Research, Westmead Institute for Medical Research,


Sydney, NSW, 2145, Australia Caitlin A. Finney & Artur Shvetcov * School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, 2006, Australia


Caitlin A. Finney * Neuroinflammation Research Group, Centre for Immunology and Allergy Research, Westmead Institute for Medical Research, Sydney, NSW, 2145, Australia David A. Brown *


Westmead Clinical School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, 2006, Australia David A. Brown * Department of Immunopathology, Institute for Clinical


Pathology and Medical Research-New South Wales Health Pathology, Sydney, NSW, 2145, Australia David A. Brown * Department of Psychological Medicine, Sydney Children’s Hospitals Network,


Sydney, NSW, 2145, Australia Artur Shvetcov * Discipline of Psychiatry and Mental Health, School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney,


NSW, 2052, Australia Artur Shvetcov CONTRIBUTIONS C.A.F. and


A.S. conceived, designed, and performed the experiments and analyzed the data. C.A.F., A.S., and D.A.B. contributed to the interpretation of the results. C.A.F. drafted the first version of


the manuscript. C.A.F., A.S., and D.A.B. reviewed and edited the manuscript and approved the final version. CORRESPONDING AUTHOR Correspondence to Caitlin A. Finney. ETHICS DECLARATIONS


ETHICS APPROVAL AND CONSENT TO PARTICIPATE This study only used publicly accessible, de-identified data from the ADNI and AIBL cohorts and therefore did not require ethics approval. Ethics


approval for the ADNI and AIBL cohorts was obtained by the relevant study sites and all participants provided informed consent. COMPETING INTERESTS The authors declare no conflict of


interest. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. C.A.F. is supported by philanthropic funding from the


Neil & Norma Hill Foundation, John and Anne Leece Family, Annemarie & Arturo Gandiolo-Fumagalli Foundation, Perpetual Foundation – John Williams Endowment, and Hillcrest Foundation.


Open access funding provided to C.A.F. by the Westmead Institute for Medical Research. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional


claims in published maps and institutional affiliations. SUPPLEMENTARY INFORMATION SUPPLEMENTARY TABLE 1 RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons


Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give


appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission


under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons


licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by


statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


http://creativecommons.org/licenses/by-nc-nd/4.0/. ABOUT THIS ARTICLE CITE THIS ARTICLE Finney, C.A., Brown, D.A. & Shvetcov, A. Developing multifactorial


dementia prediction models using clinical variables from cohorts in the US and Australia. _Transl Psychiatry_ 15, 15 (2025). https://doi.org/10.1038/s41398-025-03247-0 *


Received: 04 July 2024 * Revised: 11 December 2024 * Accepted: 14 January 2025 * Published: 21 January 2025 * DOI: https://doi.org/10.1038/s41398-025-03247-0