Objective: Electronic health records (EHR) are widely available to complement
administrative data-based disease surveillance and healthcare performance
evaluation. However, defining conditions from EHR is labour-intensive and
requires extensive manual labelling of disease outcomes. This study developed an
efficient strategy based on advanced large language models to identify multiple
conditions from EHR clinical notes. Methods: We linked a cardiac registry
cohort in 2015 with an EHR system in Alberta, Canada. We developed a pipeline
that leveraged a generative large language model (LLM) to analyze, understand,
and interpret EHR notes using prompts based on condition-specific diagnoses, treatment
management, and clinical guidelines. The pipeline was applied to detect acute
myocardial infarction (AMI), diabetes, and hypertension. The performance was
compared against clinician-validated diagnoses as the reference standard and
widely adopted International Classification of Diseases (ICD) code-based
methods. Results: The study cohort comprised 3,088 patients and 551,095
clinical notes. The prevalence was 55.4%, 27.7%, and 65.9% for AMI, diabetes,
and hypertension, respectively. The performance of the LLM-based pipeline for
detecting conditions varied: AMI had 88% sensitivity, 63% specificity, and 77%
positive predictive value (PPV); diabetes had 91% sensitivity, 86% specificity,
and 71% PPV; and hypertension had 94% sensitivity, 32% specificity, and 72%
PPV. Compared with ICD codes, the LLM-based method demonstrated improved
sensitivity and negative predictive value across all conditions. Monthly
percentage trends of cases detected by the LLM and by the reference standard
showed consistent patterns.
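
To give a concrete picture of the prompt-based detection step described in the Methods, the following is a minimal Python sketch, not the authors' actual pipeline: the call_llm stand-in, the criteria text, and the prompt wording are illustrative assumptions only.

    # Illustrative sketch: per-condition detection from one clinical note via LLM prompts.
    # The criteria strings below are placeholders, not the study's validated prompt content.
    CRITERIA = {
        "AMI": "acute myocardial infarction per clinical guidelines",
        "diabetes": "diabetes mellitus diagnosis or glucose-lowering treatment",
        "hypertension": "hypertension diagnosis or antihypertensive treatment",
    }

    def build_prompt(note_text: str, condition: str) -> str:
        """Compose a yes/no prompt grounded in diagnosis, treatment, and guideline criteria."""
        return (
            "You are reviewing an EHR clinical note.\n"
            f"Criteria for {condition}: {CRITERIA[condition]}\n"
            f"Note:\n{note_text}\n"
            f"Does this note indicate the patient has {condition}? Answer YES or NO."
        )

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for whichever generative LLM client the pipeline uses."""
        raise NotImplementedError("plug in an LLM client here")

    def detect_conditions(note_text: str) -> dict[str, bool]:
        """Flag each condition for a single note based on the model's YES/NO answer."""
        return {
            condition: call_llm(build_prompt(note_text, condition)).strip().upper().startswith("YES")
            for condition in CRITERIA
        }
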
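The sensitivity, specificity, PPV, and negative predictive value reported above follow the standard two-by-two definitions against the clinician-validated reference standard; a small helper illustrating those definitions (not code from the study) might look like this:

    def diagnostic_metrics(predicted: list[bool], reference: list[bool]) -> dict[str, float]:
        """Sensitivity, specificity, PPV, and NPV from predicted vs. reference-standard labels.
        Assumes each cell of the 2x2 table is non-empty."""
        tp = sum(p and r for p, r in zip(predicted, reference))
        tn = sum(not p and not r for p, r in zip(predicted, reference))
        fp = sum(p and not r for p, r in zip(predicted, reference))
        fn = sum(not p and r for p, r in zip(predicted, reference))
        return {
            "sensitivity": tp / (tp + fn),  # proportion of true cases detected
            "specificity": tn / (tn + fp),  # proportion of non-cases correctly ruled out
            "ppv": tp / (tp + fp),          # precision of positive calls
            "npv": tn / (tn + fn),          # reliability of negative calls
        }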