PURPOSE
Clinicians need accurate and timely information on the impact of treatments on patient outcomes. The electronic health record (EHR) offers the potential for insight into real-world patient experiences and outcomes, but it is difficult to tap into. Our goal was to apply artificial intelligence technology to the EHR to characterize the clinical course of patients with stage III breast cancer.
PATIENTS AND METHODS
Data from patients with stage III breast cancer who presented between 2013 and 2015 were extracted from the EHR, de-identified, and imported into the IBM Cloud. Specialized natural language processing (NLP) annotators were developed to extract medical concepts from unstructured clinical text and transform them into structured attributes. In the validation phase, these annotators were applied to 19 additional patients with stage III breast cancer from the same period. The resulting data were compared with those in the medical chart (gold standard) for nine key indicators.
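To illustrate what turning unstructured clinical text into structured attributes can look like, the following is a minimal sketch of a rule-based annotator for receptor status. It is not the annotator used in this study, whose design is not described here; the pattern, the function name `extract_receptor_status`, and the sample note are all hypothetical.

```python
import re

# Hypothetical pattern: a receptor mention, an optional connector,
# then the word "positive" or "negative". Real clinical text is far
# more varied than this sketch handles.
RECEPTOR_PATTERN = re.compile(
    r"\b(?P<receptor>ER|PR|estrogen receptor|progesterone receptor)\b"
    r"\s*(?:is|was|:|-)?\s*"
    r"(?P<status>positive|negative)\b",
    re.IGNORECASE,
)

# Map every surface form to one canonical attribute name.
CANONICAL = {
    "er": "ER",
    "estrogen receptor": "ER",
    "pr": "PR",
    "progesterone receptor": "PR",
}


def extract_receptor_status(note):
    """Return a dict such as {"ER": "positive"} for mentions found in a note."""
    found = {}
    for match in RECEPTOR_PATTERN.finditer(note):
        receptor = CANONICAL[match.group("receptor").lower()]
        found[receptor] = match.group("status").lower()
    return found


# Made-up example note, not taken from any patient record.
sample = "Pathology: estrogen receptor was positive, PR-negative. Stage IIIA."
print(extract_receptor_status(sample))
```

A production annotator would also need to handle negation, uncertainty, abbreviations, and conflicting mentions across notes, which this sketch deliberately ignores.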
RESULTS
Information was extracted for 50 patients, including tumor stage (94% stage IIIA, 6% stage IIIB), age (28% 50 years or younger, 52% between 51 and 70 years, and 24% older than 70 years), receptor status (84% estrogen receptor positive, 74% progesterone receptor positive), and first treatment (72% surgery, 26% chemotherapy, 2% endocrine). Events in the patient’s journey were compiled to create a timeline. For 171 data elements, NLP and the chart disagreed for 41 (24%; 95% CI, 17.8% to 31.1%). With additional manipulation using simple logic, the disagreement was reduced to six elements (3.5%; 95% CI, 1.3% to 7.5%; F1 statistic, 0.9694).
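The disagreement rates above are binomial proportions (41 of 171 and 6 of 171), so their confidence intervals can be recomputed from the counts. The method used for the reported intervals is not stated; the sketch below assumes an exact (Clopper-Pearson) interval, solved by bisection with the standard library only. The function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(func, target, iterations=100):
    """Find p in [0, 1] with func(p) == target, for func increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2,
    # i.e. P(X > k) equals 1 - alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


# Counts taken from the text: disagreements before and after simple logic.
for disagreements in (41, 6):
    low, high = clopper_pearson(disagreements, 171)
    rate = disagreements / 171
    print(f"{disagreements}/171 = {rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The F1 statistic is not recomputed here because it depends on the true-positive, false-positive, and false-negative counts, which are not given in the text.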
CONCLUSION
It is possible to extract, read, and combine data from the EHR to view the patient journey. The agreement between NLP and the gold standard was high, which supports the validity of the extracted data.