Validation of a natural language processing algorithm to identify adenomas and measure adenoma detection rates across a health system: a population-level study

abstract

BACKGROUND AND AIMS: Measuring adenoma detection rates (ADRs) at the population level is challenging because pathology reports are often written in an unstructured format and reporting methods vary significantly across institutions. Natural language processing (NLP) can be used to extract relevant information from text-based records. We aimed to develop and validate an NLP algorithm to identify colorectal adenomas that could be used to report ADR at the population level in Ontario, Canada. METHODS: The sampling frame included pathology reports from all colonoscopies performed in Ontario in 2015 and 2016. Two random samples of 450 and 1000 reports were selected as the training and validation sets, respectively. Expert clinicians reviewed each report and classified it as adenoma or other. The training set was used to develop an NLP algorithm to identify adenomas, which was then evaluated using the validation set. The test characteristics of the NLP algorithm were calculated using expert review as the reference. We used the algorithm to measure ADR for all endoscopists in Ontario in 2019. RESULTS: The 1450 pathology reports were derived from 62 laboratories, 266 pathologists, and 532 endoscopists. In the training set, the NLP algorithm for any adenoma had a sensitivity of 99.60% (95% confidence interval [CI], 97.77-99.99), specificity of 99.01% (95% CI, 96.49-99.88), positive predictive value of 99.19% (95% CI, 97.12-99.90), and F1 score of 0.99. Similar results were obtained for the validation set. The median ADR was 33% (interquartile range, 26%-40%). CONCLUSIONS: In a population-based sample from Ontario, our NLP algorithm was highly accurate and was used at the system level to measure ADR.
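
The test characteristics named in the abstract (sensitivity, specificity, positive predictive value, F1 score) and the per-endoscopist ADR can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the record layout, and all counts below are hypothetical and chosen only for illustration, and ADR is taken here in its usual sense as the share of an endoscopist's colonoscopies with at least one adenoma.

```python
from collections import defaultdict
from statistics import median


def classification_metrics(tp, fp, fn, tn):
    """Test characteristics from confusion-matrix counts,
    with expert review as the reference standard."""
    sensitivity = tp / (tp + fn)   # adenomas the algorithm found
    specificity = tn / (tn + fp)   # non-adenomas correctly rejected
    ppv = tp / (tp + fp)           # flagged reports that are true adenomas
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
    }


def adr_by_endoscopist(records):
    """ADR per endoscopist from (endoscopist_id, has_adenoma) pairs,
    one pair per colonoscopy (hypothetical record layout)."""
    total = defaultdict(int)
    with_adenoma = defaultdict(int)
    for endoscopist_id, has_adenoma in records:
        total[endoscopist_id] += 1
        if has_adenoma:
            with_adenoma[endoscopist_id] += 1
    return {eid: with_adenoma[eid] / total[eid] for eid in total}


if __name__ == "__main__":
    # Made-up counts, NOT the study's data.
    print(classification_metrics(tp=90, fp=5, fn=10, tn=95))

    # Made-up colonoscopy records, NOT the study's data.
    records = [
        ("A", True), ("A", False),
        ("B", True), ("B", True), ("B", False), ("B", False),
        ("C", False), ("C", True), ("C", True), ("C", True),
    ]
    rates = adr_by_endoscopist(records)
    print(rates, "median ADR:", median(rates.values()))
```

In this sketch the algorithm's adenoma flag would supply `has_adenoma` for each colonoscopy, and the median and interquartile range are then summary statistics over the per-endoscopist rates.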

authors

Tinmouth, Jill
Swain, Deepak
Chorneyko, Katherine
Lee, Vicki
Bowes, Barbara
Li, Yingzi
Gao, Julia
Morgan, David

status

published

publication date

January 2023

has subject area

1103 Clinical Sciences (FoR)
Adenoma (MeSH)
Algorithms (MeSH)
Colonoscopy (MeSH)
Gastroenterology & Hepatology (Science Metrix)
Humans (MeSH)
Natural Language Processing (MeSH)
Ontario (MeSH)

published in

Gastrointestinal Endoscopy Journal

