Automatically quantifying the scientific quality and sensationalism of news records mentioning pandemics: validating a maximum entropy machine-learning model

OBJECTIVE: To develop and validate a method for automatically quantifying the scientific quality and sensationalism of individual news records. STUDY DESIGN: After retrieving 163,433 news records mentioning the Severe Acute Respiratory Syndrome (SARS) and H1N1 pandemics, a maximum entropy model for inductive machine learning was used to identify relationships among 500 randomly sampled news records that correlated with systematic human assessments of their scientific quality and sensationalism. These relationships were then computationally applied to automatically classify 10,000 additional randomly sampled news records. The model was validated by randomly sampling 200 records and comparing human assessments of them to the computer assessments. RESULTS: The computer model correctly assessed the relevance of 86% of news records, the quality of 65% of records, and the sensationalism of 73% of records, as compared to human assessments. Overall, the scientific quality of SARS and H1N1 news media coverage had potentially important shortcomings, but coverage was not too sensationalizing. Coverage slightly improved between the two pandemics. CONCLUSION: Automated methods can evaluate news records faster, cheaper, and possibly better than humans. The specific procedure implemented in this study can at the very least identify subsets of news records that are far more likely to have particular scientific and discursive qualities.

Automatically quantifying the scientific quality and sensationalism of news records mentioning pandemics: validating a maximum entropy machine-learning model Journal Articles