Public health surveillance of changes in attitudes towards cancer risk factors during the COVID-19 pandemic: Sentiment and emotion analysis of Twitter data (Preprint)




    The COVID-19 pandemic and the associated public health mitigation strategies dramatically changed patterns of daily life worldwide. Public health restrictions during the pandemic had unintended consequences for chronic disease risk factors. Cancer is a leading chronic disease worldwide with several known modifiable risk factors, including smoking, alcohol use, poor nutrition, and physical inactivity.


    The objective of this study was to conduct sentiment and emotion analyses of Twitter data to evaluate how attitudes towards four cancer risk factors (physical inactivity, poor nutrition, alcohol, and smoking) changed over the first year of the COVID-19 pandemic.


    Tweets posted during 2020 relating to COVID-19 and the four cancer risk factors were extracted from the George Washington University Libraries’ Dataverse. The tweets were then categorized and filtered using keywords to create four separate datasets, one for each risk factor. We trained and tested a machine learning classifier using a pre-labelled Twitter dataset. The trained classifier was then applied to assign a sentiment (positive, negative, or neutral) to each tweet. A natural language processing package was used to identify the emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) associated with the words contained in the tweets. Sentiments and emotions related to each risk factor were evaluated over time, and word clouds were generated to identify common keywords.
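    The keyword filtering and word-based emotion steps described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the abstract does not list the study's keywords or name the natural language processing package, so `RISK_KEYWORDS`, `TOY_EMOTION_LEXICON`, and the helper functions are hypothetical stand-ins rather than the authors' method. The trained sentiment classifier is not reproduced.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the abstract does not give the study's actual terms.
RISK_KEYWORDS = {
    "physical_activity": {"exercise", "workout", "gym"},
    "nutrition": {"diet", "nutrition", "vegetables"},
    "alcohol": {"alcohol", "beer", "wine"},
    "smoking": {"smoking", "cigarette", "vaping"},
}

# Toy word-to-emotion lexicon, for illustration only (not the study's package).
TOY_EMOTION_LEXICON = {
    "love": {"joy", "trust"},
    "afraid": {"fear"},
    "miss": {"sadness"},
    "healthy": {"trust"},
    "celebrate": {"joy", "anticipation"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def filter_by_risk_factor(tweets):
    """Group tweets into one list per risk factor based on keyword matches."""
    datasets = {factor: [] for factor in RISK_KEYWORDS}
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        for factor, keywords in RISK_KEYWORDS.items():
            if tokens & keywords:  # at least one keyword appears in the tweet
                datasets[factor].append(tweet)
    return datasets


def count_emotions(tweets):
    """Tally the emotions linked to each lexicon word found in the tweets."""
    counts = Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            counts.update(TOY_EMOTION_LEXICON.get(token, ()))
    return counts
```

    In this sketch a tweet that mentions keywords from two risk factors lands in both datasets; whether the study handled overlaps this way is not stated in the abstract.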


    The sentiment analysis revealed that 57% of tweets about physical activity were positive, 16% were negative, and 27% were neutral (n=90,813 tweets). Similar patterns were observed for nutrition, where 55%, 16%, and 29% of tweets were classified as positive, negative, and neutral, respectively (n=50,396 tweets). For alcohol, the proportions of positive, negative, and neutral tweets were 47%, 23%, and 30%, respectively (n=74,484 tweets), and for smoking they were 41%, 24%, and 35% (n=28,220 tweets). The sentiments were relatively stable over time. Results from the emotion analysis suggested that the most common emotion expressed in physical activity and nutrition tweets was trust, whereas for alcohol the most common emotion was joy and for smoking it was fear. The emotions expressed remained relatively constant over the observed time period. The word clouds provided further insight into common themes for some of the risk factors and revealed possible sources of bias.
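    To make the reported proportions concrete, the following Python sketch converts them into approximate tweet counts per sentiment. The totals and percentages are taken from the paragraph above; the names `REPORTED` and `approximate_counts` are hypothetical. Because the published percentages are rounded, the resulting counts are estimates, not the study's exact figures.

```python
# Totals and (positive, negative, neutral) percentages as reported in the abstract.
REPORTED = {
    "physical activity": (90813, (57, 16, 27)),
    "nutrition": (50396, (55, 16, 29)),
    "alcohol": (74484, (47, 23, 30)),
    "smoking": (28220, (41, 24, 35)),
}


def approximate_counts(total, percentages):
    """Estimate per-sentiment tweet counts from rounded percentages."""
    return tuple(round(total * p / 100) for p in percentages)


if __name__ == "__main__":
    for factor, (total, shares) in REPORTED.items():
        positive, negative, neutral = approximate_counts(total, shares)
        print(f"{factor}: ~{positive} positive, ~{negative} negative, "
              f"~{neutral} neutral (n={total})")
```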


    The results of this analysis provide insight into attitudes towards cancer risk factors as expressed on Twitter during the first year of the COVID-19 pandemic. Overall, positive was the most common sentiment for all four risk factors, while the predominant emotions differed across the datasets. While these results can play a role in promoting public health, more work is needed to understand how analyses of this kind can be translated into meaningful data that inform public health interventions in a timely manner.



publication date

  • March 5, 2023