Abstract

People often struggle to understand scientific texts, which leads to miscommunication and to inaccurate or even sensationalistic reports of research. A better understanding of the factors that affect comprehension would help identify what improves public understanding of science. In this study, we generate features from scientific text that represent some common text structures and use them to predict the semantic similarity between the scientific text and the content that members of the general public post online about that same text. We built regression models for this purpose and evaluated them by their R-squared values and mean squared errors. We observed R-squared values as high as 0.73, suggesting a strong relationship between certain textual features and the public’s understanding of science.
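As a rough illustration of the evaluation described above, the sketch below fits a one-feature least-squares regression and computes R-squared and mean squared error in plain Python. It is not the authors' pipeline: the feature name `avg_sentence_length`, the toy values, and the choice of simple linear regression are all assumptions made for illustration, and the numbers are not results from the paper.

```python
# Minimal sketch: regress a similarity score on one text feature,
# then score the fit with MSE and R-squared. All data is hypothetical.

def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical feature (assumed name) and made-up similarity scores.
avg_sentence_length = [12.0, 15.0, 18.0, 21.0, 24.0]
similarity = [0.80, 0.72, 0.66, 0.55, 0.50]

slope, intercept = fit_simple_linear(avg_sentence_length, similarity)
predicted = [slope * x + intercept for x in avg_sentence_length]

print("MSE:", mean_squared_error(similarity, predicted))
print("R-squared:", r_squared(similarity, predicted))
```

A real study would likely use several features and held-out data; here the scores are computed on the same points used for fitting, which keeps the sketch short but overstates how well a model generalizes.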

Authors: Harish Varma Siravuri, Akhil Pandey Akella, Christian Bailey, Hamed Alhoori

Paper: https://dl.acm.org/doi/abs/10.1145/3197026.3203890

Code: https://github.com/harishsiravuri/public_understanding