Finding Other Sources: Using Machine Learning and Thematic Analysis to Extract Information about Diabetes Self-Management from Open-Source Data
Background
The volume of health-related online content is steadily increasing due to the proliferation of m-health applications and their growing use. This content, which is publicly available on open-data platforms, may be used to support public health.
Objective
This paper’s main aim is to explore whether topic modeling techniques can extract topics from user reviews of diabetes applications that are comparable to those found in a conventional diabetes self-management questionnaire.
Method
Using Python (Google Play Scraper) and a set of predefined search terms ("Blood Sugar", "Diabetes", and "Glucose"), we extracted a total of 153.3k user reviews of diabetes apps, along with related metadata such as user ratings and timestamps. We performed sentiment analysis on the dataset and analyzed the collected reviews using single-word (unigram) and two-word (bigram) frequencies. To classify the topics discussed in the reviews, we used Latent Dirichlet Allocation (LDA) for topic modeling. Lastly, we mapped the user reviews to the identified topics using a string-matching technique and measured the interaction rate per topic.
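Two of the steps described above, n-gram frequency counting and string-matching of reviews to topics, can be illustrated with a minimal sketch. This is not the author's code: the sample reviews, the topic names, and the keyword lists are hypothetical placeholders, and the keyword lists merely stand in for terms that would come from the LDA output. The abstract does not define "interaction rate", so the sketch reports only the share of reviews matched to each topic, which is an assumption made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample reviews; the thesis dataset is not reproduced here.
REVIEWS = [
    "Great app to track my blood sugar every morning",
    "Blood sugar log is easy but the sync keeps failing",
    "Love the insulin reminder and the blood sugar chart",
    "Sync with my glucose meter stopped after the update",
]

# Hypothetical topic keywords, standing in for terms taken from LDA topics.
TOPIC_KEYWORDS = {
    "glucose tracking": ["blood sugar", "glucose"],
    "device sync": ["sync"],
    "medication reminders": ["insulin", "reminder"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(reviews, n):
    """Count n-grams (n=1 for unigrams, n=2 for bigrams) across all reviews."""
    counts = Counter()
    for review in reviews:
        tokens = tokenize(review)
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


def map_reviews_to_topics(reviews, topic_keywords):
    """Assign a review index to every topic whose keyword occurs as a substring."""
    mapping = {topic: [] for topic in topic_keywords}
    for idx, review in enumerate(reviews):
        text = review.lower()
        for topic, keywords in topic_keywords.items():
            if any(keyword in text for keyword in keywords):
                mapping[topic].append(idx)
    return mapping


def topic_share(mapping, total_reviews):
    """Fraction of all reviews matched to each topic (an assumed metric)."""
    return {topic: len(ids) / total_reviews for topic, ids in mapping.items()}


unigrams = ngram_counts(REVIEWS, 1)
bigrams = ngram_counts(REVIEWS, 2)
mapping = map_reviews_to_topics(REVIEWS, TOPIC_KEYWORDS)
shares = topic_share(mapping, len(REVIEWS))

print(bigrams.most_common(3))
print(shares)
```

Plain substring matching is used here because it is the simplest form of string matching; a review can therefore be assigned to several topics at once, and short keywords may also match inside longer words.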
Results
Our analysis identified a total of 15 topics in the user reviews of diabetes applications, which were grouped into two main themes: "Self-Monitoring of Diabetes" and "Diabetes Self-management Technology". These topics were compared to the four categories found in the Diabetes Self-Management (DSM) questionnaire, with eight of the user-review topics correlating with those four categories.
Conclusion
The dataset extracted from user reviews of diabetes apps was used to generate topics similar to those collected from the DSM questionnaire, along with some additional data. These findings suggest that analyzing open-source data to assess general trends and behaviors in chronically ill populations is a highly feasible choice that could allow for continuous monitoring at a reduced cost.
Language
- English
Publication Year
- 2021
License statement
© The author. The author has granted HBKU and Qatar Foundation a non-exclusive, worldwide, perpetual, irrevocable, royalty-free license to reproduce, display and distribute the manuscript in whole or in part in any form to be posted in digital or print format and made available to the public at no charge. Unless otherwise specified in the copyright statement or the metadata, all rights are reserved by the copyright holder. For permission to reuse content, please contact the author.
Institution affiliated with
- Hamad Bin Khalifa University
- College of Science and Engineering - HBKU
Degree Date
- 2021
Degree Type
- Master's