Manara - Qatar Research Repository

Automatically Evaluating the Credibility of Health News and Measuring Its Engagement on Social Media

thesis
submitted on 2025-02-20, 10:40 and posted on 2025-02-20, 10:41 authored by Asma Hamid El Fadl

Background

Online sources covering a variety of topics, including health-related information, are growing at a tremendous speed. It is important that users are aware of the authenticity and credibility of the information shared on social media and other online sources. However, manually appraising the credibility of this information is not feasible due to its sheer volume; therefore, it is important to find automatic methods to estimate its credibility.

Aim

The aim of this thesis is to develop machine learning models to estimate the credibility of health news in social media.

Methods

We downloaded the data from a public repository (Fake Health) that contains two separate datasets, Health Release and Health Story. Both consist of Web pages that were shared on Twitter. These Web pages were manually labelled against a 10-point checklist. The dataset also included the tweets, retweets, and replies in which these Web pages were shared. We used these manually labelled data to evaluate the performance of our classifiers. We used the H2O Python library to evaluate a set of classifiers and compared them using accuracy, AUC, specificity, and sensitivity. Additionally, we measured the engagement of the Web pages shared on social media through the number of tweets, retweets, replies, and likes, and through potential exposure.
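The four comparison metrics named above can be illustrated with a minimal sketch in plain Python. This is not the thesis code and does not use H2O; the function names, the 0.5 decision threshold, and the convention that label 1 is the positive class are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of actual positives that were detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of actual negatives that were rejected."""
    return tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example receives
    a higher score than a randomly chosen negative one; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Turn classifier scores into hard 0/1 predictions (cutoff is an assumed default)."""
    return [1 if s >= cutoff else 0 for s in scores]
```

Accuracy, sensitivity, and specificity depend on the chosen cutoff, whereas AUC is computed from the raw scores and summarises performance across all cutoffs, which is one reason to report both kinds of measure.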

Results

The highest-performing classifier achieved 93% accuracy for Health Release and 87% for Health Story. The AUC for the highest-performing classifier was 87% for Health Release and 90% for Health Story. Specificity and sensitivity were 95% and 85%, respectively, for Health Release and 94% and 89% for Health Story. We found that low-credibility Web pages were posted more frequently on Twitter than high-credibility Web pages.
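A comparison of engagement between low- and high-credibility Web pages could be computed along the following lines. This is a sketch under stated assumptions, not the procedure used in the thesis: the `PageEngagement` record, its field names, and the "low"/"high" labels are hypothetical, and potential exposure is assumed here to be the sum of follower counts of the accounts that shared a page, which the abstract does not define.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class PageEngagement:
    """Engagement counts for one Web page (hypothetical record layout)."""
    label: str                  # credibility label, e.g. "low" or "high"
    tweets: int
    retweets: int
    replies: int
    likes: int
    follower_counts: List[int]  # followers of each account that shared the page


def potential_exposure(page: PageEngagement) -> int:
    """Assumed definition: total followers of all accounts that shared the page."""
    return sum(page.follower_counts)


def mean_engagement_by_label(pages: Sequence[PageEngagement]) -> Dict[str, Dict[str, float]]:
    """Average each engagement measure per page, grouped by credibility label."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for page in pages:
        counts[page.label] += 1
        for name in ("tweets", "retweets", "replies", "likes"):
            totals[page.label][name] += getattr(page, name)
        totals[page.label]["potential_exposure"] += potential_exposure(page)
    return {
        label: {name: total / counts[label] for name, total in measures.items()}
        for label, measures in totals.items()
    }
```

Averaging per page rather than summing per group keeps the comparison fair when the two credibility groups contain different numbers of pages.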

Conclusion

The results indicate that it is possible to estimate the credibility of Web pages automatically using machine-learning-based methods. Users tend to disseminate low-credibility Web pages more often than high-credibility Web pages.

Language

  • English

Publication Year

  • 2021

License statement

© The author. The author has granted HBKU and Qatar Foundation a non-exclusive, worldwide, perpetual, irrevocable, royalty-free license to reproduce, display and distribute the manuscript in whole or in part in any form to be posted in digital or print format and made available to the public at no charge. Unless otherwise specified in the copyright statement or the metadata, all rights are reserved by the copyright holder. For permission to reuse content, please contact the author.

Institution affiliated with

  • Hamad Bin Khalifa University
  • College of Science and Engineering - HBKU

Degree Date

  • 2021

Degree Type

  • Master's

Advisors

Zubair Shah

Committee Members

Raian Ali ; Yusuf Bicer ; Tanvir Alam ; Dena Al-Thani

Department/Program

College of Science and Engineering
