A Comprehensive Overview of the COVID-19 Literature: Machine Learning–Based Bibliometric Analysis

Abd-Alrazaq, Alaa; Schneider, Jens; Mifsud, Borbala; Alam, Tanvir; Househ, Mowafa; Hamdi, Mounir; Shah, Zubair

doi:10.2196/23703

journal contribution

submitted on 2024-05-06, 10:25 and posted on 2024-05-06, 10:25 authored by Alaa Abd-Alrazaq, Jens Schneider, Borbala Mifsud, Tanvir Alam, Mowafa Househ, Mounir Hamdi, Zubair Shah

Background

Shortly after the emergence of COVID-19, researchers rapidly mobilized to study numerous aspects of the disease such as its evolution, clinical manifestations, effects, treatments, and vaccinations. This led to a rapid increase in the number of COVID-19–related publications. Identifying trends and areas of interest using traditional review methods (eg, scoping and systematic reviews) for such a large domain area is challenging.

Objective

We aimed to conduct an extensive bibliometric analysis to provide a comprehensive overview of the COVID-19 literature.

Methods

We used the COVID-19 Open Research Dataset (CORD-19), which consists of a large number of research articles related to all coronaviruses. We used a machine learning–based method to analyze the most relevant COVID-19–related articles and extract the most prominent topics. Specifically, we used a clustering algorithm to group published articles based on the similarity of their abstracts, which allowed us to identify research hotspots and current research directions. We have made our software accessible to the community via GitHub.
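As an illustration only, grouping abstracts by textual similarity can be sketched as follows. The abstract above does not name the clustering algorithm or the text representation, so the TF-IDF weighting, the cosine-similarity k-means variant, the farthest-first initialization, and every function name here are assumptions made for this example. The authors' actual implementation is the software they released on GitHub.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * math.log((1 + n) / (1 + doc_freq[t])) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two normalized sparse vectors (their cosine similarity)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    """Normalized sum of sparse vectors, used as a cluster centroid."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def cluster_abstracts(docs, k, max_iterations=20):
    """Assign each abstract to one of k clusters (hypothetical helper)."""
    vectors = tfidf_vectors(docs)
    # Deterministic farthest-first initialization (an assumption of this sketch).
    centroids = [vectors[0]]
    while len(centroids) < k:
        farthest = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(farthest)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


if __name__ == "__main__":
    toy_abstracts = [
        "vaccine trial shows immune response to vaccine",
        "immune response after vaccine dose in trial",
        "epidemic model predicts transmission spread",
        "transmission spread estimated by epidemic model",
    ]
    print(cluster_abstracts(toy_abstracts, k=2))
```

On a real corpus such as CORD-19, the number of clusters and the text preprocessing would need tuning. The toy abstracts here only show that documents with a shared vocabulary end up in the same group.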

Results

Of the 196,630 publications retrieved from the database, we included 28,904 in our analysis. The mean number of weekly publications was 990 (SD 789.3). The country that published the highest number of COVID-19–related articles was China (2950/17,270, 17.08%). The highest number of articles was published in bioRxiv. Lei Liu, affiliated with the Southern University of Science and Technology in China, published the highest number of articles (n=46). Based on titles and abstracts alone, we were able to identify 1515 surveys, 733 systematic reviews, 512 cohort studies, 480 meta-analyses, and 362 randomized controlled trials. We identified 19 different topics covered among the publications reviewed. The most dominant topic was public health response, followed by clinical care practices during the COVID-19 pandemic, clinical characteristics and risk factors, and epidemic models for its spread.
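The abstract does not say how study types were recognized from titles and abstracts. One plausible approach, shown here purely as an assumption, is keyword matching with regular expressions. The patterns, the `STUDY_PATTERNS` table, and both function names are hypothetical and are not taken from the authors' software.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; a real pipeline would likely need more variants.
STUDY_PATTERNS = {
    "survey": re.compile(r"\bsurveys?\b", re.I),
    "systematic review": re.compile(r"\bsystematic reviews?\b", re.I),
    "cohort study": re.compile(r"\bcohorts?\b", re.I),
    "meta-analysis": re.compile(r"\bmeta-?analys[ie]s\b", re.I),
    "randomized controlled trial": re.compile(
        r"\brandomi[sz]ed control(?:led)? trials?\b", re.I
    ),
}


def tag_study_types(title, abstract=""):
    """Return the study types whose keywords appear in the title or abstract."""
    text = f"{title} {abstract}"
    return [name for name, pattern in STUDY_PATTERNS.items() if pattern.search(text)]


def count_study_types(records):
    """Count study types over an iterable of (title, abstract) pairs."""
    counts = Counter()
    for title, abstract in records:
        counts.update(tag_study_types(title, abstract))
    return counts
```

A single publication can match several patterns (for example, a systematic review with a meta-analysis), so counts produced this way can sum to more than the number of publications.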

Conclusions

We provide an overview of the COVID-19 literature and have identified current hotspots and research directions. Our findings can help the research community prioritize research needs and recognize leading COVID-19 researchers, institutes, countries, and publishers. Our study shows that an AI-based bibliometric analysis has the potential to rapidly explore a large corpus of academic publications during a public health crisis. We believe that this work can be used to analyze other eHealth-related literature to help clinicians, administrators, and policy makers obtain a holistic view of the literature and categorize different topics of the existing research for further analyses. It can be further scaled (for instance, in time) to clinical summary documentation. Publishers should avoid noise in the data by developing a way to trace the evolution of individual publications and unique authors.

Other Information

Published in: Journal of Medical Internet Research
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.2196/23703

Language

English

Publisher

JMIR Publications

Publication Year

2021

License statement

This Item is licensed under the Creative Commons Attribution 4.0 International License

Institution affiliated with

Hamad Bin Khalifa University
College of Science and Engineering - HBKU
College of Health and Life Sciences - HBKU

Keywords

novel coronavirus disease; COVID-19; SARS-CoV-2; 2019-nCoV; bibliometric analysis; literature; machine learning; research; review
