Manara - Qatar Research Repository

Avey: An Accurate AI Algorithm for Self-Diagnosis

preprint
submitted on 2025-02-03, 11:16, and posted on 2025-05-05, 06:12; authored by Mohammad Hammoud, Shahd Douglas, Mohamad Darmach, Sara Alawneh, Swapnendu Sanyal, Youssef Kanbour

Objectives

To present our AI-based symptom checker, rigorously measure its accuracy, and compare it against existing popular symptom checkers and seasoned primary care physicians.

Design

Vignette study.

Setting

400 gold-standard primary care vignettes.

Intervention/Comparator

We utilized 7 standard accuracy metrics for evaluating the performance of 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 out of 7 independent and experienced general practitioners. To the best of our knowledge, this yielded the largest benchmark vignette suite in the field thus far. To establish a frame of reference and interpret the results of symptom checkers accordingly, we further directly compared the best-performing symptom checker against 3 primary care physicians with an average experience of 16.6 years.

Primary Outcome Measures

We thoroughly studied the diagnostic accuracies of symptom checkers and physicians from 7 standard angles: (a) M1, M3, and M5, which measure a symptom checker's or a physician's ability to return a vignette's main diagnosis at the top of, among the first 3 diseases of, or among the first 5 diseases of their differential diagnosis, respectively; (b) recall, the percentage of relevant diseases that are returned in a symptom checker's or a physician's differential diagnosis; (c) precision, the percentage of diseases in a symptom checker's or a physician's differential diagnosis that are relevant; (d) F1-measure, which trades off recall against precision; and (e) Normalized Discounted Cumulative Gain (NDCG), which measures the ranking quality of a symptom checker's or a physician's differential diagnosis.
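As a rough illustration of how these metrics can be computed for a single vignette, the sketch below uses the standard textbook definitions (binary relevance for NDCG); the paper's exact operationalization, ground-truth labels, and aggregation across vignettes may differ, and all disease names here are hypothetical examples.

```python
import math

def top_k_hit(ddx, main_dx, k):
    """M-k (e.g., M1, M3, M5): 1 if the vignette's main diagnosis
    appears among the first k diseases of the differential, else 0."""
    return int(main_dx in ddx[:k])

def precision_recall_f1(ddx, relevant):
    """Precision: fraction of returned diseases that are relevant.
    Recall: fraction of relevant diseases that are returned.
    F1: harmonic mean of the two."""
    hits = sum(1 for d in ddx if d in relevant)
    precision = hits / len(ddx) if ddx else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def ndcg(ddx, relevant):
    """Binary-relevance NDCG: discounted gain of relevant diseases by rank,
    normalized by the best possible ordering of the same list length."""
    dcg = sum(1 / math.log2(i + 2) for i, d in enumerate(ddx) if d in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), len(ddx))))
    return dcg / ideal if ideal else 0.0

# Hypothetical differential for one vignette (names are illustrative only).
ddx = ["influenza", "common cold", "covid-19", "strep throat"]
relevant = {"influenza", "covid-19"}

m1 = top_k_hit(ddx, "influenza", 1)          # main diagnosis ranked first
p, r, f1 = precision_recall_f1(ddx, relevant)
quality = ndcg(ddx, relevant)
```

In a full evaluation, each metric would be averaged over all 400 vignettes per symptom checker or physician.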

Results

Our AI-based symptom checker, Avey, significantly outperformed 5 popular symptom checkers (Ada, WebMD, K Health, Buoy, and Babylon) by averages of 24.5%, 175.5%, 142.8%, 159.6%, and 2968.1% using M1; 22.4%, 114.5%, 123.8%, 118.2%, and 3392% using M3; 18.1%, 79.2%, 116.8%, 125%, and 3114.2% using M5; 25.2%, 65.6%, 109.4%, 154%, and 3545% using recall; 8.7%, 88.9%, 66.4%, 88.9%, and 2084% using F1-measure; and 21.2%, 93.4%, 113.3%, 136.4%, and 3091.6% using NDCG, respectively. Under precision, Ada outperformed Avey by an average of 0.9%, while Avey surpassed WebMD, K Health, Buoy, and Babylon by averages of 103.2%, 40.9%, 49.6%, and 1148.5%, respectively. In contrast to the symptom checkers, the physicians outperformed Avey by averages of 37.1% and 1.2% using precision and F1-measure, while Avey exceeded them by averages of 10.2%, 20.4%, 23.4%, 56.4%, and 25.1% using M1, M3, M5, recall, and NDCG, respectively. To facilitate the reproducibility of our study and support future related studies, we made all our gold-standard vignettes publicly and freely available. Moreover, we posted online all the results of the symptom checkers and physicians (i.e., 45 sets of experiments) to establish a standard of full transparency and enable verifying and cross-validating our results.

Conclusions

Avey substantially outperformed the considered symptom checkers. In addition, it compared favourably to the physicians: it underperformed them under some accuracy metrics (e.g., precision and F1-measure) but outperformed them under others (e.g., M1, M3, M5, recall, and NDCG). We will continue evolving Avey's AI model. Furthermore, we will study its usability with real patients, examine how they respond to its suggestions, and measure its impact on their subsequent choices for care, among other outcomes.

Other Information

Published on: medRxiv
License: https://creativecommons.org/licenses/by-nc/4.0/
See article on publisher's website: https://dx.doi.org/10.1101/2022.03.08.22272076

Funding

This study was fully funded by Rimads QSTP-LLC.

History

Language

  • English

Publisher

Cold Spring Harbor Laboratory

Publication Year

  • 2022

License statement

This Item is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

Institution affiliated with

  • Qatar Science & Technology Park
  • Rimads QSTP-LLC