Manara - Qatar Research Repository

New world of big data - new challenges for evidence synthesis: Impact of data duplication on estimates generated by meta-analyses, and the development of a framework for its identification and management

journal contribution
Submitted on 2025-02-04, 07:09 and posted on 2025-02-04, 07:47. Authored by Merilyn Lock, Walid El Ansari

Objective

The aim of this study was to highlight the effects of entering duplicated or overlapping data into a meta-analysis when published studies draw on the same data registries, and to describe the identification and management of such data using a novel structured framework.

Study Design and Setting

We performed a secondary analysis of data from a proportional meta-analysis of the 30-day cumulative incidence of venous thromboembolic events (VTE) after metabolic and bariatric surgery. A sensitivity analysis compared a) a sample including all studies regardless of duplication (uncorrected sample) with b) a corrected sample of studies. We developed a decision tree framework to identify duplicated data from prospective studies and data registries.
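
The comparison between the uncorrected and corrected samples can be illustrated with a minimal sketch. It assumes a crude pooled proportion (total events divided by total patients), which is a simplification and not necessarily the pooling model used in the study. The `Study` class, the `crude_pooled_incidence` helper, and all counts below are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    events: int       # VTE events within 30 days (hypothetical counts)
    patients: int     # patients contributing to the denominator
    duplicated: bool  # flagged as overlapping another included study


def crude_pooled_incidence(studies):
    """Return total events / total patients as a percentage.

    This is a crude pooled proportion used for illustration only;
    it is an assumption, not the study's actual pooling method.
    """
    total_events = sum(s.events for s in studies)
    total_patients = sum(s.patients for s in studies)
    return 100.0 * total_events / total_patients


# Hypothetical data: studies B and C report the same registry years.
studies = [
    Study("A", events=12, patients=1000, duplicated=False),
    Study("B", events=30, patients=10000, duplicated=False),
    Study("C", events=30, patients=10000, duplicated=True),
    Study("D", events=8, patients=500, duplicated=False),
]

# a) uncorrected sample: every study, duplicates included
uncorrected = crude_pooled_incidence(studies)
# b) corrected sample: duplicated records removed
corrected = crude_pooled_incidence([s for s in studies if not s.duplicated])

print(f"uncorrected: {uncorrected:.3f}%")
print(f"corrected:   {corrected:.3f}%")
print(f"difference:  {corrected - uncorrected:.3f} percentage points")
```

In this made-up example the duplicated registry has a lower incidence than the smaller studies, so counting it twice pulls the pooled estimate downward.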

Results

We demonstrated that bias from duplicated data, primarily from data registries, led to an underestimate of the incidence of VTE in the literature by 0.15% of the patient population (an erroneous difference equivalent to 22.06% of total VTE). This error persisted at 8.16% of total VTE when limiting to studies using a primarily laparoscopic approach. The decision tree framework compared the data source (country and hospital or registry), sampling timeframe (dates/years of included data) and inclusion characteristics (included procedures/diagnoses or inclusion criteria) to identify potentially duplicated data. Inter-rater reliability was excellent (κ=1.00, p<0.001), although only 17.86% of studies coded as containing data duplication could be verified by the authors; the remaining studies could not be verified. Lastly, we identified a marked lack of diversity in the geographical origins of the data from the included studies.
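
The three comparisons named above (data source, sampling timeframe, inclusion characteristics) can be sketched as a sequence of checks. This is only a rough illustration under stated assumptions: the abstract does not give the framework's actual branches or thresholds, so the `StudyRecord` fields, the `potentially_duplicated` function, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class StudyRecord:
    name: str
    country: str
    source: str            # hospital or registry name (hypothetical values)
    start_year: int        # first year of included data
    end_year: int          # last year of included data
    procedures: frozenset  # included procedures or diagnoses


def potentially_duplicated(a, b):
    """Flag a pair of studies as potentially sharing data.

    Assumed rule for illustration: same source, overlapping years,
    and at least one shared procedure.
    """
    # Step 1: data source (country and hospital or registry)
    if a.country != b.country or a.source != b.source:
        return False
    # Step 2: sampling timeframe (do the year ranges overlap?)
    if a.start_year > b.end_year or b.start_year > a.end_year:
        return False
    # Step 3: inclusion characteristics (any shared procedure?)
    return bool(a.procedures & b.procedures)


records = [
    StudyRecord("S1", "US", "Registry X", 2010, 2014, frozenset({"bypass"})),
    StudyRecord("S2", "US", "Registry X", 2013, 2016, frozenset({"bypass", "sleeve"})),
    StudyRecord("S3", "US", "Registry X", 2017, 2019, frozenset({"sleeve"})),
    StudyRecord("S4", "SE", "Registry Y", 2013, 2016, frozenset({"bypass"})),
]

flagged = [
    (a.name, b.name)
    for a, b in combinations(records, 2)
    if potentially_duplicated(a, b)
]
print(flagged)
```

A flagged pair is only a candidate for duplication. As the results note, most flagged studies could not be verified, so a check like this supports the reviewer's decision but does not replace it.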

Conclusion

We demonstrated that including duplicated data in a meta-analysis can result in substantially inaccurate pooled estimates. We outlined a comprehensive decision tree framework that future researchers can apply to assist with decision making when identifying and managing duplicated data, including that from data registries or other publicly accessible datasets.

Other Information

Published in: Journal of Clinical Epidemiology
License: http://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1016/j.jclinepi.2024.111641

Funding

Open Access funding provided by the Qatar National Library.

Language

  • English

Publisher

Elsevier

Publication Year

  • 2024

License statement

This Item is licensed under the Creative Commons Attribution 4.0 International License.

Institution affiliated with

  • Hamad Medical Corporation
  • Qatar University
  • Qatar University Health - QU
  • College of Medicine - QU HEALTH
