Manara - Qatar Research Repository

DiMA: sequence diversity dynamics analyser for viruses

journal contribution
submitted on 2025-06-10, 09:41 and posted on 2025-06-30, 06:43 authored by Shan Tharanga, Eyyüb Selim Ünlü, Yongli Hu, Muhammad Farhan Sjaugi, Muhammet A Çelik, Hilal Hekimoğlu, Olivo Miotto, Muhammed Miran Öncel, Asif M Khan

Sequence diversity is one of the major challenges in the design of diagnostic, prophylactic, and therapeutic interventions against viruses. DiMA is a novel, big data-ready tool designed to facilitate the dissection of sequence diversity dynamics for viruses. DiMA stands out from other diversity analysis tools by offering several unique features. It provides a quantitative overview of sequence (DNA/RNA/protein) diversity using Shannon’s entropy corrected for size bias, applied via a user-defined k-mer sliding window to an input alignment file, and each k-mer position is dissected into various diversity motifs. The motifs are defined based on the probability of distinct sequences at a given k-mer alignment position: the index is the predominant sequence, while all the other sequences are (total) variants of the index. The total variants are sub-classified into the major (most common) variant, minor variants (occurring more than once and with an incidence lower than the major), and unique (singleton) variants. DiMA allows user-defined sequence metadata enrichment for analyses of the motifs. The application of DiMA was demonstrated on alignment data of the relatively conserved Spike protein (2,106,985 sequences) of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the relatively highly diverse pol gene (2637 sequences) of the human immunodeficiency virus-1 (HIV-1). The tool is publicly available as a web server (https://dima.bezmialem.edu.tr), as a Python library (via PyPI) and as a command line client (via GitHub).
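
The motif scheme described in the abstract can be illustrated with a small, self-contained Python sketch. It is not DiMA's code and does not use the DiMA library: the function names (kmer_positions, shannon_entropy, classify_motifs, analyse) and the toy alignment are hypothetical. Two simplifying assumptions apply: the entropy is plain Shannon entropy without the size-bias correction the abstract mentions, and ties in frequency are broken by order of first appearance.

```python
from collections import Counter
from math import log2


def kmer_positions(alignment, k):
    """Yield (start, k-mers) for every sliding-window position of an alignment.

    Assumes all sequences are already aligned to the same length.
    """
    length = len(alignment[0])
    for start in range(length - k + 1):
        yield start, [seq[start:start + k] for seq in alignment]


def shannon_entropy(counts):
    """Plain Shannon entropy in bits (no size-bias correction, by assumption)."""
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())


def classify_motifs(counts):
    """Split the distinct k-mers at one position into diversity motifs.

    index  : the predominant k-mer
    major  : the most common variant of the index
    minor  : remaining variants seen more than once
    unique : remaining variants seen exactly once (singletons)
    Ties are broken by order of first appearance (an assumption of this sketch).
    """
    ranked = counts.most_common()
    motifs = {"index": ranked[0][0], "major": None, "minor": [], "unique": []}
    for rank, (kmer, n) in enumerate(ranked[1:]):
        if rank == 0:
            motifs["major"] = kmer
        elif n > 1:
            motifs["minor"].append(kmer)
        else:
            motifs["unique"].append(kmer)
    return motifs


def analyse(alignment, k):
    """Return entropy and motifs for each k-mer position (1-based positions)."""
    results = []
    for start, kmers in kmer_positions(alignment, k):
        counts = Counter(kmers)
        results.append({
            "position": start + 1,
            "entropy": shannon_entropy(counts),
            "motifs": classify_motifs(counts),
        })
    return results


# Toy alignment: 12 hypothetical sequences of length 5.
toy_alignment = ["MKVLA"] * 6 + ["MKILA"] * 3 + ["MRVLA"] * 2 + ["MKTLA"]
for row in analyse(toy_alignment, k=3):
    print(row["position"], round(row["entropy"], 3), row["motifs"])
```

At the first position of the toy alignment, the 3-mer MKV (6 of 12 sequences) is the index, MKI (3) is the major variant, MRV (2) is a minor variant, and MKT (1) is a unique variant.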

Other Information

Published in: Briefings in Bioinformatics
License: https://creativecommons.org/licenses/by-nc/4.0/
See article on publisher's website: https://dx.doi.org/10.1093/bib/bbae607

Funding

The computational resources and services used in this work were provided by Perdana University School of Data Sciences, Malaysia and Bezmialem Vakif University, Turkey.

AMK was supported by University of Doha for Science and Technology, Qatar, Perdana University, Malaysia, Bezmialem Vakif University, Turkey, and The Scientific and Technological Research Council of Turkey (TÜBİTAK).

2232 International Fellowship for Outstanding Researchers Program of TÜBİTAK (Project No: 118C314).

Language

  • English

Publisher

Oxford University Press

Publication Year

  • 2024

License statement

This item is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

Institution affiliated with

  • University of Doha for Science and Technology
  • College of Computing and Information Technology - UDST

Related Datasets

https://github.com/PU-SDS/DiMA