Manara - Qatar Research Repository

A Novel Bayesian Outlier Score Based on the Negative Binomial Distribution for Detecting Aberrantly Expressed Genes in RNA-Seq Gene Expression Count Data

journal contribution
Submitted on 2023-08-29, 06:58 and posted on 2023-09-24, 11:32. Authored by Edin Salkovic and Halima Bensmail.

The Negative Binomial distribution (NBD) is used for modeling many types of count data, including gene expression counts obtained by RNA sequencing technologies (RNA-Seq). Recent research has shown that finding outliers in this type of data helps in identifying rare genetic disorders in humans. Existing Bayesian approaches to detecting outliers in data following the NBD are either computationally inefficient or too general, and hence do not leverage the NBD's specificities in an optimal way. We present a novel Bayesian outlier score for data following the NBD, relying on recent advances in the inference of its dispersion parameter through a special method of Gibbs sampling. The novel Bayesian model on which our score is based - OutPyRX (Outlier detection in Python for RNA-Seq, eXtended version) - improves on the model of its predecessor, OutPyR, by introducing novel parameters that are derived from OutPyR's. These parameters make the novel outlier score converge more than 6 times faster than OutPyR's, while having a negligible computational impact on the Gibbs sampling procedure. We show that, in terms of area under the precision-recall curve (AUC) values, the novel score outperforms existing scores on 21 out of 24 datasets that we derived from 4 real datasets by injecting artificial outliers. However, OutPyRX does not perform confounder control, which is required for some datasets containing biological outliers. The model is general and can be applied to other similar count data. The code for our model is available at https://github.com/esalkovic/outpyrx.
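
To make the idea of scoring counts against an NBD concrete, the following minimal Python sketch computes a simple two-sided tail probability for a count under a negative binomial with a known mean and dispersion. It is not the Bayesian score described in the abstract and does not reproduce OutPyRX or its Gibbs sampler; the function names (`nb_log_pmf`, `nb_cdf`, `tail_score`), the mean/dispersion parameterization, and the example values are all assumptions chosen for illustration.

```python
import math


def nb_log_pmf(k, mu, r):
    """Log-probability of count k under an NBD with mean mu > 0 and
    dispersion (size) r > 0, so that variance = mu + mu**2 / r.
    (Illustrative parameterization, assumed here.)"""
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def nb_cdf(k, mu, r):
    """P(X <= k), computed by direct summation of the pmf."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_log_pmf(i, mu, r)) for i in range(k + 1))
    return min(1.0, total)  # guard against floating-point overshoot


def tail_score(k, mu, r):
    """Two-sided tail probability of count k; smaller means more aberrant.
    A plain frequentist stand-in, not the paper's Bayesian outlier score."""
    lower = nb_cdf(k, mu, r)             # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, mu, r)   # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


if __name__ == "__main__":
    # Hypothetical counts for one gene across samples; 2500 is an injected outlier.
    counts = [98, 105, 110, 2500, 95]
    for k in counts:
        print(k, tail_score(k, mu=100.0, r=10.0))
```

In this sketch the mean and dispersion are fixed by hand. The model in the abstract instead infers the dispersion parameter through Gibbs sampling, which is where the cost and convergence considerations it mentions arise.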

Other Information

Published in: IEEE Access
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1109/access.2021.3082311

Funding

Open Access funding provided by the Qatar National Library.

Language

  • English

Publisher

IEEE

Publication Year

  • 2021

License statement

This Item is licensed under the Creative Commons Attribution 4.0 International License.

Institution affiliated with

  • Hamad Bin Khalifa University
  • College of Science and Engineering - HBKU
  • Qatar Computing Research Institute - HBKU