OutPyR: Bayesian inference for RNA-Seq outlier detection
High-throughput RNA sequencing (RNA-Seq) has recently begun to be used to help diagnose rare genetic disorders, because it can reveal abnormal gene expression counts, a telltale sign of genetic pathology. Existing solutions either require a large number of samples or do not provide proper statistical significance testing.
We present a Bayesian model (OutPyR) for identifying abnormal RNA-Seq gene expression counts, particularly in datasets with a small number of samples. The model incorporates recently introduced data-augmentation techniques to efficiently and accurately infer the parameters of the underlying negative binomial process, while also assessing the uncertainty of the inference and making it possible to generate simulated data. The model's software implementation is object-oriented and thus easily extensible, and it provides parameter-trace exploration, fault tolerance, and recovery during the parameter estimation process. We also develop a P-value-based outlier score that stems naturally from our model. We apply the model to real and simulated datasets for different organisms and tissues, and present comparisons with existing models.
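To make the idea of a P-value-based outlier score under a negative binomial model concrete, the following is a minimal sketch and not OutPyR's implementation. It assumes the negative binomial parameters `r` and `p` are already known and fixed, whereas the abstract describes inferring them with Bayesian data-augmentation techniques. The function names, the two-sided tail definition of the score, and the example values are all assumptions made for illustration.

```python
import math


def nb_log_pmf(k, r, p):
    """Log-probability of count k under a negative binomial NB(r, p).

    Parameterization (an assumption of this sketch):
    P(X = k) = Gamma(k + r) / (Gamma(r) * k!) * p**r * (1 - p)**k
    """
    return (
        math.lgamma(k + r)
        - math.lgamma(r)
        - math.lgamma(k + 1)
        + r * math.log(p)
        + k * math.log1p(-p)
    )


def nb_cdf(k, r, p):
    """P(X <= k), computed by direct summation of the pmf."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_log_pmf(i, r, p)) for i in range(k + 1))
    return min(1.0, total)  # guard against rounding slightly above 1


def two_sided_p_value(k, r, p):
    """Hypothetical outlier score: twice the smaller tail probability."""
    lower = nb_cdf(k, r, p)             # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, r, p)   # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


# Illustrative parameters only: r = 10, p = 0.1 gives a mean count of 90.
r_example, p_example = 10.0, 0.1
typical_score = two_sided_p_value(90, r_example, p_example)
high_outlier_score = two_sided_p_value(400, r_example, p_example)
low_outlier_score = two_sided_p_value(0, r_example, p_example)
```

In this sketch a count near the mean receives a large score, while counts far in either tail receive scores close to zero. A full Bayesian treatment would also propagate the uncertainty in `r` and `p`, which this fixed-parameter example does not attempt.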
Other Information
Published in: Journal of Computational Science
License: http://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1016/j.jocs.2020.101245
Language
- English
Publisher
- Elsevier
Publication Year
- 2020
License statement
This Item is licensed under the Creative Commons Attribution 4.0 International License.
Institution affiliated with
- Hamad Bin Khalifa University
- College of Science and Engineering - HBKU
- Qatar Computing Research Institute - HBKU
- Qatar Biomedical Research Institute - HBKU
- College of Health and Life Sciences - HBKU