Adaptive Approaches to Anomaly Detection in RNA-Seq Data Using Deep Autoencoding Gaussian Mixture Models
We explore the application of a Deep Autoencoding Gaussian Mixture Model for identifying aberrant genes in RNA sequencing data, aiming to advance the understanding of Mendelian disease through genomics. By implementing energy-based anomaly detection and clustering techniques, we evaluate the model's performance across various configurations, including different latent space dimensions and the integration of different features. Our research findings suggest that model configuration has a significant influence on the detection of anomalies.
The two approaches yielded distinct insights. The first, energy-based detection, showed that the model can identify anomalies through extreme energy values: the known anomalous gene MGST1 was detected within the highest energy percentiles (95%). The second, clustering-based detection, showed an enhanced capacity for isolating anomalous genes. In particular, a trial with a more refined model configuration clustered three known anomalies within a significantly smaller subset (5%) of the data, suggesting a robust framework for anomaly detection in genomics.
These observations highlight the potential of Deep Autoencoding Gaussian Mixture Models in genomic anomaly detection, while also pointing to the complexities of model configuration and the necessity for further research to optimize detection accuracy. Our work contributes valuable insights into the application of machine learning techniques for genetic analysis, marking a significant step towards utilizing the power of computational methods in unraveling the complexities of genetic diseases.
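The energy-based approach described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the standard sample-energy formulation of a Deep Autoencoding Gaussian Mixture Model (the negative log-likelihood of a latent point under the fitted mixture) and reads "95%" as a 95th-percentile cut-off. The function names `sample_energy` and `flag_high_energy`, and all parameter names, are hypothetical.

```python
import numpy as np


def sample_energy(z, phi, mu, cov):
    """Negative log-likelihood of each latent point under a Gaussian mixture.

    z:   (n, d) latent representations (e.g. one row per gene)
    phi: (k,)   mixture weights
    mu:  (k, d) component means
    cov: (k, d, d) component covariances
    Returns an (n,) array; larger values mean less likely, i.e. more anomalous.
    """
    log_terms = []
    for k in range(len(phi)):
        diff = z - mu[k]
        inv = np.linalg.inv(cov[k])
        # Squared Mahalanobis distance of every point to component k.
        maha = np.einsum("ni,ij,nj->n", diff, inv, diff)
        # log|2*pi*Sigma_k|, computed stably.
        _, logdet = np.linalg.slogdet(2.0 * np.pi * cov[k])
        log_terms.append(np.log(phi[k]) - 0.5 * maha - 0.5 * logdet)
    log_terms = np.stack(log_terms, axis=1)  # shape (n, k)
    # Log-sum-exp over components for numerical stability.
    m = log_terms.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))
    return -log_lik


def flag_high_energy(energy, percentile=95.0):
    """Flag points whose energy exceeds the given percentile (assumed cut-off)."""
    threshold = np.percentile(energy, percentile)
    return energy > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for latent codes: 200 typical points plus one outlier.
    z = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])
    phi = np.array([1.0])
    mu = np.zeros((1, 2))
    cov = np.eye(2)[None, :, :]
    energy = sample_energy(z, phi, mu, cov)
    print("outlier flagged:", bool(flag_high_energy(energy)[-1]))
```

In a real pipeline the mixture parameters would come from the trained estimation network rather than being fixed by hand, and the percentile would be tuned against known anomalies.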
Language
- English
Publication Year
- 2024
License statement
© The author. The author has granted HBKU and Qatar Foundation a non-exclusive, worldwide, perpetual, irrevocable, royalty-free license to reproduce, display and distribute the manuscript in whole or in part in any form to be posted in digital or print format and made available to the public at no charge. Unless otherwise specified in the copyright statement or the metadata, all rights are reserved by the copyright holder. For permission to reuse content, please contact the author.
Institution affiliated with
- Hamad Bin Khalifa University
- College of Science and Engineering - HBKU
Degree Date
- 2024
Degree Type
- Master's