Speeding Up T-SNE with GPU to Dissect Genetic Variants Pathogenicity Classification Using Several Machine Learning Algorithms
ClinVar is a very large dataset of genetic variants annotated by degree of pathogenicity. t-distributed stochastic neighbor embedding (t-SNE) is one of the most powerful methods for clustering this data, given its high dimensionality and large number of samples. Alongside principal component analysis, it is one of the most popular dimensionality-reduction methods for visualization. However, because it requires pairwise calculations between high-dimensional data points, the technique is intrinsically slow. To shorten t-SNE running time and improve visualization quality, running it on a graphics processing unit (GPU) was the best option. In this thesis, we applied the t-SNE algorithm to the ClinVar data on a GPU in order to speed up the calculation while maintaining good classification performance. We employed a number of procedures to achieve both high-performance classification and reduced computing time. Several classifiers were used, and their Shapley additive explanations (SHAP) scores were fed to t-SNE to show each classifier's performance on the data. A comparison showed that the pathogenicity grouping improved dramatically after each phase.
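The pairwise cost mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `pairwise_affinities` and `pair_count` are hypothetical, and the fixed bandwidth `sigma` is a simplifying assumption (t-SNE normally tunes a per-point bandwidth from a perplexity target). The point is only that the affinity step visits every ordered pair of samples, which is the kind of work a GPU can parallelize.

```python
import math


def pairwise_affinities(points, sigma=1.0):
    """Row-normalized Gaussian affinities p(j|i) between all pairs of points.

    Assumption: a single fixed sigma is used for every point, unlike real
    t-SNE, which searches for a per-point sigma matching a perplexity.
    """
    n = len(points)
    affinities = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a point has no affinity with itself
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            affinities[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
        row_sum = sum(affinities[i])
        if row_sum > 0.0:
            affinities[i] = [v / row_sum for v in affinities[i]]
    return affinities


def pair_count(n):
    """Number of ordered pairs (i, j), i != j, visited by the loops above."""
    return n * (n - 1)
```

Because the number of pairs grows quadratically with the number of samples, doubling the dataset roughly quadruples this step, which is why a sample-rich dataset makes a CPU-only run slow.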
Language
- English
Publication Year
- 2021
License statement
© The author. The author has granted HBKU and Qatar Foundation a non-exclusive, worldwide, perpetual, irrevocable, royalty-free license to reproduce, display and distribute the manuscript in whole or in part in any form to be posted in digital or print format and made available to the public at no charge. Unless otherwise specified in the copyright statement or the metadata, all rights are reserved by the copyright holder. For permission to reuse content, please contact the author.
Institution affiliated with
- Hamad Bin Khalifa University
- College of Science and Engineering - HBKU
Degree Date
- 2021
Degree Type
- Master's