Manara - Qatar Research Repository

Machine Learning Approaches to Predict Protein Crystallization Propensities

thesis
Submitted on 2025-02-18, 07:50 and posted on 2025-02-20, 07:07; authored by Abdurrahman Elbasir
Protein structure determination has primarily been performed using X-ray crystallography. To overcome its high cost, high attrition rate, and reliance on trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, the majority of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can explode the feature space. Moreover, most of the developed methods rely on extracting physicochemical, sequence-based, and functional features from the raw protein sequences. The challenge remains to identify novel biological features that can accurately estimate protein crystallization propensity. The focus of this work is to apply deep learning and machine learning models that predict protein crystallization propensity in two different ways. The first method uses deep learning techniques to distinguish proteins that will crystallize from those that will not, without the need to manually engineer any features. The second method applies a machine learning model that uses many common sequence-derived and physicochemical features, as well as novel secondary-structure and disorder features derived from the protein sequences. Additionally, the output of the machine learning method is interpreted by explaining the most important features that drive the model predictions, either towards the non-crystallizable class or towards the diffraction-quality crystal class. The two in-silico methods in this work outperformed all the existing state-of-the-art protein crystallization classifiers. Furthermore, they provide highly accurate and precise predictions to crystallographers, who can use them to pick proteins of interest that are highly likely to produce diffraction-quality crystals.
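The two kinds of input the abstract contrasts, raw sequences for the deep learning model and engineered sequence-derived features for the machine learning model, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the thesis's actual pipeline: the function names `one_hot_encode` and `composition_features`, the hydrophobic residue grouping, and the choice of features are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
HYDROPHOBIC = set("AVILMFWC")  # illustrative grouping (an assumption, not from the thesis)


def one_hot_encode(sequence, max_len):
    """Raw-sequence input: one row per position, one column per residue type.

    Sequences are truncated or zero-padded to max_len so that every
    protein yields a matrix of the same shape.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in sequence[:max_len].upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in index:  # non-standard residues (e.g. X) stay all-zero
            row[index[aa]] = 1
        rows.append(row)
    # Zero-pad short sequences up to max_len.
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows)))
    return rows


def composition_features(sequence):
    """Hand-engineered input: simple sequence-derived descriptors."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    features = {f"frac_{aa}": counts[aa] / n for aa in AMINO_ACIDS}
    features["frac_hydrophobic"] = sum(counts[aa] for aa in HYDROPHOBIC) / n
    features["length"] = n
    return features


if __name__ == "__main__":
    toy = "MKVLAAGIVX"  # made-up sequence, not real data
    print(len(one_hot_encode(toy, max_len=12)), "positions encoded")
    print(composition_features(toy)["frac_hydrophobic"])
```

The first function keeps the sequence order intact and leaves feature discovery to the model, while the second collapses the sequence into a fixed set of named descriptors whose contribution to a prediction can be inspected afterwards.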

Language

  • English

Publication Year

  • 2020

License statement

© The author. The author has granted HBKU and Qatar Foundation a non-exclusive, worldwide, perpetual, irrevocable, royalty-free license to reproduce, display and distribute the manuscript in whole or in part in any form to be posted in digital or print format and made available to the public at no charge. Unless otherwise specified in the copyright statement or the metadata, all rights are reserved by the copyright holder. For permission to reuse content, please contact the author.

Institution affiliated with

  • Hamad Bin Khalifa University
  • College of Science and Engineering - HBKU

Degree Date

  • 2020

Degree Type

  • Doctorate

Advisors

Halima Bensmail

Committee Members

Muammer Koc ; Tanvir Alam ; Abdelilah Arredouani ; Othmane Bouhali ; Abdesselam Bouzerdoum

Department/Program

College of Science and Engineering
