Submitted on 2025-02-18, 07:50 and posted on 2025-02-20, 07:07; authored by Abdurrahman Elbasir
Protein structure determination has primarily been performed using X-ray crystallography. To overcome its high cost, high attrition rate, and trial-and-error experimental settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, most of these methods build their predictors by extracting physicochemical, sequence-based, and functional features from the raw protein sequences, which is computationally expensive and can cause the feature space to explode. The challenge also remains to identify novel biological features that can accurately estimate protein crystallization propensity. This work applies deep learning and machine learning models to predict protein crystallization propensity in two different ways. The first method uses deep learning to distinguish proteins that will crystallize from those that will not, without the need to manually engineer any features. The second method applies a machine learning model that uses many common sequence-derived and physicochemical features, as well as novel secondary-structure and disorder features derived from the protein sequences. Additionally, the output of the machine learning method is interpreted by explaining the most important features that drive the model's predictions towards either the non-crystallizable class or the diffraction-quality crystals class. The two in-silico methods in this work outperformed all existing state-of-the-art protein crystallization classifiers. Furthermore, they provide highly accurate and precise predictions that crystallographers can use to select proteins of interest that are highly likely to produce diffraction-quality crystals.
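The contrast between the two input styles described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the function names `one_hot_encode` and `composition_features`, the padding length, and the choice of amino acid composition as the example hand-crafted feature are all hypothetical.

```python
# Hypothetical sketch: two ways of turning a protein sequence into model input.
# Neither function is taken from the thesis; both are illustrative assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(sequence, max_len):
    """Raw-sequence input of the kind a deep model can consume directly.

    Returns max_len rows of 20 indicator values. Sequences longer than
    max_len are truncated, shorter ones are zero-padded, and residues
    outside the 20 standard letters become all-zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        encoded.append(row)
    while len(encoded) < max_len:
        encoded.append([0] * len(AMINO_ACIDS))
    return encoded


def composition_features(sequence):
    """One simple hand-engineered, sequence-derived feature set:
    the fraction of the sequence made up by each standard residue."""
    length = len(sequence)
    if length == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: sequence.count(aa) / length for aa in AMINO_ACIDS}
```

For example, `one_hot_encode("ACD", 5)` yields five rows of which the last two are padding, while `composition_features("AACD")` assigns a fraction of 0.5 to alanine. A feature-based classifier would typically combine many such descriptors, whereas the feature-free route leaves it to the network to learn useful patterns from the encoded residues.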