Submitted on 2024-08-01, 08:24 and posted on 2024-08-01, 08:25; authored by Muhammad Arif, Ge Fang, Ali Ghulam, Saleh Musleh, Tanvir Alam
<h3>Background</h3><p dir="ltr">Drug targets in living organisms play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource-intensive. Computational methods are therefore highly desirable as an alternative for expediting the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory.</p><h3>Methods</h3><p dir="ltr">In this study, we developed DPI_CDF, a novel deep learning-based model for predicting DPs from protein sequence alone. DPI_CDF generates features from evolutionary (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical (i.e., component protein sequence representation), and compositional (i.e., normalized qualitative characteristic) properties of the protein sequence. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model.</p><h3>Results</h3><p dir="ltr">Under 10-fold cross-validation, the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will help the research community identify druggable proteins and accelerate the drug discovery process.</p><h2>Other Information</h2><p dir="ltr">Published in: BMC Bioinformatics<br>License: <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">http://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1186/s12859-024-05744-3" target="_blank">https://dx.doi.org/10.1186/s12859-024-05744-3</a></p>
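For readers unfamiliar with the two metrics quoted in the Results, the following minimal sketch shows how accuracy and MCC are computed from the counts of a binary confusion matrix. The function names and the example counts are hypothetical illustrations and are not taken from the article or its repository.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the confusion
    # matrix is empty, because the coefficient is undefined in that case.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, chosen only to illustrate the calculation.
tp, tn, fp, fn = 90, 95, 5, 10
example_accuracy = accuracy(tp, tn, fp, fn)
example_mcc = mcc(tp, tn, fp, fn)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported next to accuracy when the two classes may differ in size.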
This Item is licensed under the Creative Commons Attribution 4.0 International License.
Institution affiliated with
Hamad Bin Khalifa University
College of Science and Engineering - HBKU
Methodology
In this study, we developed DPI_CDF, a novel deep learning-based model for predicting DPs from protein sequence alone. DPI_CDF generates features from evolutionary (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical (i.e., component protein sequence representation), and compositional (i.e., normalized qualitative characteristic) properties of the protein sequence. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model.
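The idea of feeding several feature encodings into a layered forest model can be illustrated with a generic cascade forest built from scikit-learn estimators. This is a rough sketch under stated assumptions, not the DPI_CDF implementation: the three feature views are synthetic stand-ins for the three encodings, fusion is assumed to be plain concatenation, and `fit_cascade`, `predict_cascade`, the layer count, and the forest sizes are all hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def fit_cascade(views, y, n_layers=2, seed=0):
    """Fit a small cascade of forests on concatenated feature views."""
    fused = np.hstack(views)  # assumed fusion: concatenate the encodings
    layer_input = fused
    layers = []
    for depth in range(n_layers):
        forests = [
            RandomForestClassifier(n_estimators=50, random_state=seed + depth),
            ExtraTreesClassifier(n_estimators=50, random_state=seed + depth),
        ]
        # Out-of-fold class probabilities, so the next layer is not trained
        # on predictions a forest made for its own training samples.
        probas = [
            cross_val_predict(f, layer_input, y, cv=3, method="predict_proba")
            for f in forests
        ]
        for f in forests:
            f.fit(layer_input, y)
        layers.append(forests)
        # The next layer sees the fused features plus the class vectors.
        layer_input = np.hstack([fused] + probas)
    return layers


def predict_cascade(layers, views):
    """Pass samples through every layer and average the last layer's output."""
    fused = np.hstack(views)
    layer_input = fused
    for forests in layers:
        probas = [f.predict_proba(layer_input) for f in forests]
        layer_input = np.hstack([fused] + probas)
    mean_proba = np.mean(probas, axis=0)
    return layers[-1][0].classes_[np.argmax(mean_proba, axis=1)]


# Synthetic data: three random views whose means shift with the label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=120)
views = [rng.normal(size=(120, d)) + y[:, None] for d in (8, 5, 4)]

layers = fit_cascade(views, y)
pred = predict_cascade(layers, views)
```

In a real pipeline each view would come from its own sequence encoder, and performance would be measured on held-out data rather than on the training samples used here.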
Related Datasets
Muhammad Arif. 2023. DPI_CDF. Last modified 2023. GitHub Repository. https://github.com/Muhammad-Arif-NUST/DPI_CDF