Submitted on 2025-03-16, 05:35 and posted on 2025-03-16, 05:37. Authored by Omar Maddouri.
Biological data and knowledge bases increasingly rely on Semantic Web technologies and knowledge graphs for data integration, retrieval, and federated queries. Over the last decade, feature learning methods applicable to graph-structured data have become available, but they have not yet been widely applied to, or evaluated on, structured biological knowledge. In this thesis, we have developed a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate node representations (embeddings) that encode related information within knowledge graphs. Through the use of symbolic logic, we have shown that these embeddings contain both explicit and implicit information. We have applied these embeddings to predicting edges in the knowledge graph that represent problems such as function prediction, identifying candidate genes for diseases, protein-protein interactions, and drug-target relations. Similarly, we have learned and applied our embeddings to predict disease comorbidities in an additional knowledge graph designed for this purpose and centered on disease instances. Importantly, our approach has demonstrated performance that matches, and sometimes exceeds, that of traditional approaches based on manually crafted features. Moreover, our method can be applied to any biological knowledge graph, and will thereby open the growing number of Semantic Web-based knowledge bases in biology to wider use in machine learning and data analytics.
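The general pipeline summarized in the abstract (symbolic inference over a knowledge graph, then learning node embeddings, then scoring candidate edges) can be illustrated with a minimal, self-contained Python sketch. It is not the thesis's implementation. The toy triples, the two hand-written inference rules standing in for an automated reasoner, the random-walk parameters, and the count-based co-occurrence vectors standing in for a neural embedding model are all illustrative assumptions. The point is only to show how implicit (inferred) edges enter the walks and therefore the embeddings.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("P1", "has_function", "GO:kinase"),
    ("P2", "has_function", "GO:kinase"),
    ("P2", "interacts_with", "P3"),
    ("P3", "has_function", "GO:transport"),
    ("GO:kinase", "subClassOf", "GO:catalytic"),
    ("GO:catalytic", "subClassOf", "GO:molecular_function"),
    ("GO:transport", "subClassOf", "GO:molecular_function"),
]


def infer(triples):
    """Add implied triples until a fixed point is reached, using two simple
    rules as a stand-in for an OWL reasoner (an assumption of this sketch):
    subClassOf is transitive, and has_function propagates to superclasses."""
    facts = set(triples)
    changed = True
    while changed:
        sub = [(s, o) for s, r, o in facts if r == "subClassOf"]
        new = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        for s, r, o in facts:
            if r == "has_function":
                for a, b in sub:
                    if a == o:
                        new.add((s, "has_function", b))
        changed = not new <= facts
        facts |= new
    return facts


def random_walks(facts, walks_per_node=20, length=6, seed=0):
    """Generate fixed-length random walks over the (undirected) graph."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for s, _, o in sorted(facts):  # sorted for reproducible walks
        adj[s].append(o)
        adj[o].append(s)
    nodes = sorted(adj)
    walks = []
    for node in nodes:
        for _ in range(walks_per_node):
            walk = [node]
            while len(walk) < length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks, nodes


def embed(walks, nodes, window=2):
    """Count-based node vectors: co-occurrence counts within a context window
    (a simple substitute for a neural skip-gram model)."""
    index = {n: i for i, n in enumerate(nodes)}
    vecs = {n: [0.0] * len(nodes) for n in nodes}
    for walk in walks:
        for i, u in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    vecs[u][index[walk[j]]] += 1.0
    return vecs


def cosine(u, v):
    """Cosine similarity, used here as the score for a candidate edge."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    facts = infer(TRIPLES)
    walks, nodes = random_walks(facts)
    vecs = embed(walks, nodes)
    # Score a hypothetical candidate edge between P1 and P3.
    print(round(cosine(vecs["P1"], vecs["P3"]), 3))
```

After inference, P1 is linked not only to GO:kinase but also to GO:catalytic and GO:molecular_function, so those implicit edges shape its walks and its vector. A more realistic version would likely replace the count vectors with a trained neural embedding model and the two rules with a full reasoner over an ontology.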