Manara - Qatar Research Repository

A Novel Two-Fold Loss Function for Data Clustering and Reconstruction: Application to Document Analysis

journal contribution
submitted on 2024-02-18, 08:26 and posted on 2024-02-18, 08:26 authored by Mebarka Allaoui, Mohammed Lamine Kherfi, Oussama Aiadi, Samir Brahim Belhaouari

In the midst of the ongoing COVID-19 pandemic, there has been a surge in scientific literature aimed at understanding the virus and its impact. However, it has become challenging for researchers to keep up with the thousands of articles published daily. This paper proposes a novel deep-learning architecture that organizes a large dataset of COVID-19-related scientific literature and provides a clear overview of the current state of knowledge. The proposed model rests on two main components that ensure robustness and efficiency. First, we trained a denoising autoencoder on both clean and noisy data so that the model balances preserving the underlying structure of the data with generalizing to new, unseen data. Second, and the cornerstone of the proposed architecture, the autoencoder is trained with a two-fold objective function that jointly incorporates data reconstruction and clustering. The advantage of this combination is that it avoids distorting the latent space and improves the model's efficiency. Afterward, we use Latent Dirichlet Allocation (LDA) to analyze the documents' topics. For computational efficiency, instead of feeding LDA the whole dataset of documents, we feed it the clusters produced in the dimensionality-reduction and clustering phase and count the frequency of topics in each cluster. The model was trained on a large public corpus of COVID-19-related articles and evaluated using a set of evaluation metrics. Experimental results indicate that the proposed model outperforms several recent studies.
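The abstract does not give the exact form of the two-fold objective. As a rough illustration only, the Python sketch below evaluates one plausible loss of this kind: a denoising reconstruction term (the clean input is reconstructed from a Gaussian-corrupted copy) plus a clustering term that penalizes the squared distance from each latent code to its nearest centroid, weighted by a coefficient gamma, i.e. L = (1/n) Σ ||x_i - g(f(x̃_i))||² + gamma · (1/n) Σ min_k ||f(x_i) - mu_k||². The linear encoder/decoder matrices (`W_enc`, `W_dec`), the Gaussian noise level `sigma`, the k-means-style clustering term, and the names `two_fold_loss` and `gamma` are all hypothetical choices for illustration, not the authors' published formulation.

```python
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def add_gaussian_noise(x, sigma, rng):
    """Corrupt an input vector with Gaussian noise (denoising-autoencoder style)."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def two_fold_loss(X, W_enc, W_dec, centroids, gamma=0.1, sigma=0.1, seed=0):
    """Hypothetical joint objective: reconstruction term + gamma * clustering term.

    Assumptions (not from the paper): a linear encoder f(x) = W_enc @ x,
    a linear decoder g(z) = W_dec @ z, Gaussian input corruption, and a
    k-means-style clustering loss on the latent codes of the clean inputs.
    """
    rng = random.Random(seed)
    n = len(X)
    rec = 0.0  # accumulated reconstruction error
    clu = 0.0  # accumulated clustering error
    for x in X:
        x_noisy = add_gaussian_noise(x, sigma, rng)
        x_hat = matvec(W_dec, matvec(W_enc, x_noisy))
        rec += sq_dist(x, x_hat)  # target is the CLEAN input
        z = matvec(W_enc, x)  # latent code of the clean input
        clu += min(sq_dist(z, mu) for mu in centroids)
    return rec / n + gamma * clu / n


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    docs = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-D "document" vectors
    # No noise, perfect reconstruction, one centroid at the origin:
    print(two_fold_loss(docs, identity, identity, [[0.0, 0.0]], gamma=0.1, sigma=0.0))  # 0.1
```

Setting `gamma = 0` reduces the sketch to a plain denoising-autoencoder loss; larger values pull the latent codes toward the centroids, so the weight between the two terms is what a joint objective of this kind has to balance.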

Other Information

Published in: IEEE Access
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1109/access.2023.3312622

Funding

Open Access funding provided by the Qatar National Library.


Language

  • English

Publisher

IEEE

Publication Year

  • 2023

License statement

This item is licensed under the Creative Commons Attribution 4.0 International License.

Institution affiliated with

  • Hamad Bin Khalifa University
  • College of Science and Engineering - HBKU