Manara - Qatar Research Repository

AI-Based End-to-End Solutions for Fine-Grained Classification, Detection and Segmentation in Real-World Scenarios

thesis
submitted on 2025-06-18, 08:04 and posted on 2025-06-18, 08:07 authored by Nauman Ullah Gilal

This thesis presents solutions to the challenges posed by Learning with Noisy Labels (LNL) and class imbalance in machine learning applications. LNL is a significant hurdle: because deep neural networks (DNNs) have the capacity to memorize labels, they tend to overfit when trained on noisy samples. Class imbalance, the uneven representation of data across categories, leads to models that favor the more frequent classes. To mitigate these issues, we propose domain-specific methodologies designed to improve learning efficiency and model accuracy.

We introduce Cyclic Filtering and Loss Damping (CYFLOD), a dual-stage training scheme that combats noisy labels in fine-grained datasets and integrates with any standard transfer learning framework. The approach consists of (i) a cyclical iterative filtering scheme that progressively refines the learning process and (ii) a cyclical loss damping technique, based on a SmoothStep function, that can be applied to any loss function. CYFLOD iteratively removes noisy labels, improves data quality, and boosts model generalization. Its robustness is demonstrated on several datasets, including Stanford Cars, Aircraft, CIFAR-10, and Food-101, under both symmetric and asymmetric label noise.
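As an illustration of the loss damping idea, the following minimal Python sketch scales a loss value by a SmoothStep weight that restarts every cycle. The cubic form 3t^2 - 2t^3 is the standard SmoothStep polynomial; the schedule itself (a ramp from 0 to 1 within each cycle) and the names `damping_factor`, `damped_loss`, and `cycle_length` are assumptions made for this example and are not taken from the thesis.

```python
def smoothstep(t: float) -> float:
    """Standard cubic SmoothStep: clamps t to [0, 1], then returns 3t^2 - 2t^3."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)


def damping_factor(epoch: int, cycle_length: int) -> float:
    """Hypothetical schedule: the weight ramps from 0 to 1 inside each cycle."""
    position = (epoch % cycle_length) / max(1, cycle_length - 1)
    return smoothstep(position)


def damped_loss(raw_loss: float, epoch: int, cycle_length: int = 5) -> float:
    """Scale any loss value by the cyclic damping factor (assumed form)."""
    return damping_factor(epoch, cycle_length) * raw_loss
```

Because the factor multiplies the loss value, a wrapper of this kind can sit on top of any loss function, which is consistent with the claim that the damping enhances any loss.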

We applied this general solution within several frameworks for compelling real-world scenarios:

AutoCleanDeepFood, tailored for food computing, encompasses (i) an automated pre-processing pipeline for creating custom datasets reflective of regional gastronomy; (ii) a transfer learning-based training scheme that filters noisy labels through loss ranking and addresses class imbalance with a Russian Roulette probabilistic scheme; and (iii) a GPU-based model deployment on smartphones for real-time inference. Specializing in Arabic regional gastronomy and Qatari cuisine, this framework significantly enhances accuracy on curated and noisy datasets, such as ETH Food-101. It also introduces MENA-150, which features 150 traditional dishes from the Middle East and North Africa (MENA) region, achieving notable accuracy improvements.
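The two data-cleaning ideas in step (ii) can be sketched in a few lines of Python. This is a hedged illustration, not the thesis implementation: `filter_by_loss_rank` keeps a fixed fraction of the lowest-loss samples, and `russian_roulette` lets each sample survive with probability equal to the smallest class count divided by its own class count. Both function names, the `keep_fraction` parameter, and the survival rule are assumptions chosen for this example.

```python
import random
from collections import Counter


def filter_by_loss_rank(losses, keep_fraction):
    """Return the indices of the keep_fraction lowest-loss samples."""
    n_keep = int(len(losses) * keep_fraction)
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:n_keep])


def russian_roulette(labels, rng):
    """Randomly drop samples of frequent classes (assumed survival rule)."""
    counts = Counter(labels)
    smallest = min(counts.values())
    kept = []
    for i, label in enumerate(labels):
        survive_prob = smallest / counts[label]  # 1.0 for the rarest class
        if rng.random() < survive_prob:
            kept.append(i)
    return kept


# Usage: keep the cleaner half, then rebalance what is left.
clean = filter_by_loss_rank([0.9, 0.1, 0.5, 0.2], keep_fraction=0.5)
balanced = russian_roulette(["a", "a", "b", "b", "b", "b"], random.Random(0))
```

Samples of the rarest class always survive under this rule, while frequent classes are thinned in expectation to the size of the rarest one.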

Mobile Dermoscopy, an end-to-end framework for melanoma detection, incorporates (i) a radial blurring data augmentation scheme for oversampling; (ii) iterative dataset cleaning to achieve class balance by selectively removing benign images; (iii) a custom loss function integrating weighted Recall Cross-entropy loss for precise false positive and false negative differentiation; and (iv) a mobile application for real-time mole detection. This framework achieves an accuracy of 96.9%, a balanced accuracy of 98%, and a ROC AUC of 0.98 on the ISIC 2020 dataset.
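The abstract does not give the exact form of the weighted Recall Cross-entropy loss. One plausible reading, sketched below in plain Python, weights each sample's cross-entropy term by one minus the current recall of its true class, so that classes the model misses most often contribute most to the loss. The function names and this particular weighting are assumptions for illustration only.

```python
import math


def per_class_recall(true_labels, predicted_labels, num_classes):
    """Recall per class; classes with no samples default to 1.0."""
    hits = [0] * num_classes
    totals = [0] * num_classes
    for t, p in zip(true_labels, predicted_labels):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return [hits[c] / totals[c] if totals[c] else 1.0 for c in range(num_classes)]


def recall_weighted_cross_entropy(probs, true_labels, num_classes, eps=1e-12):
    """Mean cross-entropy with each term weighted by (1 - recall of true class)."""
    predicted = [max(range(num_classes), key=lambda c: row[c]) for row in probs]
    recall = per_class_recall(true_labels, predicted, num_classes)
    total = 0.0
    for row, t in zip(probs, true_labels):
        total += (1.0 - recall[t]) * -math.log(row[t] + eps)
    return total / len(true_labels)
```

In a binary benign/melanoma setting, a low melanoma recall would raise the weight on missed melanomas (false negatives), which matches the stated goal of treating the two error types differently.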

Noise2Seg, a framework tailored for marine science in the context of detection and segmentation of tropical fishes, includes (i) an automated web scraping tool for data collection; (ii) a robust few-shot selection (FSS) technique that ranks images by their losses and selects those with minimal losses for manual annotation; (iii) a manual annotation process via the cloud-based platform Roboflow; and (iv) the application of YOLOv8 for detection and segmentation tasks. Additionally, the introduction of the Qatar Tropical Fishes-10 (QTF-10) dataset highlights the effectiveness of our automatic FSS method over manual segmentation, achieving superior performance metrics.
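The few-shot selection step in (ii) can be illustrated with a short Python sketch that ranks scraped images by loss and keeps the lowest-loss ones for annotation. Selecting per class, the record layout `(image_id, class_name, loss)`, and the names `select_few_shot` and `shots_per_class` are assumptions made for this example; the file and species names are placeholders.

```python
def select_few_shot(image_losses, shots_per_class):
    """Pick the lowest-loss images of each class for manual annotation.

    image_losses: iterable of (image_id, class_name, loss) records.
    """
    by_class = {}
    for image_id, class_name, loss in image_losses:
        by_class.setdefault(class_name, []).append((loss, image_id))
    selection = {}
    for class_name, items in by_class.items():
        items.sort()  # ascending loss, ties broken by image_id
        selection[class_name] = [image_id for _, image_id in items[:shots_per_class]]
    return selection


# Usage with placeholder records.
records = [
    ("a.jpg", "clownfish", 0.9),
    ("b.jpg", "clownfish", 0.1),
    ("c.jpg", "clownfish", 0.4),
    ("d.jpg", "angelfish", 0.3),
]
to_annotate = select_few_shot(records, shots_per_class=2)
```

Low loss is used here as a proxy for a clean, representative image, so annotation effort goes to the samples least likely to be scraping noise.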

Together, all these frameworks provide a significant advance in the handling of LNL and class imbalance, showcasing scalable and adaptable applications that address similar challenges globally. This thesis contributes to the development of machine learning models capable of efficiently learning from noisy and imbalanced data across heterogeneous domains.

History

Language

  • English

Publication Year

  • 2024

License statement

© The author. The author has granted HBKU and Qatar Foundation a non-exclusive, worldwide, perpetual, irrevocable, royalty-free license to reproduce, display and distribute the manuscript in whole or in part in any form to be posted in digital or print format and made available to the public at no charge. Unless otherwise specified in the copyright statement or the metadata, all rights are reserved by the copyright holder. For permission to reuse content, please contact the author.

Institution affiliated with

  • Hamad Bin Khalifa University
  • College of Science and Engineering - HBKU

Geographic coverage

Qatar

Degree Date

  • 2024

Degree Type

  • Doctorate

Advisors

Marco Agus | Jens Schneider

Committee Members

Raian Ali | Dena A. Al-Thani | Yusuf Bicer | Filippo Stanco | Johan Ericsson

Department/Program

College of Science and Engineering
