Manara - Qatar Research Repository

Open-Domain Conversational Search: Addressing Challenges and Limitations Using Reformulation and Data Augmentation

thesis
submitted on 2024-10-28, 10:26 and posted on 2024-10-30, 09:55 authored by Haya Al-Thani
Conversational search (CS) presents unique challenges, particularly context awareness and limited training data. This dissertation investigates these challenges and proposes novel solutions to address them. We explore conversational query reformulation using a text-to-text model and a binary-term classifier, highlighting the advantages of each and applying techniques such as a query clarity score and multi-model fusion to improve performance. By combining both reformulation models, we achieved state-of-the-art results. Additionally, we developed a system that selectively incorporates responses into the conversation history, improving the CS system’s ability to retrieve passages for ambiguous queries. To tackle the issue of limited training data, we introduced paraphrasing as a data augmentation method, increasing the size of our CS dataset by over 665% and enhancing its language diversity. We used automatic paraphrase generation combined with human-in-the-loop techniques to produce a high-quality dataset, Expanded-CAsT (ECAsT). ECAsT serves as a valuable resource for the CS research community: we used it to assess the robustness of CS evaluation with respect to language diversity and to train two novel multi-turn paraphrasing models with potential applications in query expansion, data augmentation, and passage retrieval. This dissertation contributes to the advancement of conversational search by addressing its main challenges and providing innovative techniques and resources for the research community.

Language

  • English

Publication Year

  • 2023

License statement

© The author. The author has granted HBKU and Qatar Foundation a non-exclusive, worldwide, perpetual, irrevocable, royalty-free license to reproduce, display and distribute the manuscript in whole or in part in any form to be posted in digital or print format and made available to the public at no charge. Unless otherwise specified in the copyright statement or the metadata, all rights are reserved by the copyright holder. For permission to reuse content, please contact the author.

Institution affiliated with

  • Hamad Bin Khalifa University
  • College of Science and Engineering - HBKU

Degree Date

  • 2023

Degree Type

  • Doctorate

Advisors

Bernard J. Jansen ; Tamer Elsayed

Committee Members

Yin Yang ; Jens Schneider ; Muammer Koc ; Henning Horn ; Junaid Qadir

Department/Program

College of Science & Engineering
