Manara - Qatar Research Repository

Altruistic Crowdsourcing for Arabic Speech Corpus Annotation

journal contribution
submitted on 2024-09-18, 10:26 and posted on 2024-09-18, 10:26, authored by Soumia Bougrine, Hadda Cherroun, and Ahmed Abdelali

Crowdsourcing is an emerging collaborative approach that can be used to annotate linguistic resources effectively. It comes in several genres: paid-for, games with a purpose, and altruistic (volunteer-based) approaches. In this paper, we investigate the use of altruistic crowdsourcing for speech corpus annotation by recounting our experience of validating a semi-automatic task for dialect annotation of Kalam’DZ, a corpus dedicated to Algerian Arabic dialectal varieties. We start by describing the whole process of designing an altruistic crowdsourcing project. Using the unpaid Crowdcrafting platform, we performed experiments on a 10% sample of the Kalam’DZ corpus, totaling more than 10 h of speech from 1012 speakers. The crowdsourcing job is evaluated by comparison with a gold-standard annotation done by experts, which shows a high level of inter-annotation agreement of 81%. Our results confirm that altruistic crowdsourcing is an effective approach for speech dialect annotation. In addition, we present a set of best practices for altruistic crowdsourcing for corpus annotation.
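
The comparison against a gold standard described above can be illustrated with a small sketch. The abstract does not say how volunteer answers were aggregated or how the 81% figure was computed, so this example assumes a simple majority vote per clip followed by plain percent agreement with the expert label. The clip identifiers, dialect labels, and function names are all hypothetical and are not taken from Kalam’DZ.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label among the votes for one clip.

    Assumption: ties are resolved in favor of the label seen first.
    """
    return Counter(votes).most_common(1)[0][0]


def percent_agreement(crowd, gold):
    """Fraction of clips whose aggregated crowd label equals the expert label."""
    shared = [clip for clip in gold if clip in crowd]
    if not shared:
        raise ValueError("crowd and gold annotations share no clips")
    matches = sum(1 for clip in shared if crowd[clip] == gold[clip])
    return matches / len(shared)


# Hypothetical toy data: three volunteer votes per clip.
crowd_votes = {
    "clip_01": ["Algiers", "Algiers", "Oran"],
    "clip_02": ["Oran", "Oran", "Oran"],
    "clip_03": ["Constantine", "Algiers", "Algiers"],
    "clip_04": ["Oran", "Constantine", "Constantine"],
}

# Hypothetical expert (gold-standard) labels for the same clips.
gold = {
    "clip_01": "Algiers",
    "clip_02": "Oran",
    "clip_03": "Constantine",
    "clip_04": "Constantine",
}

# Aggregate the volunteer votes, then compare with the gold standard.
crowd = {clip: majority_label(votes) for clip, votes in crowd_votes.items()}
agreement = percent_agreement(crowd, gold)
print(f"Agreement with gold standard: {agreement:.0%}")
```

On this toy data the majority vote disagrees with the experts on one clip out of four. Percent agreement does not correct for chance, so a chance-corrected coefficient may be preferable when the label distribution is skewed.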

Other Information

Published in: Procedia Computer Science
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
See article on publisher's website: https://dx.doi.org/10.1016/j.procs.2017.10.102

Funding

Open Access funding provided by the Qatar National Library.

Language

  • English

Publisher

Elsevier

Publication Year

  • 2017

License statement

This Item is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Institution affiliated with

  • Hamad Bin Khalifa University
  • Qatar Computing Research Institute - HBKU
