Altruistic Crowdsourcing for Arabic Speech Corpus Annotation
Crowdsourcing is an emerging collaborative approach that can be used for the effective annotation of linguistic resources. There are several crowdsourcing genres: paid-for, games with a purpose, and altruistic (volunteer-based) approaches. In this paper, we investigate the use of altruistic crowdsourcing for speech corpus annotation by narrating our experience validating a semi-automatic dialect-annotation task for Kalam’DZ, a corpus dedicated to Algerian Arabic dialectal varieties. We begin by describing the whole process of designing an altruistic crowdsourcing project. Using the unpaid Crowdcrafting platform, we ran experiments on a sample of 10% of the Kalam’DZ corpus, totaling more than 10 hours of speech from 1,012 speakers. The crowdsourcing job is evaluated through comparison with a gold-standard annotation produced by experts, which yields a high inter-annotator agreement of 81%. Our results confirm that altruistic crowdsourcing is an effective approach for speech dialect annotation. In addition, we present a set of best practices for altruistic crowdsourcing for corpus annotation.
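The reported 81% figure implies comparing crowd-produced labels against an expert gold standard. As a minimal illustrative sketch, not the paper's actual evaluation code, the snippet below computes percent agreement and a chance-corrected measure over aligned per-segment dialect labels; the label names and lists are hypothetical.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical aligned label lists: one dialect label per audio segment.
# The dialect names are illustrative, not the paper's actual label set.
gold_labels  = ["algiers", "oran", "constantine", "algiers", "oran"]
crowd_labels = ["algiers", "oran", "algiers", "algiers", "oran"]

# Simple percent agreement: the fraction of segments where the crowd
# label matches the expert gold-standard label.
agreement = sum(g == c for g, c in zip(gold_labels, crowd_labels)) / len(gold_labels)
print(f"Percent agreement: {agreement:.0%}")

# Chance-corrected agreement (Cohen's kappa) between the two annotations.
print(f"Cohen's kappa: {cohen_kappa_score(gold_labels, crowd_labels):.2f}")
```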
Other Information
Published in: Procedia Computer Science
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
See article on publisher's website: https://dx.doi.org/10.1016/j.procs.2017.10.102
Funding
Open Access funding provided by the Qatar National Library.
Language
- English
Publisher
- Elsevier
Publication Year
- 2017
License statement
This item is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Institution affiliated with
- Hamad Bin Khalifa University
- Qatar Computing Research Institute - HBKU