Manara - Qatar Research Repository

Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap

journal contribution
submitted on 2024-07-23, 08:45 and posted on 2024-07-23, 10:50 authored by George Mikros, Dimitris Boumparis

This study explores the feasibility of cross-linguistic authorship attribution and author gender identification using Machine Translation (MT). Computational stylistics experiments were conducted on a Greek blog corpus translated into English with Google's Neural MT. A Random Forest algorithm was used for authorship attribution and gender profiling, with different feature groups [Author's Multilevel N-gram Profiles, quantitative linguistics (QL), and cross-lingual word embeddings (CLWE)] extracted from both the original and the translated texts. The results indicate that MT is a viable method for converting a multilingual corpus into a single language for authorship attribution and gender profiling research, achieving considerable accuracy when the training and test datasets are in the same language. In the purely cross-linguistic scenario, the CLWE and QL features yielded accuracies above the baselines.
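The abstract names the Author's Multilevel N-gram Profiles and quantitative linguistics (QL) feature groups without defining them on this page. As a rough illustration only, the Python sketch below builds a generic multilevel n-gram profile (relative frequencies of character and word n-grams at several orders) plus two common QL measures (type-token ratio and mean word length). The n-gram orders, the regex tokenizer, the chosen QL measures, and all function names are assumptions for illustration, not the authors' feature definitions. Such feature dictionaries would then be vectorized and passed to a classifier such as a Random Forest.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplistic lowercased word tokenizer (assumption). Python's \w is
    # Unicode-aware, so it handles Greek as well as English text.
    return re.findall(r"\w+", text.lower())


def char_ngrams(text: str, n: int) -> list[str]:
    # Overlapping character n-grams; these may span the spaces between words.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens: list[str], n: int) -> list[str]:
    # Overlapping word n-grams joined with a single space.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_freqs(grams: list[str]) -> dict[str, float]:
    # Each n-gram's count divided by the total number of n-grams of that order.
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def multilevel_profile(text: str, char_orders=(2, 3), word_orders=(1, 2)) -> dict[str, float]:
    # Hypothetical multilevel profile: features from several n-gram orders
    # at both the character and the word level, keyed by level and order.
    tokens = tokenize(text)
    normalized = " ".join(tokens)
    profile: dict[str, float] = {}
    for n in char_orders:
        for g, f in relative_freqs(char_ngrams(normalized, n)).items():
            profile[f"c{n}:{g}"] = f
    for n in word_orders:
        for g, f in relative_freqs(word_ngrams(tokens, n)).items():
            profile[f"w{n}:{g}"] = f
    return profile


def ql_features(text: str) -> dict[str, float]:
    # Two illustrative quantitative linguistics measures (assumption).
    tokens = tokenize(text)
    if not tokens:
        return {"ttr": 0.0, "mean_word_len": 0.0}
    return {
        "ttr": len(set(tokens)) / len(tokens),  # type-token ratio
        "mean_word_len": sum(map(len, tokens)) / len(tokens),
    }


if __name__ == "__main__":
    sample = "The cat sat on the mat"
    print(ql_features(sample))
    print(multilevel_profile(sample)["w1:the"])
```

Because the same functions apply unchanged to a Greek original and to its English machine translation, a sketch like this makes it easy to compare feature sets across the two languages. The real feature inventory, n-gram orders, and any frequency cut-offs should be taken from the published article.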

Other Information

Published in: Digital Scholarship in the Humanities
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1093/llc/fqae028

Funding

Open Access funding provided by the Qatar National Library.

Language

  • English

Publisher

Oxford University Press

Publication Year

  • 2024

License statement

This item is licensed under the Creative Commons Attribution 4.0 International License.

Institution affiliated with

  • Hamad Bin Khalifa University
  • College of Humanities and Social Sciences - HBKU