How good are Large Language Models on African Languages?

Anonymous

16 Dec 2023 | ACL ARR 2023 December Blind Submission | Readers: Everyone
TL;DR: We evaluate the performance of three large language models (LLMs) on five NLP tasks across 30 African languages. We find that LLMs perform worse on African languages, with a large performance gap relative to high-resource languages on most tasks.
Abstract: Recent advancements in natural language processing have led to the proliferation of large language models (LLMs). Using in-context learning, these models have been shown to perform well even on unseen tasks and languages. However, their performance on African languages remains understudied relative to high-resource languages. We present an analysis of three popular LLMs (mT0, LLaMa 2, and GPT-4) on five tasks (news topic classification, sentiment classification, machine translation, question answering, and named entity recognition) across 30 African languages spanning different language families and geographical regions. Our results suggest that all LLMs perform worse on African languages, with a large performance gap relative to high-resource languages (such as English) on most tasks. We find that GPT-4 achieves average to good performance on classification tasks but very poor results on generative tasks such as machine translation. Surprisingly, mT0 achieves the best overall performance on cross-lingual QA, outperforming both the state-of-the-art supervised model (i.e., fine-tuned mT5) and GPT-4 on African languages. Overall, LLaMa 2 shows the worst performance, which we believe is due to its English- and code-centric pre-training corpus (around 98% of the data). Our findings confirm that African languages remain challenging for current LLMs and that additional efforts are needed to close this gap.
Paper Type: short
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Surveys
Languages Studied: Hausa, Amharic, Oromo, Algerian Arabic, Moroccan Arabic, Somali, Tigrinya, Kiswahili, Yorùbá, Igbo, Kinyarwanda, Twi, Luganda, isiXhosa, isiZulu, chiShona, Wolof, Bambara, Fon, Éwé, Ghomálá, Chichewa, Mossi, Setswana, Bemba, Lingala, Rundi, Xitsonga, Luo, and Naija (pcm)