Are Multilingual Language Models an Off-ramp for Under-resourced Languages? Will we arrive at Digital Language Equality in Europe in 2030?

Georg Rehm
DFKI GmbH, Germany
Humboldt-Universität zu Berlin, Germany
georg.rehm@dfki.de (corresponding)

\AndAnnika Grützner-Zahn
DFKI GmbH, Germany

\AndFabio Barth
DFKI GmbH, Germany

Abstract

Large language models (LLMs) demonstrate unprecedented capabilities and define the state of the art for almost all natural language processing (NLP) tasks and also for essentially all Language Technology (LT) applications. LLMs can only be trained for languages for which a sufficient amount of pre-training data is available, effectively excluding many languages that are typically characterised as under-resourced. However, there is both circumstantial and empirical evidence that multilingual LLMs, which have been trained using data sets that cover multiple languages (including under-resourced ones), do exhibit strong capabilities for some of these under-resourced languages. Eventually, this approach may have the potential to be a technological off-ramp for those under-resourced languages for which “native” LLMs – and LLM-based technologies – cannot be developed due to a lack of training data. This paper, which concentrates on European languages, examines this idea, analyses the current situation in terms of technology support and summarises related work. The article concludes by focusing on the key open questions that need to be answered for the approach to be put into practice in a systematic way.

1 Introduction

Especially in today’s data-driven, machine learning and language model-based era of Language Technology

    MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching

    Fabian David Schmidt, Florian Schneider, Chris Biemann, Goran Glavaš
    Center for Artificial Intelligence and Data Science, University of Würzburg, Germany
    Language Technology Group, University of Hamburg, Germany
    fabian.schmidt@uni-wuerzburg.de, florian.schneider-1@uni-hamburg.de
    Dataset: MVL-SIB. Equal contribution.

    Abstract

    Existing multilingual vision-language (VL) benchmarks often only cover a handful of languages. Consequently, evaluations of large vision-language models (LVLMs) predominantly target high-resource languages, underscoring the need for evaluation data for low-resource languages. To address this limitation, we introduce MVL-SIB, a massively multilingual vision-language benchmark that evaluates both cross-modal and text-only topical matching across 205 languages—over 100 more than the most multilingual existing VL benchmarks encompass. We then benchmark a range of open-weight LVLMs together with GPT-4o(-mini) on MVL-SIB. Our results reveal that LVLMs struggle with cross-modal topic matching in lower-resource languages, performing no better than chance on languages like N’Koo. Our analysis further reveals that VL support in LVLMs declines disproportionately relative to textual support for lower-resource languages, as evidenced by a comparison of cross-modal and text-only topical matching performance. We further observe that open-weight LVLMs do not benefit from representing a topic with more than one image, suggesting that these models are not yet fully effective at handling multi-image tasks. By correlating performance on MVL-SIB with other multilingual VL benchmarks, we highlight that MVL-SIB serves as a comprehensive probe of multilingual VL understanding in LVLMs.

    Soda (TV series)

    French television series

    Soda is a French television shortcom (short-format sitcom) consisting of 690 two- to three-minute episodes, produced by Frank Bellocq, David Soussan, Kev Adams, and Cyril Cohen. It premiered on M6 on 4 July 2011 and on W9 on 5 May 2012.

    The series follows Adam, an 18-year-old high school student, as he struggles with amusing yet realistic everyday issues.

    Premise

    Adam, a teenager with a reputation for being a lost cause, tries to find the best means to make girls fall for him—especially his crush, Jenna. His younger sister, Ève (nicknamed Chucky), never stops annoying him, and vice versa.

    The title, Soda, is an anagram of the French word “ados” (“teenagers”).

    Production

    A broadcast episode lasts twenty-four minutes, divided into sequences of three and a half minutes each.

    Shooting takes place in Bry-sur-Marne.

    Cast and characters

    Main

    Recurring

    • Dominique Frot as Solange Vergneaux, high-school principal/headteacher
    • Alika Del Sol as Malika Elboughi, Slimane's mother and Babeth's friend
    • Frank Bellocq as Patrick, high-school monitor (seasons 2 and 3)
    • Chantal Garrigues as Gisèle Favrot, Babeth's mother and Adam and Ève's grandmother
    • John Eledjam as Uskur, Liberty kebab shop manager and Adam's boss
    • Gaël Mectoob as Pascal, high-school cafeteria cook
    • Alex Lutz as Thierry, former high-school monitor (seasons 1 and 2)

    Season 1

    The first season, broadcast on M6, has 244 episodes.

    Adam is a normal teenager who lives a quiet life in high school (although he never gets good grades) and has two friends, Slimane and Ludovic. He is in love with the most beautiful girl in school, Jenna, who doesn't reciprocate his feelings. He is the oldest child in a middle-class family. His father, Michel, works in a bank and his mother Elizabeth (aka Babeth) is a beautician who has her own home business (her only regular customer is Malika, Slimane's mother). Adam dreams of living in the US and thin