Aya: An Open Science Initiative to Accelerate Multilingual AI Progress

Access to cutting-edge breakthroughs in large language models (LLMs) has been limited to speakers of only a few languages, primarily English. The Aya project aimed to change that by accelerating multilingual AI through an open-source initiative. The initiative produced a state-of-the-art multilingual instruction-tuned model and the largest multilingual instruction collection. Built by 3,000 independent researchers across 119 countries, the Aya collection was created by templating and translating existing NLP datasets and covers 114 languages. The Aya dataset, part of this collection, is the largest collection of original annotations by native speakers worldwide, covering 65 languages. Finally, the Aya model, trained on a diverse mixture of instruction data that includes the Aya collection and dataset, is a multilingual language model that follows instructions in 101 languages and achieves state-of-the-art performance on various multilingual benchmarks.

Online

Lifelong Learning

Speaker Information

Ahmet Üstün, Cohere For AI

URL

https://www.youtube.com/watch?v=GU8bYaS5X2Y&list=PLhnghgyZINr9Z0wxdeBZhkbUjGuzI889A&index=4
