Aligning LLMs to Low-Resource Languages

Feb 22, 2024 at 00:00 GMT+2

Agenda and key topics

Overview

This tutorial is a detailed guide to collecting data for aligning large language models (LLMs) with low-resource languages (LRLs). It addresses the data scarcity these languages face and introduces a pipeline for generating high-quality data, using Swahili as a primary example. It covers strategies for dataset collection and for aligning LLMs to LRLs, with guidance on producing and using high-quality data to develop language technology for under-resourced languages.

Materials

Notebooks