Large Language Models for Extremely Low-Resource Language Revitalization
Date / Time
November 12, 2025
1:00 pm - 2:30 pm
You are cordially invited to join a Zoom lecture hosted by the UIC Department of Linguistics.
Speaker: Jared Coleman, PhD, assistant professor of computer science, Loyola Marymount University.
Large Language Models have achieved remarkable success across many natural language tasks, yet their benefits remain out of reach for the vast majority of the world’s languages, especially Indigenous and endangered ones. For communities like that of Coleman, a member of the Big Pine Paiute Tribe of the Owens Valley, extreme data scarcity is not a temporary obstacle but a lasting consequence of attempted erasure. Advancing language technology for endangered languages requires moving forward without the expectation of abundant data, by building systems that make principled use of the limited resources at hand.
In this talk, Coleman discusses emerging approaches for using AI to support endangered language revitalization, drawing on his work in developing the Large Language Models-Assisted Rule Based Machine Translation paradigm. Coleman highlights how combining linguistic knowledge, small curated resources and the adaptive reasoning of modern Large Language Models can open new possibilities for revitalization. At the same time, Coleman examines the ethical and cultural risks of applying AI to endangered languages, arguing that these technologies must be developed in partnership with the communities they aim to serve.
Meeting ID: 842 6881 9388
Passcode: goling25