The knowledge and beauty locked up in these languages is irreplaceable. It goes beyond useful facts about seasons, cultivation and local medicines to encompass an entire cosmology. Every language is a multi-generational creative act.
Cyberlinguists of the future will have to devise algorithms to decipher the recordings that were made before this mass extinction event.
My collaborators and I want to determine what language data must be uploaded to ensure that the world’s unwritten linguistic heritage is preserved and made intelligible to all future generations.
Capturing languages
Back in 2012 and 2013, we visited remote villages to teach people to use our mobile app, Aikuma, to record and interpret their languages.
Aikuma acts like a voice recorder, but it adds the ability to save and share phrase-by-phrase commentaries and translations, so that others can experience the original recording alongside its interpretation.
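To make this concrete, here is a minimal sketch of the kind of phrase-aligned data such an app might collect. The class and field names are our own illustration, not Aikuma's actual file format:

```python
from dataclasses import dataclass, field

@dataclass
class Interpretation:
    language: str        # e.g. "en" for a spoken English translation
    audio_path: str      # path to the interpreter's recording

@dataclass
class Phrase:
    start_ms: int        # offset of this phrase in the source recording
    end_ms: int
    interpretations: list[Interpretation] = field(default_factory=list)

@dataclass
class Recording:
    language: str        # the source language being documented
    audio_path: str      # path to the original recording
    phrases: list[Phrase] = field(default_factory=list)
```

The key idea is that every translation stays anchored to the time span of the original speech, rather than floating free of it.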
A French research team has recently taken Aikuma to Africa and recorded speakers of local languages there.
Clearly, we can readily amass a large quantity of raw language data. But how can we analyse it all?
We take a clue from the Rosetta Stone: the keys to decipherment are parallel texts or – in the case of unwritten languages – bilingual aligned audio recordings.
Bilingual aligned audio: acoustic features are extracted from the source audio, and the spoken translation is transcribed using automatic speech recognition, then the two are correlated.
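As a sketch of that pipeline (assuming the librosa library for feature extraction; the source doesn't name a specific speech recogniser, so the ASR step is left as a pluggable function):

```python
import librosa  # third-party audio library: pip install librosa

def bilingual_aligned_pair(source_wav, translation_wav, asr):
    """Build one training pair for the aligner: acoustic features from
    the source-language audio, plus the transcript of the spoken English
    translation. `asr` is any function mapping a wav path to text,
    e.g. an off-the-shelf English speech recogniser."""
    speech, rate = librosa.load(source_wav, sr=16000)
    # MFCCs: one 13-dimensional feature vector per short frame of speech.
    features = librosa.feature.mfcc(y=speech, sr=rate, n_mfcc=13).T
    words = asr(translation_wav).split()
    return features, words
```

The output pairs a sequence of acoustic frames with a sequence of English words; correlating the two is the alignment problem described next.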
Deep learning
Our approach was made possible thanks to a recent advance in the processing of digital images. The method uses artificial neural networks and is known as deep learning.
Show a child an image and ask her to point to the dog, and she does it in a split second. Algorithms can do this too; it’s what enables us to search the web for images.
For the child – and the algorithms – to identify the dog, they must first work out where to direct attention within the image.
Can we do the same for audio? Can we take individual words of the English transcription and correlate them with short stretches of audio in the source language?
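In outline, the attention computation looks like this. This is a toy numpy sketch under our own simplifications; the real model learns the frame and word representations jointly:

```python
import numpy as np

def attention_alignment(frame_vecs, word_vecs):
    """For each English word, score every source-audio frame and
    normalise the scores with a softmax. High-attention spans are the
    candidate stretches of audio for that word.

    frame_vecs: (n_frames, d) encoded source-audio frames
    word_vecs:  (n_words, d)  encoded English words
    """
    scores = word_vecs @ frame_vecs.T               # (n_words, n_frames)
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1
    return weights  # weights[i, t]: how much word i attends to frame t
```

Taking `weights.argmax(axis=1)` gives each word's best-matching frame, and contiguous runs of high weight mark the stretch of source audio that carries the word.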
Our initial experiments are showing promise; we are presenting the results at a conference this week.
A digital audio Rosetta Stone
The final step is to close the loop. After lining up source-language words with their English translations, our algorithm reports its confidence in each alignment.
We need to exploit this information to search tens or hundreds of hours of untranslated audio, flag the high-value regions, and present these to people for translation – all while there is still time.
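As a sketch of that triage step, under one plausible reading of "high-value" (the regions the aligner is least sure about are the ones where a human translation helps most; the actual selection criterion isn't specified here):

```python
def prioritise(regions, budget):
    """Pick which regions of untranslated audio to send to human
    translators. An active-learning heuristic: least confident first.

    regions: list of (start_s, end_s, confidence) tuples
    budget:  how many regions we can afford to have translated
    """
    ranked = sorted(regions, key=lambda r: r[2])  # lowest confidence first
    return ranked[:budget]

queue = prioritise(
    [(0.0, 4.2, 0.91), (4.2, 7.8, 0.35), (7.8, 12.0, 0.62)],
    budget=2,
)
# -> the 0.35 and 0.62 regions go to bilingual speakers for translation
```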
It sounds like a gargantuan task. But perhaps we don’t need to go out to the remote corners of the world any more. After all, social media and mobile broadband now reach much of the world’s population, and are likely to encompass speakers of every language by the end of the decade.
Accordingly, we are extending Aikuma. If the app were to go viral, speakers of the world’s disappearing languages would use it to record and translate their stories, guided by our algorithms in deciding what to translate next.
A humble app could solve our problem.
Hacking the dominant culture
But there is no technological panacea. Speakers of the world’s disappearing languages are prioritising their own survival, and they are adopting the mindset of the speakers of economically powerful languages such as English: that small languages are not relevant.
To preserve languages, then, we must go beyond our technical innovation to hack the dominant culture.
This is the mission of our Treasure Language Storytelling project. Thanks to migration, the speakers of the world’s disappearing languages often live in cities. For example, in Australia, Darwin has the greatest diversity of Indigenous languages; and thanks to immigrants and refugees, Darwin is Australia’s most linguistically diverse city by population. Darwin will be a laboratory for experiments on the evolution of language.
Live storytelling in Indigenous and immigrant languages, Oakland, California.
We will begin in Darwin next month: storytelling performances in Indigenous and immigrant languages, building on our recent events in the San Francisco East Bay.
Each story will be recorded and shared using the Aikuma app, generating public recognition and evoking pride for each storyteller and for each language. And each bilingual story-listener in the audience will, we hope, be motivated to use the app to record and interpret their parents' stories for their children.
In this way, we return to the most ancient mode of social interaction, storytelling around the fire. But this time, it is captured on mobile devices, and our algorithms help prioritise the translation effort.
And just possibly, the world’s treasure languages will be sustained for at least another generation, while linguists construct a digital audio Rosetta Stone to preserve the world’s languages forever.
Steven Bird is Associate Professor in Computer Science, University of Melbourne.