Ingenious Yandex Translate Helps Keep World's Rarest Languages Alive

© AP Photo / Kevork Djansezian / Desk of a linguist (File)
Yandex Translate has developed a method of translating the world's rarest languages, helping speakers to preserve and spread their mother tongue online.

The Russian search engine Yandex has developed a translator which is helping some of the world's rarest languages to flourish.

Catching up with its big rival Google, which is able to translate over 100 languages, Yandex can now translate 94. 

However, Yandex is a translator with a difference: its software is able to translate some of the world's rarest languages thanks to a complex statistical model that can pick up linguistic patterns without a large body of bilingual texts.

One reason Google and other translation services tend to cover only the world's most common languages is that translation software depends on access to a corpus of texts in both languages. Common sources are the Bible or the Koran, which have been translated into practically all languages.

While such a corpus is easy enough to find for languages such as Russian and English, finding parallel texts in less common languages is a more difficult task.

© AP Photo / Caleb Jones Oxford English Dictionary
Yandex began to seriously investigate the possibility of adding rare languages to its program after an employee in the firm's office in the Netherlands asked the developers to add his mother tongue, Papiamento, to the translator. 

This was a challenge, because Papiamento is a relatively small language, spoken by about 330,000 people in the Caribbean.

Since there are so few translations between Papiamento and other languages, the developers decided to try a different approach. They looked at other languages with similarities to Papiamento, in order to identify the relations between them and use that information to build a translator.
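
The idea of leaning on a related language can be illustrated with a toy sketch. This is not Yandex's statistical model, only a minimal illustration that assumes a simple rule: use the small language's own data when it exists, and otherwise fall back to a similarly spelled word in a larger related language. The word lists, the `translate` function and the 0.7 similarity cutoff are all hypothetical, chosen for illustration; Spanish stands in as the related language.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical, hand-made word lists for illustration only.
SMALL_LEXICON = {"kas": "house"}  # the little data the "small" language has
RELATED_LEXICON = {               # a larger, related language (Spanish here)
    "amigo": "friend",
    "agua": "water",
    "libro": "book",
    "cabeza": "head",
}


def translate(word: str, cutoff: float = 0.7) -> Optional[str]:
    """Translate one word, backing off to the related language if needed."""
    # 1. Prefer the small language's own data.
    if word in SMALL_LEXICON:
        return SMALL_LEXICON[word]
    # 2. Otherwise look for a similarly spelled word in the related language.
    candidates = get_close_matches(word, list(RELATED_LEXICON), n=1, cutoff=cutoff)
    if candidates:
        return RELATED_LEXICON[candidates[0]]
    # 3. No usable evidence in either lexicon.
    return None


for w in ("kas", "amigu", "danki"):
    print(w, "->", translate(w))
```

In this sketch a spelling-similarity threshold stands in for the far richer evidence (morphology, syntax, vocabulary) that a real system would draw on, and a word with no close relative is simply left untranslated.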

© Photo : Screenshot/Yandex / A screenshot of a Yandex translation from German to Elvish, Tolkien's fictional language; the Elvish translation service was launched last year

"We moved away from the traditional perception of each language as an independent system, and began to take into account the kinship between them. In practice, this means that if we need to build a translation for a language where there isn't much data, we can use other, larger, related languages," Yandex Translate developer Anton Dvorkovich explains in a blog post.

"Their individual models (morphology, syntax, vocabulary) can be used to fill the voids in the models of a 'small' language. This might just seem like blind copying of words and rules between languages, but the technology works a little smarter."

"This kinship can be different – for example, in Yiddish, most of the lexicon intersects with German and in Papiamento a lot is borrowed from Spanish and Portuguese. In the Tatar and Bashkir languages, there is similar syntax and morphology."

Yandex explains that its efforts will help speakers of rare languages to preserve their mother tongue. The availability of an internet translator allows them to maintain and spread their language online by creating their own pages on websites such as Wikipedia, which is available in more than 270 languages.

A Bashkir speaker, for example, can take a Wikipedia page in another language, translate it and then edit the results. The technology helps Bashkir speakers to increase their presence on the internet a lot faster than they would be able to otherwise.

In addition, Yandex hopes its statistical model will help linguists to better understand the relationships between languages. 

"We expect that in the future, the technology we have developed for the use of data from related languages will be implemented in other areas and will generally help to better understand the links between languages, and consequently – more accurately translate texts. Therefore, we can say that this technology is not so much about 'small' languages as it is about establishing links between different languages of the world," Dvorkovich said.
