Translation APIs

WHAT DID WE EXPERIMENT WITH?

We wanted to see the state of translation APIs in 2019. Most of SAGE's content is written in English. With exciting growth and emerging research happening in other cultures, we wanted to find out how feasible it would be to automate translation and reach a wider audience.

HOW DID WE DO IT?

We settled on building a Proof of Concept (PoC) around two of the big players: Google Cloud API and Microsoft Translator API.

We then built a simple React app that mimics stripped-down versions of the articles found on our Journal platform.

From there, we added the ability to translate within the app by creating a 'translation' microservice. This microservice takes three inputs: the text you would like to translate, the target language, and the engine to use. It returns the translated text.
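
To make the shape of that microservice concrete, here is a minimal sketch in TypeScript. It is not the code from our PoC: every name in it (TranslationRequest, EngineAdapter, stubEngines, translate) is hypothetical, and the two engines are stand-ins that only tag the text rather than calling Google or Microsoft. Real adapters would make asynchronous HTTP calls; the stubs are synchronous to keep the example short.

```typescript
// Hypothetical names throughout: an illustrative sketch, not the PoC's code.
type EngineName = "google" | "microsoft";

interface TranslationRequest {
  text: string;           // the text to translate
  targetLanguage: string; // target language code, e.g. "fr" or "es"
  engine: EngineName;     // which translation engine to use
}

interface TranslationResponse {
  translatedText: string;
  targetLanguage: string;
  engine: EngineName;
}

// An adapter hides each vendor's API behind one common signature.
type EngineAdapter = (text: string, targetLanguage: string) => string;

// Stand-in adapters (assumption): a real adapter would call the vendor's
// API here. These only tag the text so the routing can be seen.
const stubEngines: Record<EngineName, EngineAdapter> = {
  google: (text, lang) => `[google:${lang}] ${text}`,
  microsoft: (text, lang) => `[microsoft:${lang}] ${text}`,
};

// Validate the request, pick the requested engine, return the translation.
function translate(
  request: TranslationRequest,
  engines: Record<EngineName, EngineAdapter> = stubEngines
): TranslationResponse {
  if (request.text.trim() === "") {
    throw new Error("text must not be empty");
  }
  const adapter = engines[request.engine];
  if (adapter === undefined) {
    // Guards against untyped input, e.g. a JSON body naming an unknown engine.
    throw new Error(`unknown engine: ${request.engine}`);
  }
  return {
    translatedText: adapter(request.text, request.targetLanguage),
    targetLanguage: request.targetLanguage,
    engine: request.engine,
  };
}

const result = translate({ text: "Hello", targetLanguage: "fr", engine: "google" });
console.log(result.translatedText); // prints "[google:fr] Hello"
```

Keeping each engine behind the same adapter signature means the app only ever talks to one endpoint, and swapping or adding an engine does not change the front end.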

WHAT DID WE LEARN AND OTHER IDEAS?

Communication is a really hard problem to solve!

Discourse

Translating text word-for-word via an API is extremely easy, but there is so much more to language than words. It turns out that language and culture are intertwined, and there is meaning behind a lot of what we say. This is known as discourse: the meaning behind what you're communicating. For example, take the phrase:

"I'm fed up with this meal."

To a UK speaker, this is a negative sentence suggesting the meal was not very nice and they are ready to stop eating. However, in a different culture, e.g. Canada, this phrase is used to suggest happiness and satisfaction with the meal, i.e. "I've been fed by this meal". And this is just English. It gets a lot trickier when translating into other languages: although equivalent words might exist in both languages, their emphasis or meaning can be lost.

Authority & Stance - Academic problems

Within academia, particularly journals, we have the problem of stance and authority. From culture to culture, there is a big difference in "author visibility", i.e. how prominent the author is in the text. In the West, authors are more likely to inject themselves into the text by writing "I", whereas authors in other cultures might use "one" or remove themselves completely.

Furthermore, cultures approach authority differently. For example, in the West, academia is treated as the highest authority. In other cultures, such as Islamic countries, religion is the highest. In China, it might be the government.

Taken together, a reduced authorial voice and a different view of what the highest authority is mean that academic work from other countries can be seen as weaker or less rigorous when written in English. Compare:

"It could be the case" or "it might be the case" vs. "this demonstrates" or "my work demonstrates".

WHAT’S NEXT?

While we've cracked translating words, we now need to figure out how to translate the meaning behind them.

Where companies like Google are attempting granular, sentence-based translation using machine learning, others are taking a different route. One is 'CrossTalk Universal Communication', an early-stage startup attempting to find the meaning behind language by breaking sentences down further into their phrases. Their aim is to pass these phrases through proprietary filters to infer context, resulting in a more accurate translation. I'll be keeping a close eye on where they are in 3-5 years. For now, we'll still need to put the hard work into learning grammar and vocab.

TEAM

Andy Hails  

With thanks to:

  • Dr Jonathan MacDonald - Linguistics researcher and 'CrossTalk Universal Communication' founder

  • Manuela Brun - French Translator

  • Lena Newman - Spanish Translator

You can find the app we wrote on our GitHub.

Header image credit Dmitry Ratushny via Unsplash

Blog post written by Andy Hails