Tuesday, September 9, 2008

Google and the Universal Translation Memory

Having been blown away from the outset by Google’s rapid and significant entry into machine translation, I have long predicted brilliant progress for it in the field of translation. There are surely quite a few surprises still to come.

I’ve reached the point where, instead of the define: operator, I sometimes try the translate: operator just to see whether it has been implemented since I last checked. And it seems the moment has arrived, with the translation onebox (via Google Live).

For now, it apparently covers only common expressions, although it will likely be extended to all the terms already in Google’s dictionaries.

I tried it with Italian, but it doesn’t work yet. Even so, it is likely that as soon as the future Google Translation Center is up and running, this operator will also search for translations in the bitexts that will populate the universal translation memory an army of translators is building day by day.

Remember that this memory also helps Google teach itself, and that a “bitext” is a source text (the starting text) and a target text (its translation) set side by side and aligned.
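For readers who like to see things concretely, here is a minimal Python sketch of how bitexts could feed a translation memory. It assumes a memory is simply a store of aligned (source, target) segment pairs for one language pair, queried by exact match after normalizing case and spacing. The `TranslationMemory` class and its methods are hypothetical illustrations, not Google’s API or that of any real product.

```python
from typing import Dict, List, Tuple

# A bitext: source and target segments side by side, already aligned.
Bitext = List[Tuple[str, str]]


class TranslationMemory:
    """Hypothetical exact-match translation memory for one language pair."""

    def __init__(self, source_lang: str, target_lang: str) -> None:
        self.source_lang = source_lang
        self.target_lang = target_lang
        # Normalized source segment -> every distinct translation seen so far.
        self._entries: Dict[str, List[str]] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and collapse runs of whitespace so lookups are forgiving.
        return " ".join(text.lower().split())

    def add_bitext(self, bitext: Bitext) -> None:
        """Store each aligned pair, keeping alternative translations."""
        for source, target in bitext:
            targets = self._entries.setdefault(self._normalize(source), [])
            if target not in targets:
                targets.append(target)

    def lookup(self, segment: str) -> List[str]:
        """Return the known translations of a segment (empty if none)."""
        return list(self._entries.get(self._normalize(segment), []))


if __name__ == "__main__":
    tm = TranslationMemory("fr", "en")
    tm.add_bitext([("Ordinateur", "computer")])
    print(tm.lookup("ordinateur"))  # ['computer']
```

Keeping a list of targets per source segment reflects the fact that the same segment can be translated differently in different contexts; a real memory would also do fuzzy matching rather than exact lookup.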

Note that the operator does not seem to work the same way as define:, with which you type the English keyword whatever language you speak.

With translate, it is apparently the operator itself that sets the target language: when you type translate ordinateur, Google interprets it as a French-to-English translation, whereas with traduire computer it translates from English to French.
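To make the behaviour I observed concrete, here is a toy Python model of it. It assumes, hypothetically, that the operator’s own language picks the target language (translate gives English, traduire gives French) and that the term is then looked up on the other side of a small French-English bitext. `OPERATOR_TARGET`, `BITEXT_FR_EN` and `run_query` are invented names, and this says nothing about how Google actually implements the onebox.

```python
from typing import List, Optional, Tuple

# Hypothetical: the language of the operator keyword sets the target language.
OPERATOR_TARGET = {"translate": "en", "traduire": "fr"}

# A tiny French-English bitext: aligned (French, English) segment pairs.
BITEXT_FR_EN: List[Tuple[str, str]] = [
    ("ordinateur", "computer"),
    ("bonjour", "hello"),
]


def run_query(query: str) -> Optional[str]:
    """Parse 'operator term' (trailing colon on the operator optional) and
    translate the term into the operator's language; None if nothing applies."""
    operator, _, term = query.strip().partition(" ")
    target = OPERATOR_TARGET.get(operator.lower().rstrip(":"))
    if target is None:
        return None  # not a translation operator, e.g. define:
    term = term.strip().lower()
    for french, english in BITEXT_FR_EN:
        # Search the side of the bitext that is NOT the target language.
        if target == "en" and term == french:
            return english
        if target == "fr" and term == english:
            return french
    return None  # term not found in the bitext


if __name__ == "__main__":
    print(run_query("translate ordinateur"))  # computer
    print(run_query("traduire computer"))     # ordinateur
```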

So, with a little forward thinking, we can easily imagine that in the very near future Google will not only match the operator to your profile by default (knowing you are an English speaker, it will offer terms translated into English unless you indicate otherwise) but will also, and most importantly, be able to draw on practically every term in every human language as the universal translation memory gradually takes shape.

This memory will be filled as much by the human translators who use Google’s translation tools as by its large-scale automated technologies (not to be confused with, say, a company deploying a machine translation system in-house), and even by aligning the literary classics that make up our common heritage and have already been translated into countless languages.

For the layman: aligning a text means taking, for example, Victor Hugo’s Les Misérables, breaking it into segments, and matching each segment of the original with the corresponding segment of its translation into the language(s) of your choice. (A segment does not necessarily correspond to a sentence; a sentence that is too long, for instance, will be broken into several segments.) Do this for French-English and you have the French-English memory of Les Misérables. Then do the same for English-Italian, Spanish-German, Russian-Chinese, and so on, and you get as many memories as there are languages into which the work has been translated.
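Here is a minimal Python sketch of that alignment step. It swaps in a simplified version of a classic length-based approach, loosely in the spirit of Gale and Church’s sentence aligner: texts are segmented naively on sentence-ending punctuation, then a dynamic program pairs segments, preferring 1:1 matches of similar length while allowing two segments to be merged (2:1, 1:2) or one to be left unmatched (1:0, 0:1). The penalties and the `segment`/`align` helpers are illustrative assumptions; real aligners use much better segmentation rules (including splitting over-long sentences) and often bilingual dictionaries too.

```python
import re
from typing import List, Optional, Tuple

# Allowed "beads": (source segments, target segments) consumed in one step,
# each with a fixed penalty so that 1:1 matches are preferred.
BEADS = {(1, 1): 0.0, (2, 1): 0.5, (1, 2): 0.5, (1, 0): 1.0, (0, 1): 1.0}


def segment(text: str) -> List[str]:
    """Naive segmenter: split after '.', '!' or '?' followed by whitespace.
    (Unlike a real tool, it does not split over-long sentences.)"""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def length_cost(src: List[str], tgt: List[str]) -> float:
    """Relative difference in character length between two groups of segments."""
    ls = sum(len(s) for s in src)
    lt = sum(len(t) for t in tgt)
    return abs(ls - lt) / max(ls, lt, 1)


def align(src: List[str], tgt: List[str]) -> List[Tuple[str, str]]:
    """Return aligned (source, target) segment pairs with minimal total cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Cells are visited in row-major order and every bead moves forward,
    # so (i, j) already has its final cost when it is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + length_cost(src[i:ni], tgt[j:nj])
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to rebuild the alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev = back[i][j]
        assert prev is not None
        pi, pj = prev
        pairs.append((" ".join(src[pi:i]), " ".join(tgt[pj:j])))
        i, j = pi, pj
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    fr = "Il pleut. Nous restons à la maison."
    en = "It is raining, so we stay at home."
    for pair in align(segment(fr), segment(en)):
        print(pair)
```

In this usage example the two French sentences are merged into a single 2:1 pair, because together they match the length of the one English sentence, which illustrates why a segment in one language does not always map to exactly one sentence in the other.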

The Rosetta Stone is a perfect example of aligned texts. And to mention just one more, think of the thousands of translations of the Bible that already exist...

Add to that the great classics from around the world that have already been digitized, build the corresponding translation memories for every language pair available to you, and it becomes easy to see that we are not far from being able to align practically the whole of human language, from every era.

Since the dawn of humanity, no one has ever been able to do that. Until Google...

We haven’t heard the last of Google and translation. In fact, the story is only beginning!

