Live Translate 1.2

A few days ago I released version 1.2 of the Live Translate MediaWiki extension, which is a major update bringing mainly under-the-hood improvements. I’ve worked on this for about 3 days in my free time, mainly to try out some JavaScript techniques I had not utilized yet.

These are the changes for 1.2:

  • Rewrote translation control JavaScript to a jQuery plugin.
  • Added caching layer utilizing memory and localStorage when available to speed up local translations.
  • Added API module to query translation memories.
  • Added i18n alias file.
  • Added TMX admin user rights group.

This post is about the first two.

Some background

The Live Translate (LT) extension allows live translation of content in wiki pages. For this it uses translation services such as Google Translate or Microsoft Translator. It also allows specifying your own translations for certain words within the wiki, which will then be left alone by the (remote) translation services. Such specifications of translations are called translation memories (TM), and are typically done in a special XML-based format called Translation Memory eXchange (TMX). LT also supports a more wiki-friendly, custom-written format, which is DSV-based. Translation memories in both these formats can be embedded in wiki pages designated as TM, or you can point to files hosted somewhere else. What it comes down to is that there is a set of local translations which require special handling: the words in question have to be translated locally and be ignored by the remote translation service.

On every translation, the JavaScript needs to know which special words have a local translation, so that translations for these can be requested and measures can be taken to not send them to the remote translation services. This means making a call to the wiki’s API to obtain these words. In case of big translation memories, this requires several calls to obtain all words, often resulting in a wait of a few seconds before local translations are even requested. If there are words on the page that have local translations, a single request is sent to another part of the API to obtain these translations for the language currently being translated to. This usually brings the total time to complete local translation to somewhere between 2 and 5 seconds, after which, in version 1.1 and earlier, remote translation is kicked off.
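To make the "several calls" part concrete, here is a minimal sketch of fetching a long word list page by page with callbacks. It is not the extension's actual code: fakeApi, fetchAllSpecialWords and the response shape ({ words, next }) are made up for illustration, and the fake API calls back synchronously where a real one would go over the network.

```javascript
// Hypothetical stand-in for the wiki API: hands out one page of special
// words per call, plus a `next` offset while more pages remain.
function fakeApi(allWords, pageSize) {
  return function fetchPage(offset, callback) {
    const words = allWords.slice(offset, offset + pageSize);
    const end = offset + words.length;
    callback({ words: words, next: end < allWords.length ? end : null });
  };
}

// Keeps requesting pages until the API stops returning a continuation
// offset, then hands the complete list to `done`.
function fetchAllSpecialWords(fetchPage, done) {
  const collected = [];
  function requestFrom(offset) {
    fetchPage(offset, function (response) {
      collected.push.apply(collected, response.words);
      if (response.next === null) {
        done(collected);
      } else {
        requestFrom(response.next);
      }
    });
  }
  requestFrom(0);
}
```

With a real API every page is a round trip, which is where the seconds of waiting come from when the translation memories are big.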

The idea

Translation memories do not tend to change all the time, so it’s very inefficient to request all special words for every translation, and, to a somewhat lesser degree, to always request the translations. The obvious answer to this is local caching, and since I wanted to play around with HTML5 localStorage a bit, this is exactly what I did. I also wanted to make use of JavaScript capabilities I was not really aware of back in December, when I wrote Live Translate, so I took this as an opportunity to also do some JavaScript refactoring, primarily around prototypes, closure scopes and callbacks.

The realization

I made a whole bunch of client side changes (and some server side changes to the API), but the most significant ones are the creation of a translation memory object which takes care of all caching, and the rewrite of the translation control to a jQuery plugin.

Translation memory object

In file includes/ext.lt.tm.js.

The translation memory object class is named simply “memory” and resides in the “lt” namespace. It acts as an abstraction layer via which special words and translations of those special words can be accessed. It takes care of all API interaction and caching and exposes two simple functions, getSpecialWords and getTranslations, which are called by the translation control.

When the cache is empty, the memory will request a new hash via the API, which indicates the “version” of the translation memories on the server, and is later used for cache invalidation. It then proceeds to fetch the requested special words or translations of special words and returns these via a callback passed to either getSpecialWords or getTranslations. Before this last step is done, the obtained data is cached in memory (the words and translations fields, on lines 26 and 30, respectively), and, when available, also in HTML5 localStorage. The in-memory caching only yields advantages when doing multiple translations on a single page, which is rather rare, so it is not that much of a win. The data stored in localStorage, on the other hand, persists when navigating to other pages, and even when closing the browser and re-opening it. localStorage really isn’t a cache on its own, but the lt.memory class uses it as one.
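The cold-cache path can be sketched roughly as follows. This is a simplified illustration, not the contents of ext.lt.tm.js: the Memory constructor, the injected api object with its getHash and getSpecialWords methods, the storage keys lt_hash and lt_words, and createFakeStorage are all assumptions of mine. The storage object is injected so the sketch runs outside a browser; in a page you would pass window.localStorage instead. The validity check for a warm cache is left out here.

```javascript
// Minimal stand-in for window.localStorage (getItem/setItem/removeItem only).
function createFakeStorage() {
  const items = new Map();
  return {
    getItem: function (key) { return items.has(key) ? items.get(key) : null; },
    setItem: function (key, value) { items.set(key, String(value)); },
    removeItem: function (key) { items.delete(key); }
  };
}

// Hypothetical memory object. `api` must offer getHash(cb) and
// getSpecialWords(cb); `storage` may be null when localStorage is missing.
function Memory(api, storage) {
  this.api = api;
  this.storage = storage;
  this.words = null; // in-memory cache of special words
}

Memory.prototype.getSpecialWords = function (callback) {
  const self = this; // keep the instance reachable inside the callbacks

  // 1. In-memory cache: only helps for repeated translations on one page.
  if (self.words !== null) {
    callback(self.words);
    return;
  }

  // 2. Persistent cache: survives page navigation and browser restarts.
  const stored = self.storage && self.storage.getItem('lt_words');
  if (stored) {
    self.words = JSON.parse(stored);
    callback(self.words);
    return;
  }

  // 3. Cold cache: remember the server hash, fetch the words, cache them
  //    in both places, and only then hand them to the caller.
  self.api.getHash(function (hash) {
    self.api.getSpecialWords(function (words) {
      self.words = words;
      if (self.storage) {
        self.storage.setItem('lt_hash', hash);
        self.storage.setItem('lt_words', JSON.stringify(words));
      }
      callback(words);
    });
  });
};
```

Since localStorage only stores strings, the word list goes through JSON.stringify on the way in and JSON.parse on the way out.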

When the cache is not empty, a single request to the API is made to compare the earlier obtained hash and see if any changes to the TMs have been made, and thus if the cache should be invalidated. If changes have been made, the stored data is discarded and things proceed pretty much as when the cache was empty. If no changes have been made, locally stored data is used where possible. In case of the list of special words, no requests will have to be made at all, since all such words are already known. For the translations of these words it’s a little trickier, since the needed data here varies from page to page, and also depends on both the source and destination language. The lt.memory class checks which of the needed data is available, and in case there is a remainder of unknown translations, requests these. The newly obtained translations are then of course also added to the cache.
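The two decisions in this paragraph, "is the cache still valid?" and "which translations are still missing?", can be sketched as two small functions. Again this is an illustration under my own assumptions rather than the lt.memory code: the function names, the storage keys and the "source|target" key used for language pairs are hypothetical.

```javascript
// Drops cached data when the server hash differs from the stored one.
// `storage` is any object with getItem/setItem/removeItem, such as
// window.localStorage. Returns true when the cache was invalidated.
function invalidateIfStale(storage, serverHash) {
  if (storage.getItem('lt_hash') === serverHash) {
    return false; // the TMs have not changed, keep the cache
  }
  storage.removeItem('lt_words');
  storage.removeItem('lt_translations');
  storage.setItem('lt_hash', serverHash);
  return true;
}

// Splits the words needed on this page into those already cached for the
// given language pair and those that still have to be requested.
// `cached` maps "source|target" to an object of { word: translation }.
function splitKnownAndMissing(cached, source, target, neededWords) {
  const known = cached[source + '|' + target] || {};
  const found = {};
  const missing = [];
  neededWords.forEach(function (word) {
    if (Object.prototype.hasOwnProperty.call(known, word)) {
      found[word] = known[word];
    } else {
      missing.push(word);
    }
  });
  return { found: found, missing: missing };
}
```

Only the words in missing need to go to the API; once they come back they can be merged into the cached object for that language pair and written back to storage.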

jQuery plugin

In file jquery.liveTranslate.js.

This plugin contains a lot of already existing code from Live Translate 1.1, but is structured a lot better. It takes care of creating all the HTML needed for the control in its setup function (line 147), while in 1.1 the HTML was provided and only events were bound to it. The click event handler for the translation button calls obatinAndInsetSpecialWords, which uses the getSpecialWords function of the lt.memory class to obtain the words with local translations, and then inserts them. Inserting means that occurrences of these words are wrapped into notranslate spans, which makes it possible to find all words that should be translated locally, and makes the remote translation services ignore them. The click handler passes doTranslations as completion callback to obatinAndInsetSpecialWords, which starts both local and remote translation in parallel. Local translation is done, as you can undoubtedly guess, by calling the getTranslations function.
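The wrapping step can be illustrated with a small sketch. This is a simplification and not the plugin's code: it works on a plain text string, whereas the plugin has to deal with the page's DOM, and the function names wrapSpecialWords and escapeRegExp are mine. It also assumes the special words start and end with ASCII letters or digits, since the \b word boundary only understands those.

```javascript
// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wraps every occurrence of a special word in a notranslate span.
// Simplification: `text` is plain text, not a tree of DOM nodes.
function wrapSpecialWords(text, specialWords) {
  if (specialWords.length === 0) {
    return text;
  }
  // Longest first, so "translation memory" wins over "translation".
  const sorted = specialWords.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  const pattern = new RegExp(
    '\\b(' + sorted.map(escapeRegExp).join('|') + ')\\b',
    'g'
  );
  return text.replace(pattern, '<span class="notranslate">$1</span>');
}
```

For example, wrapSpecialWords('Edit the wiki page', ['wiki']) gives 'Edit the <span class="notranslate">wiki</span> page'. The same spans then serve two purposes: the local translation code can look them up to replace their contents, and the remote services skip them.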

The results

Once the cache is warm (the user made a translation before) and valid (the TMs have not changed), local translation is practically instant (~0.4 seconds in my tests). Since remote translation now starts as soon as the special words are known, this can take as little as ~0.2 seconds, a huge difference compared to the earlier wait of up to 5 seconds and possibly longer. All assuming you are using a modern browser of course :)

Not to forget, the code is a lot better structured now. It should be a lot easier to track the order of execution, and it’s now possible (JavaScript-wise) to place multiple translation controls onto a single page (which has little practical value, but indicates a better design).

And, maybe most importantly for me, I now have a much better grasp of the earlier mentioned prototypes, callbacks and closure scopes. Perhaps most of the time I spent on this version went into figuring out how to properly use these and debugging the misconceptions I had about how they worked :)
