Posted on August 17, 2011
Live Translate 1.2
These are the changes for 1.2:
- Added caching layer utilizing memory and LocalStorage when available to speed up local translations.
- Added API module to query translation memories.
- Added i18n alias file.
- Added TMX admin user rights group.
This post is about the first two.
The Live Translate (LT) extension allows live translation of content in wiki pages. For this it uses translation services such as Google Translate or Microsoft Translator. It also allows specifying your own translations for certain words within the wiki, which will then be left alone by the (remote) translation services. Such specifications of translations are called translation memories (TM), and are typically written in a special XML-based format called Translation Memory eXchange (TMX). LT also supports a more wiki-friendly, custom-written format, which is DSV-based. Translation memories in both these formats can be embedded in wiki pages designated as TM, or you can point to files hosted somewhere else. What it comes down to is that there is a set of local translations, which require special handling: the words they cover must be translated locally and be ignored by the remote translation service.
I made a whole bunch of client-side changes (and some server-side changes to the API), but the most significant ones are the creation of a translation memory object, which takes care of all caching, and the rewrite of the translation control as a jQuery plugin.
Translation memory object
In file includes/ext.lt.tm.js.
The translation memory object class is named simply “memory” and resides in the “lt” namespace. It acts as an abstraction layer through which special words and the translations of those special words can be accessed. It takes care of all API interaction and caching, and exposes two simple functions, getSpecialWords and getTranslations, which are called by the translation control.
When the cache is empty, the memory will request a new hash via the API, which indicates the “version” of the translation memories on the server, and is later used for cache invalidation. It then proceeds to fetch the requested special words or translations of special words, and returns these via a callback passed to either getSpecialWords or getTranslations. Before this last step is done, the obtained data is cached in memory (the words and translations fields, on lines 26 and 30, respectively), and, when available, also in HTML5 localStorage. The in-memory caching only yields advantages when doing multiple translations on a single page, which is rather rare, so it is not that much of a win. The data stored in localStorage, on the other hand, persists when navigating to other pages, and even when closing the browser and re-opening it. localStorage really isn’t a cache on its own, but the lt.memory class uses it as one.
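The cold-cache flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual code from ext.lt.tm.js: the `Memory` constructor, the injected `api` and `storage` objects, and the `lt_hash` / `lt_words_` storage keys are all assumptions made for the example, and the hash check against the server is left out here.

```javascript
// Hypothetical sketch of a two-level cache (memory + localStorage-like store).
// `api` is an assumed object with getHash(cb) and getSpecialWords(lang, cb);
// `storage` is anything with getItem/setItem (e.g. window.localStorage), or null.
function Memory(api, storage) {
  this.api = api;
  this.storage = storage;
  this.words = {}; // in-memory cache: language code -> array of special words
}

Memory.prototype.getSpecialWords = function (language, callback) {
  var self = this;

  // 1. In-memory cache: only helps for repeated translations on one page.
  if (self.words[language]) {
    callback(self.words[language]);
    return;
  }

  // 2. Persistent cache: survives page navigation and browser restarts.
  if (self.storage) {
    var stored = self.storage.getItem('lt_words_' + language);
    if (stored !== null) {
      self.words[language] = JSON.parse(stored);
      callback(self.words[language]);
      return;
    }
  }

  // 3. Cache is empty: get the hash first, then the words, and store both
  //    before handing the result to the caller.
  self.api.getHash(function (hash) {
    self.api.getSpecialWords(language, function (words) {
      self.words[language] = words;
      if (self.storage) {
        self.storage.setItem('lt_hash', hash);
        self.storage.setItem('lt_words_' + language, JSON.stringify(words));
      }
      callback(words);
    });
  });
};
```

Passing the storage object in, rather than touching `window.localStorage` directly, keeps the "when available" check in one place: browsers without localStorage simply get `null` and fall back to the in-memory fields.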
When the cache is not empty, a single request to the API is made to compare the earlier obtained hash and see if any changes to the TMs have been made, and thus if the cache should be invalidated. If changes have been made, the stored data is discarded and things proceed pretty much as they do when the cache is empty. If no changes have been made, locally stored data is used where possible. In case of the list of special words, no requests will have to be made at all, since all such words are already known. For the translations of these words it’s a little trickier, since the needed data varies from page to page, and also depends on both the source and destination language. The lt.memory class checks which of the needed data is available, and in case there is a remainder of unknown translations, requests these. The newly obtained translations are then of course also added to the cache.
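To make the "use what is cached, request the remainder" step concrete, here is a small sketch. The function names (`validateCache`, `partitionByCache`, `addToCache`) and the `source>target` cache key are hypothetical and chosen only for illustration; the real lt.memory class may organize this differently.

```javascript
// Drop all cached data when the server-side hash differs from the stored one.
// Returns true when the existing cache is still valid.
function validateCache(state, serverHash) {
  if (state.hash !== serverHash) {
    state.hash = serverHash;
    state.cache = {};
    return false;
  }
  return true;
}

// Split the words needed on this page into translations that are already
// cached for this language pair and a remainder that must be requested.
function partitionByCache(cache, sourceLang, targetLang, words) {
  var cached = cache[sourceLang + '>' + targetLang] || {};
  var known = {};
  var missing = [];
  words.forEach(function (word) {
    if (Object.prototype.hasOwnProperty.call(cached, word)) {
      known[word] = cached[word];
    } else {
      missing.push(word);
    }
  });
  return { known: known, missing: missing };
}

// Merge newly obtained translations into the cache for this language pair.
function addToCache(cache, sourceLang, targetLang, translations) {
  var key = sourceLang + '>' + targetLang;
  cache[key] = cache[key] || {};
  Object.keys(translations).forEach(function (word) {
    cache[key][word] = translations[word];
  });
}
```

Only the `missing` array would be sent to the API; when it is empty, no translation request is needed at all.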
Translation control jQuery plugin
In file jquery.liveTranslate.js.
This plugin contains a lot of already existing code from Live Translate 1.1, but is structured a lot better. It takes care of creating all the HTML needed for the control in its setup function (line 147), while in 1.1 the HTML was provided and only events were bound to it. The click event handler for the translation button calls obatinAndInsetSpecialWords, which uses the getSpecialWords function of the lt.memory class to obtain the words with local translations, and then inserts them. Inserting means that occurrences of these words are wrapped in notranslate spans, which makes it possible to find all words that should be translated locally, and causes them to be ignored by the remote translation services. The click handler passes doTranslations as the completion callback to obatinAndInsetSpecialWords, which starts both local and remote translation in parallel. Local translation is done, as you can undoubtedly guess, by calling the getTranslations function.
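The wrapping step can be illustrated with a small string-based sketch. This is not the plugin's code: `wrapSpecialWords` and `escapeRegExp` are hypothetical helpers, the input is assumed to be plain text rather than a DOM tree, and the `class="notranslate"` markup is an assumption about what the spans look like.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every whole-word occurrence of a special word in a notranslate span.
// Note: \b only recognizes ASCII word characters, so words starting or ending
// with non-ASCII letters would need a different boundary check.
function wrapSpecialWords(text, words) {
  if (words.length === 0) {
    return text;
  }
  // Try longer entries first so "wiki page" wins over "wiki".
  var sorted = words.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  var pattern = new RegExp('\\b(' + sorted.map(escapeRegExp).join('|') + ')\\b', 'g');
  return text.replace(pattern, '<span class="notranslate">$1</span>');
}
```

Once the words are wrapped, the same spans serve two purposes: local translation can find them by class, and the remote service skips them.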
Once the cache is warm (the user made a translation before) and valid (the TMs have not changed), local translation is practically instant (~0.4 seconds in my tests). Since remote translation now starts as soon as the special words are known, this can take as little as ~0.2 seconds, a huge difference compared to the earlier up to 5 seconds and possibly longer. All assuming you are using a modern browser, of course.
And, maybe most importantly for me, I now have a much better grasp of the earlier mentioned prototypes, callbacks and closure scopes. Perhaps most of the time I spent on this version went into figuring out how to properly use these and debugging misconceptions I had about how they worked.