Maintaining localization between Python and PHP (it's not fun)

I reached my hand into the barrel of problems our migration to Python is going to cause and came up with Localization. It figures.

First out of the chute was the .po files. It turns out the actual formatting is different between the two languages. PHP uses %1$s for its substitutions, but Python uses either named variables like %(num)s or integers like {0}. For the record, they both support %s when you don't need to order the substitutions.
PHP example:
I have %2$s apples and %1$s oranges
Python example:
I have {1} apples and {0} oranges
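
For illustration, here is a minimal sketch of the placeholder rewriting such a conversion involves. It is not the code from the actual scripts; the function names are made up for this example, it only handles the %N$s form, and it ignores named %(num)s variables entirely. The one thing to watch is that PHP counts from 1 while Python counts from 0.

```python
import re

# Matches PHP positional placeholders such as %1$s.
PHP_POSITIONAL = re.compile(r"%(\d+)\$s")
# Matches Python str.format() positional placeholders such as {0}.
PY_POSITIONAL = re.compile(r"\{(\d+)\}")


def php_to_python(text):
    """Turn 1-based PHP placeholders (%1$s) into 0-based Python ones ({0})."""
    return PHP_POSITIONAL.sub(lambda m: "{%d}" % (int(m.group(1)) - 1), text)


def python_to_php(text):
    """Turn 0-based Python placeholders ({0}) into 1-based PHP ones (%1$s)."""
    return PY_POSITIONAL.sub(lambda m: "%%%d$s" % (int(m.group(1)) + 1), text)


php_string = "I have %2$s apples and %1$s oranges"
py_string = php_to_python(php_string)
print(py_string)                 # I have {1} apples and {0} oranges
print(python_to_php(py_string))  # I have %2$s apples and %1$s oranges
```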

Since I've worked with the Translate Toolkit before, I decided to write a script to convert between the two formats. If you find yourself in the same unfortunate boat as me, behold: phppo2pypo and pypo2phppo, which convert between the two types.

Crisis averted, right? Oh, that's just scratching the surface. Remember how happy I was that PHP finally started supporting msgctxt? Well, Python has had a patch for it since 2008 but no one has bothered to land it. I wrote a new ugettext() and ungettext() that recognize context in the .po files. To use them, put from l10n import ugettext as _ at the top of your file.
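
To give a rough idea of what context support involves, here is a sketch, not the real l10n module. GNU gettext stores a message that has a msgctxt under the key "context\x04msgid", so a context-aware lookup can build that key itself and fall back to the plain message when nothing matches. The signature below and the DictTranslations helper are assumptions made for this example. (Much later, Python 3.8 added pgettext() to the standard gettext module, which does this natively.)

```python
import gettext

# GNU gettext joins msgctxt and msgid with this character in compiled catalogs.
CONTEXT_SEPARATOR = "\x04"


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a loaded .mo file, backed by a plain dict."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = catalog

    def gettext(self, message):
        return self._catalog.get(message, message)


def ugettext(message, context=None, translations=None):
    """Look up message, optionally within a msgctxt context."""
    translations = translations or gettext.NullTranslations()
    if context is None:
        return translations.gettext(message)
    key = context + CONTEXT_SEPARATOR + message
    translated = translations.gettext(key)
    # An untranslated lookup returns the key itself; never show that to users.
    return message if translated == key else translated


catalog = DictTranslations({
    "May": "peut",          # the verb, no context
    "month\x04May": "mai",  # the month, msgctxt "month"
})
print(ugettext("May", translations=catalog))                   # peut
print(ugettext("May", context="month", translations=catalog))  # mai
```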

Along with adding msgctxt support, those two functions also collapse consecutive white space. We're using Jinja2 with Babel and the i18n extension as our template engine. Jinja2 has a concept of stripping white space from the beginning or end of a string but does nothing about the middle. A paragraph of text in a Jinja2 template would look like:

{% trans -%}Mozilla is providing links to these applications
as a courtesy, and makes no representations regarding the
applications or any information related thereto. Any questions,
complaints or claims regarding the applications must be
directed to the appropriate software vendor.
{%- endtrans %}

That's a decent looking template, right? Yeah, well, when Babel extracts that, it includes all the line breaks too, giving you something like this. The localizers would revolt if I sent them that, so I added in auto white-space collapsing. Getting Babel to use the new functions means a new extraction script.
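
The collapsing itself is simple. A minimal sketch, assuming a helper called collapse_whitespace (a name made up for this example): squash every run of spaces, tabs and line breaks into a single space before the string is used as a msgid.

```python
import re


def collapse_whitespace(text):
    """Replace each run of whitespace with one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


extracted = """Mozilla is providing links to these applications
    as a courtesy, and makes no representations regarding the
    applications or any information related thereto."""

print(collapse_whitespace(extracted))
```

The same function has to run both at extraction time and at lookup time, otherwise the msgid in the .po file won't match the string the template asks for.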

At this point, we're extracting strings from our new code and we can convert between Python and PHP files. All we need now is a Frankenstein mix of xgettext functions to act as glue. Meet the amalgamate script that uses the pypo2php scripts, concatenates the .pot files, and merge-updates each locale's .po file. After that it's quick tweaks to the build scripts to create z-messages.po files and we're done.

So, all that said, the new process for L10n, while we're in this transitional phase, is:

  1. From the PHP code, run locale/extract-po-remora.sh. That pulls everything from all the PHP files, creates locale/r-keys.pot, updates the messages.po file for each locale, and compiles them. Life used to be so simple.
  2. From the Python code, make sure you're up to date, then run ./manage.py extract. That will pull everything from the Python code and templates and create locale/z-keys.pot.
  3. Run ./manage.py amalgamate. That will merge the z-keys.pot into the PHP messages.po files.
  4. Localizers can make their changes as usual, and commit back to messages.po.
  5. From PHP, locale/copy-to-zamboni.py locale will create z-messages.po files in the Python format. We could skip right to .mo files, but in case something goes wrong I want to see the .po files.
  6. Then, like today, locale/compile-mo.sh locale will compile all the .po files.
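
The automated steps above could be wrapped in a small driver along these lines. This is only a sketch: the checkout directory names are hypothetical, and which checkout steps 3 and 6 run from is my reading of the list rather than something it spells out. Step 4 is the localizers' work, so the commands are split into a phase before it and a phase after it.

```python
import subprocess

# Hypothetical checkout locations; adjust to wherever the two code bases live.
PHP_CHECKOUT = "remora"
PYTHON_CHECKOUT = "zamboni"

BEFORE_LOCALIZERS = [
    (PHP_CHECKOUT, ["locale/extract-po-remora.sh"]),   # step 1
    (PYTHON_CHECKOUT, ["./manage.py", "extract"]),     # step 2
    (PYTHON_CHECKOUT, ["./manage.py", "amalgamate"]),  # step 3 (assumed cwd)
]
AFTER_LOCALIZERS = [
    (PHP_CHECKOUT, ["locale/copy-to-zamboni.py", "locale"]),  # step 5
    (PHP_CHECKOUT, ["locale/compile-mo.sh", "locale"]),       # step 6 (assumed cwd)
]


def run_phase(steps, dry_run=True):
    """Run each command from its checkout, or only describe it when dry_run is set."""
    described = []
    for cwd, command in steps:
        described.append("(cd %s && %s)" % (cwd, " ".join(command)))
        if not dry_run:
            # check_call raises on a non-zero exit, so a failed step stops the phase.
            subprocess.check_call(command, cwd=cwd)
    return described


for line in run_phase(BEFORE_LOCALIZERS) + run_phase(AFTER_LOCALIZERS):
    print(line)
```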

After all those steps are done, we've got duplicate .mo files, aside from formatting, and each application can look at its own .mo to get the strings it needs. All this code is just a big band-aid and there are plenty of things that are more fun than juggling L10n between two applications across two RCSs. But we knew what we were getting into. I'll post something more positive later to help justify it. :)

7 Comments

Are you going to contribute the improvements back to Babel?
-- Jeff Balogh, 08 Mar 2010
Glad to see that the Translate Toolkit has helped you in the manipulation of localisation files; your scripts should be in v1.6.0. I've always wanted to see the toolkit emerge as a powerful platform for manipulating and managing localisation files. So nice to see it getting wider contributions and finding new uses.

If you can get that msgctxt patch into Python then you will be my hero :)
-- Dwayne Bailey, 08 Mar 2010
Btw, take a look at intl, it is an ICU based module and covers some of the issues you describe here.

http://www.php.net/intl
-- Pierre, 09 Mar 2010
I remember reading somewhere that you'd written some Django middleware that allowed you to access PHP sessions from Django.

I need to do something similar to this for a project I'm working on at the moment. Is this code available anywhere?
-- Paul Stone, 09 Mar 2010
I remember reading somewhere that you’d written some Django middleware that allowed you to access PHP sessions from Django.
I need to do something similar to this for a project I’m working on at the moment. Is this code available anywhere?


That was Dave Dash. I think it's mostly this commit http://github.com/jbalogh/zamboni/commit/5f5c3c881e5ff9d6867749f9be162942ea03d169 but you should look at the newer versions of those files since that's from months ago.
-- Wil Clouser, 09 Mar 2010
I wish you luck. I had to share data between ASP.NET and PHP pages running on the same domain. You can't share Sessions. I passed some params and used Cookies but the security model breaks each time.
I came to the conclusion that you should use one or the other. In the end jQuery and Web Services came to my aid; it's the best way to share this type of work.
-- Les, 27 Jul 2010
I have been working with this localization tool: https://poeditor.com and it really does a great job. It supports a large number of translators on the same project, working on different languages. There are also plenty of features that ease the work. It has API and GitHub integration also.
-- Rasizu, 12 Mar 2014
