
Status Watch: An add-on for noticing HTTP error codes

Often on complex pages with many assets it can be easy to overlook assets which don’t load. Usually they are minor JS, CSS, or tracking pixels which aren’t noticed until you’ve spent way too long trying to track down the problem (or a month later you log into your stats dashboard to discover you haven’t been collecting stats).

With the launch of the new Add-on Builder (still an alpha product, but usable) I decided to make my first Jetpack to fix this.

In only about 20 lines of code I was able to look for any 4xx or 5xx errors in the HTTP traffic and show a brief notification to the user about what went wrong and where. The builder was a great development experience (lack of documentation aside) and made doing something relatively complex a breeze.
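The add-on itself is a Jetpack (JavaScript) and its source isn't shown here, so the following is only a rough sketch of the same check in Python. The function names and the `(url, status)` input shape are assumptions made for illustration.

```python
def is_error_status(status):
    """Return True for client (4xx) and server (5xx) error codes."""
    return 400 <= status <= 599


def describe_failures(responses):
    """Build one notification line per failed (url, status) pair."""
    return ["%d: %s" % (status, url)
            for url, status in responses
            if is_error_status(status)]
```

Given a list of observed responses, `describe_failures` returns only the entries a user would want to be told about.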

If you’ve wanted a similar add-on, feel free to use Status Watch. It’s a Jetpack so you’ll need Firefox 4, but on the bright side, you won’t even need to restart the browser. Just click and go.

I’ve had some requests to support a whitelist of sites so you don’t get notifications all over the web and I’ll add that when I get time. It’s surprising how many sites have 403s and 404s though.

Update 2011-02-17: Version 2 is on AMO now which supports a whitelist. See the AMO page for details.


A Night in the Emergency Department

Within minutes of my arrival at the Emergency Department a call comes in that an ambulance will arrive shortly transporting a man in cardiac arrest.  Orientation can wait.  Over the next 20 minutes he is given a regimen of drugs.  I follow him to a unit that will try to locate and destroy the clot in his heart.  In the next hour his heart stops four times while technicians put two femoral catheters in his legs and follow a dye through his blood stream.  Eventually they finish what they can do and ship him to the Cardiac Care Unit.  No one knows about permanent damage.  On my way back to the Emergency Department I pass a frantic looking woman with a cell phone.  She’s just spied her teenage daughter running in and cries, “they say it’s his heart and it’s serious.”  I don’t make eye contact.

An older couple accompanies a woman on a stretcher with hematemesis and a severely distended abdomen into room 9.  She’s legally blind and keeps asking if they are in the room.  The old man continually assures her with a soft “I’m here, mom.”  I consider the overflowing landfills briefly as my non-latex glove count hits double digits in an hour.  Another bout of black vomit snaps me back to reality.  Mother Earth can take the hit tonight.  The nurse readies an NG tube while I wipe off the patient’s chin with a warm wash cloth and tell her she looks pretty again.  She smiles.

Every time I walk past room 20 I hear a woman sobbing into the phone.  She was out celebrating tonight and a dozen margaritas later she woke up on a stretcher with a fractured tibia and fibula from tripping over a curb in a parking lot.  Somewhere between the bar and the hospital she’s lost her purse, her clothes, and her self respect.  All she can do is apologize to her mother on the phone through sobbing breaths, over and over.

The hours pass by.  A 22 year old woman has an abscess under her eye; the doctor decides to drain it with a needle instead of a knife because he doesn’t want to cut up a young girl’s face.  A 27 year old male has a seizure because he stopped taking his medicine; he says his doctor never gave it to him.  A woman in room 10 watches hospital security put restraints on her husband so he doesn’t roll off his stretcher or hurt someone.  An 84 year old man with dysphasia (he can’t speak) watches in silent pain as a nurse tries to get an IV started for the third time.  An old man tells jokes to his wife and the assistant who is setting up for an EKG; the only interruption of his smile is every few minutes when he’s curled up and clutching his chest in pain.

The Emergency Department showcases the extremes of the emotional spectrum – the best and the worst of human nature.  In one bed is a 27 year old female with cuts on her wrists who washed down all the pills in her medicine cabinet with a bottle of vodka – 20 hours later she wakes up long enough to ask to go to the bathroom and then passes back out.  Two rooms down is a man with Alzheimer’s, cursing at the nurses for trying to remove his shirt.  His wife tells me that half the time he doesn’t remember his name, but he always remembers how to swear.  She tells him she loves him as she calms him down.  They’ll be together forever.

When I drive away from the ER that night Loveline is on the radio.  It’s some girl complaining that her boyfriend doesn’t try hard enough in bed.  The world’s problems seem trivial.


I became an EMT last year and as a part of the course I had to work a twelve hour shift in the emergency department. I wrote this essay for the class, but it seemed like something to share here also. Read out of context it may sound like I didn’t enjoy the night, but that was definitely not the case – I had a great time and learned a lot. I would definitely volunteer to work there again if they had the room.


md5verify: A script to automatically verify file integrity

I have a lot of files on my computer. Email archives, personal documents, stuff for work, photos I’ve taken…the list goes on – I’m sure most people reading this are in a similar boat. On occasion I’ve found some files to be missing or corrupt which is disturbing but is probably something to be expected. The bad part is, I keep backups, but I rotate them out when they reach a certain age which means if I don’t notice a file is corrupt or missing I’ll eventually lose it forever.

I stayed up late a few nights ago and wrote a script to raise an alert when something has changed. On its first run the script will recursively walk a directory tree hashing each file and storing the hashes in the directory (in an md5sum-compatible file). On subsequent runs it will begin tracking new files automatically but it will also print messages for missing and changed files. By saving the checksums in each directory it becomes portable – you can copy a directory somewhere else and still be able to verify nothing changed (a quick md5sum -c checksums.txt will let you know).
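The script itself isn't reproduced in this post, so here is a minimal sketch of the approach described above, not the actual md5verify code. The function names are hypothetical, and keeping the old hash for changed or missing files (so they keep getting reported) is an assumption about the desired behavior.

```python
import hashlib
import os

CHECKSUM_FILE = "checksums.txt"  # name taken from the md5sum -c example


def md5_of(path):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_checksums(path):
    """Parse 'hash  name' lines in the format md5sum writes."""
    known = {}
    if os.path.exists(path):
        with open(path) as handle:
            for line in handle:
                if not line.strip():
                    continue
                digest, name = line.rstrip("\n").split("  ", 1)
                known[name] = digest
    return known


def verify_tree(root):
    """Walk root; return sorted (missing, changed) lists of file paths."""
    missing, changed = [], []
    for directory, _subdirs, files in os.walk(root):
        checksum_path = os.path.join(directory, CHECKSUM_FILE)
        known = read_checksums(checksum_path)
        present = sorted(f for f in files if f != CHECKSUM_FILE)
        for name in present:
            digest = md5_of(os.path.join(directory, name))
            if name not in known:
                known[name] = digest  # start tracking a new file
            elif known[name] != digest:
                changed.append(os.path.join(directory, name))
        for name in sorted(set(known) - set(present)):
            missing.append(os.path.join(directory, name))
        if known:
            with open(checksum_path, "w") as handle:
                for name in sorted(known):
                    handle.write("%s  %s\n" % (known[name], name))
    return sorted(missing), sorted(changed)
```

A cron or nagios wrapper could call `verify_tree`, print the two lists, and exit non-zero when either one is non-empty.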

By default the script only prints messages when it sees something fishy so it’s perfect to drop into cron and it uses exit statuses so it’ll work for nagios too. I’ve been running it for a few months and have found a couple files that have changed – nothing critical yet but it’s nice to know it’s there.


CLI Split Windows in Vim

I use split windows, both horizontally and vertically, in Vim all the time. I’ve always wanted to be able to split the window and then start a command line shell within that window but up until now that has just been a dream.

My friend sent me a link to conque this morning which I’m so taken with that it’s prompted me to drop coding and write a blog post.

Conque lets me do exactly what I’ve wanted, create shells within Vim. In insert mode you interact with the shell as expected. In normal mode you can scroll back through your history, yank text into buffers, paste and manipulate text. Best thing ever.


Two files open on the left, three shells on the right; iPython, MySQL, and Django’s web server – all with syntax highlighting.

addons.mozilla.org ♥s unit tests. Again.

AMO has had an on-again off-again relationship with unit tests. A little over a year ago we had a thousand unit tests that sort of, mostly, ran. The problem is, PHP unit testing just isn’t as good as it should be. CakePHP relies on SimpleTest, one of the main PHP test suites. It worked relatively well for a small number of tests, but as our suite grew, so did our troubles.

Our main issue was hitting a memory limit or the max execution time. We hit the limits often for a variety of reasons, some legitimate bugs, and some because we tried to hack around things to make the tests run. If we change the limits we affect the tests because they are running within the same environment. There wasn’t really a concept of fixtures then, although it looks like CakePHP has stepped up there. The simple test web runner was hard to use and the mock objects were sometimes a little too mocked and missing some attributes.

All in all it was a heroic effort to get that many tests, but we didn’t maintain it because they were so slow to write and difficult to run. Testing can be a pain to write, sure, but it shouldn’t be a burden like that. Enter Django’s testing suite (built on top of Python’s unittest). It has most of our complaints handled out of the box. It’s very well documented, considers a lot of aspects of testing, supports fixtures, a built-in client, etc. It’s a well thought out framework to build tests on.

We’re being more vigilant about requiring tests this time around, but they also aren’t as frustrating to write. When you write them they actually work and they stay working. Most of what you want is built in already. For example, I wrote the password reset form we needed on AMO in Django. With CakePHP and SimpleTest I’d have no idea how to test that the email was actually working. It’s apparently possible with a SimpleTest add-on and enough code that I have to scroll in my browser. With Django’s test suite the actual code was 5 lines, 3 of which were assertions:


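    # Assumes `from django.core import mail` and `from nose.tools import eq_`.
    def test_request_success(self):
        # Submit the reset form; Django's test runner captures outgoing mail.
        self.client.post('/en-US/firefox/users/pwreset',
                        {'email': self.user.email})

        # Exactly one message, with the expected subject and reset link.
        eq_(len(mail.outbox), 1)
        assert mail.outbox[0].subject.find('Password reset') == 0
        assert mail.outbox[0].body.find('pwreset/%s' % self.uidb36) > 0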

With the power of the new test suite we’re once again writing and maintaining our unit tests – currently at around 390 tests and increasing steadily. Plenty of people have written about why unit tests are important so I won’t belabor the point, but I will mention that it’s a great feeling to be able to commit something and be confident it hasn’t affected other parts of the site. It’s almost as good of a feeling when you write your code and a completely different test fails pointing out a case that you didn’t even consider but one that would soak up developer time trying to debug down the road.

Building on a foundation that takes testing seriously is great.


AMO brings new levels of pedantry to Mozilla Webdev

And we love it. :)

When we first started writing AMO in PHP we agreed to follow the PEAR coding standards and left it at that. Four years and thousands of lines of code later it’s roughly true, but there are some obvious mistakes and oversights. The main problem is that there is no automation for accountability. If someone happens to see something out of line in a code review, they’ll point it out, but after a while everything starts to look the same. Is that brace in the right place? Is there supposed to be a space there? Minor stuff slips by, it’s not a big deal.

When we moved to Python we needed a new style. PEP 8 is the obvious one everyone goes with and we agreed we should follow it too. Then Jeff Balogh showed us a check script he wrote that automatically runs code through PEP 8 and PyFlakes checks and complains if any lines are over 80 characters. With an automated system there is easy accountability and by checking it before you commit it’s easy to stay in line.
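Jeff's script isn't shown here, so this is only a sketch of the simplest of the three checks, the line-length limit. The function name is hypothetical, and it does not attempt the PEP 8 or PyFlakes parts.

```python
MAX_LINE_LENGTH = 80  # "over 80 characters" is the limit named above


def long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for each line longer than limit."""
    return [(number, len(line))
            for number, line in enumerate(source.splitlines(), start=1)
            if len(line) > limit]
```

Run before a commit, an empty result means the file passes; anything else points straight at the offending lines.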

It’s not a big deal if there are spaces around your equals signs, or if there are two lines above your class declarations, but after thousands of lines of code it can make scanning and finding what you want a lot easier. The 80 character line limit is a classic programmer holy war and I think we settled on a good compromise: actual code is limited to 80 characters, templates can be more but should be a “reasonable” length. I took a screenshot of my actual workspace one day as I was digging into some code.


What do you notice about style? Aside from the Django code in the small window closest to the center, all the files fit on the screen without ugly line wrapping making it (relatively) easy to look at the full scope of MVC flow, along with the unit tests and some reference files. Open up any of our PHP files and try that and it’ll just be a mess of wrapped lines or horizontal scrolling.

Code standards checks aren’t something we took very seriously before but so far I’m really happy with what we’ve got in AMO. Automation makes all the difference.

PS: I don’t have to worry about blocking out any code in screenshots – isn’t open source great? Sometimes people complain about not being able to share their code and it always sounds so alien to me. I don’t usually advertise on here, but if you want to share your code, we’re always hiring.

Continuous Integration comes to AMO

It’s time to hail another milestone for AMO in our epic push for improvements in 2010. This time I’m happy to announce our Hudson continuous integration server which has been humming along for a few months.

Hudson Integration Screenshot.

AMO is the first Mozilla Webdev site to use continuous integration, and it’s been a long time coming. With the way it’s currently configured we’ve got code coverage trending, unit test trending, code quality trending, as well as detailed reports for all the above for every single check in.

If anything fails or oversteps a threshold our IRC bot complains and we can get it fixed up quickly. It’s a boon to productivity to know that all the code being checked in is being tested automatically, plus it gives everyone a stable state to compare to.

Thanks to everyone that helped get Hudson going, from the people that write it, to the IT team that keeps it alive, to the webdev team that helped work out the kinks.


Libraries to connect to a Citrix NetScaler or Zeus Traffic Manager

The first front end cache we used on AMO was the Citrix NetScaler. I’ve complained about its API before but apparently never announced the library I wrote to purge items from the cache. So, a little late, but I have some reusable PHP code that will talk to your NetScaler and let you expire objects.

We hit some limitations with the NetScaler that we weren’t happy with. Cost aside, it ignored some pretty standard stuff like the HTTP Vary Header. After working around that for years we switched to the horizontally scalable Zeus Traffic Manager (at that time, referred to as ZXTM). We’ve been pleased with our choice and six months ago I wrote a similar PHP library that allows you to connect to Zeus’s API. Time and priorities being what they are, we never implemented it in production.

Finally, the real point of this post: last night I wrote a Python library that will expire content from Zeus. We’ll roll this into our migration, and waiting on content to expire from Zeus should be a thing of the past.

As always, if you can use the libraries, feel free. They all have READMEs with examples.


Maintaining localization between Python and PHP (it’s not fun)

I reached my hand into the barrel of problems our migration to Python is going to cause and came up with Localization. It figures.

First out of the chute were the .po files. It turns out the placeholder formatting is different between the two languages. PHP uses %1$s for its substitutions, but Python uses either named variables like %(num)s or integers like {0}. For the record, they both support %s when you don’t need to order the substitutions.

PHP example:

    I have %2$s apples and %1$s oranges

Python example:

    I have {1} apples and {0} oranges

Since I’ve worked with the Translate Toolkit before, I decided to write a script to convert between the two formats. If you find yourself in the same unfortunate boat as me, behold phppo2pypo and pypo2phppo to convert between the two types.
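The real scripts work on whole .po files through the Translate Toolkit. As a rough illustration of just the placeholder rewrite, assuming only positional `%N$s` and `{N}` placeholders and ignoring other format specifiers and literal braces, the core could look like this (function names are hypothetical):

```python
import re

PHP_POSITIONAL = re.compile(r"%(\d+)\$s")
PYTHON_POSITIONAL = re.compile(r"\{(\d+)\}")


def php_to_python(text):
    """Turn 1-based PHP %N$s placeholders into 0-based Python {N-1}."""
    return PHP_POSITIONAL.sub(
        lambda m: "{%d}" % (int(m.group(1)) - 1), text)


def python_to_php(text):
    """Turn 0-based Python {N} placeholders into 1-based PHP %(N+1)$s."""
    return PYTHON_POSITIONAL.sub(
        lambda m: "%%%d$s" % (int(m.group(1)) + 1), text)
```

The off-by-one shift is the whole trick: PHP counts arguments from 1, Python's `str.format` counts from 0.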

Crisis averted, right? Oh, that’s just scratching the surface. Remember how happy I was that PHP finally started supporting msgctxt? Well, Python has had a patch for it since 2008 but no one has bothered to land it. I wrote a new ugettext() and ungettext() that recognize context in the .po files. To use them, simply put from l10n import ugettext as _ at the top of your file.
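The l10n module's code isn't shown in this post. The sketch below only illustrates the underlying idea: GNU gettext stores a message with a context under the key msgctxt + "\x04" + msgid, so a context-aware lookup tries that combined key and falls back to the plain message. The `make_ugettext` helper, its `context` argument, and the dict standing in for a compiled catalog are all assumptions.

```python
CONTEXT_SEPARATOR = "\x04"  # GNU gettext joins msgctxt and msgid with EOT


def make_ugettext(catalog):
    """Build a ugettext(message, context=None) over a dict-like catalog."""
    def ugettext(message, context=None):
        if context is None:
            return catalog.get(message, message)
        key = context + CONTEXT_SEPARATOR + message
        # Fall back to the untranslated message, not the combined key.
        return catalog.get(key, message)
    return ugettext
```

The fallback matters: without it, an untranslated string with a context would show up on the page with the context and a control character glued to the front.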

Along with adding msgctxt support, those two functions also collapse consecutive white space. We’re using Jinja2 with Babel and the i18n extension as our template engine. Jinja2 has a concept of stripping white space from the beginning or end of a string but does nothing about the middle. A paragraph of text in a Jinja2 template would look like:

{% trans -%}Mozilla is providing links to these applications
as a courtesy, and makes no representations regarding the
applications or any information related thereto. Any questions,
complaints or claims regarding the applications must be
directed to the appropriate software vendor.
{%- endtrans %}

That’s a decent looking template, right? Yeah, well, when Babel extracts that, it includes all the line breaks too, giving you something like this. The localizers would revolt if I sent them that, so I added in auto white-space collapsing. Getting Babel to use the new functions means a new extraction script.
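A minimal sketch of the white-space collapsing step, assuming it simply squeezes every run of spaces, tabs, and line breaks into a single space (the function name is hypothetical, and the real functions may differ in detail):

```python
def collapse_whitespace(text):
    """Replace every run of whitespace, newlines included, with one space."""
    return " ".join(text.split())
```

Because `str.split()` with no arguments also drops leading and trailing white space, the extracted string comes out as one clean line for the localizers.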

At this point, we’re extracting strings from our new code and we can convert between Python and PHP files. All we need now is a Frankenstein mix of xgettext functions to act as glue. Meet the amalgamate script that uses the pypo2php scripts, concatenates the .pot files, and merges updates into each locale’s .po file. After that it’s quick tweaks to the build scripts to create z-messages.po files and we’re done.

So, all that said, the new process for L10n, while we’re in this transitional phase, is:

  1. From the PHP code, run locale/extract-po-remora.sh. That pulls everything from all the PHP files, creates locale/r-keys.pot, updates the messages.po file for each locale, and compiles them. Life used to be so simple.
  2. From the Python code, make sure you’re up to date, then run ./manage.py extract. That will pull everything from the Python code and templates and create locale/z-keys.pot.
  3. Run ./manage.py amalgamate. That will merge the z-keys.pot into the PHP messages.po files.
  4. Localizers can make their changes as usual, and commit back to messages.po.
  5. From PHP, locale/copy-to-zamboni.py locale will create z-messages.po files in the Python format. We could skip right to .mo files, but in case something goes wrong I want to see the .po files.
  6. Then, like today, locale/compile-mo.sh locale will compile all the .po files.
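The steps above could be wrapped in a small driver, sketched here under a few assumptions: the function names and the two checkout paths are hypothetical, step 6 is assumed to run from the PHP checkout like step 5, and step 4 (the localizers' own work) stays manual, so the commands are split around it.

```python
import subprocess


def steps_before_localization(php_root, python_root):
    """Steps 1-3: extract from both code bases, then merge."""
    return [
        (php_root, ["locale/extract-po-remora.sh"]),
        (python_root, ["./manage.py", "extract"]),
        (python_root, ["./manage.py", "amalgamate"]),
    ]


def steps_after_localization(php_root):
    """Steps 5-6: convert to the Python format, then compile."""
    return [
        (php_root, ["locale/copy-to-zamboni.py", "locale"]),
        (php_root, ["locale/compile-mo.sh", "locale"]),
    ]


def run_steps(steps, runner=subprocess.check_call):
    """Run each command in its directory, stopping at the first failure."""
    for directory, command in steps:
        runner(command, cwd=directory)
```

Passing a different `runner` makes it possible to do a dry run that only prints or records the commands.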

After all those steps are done, we’ve got duplicate .mo files, aside from formatting, and each application can look at its own .mo to get the strings it needs. All this code is just a big band-aid and there are plenty of things that are more fun than juggling L10n between two applications across two RCSs. But we knew what we were getting into. I’ll post something more positive later to help justify it. :)


AMO Development Changes in 2010

The AMO team met in Mountain View last week to develop a 2010 plan. We’ve been wanting to change some key areas of our development flow for a while but we needed to make sure time was budgeted in the overall AMO and Mozilla goals. As usual, the timeline will be tight, but the AMO developers do amazing work and as our changes are implemented, development should just get faster. I’ll give a brief summary of the changes we’re planning; a lot of discussion went into this and I’m not going to be able to cover everything here. If you’ve been in the AMO calls or reading the notes you probably already know most of this.

Migrating from CakePHP to Django

This is a big undertaking and we’ve been discussing it for quite a while. We’re currently the highest trafficked site on the internet using CakePHP and along with that we’ve run into a lot of frustrating issues. CakePHP has served AMO well for several years, so it’s not my intention to bad mouth it here, but I do want to give a fair summary of why we’re moving on. Please also note that AMO is still running on CakePHP 1.1 which is, I think, a year out of date? Three substantial issues:

  • Useful Database Abstraction Layer: CakePHP has a concept of database abstraction, but we didn’t find it powerful enough. When it did work it would return enormous nested arrays of data causing massive CPU and memory usage (out of memory errors plague us on AMO). When it didn’t work, we’d end up doing queries directly which kind of defeats the purpose. We couldn’t use prepared statements so we’d have to escape variables ourselves. There was no effective caching built-in and since we just had huge arrays as a response there was no effective way to invalidate the cache we were using (see: Caching is easy; Expiration is hard). The DB layer should return objects that are easy to cache and easy to invalidate. The built-in Django database classes (combined with memcache) should work fine for us here.
  • Effective unit tests: I’ve beaten the drum about our unit tests before but the simple matter is that it’s really difficult to do them right with the tools we are using. Our test data is already very limited, but if we try to run all our tests right now they’ll run out of memory (and take forever). The CakePHP method of mocking controllers and models was inadequate for what we needed and difficult to deal with. We want our unit tests to run quickly, from the command line, and be independent from each other so there aren’t intermittent problems to waste our time with. We’ll be using Django’s built-in testing framework.
  • Better debugging: Debugging in CakePHP amounts to defining a DEBUG level and seeing what is printed on the screen (usually the giant arrays). We supplemented this with Xdebug where we needed it, but that’s still not enough. A framework should have excellent logging and on-the-fly debugging that displays a full traceback (often something will fail deep within CakePHP and we’ll get the file/line where PHP gave up, but not the line in our code that started the problem), the values of variables, the page headers, server settings, SQL that was run, what views and elements are in use, etc. We’re planning on using a combination of pdb, IPython, and the django-debug-toolbar to make all of this easily accessible while developing.
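To illustrate the first bullet's point that objects are easier to cache and invalidate than huge nested arrays, here is a rough sketch of per-object caching. It is not AMO's code: the class and method names are hypothetical, and a plain dict stands in for memcache.

```python
class ObjectCache(object):
    """Cache one entry per object so a save can invalidate exactly that entry."""

    def __init__(self, store=None):
        # A dict stands in for memcache in this sketch.
        self.store = {} if store is None else store

    @staticmethod
    def key_for(model_name, pk):
        return "%s:%s" % (model_name, pk)

    def get(self, model_name, pk, load):
        key = self.key_for(model_name, pk)
        if key not in self.store:
            self.store[key] = load(pk)  # only hit the database on a miss
        return self.store[key]

    def invalidate(self, model_name, pk):
        self.store.pop(self.key_for(model_name, pk), None)
```

With one key per object, a change to a single add-on only has to drop that add-on's entry instead of flushing a whole page-sized array.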

Those are the major issues we’re having right now. If you want to dig into the comparison some more, check out our discussion wiki pages, but realize the majority of the discussion happened in person.

Moving away from SVN

We moved AMO into SVN in 2006 and it’s treated us relatively well. Somewhere along the line, we decided to tag our production versions at a revision of trunk instead of keeping a separate tag and merging changes into it. It’s worked for us but it’s a hard cutoff on code changes, which means that while we’re in a code freeze no one can check anything in to trunk. As we begin to branch for larger projects this will become more of a hassle, so I’m planning on going back to a system where a production tag is created and changes are merged into it as they are ready to go live.

Most of the development team has been using git-svn for several months and, aside from the commands being far more verbose, we haven’t had many complaints. We’ve discovered Git is a much more powerful development tool and we expect to use it directly starting some time next year. As of now, we expect to maintain the /locales/ directory in SVN so this change doesn’t affect localizers but we’ll keep people notified if there are any changes to that process.

Continuous Integration

I mentioned excellent testing being one of the reasons we’re moving to Django. Along with that testing is the opportunity for continuous integration. We plan on using Hudson as the framework for our continuous integration. With excellent test coverage and quick feedback from Hudson this should drastically lower our regressions and boost our confidence when we deploy. Speaking of which…

Faster Deployment

For most of 2009 we’ve pushed on 3 week cycles. 2 weeks of development, 1 week of QA and L10n. Delays and regressions being what they are, I think we averaged a little better than a push a month. This is a fairly rapid cycle for a lot of development shops, but I feel like it’s holding us back. We’ve heard a lot of success stories about shorter cycles and I’d like to aim for deploying (optionally, of course) a few times per week. By shortening the development cycle we reduce the stress on:

  • the developers: Everyone likes to see what they’ve done go out quicker and it means fewer conflicts with others when the patches are smaller.
  • the QA team: Right now we dump 2 weeks of work on them and say we need it done right away. With smaller cycles they can verify small changes as they go and not be overwhelmed.
  • the infrastructure team: Smaller changes mean less to go wrong and with a continuous integration server and some automation they can have minimal involvement with the whole process.
  • the localizers: Every time we release we dump a bunch of changes on these fantastic people and tell them we need them back in a week. Most of the time they plow forward and get them done on time. If they don’t though, they are stuck with waiting for the next 3 week cycle. If we push often, it’s not a big deal.
  • the product managers: These guys come up with crazy ideas for us to implement and then they stare at graphs and numbers to see if it worked. With shorter cycles they can get faster feedback about what works and what doesn’t.
  • the users: Faster release cycles mean bugs that are fixed in the repository are fixed on the live site sooner. ’nuff said.

Process Data Offline

Much of AMO relies on cron jobs to get things done. All the statistics, add-on download numbers, how popular an add-on is, all the star rating calculations, any cleanup or maintenance tasks – these are all run via cron and they are so intensive that the database has trouble keeping up. We’re planning on utilizing Gearman to farm all this work out to other machines in incremental pieces instead of single huge queries. Any heavy calculating that can be done offline will be moved to these external processors which should help improve the speed of the site and make all our statistics more reliable (as currently the cron jobs have a tendency to fail before they are complete).
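As a sketch of what "incremental pieces instead of single huge queries" might look like, the work can be cut into small batches of IDs, each handed off as its own job. The function names are hypothetical, and `submit` is a stand-in for whatever call actually queues a job; no real Gearman API is used here.

```python
def batches(item_ids, size):
    """Yield consecutive slices of item_ids, each at most size long."""
    for start in range(0, len(item_ids), size):
        yield item_ids[start:start + size]


def enqueue_jobs(item_ids, size, submit):
    """Hand each batch to submit(); return how many jobs were queued."""
    count = 0
    for batch in batches(item_ids, size):
        submit(batch)
        count += 1
    return count
```

If one batch fails, only that slice needs to be retried, which is the reliability gain over a single long-running cron query.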

Improve the Documentation

Documentation is a noble goal of many developers but it rarely gets enough attention. We evaluated our current documentation and found it woefully out of date. Because it lives on a rarely used wiki, it only gets updated when someone tries to use it and sees it’s not right. We’re hoping to change that by moving the developer documentation into the code repository itself. We’ll be able to integrate with generated API docs, style the docs however we want, and check in changes right along with our code patches. When someone checks out a copy of AMO, they’ll get all the documentation right along with it. We’ll use Sphinx to build the docs.
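For a sense of how little is needed to get started, a minimal Sphinx `conf.py` living in the repository might look like the following. The values are placeholders, not AMO's actual configuration; `sphinx.ext.autodoc` is the standard extension for pulling API docs out of docstrings.

```python
# conf.py: minimal Sphinx configuration (all values are placeholders)
project = "AMO developer documentation"  # hypothetical project title
extensions = ["sphinx.ext.autodoc"]       # generate API docs from docstrings
master_doc = "index"                      # root document, i.e. index.rst
```

With this file next to an `index.rst`, `sphinx-build` can turn the checked-in docs into HTML.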

The outline above details several large, high-level changes but there are a lot of other plans for smaller improvements as well. This post got a lot longer than I was expecting, but I’m really excited about the direction AMO is headed for 2010. As these changes are implemented the site will become more responsive and reliable, and we’ll be able to adapt to the needs of Mozilla’s users even faster. As always, feedback and discussion are welcome and stay tuned for further back end improvements.
