AMO Scalability: Then and Now
Struggling with scalability on AMO is nothing new, but the tools we use to solve the problems have changed over time. Here is a look at the performance evolution AMO has gone through. I wanted to link to the Wayback Machine for each of our old versions, but it returns "Redirect Errors" for the addons.mozilla.org domain, so I'll have to make do with the code repositories.
Version 1 of AMO wasn't concerned with caching. It was straight PHP talking directly to a single MySQL box. Short, easy, and not very scalable.
Version 2 of AMO progressed through several caching systems. The site used the Smarty template engine, so our first step was to turn on Smarty's built-in cache. That didn't give us the performance we needed, so Mike Morgan started caching page output with PEAR's Cache_Lite. I don't remember the specifics of this implementation since it was so short-lived (less than a month), but the CVS log mentions problems with "scalability in a clustered environment." Our next step was to store the same page output in memcached instead of Cache_Lite, which brought pretty satisfying results. Thus began our abuse of memcached.
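The page-output caching pattern can be shown as a minimal Python sketch. AMO itself was PHP, so this is an illustration and not its code. `InMemoryCache` is a hypothetical stand-in for a memcached client. `cached_page`, the `page:` key prefix, and the 300-second TTL are also assumptions chosen for the example. One plausible reason the switch to memcached helped in a cluster is that Cache_Lite keeps its entries in local files on each machine, whereas memcached is a network service that every web server shares.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a memcached client: get/set with a TTL in seconds."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired entries are dropped lazily, on the next read.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl: float) -> None:
        self._store[key] = (time.monotonic() + ttl, value)


def cached_page(cache: InMemoryCache, url: str,
                render: Callable[[str], str], ttl: float = 300) -> str:
    """Return the HTML for url, running the expensive render only on a cache miss."""
    key = "page:" + url  # assumed key scheme: one cache entry per URL
    html = cache.get(key)
    if html is None:
        html = render(url)
        cache.set(key, html, ttl)
    return html
```

Keying on the URL alone means every visitor gets the same cached HTML. A real site therefore has to either bypass or separately key any page that is personalized for a logged-in user.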
In addition to adopting memcached and expanding the number of web servers the site ran on, version 2 boasted two other significant performance improvements. The first was the ability to send read-only queries to a slave database, which, combined with a load balancer, let us scale database servers horizontally. The second was installing a NetScaler in front of addons.mozilla.org, which gave us reverse proxy caching and SSL offloading. These changes bought us precious time when hordes of Firefox 1.5 users were clamoring for add-ons. In fact, I'd say we were in pretty good shape at that point.
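The read/write split can be sketched in a few lines of Python. This is a simplified, hypothetical illustration and not AMO's implementation. `RoundRobinBalancer` stands in for the load balancer that spreads reads across the slaves. `QueryRouter`, the host names, and the rule "anything starting with SELECT is read-only" are all assumptions.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hypothetical stand-in for the load balancer in front of the slaves."""

    def __init__(self, slaves: Iterable[str]) -> None:
        hosts = list(slaves)
        if not hosts:
            raise ValueError("need at least one slave")
        self._cycle = itertools.cycle(hosts)

    def next_host(self) -> str:
        return next(self._cycle)


class QueryRouter:
    """Sends read-only queries to the slave pool and everything else to the master."""

    def __init__(self, master: str, read_pool: RoundRobinBalancer) -> None:
        self.master = master
        self.read_pool = read_pool

    def host_for(self, sql: str) -> str:
        # Simplification: treat any statement starting with SELECT as read-only.
        if sql.lstrip().upper().startswith("SELECT"):
            return self.read_pool.next_host()
        return self.master
```

Adding read capacity then means adding another slave behind the balancer, while all writes still funnel through one master. MySQL replication is asynchronous by default, so a read issued immediately after a write may reach a slave that has not applied that write yet. Sending such reads to the master is a common workaround.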
Fast forward to Version 3 (the current version). We've expanded the memcached servers from one to two, and instead of page output we're now storing database queries and their results. We still use a single master database, but now have two slaves for read-only queries. Several NetScalers around the world cache pages locally[1] for nearby regions. This setup has carried us for quite a while, but we're starting to push the envelope again, and we'll need to make some changes to scale for Firefox 3 while still providing a good user experience. I'll write more about our plans as they develop.
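Caching query results instead of page output can be sketched the same way. This is a minimal illustration under stated assumptions, not AMO's code. `query_cache` is a plain dict standing in for memcached, with no expiry or eviction. `cached_select`, `cache_key`, and the MD5-of-SQL-plus-parameters key scheme are hypothetical. SQLite is used only so the example runs without a MySQL server. Hashing the query keeps keys short, which matters because memcached limits keys to 250 bytes.

```python
import hashlib
import json
import sqlite3
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical stand-in for memcached: a plain dict with no expiry or eviction.
query_cache: Dict[str, List[Tuple[Any, ...]]] = {}


def cache_key(sql: str, params: Sequence[Any]) -> str:
    # Assumed key scheme: hash the exact SQL text together with its bound parameters.
    raw = json.dumps([sql, list(params)])
    return "query:" + hashlib.md5(raw.encode("utf-8")).hexdigest()


def cached_select(conn: sqlite3.Connection, sql: str,
                  params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]:
    """Return cached rows for (sql, params), querying the database only on a miss."""
    key = cache_key(sql, params)
    if key in query_cache:
        return query_cache[key]
    rows = conn.execute(sql, tuple(params)).fetchall()
    query_cache[key] = rows
    return rows
```

Because results are cached per query rather than per page, different pages that run the same query can share a cache entry. The trade-off is staleness. If a row is updated after its query has been cached, `cached_select` keeps returning the old rows until the entry expires or is explicitly deleted.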
[1] Users who are logged in to AMO don't get the local caches - their connection is always to San Jose, CA.