Yesterday, igvita.com published a short
article summarizing memcached internals and best practices. It was
originally a talk by Brian Aker and Alan Kasindorf and their slides are
included. It's a good read and digs into memcached's memory management and
expiration logic. (Thanks for the heads-up, chenb.)
Still on a high from our success with memcached in AMO version 2, we decided to go a fairly
common route and cache query results in version 3. This performs admirably,
particularly with our ridiculously long and slow queries. Over time,
though, the popularity of the site and the load on the servers climb, and soon
we’re looking at slowness issues again. On a rough day we decided to increase
the expiration timeout for our queries in memcached from a minute to around an
hour. This gives the database servers some breathing room but causes excessive
delay on the AMO site when add-ons are updated, and things like bug 425315
and its friends are born. Weird things happen when parts of a site expire at
different times, and consequently the user experience (particularly for add-on
developers) suffers.
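To make the trade-off concrete, here is a minimal sketch of query-result caching with an expiration timeout. It is not AMO's actual code: `TTLCache` is a hypothetical in-memory stand-in for memcached, and `cached_query`, `run_query`, and the key scheme are names assumed purely for illustration.

```python
import hashlib
import time


class TTLCache:
    """Hypothetical in-memory stand-in for memcached: entries expire after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock  # injectable so expiry can be simulated

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has outlived its timeout; drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def cached_query(cache, sql, run_query, ttl):
    """Return cached rows for sql, running the query only on a cache miss."""
    # Assumed key scheme: a hash of the SQL text keeps keys short and uniform.
    key = "query:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()
    rows = cache.get(key)
    if rows is None:
        rows = run_query(sql)
        cache.set(key, rows, ttl)
    return rows
```

With a one-hour timeout, a row that changes in the database one minute after being cached keeps being served in its old form for the remaining 59 minutes. That is the delay add-on developers ran into. A shorter timeout narrows that window at the cost of more queries reaching the database.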
Struggling with scalability on AMO is
nothing new but the tools we use to solve the problems have changed over time.
Here is a bit of information on the performance evolution AMO has gone through.
I wanted to link to the Wayback
Machine for all our old versions, but I get "Redirect Errors" for the
addons.mozilla.org domain. I'll have to make do with code repositories.
Laura attended CakeFest a couple of months ago and got to meet
some core Cake developers in person. In doing so, she let slip that AMO was running on a pretty old version
(1.1.12, released in December 2006). Apparently 1.1.15 had some major
performance boosts, and since we melted the cluster a few times recently (the new API was the culprit), we
thought it would be a good idea to investigate upgrading.