<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>All Night Diner &#187; scalability</title>
	<atom:link href="http://micropipes.com/blog/tag/scalability/feed/" rel="self" type="application/rss+xml" />
	<link>http://micropipes.com/blog</link>
	<description>because at 3am anything sounds good</description>
	<lastBuildDate>Mon, 03 May 2010 17:34:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Libraries to connect to a Citrix NetScaler or Zeus Traffic Manager</title>
		<link>http://micropipes.com/blog/2010/03/23/libraries-to-connect-to-a-citrix-netscaler-or-zeus-traffic-manager/</link>
		<comments>http://micropipes.com/blog/2010/03/23/libraries-to-connect-to-a-citrix-netscaler-or-zeus-traffic-manager/#comments</comments>
		<pubDate>Tue, 23 Mar 2010 18:56:57 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=133</guid>
		<description><![CDATA[The first front end cache we used on AMO was the Citrix NetScaler.  I&#8217;ve complained about its API before but apparently never announced the library I wrote to purge items from the cache.  So, a little late, but I have some reusable PHP code that will talk to your NetScaler and let you [...]]]></description>
			<content:encoded><![CDATA[<p>The first front end cache we used on <a href="https://addons.mozilla.org/">AMO</a> was the <a href="http://www.citrix.com/English/ps2/products/product.asp?contentID=21679">Citrix NetScaler</a>.  I&#8217;ve <a href="http://micropipes.com/blog/2008/07/14/planning-your-api-is-important/">complained about its API before</a> but apparently never announced the library I wrote to purge items from the cache.  So, a little late, but I have <a href="http://viewvc.svn.mozilla.org/vc/libs/ns-api/">some reusable PHP code that will talk to your NetScaler</a> and let you expire objects.</p>
<p>We hit some limitations with the NetScaler that we weren&#8217;t happy with.  Cost aside, it ignored some pretty standard stuff like the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44">HTTP Vary Header</a>.  After working around that for years we switched to the horizontally scalable <a href="http://www.zeus.com/products/traffic-manager/index.html">Zeus Traffic Manager</a> (at that time, referred to as ZXTM).  We&#8217;ve been pleased with our choice and six months ago I wrote a <a href="http://viewvc.svn.mozilla.org/vc/libs/zxtm-api/">similar PHP library that allows you to connect to Zeus&#8217;s API</a>.  Time and priorities being what they are, we never implemented it in production.</p>
<p>Finally, the real point of this post: last night I wrote a <a href="http://github.com/clouserw/hera">Python library that will expire content from Zeus</a>.  We&#8217;ll roll this into our migration, and waiting on content to expire from Zeus should be a thing of the past.</p>
<p>As always, if you can use the libraries, feel free.  They all have READMEs with examples.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2010/03/23/libraries-to-connect-to-a-citrix-netscaler-or-zeus-traffic-manager/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AMO Development Changes in 2010</title>
		<link>http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/</link>
		<comments>http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 21:44:12 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[Git]]></category>
		<category><![CDATA[hindsight]]></category>
		<category><![CDATA[L10n]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[SVN]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=98</guid>
		<description><![CDATA[The AMO team met in Mountain View last week to develop a 2010 plan.  We&#8217;ve been wanting to change some key areas of our development flow for a while but we needed to make sure time was budgeted in the overall AMO and Mozilla goals.  As usual, the timeline will be tight, but [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="https://addons.mozilla.org/"><abbr title="addons.mozilla.org">AMO</abbr></a> team met in Mountain View last week to develop a 2010 plan.  We&#8217;ve been wanting to change some key areas of our development flow for a while but we needed to make sure time was budgeted in the overall AMO and Mozilla goals.  As usual, the timeline will be tight, but the AMO developers do amazing work and as our changes are implemented, development should just get faster.  I&#8217;ll give a brief summary of the changes we&#8217;re planning; a lot of discussion went into this and I&#8217;m not going to be able to cover everything here.  If you&#8217;ve been in the AMO calls or reading the notes you probably already know most of this.</p>
<h3>Migrating from CakePHP to Django</h3>
<p>This is a big undertaking and we&#8217;ve been discussing it for quite a while.  We&#8217;re currently the highest-trafficked site on the internet using <a href="http://cakephp.org/">CakePHP</a> and along with that we&#8217;ve run into a lot of frustrating issues.  CakePHP has served AMO well for several years, so it&#8217;s not my intention to bad-mouth it here, but I do want to give a fair summary of why we&#8217;re moving on.  Please also note that <em>AMO is still running on CakePHP 1.1 which is, I think, a year out of date</em>?  Three substantial issues:</p>
<ul>
<li><strong>Useful Database Abstraction Layer:</strong>  CakePHP has a concept of database abstraction, but we didn&#8217;t find it powerful enough.  When it did work it would return enormous nested arrays of data causing massive CPU and memory usage (out of memory errors plague us on AMO).  When it didn&#8217;t work, we&#8217;d end up doing queries directly which kind of defeats the purpose.  We couldn&#8217;t use prepared statements so we&#8217;d have to escape variables ourselves.  There was no effective caching built-in and since we just had huge arrays as a response there was no effective way to invalidate the cache we were using (see: <a href="http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/">Caching is easy; Expiration is hard</a>).  The DB layer should return objects that are easy to cache and easy to invalidate.  The built-in Django database classes (combined with memcache) should work fine for us here.</li>
<li><strong>Effective unit tests:</strong>  I&#8217;ve <a href="http://micropipes.com/blog/2009/04/09/addonsmozillaorg-celebrates-1000-passing-unit-tests/">beaten the drum about our unit tests before</a> but the simple matter is that it&#8217;s really difficult to do them right with the tools we are using.  Our test data is already very limited, but if we try to run all our tests right now they&#8217;ll run out of memory (and take forever).  The CakePHP method of mocking controllers and models was inadequate for what we needed and difficult to deal with.  We want our unit tests to run quickly, from the command line, and be independent from each other so there aren&#8217;t intermittent problems to waste our time with.  We&#8217;ll be using Django&#8217;s <a href="http://docs.djangoproject.com/en/dev/topics/testing/">built-in testing framework</a>.</li>
<li><strong>Better debugging:</strong>  Debugging in CakePHP amounts to defining a DEBUG level and seeing what is printed on the screen (usually the giant arrays).  We supplemented this with <a href="http://www.xdebug.org/">Xdebug</a> where we needed it, but that&#8217;s still not enough.  A framework should have excellent logging and on-the-fly debugging that displays a full traceback (often something will fail deep within CakePHP and we&#8217;ll get the file/line where PHP gave up, but not the line in our code that started the problem), the values of variables, the page headers, server settings, SQL that was run, what views and elements are in use, etc.  We&#8217;re planning on using a combination of <a href="http://docs.python.org/library/pdb.html">pdb</a>, <a href="http://ipython.scipy.org/moin/">IPython</a>, and the <a href="http://robhudson.github.com/django-debug-toolbar/">django-debug-toolbar</a> to make all of this easily accessible while developing.</li>
</ul>
<p>Those are the major issues we&#8217;re having right now.  If you want to dig into the comparison some more, check out our <a href="https://wiki.mozilla.org/AMO:v4">discussion wiki pages</a>, but realize the majority of discussion happened in person.</p>
<h3>Moving away from <abbr title="Subversion">SVN</abbr></h3>
<p>We moved AMO into SVN in 2006 and it&#8217;s treated us relatively well.  Somewhere along the line, we decided to tag our production versions at a revision of trunk instead of keeping a separate tag and merging changes into it.  It&#8217;s worked for us but it&#8217;s a hard cutoff on code changes, which means that while we&#8217;re in a code freeze no one can check anything in to trunk.  As we begin to branch for larger projects this will become more of a hassle, so I&#8217;m planning on going back to a system where a production tag is created and changes are merged into it as they are ready to go live.</p>
<p>Most of the development team has been using <a href="http://kernel.org/pub/software/scm/git/docs/git-svn.html">git-svn</a> for several months and, aside from the commands being far more verbose, we haven&#8217;t had many complaints.  We&#8217;ve discovered <a href="http://git-scm.com/">Git</a> is a much more powerful development tool and we expect to use it directly starting some time next year.  As of now, we expect to maintain the /locales/ directory in SVN so this change doesn&#8217;t affect localizers but we&#8217;ll keep people notified if there are any changes to that process.</p>
<h3>Continuous Integration</h3>
<p>I mentioned excellent testing being one of the reasons we&#8217;re moving to Django.  Along with that testing is the opportunity for continuous integration.  We plan on using <a href="https://hudson.dev.java.net/">Hudson</a> as the framework for our continuous integration.  With excellent test coverage and quick feedback from Hudson this should drastically lower our regressions and boost our confidence when we deploy.  Speaking of which&#8230;</p>
<h3>Faster Deployment</h3>
<p>For most of 2009 we&#8217;ve pushed on 3-week cycles: 2 weeks of development, 1 week of <abbr title="Quality Assurance">QA</abbr> and <abbr title="Localization">L10n</abbr>.  Delays and regressions being what they are, I think we averaged a little better than a push a month.  This is a fairly rapid cycle for a lot of development shops, but I feel like it&#8217;s holding us back.  We&#8217;ve heard a lot of success stories about shorter cycles and I&#8217;d like to aim for deploying (optionally, of course) a few times per week.  By shortening the development cycle we reduce the stress of:</p>
<ul>
<li><strong>the developers:</strong>  Everyone likes to see what they&#8217;ve done go out quicker and it means fewer conflicts with others when the patches are smaller.</li>
<li><strong>the QA team:</strong> Right now we dump 2 weeks of work on them and say we need it done right away.  With smaller cycles they can verify small changes as they go and not be overwhelmed.</li>
<li><strong>the infrastructure team:</strong> Smaller changes mean less to go wrong and with a continuous integration server and some automation they can have minimal involvement with the whole process.</li>
<li><strong>the localizers:</strong> Every time we release we dump a bunch of changes on these fantastic people and tell them we need them back in a week.  Most of the time they plow forward and get them done on time.  If they don&#8217;t though, they are stuck waiting for the next 3-week cycle.  If we push often, it&#8217;s not a big deal.</li>
<li><strong>the product managers:</strong> These guys come up with crazy ideas for us to implement and then they stare at graphs and numbers to see if it worked.  With shorter cycles they can get faster feedback about what works and what doesn&#8217;t.</li>
<li><strong>the users:</strong> Faster release cycles mean bugs that are fixed in the repository are fixed on the live site sooner.  &#8217;nuff said.</li>
</ul>
<h3>Process Data Offline</h3>
<p>Much of AMO relies on cron jobs to get things done.  All the statistics, add-on download numbers, how popular an add-on is, all the star rating calculations, any cleanup or maintenance tasks &#8211; these are all run via cron and they are so intensive that the database has trouble keeping up.  We&#8217;re planning on utilizing <a href="http://gearman.org/">Gearman</a> to farm all this work out to other machines in incremental pieces instead of single huge queries.  Any heavy calculating that can be done offline will be moved to these external processors which should help improve the speed of the site and make all our statistics more reliable (as currently the cron jobs have a tendency to fail before they are complete).</p>
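<p>As a minimal sketch of what "incremental pieces instead of single huge queries" could look like, the Python below splits an id range into small batches and processes one batch at a time.  It does not use Gearman itself; the function names, the batch size, and the in-memory dict standing in for a database table are all assumptions made for illustration.</p>

```python
def id_batches(first_id, last_id, batch_size):
    """Yield inclusive (start, end) id ranges covering first_id..last_id."""
    for start in range(first_id, last_id + 1, batch_size):
        yield start, min(start + batch_size - 1, last_id)


def total_downloads(rows, first_id, last_id, batch_size=1000):
    """Sum download counts one batch at a time.

    `rows` is a dict of add-on id to download count standing in for a
    database table (hypothetical).  Each loop iteration is a unit of
    work that a job queue worker could pick up independently.
    """
    total = 0
    for start, end in id_batches(first_id, last_id, batch_size):
        total += sum(rows.get(i, 0) for i in range(start, end + 1))
    return total
```

<p>Because each batch is independent, a failure only costs one small range that can be retried, rather than the whole run.</p>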
<h3>Improve the Documentation</h3>
<p>Documentation is a noble goal of many developers but it rarely gets enough attention.  We evaluated our <a href="https://wiki.mozilla.org/AMO:Developers">current documentation</a> and found it is woefully out of date.  By being on a wiki that is rarely used it doesn&#8217;t get updated except when someone tries to use it and sees it&#8217;s not right.  We&#8217;re hoping to change that by moving the developer documentation into the code repository itself.  We&#8217;ll be able to integrate with generated API docs, style the docs however we want, and check in changes right along with our code patches.  When someone checks out a copy of AMO, they&#8217;ll get all the documentation right along with it.  We&#8217;ll use <a href="http://sphinx.pocoo.org/">Sphinx</a> to build the docs.</p>
<p>The outline above details several large, high-level changes but there are a lot of other plans for smaller improvements as well.  This post got a lot longer than I was expecting, but I&#8217;m really excited about the direction AMO is headed for 2010.  As these changes are implemented the site will become more responsive and reliable, and we&#8217;ll be able to adapt to the needs of Mozilla&#8217;s users even faster.  As always, feedback and discussion are welcome and stay tuned for further back end improvements.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/feed/</wfw:commentRss>
		<slash:comments>37</slash:comments>
		</item>
		<item>
		<title>Planning your API is important</title>
		<link>http://micropipes.com/blog/2008/07/14/planning-your-api-is-important/</link>
		<comments>http://micropipes.com/blog/2008/07/14/planning-your-api-is-important/#comments</comments>
		<pubDate>Mon, 14 Jul 2008 07:04:34 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=55</guid>
		<description><![CDATA[I&#8217;m upgrading some code I wrote to talk to a new version of the Citrix NetScaler&#8217;s API.  The NetScaler&#8217;s manuals (that&#8217;s right, plural) weigh in at a combined 1114 pages so documentation isn&#8217;t a problem and their implementation is a breeze using WSDL over SOAP.  However, some of the core changes left me [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m upgrading some code I wrote to talk to a new version of the <a href="http://www.citrix.com/English/ps2/products/product.asp?contentID=21679">Citrix NetScaler</a>&#8217;s <abbr title="Application Programming Interface">API</abbr>.  The NetScaler&#8217;s manuals (that&#8217;s right, plural) weigh in at a combined 1114 pages so documentation isn&#8217;t a problem and their implementation is a breeze using <abbr title="Web Services Definition Language">WSDL</abbr> over <abbr title="Simple Object Access Protocol">SOAP</abbr>.  However, some of the core changes left me scratching my head.  Case in point:</p>
<p>To get an object out of the cache in the previous version the method signature was:</p>
<blockquote><p>getcacheobject(string $url, string $host, unsignedInt $port)</p></blockquote>
<p>$url in this case is what is commonly referred to as the &#8220;path&#8221; in a standard <abbr title="Uniform Resource Locator">URL</abbr>; e.g. /en-US/index.html.  That&#8217;s simple enough to use, let&#8217;s see what the new version has:</p>
<blockquote><p>getcacheobject(string $url, unsignedLong $locator, string $host, unsignedInt $port [...]</p></blockquote>
<p>$locator is an internal id for a cache object which is definitely something I should be able to use to expire content, but what is it doing in the middle of my url+host pair?  The error messages are happy to let me know that I <b>can&#8217;t</b> pass both $url and $locator to the same call and if I pass $url I <b>must</b> pass $host.  That means I will always have to pass a null value in either the first or second parameter.</p>
<p>And if you&#8217;re thinking it&#8217;s for consistency with other methods, think again.  To flush an object we&#8217;ve got:</p>
<blockquote><p>flushcacheobject(unsignedLong $locator, string $url, string $host, unsignedInt $port [...]</p></blockquote>
<p>Moral of the story: planning and consistency make happy programmers, particularly with an API.</p>
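<p>To illustrate the kind of consistency meant here, the following Python sketch gives both operations one shared, keyword-only argument convention, so a caller passes either a locator or a url+host pair and never a placeholder null.  This is not the NetScaler API: every function and parameter name below is hypothetical, and the default port of 80 is an assumption.</p>

```python
def cache_object_query(*, locator=None, url=None, host=None, port=80):
    """Build a lookup for one cached object (hypothetical helper).

    Accepts either a locator or a url+host pair, never both, and only
    by keyword so that argument order cannot differ between methods.
    """
    if locator is not None:
        if url is not None or host is not None:
            raise ValueError("pass either locator or url+host, not both")
        return {"locator": locator}
    if url is None or host is None:
        raise ValueError("url and host must be passed together")
    return {"url": url, "host": host, "port": port}


def get_cache_object(**kwargs):
    # Same argument rules as flush_cache_object, by construction.
    return ("get", cache_object_query(**kwargs))


def flush_cache_object(**kwargs):
    # Same argument rules as get_cache_object, by construction.
    return ("flush", cache_object_query(**kwargs))
```

<p>With this shape, get_cache_object(url="/en-US/index.html", host="example.com") and flush_cache_object(locator=42) both read naturally, and the validation lives in one place.</p>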
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/07/14/planning-your-api-is-important/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Speed testing your web site with AOL Pagetest</title>
		<link>http://micropipes.com/blog/2008/06/24/speed-testing-your-web-site-with-aol-pagetest/</link>
		<comments>http://micropipes.com/blog/2008/06/24/speed-testing-your-web-site-with-aol-pagetest/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 21:16:39 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=51</guid>
		<description><![CDATA[I&#8217;m at Velocity 2008 right now and the keynote I was most excited about this morning introduced http://webpagetest.org/.
This site accepts a URL which it loads over a network connection with the speed of your choice.  After loading the site it offers you a waterfall graph of page elements, a screenshot so you can see [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m at <a href="http://en.oreilly.com/velocity2008/public/content/home">Velocity 2008</a> right now and the keynote I was most excited about this morning introduced <a href="http://webpagetest.org/">http://webpagetest.org/</a>.</p>
<p>This site accepts a <abbr title="Uniform Resource Locator">URL</abbr> which it loads over a network connection with the speed of your choice.  After loading the site it offers you a waterfall graph of page elements, a screenshot so you can see if it loaded correctly, and automatic comparisons to previous page loads to see how caching affects your site.</p>
<p>The code behind the page is based on the open source <a href="http://pagetest.sourceforge.net/">AOL Pagetest</a> which means it can be modified as needed.  I think I heard <a href="http://www.ryandoherty.net/">Ryan</a> say something about automated testing over time with historical graphs and email alerts? <img src='http://micropipes.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/06/24/speed-testing-your-web-site-with-aol-pagetest/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Memcached best practices and internals</title>
		<link>http://micropipes.com/blog/2008/04/23/memcached-best-practices-and-internals/</link>
		<comments>http://micropipes.com/blog/2008/04/23/memcached-best-practices-and-internals/#comments</comments>
		<pubDate>Wed, 23 Apr 2008 16:37:29 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/04/23/memcached-best-practices-and-internals/</guid>
		<description><![CDATA[Yesterday, igvita.com published a short article summarizing memcached internals and best practices.  It was originally a talk by Brian Aker and Alan Kasindorf and their slides are included.  It&#8217;s a good read and digs into memcached&#8217;s memory management and expiration logic.  (Thanks for the heads up, chenb).
I noticed their first best practice [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, igvita.com published <a href="http://www.igvita.com/2008/04/22/mysql-conf-memcached-internals/">a short article summarizing memcached internals and best practices</a>.  It was originally a talk by Brian Aker and Alan Kasindorf and their slides are included.  It&#8217;s a good read and digs into memcached&#8217;s memory management and expiration logic.  (Thanks for the heads up, chenb).</p>
<p>I noticed their first best practice for memcached, &#8220;Don&#8217;t think row-level (database) caching, think complex objects,&#8221; is at odds with what we&#8217;re doing on <a href="http://addons.mozilla.org/"><abbr title="addons.mozilla.org">AMO</abbr></a>.  After stuff like <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=425315" title="Implement full-time cache with instant invalidation">bug 425315</a> I completely agree with them.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/04/23/memcached-best-practices-and-internals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Caching is easy; Expiration is hard</title>
		<link>http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/</link>
		<comments>http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/#comments</comments>
		<pubDate>Wed, 23 Apr 2008 16:36:00 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/</guid>
		<description><![CDATA[Still on a high from our success with memcached in AMO version 2, we decided to go a fairly common route and cache query results in version 3.  This performs admirably particularly with our ridiculously long and slow queries.  Over time, though, the popularity of the site and the load on the servers [...]]]></description>
			<content:encoded><![CDATA[<p>Still on a high from <a href="http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/">our success with memcached</a> in <abbr title="addons.mozilla.org">AMO</abbr> version 2, we decided to go a fairly common route and cache query results in version 3.  This performs admirably particularly with our <a href="http://blog.mozilla.com/webdev/2007/04/18/teaching-cakephp-to-be-multilingual-part-3/">ridiculously long and slow queries</a>.  Over time, though, the popularity of the site and the load on the servers climb, and soon we&#8217;re looking at slowness issues again.  On a rough day we decided to increase the expiration timeout for our queries in memcached from a minute to around an hour.  This gives the database servers some breathing room but causes excessive delay on the AMO site when add-ons are updated and things like <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=425315" title="Implement full-time cache with instant invalidation">bug 425315</a> and its friends are born.  Weird things happen when parts of a site expire at different times and consequently user experience (particularly for add-on developers) suffers.</p>
<p>The problem we&#8217;re running into in the bug linked above is knowing when to expire a cache.  Consider when an add-on author updates the summary of their add-on.  We know we&#8217;ll have to flush the queries on that page out of memcached, and that&#8217;s easy enough, but what about all the other places the summary is used?  Search results pages, add-on detail pages, recommended lists, the <abbr title="Application Programming Interface">API</abbr>, etc.  Now we&#8217;ve got to figure out the queries used on those pages and expire them too.  Suddenly I&#8217;m wishing we were caching objects in memcached instead of queries.</p>
<p>I looked into other ways to use memcached and they all have their pros and cons.  Caching entire pages means we&#8217;d have to store different versions for a person that is logged in vs. logged out and also what permissions they had (pages have different options for localizers, admins, developers, etc.).  Caching objects is attractive, but the way <a href="http://cakephp.org">CakePHP</a> does queries makes this a non-option (namely, it&#8217;s not asking objects for values, it does joins directly on the db).  Directly caching queries seems like the best fit because we can affect just the parts of the pages we want and it will work with CakePHP&#8217;s current system&#8230;just as soon as we figure out how to relate updating a row to all of its associated queries.</p>
<p>I attached an idea to the bug but regardless of the process we use, figuring out how to implement a full time cache that we can expire on the fly is going to be an important step in keeping the AMO site usable as our traffic increases.</p>
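<p>One possible way to relate a row to its cached queries, sketched in Python under stated assumptions: record, alongside each cached result, which rows it was built from, then drop every dependent query when a row changes.  This is not the idea attached to the bug and not AMO code; the class, its method names, and the plain dicts standing in for memcached are hypothetical.</p>

```python
from collections import defaultdict


class TaggedQueryCache:
    """In-memory stand-in for memcached that remembers which rows each
    cached query depends on (hypothetical scheme for illustration)."""

    def __init__(self):
        self._results = {}                     # query key: cached result
        self._keys_by_row = defaultdict(set)   # (table, id): query keys

    def set(self, query_key, result, rows):
        """Cache a result and register every (table, id) row it used."""
        self._results[query_key] = result
        for row in rows:
            self._keys_by_row[row].add(query_key)

    def get(self, query_key):
        return self._results.get(query_key)

    def invalidate_row(self, table, row_id):
        """Drop every cached query registered against this row.

        Returns the number of query keys that were registered.
        """
        keys = self._keys_by_row.pop((table, row_id), set())
        for key in keys:
            self._results.pop(key, None)
        return len(keys)
```

<p>Updating the summary of add-on 1 would then call invalidate_row("addons", 1), which expires the detail page query and any search results that included that add-on, while queries that never touched the row stay cached.</p>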
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>AMO Scalability: Then and Now</title>
		<link>http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/</link>
		<comments>http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/#comments</comments>
		<pubDate>Fri, 18 Apr 2008 08:30:49 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/</guid>
		<description><![CDATA[Struggling with scalability on AMO is nothing new but the tools we use to solve the problems have changed over time.  Here is a bit of information on the performance evolution AMO has gone through.  I wanted to link to the wayback machine for all our old versions, but I get &#8220;Redirect Errors&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>Struggling with scalability on <abbr title="addons.mozilla.org">AMO</abbr> is nothing new but the tools we use to solve the problems have changed over time.  Here is a bit of information on the performance evolution AMO has gone through.  I wanted to link to the <a href="http://www.archive.org/web/web.php">wayback machine</a> for all our old versions, but I get &#8220;Redirect Errors&#8221; for the addons.mozilla.org domain.  I&#8217;ll have to make do with code repositories.</p>
<p><a href="http://mxr.mozilla.org/mozilla/source/webtools/update/">Version 1</a> of AMO wasn&#8217;t concerned with caching.  It was straight <abbr title="PHP: Hypertext Preprocessor">PHP</abbr> talking directly to a single MySQL box.  Short, easy, and not very scalable.</p>
<p><a href="http://lxr.mozilla.org/mozilla/source/webtools/addons/">Version 2</a> of AMO progressed through several caching systems.  The site used the <a href="http://www.smarty.net/">Smarty template engine</a> so our first step was to turn on the built-in Smarty cache.  That didn&#8217;t give us the performance we needed, so <a href="http://morgamic.com/">Mike Morgan</a> started caching page output in <a href="http://pear.php.net/package/Cache_Lite"><abbr title="PHP Extension and Application Repository">PEAR</abbr>&#8217;s Cache_Lite</a>.  I don&#8217;t remember the specifics of this implementation since it was so short-lived (less than a month), but the <a href="http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/webtools/addons/public/inc/finish.php&#038;rev=HEAD&#038;mark=1.7">CVS log</a> mentions problems with &#8220;scalability in a clustered environment.&#8221;  Our next step was to store the same page output in <a href="http://www.danga.com/memcached/">memcached</a> instead of Cache_Lite which brought pretty satisfying results.  Thus began our abuse of memcached.</p>
<p>In addition to memcached and expanding the number of web servers it ran on, version 2 also boasted two other significant performance improvements. The first was the ability to talk to a slave database for read-only queries which, when combined with a load balancer, let us scale database servers horizontally.  The second was installing a <a href="http://www.citrix.com/english/ps2/products/product.asp?contentID=21679">NetScaler</a> in front of addons.mozilla.org giving us the benefits of a reverse proxy cache and <abbr title="Secure Sockets Layer">SSL</abbr> offloading.  These changes bought us precious time when hordes of Firefox 1.5 users were clamoring for add-ons.  In fact, I&#8217;d say we were in pretty good shape at that point.</p>
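<p>The read-only split can be pictured with a small Python sketch.  It is only an illustration of the idea: the class name is made up, the connections are plain strings, a simple round-robin stands in for the load balancer, and deciding by the first SQL keyword is a simplifying assumption rather than how AMO routed queries.</p>

```python
import itertools


class ReadWriteRouter:
    """Send SELECT statements to replicas in turn, everything else to
    the master (hypothetical helper for illustration)."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [master])

    def connection_for(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._replicas)
        return self.master
```

<p>Adding another read-only server then only changes the list passed to the router, which is what makes this kind of scaling horizontal.</p>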
<p>Fast forward to <a href="http://svn.mozilla.org/addons/trunk/">Version 3</a> (the current version).  We&#8217;ve expanded the memcache servers from one to two and instead of page output we&#8217;re storing database queries and their results.  We&#8217;re still using a single master database but are using two slaves now for read only queries.  There are several NetScalers around the world caching pages locally[1] for closer regions.  We&#8217;ve survived quite a while on this system but we&#8217;re starting to push the envelope again and we&#8217;re going to need to make some changes to be able to scale for Firefox 3 and still provide a good user experience.  I&#8217;ll write more about our plans as they develop.</p>
<p>[1] Users who are logged in to AMO don&#8217;t get the local caches &#8211; their connection is always to San Jose, CA.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Frameworks that start sessions for every visitor make me sad</title>
		<link>http://micropipes.com/blog/2008/03/06/frameworks-that-start-sessions-for-every-visitor-make-me-sad/</link>
		<comments>http://micropipes.com/blog/2008/03/06/frameworks-that-start-sessions-for-every-visitor-make-me-sad/#comments</comments>
		<pubDate>Thu, 06 Mar 2008 07:00:48 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[Drupal]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/03/06/frameworks-that-start-sessions-for-every-visitor-make-me-sad/</guid>
		<description><![CDATA[I might have played the devil&#8217;s advocate when Lars was hating on frameworks at the barcamp last weekend, but that doesn&#8217;t mean I don&#8217;t see his point.  The latest in a series of frustrations with frameworks kept me up until 3am last night.  What better way to cap it off than complaining on [...]]]></description>
			<content:encoded><![CDATA[<p>I might have played the devil&#8217;s advocate when <a href="http://staff.osuosl.org/~lohnk/blog/">Lars</a> was hating on frameworks at the <a href="http://barcamp.org/BeaverBarCamp">barcamp</a> last weekend, but that doesn&#8217;t mean I don&#8217;t see his point.  The latest in a series of frustrations with frameworks kept me up until 3am last night.  What better way to cap it off than complaining on the internet?</p>
<p>Today&#8217;s subject is anonymous sessions.  Frameworks (and developers) love them because they are simple and convenient, but it comes at a cost.  Keeping track of sessions for every visitor on a high traffic site is far too expensive to be practical.  Developers should know how to work around this, but their frameworks need to support them.</p>
<p>The first framework on my mind is <a href="http://drupal.org/">Drupal</a>.  I filed <a href="http://drupal.org/node/201122">an issue</a> last year that Drupal should support disabling anonymous sessions.  It&#8217;s still unassigned so I&#8217;m guessing it&#8217;s not a high priority, but it was one of the main things that made me choose not to use Drupal on mozilla.com.  I <a href="http://drupal.org/node/183006">wrote some ideas</a> on how to handle it and got some responses from people suffering the same fate.  No word on any progress though.</p>
<p>The second framework, <a href="http://cakephp.org/">CakePHP</a>, has an AUTO_SESSION variable that, <a href="http://micropipes.com/blog/2008/01/07/cakephps-cache-that-wouldnt-quit/">just like $cacheQueries</a>, is far too easy to misplace faith in.</p>
<p>By setting AUTO_SESSION to false, you can&#8217;t read or write to the session.  Working as advertised?  Not so much.  If you take a closer look at what&#8217;s actually happening you&#8217;ll see that the session is still getting started, it&#8217;s just that CakePHP is blocking your access to it.  Even with AUTO_SESSION off, a cookie with a unique ID is set, and <strong>a row is still inserted into the sessions table</strong>.  That last part almost brought down <a href="https://addons.mozilla.org/">AMO</a> last night.  I wrote <a href="http://viewvc.svn.mozilla.org/vc/addons/trunk/site/cake/libs/controller/components/session.php?r1=10970&#038;r2=10969&#038;pathrev=10970">a patch that disables anonymous sessions for real</a>, but anyone that has talked to me about patching core code knows I don&#8217;t like to do it.</p>
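<p>For contrast, here is a minimal Python sketch of a session that costs nothing until something is actually written to it.  It is not the CakePHP patch linked above; the class name, the dict standing in for the sessions table, and the use of a random hex id are assumptions for illustration.</p>

```python
import uuid


class LazySession:
    """Session that allocates an id and a storage row only on first write.

    `store` is any dict-like object standing in for the sessions table
    (hypothetical).
    """

    def __init__(self, store, session_id=None):
        self._store = store
        self.session_id = session_id  # None for anonymous visitors

    def get(self, key, default=None):
        # Reads never create a session.
        if self.session_id is None:
            return default
        return self._store.get(self.session_id, {}).get(key, default)

    def set(self, key, value):
        # First write: allocate the id, then create the backing row.
        if self.session_id is None:
            self.session_id = uuid.uuid4().hex
        self._store.setdefault(self.session_id, {})[key] = value
```

<p>An anonymous visitor who only reads pages never gets an id, so no cookie needs to be sent and no row is inserted; the cost is paid only at login or the first real write.</p>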
<p>When you&#8217;re writing code, framework or not, don&#8217;t forget about scalability.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/03/06/frameworks-that-start-sessions-for-every-visitor-make-me-sad/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
