<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>All Night Diner &#187; PHP</title>
	<atom:link href="http://micropipes.com/blog/tag/php/feed/" rel="self" type="application/rss+xml" />
	<link>http://micropipes.com/blog</link>
	<description>because at 3am anything sounds good</description>
	<lastBuildDate>Fri, 20 May 2011 16:16:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>Welcome to the Landfill</title>
		<link>http://micropipes.com/blog/2011/03/29/welcome-to-the-landfill/</link>
		<comments>http://micropipes.com/blog/2011/03/29/welcome-to-the-landfill/#comments</comments>
		<pubDate>Tue, 29 Mar 2011 21:26:31 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[add-ons]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[L10n]]></category>
		<category><![CDATA[open web]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=196</guid>
		<description><![CDATA[Anyone who has tried to set up AMO knows it&#8217;s no walk in the park even with the respectable amount of documentation. There are two big stumbling blocks: the database is large and complex, and a portion of the site functionality is still in PHP. Django&#8217;s syncdb can make a database, but the relationships in [...]]]></description>
			<content:encoded><![CDATA[<p>Anyone who has tried to set up <abbr title="addons.mozilla.org">AMO</abbr> knows it&#8217;s no walk in the park even with the <a href="http://jbalogh.github.com/zamboni/topics/installation/">respectable amount of documentation</a>.  There are two big stumbling blocks:  the database is large and complex, and a portion of the site functionality is still in PHP.  Django&#8217;s <em>syncdb</em> can make a database, but the relationships in the data are what&#8217;s hard, and trying to load fixtures from the test cases is an exercise in frustration since they may or may not all combine into a useful set of data.</p>
<p>With the launch of <a href="https://landfill.addons.allizom.org/">landfill.amo</a>[1] we bypass the entire headache.  The site started with a clean database and I uploaded an add-on to show it worked, but otherwise it&#8217;s empty.  It&#8217;s compact, fast, and simple to use.  The beauty of the site for volunteers and casual developers is that the database and the filesystem are <a href="https://landfill.addons.allizom.org/db/">available in their entirety to download</a>.  This means you can check out the code, fill in the configuration, import the landfill database and have the site 90% running.[2]</p>
<p>Perhaps a testament to the obscene number of open bugs for AMO right now, but this also solves a second <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=510430">long standing problem</a> where localizers couldn&#8217;t see the entire site.  On landfill, anyone can be an administrator, an editor, or any other permission level they&#8217;d like; and they&#8217;ll be able to see the entire site.</p>
<p>If you&#8217;ve been overwhelmed or frustrated trying to set up AMO in the past, now is a good time to give it another shot.  The landfill should just get better with age and use &#8211; if a few people register and add some data the available database dumps will get richer.</p>
<p>If there is a part of the site that isn&#8217;t working and you need it to be, let me know.  Keep in mind this is only the new Python code, so the few parts that are still on PHP (like the admin panel) won&#8217;t be available until they are ported.  Code is updated near-instantly on commit; localization changes are updated every 5 minutes.</p>
<p>[1] Forgive the fake certificate.  This is a sandbox for developers, y&#8217;all know what you&#8217;re doing. <img src='http://micropipes.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>[2] Honestly, 90% is really all you need.  We do a lot of stuff for scalability, statistics, etc. and unless you&#8217;re actually working on that part of the site, you don&#8217;t need those elements running.  Of course, you&#8217;re more than welcome to turn them on, I&#8217;m just trying to make it easy.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2011/03/29/welcome-to-the-landfill/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>High level perspective on the switch from PHP to Python</title>
		<link>http://micropipes.com/blog/2011/03/27/high-level-perspective-on-the-switch-from-php-to-python/</link>
		<comments>http://micropipes.com/blog/2011/03/27/high-level-perspective-on-the-switch-from-php-to-python/#comments</comments>
		<pubDate>Sun, 27 Mar 2011 22:39:03 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[hindsight]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=206</guid>
		<description><![CDATA[It may be fatuous to write this post before we&#8217;ve actually finished the transition from PHP to Python, but I started writing a different post and this is what came out. Sometimes that happens. In January of 2010 we started migrating addons.mozilla.org from CakePHP to Django. It was a controversial decision. Developers were ambivalent to [...]]]></description>
			<content:encoded><![CDATA[<p><em>It may be fatuous to write this post before we&#8217;ve actually finished the transition from PHP to Python, but I started writing a different post and this is what came out.  Sometimes that happens.</em></p>
<p>In January of 2010 we started migrating <a href="https://addons.mozilla.org">addons.mozilla.org</a> from CakePHP to Django.  It was a controversial decision.  Developers were ambivalent to excited, managers were opposed to neutral &#8211; a split anyone would expect.  When I <a href="http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/">first talked about it</a> I expected to be able to turn off PHP by the end of the year.  It didn&#8217;t turn out quite like that.  </p>
<p>Fifteen months later we&#8217;re still transitioning and it&#8217;s still stressful.  The toughest part about a major migration like this is that there is only one team that is doing the migration, continuing to add the new features we need, and all the while maintaining the old site.  That&#8217;s a stressful environment for developers since the interactions between the languages can be complicated, it&#8217;s stressful for managers because features take longer to complete, and it&#8217;s stressful for users (and QA for that matter) because issues <em>will</em> arise which are hard to reproduce and complicated to explain.</p>
<p>In the midst of all the work of migration, the rest of the company is still working:  the security team is <a href="http://blog.mozilla.com/security/2010/12/14/adding-web-applications-to-the-security-bug-bounty-program/">announcing bounties on our site</a> which means we need to be vigilant about fixing issues, project management continues to come up with features to be added, the site perseveres in its never-ending quest for a new <em>look and feel</em>, and <a href="http://blog.mozilla.com/addons/2011/03/22/firefox-4-add-ons/">Firefox 4 is using AMO like never before</a> meaning approaching 10,000 hits per second is a regular day.  All of that is specific to the add-ons site, but consider your own company if you&#8217;re thinking of going down the same road &#8211; what is coming up for your site that will throw a wrench in the works?</p>
<p>The meat and potatoes of it really comes down to:  Given the hindsight of today, would the migration be a good idea?  There isn&#8217;t a right answer for every site, but for AMO we did the right thing[1].  As of today the majority of pages that matter are on Python &#8211; there are some admin tools, and some cron jobs, and the occasional semi-obsolete public page that is on PHP, but for the most part, we&#8217;re looking really good (<a href="https://spreadsheets.google.com/ccc?key=0AgX-nlaDaTaBdGhVd3ZlU1ZySWRiNmZ4YmgxTkV6ZlE&#038;hl=en">less hand waving, more real data</a>).  My new (overly optimistic?) plan is to have PHP off by the end of <em>this</em> year.  We&#8217;ll see.</p>
<p>To give you an idea of man-hours, we&#8217;ve had anywhere from 3 to 6 superhero developers working on the site over the past 15 months, and it&#8217;s looking like the whole thing will take around 24 months.  That&#8217;s a big chunk of time for a site that needs to grow and evolve as quickly as popular sites do these days.</p>
<p>So, overall, I think the lesson is: any reasonably sized site is going to have rabbit holes in it.  At first glance AMO might look like it&#8217;s got a dozen &#8220;main&#8221; pages, with a couple dozen more supporting pages (and throw in a few more for the admin CRUD).  Have a look at that spreadsheet I linked above and you&#8217;ll see that&#8217;s not even remotely the case.  The spreadsheet even ignores sub-pages in a few places and doesn&#8217;t include any new features added in the past year.  If you&#8217;re considering a migration, think it through well.  Make a spreadsheet of every URL, identify the complicated areas, and make sure everyone is clear on the timeline and what it means for new features.  People will absolutely try to scope creep your migration &#8211; make it clear if a section of the site is migrating as-is or can be migrated and redesigned at the same time.  Redesigns add complexity for the developers but can earn you some good will with the users and managers and if you&#8217;re in this boat you can use all the good will you can get.  </p>
<p>May you have the best of luck with your decisions. <img src='http://micropipes.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>[1] I&#8217;ll write another post about pros/cons of the actual frameworks and platforms.  Let&#8217;s just assume we&#8217;re happy with the technical side of the switch for now.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2011/03/27/high-level-perspective-on-the-switch-from-php-to-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Libraries to connect to a Citrix NetScaler or Zeus Traffic Manager</title>
		<link>http://micropipes.com/blog/2010/03/23/libraries-to-connect-to-a-citrix-netscaler-or-zeus-traffic-manager/</link>
		<comments>http://micropipes.com/blog/2010/03/23/libraries-to-connect-to-a-citrix-netscaler-or-zeus-traffic-manager/#comments</comments>
		<pubDate>Tue, 23 Mar 2010 18:56:57 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=133</guid>
		<description><![CDATA[The first front end cache we used on AMO was the Citrix NetScaler. I&#8217;ve complained about its API before but apparently never announced the library I wrote to purge items from the cache. So, a little late, but I have some reusable PHP code that will talk to your NetScaler and let you expire objects. [...]]]></description>
			<content:encoded><![CDATA[<p>The first front end cache we used on <a href="https://addons.mozilla.org/">AMO</a> was the <a href="http://www.citrix.com/English/ps2/products/product.asp?contentID=21679">Citrix NetScaler</a>.  I&#8217;ve <a href="http://micropipes.com/blog/2008/07/14/planning-your-api-is-important/">complained about its API before</a> but apparently never announced the library I wrote to purge items from the cache.  So, a little late, but I have <a href="http://viewvc.svn.mozilla.org/vc/libs/ns-api/">some reusable PHP code that will talk to your NetScaler</a> and let you expire objects.</p>
<p>We hit some limitations with the NetScaler that we weren&#8217;t happy with.  Cost aside, it ignored some pretty standard stuff like the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44">HTTP Vary Header</a>.  After working around that for years we switched to the horizontally scalable <a href="http://www.zeus.com/products/traffic-manager/index.html">Zeus Traffic Manager</a> (at that time, referred to as ZXTM).  We&#8217;ve been pleased with our choice and six months ago I wrote a <a href="http://viewvc.svn.mozilla.org/vc/libs/zxtm-api/">similar PHP library that allows you to connect to Zeus&#8217;s API</a>.  Time and priorities being what they are, we never implemented it in production.</p>
<p>Finally, the real point to this post, last night I wrote a <a href="http://github.com/clouserw/hera">python library that will expire content from Zeus</a>.  We&#8217;ll roll this into our migration and waiting on content to expire from Zeus should be a thing of the past.</p>
<p>As always, if you can use the libraries, feel free.  They all have READMEs with examples.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2010/03/23/libraries-to-connect-to-a-citrix-netscaler-or-zeus-traffic-manager/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Maintaining localization between Python and PHP (it&#8217;s not fun)</title>
		<link>http://micropipes.com/blog/2010/03/08/maintaining-localization-between-python-and-php-its-not-fun/</link>
		<comments>http://micropipes.com/blog/2010/03/08/maintaining-localization-between-python-and-php-its-not-fun/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 22:42:00 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[L10n]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=115</guid>
		<description><![CDATA[I reached my hand into the barrel of problems our migration to Python is going to cause and came up with Localization. It figures. First out of the chute was the .po files. It turns out the actual formatting is different between the two languages. PHP uses %1$s for its substitutions, but python uses either [...]]]></description>
			<content:encoded><![CDATA[<p>I reached my hand into the barrel of problems <a href="http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/">our migration to Python</a> is going to cause and came up with Localization.  It figures.</p>
<p>First out of the chute was the .po files.  It turns out the actual formatting is different between the two languages.  PHP uses <em>%1$s</em> for its substitutions, but Python uses either named variables like <em>%(num)s</em> or integers like <em>{0}</em>.  For the record, they both support <em>%s</em> when you don&#8217;t need to order the substitutions.<br />
PHP example:<br />
<code>I have %2$s apples and %1$s oranges</code><br />
Python example:<br />
<code>I have {1} apples and {0} oranges</code></p>
<p>Since I&#8217;ve worked with the <a href="http://translate.sourceforge.net/wiki/">Translate Toolkit</a> before, I decided to write a script to convert between the two formats.  If you find yourself in the same unfortunate boat as me, behold<br />
<a href="http://translate.svn.sourceforge.net/viewvc/translate/src/trunk/translate/tools/phppo2pypo.py?view=markup">phppo2pypo</a> and <a href="http://translate.svn.sourceforge.net/viewvc/translate/src/trunk/translate/tools/pypo2phppo.py?view=markup">pypo2phppo</a> to convert between the two types.</p>
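<p>As a rough illustration of the conversion described above, here is a minimal sketch that only handles positional string substitutions.  It is not the code in phppo2pypo or pypo2phppo; the function names are made up for this example, and real .po files have more cases (escaped percent signs, other conversion types) that this ignores.</p>

```python
import re

# PHP-style positional placeholder such as %1$s (numbering starts at 1).
PHP_PLACEHOLDER = re.compile(r"%(\d+)\$s")
# Python str.format-style positional placeholder such as {0} (numbering starts at 0).
PY_PLACEHOLDER = re.compile(r"\{(\d+)\}")


def php_to_python(text):
    """Rewrite %N$s as {N-1}. Hypothetical helper, for illustration only."""
    return PHP_PLACEHOLDER.sub(lambda m: "{%d}" % (int(m.group(1)) - 1), text)


def python_to_php(text):
    """Rewrite {N} as %(N+1)$s. Hypothetical helper, for illustration only."""
    return PY_PLACEHOLDER.sub(lambda m: "%%%d$s" % (int(m.group(1)) + 1), text)


print(php_to_python("I have %2$s apples and %1$s oranges"))
print(python_to_php("I have {1} apples and {0} oranges"))
```

<p>The only real trick is the off-by-one: PHP counts its arguments from 1 and Python counts from 0.</p>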
<p>Crisis averted, right?  Oh, that&#8217;s just scratching the surface.  Remember <a href="http://micropipes.com/blog/2008/07/09/adding-context-to-amo-po-files/">how happy I was that PHP finally started supporting msgctxt</a>?  Well, Python has had <a href="http://bugs.python.org/issue2504">a patch for it since 2008</a> but no one has bothered to land it.  I wrote a new <a href="http://github.com/clouserw/tower/blob/master/l10n/__init__.py">ugettext() and ungettext()</a> that recognizes context in the .po files.  To use simply do: <em>from l10n import ugettext as _</em> at the top of your file.</p>
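<p>For anyone curious how context lookups can work without native support, here is a small sketch.  It relies on the GNU gettext convention that a compiled catalog stores a msgctxt entry under the context, an EOT character (\x04), and then the msgid.  This is not the code from the library linked above; <em>context_gettext</em> and <em>DictTranslations</em> are names invented for the example, and the dictionary stands in for a real .mo file.</p>

```python
import gettext

# GNU gettext joins msgctxt and msgid with an EOT character in compiled catalogs.
CONTEXT_SEPARATOR = "\x04"


def context_gettext(translations, context, message):
    """Look up message within context, falling back to the bare message."""
    key = context + CONTEXT_SEPARATOR + message
    translated = translations.gettext(key)
    # An untranslated lookup hands the key back, so strip the context again.
    if translated == key:
        return message
    return translated


class DictTranslations(gettext.NullTranslations):
    """Stand-in catalog for this example; a real one would be read from a .mo file."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        return self._messages.get(message, message)


catalog = DictTranslations({"menu" + CONTEXT_SEPARATOR + "Open": "Abrir"})
print(context_gettext(catalog, "menu", "Open"))
print(context_gettext(catalog, "status", "Open"))
```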
<p>Along with adding msgctxt support, those two functions also collapse consecutive white space.  We&#8217;re using <a href="http://jinja.pocoo.org/2/">Jinja2</a> with <a href="http://babel.edgewall.org/">Babel</a> and the <a href="http://jinja.pocoo.org/2/documentation/extensions">i18n extension</a> as our template engine.  Jinja2 has a concept of stripping white space from the beginning or end of a string but does nothing about the middle.  A paragraph of text in a Jinja2 template would look like:<br />
<code><br />
  {% trans -%}Mozilla is providing links to these applications<br />
  as a courtesy, and makes no representations regarding the<br />
  applications or any information related thereto. Any questions,<br />
  complaints or claims regarding the applications must be<br />
  directed to the appropriate software vendor.<br />
  {%- endtrans %}<br />
</code></p>
<p>That&#8217;s a decent looking template, right?  Yeah, well, when Babel extracts that, it includes all the line breaks too, giving you something <a href="http://bitbucket.org/plurk/solace/src/tip/solace/i18n/messages.pot#cl-625">like this</a>.  The localizers would revolt if I sent them that, so I added in auto white-space collapsing.  Getting Babel to use the new functions means <a href="http://github.com/clouserw/tower/blob/master/tower/management/commands/extract.py">a new extraction script</a>.</p>
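<p>The collapsing itself is tiny.  A minimal sketch (the function name is just for illustration, and this is not lifted from the linked code) looks like the following.  Presumably the same normalization has to be applied both when strings are extracted and when they are looked up, otherwise the msgids won&#8217;t match.</p>

```python
def collapse_whitespace(text):
    """Replace every run of spaces, tabs and newlines with a single space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading and trailing whitespace, so joining with " " does the collapsing.
    return " ".join(text.split())


extracted = """Mozilla is providing links to these applications
  as a courtesy, and makes no representations regarding the
  applications or any information related thereto."""

print(collapse_whitespace(extracted))
```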
<p>At this point, we&#8217;re extracting strings from our new code and we can convert between Python and PHP files.  All we need now is a Frankenstein mix of xgettext functions to act as glue.  Meet the <a href="http://github.com/clouserw/tower/blob/master/l10n/management/commands/amalgamate.py">amalgamate script</a> that uses the pypo2php scripts, concatenates the .pot files, and merge updates each locale&#8217;s .po file.  After that it&#8217;s <a href="http://viewvc.svn.mozilla.org/vc?revision=63671&#038;view=revision">quick tweaks to the build scripts</a> to create z-messages.po files and we&#8217;re done.</p>
<p>So, all that said, the new process for L10n, while we&#8217;re in this transitional phase, is:</p>
<ol>
<li>From the PHP code, run <em>locale/extract-po-remora.sh</em>.  That pulls everything from all the PHP files, creates <em>locale/r-keys.pot</em>, updates the messages.po file for each locale, and compiles them.  Life used to be so simple.</li>
<li>From the python code, make sure you&#8217;re up to date, then run <em>./manage.py extract</em>.  That will pull everything from the python code and templates and create <em>locale/z-keys.pot</em>.</li>
<li>Run <em>./manage.py amalgamate</em>.  That will merge the z-keys.pot into the PHP messages.po files.</li>
<li>Localizers can make their changes as usual, and commit back to messages.po.</li>
<li>From PHP, <em>locale/copy-to-zamboni.py locale</em> will create z-messages.po files in the Python format. We could skip right to .mo files, but in case something goes wrong I want to see the .po files.</li>
<li>Then, like today, <em>locale/compile-mo.sh locale</em> will compile all the .po files.</li>
</ol>
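<p>If it helps to see the order written down as data, here is a sketch that only prints the commands from the list above and does not run anything.  The checkout paths are placeholders, and which working copy the last step runs from is my assumption here.</p>

```python
# (working copy, command) pairs in the order given in the list above.
L10N_STEPS = [
    ("php", ["locale/extract-po-remora.sh"]),
    ("python", ["./manage.py", "extract"]),
    ("python", ["./manage.py", "amalgamate"]),
    # Step 4 is manual: localizers edit and commit messages.po.
    ("php", ["locale/copy-to-zamboni.py", "locale"]),
    # Assumption: compile-mo.sh is run from the PHP checkout as well.
    ("php", ["locale/compile-mo.sh", "locale"]),
]


def describe(steps, checkouts):
    """Return one shell-like line per step without executing anything."""
    return ["cd %s; %s" % (checkouts[repo], " ".join(command))
            for repo, command in steps]


# Placeholder paths; substitute wherever your own checkouts live.
for line in describe(L10N_STEPS, {"php": "/path/to/php", "python": "/path/to/python"}):
    print(line)
```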
<p>After all those steps are done, we&#8217;ve got duplicate .mo files, aside from formatting, and each application can look at its own .mo to get the strings it needs.  All this code is just a big band-aid and there are plenty of things that are more fun than juggling L10n between two applications across two <abbr title="Revision Control Systems">RCS</abbr>s.  But we knew what we were getting into.  I&#8217;ll post something more positive later to help justify it. <img src='http://micropipes.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2010/03/08/maintaining-localization-between-python-and-php-its-not-fun/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>AMO Development Changes in 2010</title>
		<link>http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/</link>
		<comments>http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 21:44:12 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[Git]]></category>
		<category><![CDATA[hindsight]]></category>
		<category><![CDATA[L10n]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[SVN]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=98</guid>
		<description><![CDATA[The AMO team met in Mountain View last week to develop a 2010 plan. We&#8217;ve been wanting to change some key areas of our development flow for a while but we needed to make sure time was budgeted in the overall AMO and Mozilla goals. As usual, the timeline will be tight, but the AMO [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="https://addons.mozilla.org/"><abbr title="addons.mozilla.org">AMO</abbr></a> team met in Mountain View last week to develop a 2010 plan.  We&#8217;ve been wanting to change some key areas of our development flow for a while but we needed to make sure time was budgeted in the overall AMO and Mozilla goals.  As usual, the timeline will be tight, but the AMO developers do amazing work and as our changes are implemented, development should just get faster.  I&#8217;ll give a brief summary of the changes we&#8217;re planning; a lot of discussion went into this and I&#8217;m not going to be able to cover everything here.  If you&#8217;ve been in the AMO calls or reading the notes you probably already know most of this.</p>
<h3>Migrating from CakePHP to Django</h3>
<p>This is a big undertaking and we&#8217;ve been discussing it for quite a while.  We&#8217;re currently the highest trafficked site on the internet using <a href="http://cakephp.org/">CakePHP</a> and along with that we&#8217;ve run into a lot of frustrating issues.  CakePHP has served AMO well for several years, so it&#8217;s not my intention to bad-mouth it here, but I do want to give a fair summary of why we&#8217;re moving on.  Please also note that <em>AMO is still running on CakePHP 1.1 which is, I think, a year out of date</em>?  Three substantial issues:</p>
<ul>
<li><strong>Useful Database Abstraction Layer:</strong>  CakePHP has a concept of database abstraction, but we didn&#8217;t find it powerful enough.  When it did work it would return enormous nested arrays of data causing massive CPU and memory usage (out of memory errors plague us on AMO).  When it didn&#8217;t work, we&#8217;d end up doing queries directly which kind of defeats the purpose.  We couldn&#8217;t use prepared statements so we&#8217;d have to escape variables ourselves.  There was no effective caching built-in and since we just had huge arrays as a response there was no effective way to invalidate the cache we were using (see: <a href="http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/">Caching is easy; Expiration is hard</a>).  The DB layer should return objects that are easy to cache and easy to invalidate.  The built-in Django database classes (combined with memcache) should work fine for us here.</li>
<li><strong>Effective unit tests:</strong>  I&#8217;ve <a href="http://micropipes.com/blog/2009/04/09/addonsmozillaorg-celebrates-1000-passing-unit-tests/">beat the drum about our unit tests before</a> but the simple matter is that it&#8217;s really difficult to do them right with the tools we are using.  Our test data is already very limited, but if we try to run all our tests right now they&#8217;ll run out of memory (and take forever).  The CakePHP method of mocking controllers and models was inadequate for what we needed and difficult to deal with.  We want our unit tests to run quickly, from the command line, and be independent from each other so there aren&#8217;t intermittent problems to waste our time with.  We&#8217;ll be using Django&#8217;s <a href="http://docs.djangoproject.com/en/dev/topics/testing/">built-in testing framework</a>.</li>
<li><strong>Better debugging:</strong>  Debugging in CakePHP amounts to defining a DEBUG level and seeing what is printed on the screen (usually the giant arrays).  We supplemented this with <a href="http://www.xdebug.org/">Xdebug</a> where we needed it, but that&#8217;s still not enough.  A framework should have excellent logging and on-the-fly debugging that displays a full traceback (often something will fail deep within CakePHP and we&#8217;ll get the file/line where PHP gave up, but not the line in our code that started the problem), the values of variables, the page headers, server settings, SQL that was run, what views and elements are in use, etc.  We&#8217;re planning on using a combination of <a href="http://docs.python.org/library/pdb.html">pdb</a>, <a href="http://ipython.scipy.org/moin/">IPython</a>, and the <a href="http://robhudson.github.com/django-debug-toolbar/">django-debug-toolbar</a> to make all of this easily accessible while developing.</li>
</ul>
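<p>To make the caching point from the first item concrete, here is a toy sketch of what &#8220;easy to cache and easy to invalidate&#8221; means when every row is cached as its own object under a predictable key.  The in-memory dictionary stands in for memcache, the class and method names are invented for the example, and none of this is Django&#8217;s API.</p>

```python
class ObjectCache:
    """Cache one object per (model, primary key) so a single row can be expired."""

    def __init__(self):
        self._store = {}  # stand-in for memcache

    @staticmethod
    def key_for(model_name, pk):
        return "%s:%s" % (model_name, pk)

    def get(self, model_name, pk, loader):
        """Return the cached object, calling loader(pk) only on a miss."""
        key = self.key_for(model_name, pk)
        if key not in self._store:
            self._store[key] = loader(pk)
        return self._store[key]

    def invalidate(self, model_name, pk):
        """Expire exactly one object, e.g. right after it is saved."""
        self._store.pop(self.key_for(model_name, pk), None)


loads = []


def load_addon(pk):
    # Pretend this is a database query.
    loads.append(pk)
    return {"id": pk, "name": "Add-on %d" % pk}


cache = ObjectCache()
cache.get("addon", 1, load_addon)   # miss: hits the "database"
cache.get("addon", 1, load_addon)   # hit: served from the cache
cache.invalidate("addon", 1)        # the row changed
cache.get("addon", 1, load_addon)   # miss again, picks up fresh data
print(loads)
```

<p>With one giant nested array per query there is no such key to expire, which is the problem described above.</p>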
<p>Those are the major issues we&#8217;re having right now, but if you want to dig into the comparison some more check out our <a href="https://wiki.mozilla.org/AMO:v4">discussion wiki pages</a>, but realize the majority of discussion happened in person.</p>
<h3>Moving away from <abbr title="Subversion">SVN</abbr></h3>
<p>We moved AMO into SVN in 2006 and it&#8217;s treated us relatively well.  Somewhere along the line, we decided to tag our production versions at a revision of trunk instead of keeping a separate tag and merging changes into it.  It&#8217;s worked for us but it&#8217;s a hard cutoff on code changes, which means that while we&#8217;re in a code freeze no one can check anything in to trunk.  As we begin to branch for larger projects this will become more of a hassle, so I&#8217;m planning on going back to a system where a production tag is created and changes are merged into it as they are ready to go live.</p>
<p>Most of the development team has been using <a href="http://kernel.org/pub/software/scm/git/docs/git-svn.html">git-svn</a> for several months and, aside from the commands being far more verbose, we haven&#8217;t had many complaints.  We&#8217;ve discovered <a href="http://git-scm.com/">Git</a> is a much more powerful development tool and we expect to use it directly starting some time next year.  As of now, we expect to maintain the /locales/ directory in SVN so this change doesn&#8217;t affect localizers but we&#8217;ll keep people notified if there are any changes to that process.</p>
<h3>Continuous Integration</h3>
<p>I mentioned excellent testing being one of the reasons we&#8217;re moving to Django.  Along with that testing is the opportunity for continuous integration.  We plan on using <a href="https://hudson.dev.java.net/">Hudson</a> as the framework for our continuous integration.  With excellent test coverage and quick feedback from Hudson this should drastically lower our regressions and boost our confidence when we deploy.  Speaking of which&#8230;</p>
<h3>Faster Deployment</h3>
<p>For most of 2009 we&#8217;ve pushed on 3 week cycles.  2 weeks of development, 1 week of <abbr title="Quality Assurance">QA</abbr> and <abbr title="Localization">L10n</abbr>.  Delays and regressions being what they are, I think we averaged a little better than a push a month.  This is a fairly rapid cycle for a lot of development shops, but I feel like it&#8217;s holding us back.  We&#8217;ve heard a lot of success stories about shorter cycles and I&#8217;d like to aim for deploying (optionally, of course) a few times per week.  By shortening the development cycle we reduce the stress of:</p>
<ul>
<li><strong>the developers:</strong>  Everyone likes to see what they&#8217;ve done go out quicker and it means fewer conflicts with others when the patches are smaller.</li>
<li><strong>the QA team:</strong> Right now we dump 2 weeks of work on them and say we need it done right away.  With smaller cycles they can verify small changes as they go and not be overwhelmed.</li>
<li><strong>the infrastructure team:</strong> Smaller changes mean less to go wrong and with a continuous integration server and some automation they can have minimal involvement with the whole process.</li>
<li><strong>the localizers:</strong> Every time we release we dump a bunch of changes on these fantastic people and tell them we need them back in a week.  Most of the time they plow forward and get them done on time.  If they don&#8217;t though, they are stuck with waiting for the next 3 week cycle.  If we push often, it&#8217;s not a big deal.</li>
<li><strong>the product managers:</strong> These guys come up with crazy ideas for us to implement and then they stare at graphs and numbers to see if it worked.  With shorter cycles they can get faster feedback about what works and what doesn&#8217;t.</li>
<li><strong>the users:</strong> Faster release cycles mean bugs that are fixed in the repository are fixed on the live site sooner.  &#8217;nuff said.</li>
</ul>
<h3>Process Data Offline</h3>
<p>Much of AMO relies on cron jobs to get things done.  All the statistics, add-on download numbers, how popular an add-on is, all the star rating calculations, any cleanup or maintenance tasks &#8211; these are all run via cron and they are so intensive that the database has trouble keeping up.  We&#8217;re planning on utilizing <a href="http://gearman.org/">Gearman</a> to farm all this work out to other machines in incremental pieces instead of single huge queries.  Any heavy calculating that can be done offline will be moved to these external processors which should help improve the speed of the site and make all our statistics more reliable (as currently the cron jobs have a tendency to fail before they are complete).</p>
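<p>The &#8220;incremental pieces&#8221; part is the easy bit to sketch.  Instead of one query over every add-on, the work can be cut into ID ranges and each range handed off as its own job.  The snippet below only shows the slicing; how the jobs get submitted to Gearman is left out, and the function name and batch size are made up for illustration.</p>

```python
def id_batches(first_id, last_id, batch_size):
    """Yield inclusive (start, end) ID ranges covering first_id..last_id."""
    for start in range(first_id, last_id + 1, batch_size):
        yield start, min(start + batch_size - 1, last_id)


# Each tuple would become one small job instead of one huge query.
for start, end in id_batches(1, 10, 4):
    print("recalculate stats for add-ons %d through %d" % (start, end))
```

<p>A failed batch can then be retried on its own rather than restarting the whole run.</p>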
<h3>Improve the Documentation</h3>
<p>Documentation is a noble goal of many developers but it rarely gets enough attention.  We evaluated our <a href="https://wiki.mozilla.org/AMO:Developers">current documentation</a> and found it is woefully out of date.  By being on a wiki that is rarely used it doesn&#8217;t get updated except when someone tries to use it and sees it&#8217;s not right.  We&#8217;re hoping to change that by moving the developer documentation into the code repository itself.  We&#8217;ll be able to integrate with generated API docs, style the docs however we want, and check in changes right along with our code patches.  When someone checks out a copy of AMO, they&#8217;ll get all the documentation right along with it.  We&#8217;ll use <a href="http://sphinx.pocoo.org/">Sphinx</a> to build the docs.</p>
<p>The outline above details several large, high-level changes but there are a lot of other plans for smaller improvements as well.  This post got a lot longer than I was expecting, but I&#8217;m really excited about the direction AMO is headed for 2010.  As these changes are implemented the site will become more responsive and reliable, and we&#8217;ll be able to adapt to the needs of Mozilla&#8217;s users even faster.  As always, feedback and discussion are welcome and stay tuned for further back end improvements.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/feed/</wfw:commentRss>
		<slash:comments>44</slash:comments>
		</item>
		<item>
		<title>How addons.mozilla.org defends against XSS attacks</title>
		<link>http://micropipes.com/blog/2009/02/23/how-addonsmozillaorg-defends-against-xss-attacks/</link>
		<comments>http://micropipes.com/blog/2009/02/23/how-addonsmozillaorg-defends-against-xss-attacks/#comments</comments>
		<pubDate>Mon, 23 Feb 2009 16:16:56 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=70</guid>
		<description><![CDATA[One of the things that gets a lot of news time these days is XSS. There are a lot of places that explain what it is and how to prevent it but most are oversimplified or don&#8217;t provide real world examples. I thought I&#8217;d explain a couple of the ways AMO attempts to prevent it. [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things that gets a lot of news time these days is <abbr title="Cross Site Scripting">XSS</abbr>.  There are a lot of places that explain what it is and how to prevent it but most are oversimplified or don&#8217;t provide real world examples.  I thought I&#8217;d explain a couple of the ways <a href="https://addons.mozilla.org/"><abbr title="addons.mozilla.org">AMO</abbr></a> attempts to prevent it.</p>
<p>I&#8217;m not trying to invite attackers by posting this.  My goal is to provide a (hopefully) working example from a real world, high-traffic site.  I think the people exploiting XSS have a fairly good idea what they are doing and, too often, the people attempting to secure their sites don&#8217;t.  Since AMO is open source I&#8217;m not sharing anything that isn&#8217;t available already anyway (side note: please don&#8217;t depend on security by obscurity).  </p>
<p>Firstly, this chunk of code sits in CakePHP&#8217;s <a href="http://svn.mozilla.org/addons/trunk/site/app/config/bootstrap.php">bootstrap.php</a> and runs very close to the start of every request:</p>
<pre><code>
if (array_key_exists('url',$_GET) &#038;&#038;
    !preg_match('/\/api\//', $_GET['url']) &#038;&#038;
    preg_match('/[^\w\d\/\.\-_!: ]/u',$_GET['url'])) {
    header("HTTP/1.1 400 Bad Request");
    exit;
}</code></pre>
<p>Since a lot of XSS attacks are launched from the URL we implemented this simple whitelist of characters we&#8217;ll allow.  If anything outside of that whitelist is in the URL (API URLs are exempt) we return a 400 Bad Request response and die.  This isn&#8217;t a lot of protection but it does narrow the field of what our application expects and has to deal with (particularly with control characters, high ASCII, etc.).</p>
<p>The second, and more important, section of code is in our <a href="http://svn.mozilla.org/addons/trunk/site/app/app_controller.php">app_controller class</a>.  We wrote a custom sanitize() function that any string going into one of our views gets run through:</p>
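<p>To see what the check lets through, here is a small sketch that wraps the same two patterns in a helper.  The function name and the sample URLs are made up for illustration.  Unlike the snippet above, it also treats a regex failure (for example, invalid UTF-8 with the /u modifier) as a rejection, which is my assumption and not what bootstrap.php does:</p>
<pre class="php"><code>
// Hypothetical helper built around the whitelist shown above.
function url_is_allowed($url) {
    if (preg_match('/\/api\//', $url)) {
        return true; // API URLs skip the character check
    }
    // 0 means no forbidden character was found; 1 or false means reject.
    return preg_match('/[^\w\d\/\.\-_!: ]/u', $url) === 0;
}

var_dump(url_is_allowed('en-US/firefox/addon/1865'));              // bool(true)
var_dump(url_is_allowed('en-US/firefox/"onmouseover=alert(1)'));   // bool(false)
var_dump(url_is_allowed('en-US/firefox/api/1.2/search/ad+block')); // bool(true)
</code></pre>
<p>The last call shows the API exemption: the plus sign is not in the whitelist, but the URL passes because it contains /api/.</p>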
<pre class="php"><code>
$sanitize_patterns = array(
    'patterns'      => array("/%/u", "/\(/u", "/\)/u", "/\+/u", "/-/u"),
    'replacements'  => array("&amp;#37;", "&amp;#40;", "&amp;#41;", "&amp;#43;", "&amp;#45;")
    );

........

$data = iconv('UTF-8', 'UTF-8//IGNORE', $data);
$data = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
$data = preg_replace($sanitize_patterns['patterns'], $sanitize_patterns['replacements'], $data);
</code></pre>
<p>This code has several important parts and I&#8217;ll start with the functions.  The first function that modifies the actual data is <a href="http://php.oregonstate.edu/manual/en/function.iconv.php">iconv()</a>.  We ask it to convert our data from UTF-8 to UTF-8 which seems unnecessary but the &#8220;//IGNORE&#8221; part is important &#8211; that means it will throw out any characters it can&#8217;t represent appropriately.  This was added to prevent a proof of concept attack that exploited a <a href="http://en.wikipedia.org/wiki/C0_and_C1_control_codes">C0 ASCII control code</a> character to break the output (discovered on the <a href="http://sla.ckers.org/forum/">sla.ckers.org forums</a>).</p>
<p>The next function, <a href="http://php.oregonstate.edu/htmlspecialchars">htmlspecialchars()</a>, is a pretty well known function and converts special characters to their HTML entity equivalents.  The second parameter, ENT_QUOTES, asks it to encode single quotes as well as double quotes.</p>
<p>Lastly we use the array of patterns and replacements declared at the beginning to encode a few final symbols (the percent sign, parentheses, plus and hyphen) into numeric HTML entities.</p>
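<p>Put together, the three steps look something like the sketch below.  The function name and the sample string are hypothetical; the body is just the code above collected into one place so the effect on a concrete input is visible:</p>
<pre class="php"><code>
// Hypothetical wrapper around the three steps described above.
function sanitize_for_view($data) {
    $patterns     = array("/%/u", "/\(/u", "/\)/u", "/\+/u", "/-/u");
    $replacements = array("&amp;#37;", "&amp;#40;", "&amp;#41;", "&amp;#43;", "&amp;#45;");

    $data = iconv('UTF-8', 'UTF-8//IGNORE', $data);       // drop invalid bytes
    $data = htmlspecialchars($data, ENT_QUOTES, 'UTF-8'); // encode HTML specials and quotes
    return preg_replace($patterns, $replacements, $data); // encode the extra symbols
}

echo sanitize_for_view("O'Reilly (50%) +1 -1"), "\n";
// O&amp;#039;Reilly &amp;#40;50&amp;#37;&amp;#41; &amp;#43;1 &amp;#45;1
</code></pre>
<p>The order matters: the extra symbols are replaced after htmlspecialchars() has run, so the ampersands in the replacement entities are not encoded a second time.</p>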
<p>This system has worked fairly well for a few years now and as issues are discovered we make changes to it.  If you&#8217;re looking for the latest code please be sure to check <a href="http://svn.mozilla.org/addons/trunk/">our repository</a>.  And, as always, if you find any kind of exploit on AMO please let me know! <img src='http://micropipes.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2009/02/23/how-addonsmozillaorg-defends-against-xss-attacks/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>AMO Scalability: Then and Now</title>
		<link>http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/</link>
		<comments>http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/#comments</comments>
		<pubDate>Fri, 18 Apr 2008 08:30:49 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/</guid>
		<description><![CDATA[Struggling with scalability on AMO is nothing new but the tools we use to solve the problems have changed over time. Here is a bit of information on the performance evolution AMO has gone through. I wanted to link to the wayback machine for all our old versions, but I get &#8220;Redirect Errors&#8221; for the [...]]]></description>
			<content:encoded><![CDATA[<p>Struggling with scalability on <abbr title="addons.mozilla.org">AMO</abbr> is nothing new but the tools we use to solve the problems have changed over time.  Here is a bit of information on the performance evolution AMO has gone through.  I wanted to link to the <a href="http://www.archive.org/web/web.php">wayback machine</a> for all our old versions, but I get &#8220;Redirect Errors&#8221; for the addons.mozilla.org domain.  I&#8217;ll have to make do with code repositories.</p>
<p><a href="http://mxr.mozilla.org/mozilla/source/webtools/update/">Version 1</a> of AMO wasn&#8217;t concerned with caching.  It was straight <abbr title="PHP: Hypertext Preprocessor">PHP</abbr> talking directly to a single MySQL box.  Short, easy, and not very scalable.</p>
<p><a href="http://lxr.mozilla.org/mozilla/source/webtools/addons/">Version 2</a> of AMO progressed through several caching systems.  The site used the <a href="http://www.smarty.net/">Smarty template engine</a> so our first step was to turn on the built-in Smarty cache.  That didn&#8217;t give us the performance we needed, so <a href="http://morgamic.com/">Mike Morgan</a> started caching page output in <a href="http://pear.php.net/package/Cache_Lite"><abbr title="PHP Extension and Application Repository">PEAR</abbr>&#8217;s Cache_Lite</a>.  I don&#8217;t remember the specifics of this implementation since it was so short-lived (less than a month), but the <a href="http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/webtools/addons/public/inc/finish.php&#038;rev=HEAD&#038;mark=1.7">CVS log</a> mentions problems with &#8220;scalability in a clustered environment.&#8221;  Our next step was to store the same page output in <a href="http://www.danga.com/memcached/">memcached</a> instead of Cache_Lite, which brought pretty satisfying results.  Thus began our abuse of memcached.</p>
<p>In addition to memcached and expanding the number of web servers it ran on, version 2 also boasted two other significant performance improvements. The first was the ability to talk to a slave database for read-only queries which, when combined with a load balancer, let us scale database servers horizontally.  The second was installing a <a href="http://www.citrix.com/english/ps2/products/product.asp?contentID=21679">NetScaler</a> in front of addons.mozilla.org, giving us the benefits of a reverse proxy cache and <abbr title="Secure Sockets Layer">SSL</abbr> offloading.  These changes bought us precious time when hordes of Firefox 1.5 users were clamoring for add-ons.  In fact, I&#8217;d say we were in pretty good shape at that point.</p>
<p>Fast forward to <a href="http://svn.mozilla.org/addons/trunk/">Version 3</a> (the current version).  We&#8217;ve expanded the memcached servers from one to two and instead of page output we&#8217;re storing database queries and their results.  We&#8217;re still using a single master database but are using two slaves now for read-only queries.  There are several NetScalers around the world caching pages locally[1] for closer regions.  We&#8217;ve survived quite a while on this system but we&#8217;re starting to push the envelope again and we&#8217;re going to need to make some changes to be able to scale for Firefox 3 and still provide a good user experience.  I&#8217;ll write more about our plans as they develop.</p>
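<p>For anyone unfamiliar with the idea of caching queries and their results, here is a minimal sketch.  It is not AMO&#8217;s code: the class name, the key format and the fake query runner are all hypothetical, and a plain array stands in for memcached so the example runs anywhere:</p>
<pre class="php"><code>
// Hypothetical query cache; $store stands in for a memcached client.
class QueryCache {
    private $store = array();
    private $runner;
    public $misses = 0;

    public function __construct($runner) {
        $this->runner = $runner; // callable that actually hits the database
    }

    public function fetch($sql) {
        $key = 'query:' . md5($sql); // key derived from the query text
        if (!array_key_exists($key, $this->store)) {
            $this->misses++;
            $this->store[$key] = call_user_func($this->runner, $sql);
        }
        return $this->store[$key];
    }
}

// Fake runner returning a canned row instead of talking to MySQL.
$cache = new QueryCache(function ($sql) {
    return array(array('id' => 1, 'name' => 'Example Add-on'));
});

$cache->fetch('SELECT id, name FROM addons WHERE id = 1');
$rows = $cache->fetch('SELECT id, name FROM addons WHERE id = 1');
echo $cache->misses, "\n"; // 1
</code></pre>
<p>The second call is served from the cache, so the database is only hit once.  A real setup would also need expiry or invalidation when the underlying rows change, which this sketch ignores.</p>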
<p>[1] Users who are logged in to AMO don&#8217;t get the local caches &#8211; their connection is always to San Jose, CA.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>How my cookies became a one way street</title>
		<link>http://micropipes.com/blog/2008/03/11/how-my-cookies-became-a-one-way-street/</link>
		<comments>http://micropipes.com/blog/2008/03/11/how-my-cookies-became-a-one-way-street/#comments</comments>
		<pubDate>Tue, 11 Mar 2008 19:27:47 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/03/11/how-my-cookies-became-a-one-way-street/</guid>
		<description><![CDATA[I&#8217;ve been playing with CakePHP&#8217;s session code lately and ran across an interesting (read: nerdy) problem with cookies on AMO. First, some background: The uniqueness of a cookie in the browser is determined by all the attributes when it&#8217;s set (not just its name). That means I can have multiple cookies named AMOv3 as long [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been playing with CakePHP&#8217;s session code lately and ran across an interesting (read: nerdy) problem with cookies on <a href="http://addons.mozilla.org/">AMO</a>.</p>
<p>First, some background:</p>
<p>The uniqueness of a cookie in the browser is determined by its name together with the domain and path it was set with (not just its name).  That means I can have multiple cookies named <q>AMOv3</q> as long as the domain or path is different when I set the second cookie.</p>
<p>According to <a href="http://www.faqs.org/rfcs/rfc2109.html">RFC 2109</a>:</p>
<blockquote><p> If multiple cookies satisfy the criteria [...] they are ordered in the Cookie header such that those with more specific Path attributes precede those with less specific.</p></blockquote>
<p>In PHP, however, cookies are indexed in the $_COOKIE variable by name, which means I can send several cookies with the same name and only the first cookie will show up. <sup>[1]</sup></p>
<hr />
<p>So, what&#8217;s this have to do with AMO?  For some reason, CakePHP hasn&#8217;t been consistent in the past regarding cookie paths. Sometimes they were <q>/</q> (which is correct), sometimes they were something else like <q>/en-US/firefox/users/</q> and sometimes users had multiple cookies with different paths.</p>
<p>From the <a href="http://php.oregonstate.edu/setcookie">PHP manual</a>:</p>
<blockquote><p>Cookies must be deleted with the same parameters as they were set with. If the value argument is an empty string, or FALSE, and all other arguments match a previous call to setcookie, then the cookie with the specified name will be deleted from the remote client. </p></blockquote>
<p>When a browser sends a cookie back to a server it only sends the name and value.  If a cookie was set with a path that I don&#8217;t know, I have no way to remove that cookie!</p>
<p>To compound the problem, if a cookie with the same name has a more specific path, it shows up in $_COOKIE and there is no indication the other cookie even exists.  This means on the front page your session can be fine and after clicking on a deeper URL you&#8217;re suddenly requesting a session that expired long ago.</p>
<p>We rolled out a change to session handling last week that prevented me and another developer from logging in because we had a mess of cookies with different paths.  I haven&#8217;t had a barrage of emails so I don&#8217;t think it&#8217;s affected many people, but if you have any troubles with logging in please let me know.  If you&#8217;re in a hurry, clearing your cookies for the addons.mozilla.org domain will fix any problems.</p>
<p>For the record, any new login cookies on AMO will expire when the browser closes which should prevent stale cookies from coming back to haunt us.</p>
<p>[1] If you need to get at all the cookies the browser is sending, you&#8217;ll have to dig through $_SERVER['HTTP_COOKIE'] manually.</p>
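<p>A minimal sketch of that digging, assuming a simple &#8220;name=value; name=value&#8221; header.  The function name and the sample values are hypothetical:</p>
<pre class="php"><code>
// Hypothetical helper: collect every value sent for each cookie name,
// in the order the browser sent them.
function parse_all_cookies($header) {
    $cookies = array();
    foreach (explode(';', $header) as $pair) {
        $parts = explode('=', trim($pair), 2);
        if (count($parts) !== 2) {
            continue; // skip malformed fragments
        }
        $cookies[$parts[0]][] = urldecode($parts[1]);
    }
    return $cookies;
}

$all = parse_all_cookies('AMOv3=abc123; AMOv3=def456; locale=en-US');
echo count($all['AMOv3']), "\n"; // 2
echo $all['AMOv3'][1], "\n";     // def456
</code></pre>
<p>On a live request you would pass in $_SERVER['HTTP_COOKIE'].  This tells you that duplicates exist, but not which path each one was set with, so it doesn&#8217;t solve the deletion problem above.</p>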
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/03/11/how-my-cookies-became-a-one-way-street/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

