<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>All Night Diner &#187; CakePHP</title>
	<atom:link href="http://micropipes.com/blog/tag/cakephp/feed/" rel="self" type="application/rss+xml" />
	<link>http://micropipes.com/blog</link>
	<description>because at 3am anything sounds good</description>
	<lastBuildDate>Mon, 03 May 2010 17:34:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>addons.mozilla.org ♥s unit tests.  Again.</title>
		<link>http://micropipes.com/blog/2010/05/03/addons-mozilla-org-%e2%99%a5s-unit-tests-again/</link>
		<comments>http://micropipes.com/blog/2010/05/03/addons-mozilla-org-%e2%99%a5s-unit-tests-again/#comments</comments>
		<pubDate>Mon, 03 May 2010 17:34:44 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=142</guid>
		<description><![CDATA[AMO has had an on-again off-again relationship with unit tests.  A little over a year ago we had a thousand unit tests that sort of, mostly, ran.  The problem is, PHP unit testing just isn&#8217;t as good as it should be.  CakePHP relies on SimpleTest, one of the main PHP test suites. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="https://addons.mozilla.org">AMO</a> has had an on-again off-again relationship with unit tests.  A little over a year ago we had <a href="http://micropipes.com/blog/2009/04/09/addonsmozillaorg-celebrates-1000-passing-unit-tests/">a thousand unit tests</a> that sort of, mostly, ran.  The problem is, PHP unit testing just isn&#8217;t as good as it should be.  CakePHP relies on <a href="http://www.simpletest.org/">SimpleTest</a>, one of the main PHP test suites.  It worked relatively well for a small number of tests, but as our suite grew, so did our troubles.</p>
<p>Our main issue was hitting a memory limit or the max execution time.  We hit the limits often for a variety of reasons: some were legitimate bugs, and some came from our attempts to hack around things to make the tests run.  Changing the limits affects the tests themselves, because they run within the same environment.  There wasn&#8217;t really a concept of fixtures then, although it looks like <a href="http://bakery.cakephp.org/articles/view/testing-models-with-cakephp-1-2-test-suite">CakePHP has stepped up there</a>.  The SimpleTest web runner was hard to use and the mock objects were sometimes a little too mocked and missing some attributes.</p>
<p>All in all it was a heroic effort to get that many tests, but we didn&#8217;t maintain it because they were so slow to write and difficult to run.  Testing can be a pain to write, sure, but it shouldn&#8217;t be a burden like that.  Enter <a href="http://docs.djangoproject.com/en/dev/topics/testing/">Django&#8217;s testing suite</a> (built on top of <a href="http://docs.python.org/library/unittest.html">Python&#8217;s unittest</a>).  It has most of our complaints handled out of the box.  It&#8217;s very well documented, considers a lot of aspects of testing, supports fixtures, a built-in client, etc.  It&#8217;s a well thought out framework to build tests on.</p>
<p>We&#8217;re being more vigilant about requiring tests this time around, but they also aren&#8217;t as frustrating to write.  When you write them they actually work and they stay working.  Most of what you want is built in already.  For example, I wrote the password reset form we needed on AMO in Django.  With CakePHP and SimpleTest I&#8217;d have no idea how to test that the email was actually working.  It&#8217;s apparently possible <a href="http://www.curioussymbols.com/simplemail/">with a SimpleTest add-on</a> and enough code that I have to scroll in my browser.  With Django&#8217;s test suite the actual code was 5 lines, 3 of which were assertions:</p>
<pre><code class="python">
    def test_request_success(self):
        self.client.post('/en-US/firefox/users/pwreset',
                        {'email': self.user.email})

        eq_(len(mail.outbox), 1)
        assert mail.outbox[0].subject.find('Password reset') == 0
        assert mail.outbox[0].body.find('pwreset/%s' % self.uidb36) > 0
</code></pre>
<p>With the power of the new test suite we&#8217;re once again writing and maintaining our unit tests &#8211; currently at around 390 tests and increasing steadily.  Plenty of people have written about why unit tests are important so I won&#8217;t belabor the point, but I will mention that it&#8217;s a great feeling to be able to commit something and be confident it hasn&#8217;t affected other parts of the site.  It&#8217;s almost as good a feeling when you write your code and a completely different test fails, pointing out a case that you didn&#8217;t even consider but one that would soak up developer time trying to debug down the road.</p>
<p>Building on a foundation that takes testing seriously is great.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2010/05/03/addons-mozilla-org-%e2%99%a5s-unit-tests-again/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AMO Development Changes in 2010</title>
		<link>http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/</link>
		<comments>http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 21:44:12 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[Git]]></category>
		<category><![CDATA[hindsight]]></category>
		<category><![CDATA[L10n]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[SVN]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=98</guid>
		<description><![CDATA[The AMO team met in Mountain View last week to develop a 2010 plan.  We&#8217;ve been wanting to change some key areas of our development flow for a while but we needed to make sure time was budgeted in the overall AMO and Mozilla goals.  As usual, the timeline will be tight, but [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="https://addons.mozilla.org/"><abbr title="addons.mozilla.org">AMO</abbr></a> team met in Mountain View last week to develop a 2010 plan.  We&#8217;ve been wanting to change some key areas of our development flow for a while but we needed to make sure time was budgeted in the overall AMO and Mozilla goals.  As usual, the timeline will be tight, but the AMO developers do amazing work and as our changes are implemented, development should just get faster.  I&#8217;ll give a brief summary of the changes we&#8217;re planning; a lot of discussion went into this and I&#8217;m not going to be able to cover everything here.  If you&#8217;ve been in the AMO calls or reading the notes you probably already know most of this.</p>
<h3>Migrating from CakePHP to Django</h3>
<p>This is a big undertaking and we&#8217;ve been discussing it for quite a while.  We&#8217;re currently the highest-trafficked site on the internet using <a href="http://cakephp.org/">CakePHP</a> and along with that we&#8217;ve run into a lot of frustrating issues.  CakePHP has served AMO well for several years, so it&#8217;s not my intention to bad-mouth it here, but I do want to give a fair summary of why we&#8217;re moving on.  Please also note that <em>AMO is still running on CakePHP 1.1 which is, I think, about a year out of date</em>.  Three substantial issues:</p>
<ul>
<li><strong>Useful Database Abstraction Layer:</strong>  CakePHP has a concept of database abstraction, but we didn&#8217;t find it powerful enough.  When it did work it would return enormous nested arrays of data causing massive CPU and memory usage (out of memory errors plague us on AMO).  When it didn&#8217;t work, we&#8217;d end up doing queries directly which kind of defeats the purpose.  We couldn&#8217;t use prepared statements so we&#8217;d have to escape variables ourselves.  There was no effective caching built-in and since we just had huge arrays as a response there was no effective way to invalidate the cache we were using (see: <a href="http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/">Caching is easy; Expiration is hard</a>).  The DB layer should return objects that are easy to cache and easy to invalidate.  The built-in Django database classes (combined with memcache) should work fine for us here.</li>
<li><strong>Effective unit tests:</strong>  I&#8217;ve <a href="http://micropipes.com/blog/2009/04/09/addonsmozillaorg-celebrates-1000-passing-unit-tests/">beaten the drum about our unit tests before</a> but the simple matter is that it&#8217;s really difficult to do them right with the tools we are using.  Our test data is already very limited, but if we try to run all our tests right now they&#8217;ll run out of memory (and take forever).  The CakePHP method of mocking controllers and models was inadequate for what we needed and difficult to deal with.  We want our unit tests to run quickly, from the command line, and be independent from each other so there aren&#8217;t intermittent problems to waste our time with.  We&#8217;ll be using Django&#8217;s <a href="http://docs.djangoproject.com/en/dev/topics/testing/">built-in testing framework</a>.</li>
<li><strong>Better debugging:</strong>  Debugging in CakePHP amounts to defining a DEBUG level and seeing what is printed on the screen (usually the giant arrays).  We supplemented this with <a href="http://www.xdebug.org/">Xdebug</a> where we needed it, but that&#8217;s still not enough.  A framework should have excellent logging and on-the-fly debugging that displays a full traceback (often something will fail deep within CakePHP and we&#8217;ll get the file/line where PHP gave up, but not the line in our code that started the problem), the values of variables, the page headers, server settings, SQL that was run, what views and elements are in use, etc.  We&#8217;re planning on using a combination of <a href="http://docs.python.org/library/pdb.html">pdb</a>, <a href="http://ipython.scipy.org/moin/">IPython</a>, and the <a href="http://robhudson.github.com/django-debug-toolbar/">django-debug-toolbar</a> to make all of this easily accessible while developing.</li>
</ul>
<p>Those are the major issues we&#8217;re having right now.  If you want to dig into the comparison some more, check out our <a href="https://wiki.mozilla.org/AMO:v4">discussion wiki pages</a>, but realize the majority of the discussion happened in person.</p>
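<p>To make the database layer point concrete, here is a minimal Python sketch of per-object caching with invalidation on save.  It is an illustration only, not AMO code and not Django&#8217;s API: <code>ObjectCache</code>, <code>load_addon</code>, <code>save_addon</code> and the in-memory <code>DATABASE</code> dict are all hypothetical stand-ins for memcached and the real models.</p>
<pre><code class="python">
class ObjectCache:
    """Tiny in-memory stand-in for memcached, keyed by model name and id."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(model, pk):
        return "%s:%s" % (model, pk)

    def get(self, model, pk, loader):
        k = self.key(model, pk)
        if k not in self._store:
            self._store[k] = loader(pk)   # cache miss: load from the database once
        return self._store[k]

    def invalidate(self, model, pk):
        self._store.pop(self.key(model, pk), None)


# Hypothetical "database": add-on id mapped to its row
DATABASE = {1: {"id": 1, "summary": "Blocks ads"}}
cache = ObjectCache()


def load_addon(pk):
    return dict(DATABASE[pk])   # copy, so cached values are independent of the row


def save_addon(pk, **changes):
    DATABASE[pk].update(changes)
    cache.invalidate("addon", pk)   # one object changed, so one key is dropped


first = cache.get("addon", 1, load_addon)
save_addon(1, summary="Blocks ads and trackers")
second = cache.get("addon", 1, load_addon)
print(first["summary"])
print(second["summary"])
</code></pre>
<p>Because each cached value maps to exactly one object, a save only has to drop one key, which is the property the huge nested arrays could not give us.</p>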
<h3>Moving away from <abbr title="Subversion">SVN</abbr></h3>
<p>We moved AMO into SVN in 2006 and it&#8217;s treated us relatively well.  Somewhere along the line, we decided to tag our production versions at a revision of trunk instead of keeping a separate tag and merging changes into it.  It&#8217;s worked for us but it&#8217;s a hard cutoff on code changes, which means that while we&#8217;re in a code freeze no one can check anything in to trunk.  As we begin to branch for larger projects this will become more of a hassle, so I&#8217;m planning on going back to a system where a production tag is created and changes are merged into it as they are ready to go live.</p>
<p>Most of the development team has been using <a href="http://kernel.org/pub/software/scm/git/docs/git-svn.html">git-svn</a> for several months and, aside from the commands being far more verbose, we haven&#8217;t had many complaints.  We&#8217;ve discovered <a href="http://git-scm.com/">Git</a> is a much more powerful development tool and we expect to use it directly starting some time next year.  As of now, we expect to maintain the /locales/ directory in SVN so this change doesn&#8217;t affect localizers but we&#8217;ll keep people notified if there are any changes to that process.</p>
<h3>Continuous Integration</h3>
<p>I mentioned excellent testing being one of the reasons we&#8217;re moving to Django.  Along with that testing is the opportunity for continuous integration.  We plan on using <a href="https://hudson.dev.java.net/">Hudson</a> as the framework for our continuous integration.  With excellent test coverage and quick feedback from Hudson this should drastically lower our regressions and boost our confidence when we deploy.  Speaking of which&#8230;</p>
<h3>Faster Deployment</h3>
<p>For most of 2009 we&#8217;ve pushed on 3 week cycles: 2 weeks of development, 1 week of <abbr title="Quality Assurance">QA</abbr> and <abbr title="Localization">L10n</abbr>.  Delays and regressions being what they are, I think we averaged a little better than a push a month.  This is a fairly rapid cycle for a lot of development shops, but I feel like it&#8217;s holding us back.  We&#8217;ve heard a lot of success stories about shorter cycles and I&#8217;d like to aim for deploying (optionally, of course) a few times per week.  By shortening the development cycle we reduce the stress of:</p>
<ul>
<li><strong>the developers:</strong>  Everyone likes to see what they&#8217;ve done go out quicker and it means fewer conflicts with others when the patches are smaller.</li>
<li><strong>the QA team:</strong> Right now we dump 2 weeks of work on them and say we need it done right away.  With smaller cycles they can verify small changes as they go and not be overwhelmed.</li>
<li><strong>the infrastructure team:</strong> Smaller changes mean less to go wrong and with a continuous integration server and some automation they can have minimal involvement with the whole process.</li>
<li><strong>the localizers:</strong> Every time we release we dump a bunch of changes on these fantastic people and tell them we need them back in a week.  Most of the time they plow forward and get them done on time.  If they don&#8217;t though, they are stuck waiting for the next 3 week cycle.  If we push often, it&#8217;s not a big deal.</li>
<li><strong>the product managers:</strong> These guys come up with crazy ideas for us to implement and then they stare at graphs and numbers to see if it worked.  With shorter cycles they can get faster feedback about what works and what doesn&#8217;t.</li>
<li><strong>the users:</strong> Faster release cycles mean bugs that are fixed in the repository are fixed on the live site sooner.  &#8217;nuff said.</li>
</ul>
<h3>Process Data Offline</h3>
<p>Much of AMO relies on cron jobs to get things done.  All the statistics, add-on download numbers, how popular an add-on is, all the star rating calculations, any cleanup or maintenance tasks &#8211; these are all run via cron and they are so intensive that the database has trouble keeping up.  We&#8217;re planning on utilizing <a href="http://gearman.org/">Gearman</a> to farm all this work out to other machines in incremental pieces instead of single huge queries.  Any heavy calculating that can be done offline will be moved to these external processors which should help improve the speed of the site and make all our statistics more reliable (as currently the cron jobs have a tendency to fail before they are complete).</p>
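<p>As a rough sketch of what &#8220;incremental pieces&#8221; could look like, the Python below splits a star rating calculation into small batches and merges the partial results.  It deliberately does not use Gearman&#8217;s client API; <code>chunked</code>, <code>rating_job</code>, <code>merge</code> and the sample data are hypothetical, and in a real setup each <code>rating_job</code> call would be a job handed to a worker on another machine.</p>
<pre><code class="python">
from collections import defaultdict


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rating_job(batch):
    """One small unit of work: partial star totals per add-on."""
    partial = defaultdict(lambda: [0, 0])   # addon_id mapped to [total stars, count]
    for addon_id, stars in batch:
        partial[addon_id][0] += stars
        partial[addon_id][1] += 1
    return partial


def merge(partials):
    """Combine the partial results into an average rating per add-on."""
    totals = defaultdict(lambda: [0, 0])
    for partial in partials:
        for addon_id, (stars, count) in partial.items():
            totals[addon_id][0] += stars
            totals[addon_id][1] += count
    return {addon_id: stars / count for addon_id, (stars, count) in totals.items()}


# Hypothetical data: (addon_id, stars) pairs
ratings = [(1, 5), (1, 4), (2, 3), (1, 3), (2, 5)]
averages = merge(rating_job(batch) for batch in chunked(ratings, 2))
print(averages)
</code></pre>
<p>If one batch fails, only that batch has to be retried, rather than the whole calculation starting over.</p>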
<h3>Improve the Documentation</h3>
<p>Documentation is a noble goal of many developers but it rarely gets enough attention.  We evaluated our <a href="https://wiki.mozilla.org/AMO:Developers">current documentation</a> and found it is woefully out of date.  Because it lives on a rarely used wiki, it only gets updated when someone tries to use it and sees it&#8217;s not right.  We&#8217;re hoping to change that by moving the developer documentation into the code repository itself.  We&#8217;ll be able to integrate with generated API docs, style the docs however we want, and check in changes right along with our code patches.  When someone checks out a copy of AMO, they&#8217;ll get all the documentation right along with it.  We&#8217;ll use <a href="http://sphinx.pocoo.org/">Sphinx</a> to build the docs.</p>
<p>The outline above details several large, high-level changes but there are a lot of other plans for smaller improvements as well.  This post got a lot longer than I was expecting, but I&#8217;m really excited about the direction AMO is headed for 2010.  As these changes are implemented the site will become more responsive and reliable, and we&#8217;ll be able to adapt to the needs of Mozilla&#8217;s users even faster.  As always, feedback and discussion are welcome and stay tuned for further back end improvements.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/feed/</wfw:commentRss>
		<slash:comments>37</slash:comments>
		</item>
		<item>
		<title>Some considerations when adding Tags to AMO</title>
		<link>http://micropipes.com/blog/2009/03/02/some-considerations-when-adding-tags-to-amo/</link>
		<comments>http://micropipes.com/blog/2009/03/02/some-considerations-when-adding-tags-to-amo/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 23:43:12 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[L10n]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=75</guid>
		<description><![CDATA[Tags broke into the limelight around the time &#8220;Web 2.0&#8221; was becoming popularized.  They provided a simple but effective way to categorize objects and many sites are using them now.  Despite their proliferation, I haven&#8217;t found any documentation on the internet regarding standards for implementing tags.  
A tag library exists for CakePHP [...]]]></description>
			<content:encoded><![CDATA[<p>Tags broke into the limelight around the time &#8220;Web 2.0&#8221; was becoming popularized.  They provided a simple but effective way to categorize objects and many sites are using them now.  Despite their proliferation, I haven&#8217;t found any documentation on the internet regarding standards for implementing tags.  </p>
<p>A <a href="http://bakery.cakephp.org/articles/view/tag-cloud">tag library exists for CakePHP</a> but it, and many others, are too simplistic for what we want.</p>
<p>We&#8217;ve written our tagging goals into a plan but have some technical details we still need to figure out.  While reviewing what we have a couple questions arose that we thought people would have opinions on.</p>
<p>1) What should the range of allowed characters be?  Our first instinct was simplicity, something like <em>/[A-Za-z0-9-]/</em> (that is, all English letters and numbers and a dash).  This is easy to handle on our end but leaves out everyone that doesn&#8217;t want to add tags using the English alphabet.  There is some debate how useful it would be to allow other Unicode characters, particularly when you think about #2 below.</p>
<p>2) Tags are most useful when they are normalized.  By allowing Unicode characters we run the risk of diluting our tag cloud.  For example, resume and résumé are close enough that for our purposes they are equivalent.  If we allow Unicode we&#8217;ll have to deal with converting characters like é to e and vice versa for searches.  At that point we&#8217;ll need a list of &#8220;equivalent&#8221; characters &#8211; not impossible but it will slow things down (both development and speed of a search).  The second question is:  Assuming you think we should allow Unicode characters, what characters are equivalents?  Here is a quick idea from <a href="http://php.oregonstate.edu/manual/en/function.strtr.php">php.net&#8217;s strtr() documentation</a>:</p>
<pre><code class="php">
$a = 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿŔŕ';
$b = 'aaaaaaaceeeeiiiidnoooooouuuuybsaaaaaaaceeeeiiiidnoooooouuuuybyRr';
</code></pre>
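<p>Another option, sketched here in Python on the assumption that stripping accents is all we need, is to let Unicode decomposition do the work instead of maintaining a hand-written table.  <code>fold_accents</code> is a hypothetical helper name, not something in AMO.</p>
<pre><code class="python">
import unicodedata


def fold_accents(text):
    """Return text with combining accents removed, so that é becomes e."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("résumé"))   # resume
print(fold_accents("Ŕŕ"))       # Rr
print(fold_accents("Æß"))       # unchanged: these letters have no decomposition
</code></pre>
<p>Letters that do not decompose into a base letter plus an accent (Æ, Ð, Ø, Þ and ß, for example) pass through untouched, so they would still need an explicit mapping like the one above.</p>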
<p>Some other aspects of our current plan are:</p>
<ul>
<li>Tags are not localizable in the same way as other strings on the site (like categories).  There isn&#8217;t anything stopping someone from using &#8220;WebDev&#8221; as a tag or creating a new tag with &#8220;WebDev&#8221; translated in their language.  However, there won&#8217;t be any relationship between the two translated tags.</li>
<li>Tags are separated by spaces.  Spaces within tags are allowed with quotes.</li>
<li>Spaces will be preserved when displaying a tag on the add-on&#8217;s page, however, they will be removed for displaying the tag in a URL and for doing logical operations on the back end like searching.  This means searching for &#8220;Portland OR&#8221; will actually be collapsed to &#8220;PortlandOR&#8221; and will match either &#8220;Portland OR&#8221; or &#8220;PortlandOR&#8221; tags.  This is consistent with <a href="http://flickr.com/">flickr</a>.</li>
<li>If Unicode is allowed we&#8217;ll preserve characters as they are entered even if we are actually searching on their &#8220;equivalents.&#8221;</li>
</ul>
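<p>The space and quote rules in the list above could be sketched in Python like this.  It is only an illustration of the plan as described: <code>parse_tags</code> and <code>search_key</code> are hypothetical names, and the standard library&#8217;s <code>shlex</code> lexer is swapped in for whatever parser AMO ends up using.</p>
<pre><code class="python">
import shlex


def parse_tags(raw):
    """Split on whitespace; double quotes keep a multi-word tag together."""
    lexer = shlex.shlex(raw, posix=True)
    lexer.whitespace_split = True   # only whitespace separates tags
    lexer.commenters = ""           # do not treat "#" as the start of a comment
    lexer.quotes = '"'              # only double quotes group words
    return list(lexer)


def search_key(tag):
    """Collapse whitespace for URLs and back-end matching."""
    return "".join(tag.split())


tags = parse_tags('WebDev "Portland OR" PortlandOR')
print(tags)
print([search_key(tag) for tag in tags])
</code></pre>
<p>The displayed tag keeps its spaces, while both &#8220;Portland OR&#8221; and &#8220;PortlandOR&#8221; end up with the same search key.</p>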
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2009/03/02/some-considerations-when-adding-tags-to-amo/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>How addons.mozilla.org defends against XSS attacks</title>
		<link>http://micropipes.com/blog/2009/02/23/how-addonsmozillaorg-defends-against-xss-attacks/</link>
		<comments>http://micropipes.com/blog/2009/02/23/how-addonsmozillaorg-defends-against-xss-attacks/#comments</comments>
		<pubDate>Mon, 23 Feb 2009 16:16:56 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=70</guid>
		<description><![CDATA[One of the things that gets a lot of news time these days is XSS.  There are a lot of places that explain what it is and how to prevent it but most are oversimplified or don&#8217;t provide real world examples.  I thought I&#8217;d explain a couple of the ways AMO attempts to [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things that gets a lot of news time these days is <abbr title="Cross Site Scripting">XSS</abbr>.  There are a lot of places that explain what it is and how to prevent it but most are oversimplified or don&#8217;t provide real world examples.  I thought I&#8217;d explain a couple of the ways <a href="https://addons.mozilla.org/"><abbr title="addons.mozilla.org">AMO</abbr></a> attempts to prevent it.</p>
<p>I&#8217;m not trying to invite attackers by posting this.  My goal is to provide a (hopefully) working example from a real world, high-traffic site.  I think the people exploiting XSS have a fairly good idea what they are doing and, too often, the people attempting to secure their sites don&#8217;t.  Since AMO is open source I&#8217;m not sharing anything that isn&#8217;t available already anyway (side note: please don&#8217;t depend on security by obscurity).  </p>
<p>Firstly, this chunk of code sits in CakePHP&#8217;s <a href="http://svn.mozilla.org/addons/trunk/site/app/config/bootstrap.php">bootstrap.php</a> and runs very close to the start of every request:</p>
<pre><code>
if (array_key_exists('url',$_GET) &#038;&#038;
    !preg_match('/\/api\//', $_GET['url']) &#038;&#038;
    preg_match('/[^\w\d\/\.\-_!: ]/u',$_GET['url'])) {
    header("HTTP/1.1 400 Bad Request");
    exit;
}</code></pre>
<p>Since a lot of XSS attacks are launched from the URL we implemented this simple whitelist of characters we&#8217;ll allow.  If anything outside of that whitelist is in the URL we return a 400 Bad Request header and exit.  This isn&#8217;t a lot of protection but it does narrow the field of what our application expects and has to deal with (particularly with control characters, high level ASCII, etc.).</p>
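<p>For readers who don&#8217;t work in PHP, the same check can be approximated in Python as below.  This is a sketch, not a port we run anywhere: <code>status_for</code> is a hypothetical helper, and Python&#8217;s <code>\w</code> matches Unicode word characters in text patterns, which may not line up exactly with what PHP&#8217;s PCRE accepts.</p>
<pre><code class="python">
import re

# Same character class as the PHP check: anything outside it is rejected.
DISALLOWED = re.compile(r"[^\w\d/.\-_!: ]")


def status_for(url):
    """Return the HTTP status the filter would give a request for `url`."""
    if "/api/" not in url and DISALLOWED.search(url):
        return 400
    return 200


print(status_for("en-US/firefox/addon/1865"))
print(status_for("en-US/firefox/addon/1865';alert(1)"))
print(status_for("en-US/firefox/api/1.2/search/x(y)"))
</code></pre>
<p>As in the PHP version, anything under <code>/api/</code> skips the character check entirely.</p>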
<p>The second, and more important section of code is in our <a href="http://svn.mozilla.org/addons/trunk/site/app/app_controller.php">app_controller class</a>.  We wrote a custom sanitize() function that any string going into one of our views gets run through:</p>
<pre class="php"><code>
$sanitize_patterns = array(
    'patterns'      => array("/%/u", "/\(/u", "/\)/u", "/\+/u", "/-/u"),
    'replacements'  => array("&amp;#37;", "&amp;#40;", "&amp;#41;", "&amp;#43;", "&amp;#45;")
    );

........

$data = iconv('UTF-8', 'UTF-8//IGNORE', $data);
$data = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
$data = preg_replace($sanitize_patterns['patterns'], $sanitize_patterns['replacements'], $data);
</code></pre>
<p>This code has several important parts and I&#8217;ll start with the functions.  The first function that modifies the actual data is <a href="http://php.oregonstate.edu/manual/en/function.iconv.php">iconv()</a>.  We ask it to convert our data from UTF-8 to UTF-8 which seems unnecessary but the &#8220;//IGNORE&#8221; part is important &#8211; that means it will throw out any characters it can&#8217;t represent appropriately.  This was added to prevent a proof of concept attack that exploited a <a href="http://en.wikipedia.org/wiki/C0_and_C1_control_codes">C0 ASCII control code</a> character to break the output (discovered on the <a href="http://sla.ckers.org/forum/">sla.ckers.org forums</a>).</p>
<p>The next function, <a href="http://php.oregonstate.edu/htmlspecialchars">htmlspecialchars()</a>, is a pretty well known function and converts special characters to their HTML entity equivalents.  The second parameter, ENT_QUOTES, asks it to encode single quotes as well as double quotes.</p>
<p>Lastly we use the array of patterns and replacements declared at the beginning to encode a few final symbols, like parentheses and the percent sign, into HTML entities.</p>
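<p>The three steps translate fairly directly to other languages.  Here is a rough Python equivalent, offered as a sketch rather than a drop-in replacement: <code>sanitize</code> and <code>EXTRA</code> are hypothetical names, decoding with <code>errors="ignore"</code> is assumed to be a fair stand-in for the iconv() call, and <code>html.escape()</code> writes a single quote as <code>&amp;#x27;</code> where PHP writes <code>&amp;#039;</code>.</p>
<pre><code class="python">
import html

# Extra characters to encode, mirroring the PHP pattern list: % ( ) + -
# "\x26" is the ampersand, so "\x26#37;" is the numeric entity for "%".
EXTRA = {ord(ch): "\x26#%d;" % ord(ch) for ch in "%()+-"}


def sanitize(raw_bytes):
    """Decode, escape HTML metacharacters, then encode the extra symbols."""
    text = raw_bytes.decode("utf-8", errors="ignore")   # drop invalid byte sequences
    text = html.escape(text, quote=True)                # escapes both quote styles
    return text.translate(EXTRA)


print(sanitize(b"alert('x') \xff 100%"))
</code></pre>
<p>The order matters: the extra symbols are encoded after the HTML escaping, so the ampersands in the new entities are not escaped a second time.</p>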
<p>AMO&#8217;s sanitizing code has worked fairly well for a few years now and as issues are discovered we make changes to it.  If you&#8217;re looking for the latest code please be sure to check <a href="http://svn.mozilla.org/addons/trunk/">our repository</a>.  And, as always, if you find any kind of exploit on AMO please let me know! <img src='http://micropipes.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2009/02/23/how-addonsmozillaorg-defends-against-xss-attacks/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Caching is easy; Expiration is hard</title>
		<link>http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/</link>
		<comments>http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/#comments</comments>
		<pubDate>Wed, 23 Apr 2008 16:36:00 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/</guid>
		<description><![CDATA[Still on a high from our success with memcached in AMO version 2, we decided to go a fairly common route and cache query results in version 3.  This performs admirably, particularly with our ridiculously long and slow queries.  Over time, though, the popularity of the site and the load on the servers [...]]]></description>
			<content:encoded><![CDATA[<p>Still on a high from <a href="http://micropipes.com/blog/2008/04/18/amo-scalability-then-and-now/">our success with memcached</a> in <abbr title="addons.mozilla.org">AMO</abbr> version 2, we decided to go a fairly common route and cache query results in version 3.  This performs admirably, particularly with our <a href="http://blog.mozilla.com/webdev/2007/04/18/teaching-cakephp-to-be-multilingual-part-3/">ridiculously long and slow queries</a>.  Over time, though, the popularity of the site and the load on the servers climb, and soon we&#8217;re looking at slowness issues again.  On a rough day we decided to increase the expiration timeout for our queries in memcached from a minute to around an hour.  This gives the database servers some breathing room but causes excessive delay on the AMO site when add-ons are updated and things like <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=425315" title="Implement full-time cache with instant invalidation">bug 425315</a> and its friends are born.  Weird things happen when parts of a site expire at different times and consequently the user experience (particularly for add-on developers) suffers.</p>
<p>The problem we&#8217;re running into in the bug linked above is knowing when to expire a cache.  Consider when an add-on author updates the summary of their add-on.  We know we&#8217;ll have to flush the queries on that page out of memcached, and that&#8217;s easy enough, but what about all the other places the summary is used?  Search results pages, add-on detail pages, recommended lists, the <abbr title="Application Programming Interface">API</abbr>, etc.  Now we&#8217;ve got to figure out the queries used on those pages and expire them too.  Suddenly I&#8217;m wishing we were caching objects in memcached instead of queries.</p>
<p>I looked into other ways to use memcached and they all have their pros and cons.  Caching entire pages means we&#8217;d have to store different versions for a person that is logged in vs. logged out and also what permissions they had (pages have different options for localizers, admins, developers, etc.).  Caching objects is attractive, but the way <a href="http://cakephp.org">CakePHP</a> does queries makes this a non-option (namely, it&#8217;s not asking objects for values, it does joins directly on the db).  Directly caching queries seems like the best fit because we can affect just the parts of the pages we want and it will work with CakePHP&#8217;s current system&#8230;just as soon as we figure out how to relate updating a row to all of its associated queries.</p>
<p>I attached an idea to the bug but regardless of the process we use, figuring out how to implement a full time cache that we can expire on the fly is going to be an important step in keeping the AMO site usable as our traffic increases.</p>
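<p>One generic way to relate a row to its queries, shown here as a small Python sketch and not necessarily the idea attached to the bug, is to record which rows each cached query was built from.  <code>QueryCache</code>, its methods and the key names are all hypothetical, and a plain dict stands in for memcached.</p>
<pre><code class="python">
from collections import defaultdict


class QueryCache:
    """Caches query results and remembers which rows each result depends on."""

    def __init__(self):
        self._results = {}                    # query key mapped to cached result
        self._dependents = defaultdict(set)   # row key mapped to query keys built from it

    def get(self, query_key):
        return self._results.get(query_key)

    def set(self, query_key, result, row_keys):
        self._results[query_key] = result
        for row_key in row_keys:
            self._dependents[row_key].add(query_key)

    def expire_row(self, row_key):
        """Drop every cached query that was built from this row."""
        for query_key in self._dependents.pop(row_key, set()):
            self._results.pop(query_key, None)


cache = QueryCache()
cache.set("detail:1865", {"summary": "old"}, ["addons:1865"])
cache.set("search:ads", [{"summary": "old"}], ["addons:1865", "addons:722"])
cache.set("detail:722", {"summary": "other"}, ["addons:722"])

cache.expire_row("addons:1865")   # the author edited the summary of add-on 1865
print(cache.get("detail:1865"))
print(cache.get("search:ads"))
print(cache.get("detail:722"))
</code></pre>
<p>With real memcached the dependency index would have to live in the cache as well, and keeping that index consistent is its own problem, which is a fair summary of why expiration is the hard part.</p>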
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/04/23/caching-is-easy-expiration-is-hard/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>CakePHP makes upgrading easy</title>
		<link>http://micropipes.com/blog/2008/04/01/cakephp-makes-upgrading-easy/</link>
		<comments>http://micropipes.com/blog/2008/04/01/cakephp-makes-upgrading-easy/#comments</comments>
		<pubDate>Wed, 02 Apr 2008 05:32:53 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/04/01/cakephp-makes-upgrading-easy/</guid>
		<description><![CDATA[Laura attended CakeFest a couple months ago and got to meet some core Cake developers in person.  In doing so she let slip that AMO was running on a pretty old version (1.1.12 &#8211; Released in December of 2006).  Apparently 1.1.15 had some major performance boosts and since we melted the cluster a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.laurathomson.com/">Laura</a> attended <a href="http://cakefest.org/">CakeFest</a> a couple of months ago and got to meet some core Cake developers in person.  In doing so she let slip that <abbr title="addons.mozilla.org">AMO</abbr> was running on a pretty old version (1.1.12 &#8211; released in December of 2006).  Apparently 1.1.15 had some major performance boosts, and since we melted the cluster a few times recently (the <a href="http://wiki.mozilla.org/Update:Remora_API_Docs">new <abbr title="Application Programming Interface">API</abbr></a> was the culprit) we thought it would be a good idea to investigate upgrading.</p>
<p>I downloaded 1.1.15 and 1.1.19, set up a couple of symlinks to swap them into my dev copy, and looked at how hard it would be to upgrade.</p>
<p>Hats off to the CakePHP developers.  After merging in a short patch to the core CakePHP session code, most of the site worked right out of the box.  I made a few minor tweaks to our code for things that had changed (and filed <a href="https://trac.cakephp.org/ticket/4140">ticket 4140</a> for them), but all in all it was pretty painless.  Nice work, guys.  It shows a lot of planning and foresight to make a framework that doesn&#8217;t have a bunch of code-breaking changes after years of development.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/04/01/cakephp-makes-upgrading-easy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How my cookies became a one way street</title>
		<link>http://micropipes.com/blog/2008/03/11/how-my-cookies-became-a-one-way-street/</link>
		<comments>http://micropipes.com/blog/2008/03/11/how-my-cookies-became-a-one-way-street/#comments</comments>
		<pubDate>Tue, 11 Mar 2008 19:27:47 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/03/11/how-my-cookies-became-a-one-way-street/</guid>
		<description><![CDATA[I&#8217;ve been playing with CakePHP&#8217;s session code lately and ran across an interesting (read: nerdy) problem with cookies on AMO.
First, some background:
The uniqueness of a cookie in the browser is determined by all the attributes when it&#8217;s set (not just its name).  That means I can have multiple cookies named AMOv3 as long as [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been playing with CakePHP&#8217;s session code lately and ran across an interesting (read: nerdy) problem with cookies on <a href="http://addons.mozilla.org/">AMO</a>.</p>
<p>First, some background:</p>
<p>The uniqueness of a cookie in the browser is determined by all the attributes when it&#8217;s set (not just its name).  That means I can have multiple cookies named <q>AMOv3</q> as long as another attribute (e.g. path) is different when I set the second cookie.</p>
<p>According to <a href="http://www.faqs.org/rfcs/rfc2109.html">RFC 2109</a>:</p>
<blockquote><p> If multiple cookies satisfy the criteria [...] they are ordered in the Cookie header such that those with more specific Path attributes precede those with less specific.</p></blockquote>
<p>In PHP, however, cookies are indexed in the $_COOKIE variable by name, which means I can send several cookies with the same name and only the first cookie will show up. <sup>[1]</sup></p>
<hr />
<p>So, what&#8217;s this have to do with AMO?  For some reason, CakePHP hasn&#8217;t been consistent in the past regarding cookie paths. Sometimes they were <q>/</q> (which is correct), sometimes they were something else like <q>/en-US/firefox/users/</q> and sometimes users had multiple cookies with different paths.</p>
<p>From the <a href="http://php.oregonstate.edu/setcookie">PHP manual</a>:</p>
<blockquote><p>Cookies must be deleted with the same parameters as they were set with. If the value argument is an empty string, or FALSE, and all other arguments match a previous call to setcookie, then the cookie with the specified name will be deleted from the remote client. </p></blockquote>
<p>When a browser sends a cookie back to a server it only sends the name and value.  If a cookie was set with a path that I don&#8217;t know, I have no way to remove that cookie!</p>
<p>To compound the problem, if a cookie with the same name has a more specific path, it shows up in $_COOKIE and there is no indication the other cookie even exists.  This means on the front page your session can be fine and after clicking on a deeper URL you&#8217;re suddenly requesting a session that expired long ago.</p>
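<p>One partial workaround is to guess.  A browser only sends a cookie when the cookie&#8217;s path is a prefix of the path being requested, so every stray cookie that arrives on a request was set with some prefix of the current URL.  The sketch below tries each directory-style prefix in turn; the helper names are hypothetical, it only covers prefixes that fall on a slash boundary, and it assumes the cookies were set without an explicit domain.</p>
<pre class="php"><code>
// Hypothetical helper: list every path prefix a stray cookie could have
// been set with, given the path of the current request.
function candidate_cookie_paths($requestPath)
{
    $paths = array('/');
    $current = '';
    foreach (explode('/', trim($requestPath, '/')) as $segment) {
        if ($segment === '') {
            continue;
        }
        $current .= '/' . $segment;
        $paths[] = $current;       // e.g. /en-US
        $paths[] = $current . '/'; // e.g. /en-US/
    }
    return $paths;
}

// Expire the named cookie at each guessed path by sending an empty
// value with an expiry time in the past.
function expire_cookie_everywhere($name, $requestPath)
{
    foreach (candidate_cookie_paths($requestPath) as $path) {
        setcookie($name, '', time() - 3600, $path);
    }
}
</code></pre>
<p>Calling expire_cookie_everywhere('AMOv3', '/en-US/firefox/users/') would send one expiring Set-Cookie header per guessed path, which is noisy but only needs to happen when duplicates are detected.</p>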
<p>We rolled out a change to session handling last week that prevented me and another developer from logging in because we had a mess of cookies with different paths.  I haven&#8217;t had a barrage of emails, so I don&#8217;t think it&#8217;s affected many people, but if you have any trouble logging in please let me know.  If you&#8217;re in a hurry, clearing your cookies for the addons.mozilla.org domain will fix any problems.</p>
<p>For the record, any new login cookies on AMO will expire when the browser closes, which should prevent stale cookies from coming back to haunt us.</p>
<p>[1] If you need to get at all the cookies the browser is sending, you&#8217;ll have to dig through $_SERVER['HTTP_COOKIE'] manually.</p>
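<p>A minimal sketch of that digging, assuming the header uses the usual <q>name=value; name=value</q> layout.  The parse_cookie_header function is a hypothetical helper, not part of PHP; it keeps every value for a name instead of only the first.</p>
<pre class="php"><code>
// Hypothetical helper: collect every value for each cookie name from the
// raw Cookie header, in the order the browser sent them.
function parse_cookie_header($header)
{
    $cookies = array();
    foreach (explode(';', $header) as $pair) {
        $parts = explode('=', trim($pair), 2);
        if (count($parts) !== 2 || $parts[0] === '') {
            continue; // skip empty or malformed pairs
        }
        $cookies[$parts[0]][] = urldecode($parts[1]);
    }
    return $cookies;
}

$raw = isset($_SERVER['HTTP_COOKIE']) ? $_SERVER['HTTP_COOKIE'] : '';
$all = parse_cookie_header($raw);
// More than one entry under a single name means the browser sent duplicates.
</code></pre>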
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/03/11/how-my-cookies-became-a-one-way-street/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When is a TINYINT(1) not a TINYINT(1)?</title>
		<link>http://micropipes.com/blog/2008/03/07/when-is-a-tinyint1-not-a-tinyint1/</link>
		<comments>http://micropipes.com/blog/2008/03/07/when-is-a-tinyint1-not-a-tinyint1/#comments</comments>
		<pubDate>Fri, 07 Mar 2008 23:50:30 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[o rly]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/03/07/when-is-a-tinyint1-not-a-tinyint1/</guid>
		<description><![CDATA[When you&#8217;re using CakePHP!
Turns out CakePHP considers a TINYINT(1) to be a Boolean.  Judging from all the support tickets that have been filed, I&#8217;m not the first person to get caught off guard by this behavior.  When I asked about it on IRC, the response was that since MySQL considers a TINYINT(1) to [...]]]></description>
			<content:encoded><![CDATA[<p>When you&#8217;re using <a href="http://cakephp.org/">CakePHP</a>!</p>
<p>Turns out CakePHP considers a TINYINT(1) to be a Boolean.  Judging from all <a href="https://trac.cakephp.org/ticket/1253">the</a> <a href="https://trac.cakephp.org/ticket/3903">support</a> <a href="https://trac.cakephp.org/ticket/4026">tickets</a> that have been filed, I&#8217;m not the first person to get caught off guard by this behavior.  When I asked about it on IRC, the response was that since MySQL considers a TINYINT(1) to be a Boolean, CakePHP does too.  That&#8217;s not true.</p>
<p>From the <a href="http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html">MySQL manual</a>:</p>
<blockquote><p>As of MySQL 5.0.3, a BIT data type is available for storing bit-field values. (Before 5.0.3, MySQL interprets BIT as TINYINT(1).)</p></blockquote>
<p>That&#8217;s saying that if I request a BIT, MySQL will make it a TINYINT, not that if I request a TINYINT it will make it a BIT.  Having a framework change the definitions of my database columns sounds crazy to me, but judging from <a href="https://trac.cakephp.org/ticket/1253">the ticket filed in 2006</a> this has been CakePHP&#8217;s policy for a long time.  Despite the long-standing precedent, I can&#8217;t find any documentation about it online other than the closing remarks of those support tickets.</p>
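<p>To make the behavior concrete, here is a hypothetical sketch of the kind of type mapping described above.  It is not CakePHP&#8217;s actual source; the guess_column_type function and its return values are made up for illustration.</p>
<pre class="php"><code>
// Hypothetical sketch of the mapping described above; this is not
// CakePHP's actual source.
function guess_column_type($definition)
{
    if (!preg_match('/^([a-z]+)(?:\((\d+)\))?/', strtolower(trim($definition)), $m)) {
        return 'unknown';
    }
    $base = $m[1];
    $length = isset($m[2]) ? (int) $m[2] : null;

    if ($base === 'tinyint') {
        // The surprising case: a display width of 1 turns an integer
        // column into a boolean.
        return $length === 1 ? 'boolean' : 'integer';
    }
    if ($base === 'int' || $base === 'smallint' || $base === 'bigint') {
        return 'integer';
    }
    return 'unknown';
}
</code></pre>
<p>Under a mapping like this, TINYINT(1) and TINYINT(2) end up as different types even though the number in parentheses is only a display width in MySQL and both columns store the same range of values.</p>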
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/03/07/when-is-a-tinyint1-not-a-tinyint1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Frameworks that start sessions for every visitor make me sad</title>
		<link>http://micropipes.com/blog/2008/03/06/frameworks-that-start-sessions-for-every-visitor-make-me-sad/</link>
		<comments>http://micropipes.com/blog/2008/03/06/frameworks-that-start-sessions-for-every-visitor-make-me-sad/#comments</comments>
		<pubDate>Thu, 06 Mar 2008 07:00:48 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[Drupal]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/03/06/frameworks-that-start-sessions-for-every-visitor-make-me-sad/</guid>
		<description><![CDATA[I might have played the devil&#8217;s advocate when Lars was hating on frameworks at the barcamp last weekend, but that doesn&#8217;t mean I don&#8217;t see his point.  The latest in a series of frustrations with frameworks kept me up until 3am last night.  What better way to cap it off than complaining on [...]]]></description>
			<content:encoded><![CDATA[<p>I might have played the devil&#8217;s advocate when <a href="http://staff.osuosl.org/~lohnk/blog/">Lars</a> was hating on frameworks at the <a href="http://barcamp.org/BeaverBarCamp">barcamp</a> last weekend, but that doesn&#8217;t mean I don&#8217;t see his point.  The latest in a series of frustrations with frameworks kept me up until 3am last night.  What better way to cap it off than complaining on the internet?</p>
<p>Today&#8217;s subject is anonymous sessions.  Frameworks (and developers) love them because they are simple and convenient, but they come at a cost.  Keeping track of sessions for every visitor on a high traffic site is far too expensive to be practical.  Developers should know how to work around this, but their frameworks need to support them.</p>
<p>The first framework on my mind is <a href="http://drupal.org/">Drupal</a>.  I filed <a href="http://drupal.org/node/201122">an issue</a> last year that Drupal should support disabling anonymous sessions.  It&#8217;s still unassigned so I&#8217;m guessing it&#8217;s not a high priority, but it was one of the main things that made me choose not to use Drupal on mozilla.com.  I <a href="http://drupal.org/node/183006">wrote some ideas</a> on how to handle it and got some responses from people suffering the same fate.  No word on any progress though.</p>
<p>The second framework, <a href="http://cakephp.org/">CakePHP</a>, has an AUTO_SESSION variable that, <a href="http://micropipes.com/blog/2008/01/07/cakephps-cache-that-wouldnt-quit/">just like $cacheQueries</a>, is far too easy to misplace faith in.</p>
<p>With AUTO_SESSION set to false, you can&#8217;t read or write to the session.  Working as advertised?  Not so much.  If you take a closer look at what&#8217;s actually happening you&#8217;ll see that the session is still getting started, it&#8217;s just that CakePHP is blocking your access to it.  Even with AUTO_SESSION off, a cookie with a unique ID is set, and <strong>a row is still inserted into the sessions table</strong>.  That last part almost brought down <a href="https://addons.mozilla.org/">AMO</a> last night.  I wrote <a href="http://viewvc.svn.mozilla.org/vc/addons/trunk/site/cake/libs/controller/components/session.php?r1=10970&#038;r2=10969&#038;pathrev=10970">a patch that disables anonymous sessions for real</a>, but anyone who has talked to me about patching core code knows I don&#8217;t like to do it.</p>
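<p>Outside of any framework, the general technique looks something like the sketch below: resume a session only when the visitor already carries a session cookie, and create one only at login.  This is plain PHP for illustration, not the patch linked above, and the two function names are hypothetical.</p>
<pre class="php"><code>
// Hypothetical helper: true only when the visitor already carries a
// non-empty session cookie.
function has_session_cookie(array $cookies, $sessionName)
{
    return isset($cookies[$sessionName]) ? $cookies[$sessionName] !== '' : false;
}

// On each request: resume an existing session, but never create one,
// so anonymous visitors never touch session storage.
if (has_session_cookie($_COOKIE, session_name())) {
    session_start();
}

// At login: the one place a brand new session gets created.
function begin_login_session()
{
    session_start();
    session_regenerate_id(true);
}
</code></pre>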
<p>When you&#8217;re writing code, framework or not, don&#8217;t forget about scalability.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/03/06/frameworks-that-start-sessions-for-every-visitor-make-me-sad/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>CakePHP&#8217;s cache that wouldn&#8217;t quit</title>
		<link>http://micropipes.com/blog/2008/01/07/cakephps-cache-that-wouldnt-quit/</link>
		<comments>http://micropipes.com/blog/2008/01/07/cakephps-cache-that-wouldnt-quit/#comments</comments>
		<pubDate>Mon, 07 Jan 2008 22:26:46 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[CakePHP]]></category>
		<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/01/07/cakephps-cache-that-wouldnt-quit/</guid>
		<description><![CDATA[I had the joy of debugging some unit tests the other day on AMO and ran into caching trouble.  Turns out the bug for this was filed over a year ago, but I tested it in the latest build of Cake (1.1.18.5850) and it&#8217;s still not fixed.

Cake&#8217;s models have a Boolean variable called $cacheQueries [...]]]></description>
			<content:encoded><![CDATA[<p>I had the joy of debugging some unit tests the other day on <a href="http://addons.mozilla.org"><abbr title="addons.mozilla.org">AMO</abbr></a> and ran into caching trouble.  Turns out the <a href="https://trac.cakephp.org/ticket/1915">bug for this</a> was filed over a year ago, but I tested it in the latest build of Cake (1.1.18.5850) and it&#8217;s still not fixed.</p>
<hr />
<p>Cake&#8217;s models have a Boolean variable called $cacheQueries which I misplaced my faith in early on.  Turns out this does disable query caching&#8230;sometimes.</p>
<p>The test I was writing was pretty straightforward: look in the database for some information to make sure it wasn&#8217;t there, add the info, then look in the database again to make sure it was there.  You can <a href="http://svn.mozilla.org/addons/trunk/site/app/tests/controllers/editors_controller.test.php">see the actual code</a>, but this example is simplified:</p>
<pre><code>
1. $this->Addon->cacheQueries = false; // Disable the built-in caching.
2. $ret = $this->Addon->query("SELECT addon_id FROM `addons_tags` WHERE addon_id={$this->testdata['addonid']}");
3. $this->assertEmpty($ret, 'Data exists!'); // this should be empty
4. $this->Addon->doSomething($this->testdata['addonid']);
5. $ret = $this->Addon->query("SELECT addon_id FROM `addons_tags` WHERE addon_id={$this->testdata['addonid']}");
6. $this->assertNotEmpty($ret, 'Data is missing!'); // The array should have info in it
</code></pre>
<p>The problem is, line 5 always returned an empty array.  I set DEBUG to 2 so Cake would print out all the queries it was running and discovered it never actually made the second SELECT call.  I smell query caching!</p>
<p>Let&#8217;s dig through some CakePHP code to figure out what&#8217;s up:</p>
<p>$this->Addon->query() is passed through AppModel::query() to <a href="http://api.cakephp.org/model__php4_8php-source.html#l01334">Model::query()</a> and eventually works its way down to <a href="http://api.cakephp.org/dbo__source_8php-source.html#l00177">DboSource::query()</a> with the same arguments.  This is where things go south.  Near the top of that function you&#8217;ll see:</p>
<pre><code>
    if (count($args) == 1) {
        return $this->fetchAll($args[0]);
</code></pre>
<p>The function signature for <a href="http://api.cakephp.org/dbo__source_8php-source.html#l00290">DboSource::fetchAll()</a> looks like:</p>
<pre class="php"><code>
    function fetchAll($sql, $cache = true, $modelName = null)
</code></pre>
<p>You can see the second parameter is a Boolean for caching and by default it&#8217;s on.  That&#8217;s the fly in my clam chowder!</p>
<p>So, two ways around it:</p>
<p>The first way is to scroll up about 20 lines and look at the end of DboSource::query().  There is a handler there for a second parameter to your original query() function.  Pass in false and voila, things just work.  Of course, I didn&#8217;t realize this until I was writing this post. I&#8217;m pretty sure it&#8217;s not documented anywhere unless you look at the code.  </p>
<p>The second way is to use execute() instead of query().  If you look at <a href="http://api.cakephp.org/dbo__source_8php-source.html#l00153">DboSource::execute()</a> you&#8217;ll see it suffers no caching and is as close to a direct SQL call as you can get.</p>
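<p>To see why the default bites, here is a toy stand-in that copies only the shape of the two signatures quoted above.  It is not CakePHP code: the ToySource class and its queriesRun counter are made up, and no database is involved.</p>
<pre class="php"><code>
// Toy stand-in, not CakePHP code: it copies only the shape of the
// signatures quoted above to show why the second SELECT never ran.
class ToySource
{
    public $queriesRun = 0;   // how many times we "hit the database"
    private $cache = array(); // SQL string => cached rows

    public function fetchAll($sql, $cache = true)
    {
        if ($cache) {
            if (isset($this->cache[$sql])) {
                return $this->cache[$sql]; // served from memory, no SQL issued
            }
        }
        $this->queriesRun++;
        $rows = array('row from run ' . $this->queriesRun);
        if ($cache) {
            $this->cache[$sql] = $rows;
        }
        return $rows;
    }

    public function query()
    {
        $args = func_get_args();
        if (count($args) == 1) {
            return $this->fetchAll($args[0]); // $cache silently defaults to true
        }
        return $this->fetchAll($args[0], $args[1]);
    }
}

$db = new ToySource();
$db->query('SELECT 1');
$db->query('SELECT 1');        // cached: queriesRun is still 1
$db->query('SELECT 1', false); // bypasses the cache: queriesRun is now 2
</code></pre>
<p>In the simplified test above, the equivalent change would be passing false as a second argument to the query() call on line 5.</p>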
<p>The <a href="http://manual.cakephp.org/chapter/models">CakePHP manual</a> has this to say about query() and execute():</p>
<blockquote><p>Custom SQL calls can be made using the model&#8217;s query() and execute() methods. The difference between the two is that query() is used to make custom SQL queries (the results of which are returned), and execute() is used to make custom SQL commands (which require no return value).</p></blockquote>
<p>Apparently there are some other differences as well.  (The fact that execute() returns results seems counter to what the manual suggests in the paragraph above, but I may just be misinterpreting what they mean.)  Regardless, if your tests are failing, double-check that you&#8217;re not fighting the cache and save yourself some time.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/01/07/cakephps-cache-that-wouldnt-quit/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
