Some considerations when adding Tags to AMO

Tags broke into the limelight around the time "Web 2.0" was becoming popularized. They provide a simple but effective way to categorize objects, and many sites use them now. Despite their proliferation, I haven't found any documentation on the internet regarding standards for implementing tags. A tag library exists for CakePHP, but it, like many others, is too simplistic for what we want. We've written our tagging goals into a plan but still have some technical details to figure out. While reviewing what we have, a couple of questions arose that we thought people would have opinions on.

1) What should the range of allowed characters be? Our first instinct was simplicity, something like /[A-Za-z0-9-]/ (that is, all English letters and numbers and a dash). This is easy to handle on our end but leaves out everyone who doesn't want to add tags using the English alphabet. There is some debate about how useful it would be to allow other Unicode characters, particularly when you think about #2 below.

2) Tags are most useful when they are normalized. By allowing Unicode characters we run the risk of diluting our tag cloud. For example, resume and résumé are close enough that for our purposes they are equivalent. If we allow Unicode we'll have to deal with converting characters like é to e and vice versa for searches. At that point we'll need a list of "equivalent" characters - not impossible, but it will slow things down (both development and search speed). So the second question is: assuming you think we should allow Unicode characters, which characters are equivalent? Here is a quick idea from php.net's strtr() documentation:
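The strtr() table itself is not reproduced here. As a rough alternative, here is a minimal Python sketch of the normalization idea from question 2. It is not what AMO does: instead of a hand-written list of equivalent characters it leans on Unicode decomposition (NFKD) and drops the combining accents. The function names normalize_tag and is_plain_ascii_tag are made up for illustration.

```python
import re
import unicodedata

# The strict range from question 1: English letters, digits and a dash.
ASCII_TAG = re.compile(r"[A-Za-z0-9-]+")


def normalize_tag(tag: str) -> str:
    """Fold a tag to a lowercase, accent-free form for comparison (hypothetical helper)."""
    # NFKD splits "é" into "e" followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", tag.strip())
    # Drop the combining marks and keep the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def is_plain_ascii_tag(tag: str) -> bool:
    """True when every character fits the strict /[A-Za-z0-9-]/ range."""
    return ASCII_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    print(normalize_tag("résumé") == normalize_tag("resume"))
    print(is_plain_ascii_tag("résumé"))
```

With this approach the tag could be stored as typed and the folded form kept alongside it for searches and the tag cloud. Decomposition only covers accented letters; characters with no decomposition (ø or ß, for example) pass through unchanged, so an explicit equivalence table would probably still be needed for those.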

Differentiate Bugzilla emails?

Bugzilla is an awesome bug tracker that is used by hundreds of companies. I've got accounts on several projects' trackers, and I'm sure many others do too. When I get mail from Bugzilla it's not obvious which project it's from. My email client (GMail) only shows the "from name", so all I see for these projects is:

Mozilla: bugzilla-daemon
Pootle: bugzilla-daemon
Miro: bugzilla
kernel.org: bugme-daemon
Apache: bugzilla

Wouldn't it make sense to differentiate each project's emails in the from name? Maybe even by default (something like "%SITE_NAME% Bugzilla")? Reed says it's a personal problem because his mail client shows the full address. Am I the only one? :(
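To make the "%SITE_NAME% Bugzilla" idea concrete, here is a small Python sketch of what such a From header could look like. This is not Bugzilla code (Bugzilla is written in Perl); from_header is a hypothetical helper and the example.org address is a placeholder.

```python
from email.utils import formataddr, parseaddr


def from_header(site_name: str, address: str) -> str:
    """Build a From header whose display name includes the site name (hypothetical helper)."""
    display_name = f"{site_name} Bugzilla"
    # formataddr quotes the display name when it contains special characters such as ".".
    return formataddr((display_name, address))


if __name__ == "__main__":
    # Placeholder address, not a real project's mail setup.
    print(from_header("Mozilla", "bugzilla-daemon@example.org"))
```

A client that only shows the display name would then show the project as well as the tracker.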

How addons.mozilla.org defends against XSS attacks

One of the things that gets a lot of news time these days is XSS. There are a lot of places that explain what it is and how to prevent it but most are oversimplified or don’t provide real world examples. I thought I’d explain a couple of the ways AMO attempts to prevent it.

Verbatim Server Downtime

Translate Toolkit 1.3.0 was released a few days ago. I was following along with trunk on my development box, and I wanted to upgrade our alpha install to take advantage of the new features (namely, speed improvements) and the Django framework. I attempted this tonight and it was not a pretty upgrade (or install, for that matter).

Among the medley of problems is Django ticket #6548. Django assumes it's not behind an SSL proxy, so when it does any redirects it doesn't use https. This means logging in and logging out work on our server, but the user is presented with a jarring "bad request" interstitial.

The current status is that user accounts are not migrated and, even if they were, I can't seem to set permissions for projects. Since there are some odd problems that we haven't seen elsewhere and this is an alpha install, I'm going to leave it as is and debug some of the issues over the next few days. Expect downtime. If there are questions, visit #verbatim on irc.mozilla.org.

Add-on Statistics Status (part 2)

This is the second update about add-on statistics. Read part one.

Statistics for both update pings and download counts have been updated from February 1 through today, February 6th. Some notes:

New statistics are stored in UTC and data processing happens shortly after the logs close. This means you can expect new data at around 8pm PST or shortly after.

Download numbers will drop dramatically. They have been recorded incorrectly[1] for the past several weeks. Bug 472538 has more details.

We'll begin replacing statistics back to 2008-11-15 over the next few weeks as processing time allows.

An aside that you may not know: when Firefox looks for an update to an add-on, we count that as an "update ping." If it finds the update, it will hit releases.mozilla.org directly for the new add-on. That means updates are not counted as downloads in your current stats; put another way, "download counts" count people actually clicking the "Install Now" button on addons.mozilla.org.

Since we're pulling these statistics from a team dedicated to crunching numbers, we're getting richer and more reliable data now. This frees up our time to fix existing stats bugs and to add more data views (like what locale your users are using). Good things are coming; keep an eye on your stats!

Update 2009-02-07: HP issued a critical alert regarding potential data loss which affected our servers. Our IT team applied the fix but upon restart discovered it had been way too long since fsck was last run on the file system. Since there is so much data on the system, it will take several more hours to finish; then IT will restore the log files, and then we can begin to process the stats for this weekend. In short, stats won't be current for another day or two.

[1] The technical reason is that Firefox makes 2 or 3 GET requests to a server when it installs an add-on. The filter we had to remove duplicate requests was broken.
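For the curious, the kind of duplicate filter described in [1] might look roughly like the Python sketch below. This is not the actual stats pipeline: the Hit fields, the 30-second window, and the idea of keying on a client identifier are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    when: datetime  # request time, assumed to be UTC
    client: str     # hypothetical client key, e.g. an IP address
    addon_id: int   # add-on being downloaded


def count_downloads(hits: Iterable[Hit],
                    window: timedelta = timedelta(seconds=30)) -> Dict[int, int]:
    """Count downloads per add-on, collapsing bursts of repeated GETs into one."""
    last_seen: Dict[Tuple[str, int], datetime] = {}
    counts: Dict[int, int] = {}
    for hit in sorted(hits, key=lambda h: h.when):
        key = (hit.client, hit.addon_id)
        previous = last_seen.get(key)
        # Remember every hit, so a burst of 2 or 3 requests keeps extending the window.
        last_seen[key] = hit.when
        if previous is not None and hit.when - previous <= window:
            continue  # same client, same add-on, moments later: treat as a duplicate
        counts[hit.addon_id] = counts.get(hit.addon_id, 0) + 1
    return counts
```

The trade-off in a filter like this is the window size: too short and the extra GETs slip through, too long and two genuine installs from behind the same proxy get merged.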