Add-on Statistics Status

Add-on statistics have been intermittent for a couple of months and are only now getting the attention they need.

Our current process is to count download statistics once per day and update ping statistics once per week (update pings are a sample of the complete set). The reliability of the script generating these statistics has been falling as our data size has grown, and we've had several bugs filed about the numbers it has produced. Most of the time these have needed only relatively small fixes, and the script has continued to limp along.

Currently we're facing questionable results in both sets of statistics (bug 468570 for update pings, bug 472538 for download counts). I've been debugging the update pings script, and despite solving some problems we're still seeing it fail to run properly.

In parallel with AMO development, Daniel Einspanjer has been working on a larger statistics parser that will aggregate data from many Mozilla sites into a dashboard with easy visualizations. It turns out he's already processing the AMO logs, and he's pulling out more data than we are, more often and in less time.

With a system like that available, it doesn't make sense for us to continue to develop (and, in this case, heavily modify) our local statistics scripts. With that in mind, our next steps are:

  1. Verify the results we (used to) get with the AMO scripts match those of the new system
  2. Create a transformation script to push the data from Daniel's project to the AMO database
  3. Turn off the AMO scripts
  4. Backfill statistics back to at least November 15th, 2008 to replace our flailing stats. If the comparisons in step 1 reveal miscounting from before that date, we'll backfill as far as we need to.
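
Step 1 amounts to a per-add-on, per-day comparison of the two systems' counts. Here is a minimal sketch of that idea, assuming each system can export its counts as a mapping from (add-on id, date) to a number. The function name, the data shape, and the 1% tolerance are all hypothetical and are not taken from the AMO scripts or Daniel's project:

```python
def find_discrepancies(old_counts, new_counts, tolerance=0.01):
    """Return the keys whose counts differ by more than `tolerance`.

    Both arguments are hypothetical exports: dicts mapping
    (addon_id, date_string) -> count. The difference is measured
    relative to the larger of the two counts.
    """
    mismatches = {}
    for key in sorted(set(old_counts) | set(new_counts)):
        old = old_counts.get(key, 0)  # missing in one system counts as 0
        new = new_counts.get(key, 0)
        baseline = max(old, new)
        if baseline and abs(old - new) / baseline > tolerance:
            mismatches[key] = (old, new)
    return mismatches
```

Anything this reports would need a look by hand before step 3, since a mismatch can mean either script is the one miscounting.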

These steps will let us meet the immediate goal of making the statistics we offer now reliable and complete. In the future we can look at pulling additional data from the new metrics system. The target date to switch to the new system is the end of next week, Jan 31 2009. Once we make the switch we can evaluate how long the parsing takes and estimate how long backfilling will take. As always, let me know if there are any concerns.

Update 2009-02-02: We compared the scripts' results and found a discrepancy among add-ons that have significant external download numbers. The current stats script verifies that the GUID matches and then counts the update. The new stats script verifies both the GUID and the version before counting the update, so if a specific version isn't hosted on AMO the new script doesn't count it. I think the current method of verifying only the GUID is more useful to authors, so the new script is being changed to match. That means we'll have to re-run and re-compare the numbers (a single day is taking about 5 hours now). Other numbers are showing early promise. I'll continue to update as we progress.
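
The difference between the two counting rules can be illustrated with a small sketch. This is not the code of either script. The function name, the (guid, version) ping format, and the `hosted` mapping are assumptions made up for the example:

```python
from collections import Counter


def count_update_pings(pings, hosted, require_version=False):
    """Count update pings per add-on GUID.

    pings:  iterable of (guid, version) pairs (hypothetical format).
    hosted: dict mapping each GUID known to AMO to the set of versions
            hosted there (hypothetical format).
    require_version=False counts any ping whose GUID is known;
    require_version=True also requires the version to be hosted.
    """
    counts = Counter()
    for guid, version in pings:
        if guid not in hosted:
            continue  # unknown add-on: skipped under both rules
        if require_version and version not in hosted[guid]:
            continue  # known add-on, but a version not hosted on AMO
        counts[guid] += 1
    return counts
```

With `require_version=False`, a ping from a build that was never uploaded to AMO still counts toward the add-on's total. With `require_version=True` it is dropped, which is where the discrepancy for externally hosted versions comes from.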


Wild thought! Could there be some distributed approach to aggregating the data, so that volunteers can help? Something like the approach Wikia used (not any more), some scientific projects, etc.
-- Mrinal, 22 Jan 2009
We've got plenty of processing power now. The AMO script was written (like many things) before it had to deal with so much data so it's pretty inefficient. Once the new system is connected we'll be able to grab update pings every day instead of just every week with no troubles.
-- Wil Clouser, 22 Jan 2009
yay! - can't wait!
-- Jay Meattle, 27 Jan 2009
Well, at least the new script will keep the trash out of the stats - for Adblock Plus I see version numbers like "{{VERSION}}", "0.0", "", "true", "3.0.3" etc. Those versions have never been released, that's people who created their own builds. On the other hand, having the usage data for development builds (which are not available on AMO) is useful.
-- Wladimir Palant, 06 Feb 2009
@Wladimir: Daniel is modifying the script so it doesn't filter those out. I felt it was useful for authors to know what versions were out there, official or not.
-- Wil Clouser, 06 Feb 2009
