<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>All Night Diner &#187; open web</title>
	<atom:link href="http://micropipes.com/blog/tag/open-web/feed/" rel="self" type="application/rss+xml" />
	<link>http://micropipes.com/blog</link>
	<description>because at 3am anything sounds good</description>
	<lastBuildDate>Mon, 03 May 2010 17:34:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Differentiate Bugzilla emails?</title>
		<link>http://micropipes.com/blog/2009/03/02/differentiate-bugzilla-emails/</link>
		<comments>http://micropipes.com/blog/2009/03/02/differentiate-bugzilla-emails/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 17:29:57 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[open web]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=74</guid>
		<description><![CDATA[Bugzilla is an awesome bug tracker that is used by hundreds of companies.  I&#8217;ve got accounts on several projects&#8217; trackers and I&#8217;m sure many others do also.
When I get mail from Bugzilla it&#8217;s not obvious which project it&#8217;s from.  My email client (GMail) only shows the &#8220;from name&#8221; so all I see for [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.bugzilla.org/">Bugzilla is an awesome bug tracker</a> that is used by <a href="http://www.bugzilla.org/installation-list/">hundreds of companies</a>.  I&#8217;ve got accounts on several projects&#8217; trackers and I&#8217;m sure many others do also.</p>
<p>When I get mail from Bugzilla it&#8217;s not obvious which project it&#8217;s from.  My email client (GMail) only shows the &#8220;from name&#8221; so all I see for these projects is:</p>
<p><a href="https://bugzilla.mozilla.org/">Mozilla</a>: bugzilla-daemon<br />
<a href="http://bugs.locamotion.org/">Pootle</a>: bugzilla-daemon<br />
<a href="http://bugzilla.pculture.org/">Miro</a>: bugzilla<br />
<a href="http://bugzilla.kernel.org/">kernel.org</a>: bugme-daemon<br />
<a href="https://issues.apache.org/bugzilla/">Apache</a>: bugzilla</p>
<p>Wouldn&#8217;t it make sense to differentiate each project&#8217;s emails in the from name?  Maybe even by default (something like &#8220;%SITE_NAME% Bugzilla&#8221;)?</p>
<p>Reed says it&#8217;s a personal problem because his mail client shows the full address.  Am I the only one? <img src='http://micropipes.com/blog/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2009/03/02/differentiate-bugzilla-emails/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Thoughts on branching an open source project</title>
		<link>http://micropipes.com/blog/2008/10/17/thoughts-on-branching-an-open-source-project/</link>
		<comments>http://micropipes.com/blog/2008/10/17/thoughts-on-branching-an-open-source-project/#comments</comments>
		<pubDate>Fri, 17 Oct 2008 23:42:29 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[hindsight]]></category>
		<category><![CDATA[open web]]></category>
		<category><![CDATA[Verbatim]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/?p=59</guid>
		<description><![CDATA[I think any good manager will tell you that looking back over the choices you&#8217;ve made is an important step to improvement.  In an effort to improve myself (and help anyone in a similar situation) I wrote this post with a few thoughts about branching an open source project (in this case branching Pootle [...]]]></description>
			<content:encoded><![CDATA[<p>I think any good manager will tell you that looking back over the choices you&#8217;ve made is an important step to improvement.  In an effort to improve myself (and help anyone in a similar situation) I wrote this post with a few thoughts about branching an open source project (in this case branching <a href="http://translate.sourceforge.net/wiki/pootle/index?redirect=1">Pootle</a> to make <a href="https://wiki.mozilla.org/Verbatim">Verbatim</a>).  My goal is not to criticize anyone&#8217;s past decisions, including mine, but just to review the pros and cons and what I would do differently in the future.  So, a few thoughts late on a Friday:</p>
<p>When I started planning the Pootle branch I was very focused on scalability (or the lack thereof), and most of my initial goals were to improve it: replacing flat files with MySQL, creating a cacheable URL structure, etc.  In hindsight, I should have realized that this project wasn&#8217;t going to get nearly the traffic load some of our other sites were getting, and that my priorities were out of order.  What I should have been thinking about was usability and interface improvements.  Due to my lack of foresight the project launched with enhancements in both areas, but I think the time we spent on scalability was premature and the user interface suffered for it.</p>
<p>Whether it&#8217;s writing more comments than code or making sure meetings have agendas I&#8217;m a huge fan of communication.  When branching a project, particularly when there are plans to merge the branch back into trunk, communication is vital.  I think <a href="https://wiki.mozilla.org/Verbatim:Meeting_Notepad">our meetings</a> are productive but communication on a smaller scale is still a struggle.  Both Pootle and Verbatim ended up writing the same code in a few cases which could have easily been avoided.  In this particular case the timezones can make it difficult to synchronize but it&#8217;s something I&#8217;ll work at more in the future.</p>
<p>Something we have done a good job with is making a schedule and updating it with new developments.  I really want to expand our effort here though.  I think one of the difficulties of someone joining a project like this is direction; what are the goals of the project and how are we getting there?  Several of us have talked about it on IRC and we all have a good general idea but for someone that isn&#8217;t as involved it&#8217;s hard to follow.  Once we get over the next big hump (replacing <a href="http://jtoolkit.sourceforge.net/">jToolkit</a>) I think this will begin to fall into place with smaller bugs/features revealing themselves and providing a way for volunteers to get footholds on the project as a whole.</p>
<p>Lastly, it might be obvious, but if you&#8217;re planning on maintaining the branch or merging back to trunk make sure you get along with the lead developers.  I&#8217;m fortunate to work with the Pootle developers who clearly care deeply about the project.  From talking to them it&#8217;s obvious they have the end users&#8217; best interests in mind and are excited that we can all work together to improve the end product.  And really, that&#8217;s what open source is all about and it&#8217;s great to be a part of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/10/17/thoughts-on-branching-an-open-source-project/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t Settle for Mediocrity on the Web</title>
		<link>http://micropipes.com/blog/2008/02/25/dont-settle-for-mediocrity-on-the-web/</link>
		<comments>http://micropipes.com/blog/2008/02/25/dont-settle-for-mediocrity-on-the-web/#comments</comments>
		<pubDate>Mon, 25 Feb 2008 17:07:49 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[personal]]></category>
		<category><![CDATA[open web]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/02/25/dont-settle-for-mediocrity-on-the-web/</guid>
		<description><![CDATA[When I browse the web looking to purchase a service, I find there are two pretty distinct kinds of sites.  One feels like it was made in the early 90&#8217;s:  it&#8217;s mostly functional, almost renders correctly, and has the odd combination of distracting images and colors we thought was a good idea back [...]]]></description>
			<content:encoded><![CDATA[<p>When I browse the web looking to purchase a service, I find there are two pretty distinct kinds of sites.  One feels like it was made in the early 90&#8217;s:  it&#8217;s mostly functional, almost renders correctly, and has the odd combination of distracting images and colors we thought was a good idea back then.  The second kind of site is like a breath of fresh air:  navigation that flows, easy to read content, effective images, and a severe discrimination against stuff like &lt;marquee&gt;.</p>
<p>Why the disconnect?  Did we suddenly have a budget for planning a website beyond the back of a napkin?  Did graphic designers figure out how to translate their pen and paper skills onto the web?  Did the web slowly evolve into something that could provide a canvas for more than just plain text and drawing boxes with <abbr title="American Standard Code for Information Interchange">ASCII</abbr>?  Well, yeah &#8211; all the above.  But what&#8217;s on my mind right now is the mental attitude &#8211; what I see as the mindset of the 90&#8217;s.</p>
<p>I suspect the first kind of site is purely legacy and only exists because of habit.  It&#8217;s maintained out of habit, it has a budget out of habit, and people visit out of habit.  If the site launched today with zero users it would probably remain that way until it was retired as a failure.  These are not pleasing sites for either the current users or the new visitors.  The only things these sites have going for them are momentum and division.  </p>
<p>Their momentum is driven by recognition &#8211; the sites have been around so long people either know about them or they show up first in search results.  This is a valuable position to be in but it&#8217;s not permanent and without proper maintenance will change.</p>
<p>The other leg of their shaky foundation, how they retain users, is the division.  They separate themselves from the crowd by giving their users  just enough reason not to leave.  Often that reason is that people have already invested so much time, money, and energy putting information into the site that they don&#8217;t want to leave even if another site is substantially better.  Not only does this hurt the user, it hurts the web (and thus, all of us).  When people continue to use mediocre sites it continues to send the message that it&#8217;s OK to not improve and to not add value.</p>
<p>The second style of sites &#8211; the breath of fresh air style &#8211; is the result of a newer way of thinking.  I&#8217;m not just talking about colors that don&#8217;t hurt your eyes.  I&#8217;m talking about a fundamental shift of viewpoints.  Integrate with other sites?  Sure.  Provide useful content in forms that other sites can consume?  You bet.  Publish <abbr title="application programming interface">API</abbr>s that let people do things you haven&#8217;t thought of?  Let users export their data and do what they want with it?  Share your content with a permissive license?  Hell yeah.</p>
<p>This is what we should be demanding as users of the web.  Give us accessible content.  Give us thought-out workflow.  Give us options.  Give us value, and we&#8217;ll choose you.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/02/25/dont-settle-for-mediocrity-on-the-web/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>10000 commits and going strong</title>
		<link>http://micropipes.com/blog/2008/02/06/10000-commits-and-going-strong/</link>
		<comments>http://micropipes.com/blog/2008/02/06/10000-commits-and-going-strong/#comments</comments>
		<pubDate>Wed, 06 Feb 2008 08:34:24 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[open web]]></category>
		<category><![CDATA[SVN]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2008/02/06/10000-commits-and-going-strong/</guid>
		<description><![CDATA[Mozilla&#8217;s SVN repository was started on September 2nd, 2006 and just hit 10000 commits.  That&#8217;s an average of over 19 commits a day for 520 days straight!
After my positive experience with python I was gearing up for a script to do some repository analysis when I ran across MPY SVN Stats.  After a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://svn.mozilla.org/">Mozilla&#8217;s SVN repository</a> was started on September 2nd, 2006 and just hit 10000 commits.  That&#8217;s an average of over 19 commits a day for 520 days straight!</p>
<p>After my <a href="http://micropipes.com/blog/2008/01/02/the-most-worthless-bot-on-irc/">positive experience with python</a> I was gearing up for a script to do some repository analysis when I ran across <a href="http://mpy-svn-stats.berlios.de/">MPY SVN Stats</a>.  After a fast download and a one line command I had charts and tables full of info.  So, here&#8217;s some late night stat work for everyone:</p>
<p>We had a total of 103 people who committed code directly to SVN, 69 of whom had 10 or more commits.  The top 25 committers by total number of commits are:</p>
<blockquote><pre>
No    Author                                Commits    Percentage
1  	wclouser#mozilla.com                1558       15.58%
2 	fligtar#gmail.com                   739        7.39%
3 	steven#silverorange.com             707        7.07%
4 	reed#reedloden.com                  639        6.39%
5 	pascal.chevrel#mozilla-europe.org   567        5.67%
6 	fwenzel#mozilla.com                 551        5.51%
7 	nelson#wordmaster.org               524        5.24%
8 	mkaply#us.ibm.com                   481        4.81%
9 	paul#glaxstar.com                   392        3.92%
10 	mgalli#mgalli.com                   364        3.64%
11 	morgamic#mozilla.com                311        3.11%
12 	dougt#meer.net                      307        3.07%
13 	michael.koch#enough.de              179        1.79%
14 	ahajdukewycz#mozilla.com            173        1.73%
15 	shaver#mozilla.com                  162        1.62%
16 	reed#mozilla.com                    158        1.58%
17 	abuchanan#mozilla.com               112        1.12%
18 	dougt#mozilla.com                   105        1.05%
19 	eshepherd#mozilla.com               101        1.01%
20 	erik#raincitystudios.com            97         0.97%
21 	mfinkle#mozilla.com                 89         0.89%
22 	robert#accettura.com                89         0.89%
23 	smalolepszy#aviary.pl               82         0.82%
24 	oremj#mozilla.com                   76         0.76%
25 	tim.babych#gmail.com                58         0.58%
</pre>
</blockquote>
<p>My numbers here (wclouser#mozilla.com) are inflated because I did a lot of the branching/tagging on our projects.  If you throw my number out as an anomaly you&#8217;ll see that no single person has made more than 7.5% of the commits in SVN.  That&#8217;s a great community hard at work!</p>
<p>Thanks for all your help!</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2008/02/06/10000-commits-and-going-strong/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Mozilla24 is coming up!</title>
		<link>http://micropipes.com/blog/2007/09/10/mozilla24-is-coming-up/</link>
		<comments>http://micropipes.com/blog/2007/09/10/mozilla24-is-coming-up/#comments</comments>
		<pubDate>Mon, 10 Sep 2007 21:13:45 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[open web]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2007/09/10/mozilla24-is-coming-up/</guid>
		<description><![CDATA[Mozilla24 is a worldwide conference about technology and the future of the web.  I won&#8217;t duplicate the about page, but check out the line up of speakers.  
Whether you can attend in person or just visit online it should be able to offer something for anyone interested in the open web.  If [...]]]></description>
			<content:encoded><![CDATA[<p>Mozilla24 is a worldwide conference about technology and the future of the web.  I won&#8217;t duplicate the about page, but check out the <a href="http://www.mozilla24.com">line up of speakers</a>.  </p>
<p>Whether you can attend in person or just visit online it should be able to offer something for anyone interested in the open web.  If you&#8217;ve got some free time at the end of this week, <a href="http://www.mozilla24.com/en-US/get_involved/">get involved</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2007/09/10/mozilla24-is-coming-up/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ten Tips for Website Localization</title>
		<link>http://micropipes.com/blog/2007/07/26/ten-tips-for-website-localization/</link>
		<comments>http://micropipes.com/blog/2007/07/26/ten-tips-for-website-localization/#comments</comments>
		<pubDate>Thu, 26 Jul 2007 16:44:33 +0000</pubDate>
		<dc:creator>Wil Clouser</dc:creator>
				<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[L10n]]></category>
		<category><![CDATA[open web]]></category>

		<guid isPermaLink="false">http://micropipes.com/blog/2007/07/26/ten-tips-for-website-localization/</guid>
		<description><![CDATA[This post has some general tips that I&#8217;d recommend to anyone wanting to write a multilingual web application.  The majority of my code these days is PHP, but I think these tips are applicable to most web programming languages.  In no particular order:
UTF-8 is your friend.  Use it.
The big step from ASCII [...]]]></description>
			<content:encoded><![CDATA[<p>This post has some general tips that I&#8217;d recommend to anyone wanting to write a multilingual web application.  The majority of my code these days is <abbr title="PHP: Hypertext Preprocessor">PHP</abbr>, but I think these tips are applicable to most web programming languages.  In no particular order:</p>
<h3><abbr title="Unicode Transformation Format">UTF</abbr>-8 is your friend.  Use it.</h3>
<p>The big step from <abbr title="American Standard Code for Information Interchange">ASCII</abbr> to Unicode was the potential to use multiple bytes to represent a single character.  ASCII itself defines only 128 characters (0 to 127); the 8-bit encodings built on top of it gave each character a number between 0 and 255, and that&#8217;s all the programmer could use.  If another character needed to be shown, the numbers were reused and a different font was loaded.  If people didn&#8217;t have the same fonts, they got errors or undefined results.</p>
<p>Enter Unicode and UTF-8.  Unicode assigns characters from all over the world numbers (code points) between 0 and 1,114,111, and UTF-8 encodes each of them in one to four bytes.  This is fantastic news, because you can store text in many different languages without having to worry about specific encodings.  This also means you can support language fall back for sections of your web page.  If you have only 80% of your page translated for a specific language, you can fall back to an alternative language for the rest, all while using the same encoding.  <em>(An aside: be sure to use lang="" attributes in your HTML tags if you&#8217;re changing the language mid-stream.)</em></p>
<h3>Don&#8217;t concatenate strings</h3>
<p>Code like this makes me sad, and will make your localizers cry (or quit):</p>
<pre class="php"><code>
$item = "toast";

// Example one - This is bad
echo _("Sometimes I eat")." {$item} "._("and sometimes I don't.");

// Example two - This is better
echo sprintf(_("Sometimes I eat %s and sometimes I don't."), $item);
</code></pre>
<p>In the first example, chances are good the localizer will get the list of strings to translate and the two separate calls to _() will look like two different sentences with no context around them.  Firstly, the phrases by themselves make no sense, and secondly, a localizer needs to be able to look at an entire sentence (and sometimes more) to understand how to translate it most effectively.  </p>
<p>The second example uses the printf() standard %s to let the localizer know you&#8217;ll be substituting a string into the middle of the sentence.  This is the current best practice for creating sentences with variables.  Depending on what the string is, they may still be upset, but that&#8217;s out of scope for this tip (<a href="http://en.wikipedia.org/wiki/Declension">here&#8217;s a hint though</a>).</p>
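<p>When a sentence has more than one variable, word order becomes the next problem: a translation may need the substitutions in a different order than English does.  A minimal sketch of one way to handle that with printf()&#8217;s numbered placeholders.  The two format strings stand in for what _() would return, and the &#8220;translated&#8221; one is made up for illustration:</p>
<pre class="php"><code>
$item = "toast";
$meal = "breakfast";

// %1$s and %2$s refer to arguments by position, not by order of appearance.
// Single quotes keep PHP from treating $s as a variable.
$english   = 'I eat %1$s for %2$s.';
// Hypothetical translation that needs the arguments in the opposite order
$reordered = 'For %2$s, I eat %1$s.';

echo sprintf($english, $item, $meal), "\n";   // I eat toast for breakfast.
echo sprintf($reordered, $item, $meal), "\n"; // For breakfast, I eat toast.
</code></pre>
<p>The calling code stays the same for every locale; only the string in the .po file changes.</p>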
<h3>Don&#8217;t use machine translation</h3>
<p>In recent years great progress has been made towards programmatically translating documents from language to language.  That said, it is <em>far</em> from being an acceptable replacement for a fluent translator.  The edge cases and &#8220;what ifs&#8221; on the technical/logical side of the translation are enough for me to say that, but once you start talking about potentially offensive translations (that&#8217;s the next tip) a human translator is a definite requirement.  Just look at an example of <a href="http://translate.google.com/translate?u=http%3A%2F%2Fwww.germnews.de%2F&#038;langpair=de%7Cen&#038;hl=EN&#038;safe=off&#038;ie=UTF-8&#038;oe=UTF-8&#038;prev=%2Flanguage_tools">an automated German to English translation</a>.  It&#8217;s readable but it&#8217;s far from polished &#8211; not something you want as a first impression of your site.</p>
<h3>Be culturally sensitive</h3>
<p>If you&#8217;re not very familiar with your target culture ask for an opinion from someone who is (or hire a localizer who is).  Seemingly innocent words, phrases, and images could be misunderstood by another culture.  If you use terminology that is only understood in your region or culture, the best case you can hope for is that a visitor to your site just won&#8217;t understand and will ignore it, but it really reduces your credibility and the overall enjoyment of visiting your site.</p>
<h3>Use multi-byte functions</h3>
<p>This may be a little PHP specific, but it&#8217;s good to be aware of it in any language.  PHP has <a href="http://php.oregonstate.edu/manual/en/ref.strings.php">string functions</a> and <a href="http://php.oregonstate.edu/manual/en/ref.mbstring.php">multibyte string functions</a>.  The former work on bytes; the latter support characters that take up more than one byte (e.g. any non-ASCII character in UTF-8).  This is essential when manipulating strings with letters outside of the basic Latin alphabet.  If you&#8217;re not using PHP, at the least, verify that your programming language manipulates multi-byte strings correctly.</p>
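<p>A quick sketch of the difference, assuming the mbstring extension is loaded and the source file is saved as UTF-8 (the sample word is just an illustration):</p>
<pre class="php"><code>
$word = 'ホーム'; // 3 characters, each 3 bytes long in UTF-8

echo strlen($word), "\n";                   // 9, strlen() counts bytes
echo mb_strlen($word, 'UTF-8'), "\n";       // 3, mb_strlen() counts characters

// The byte-based substr() cuts the first character apart...
$broken = substr($word, 0, 2);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// ...while mb_substr() works on whole characters.
echo mb_substr($word, 0, 2, 'UTF-8'), "\n"; // ホー
</code></pre>
<p>Passing the encoding explicitly avoids depending on whatever the server&#8217;s internal encoding happens to be.</p>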
<h3>Separate your views from your logic</h3>
<p>I&#8217;m a fan of <abbr title="Model View Controller">MVC</abbr> separation, but there are plenty of <a href="http://en.wikipedia.org/wiki/Architectural_pattern_%28computer_science%29">other architectural patterns</a>.  Depending on what process and software you use for localization you may be giving template files to localizers.  If that&#8217;s the case, the simpler the better &#8211; you don&#8217;t need a bunch of complex code around the strings waiting to be translated.  Even if you&#8217;re using a method that doesn&#8217;t require giving template files to localizers, updating strings is easier, and whoever does maintenance on your software in the future will thank you.</p>
<h3>Use (meaningful) placeholder text</h3>
<p>This one might be a little controversial and is <a href="http://www.gnu.org/software/gettext/">gettext</a> specific.  The documented and recommended way to use gettext is to pass an English string to the gettext() function.  This serves two purposes:  It lets the localizer see the complete English string when they are translating, and it lets gettext fall back to English if a translation isn&#8217;t available.</p>
<p>I&#8217;m suggesting using a substitute string in place of the English string.  For example, instead of _("Error: Your cart is full!") I would use _("error_cart_full").  English translations are done in the .po file, just like every other locale.  By following this rule, it&#8217;s possible to change the English text without affecting the other translations.  With the documented method, even adding a comma means changing every locale&#8217;s .po file and then recompiling them all.  If you&#8217;ve got localizers watching for changes on their files (through a shared repository) this means they have to check and verify any changes &#8211; it&#8217;s a hassle and it&#8217;s time consuming for everyone involved.</p>
<p>The first purpose I mentioned, seeing English strings, can be duplicated by running `<em>msgattrib --set-fuzzy $file1 | msgmerge -NUs $file2</em>` where $file1 is the updated en-US .po file, and $file2 is the outdated .po file from another locale.  This will merge the English strings into the other locale, but will mark them as fuzzy, so gettext will ignore them until they are translated.</p>
<p>The second purpose can be addressed just by making sure the strings you&#8217;re trying to use are available.  If you need to use a new English string on the site, and the localizer is unavailable, you can temporarily move the fall back logic into your code:</p>
<pre><code>
// This is a temporary fix!
if (_("string_to_translate") == "string_to_translate") {
  // Print the English string
} else {
  // Print the translated string
}
</code></pre>
<p>While we&#8217;re on the subject of .po files, useful comments should be added to the file wherever appropriate to help provide context and hints for localizers.</p>
<h3>Be aware of word length</h3>
<p>Words in different languages have different lengths &#8211; words in Asian languages generally have fewer characters than their English equivalents, and German words generally have more.  When designing the layout for your site, bear this in mind.  Don&#8217;t hard-code widths on elements holding text &#8211; the words should be able to flow and expand as necessary.  This can be tough with today&#8217;s complex sites, but <abbr title="Cascading Style Sheets">CSS</abbr> will go a long way to help.  Also, when accepting user input, don&#8217;t put unneeded arbitrary length restrictions on the input.</p>
<h3>Don&#8217;t use graphics as text</h3>
<p>This is just a good idea in general, but it makes even more sense when localizing pages.  Creating images is time consuming and has more potential for error.  Using an appropriate encoding and employing CSS should get close to the same effect (with an extra point for accessibility).  If you need to use an image, be prepared to accept localized strings and make the image yourself &#8211; localizers may not have the time, skills, or software they need to create the images.</p>
<h3>Be aware of how changing the locale can affect strings</h3>
<p>Setting the LC_ALL variable doesn&#8217;t just change the formatting of strings &#8211; it also changes currency formatting, time/date formatting, how things are sorted, and what symbols represent numbers/letters/etc.  Some examples:</p>
<pre><code>
setlocale(LC_ALL, 'fr_FR');
$num = 1.5;
var_dump($num); // Prints 1.5
echo $num; // Prints 1,5
</code></pre>
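<p>When a number like this has to leave PHP in a machine-readable form, one way to sidestep the locale is to format it explicitly.  A minimal sketch: the variable names are made up for illustration, and it relies on sprintf()&#8217;s %F conversion (documented as non-locale aware) and on number_format() with an explicit decimal separator:</p>
<pre class="php"><code>
setlocale(LC_ALL, 'fr_FR'); // returns false if the locale is not installed
$num = 1.5;

// %F always uses a period as the decimal separator, whatever the locale
$forQuery = sprintf('%.2F', $num);

// number_format() with an explicit separator and no thousands separator
$forScript = number_format($num, 2, '.', '');

echo $forQuery, "\n";  // 1.50
echo $forScript, "\n"; // 1.50
</code></pre>
<p>As far as I know, PHP 8.0 and later make the plain float-to-string conversion locale-independent as well, so this mainly matters on older versions, but spelling out the format still documents the intent.</p>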
<p>Internally, the decimal is represented by a period, and all the PHP functions will recognize that (e.g. <em>/[0-9.]+/</em> matches, whereas <em>/[0-9,]+/</em> does not).  However, if you need to print the number to pass it to another library or page (into a MySQL query, passing to JavaScript, etc.) the decimal point is going to become a comma.  Another example:</p>
<pre class="php"><code>
preg_match('/\w/', 'ホーム'); // Will never match, regardless of LC_ALL
</code></pre>
<p>Using regular expressions on UTF-8 data can be risky.  With the <a href="http://php.oregonstate.edu/manual/en/ref.pcre.php">preg functions</a>, the \w escape and the [[:alpha:]] character class only ever match single-byte values (i.e. characters with values up to 255).  The <a href="http://www.pcre.org/pcre.txt"><abbr title="Perl-Compatible Regular Expressions">PCRE</abbr> Documentation</a> says:</p>
<blockquote><p>
 &#8220;This remains true even when PCRE includes Unicode property support, because to do otherwise would slow down PCRE in many common cases. If you really want to test for a wider sense of, say, &#8220;digit&#8221;, you must use Unicode property tests such as \p{Nd}.&#8221;
</p></blockquote>
<p>If we need to match UTF-8 strings with regular expressions in PHP, we can use:</p>
<pre class="php"><code>
mb_regex_encoding('UTF-8');
mb_ereg('\w+', 'ホーム', $match);
print_r($match); // Prints: Array ( [0] => ホーム  )
</code></pre>
<p>By setting the internal regular expression encoding to UTF-8, and using the <a href="http://us2.php.net/manual/en/function.mb-ereg.php">mb_ereg()</a> function, we can match multibyte characters with regular expressions.  Be aware, though, that this comes with the performance cost the PCRE documentation mentioned.</p>
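<p>The Unicode property tests from the PCRE quote are another option that stays with the preg functions.  A minimal sketch, assuming your PCRE build has UTF-8 and Unicode property support: the /u modifier tells PCRE to treat the pattern and subject as UTF-8, and \p{L} matches any Unicode letter.</p>
<pre class="php"><code>
// /u switches the pattern and subject to UTF-8 mode;
// \p{L} is the Unicode "letter" property.
$found = preg_match('/\p{L}+/u', 'ホーム', $match);

var_dump($found);       // int(1)
echo $match[0], "\n";   // ホーム
</code></pre>
<p>Without the /u modifier the same subject is scanned byte by byte, which is the situation the earlier preg_match() example runs into.</p>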
]]></content:encoded>
			<wfw:commentRss>http://micropipes.com/blog/2007/07/26/ten-tips-for-website-localization/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>
