Thursday, December 21, 2006 12:01 AM
Michael S. Kaplan
It's not right when IsRightToLeft is wrong
Yesterday in the post For the [locale] explorer in you..., I mentioned that there was a bug. Francois is actually the person who saw it, and he came and asked me about it...
The bug can be seen in the picture of the Uighur (PRC) culture (ug-CN):

Can you see it?
The side effect of the bug is the way that the parentheses are screwed up in the Native Name at the top. But the actual bug is in the purple Text section, where it claims that Bidirectional is False for a culture for which it should clearly be True.
As you can tell by looking in the source for Culture Explorer 2.0, Francois is simply using the TextInfo.IsRightToLeft property both to fill in that purple item and to set the TextBox.RightToLeft property of the controls containing native text with lines like:
NameNative.RightToLeft = ci.TextInfo.IsRightToLeft ? RightToLeft.Yes : RightToLeft.No;
So, there is a bug either in the Windows locale data for Uighur, or in the .NET Framework code that synthesizes a Windows Only culture from the Windows data.
My psychic powers suggested to me that the Windows data was correct. Locale data can have mistakes on occasion, but it is more likely that someone reviewed the specific locale data than that a generic synthesis process, which shipped before Vista was widely available, was tested against every possible culture. It could have gone either way; I guess it was just a judgment thing.
To prove the acuity of my psychic powers, I suppose I could just ask you to run the How To [NOT] detect that a locale is bidi or even the How To detect that a locale is bidi code, or I could make you look at the binary FONTSIGNATURE for the locale, which in WCHAR values returned by GetLocaleInfo looks like this:
\x2000\x0000\x0000\x8000\x0008\x0000\x0000\x8800\x0000\x0000\x0000\x0000\x0000\x0000\x0000\x0000
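If you would rather decode that value than eyeball it, here is a minimal C# sketch. It assumes the 16 WCHARs are a LOCALESIGNATURE, whose first eight WCHARs are the 128-bit Unicode subset bitfield (lsUsb) stored as four little-endian DWORDs, low word first, and it tests bit 123, the "layout progress, horizontal from right to left" bit. The class and method names are hypothetical, and it works on the values pasted above rather than calling GetLocaleInfo itself.

```csharp
using System;

// Sketch only: hypothetical names, operating on a pasted LOCALE_FONTSIGNATURE value.
static class LocaleSignatureSketch
{
    // lsUsb occupies the first 8 WCHARs: four DWORDs, each stored low word first.
    public static uint[] UnicodeSubsetBits(ushort[] wchars)
    {
        if (wchars == null || wchars.Length < 8)
            throw new ArgumentException("Need at least 8 WCHARs for lsUsb.", nameof(wchars));

        var usb = new uint[4];
        for (int i = 0; i < 4; i++)
        {
            // Combine the low and high 16-bit halves into one 32-bit DWORD.
            usb[i] = (uint)wchars[2 * i] | ((uint)wchars[2 * i + 1] << 16);
        }
        return usb;
    }

    // Bit n lives in DWORD n / 32, at position n % 32 within it.
    public static bool IsBitSet(uint[] usb, int bit)
    {
        return (usb[bit / 32] & (1u << (bit % 32))) != 0;
    }

    // Bit 123: layout progress, horizontal from right to left.
    public static bool IsHorizontalRightToLeft(ushort[] wchars)
    {
        return IsBitSet(UnicodeSubsetBits(wchars), 123);
    }

    static void Main()
    {
        // The ug-CN value quoted above.
        ushort[] ugCN = {
            0x2000, 0x0000, 0x0000, 0x8000, 0x0008, 0x0000, 0x0000, 0x8800,
            0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
        };
        Console.WriteLine(IsHorizontalRightToLeft(ugCN)); // True
    }
}
```

For the ug-CN data the fourth DWORD works out to 0x88000000, which has bit 27 of that DWORD set (96 + 27 = 123), so the sketch prints True.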
But for those who do not find such a view too comfortable and who want more than just a rerun of blog posts past, let's take the following managed code instead:
using System;
using System.Globalization;

namespace Testing {
    class LdmlDump {
        [STAThread]
        static void Main(string[] args) {
            CultureInfo ci;
            string stCulture;

            // First figure out the name
            if(args.Length > 0) {
                stCulture = args[0];
            } else {
                stCulture = CultureInfo.CurrentCulture.Name;
            }

            // Create the culture and say what it is
            ci = new CultureInfo(stCulture, false);
            Console.WriteLine("\r\nUsing the following culture: '{0}' ({1})\r\n", ci.DisplayName, ci.Name);

            // Create the replacement and fill it
            CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder(stCulture, CultureAndRegionModifiers.Replacement);
            carib.LoadDataFromCultureInfo(ci);
            carib.LoadDataFromRegionInfo(new RegionInfo(stCulture));
            carib.Save(stCulture + ".ldml");
        }
    }
}
Stick it in a file called DumpLdml.cs and compile it with the following from CMD:
csc DumpLdml.cs /r:sysglobl.dll
Now you can run it on any culture on the machine. This code may come in handy in future posts, too. :-)
We'll try both ar-SA and ug-CN, with mn-Mong-CN for luck:
E:\Users\michkap>DumpLdml.exe ar-SA
Using the following culture: 'Arabic (Saudi Arabia)' (ar-SA)
E:\Users\michkap>DumpLdml.exe ug-CN
Using the following culture: 'Uighur (PRC)' (ug-CN)
E:\Users\michkap>DumpLdml.exe mn-Mong-CN
Using the following culture: 'Mongolian (Traditional Mongolian, PRC)' (mn-Mong-CN)
Now looking at the LDML for each, one finds some interesting info. Both ar-SA and ug-CN have the following in them for the font signature:
<msLocale:fontSignature>
  <msLocale:unicodeRanges>
    <msLocale:range type="13" />
    <msLocale:range type="63" />
    <msLocale:range type="67" />
    <msLocale:layoutProgress type="horizontalRightToLeft" />
  </msLocale:unicodeRanges>
while mn-Mong-CN has:
<msLocale:fontSignature>
  <msLocale:unicodeRanges>
    <msLocale:range type="81" />
    <msLocale:layoutProgress type="verticalBeforeHorizontal" />
  </msLocale:unicodeRanges>
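If you would rather not read the LDML by eye either, a small LINQ to XML sketch can pull the layoutProgress values out of a file written by DumpLdml.exe. It matches elements by local name only, because I am not reproducing the msLocale namespace URI here; the class name is hypothetical, and the file name argument just follows the culture-name-plus-.ldml naming the dumping code above uses.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sketch only: hypothetical class name; matches on local names to avoid
// hard-coding the msLocale namespace URI.
static class LdmlLayoutCheck
{
    // Returns the "type" attribute of every layoutProgress element in the document.
    public static string[] LayoutProgressTypes(XDocument ldml)
    {
        return ldml.Descendants()
                   .Where(e => e.Name.LocalName == "layoutProgress")
                   .Select(e => (string)e.Attribute("type"))
                   .Where(t => t != null)
                   .ToArray();
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: LdmlLayoutCheck <file.ldml>   (e.g. ug-CN.ldml)");
            return;
        }

        XDocument ldml = XDocument.Load(args[0]);
        foreach (string type in LayoutProgressTypes(ldml))
            Console.WriteLine(type);
    }
}
```

Based on the excerpts above, you would expect horizontalRightToLeft for both ar-SA and ug-CN, and verticalBeforeHorizontal for mn-Mong-CN.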
The layoutProgress element refers to the bits I talked about previously in How To [NOT] detect that a locale is bidi, namely the following bits in the Unicode subset bitfields:
| Bit | Meaning |
|-----|---------|
| 123 | Windows 2000 or later: Layout progress, horizontal from right to left |
| 124 | Windows 2000 or later: Layout progress, vertical before horizontal |
| 125 | Windows 2000 or later: Layout progress, vertical bottom to top |
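To see where the LDML range list comes from, here is another small sketch (hypothetical names again) that walks all 128 bits of the ug-CN bitfield, reassembled by hand from the WCHAR dump, and labels each set bit using the descriptions from the table. Treating bit 127 as the marker that identifies the bitfield as a signature reflects my reading of the FONTSIGNATURE documentation, not anything in the LDML itself.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: hypothetical names; labels come from the table of layout bits.
static class UsbBitDump
{
    static readonly Dictionary<int, string> LayoutBits = new Dictionary<int, string>
    {
        { 123, "horizontal from right to left" },
        { 124, "vertical before horizontal" },
        { 125, "vertical bottom to top" },
    };

    // Returns every set bit in a 128-bit Unicode subset bitfield, lowest first.
    public static List<int> SetBits(uint[] usb)
    {
        var bits = new List<int>();
        for (int bit = 0; bit < 128; bit++)
        {
            if ((usb[bit / 32] & (1u << (bit % 32))) != 0)
                bits.Add(bit);
        }
        return bits;
    }

    static void Main()
    {
        // lsUsb for ug-CN, built from the WCHAR dump (low word first in each pair).
        uint[] usb = { 0x00002000, 0x80000000, 0x00000008, 0x88000000 };

        foreach (int bit in SetBits(usb))
        {
            if (LayoutBits.TryGetValue(bit, out string layout))
                Console.WriteLine($"bit {bit}: layout progress, {layout}");
            else if (bit == 127)
                Console.WriteLine("bit 127: marks the bitfield as a signature");
            else
                Console.WriteLine($"bit {bit}: Unicode subrange");
        }
    }
}
```

Run as-is, it reports subranges 13, 63 and 67, layout bit 123, and bit 127, which lines up with the range and layoutProgress entries in the ug-CN LDML.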
You can kind of tell where the language in the LDML comes from, huh? :-)
Anyway, ug-CN clearly has these bits set correctly, so the bug has to be in the .NET Framework code that synthesizes the Windows Only culture, which does not use this information. That is perhaps understandable given how obscure the information is, though. Further proof that we need our own LCTYPE containing it in a more easily digested form? :-)
By the way, Francois, I verified that this bug has already been reported in the .NET Framework, so there is no need to file a new one. Though you could bump the number of occurrences if you wanted to. :-)
This post brought to you by ת (U+05ea, a.k.a. HEBREW LETTER TAV)