There is a problem with the notion of both trailing spaces and fixed width in SQL Server, when you are using Windows collations.

I bring this up since I have a meeting this week where people will be talking about what are the differences between what Windows provides in comparisons and what SQL Server needs. And it is easiest when one is going into a meeting to discuss to technical issues if one understand the underlying technical information behind people's requirements.... :-)

In theory (if you are a regular reader here) you might even know what the problem is. Because even though I have never talked about it, I have talked about all of the underlying pieces that can lead one to understanding that there is indeed a problem.

Now the theory behind the trailing space behavior is actually based on a wider behavior in the SQL standard that goes beyond Microsoft SQL Server, and it is mainly there to help defined behavior of comparisons between CHAR and VARCHAR (or between NCHAR and NVARCHAR). Since CHAR/NCHAR is presumed to have a fixed length, in theory any comparison between CHAR//NCHAR columns of the same length are supposed to be "easier" due to that fixed length that they share.

But with Windows-based collations, it is the language-independent text elements and the language-dependent sort elements (both discussed previously in posts like this one and this one) that help to make up the actual elements that are compared. For our purpose let's assume that sort elements contain the set of all text elements and extend them with language specific behavior....

Thus on all collations U+00e5 (å, LATIN SMALL LETTER A WITH RING ABOVE) is equal to U+0061 U+030a (å, a.k.a. LATIN SMALL LETTER A + COMBINING RING ABOVE), even though they are not same length. This is a Unicode equivalence that is should always be true.

And on some but not all collations U+00c6 (Æ, a.k.a. LATIN CAPITAL LETTER AE) is treated as equal to U+0041 U+0045 (AE, a.k.a. LATIN CAPITAL LETTER A + LATIN CAPITAL LETTER E), even though they are not the same length. This is not a Unicode equivalence but one defined in individual collations (in this example most of them).

(for the most part text elements only refer to Unicode columns, but sort elements refer to both Unicode and non-Unicode comparisons....)

And since neither SQL Server nor the SQL Standard defines the notion of either text elements or sort elements, the whole conversation about comparisons between fixed length text columns is meaningless in the context of using Windows collations, which are based on the notion of comparing unique sort elements.

Thus the only thing that the weird and hard to define/explain behavior of trailing spaces and how they affect comparisons in SQL Server provide is a source of confusion and bugs, rather than a defined way for a collation function to perhaps behave.

As weird as the behavior I mentioned here, if not significantly weirder. And probably behind a lot of issues like the one Brian Gonsalves mentioned in this comment.

Does that mean that SQL Server needs a new set of text data types whose maximums involve sort elements?

Such a data type seems to really not be necessary or even useful -- the behavior of VARCHAR and NVARCHAR with the length based on the underlying storage seems more meaningful here, and turning off the whole trailing spaces behavior seems more sensible anyway.

To me at least.

In the end, people are going to have an uphill battle trying to describe the requirements behind anything other than padding strings with spaces themselves prior to calling a function like CompareString. Particularly needing new flags for special/different behavior for "trailing spaces". Since such behavior make no sense whatsoever from a collation perspective anyway.

It should be an interesting conversation, in any case.... :-)

 

This post brought to you by  (U+fb03, a.k.a. LATIN SMALL LIGATURE FFI)