Normalize. The Normalize method changes Unicode character sequences. A string's buffer is stored as Unicode (UTF-16) data. Normalize affects whether a character such as an accented letter is stored as one combined character, or as a base letter followed by a separate combining mark.
We explore how the representations of string data change. This method is not needed in many C# programs, but when it is needed, it is important.
This program introduces a string with an accent on the lowercase a. We call Normalize with no parameters, and then Normalize with the parameters NormalizationForm.FormD, FormKC, and FormKD.
Then We print, with Console.WriteLine, the resulting strings to the screen as we go along.
Here The first call to Normalize uses the NormalizationForm.FormC enum in its implementation. This detail can be seen in IL Disassembler.
Info The results have 2 forms: the "a" with the accent on top as a single character, and an ASCII "a" followed by a separate combining accent character, which some consoles display as a single-quote after the letter.
Finally In FormD and FormKD, the combining accent follows the unaccented letter.
using System;
using System.Text;

const string input = "á";

// No argument: same as NormalizationForm.FormC.
string val2 = input.Normalize();
Console.WriteLine(val2);

// Canonical decomposition.
string val3 = input.Normalize(NormalizationForm.FormD);
Console.WriteLine(val3);

// Compatibility composition.
string val4 = input.Normalize(NormalizationForm.FormKC);
Console.WriteLine(val4);

// Compatibility decomposition.
string val5 = input.Normalize(NormalizationForm.FormKD);
Console.WriteLine(val5);
á
a '
á
a '
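Because the console output can look the same for different forms, it may help to print the underlying values instead. The following is a minimal sketch, assuming a runtime with normal globalization support. The CodeUnits helper and the "ﬁ" ligature input are illustrations added here and are not part of the program above.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper (not part of .NET): formats each UTF-16 code unit as U+XXXX.
static string CodeUnits(string s) =>
    string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

string accented = "\u00E1"; // precomposed "á"
string ligature = "\uFB01"; // "ﬁ" ligature, used here only as an illustration

// Canonical forms: composed versus decomposed.
string accentedC = CodeUnits(accented.Normalize(NormalizationForm.FormC));
string accentedD = CodeUnits(accented.Normalize(NormalizationForm.FormD));

// Compatibility forms additionally replace characters such as ligatures.
string ligatureC = CodeUnits(ligature.Normalize(NormalizationForm.FormC));
string ligatureKC = CodeUnits(ligature.Normalize(NormalizationForm.FormKC));

Console.WriteLine(accentedC);  // U+00E1
Console.WriteLine(accentedD);  // U+0061 U+0301
Console.WriteLine(ligatureC);  // U+FB01
Console.WriteLine(ligatureKC); // U+0066 U+0069
```

For the accented "a", the C and KC forms give the same result, as do the D and KD forms. The K forms only differ for characters that have a compatibility mapping, such as the ligature.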
IsNormalized. In Unicode strings, there are different normalization forms. With the IsNormalized method you can test for normalized character data.
Detail We declare a string that has an accent in it. With Normalize and IsNormalized, only non-ASCII characters are affected.
And With no argument, IsNormalized returns true if the string is normalized to FormC. It returns false for a string that was normalized to FormD and contains a decomposed character.
Note You can also pass an argument to IsNormalized. In this case, that specific normalization form is checked.
using System;
using System.Text;

const string input = "á";
string val2 = input.Normalize();
string val3 = input.Normalize(NormalizationForm.FormD);

// With no argument, these check against FormC.
Console.WriteLine(input.IsNormalized());
Console.WriteLine(val2.IsNormalized());
Console.WriteLine(val3.IsNormalized());

// Check against a specific form.
Console.WriteLine(val3.IsNormalized(NormalizationForm.FormD));
True
True
False
True
A discussion. Mainly, the Normalize method is useful for interoperability purposes. If you have to exchange text with another program that expects a particular Unicode normalization form, it is important to call Normalize.
Tip There is no reason to call Normalize if you are just using ASCII, or if you are not exchanging text with a system that expects a particular normalization form.
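One place where the form matters is comparing text that came from different sources. The following is a small sketch, not taken from the programs above. It assumes one side supplies the precomposed character and the other supplies the decomposed sequence.

```csharp
using System;
using System.Text;

string composed = "\u00E1";    // "á" as one character
string decomposed = "a\u0301"; // "a" followed by a combining acute accent

// Ordinal comparison looks at the raw UTF-16 values, so these differ.
bool equalBefore = string.Equals(composed, decomposed, StringComparison.Ordinal);

// After normalizing both sides to the same form, they match.
bool equalAfter = string.Equals(
    composed.Normalize(NormalizationForm.FormC),
    decomposed.Normalize(NormalizationForm.FormC),
    StringComparison.Ordinal);

Console.WriteLine(equalBefore); // False
Console.WriteLine(equalAfter);  // True
```

Any of the four forms would work for this comparison, as long as both strings are normalized to the same one.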
Discussion, continued. IsNormalized addresses the need to determine the normalization status of a string. Normalization is necessary when interoperating with other systems.
Detail If your program does not exchange text with such systems, you can ignore IsNormalized and just leave strings as they are.
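When a string does need to reach another system in a known form, the two methods can be combined. The following is one possible sketch. EnsureNormalized is a hypothetical helper name, and defaulting to FormC is an assumption made here to match the parameterless Normalize call.

```csharp
using System;
using System.Text;

// Hypothetical helper: returns the input unchanged when it is already in the requested form.
static string EnsureNormalized(string s, NormalizationForm form = NormalizationForm.FormC) =>
    s.IsNormalized(form) ? s : s.Normalize(form);

string ascii = "abc";
string decomposed = "a\u0301"; // "a" followed by a combining acute accent

string r1 = EnsureNormalized(ascii);
string r2 = EnsureNormalized(decomposed);

Console.WriteLine(object.ReferenceEquals(r1, ascii)); // True
Console.WriteLine(r2.IsNormalized());                 // True
Console.WriteLine(r2.Length);                         // 1
```

The ASCII string is returned as the same object because it is already normalized. The decomposed string is replaced by a one-character composed string.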
A summary. Normalize() provides interoperation with other systems. It is not a commonly needed string method. But it reveals an important detail of the string implementation.
