Here is how you can output all of the characters of a single-byte character set, printable or not. The output file will contain Japanese characters such as チホヤツセ.
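```csharp
using System.IO;
using System.Text;

// On .NET Core / .NET 5+, register the code-page encodings first:
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding enc = Encoding.GetEncoding("shift_jis");
byte[] bytes = new byte[1];
using (StreamWriter sw = new StreamWriter(@"C:\shift_jis.txt"))
{
    // Decode every single byte value 0..255 and write each result on its own line.
    for (int i = 0; i < 256; i++)
    {
        bytes[0] = (byte)i;
        sw.WriteLine(enc.GetString(bytes));
    }
}
```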
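Note that Shift_JIS is not strictly a single-byte character set. Bytes 0x00 to 0x7F (ASCII) and 0xA1 to 0xDF (half-width katakana) stand alone, which is where the katakana in that output come from. Bytes 0x81 to 0x9F and 0xE0 to 0xFC start a two-byte character. The following is a small check, assuming .NET 5 or later, where `CodePagesEncodingProvider` ships with the runtime:

```csharp
using System;
using System.Text;

// Required on .NET Core and .NET 5+ to make code-page encodings such as shift_jis available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding sjis = Encoding.GetEncoding("shift_jis");

// Stand-alone bytes: ASCII and half-width katakana.
string ascii = sjis.GetString(new byte[] { 0x41 });       // "A"
string katakana = sjis.GetString(new byte[] { 0xB1 });    // "ｱ" (U+FF71)
// A lead byte (0x88) followed by a trail byte (0x9F) forms a single character.
string kanji = sjis.GetString(new byte[] { 0x88, 0x9F }); // "亜" (U+4E9C)

Console.WriteLine($"{ascii} {katakana} {kanji}");
Console.WriteLine(sjis.GetByteCount("A"));      // 1
Console.WriteLine(sjis.GetByteCount("\u4E9C")); // 2
```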
Here is my attempt to do this with a double byte character set.
```csharp
using System.IO;
using System.Text;

// On .NET Core / .NET 5+, register the code-page encodings first:
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding enc = Encoding.GetEncoding("iso-2022-jp");
byte[] bytes = new byte[2];
using (StreamWriter sw = new StreamWriter(@"C:\iso-2022-jp.txt"))
{
    // Try every two-byte combination: i is the first byte, j the second.
    for (int i = 0; i < 256; i++)
    {
        bytes[0] = (byte)i;
        for (int j = 0; j < 256; j++)
        {
            bytes[1] = (byte)j;
            sw.WriteLine(enc.GetString(bytes));
        }
    }
}
```
The problem is that the output file still contains only the characters for the single byte values 0 to 255. Each byte is decoded separately and yields its own character, so every output string contains two characters rather than one. Since characters in this character set are represented by two bytes, surely you have to specify them with two bytes, right?
So how do you iterate through and print all characters from a double byte character set?
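The loop above never produces double-byte characters because iso-2022-jp is a stateful 7-bit encoding. The decoder starts in ASCII mode and only treats byte pairs as JIS X 0208 characters after the escape sequence ESC $ B (0x1B 0x24 0x42). ESC ( B (0x1B 0x28 0x42) switches it back to ASCII. Raw byte pairs never contain that escape, so the decoder stays in ASCII mode and decodes each byte on its own. The following is a short check, assuming .NET 5 or later with the code-pages provider:

```csharp
using System;
using System.Text;

// Required on .NET Core and .NET 5+ to make code-page encodings such as iso-2022-jp available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding jis = Encoding.GetEncoding("iso-2022-jp");

// Without an escape sequence the decoder stays in ASCII mode: 0x30 0x21 is just "0!".
string plain = jis.GetString(new byte[] { 0x30, 0x21 });

// ESC $ B switches to JIS X 0208, where 0x30 0x21 is one kanji; ESC ( B switches back.
byte[] withEscape = { 0x1B, 0x24, 0x42, 0x30, 0x21, 0x1B, 0x28, 0x42 };
string kanji = jis.GetString(withEscape);

Console.WriteLine(plain); // 0!
Console.WriteLine(kanji); // 亜 (U+4E9C)
```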
If it is OK to have them in Unicode order, you could:
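The following is a minimal sketch of the Unicode-order approach. It is not necessarily the answer's original code. It walks every BMP code unit and encodes each one with an empty replacement fallback, so that characters the encoding cannot represent produce no bytes. It then keeps only the characters that decode back unchanged. The helper name `EncodableBmpChars` and the temp-file output path are illustrative choices, and the sketch assumes .NET 5 or later (top-level statements, built-in `CodePagesEncodingProvider`).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Required on .NET Core and .NET 5+ to make code-page encodings such as iso-2022-jp available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Illustrative helper (not a framework API): every BMP character the encoding can
// represent, in Unicode order, paired with its encoded bytes.
static List<(char Ch, byte[] Bytes)> EncodableBmpChars(string encodingName)
{
    // Empty replacement fallbacks: unencodable characters yield zero bytes
    // instead of '?' or a best-fit substitute.
    Encoding enc = Encoding.GetEncoding(encodingName,
        new EncoderReplacementFallback(""), new DecoderReplacementFallback(""));
    var result = new List<(char Ch, byte[] Bytes)>();
    for (int cp = 0; cp <= 0xFFFF; cp++)
    {
        char c = (char)cp;
        if (char.IsSurrogate(c)) continue; // a lone surrogate is not a character
        string s = c.ToString();
        byte[] encoded = enc.GetBytes(s);
        // Keep the character only if it encodes to something and round-trips unchanged.
        if (encoded.Length > 0 && enc.GetString(encoded) == s)
            result.Add((c, encoded));
    }
    return result;
}

var chars = EncodableBmpChars("iso-2022-jp");
string path = Path.Combine(Path.GetTempPath(), "iso-2022-jp.txt");
using (var sw = new StreamWriter(path))
{
    foreach (var (ch, _) in chars)
        sw.WriteLine(ch);
}
Console.WriteLine($"{chars.Count} characters written to {path}");
```

The round-trip check matters because some encodings map several Unicode characters onto the same bytes. Only characters that survive encode-then-decode are really "in" the character set.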
If you want to order it by byte sequence, you could:
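The following is a minimal sketch of byte-sequence ordering, again not necessarily the answer's original code. It collects every BMP character the encoding round-trips together with its bytes, then sorts by comparing the byte arrays lexicographically. With iso-2022-jp, each double-byte character is typically encoded together with its surrounding ESC $ B and ESC ( B escapes, so those characters sort as one group ordered by their JIS code. The helper names `EncodableBmpChars` and `CompareBytes` are illustrative, and the sketch assumes .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Required on .NET Core and .NET 5+ to make code-page encodings such as iso-2022-jp available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Illustrative helper: every BMP character the encoding round-trips, with its bytes.
static List<(char Ch, byte[] Bytes)> EncodableBmpChars(string encodingName)
{
    Encoding enc = Encoding.GetEncoding(encodingName,
        new EncoderReplacementFallback(""), new DecoderReplacementFallback(""));
    var result = new List<(char Ch, byte[] Bytes)>();
    for (int cp = 0; cp <= 0xFFFF; cp++)
    {
        char c = (char)cp;
        if (char.IsSurrogate(c)) continue;
        string s = c.ToString();
        byte[] encoded = enc.GetBytes(s);
        if (encoded.Length > 0 && enc.GetString(encoded) == s)
            result.Add((c, encoded));
    }
    return result;
}

// Illustrative lexicographic byte comparison; a sequence that is a prefix of another sorts first.
static int CompareBytes(byte[] a, byte[] b)
{
    int n = Math.Min(a.Length, b.Length);
    for (int i = 0; i < n; i++)
        if (a[i] != b[i]) return a[i].CompareTo(b[i]);
    return a.Length.CompareTo(b.Length);
}

var chars = EncodableBmpChars("iso-2022-jp");
chars.Sort((x, y) => CompareBytes(x.Bytes, y.Bytes));
string path = Path.Combine(Path.GetTempPath(), "iso-2022-jp-by-bytes.txt");
using (var sw = new StreamWriter(path))
{
    // Write the hex bytes next to each character, e.g. "1B-24-42-30-21-1B-28-42<TAB>亜".
    foreach (var (ch, encoded) in chars)
        sw.WriteLine($"{BitConverter.ToString(encoded)}\t{ch}");
}
Console.WriteLine($"{chars.Count} characters written to {path}");
```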
Or, with a different sort order (length first):
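A length-first order needs only a different comparison. Shorter byte sequences come first, and sequences of equal length fall back to lexicographic byte order. The following is a minimal self-contained sketch, not necessarily the answer's original code. The helper names `EncodableBmpChars` and `CompareLengthFirst` are illustrative, and it assumes .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Required on .NET Core and .NET 5+ to make code-page encodings such as iso-2022-jp available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Illustrative helper: every BMP character the encoding round-trips, with its bytes.
static List<(char Ch, byte[] Bytes)> EncodableBmpChars(string encodingName)
{
    Encoding enc = Encoding.GetEncoding(encodingName,
        new EncoderReplacementFallback(""), new DecoderReplacementFallback(""));
    var result = new List<(char Ch, byte[] Bytes)>();
    for (int cp = 0; cp <= 0xFFFF; cp++)
    {
        char c = (char)cp;
        if (char.IsSurrogate(c)) continue;
        string s = c.ToString();
        byte[] encoded = enc.GetBytes(s);
        if (encoded.Length > 0 && enc.GetString(encoded) == s)
            result.Add((c, encoded));
    }
    return result;
}

// Illustrative comparison: byte length first, then lexicographic byte order.
static int CompareLengthFirst(byte[] a, byte[] b)
{
    if (a.Length != b.Length) return a.Length.CompareTo(b.Length); // shorter sequences first
    for (int i = 0; i < a.Length; i++)
        if (a[i] != b[i]) return a[i].CompareTo(b[i]);
    return 0;
}

var chars = EncodableBmpChars("iso-2022-jp");
chars.Sort((x, y) => CompareLengthFirst(x.Bytes, y.Bytes));
string path = Path.Combine(Path.GetTempPath(), "iso-2022-jp-by-length.txt");
using (var sw = new StreamWriter(path))
{
    foreach (var (ch, encoded) in chars)
        sw.WriteLine($"{BitConverter.ToString(encoded)}\t{ch}");
}
Console.WriteLine($"{chars.Count} characters written to {path}");
```

With shift_jis this puts all single-byte characters (ASCII and half-width katakana) ahead of every two-byte character. Plain byte order, by contrast, interleaves them by lead byte.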
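Note that in all the examples I'm only generating the characters of the Basic Multilingual Plane (BMP). I don't think characters outside the BMP are included in iso-2022-jp or the other legacy Japanese code pages, although Unicode encodings such as UTF-8 (and GB18030) do cover them. If necessary, I can modify the code to support them.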
Just out of curiosity, here is the first version of the code with handling for non-BMP characters (which aren't present in iso-2022-jp):
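The following is a minimal sketch of extending the Unicode-order approach beyond the BMP. It is not the answer's original code. It iterates over code points instead of `char` values and builds each string with `char.ConvertFromUtf32`, which returns a surrogate pair above U+FFFF. It skips the surrogate range 0xD800 to 0xDFFF, which that method rejects. The helper name `EncodableCodePoints` and its range parameters are illustrative, and the sketch assumes .NET 5 or later. For iso-2022-jp it should find nothing above U+FFFF, while a Unicode encoding such as UTF-8 does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Required on .NET Core and .NET 5+ to make code-page encodings such as iso-2022-jp available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Illustrative helper (not a framework API): every code point in [first, last] that the
// encoding round-trips, including supplementary-plane characters.
static List<(int CodePoint, string Text, byte[] Bytes)> EncodableCodePoints(
    string encodingName, int first, int last)
{
    Encoding enc = Encoding.GetEncoding(encodingName,
        new EncoderReplacementFallback(""), new DecoderReplacementFallback(""));
    var result = new List<(int CodePoint, string Text, byte[] Bytes)>();
    for (int cp = first; cp <= last; cp++)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF) continue; // surrogate code points are not characters
        string s = char.ConvertFromUtf32(cp);        // one char in the BMP, a surrogate pair above it
        byte[] encoded = enc.GetBytes(s);
        if (encoded.Length > 0 && enc.GetString(encoded) == s)
            result.Add((cp, s, encoded));
    }
    return result;
}

var found = EncodableCodePoints("iso-2022-jp", 0, 0x10FFFF);
int nonBmp = found.FindAll(e => e.CodePoint > 0xFFFF).Count;
string path = Path.Combine(Path.GetTempPath(), "iso-2022-jp-all.txt");
using (var sw = new StreamWriter(path))
{
    foreach (var entry in found)
        sw.WriteLine($"U+{entry.CodePoint:X4}\t{entry.Text}");
}
Console.WriteLine($"{found.Count} code points written, {nonBmp} of them outside the BMP");
```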