I've been adding some unit tests to a Base64 encoding library and discovered that the existing code did not properly handle some non-English characters. I traced the error to the code that converts a string into an array of bytes. I figured it was a boneheaded move on my part: I had been expecting each character to be stored in a single byte, when I should have been expecting two bytes per character because I'm dealing with Unicode strings in .NET.
That got me a little further, but then I realized that the byte order was reversed from what I expected. The letter A converted to 0x41, 0x00 (low byte first) instead of the 0x00, 0x41 I was expecting. I fixed that, and then some of the tests were passing and some were still failing.
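The byte-order surprise is easy to reproduce. Here is a minimal sketch in Python rather than .NET, purely as an illustration; the helper name `hex_bytes` is made up for this example. As far as I know, .NET's `Encoding.Unicode` corresponds to the little-endian form and `Encoding.BigEndianUnicode` to the big-endian one.

```python
def hex_bytes(data: bytes) -> list[str]:
    """Format each byte as 0xNN so the byte order is easy to read."""
    return [f"0x{b:02X}" for b in data]


text = "A"

# UTF-16 little-endian: the low byte of each 16-bit code unit comes first.
little_endian = text.encode("utf-16-le")

# UTF-16 big-endian: the high byte comes first.
big_endian = text.encode("utf-16-be")

print(hex_bytes(little_endian))  # ['0x41', '0x00']
print(hex_bytes(big_endian))     # ['0x00', '0x41']
```

Both outputs are valid UTF-16; they just disagree about which byte goes first, which is why the encoder and the test expectations have to agree on one of them.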
Two hours later I realized that I had a link to this great article, "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky. If you're having any similar issues, you should read this article first.
It turns out that UTF-8, which many .NET APIs use by default when converting strings to bytes, dynamically chooses how many bytes to write per character based on what's needed. The normal ASCII characters like A are encoded in a single byte, and the larger characters are encoded in two to four bytes.
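To see the variable-length behavior for yourself, here is a small sketch, again in Python rather than .NET and only as an illustration. The sample characters and the helper name `utf8_length` are my own choices for this example, not anything from the library I was testing.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to encode the given character."""
    return len(ch.encode("utf-8"))


# One sample character from each UTF-8 length class.
samples = ["A", "é", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    as_hex = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {as_hex}")

# Prints:
# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): C3 A9
# U+20AC -> 3 byte(s): E2 82 AC
# U+1F600 -> 4 byte(s): F0 9F 98 80
```

A test that assumes a fixed number of bytes per character will pass for plain ASCII input and fail as soon as any of the other three kinds of character shows up.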
I adjusted my test cases to account for the way UTF-8 really works instead of the way I thought all Unicode strings were handled, and I got green lights across the board for the unit tests. I hope this saves someone else the time that I wasted.