Which of these three is correct as the first line of a UTF-16 document?

  One byte (US-ASCII) per Unicode character:
    <?xml version="1.0" encoding="UTF-16" ?>
  Two bytes char-NUL (Lo-Hi halves of each 16-bit code unit) per Unicode character:
    <?xml version="1.0" encoding="UTF-16" ?>
  Two bytes NUL-char (Hi-Lo halves of each 16-bit code unit) per Unicode character:
    <?xml version="1.0" encoding="UTF-16" ?>

It's my understanding that the first option, having the XML declaration in 8-bit format (US-ASCII) and then switching to UTF-16 after it, is totally wrong: the *entire* file must be in UTF-16, even the declaration of the format, and the spec further requires a byte order mark at the very start of any UTF-16 entity. Either of the other two layouts, UTF-16-LoHi (little-endian) or UTF-16-HiLo (big-endian), is valid; a byte-level sketch of both appears at the end of this note. XML readers are supposed to try parsing the first logical line of the file under every plausible encoding (US-ASCII, UTF-8, UTF-16-hl, UTF-16-lh, UTF-32-lmmh, and UTF-32-hmml) and then conclude that the correct encoding is whichever one yields a self-consistent encoding declaration. Is this correct? (A sketch of that detection logic also appears below.)

For example, the following should then be valid first lines of a UTF-32 file:

  Four bytes char-NUL-NUL-NUL (Lo-Mids-Hi bytes of each 32-bit code) per Unicode character:
    <?xml version="1.0" encoding="UTF-32" ?>
  Four bytes NUL-NUL-NUL-char (Hi-Mids-Lo bytes of each 32-bit code) per Unicode character:
    <?xml version="1.0" encoding="UTF-32" ?>

Is that all correct now?

I'm going to try to create an empty UTF-16-HiLo file ... done. Please edit the URL in your Web browser to replace "/OldNotes/utf.txt" by "/New/UTF-16-HiLo.XML", and see how it works. Then tell me which XML-equipped Web browser you used and how it worked. (Go over to /WAP.html and click on "Contact me".) Are there **any** XML-equipped Web browsers in existence yet??

I tried it in lynx just now, and it refuses even to try to render it; it offers only a download. It recognizes the file as MIME type text/xml, but that's probably only because of the .XML extension. I rather doubt lynx is capable of parsing the UTF-16-HiLo text, because it would then be trivial to ignore one out of every two bytes (or three out of every four for UTF-32) and simply render the file as if its MIME type were text/plain. After all, lynx already completely ignores the high-order bit of Latin-1 Web pages, so why can't it do the analogous thing for UTF-16 Web pages? (Actually, does it *really* parse UTF-8 Web pages correctly before discarding all but the 7 low-order bits? Or does it simply discard the high-order bit of every byte, regardless of where the byte falls within a multi-byte sequence? I'll have to run some experiments to find out what lynx really does. Meanwhile, never mind.)
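
For concreteness, here is a minimal sketch (in Python) of the two valid UTF-16 byte layouts discussed above. The file names are just illustrative, matching the names used in this note, and the "<empty/>" root element is there only because a well-formed document needs one.

    # Minimal sketch: write the same declaration in both UTF-16 byte orders.
    # Per the XML spec, a UTF-16 entity must begin with a byte order mark.
    decl = '<?xml version="1.0" encoding="UTF-16"?>\n<empty/>\n'

    # Lo-Hi (little-endian, UTF-16LE): BOM is FF FE, low byte of each
    # 16-bit code unit first, so "<" comes out as 3C 00.
    with open("UTF-16-LoHi.XML", "wb") as f:
        f.write(b"\xff\xfe" + decl.encode("utf-16-le"))

    # Hi-Lo (big-endian, UTF-16BE): BOM is FE FF, high (NUL) byte first,
    # so the file begins FE FF 00 3C 00 3F 00 78 00 6D 00 6C ("<?xml").
    with open("UTF-16-HiLo.XML", "wb") as f:
        f.write(b"\xfe\xff" + decl.encode("utf-16-be"))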
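
And here is a simplified sketch of the detection logic. The real rules (Appendix F of the XML 1.0 spec) also cover EBCDIC and some unusual octet orders; the essential idea is that rather than fully parsing the first line under every encoding, a reader can inspect the first four bytes for a BOM or for the byte pattern of "<?xml", pick a provisional encoding family from that, and then read the encoding declaration to pin down the exact encoding.

    # Simplified sketch of XML encoding autodetection from the first
    # four bytes of a document.  Not a complete detector.
    def sniff_xml_encoding(first4: bytes) -> str:
        boms = [  # longer BOMs first: FF FE is a prefix of FF FE 00 00
            (b"\x00\x00\xfe\xff", "UTF-32BE"),
            (b"\xff\xfe\x00\x00", "UTF-32LE"),
            (b"\xef\xbb\xbf", "UTF-8"),
            (b"\xfe\xff", "UTF-16BE"),
            (b"\xff\xfe", "UTF-16LE"),
        ]
        for bom, name in boms:
            if first4.startswith(bom):
                return name
        patterns = {  # how "<?xm" looks in each family with no BOM
            b"\x00\x00\x00\x3c": "UTF-32BE",
            b"\x3c\x00\x00\x00": "UTF-32LE",
            b"\x00\x3c\x00\x3f": "UTF-16BE",
            b"\x3c\x00\x3f\x00": "UTF-16LE",
            b"\x3c\x3f\x78\x6d": "UTF-8 or other ASCII-compatible",
        }
        return patterns.get(first4, "unknown; assume UTF-8")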
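
As for the "ignore half the bytes" idea: for ASCII-only UTF-16 text that really is trivial, as the following hypothetical sketch shows. (This is my speculation about what a text-mode renderer *could* do, not anything lynx actually does.)

    # Hypothetical: render UTF-16-HiLo (big-endian) text as plain ASCII by
    # discarding the high (NUL) byte of each 16-bit code unit.  Only
    # sensible when every character is in the ASCII range.
    def naive_utf16be_to_ascii(data: bytes) -> str:
        if data.startswith(b"\xfe\xff"):  # skip the big-endian BOM
            data = data[2:]
        return data[1::2].decode("ascii", errors="replace")

    # e.g. naive_utf16be_to_ascii(open("UTF-16-HiLo.XML", "rb").read())
    # would display the same text a text/plain rendering shows.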