Most text files are stored using ASCII or ANSI characters - each character is encoded using one byte (8 bits). This means that we can have a maximum of 256 different characters (ASCII itself defines only 128 characters; the ANSI code pages use the remaining 128 values for additional characters). This isn't a problem in most English speaking environments, but it does become a problem once you start encoding characters in different languages.
Unicode is a standard for encoding characters that tries to address the problem of encoding all possible international characters into a single, unified format.
As with most standards, there are several flavors to choose from. V supports UCS-2 and UTF-8. (See "UCS-2 vs UTF-16" below regarding UTF-16.)
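To make the difference between these encodings concrete, the following Python sketch (an illustration only, not part of V) stores the same character under a single-byte code page, UTF-8 and 16-bit Unicode. Latin-1 is used here as a stand-in for an ANSI code page, which is an assumption for the sake of the example.

```python
# Illustration only: how one non-ASCII character is stored
# under different encodings.
text = "\u00e9"  # U+00E9, Latin small letter e with acute

latin1_bytes = text.encode("latin-1")   # single-byte code page: 1 byte
utf8_bytes = text.encode("utf-8")       # variable length: 2 bytes here
utf16_bytes = text.encode("utf-16-le")  # one 16-bit value: 2 bytes

print(len(latin1_bytes), len(utf8_bytes), len(utf16_bytes))  # prints: 1 2 2
```

The same character occupies a different number of bytes (and different byte values) in each encoding, which is why a viewer needs to know the encoding before it can display the file correctly.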
Status Bar Indicator
V will automatically detect most Unicode files and display them accordingly, including files with foreign characters. UNI will be displayed in the status bar to indicate that the file is a Unicode file. ANS (for ANSI) will be displayed in the status bar when the file is not a Unicode file.
If V does not guess the correct encoding, you can click on the UNI/ANS indicator in the status bar and select the correct encoding (assuming that you know what it is).
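As a rough idea of how this kind of detection can work, here is a minimal Python sketch. It is not V's actual detection logic, which is not documented here. The function name guess_encoding and the returned labels are hypothetical. The sketch checks for a byte order mark (BOM) at the start of the file and otherwise tests whether the data is valid UTF-8.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: guess an encoding from the raw file bytes."""
    # A byte order mark at the start of the file identifies the encoding.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UCS-2 little endian"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UCS-2 big endian"
    # No BOM: see whether the bytes form valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ANSI"
    # Pure 7-bit data is the same in ASCII, ANSI and UTF-8,
    # so there is no reason to treat it as Unicode.
    return "ANSI" if data.isascii() else "UTF-8"


print(guess_encoding(b"\xff\xfeh\x00i\x00"))  # prints: UCS-2 little endian
print(guess_encoding(b"caf\xe9"))             # prints: ANSI
```

A heuristic like this can guess wrong, for example on a file without a BOM, which is why being able to override the encoding manually is useful.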
Font Substitution
V does not support font substitution (or font fallback). Under font substitution, if the selected font does not contain a particular character, the program will try to use another font to display that character. Since V does not do font substitution, it is very important to use a font that contains all the characters to be displayed. In particular, the standard Courier font should not be used to display Unicode files - Courier New should be used instead.
UCS-2 vs UTF-16
Strictly speaking, V does not fully support UTF-16 - it only supports UCS-2 (which is the outdated predecessor to UTF-16).
UCS-2 is a fixed-length encoding that encodes all characters to a 16-bit value (from 0 to FFFF). UTF-16 is a variable-length encoding capable of encoding the entire Unicode range of characters. In particular, UTF-16 can be used to encode characters greater than FFFF, using a pair of 16-bit values (a surrogate pair) for each such character.
However, in most cases, UCS-2 and UTF-16 are identical: a file that only contains characters in the range 0 to FFFF is stored the same way in both. If you encounter any problems viewing Unicode files, please contact [email protected] (preferably attaching a copy of the Unicode file).
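The difference between the two encodings can be seen with a short Python sketch (an illustration only, not part of V). The helper name utf16_code_units is hypothetical. It lists the 16-bit values that UTF-16 uses for a single character.

```python
def utf16_code_units(ch: str) -> list:
    """Return the 16-bit values that UTF-16 uses to store one character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+20AC (Euro sign) is below FFFF: one 16-bit value, same as UCS-2.
print([hex(u) for u in utf16_code_units("\u20ac")])      # prints: ['0x20ac']

# U+1F600 is above FFFF: UTF-16 needs two 16-bit values (a surrogate pair).
print([hex(u) for u in utf16_code_units("\U0001F600")])  # prints: ['0xd83d', '0xde00']
```

A program that only understands UCS-2 would treat the second example as two separate 16-bit characters rather than as one character.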
Notes
V does not support UTF-32.
V does not support RTL (Right To Left) display.