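Most text files are stored using a single-byte character set, such as ASCII or one of its 8-bit extensions (for example the Windows "ANSI" code pages): each character is encoded in one byte (8 bits), so at most 256 different characters can be represented. (Plain ASCII itself defines only 128 characters; the 8-bit extensions use the remaining values.) This is rarely a problem in English-speaking environments, but it quickly becomes one once you need to encode characters from other languages.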
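
To make the 256-character limit concrete, here is a minimal Python sketch. It assumes Windows code page 1252 (`cp1252`) as a representative single-byte "ANSI" encoding; that choice and the helper name `fits_in_single_byte` are illustrative and not part of V.

```python
# A byte has 8 bits, so a single-byte encoding has at most 2**8 distinct values.
MAX_SINGLE_BYTE_CHARS = 2 ** 8  # 256


def fits_in_single_byte(text: str, codepage: str = "cp1252") -> bool:
    """Hypothetical helper: True if every character of `text` exists in the 8-bit code page."""
    try:
        text.encode(codepage)  # each character becomes exactly one byte, or this fails
        return True
    except UnicodeEncodeError:
        return False


print(MAX_SINGLE_BYTE_CHARS)         # 256
print("café".encode("cp1252"))       # b'caf\xe9': 4 characters, 4 bytes
print(fits_in_single_byte("café"))   # True
print(fits_in_single_byte("中文"))    # False: cp1252 has no slot for these
```

Characters outside the chosen code page simply cannot be stored, which is the gap Unicode was designed to close.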

Unicode is a standard that aims to encode the characters of all the world's writing systems in a single, unified character set.

Unicode text can be stored in several different encodings. V supports two of them: UCS-2 and UTF-8 (see the note below regarding UTF-16).
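
The following Python sketch shows how the same short string is stored in each of the two encodings V supports. Python has no built-in "ucs-2" codec, so the sketch assumes little-endian UTF-16 (`utf-16-le`, the byte order Windows typically uses) as a stand-in; for characters up to U+FFFF, which is all this sample contains, the two produce identical bytes.

```python
# Sample text: an ASCII letter, a Latin-1 letter and a CJK character.
text = "Aé中"

utf8 = text.encode("utf-8")      # variable length: 1 to 4 bytes per character
ucs2 = text.encode("utf-16-le")  # stand-in for UCS-2: 2 bytes per character here

# Show each character's code point and its bytes in both encodings.
for ch in text:
    print(f"U+{ord(ch):04X}  UTF-8: {ch.encode('utf-8').hex(' '):<9}  "
          f"UCS-2: {ch.encode('utf-16-le').hex(' ')}")

print(len(utf8), len(ucs2))
```

For example, "A" is the single byte `41` in UTF-8 but the two bytes `41 00` in UCS-2, while 中 (U+4E2D) takes three bytes in UTF-8 (`e4 b8 ad`) and two in UCS-2 (`2d 4e`).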

Status Bar Indicator

V automatically detects most Unicode files and displays them accordingly, including files that contain non-English characters. The status bar shows UNI when the file is a Unicode file and ANS (for ANSI) when it is not.

If V does not guess the correct encoding, you can click on the UNI/ANS indicator in the status bar and select the correct encoding (assuming that you know what it is).
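
V's actual detection logic is not documented here. Purely as an illustration, the following Python sketch shows one common heuristic: look for a byte-order mark (BOM) first, then test whether the bytes are valid UTF-8. The function names, the `cp1252` fallback and the mapping onto UNI/ANS are all assumptions.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic: return a Python codec name for raw file bytes."""
    # 1. A byte-order mark (BOM) at the start of the file is the strongest signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also strips the BOM when decoding
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # this codec reads the BOM to pick the byte order
    # 2. No BOM: non-ASCII bytes that still form valid UTF-8 are unlikely to be ANSI.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "cp1252"     # assumed fallback: a single-byte "ANSI" code page
    return "utf-8" if any(b >= 0x80 for b in data) else "ascii"


def status_indicator(encoding: str) -> str:
    """Map a guessed codec name onto the UNI/ANS labels (assumed mapping)."""
    return "ANS" if encoding in ("cp1252", "ascii") else "UNI"


sample = b"\xff\xfe" + "Hi".encode("utf-16-le")  # UCS-2 little-endian with a BOM
enc = guess_encoding(sample)
print(enc, status_indicator(enc), sample.decode(enc))  # utf-16 UNI Hi
```

A UCS-2 file saved without a BOM would slip past a heuristic like this and be reported as ANSI, which is exactly the kind of case where choosing the encoding manually from the status bar helps.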

Font Substitution

V does not support font substitution (or font fallback). Under font substitution, if the selected font does not contain a particular character, the program will try to use another font to display that character. Since V does not do font substitution, it is very important to use a font that contains all the characters to be displayed. In particular, the standard Courier font should not be used to display Unicode files - Courier New should be used instead.

UCS-2 vs UTF-16

Strictly speaking, V does not fully support UTF-16; it supports only UCS-2, the outdated predecessor of UTF-16.

UCS-2 is a fixed-length encoding that stores every character as a single 16-bit value (0000 to FFFF hexadecimal), so it can only represent characters up to U+FFFF. UTF-16 is a variable-length encoding that can represent the entire Unicode range: characters up to U+FFFF use one 16-bit unit, exactly as in UCS-2, while characters above U+FFFF are stored as a pair of 16-bit units (a surrogate pair).
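
The difference can be made concrete with the UTF-16 surrogate-pair rule. This Python sketch (the function name `to_utf16_units` is illustrative) encodes a single code point as UTF-16 code units; only the first branch has a UCS-2 equivalent.

```python
from typing import List


def to_utf16_units(code_point: int) -> List[int]:
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0xFFFF:
        # Fits in one 16-bit unit: UTF-16 and UCS-2 agree here.
        return [code_point]
    # Above U+FFFF, UCS-2 has no representation; UTF-16 uses a surrogate pair.
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return [high, low]


print([hex(u) for u in to_utf16_units(0x4E2D)])   # ['0x4e2d']
print([hex(u) for u in to_utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```

So 中 (U+4E2D) is the same single unit in both encodings, whereas U+1F600 (an emoji) becomes the pair 0xD83D 0xDE00, which a UCS-2 reader would see as two unrelated 16-bit values.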

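However, for characters in the range U+0000 to U+FFFF (the Basic Multilingual Plane, which covers most characters in common use), UCS-2 and UTF-16 produce identical bytes, so in most cases the difference does not matter. If you encounter any problems viewing Unicode files, please contact [email protected] (preferably attaching a copy of the file).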
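
Whether a given text is affected comes down to whether it contains any character above U+FFFF. A minimal Python check might look like the following; the helper name `is_ucs2_safe` is hypothetical.

```python
def is_ucs2_safe(text: str) -> bool:
    """Return True if no character in `text` lies above U+FFFF.

    For such text, UCS-2 and UTF-16 produce exactly the same bytes.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


print(is_ucs2_safe("Grüße, 世界"))         # True: everything is in U+0000..U+FFFF
print(is_ucs2_safe("smile \U0001F600"))  # False: U+1F600 needs a surrogate pair

# The same effect seen through the encoder: an extra 16-bit unit appears
# only for the character above U+FFFF.
text = "smile \U0001F600"
print(len(text), len(text.encode("utf-16-le")) // 2)  # 7 8
```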

Notes

V does not support UTF-32.

V does not support RTL (right-to-left) display.