Redirection / Piping and Unicode at the Windows Command Prompt

Questions about how to do Unicode input / output and piping appear regularly on the JP Software Support Forums. It can be a complicated issue because of the way Windows was built: it primarily uses ASCII files externally (in practice, 8-bit text in the current code page), but UTF16 for most of the Windows APIs. This means that every time the command processor (TCC or CMD) reads an ASCII file, it has to convert it to UTF16 before it can call an API or display it on the screen. And when you do output redirection to a file (e.g., “dir > file.dat”), the command processor has to convert the output from UTF16 back to ASCII. Unfortunately, this ASCII -> UTF16 -> ASCII conversion is not 100% reliable. Depending on your code page and font, the ASCII going into Windows will not always match the ASCII coming back.
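To make the round-trip problem concrete, here is a minimal Python sketch. It is only an illustration: Python's codecs stand in for the Windows conversion routines, and the choice of code pages 437 and 1252 (and the helper name `round_trip`) is an assumption, not something taken from TCC or CMD.

```python
def round_trip(data: bytes, read_as: str, write_as: str) -> bytes:
    """Decode 8-bit bytes to Unicode, then encode them back to 8-bit bytes."""
    text = data.decode(read_as)  # 8-bit -> Unicode (what the APIs work with)
    # Unicode -> 8-bit; characters missing from the target code page become "?"
    return text.encode(write_as, errors="replace")


original = b"caf\xe9"  # "café" stored in code page 1252

# Same code page in both directions: the bytes survive.
print(round_trip(original, "cp1252", "cp1252"))  # b'caf\xe9'

# Read as code page 437, written as 1252: byte 0xE9 is a Greek theta in 437,
# which has no slot in 1252, so it comes back as "?".
print(round_trip(original, "cp437", "cp1252"))   # b'caf?'
```

Plain 7-bit ASCII survives either way; it is the characters above 127 that depend on the code page.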

One solution is to use Unicode instead of ASCII. TCC can transparently handle ASCII or Unicode files (either UTF16 or UTF8) anywhere it’s looking for file input. If you set the “Unicode output” directive in TCC (OPTION / Startup), TCC will use UTF16 for output redirection or piping, and look for UTF16 in its pipe input. This approach works well if you’re using TCC internal commands, or external apps that recognize UTF16 files.
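How TCC tells the three formats apart is not described here. One common technique, sketched below in Python, is to look for a byte order mark (BOM) at the start of the data and fall back to an 8-bit code page when none is found. The function names and the cp1252 fallback are assumptions for illustration, and this is not a description of TCC's actual detection logic.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess a Python codec name from a leading byte order mark, if any."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # this codec reads the BOM to pick the byte order
    return "cp1252"         # assumed 8-bit fallback; the real code page may differ


def read_text(data: bytes) -> str:
    """Decode file contents using whichever encoding the BOM suggests."""
    return data.decode(sniff_encoding(data))


print(read_text("héllo".encode("utf-16")))     # UTF16 with BOM
print(read_text("héllo".encode("utf-8-sig")))  # UTF8 with BOM
print(read_text("héllo".encode("cp1252")))     # 8-bit, no BOM
```

A file without a BOM is ambiguous, which is one reason explicit per-command options are useful.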

Alternatively, the current version of TCC also allows you to use UTF8 (widely used by everything except Windows) for input and output. This requires disabling the “Unicode output” directive and then taking three steps:

  1. Change your code page to 65001 (enter “chcp 65001” at the prompt).
  2. Add the directive “UTF8Output=Yes” to your TCMD.INI file in the [4NT] section.
  3. Go to OPTION / Startup and select the “UTF8” option.

TCC will then treat all input files as UTF8, and write all of its output as UTF8.

But many Windows apps (including CMD) can’t handle either UTF16 or UTF8 files. So how can we mix our formats — sometimes read & write ASCII, and sometimes Unicode? TCC has options to do this:

  • >:a – Redirected output (STDOUT and/or STDERR) is ASCII (8-bit characters)
  • >:u – Redirected output is UTF16 Unicode
  • >:8 or >:u8 – Redirected output is UTF8
  • >>:a – Appended redirected output is ASCII
  • >>:u – Appended redirected output is UTF16 Unicode
  • >>:8 or >>:u8 – Appended redirected output is UTF8

And you can do the same thing with pipes:

  • |:a – Piped output is ASCII
  • |:u – Piped output is UTF16
  • |:8 or |:u8 – Piped output is UTF8
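The program on the receiving end has to decode its input using the same format that was chosen on the TCC side. The Python sketch below shows the idea. The suffix-to-codec table and the `decode_piped` helper are hypothetical, "cp1252" is only an assumed stand-in for the 8-bit code page, and whether TCC writes a byte order mark is not stated here, so the helper simply drops one if present.

```python
# Hypothetical mapping from the TCC suffixes above to Python codec names.
SUFFIX_TO_CODEC = {
    "a": "cp1252",     # 8-bit text; the actual code page is an assumption
    "u": "utf-16-le",  # UTF16, little-endian as is usual on Windows
    "8": "utf-8",
    "u8": "utf-8",
}


def decode_piped(data: bytes, suffix: str) -> str:
    """Decode raw pipe or file bytes according to the chosen format."""
    text = data.decode(SUFFIX_TO_CODEC[suffix])
    return text.removeprefix("\ufeff")  # drop a leading byte order mark, if any


sample = "naïve"
print(decode_piped(sample.encode("utf-16-le"), "u"))  # matching format: naïve
print(decode_piped(sample.encode("utf-8"), "8"))      # matching format: naïve
print(decode_piped(sample.encode("utf-8"), "a"))      # mismatch: garbled text
```

In a real filter the bytes would come from `sys.stdin.buffer.read()` rather than from an in-memory sample. The `removeprefix` call needs Python 3.9 or later.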

Combining these options allows us to configure TCC to match our needs – either mostly Unicode with exceptions for apps that can’t handle it, or mostly ASCII with exceptions for apps that can take (or require) UTF8 or UTF16. Also check out the TCC internal command TPIPE, which supports both UTF8 and UTF16 and has a wide variety of conversion options.