How can we display unicode characters? Other posts here don't seem to answer

ClioCJS · May 19, 2023

I see a lot of discussion but I don't see a "here is your solution" followed by "thanks that worked!"

I'm tired of the question mark inside a box.

I want to see my unicode characters.

I don't get what I'm doing wrong.

Changing the code page to 65001 fixes nothing (and, I thought, breaks other things).

rconn · May 19, 2023

You need to be using a Unicode font. If you're running TCC in a console window, click on the icon in the upper left corner of the window, select Properties, then the Font tab. If you're running TCC in a Take Command tab window, click on the Options menu, then Tabs, and select a font.

My own preference is for Cascadia Code or Consolas.

AnrDaemon · May 21, 2023

Basically, you could pick any modern programming font.
Following Consolas and Cascadia Code, there are Fira Code, Iosevka, JetBrains Mono and other fonts.

ClioCJS · May 28, 2023

I'm using Consolas. What you've said simply isn't true. The characters don't display. I've spent a week writing a unicode filename scrubber (even the official unicode libraries for python don't correctly substitute all characters) to work around the fact that these absolutely do NOT display right in Consolas.

At least now, I don't have any non-displayable characters, but it probably shouldn't have taken 500+ lines of code and hours of development time. Still, that was easier than finding a workable solution for TCC.EXE.

ClioCJS · May 28, 2023

That being said. I'd still love to actually see them properly using the Consolas font in Windows 10 with TCC 28.

Not. A. Single. One. Displays.

Including trying Cascadia Code.

rconn · May 28, 2023

You haven't specified the environment where you're running TCC.

1. If you're running TCC in a Windows console, TCC has nothing to do with how the characters are displayed. That's done by Windows, and depends on your font, console properties, and code page.
2. If you're running TCC in a Windows Terminal tab, TCC has nothing to do with how the characters are displayed. That's done by Windows Terminal.
3. If you're running TCC in a Take Command tab, TCC has nothing to do with how the characters are displayed. That's done by Take Command, and depends on your TCMD font and system code page.

Code page 65001 is for UTF-8; you're trying to display UTF-16 characters.

Do you know what the UTF-16 values of those characters are? Consolas does not support the entire UTF-16 character set.

Did you try it with one of the other fonts, like Lucida Console or Cascadia?
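For reference, UTF-8 and UTF-16 are two byte encodings of the same Unicode code points. The short Python sketch below only illustrates that difference; the sample string is a hypothetical filename fragment, and the snippet says nothing about which code page a given console should use.

```python
# Hypothetical filename fragment: the katakana word "anime" (U+30A2 U+30CB U+30E1).
text = "\u30a2\u30cb\u30e1"

utf8 = text.encode("utf-8")        # what code page 65001 describes
utf16 = text.encode("utf-16-le")   # the in-memory form Windows uses for wide strings

print(utf8.hex(" "))    # e3 82 a2 e3 83 8b e3 83 a1
print(utf16.hex(" "))   # a2 30 cb 30 e1 30

# Both byte sequences decode back to the same three code points.
print(utf8.decode("utf-8") == utf16.decode("utf-16-le"))  # True
```

`bytes.hex` with a separator argument needs Python 3.8 or later.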

Charles Dye · May 28, 2023

rconn said:
Do you know what the UTF-16 values of those characters are? Consolas does not support the entire UTF-16 character set.

You could pass one or two of your problematic filenames to the @ASCII function to get this information. Once we know which characters you can't display, other folks here can try to replicate your issue.
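If the filenames are also reachable from Python, a rough equivalent of that lookup can be sketched as follows. This is not the @ASCII function and makes no claim about its output format; `describe` is a hypothetical helper, and the sample string stands in for a real filename.

```python
def describe(text):
    """Return (code point, UTF-16 code units) for each character in text."""
    rows = []
    for ch in text:
        raw = ch.encode("utf-16-be")
        # Split the big-endian bytes into 16-bit code units (two for astral characters).
        units = [f"{int.from_bytes(raw[i:i + 2], 'big'):04X}" for i in range(0, len(raw), 2)]
        rows.append((f"U+{ord(ch):04X}", units))
    return rows

# Hypothetical filename fragment: hiragana A, Latin A, and an emoji.
for code_point, units in describe("\u3042A\U0001F600"):
    print(code_point, " ".join(units))
# U+3042 3042
# U+0041 0041
# U+1F600 D83D DE00
```

Only ASCII is printed, so the output is readable even in a console that cannot draw the characters themselves.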

Charles Dye · May 28, 2023

Also, if you're running in a Take Command tab, which version of Take Command are you using? Is it more than a few years old? (I ask because this really is relevant information, not because I'm pushing you to upgrade.)

clio · May 28, 2023

OMG thank you for such a quick response!

rconn said:
You haven't specified the environment where you're running TCC.
1. If you're running TCC in a Windows console, TCC has nothing to do with how the characters are displayed. That's done by Windows, and depends on your font, console properties, and code page.
2. If you're running TCC in a Windows Terminal tab, TCC has nothing to do with how the characters are displayed. That's done by Windows Terminal.

Even though I've been using this command line since the NDOS.COM days in 1988, I'm actually still not sure precisely what this question is asking. It's a TCC.exe window. The window has a title bar that can be clicked for options, and the options have a Terminal tab.

rconn said:
3. If you're running TCC in a Take Command tab, TCC has nothing to do with how the characters are displayed. That's done by Take Command, and depends on your TCMD font and system code page.

But based on all 3 of your answers saying TCC has nothing to do with it (i've read this before, but it's so counter-intuitive that i tend to repeatedly forget it), it sounds like which environment i'm in doesn't quite make as much of a difference as I'd thought? It's always fascinating to learn more. TCC is probably my favorite program ever.

rconn said:
Code page 65001 is for UTF-8; you're trying to display UTF-16 characters.

So you're saying really, my answer might be to switch code pages.
Not even a TCC issue, just one you probably have to field way too often (sorry!!!!!!!!!!!!!!!)

I want to say there was a reason I couldn't switch codepages.
I thought it was compatibility with lots of old utilities that i have integrated into many granular workflows that i've worked on for literally decades, but I don't remember.
I could try again now that I know this is the *right* solution. When I know a solution is right, I tend to be more invested in it and more likely to reach a happy conclusion.

rconn said:
Do you know what the UTF-16 values of those characters are? Consolas does not support the entire UTF-16 character set.

Did you try it with one of the other fonts, like Lucida Console or Cascadia?

Lucida Console for life!

But I tried Cascadia too.

But I guess since I'm on the wrong code-page, it doesn't matter.

[second comment forthcoming]

clio · May 28, 2023

I believe the filenames are Japanese. (The example I chose was a set of anime soundtracks.)

clio · May 28, 2023

I guess... which code page should i switch to? I know about the utility to do so and have played with it before, i just never reached a happy place.

Charles Dye · May 28, 2023

ClintJCL said:
Even though I've been using this command line since the NDOS.COM days in 1988, I'm actually still not sure precisely what this question is asking. It's a TCC.exe window. The window has a title bar that can be clicked for options, and the options have a Terminal tab.

That's a console window. (And you're using a pretty recent version of Windows; that "Terminal" tab is new.)

ClintJCL said:
So you're saying really, my answer might be to switch code pages.

I'm pretty sure it has nothing to do with code pages. There are just very few console fonts which include Japanese characters. Try SimSun-ExtB; it's hideous, but does include Japanese and Chinese.

rconn · May 28, 2023

Ah - those filenames are probably not Unicode, they're DBCS wide characters. To display them you'll need to switch to one of the DBCS codepages (like 932 for Japanese).

That's a rare occasion when TCC does actually do something with the output rather than handing it all to Windows. TCC needs to scan DBCS strings so it can put the cursor in the right position after Windows writes the character(s).
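To illustrate why a double-byte string has to be scanned before the cursor can be placed, here is a minimal Python sketch. It is not how TCC does it (the thread doesn't say); `display_width` is a hypothetical helper that counts East Asian wide and fullwidth characters as two columns and ignores combining marks and ambiguous-width characters. Python's `cp932` codec is used as a stand-in for Windows code page 932.

```python
import unicodedata

def display_width(text):
    """Estimate console columns: 2 for East Asian wide/fullwidth characters, else 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)

# Hypothetical filename fragment: three katakana followed by " OST".
name = "\u30a2\u30cb\u30e1 OST"
encoded = name.encode("cp932")

# Character count, byte count in code page 932, and estimated column count.
print(len(name), len(encoded), display_width(name))  # 7 10 10
```

The character count (7) differs from both the byte count and the column count (10), which is why the string cannot be treated as one cell per character.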

ClioCJS · May 30, 2023

Admittedly completely off-topic now, but super interesting.

I somehow doubt there's a codepage that would display all the characters properly and still have compatibility?

I just want my youtube et al downloads to be viewable, but people put emojis and ANY kind of foreign character in them, sometimes to be cute.
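That doubt can be checked roughly from Python by asking which codecs can encode which samples. This is only a sketch: `encodable` is a hypothetical helper, the sample strings are made up, Python's codec tables approximate the Windows code pages, and being encodable says nothing about whether the console font can draw the result.

```python
def encodable(text, codec):
    """Return True if every character in text has a representation in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical samples of the kinds of characters that show up in download titles.
samples = {
    "japanese": "\u30a2\u30cb\u30e1",
    "cyrillic": "\u043c\u0438\u0440",
    "emoji": "\U0001F600",
}

for codec in ("cp437", "cp1252", "cp1251", "cp932", "utf-8"):
    covered = [label for label, text in samples.items() if encodable(text, codec)]
    print(codec, covered)
```

Of the codecs listed, only `utf-8` covers the emoji; the legacy code pages each cover a subset at best.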

I already worked a week or so on an automatic file renamer, but I'd still like to see the characters accurately in the console as the renamer wipes them out.

The renamer uses 4 python language translation libraries (polyglot first, then 3 language specific ones), then 3 phonetic mapping tables for 3 other alphabets, then a generalized unidecode library, then a custom mapping table for all the places i disagree with any of the above as well as for all the unicode characters that the unidecode library fails to decode. Which is a lot.

I constantly run into new characters and have to add them to the mapping table. Some paste. Some can't paste and can only be referred to by the \u unicode_char_code. And some seemingly can't even be referred to by that (so i had to make a workaround for those).
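One possible cause for the last case, assuming the mapping table is a Python 3 source file: a `\u` escape takes exactly four hex digits, so characters above U+FFFF (most emoji) need the eight-digit `\U` form. Whether this is what happened in the renamer is a guess; the sketch only shows the difference.

```python
bmp = "\u3042"          # four hex digits: U+3042
astral = "\U0001F600"   # eight hex digits: U+1F600
wrong = "\u1F600"       # parsed as U+1F60 followed by the digit "0"

print(len(bmp), len(astral), len(wrong))           # 1 1 2
print([f"U+{ord(c):04X}" for c in wrong])          # ['U+1F60', 'U+0030']

# Ask Python for the escape that reproduces a character exactly.
print(astral.encode("unicode_escape").decode())    # \U0001f600
```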

It throws an exception if an unmapped character is found, and is wrapped up in a BAT file that captures the error level and runs itself over and over again until any unmapped characters are properly mapped. Forces me to fix the code.
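The fail-on-unmapped idea can be sketched in a few lines of Python. This is not the renamer described above: `CUSTOM_MAP`, `UnmappedCharacter`, and `scrub` are hypothetical names, and the standard library's NFKD decomposition stands in for the translation and unidecode libraries, which it does not match in coverage.

```python
import unicodedata

# Hypothetical hand-maintained overrides, consulted before the generic fallback.
CUSTOM_MAP = {"\u2665": "love", "\u00df": "ss"}

class UnmappedCharacter(ValueError):
    """Raised when no mapping produces an ASCII replacement."""

def scrub(name):
    out = []
    for ch in name:
        if ch.isascii():
            out.append(ch)
        elif ch in CUSTOM_MAP:
            out.append(CUSTOM_MAP[ch])
        else:
            # Generic fallback: keep only the ASCII part of the compatibility decomposition.
            base = "".join(c for c in unicodedata.normalize("NFKD", ch) if c.isascii())
            if not base:
                raise UnmappedCharacter(f"U+{ord(ch):04X}")
            out.append(base)
    return "".join(out)

print(scrub("Caf\u00e9 \u2665.mp3"))  # Cafe love.mp3
```

A wrapper script could catch `UnmappedCharacter`, print the code point, and exit with a nonzero status so that a batch file can test the error level and stop until the table is extended.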

But half the time i'm adding them to the mapping table, i can't view them. I have to paste them into google to see what they are. Would be great to be able to see the characters i'm working with.

But not just Japan. All of them.
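When the console can't draw a character, its official Unicode name can still be printed in plain ASCII, which avoids the round trip through a search engine. A minimal sketch using Python's standard `unicodedata` module; `identify` is a hypothetical helper, and the names available depend on the Unicode database bundled with the Python version in use.

```python
import unicodedata

def identify(text):
    """Return (code point, Unicode name) pairs; unnamed characters get a placeholder."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# Hypothetical sample: a hiragana letter, a heart symbol, and an emoji.
for code_point, name in identify("\u3042\u2665\U0001F600"):
    print(code_point, name)
# U+3042 HIRAGANA LETTER A
# U+2665 BLACK HEART SUIT
# U+1F600 GRINNING FACE
```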

Charles Dye · May 30, 2023

Have you tried Take Command? There's a trial period, so you can test it for free.

Take Command has a different set of issues, but at least you get font substitution.

ClioCJS · May 30, 2023

A different set of issues sounds way too scary for me :) I'm 35-years-set-in-my-ways here.
