WAD Limitations on display of unicode punctuation marks

Feb 23, 2012
240
3
I have many files whose names contain punctuation marks from the Hebrew Unicode Block; specifically, they include characters U+05F3 and U+05F4 (Hebrew single-quote and Hebrew double-quote, respectively. Both are valid characters in Windows filenames).
When I run "dir" on such files (in a Take Command tab), the two aforementioned characters always display as question marks. Similarly, if I try to paste these characters into the command line, they come out as question marks.
Yet I have verified that the font that my TC console is set to use (Miriam Fixed Regular) does indeed contain glyphs for these characters.
So, why would TC block the display of these valid characters? What causes TC to change a given character to a question mark? Just as I can display and type regular characters from the Hebrew Unicode Block in my console (with the Miriam Fixed font), I would expect to be able to display the aforementioned punctuation marks as well.
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
4,461
88
Albuquerque, NM
prospero.unm.edu
Be aware that there are two fonts involved when you're using Take Command: the TC tab font (Options / Configure Take Command / Tabs / Font), and the font used by the underlying console window (detach a tab to check this). The console window is invisible, but its font is still significant! Both must support your characters, or you'll get the question marks.
 
Feb 23, 2012
240
3
Hi Charles,
In my TCC detached tab, if I set my font to "Courier New", I can indeed see the Hebrew punctuation characters. However, in TC, even if the font is set to "Courier New", the Hebrew punctuation is displayed as question marks.
I wonder, therefore, what the difference is between the two cases; why is TC deciding to replace the character with question mark in these cases, even though the glyph does exist in the TC tab font?
- Avi

 
Feb 23, 2012
240
3
Hi Charles,
I think you've solved the issue for me. You were correct that my TCC.exe was defaulting to Raster Fonts. I have now changed tcc.exe to default to "Courier New", and now, all of a sudden, TC does indeed display the Hebrew punctuation marks that I referred to earlier; no more question marks!
Thank you for your help. It still seems a bit mysterious to me: why is TC limited to the characters supported by the current default font for tcc.exe? I'd be interested to hear the answer, in order to better understand how TC/TCC works.
By the way, when TCC.exe was set to use the default Raster Fonts, it didn't display Hebrew letters either (even though TC was displaying them). Yet it did map the Hebrew letters to high-ASCII characters (which were displayed as English characters with diacritics). So it seems that there is something a bit more complex going on here. TC did use its font to display Hebrew characters, even though tcc.exe couldn't display them, apparently because tcc.exe at least knew how to map them to high ASCII. In contrast, the Hebrew quote marks were simply question-mark characters, without any mapping to high ASCII.


If you detach a tab that's showing the issue, is it in fact using Courier New? Or is it perhaps defaulting to e.g. Raster Fonts?
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
4,461
88
Albuquerque, NM
prospero.unm.edu
Remember, Take Command doesn't replace the Windows console system; it hides it and provides a nicer interface, but the Windows console is still there, underneath.

I think what's happening is that when you try to print a character that isn't supported by the console font, the Windows console subsystem helpfully substitutes some character that is available. Sometimes it's a question mark, as you've seen; other times Windows substitutes a character that looks similar -- an unaccented Roman letter for an accented one, even a grave accent for a single open quote (Gah! -- fixed in Windows 7, thank goodness.)

Then when Take Command looks in the console buffer for characters to display, it finds the wrong character and dutifully displays it -- even if the correct one is available in the TC tab font. Take Command has no way of knowing that the character in the buffer wasn't the one you requested.
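
The substitution described above can be imitated outside the console. The following is only a rough sketch in Python, not what Take Command or Windows actually runs: it assumes that a raster-font console keeps its text in the active OEM code page (862, DOS Hebrew, in this thread), and the helper name `through_code_page` is made up for illustration. Python's codec replaces anything unmappable with a question mark; it does not reproduce the "looks similar" substitutions that Windows sometimes makes.

```python
# Assumption: a raster-font console stores text in the active OEM code page
# (cp862 here). Python replaces unmappable characters with '?'; it does NOT
# imitate the Windows "best fit" lookalike substitutions.

def through_code_page(text: str, code_page: str = "cp862") -> str:
    """Encode to a single-byte code page and decode again, losing what does not fit."""
    return text.encode(code_page, errors="replace").decode(code_page)

ALEF = "\u05d0"       # Hebrew letter alef, present in cp862
GERESH = "\u05f3"     # Hebrew punctuation geresh, absent from cp862
GERSHAYIM = "\u05f4"  # Hebrew punctuation gershayim, absent from cp862

# The letter survives the round trip; the punctuation marks do not.
survived = through_code_page(ALEF)
lost = through_code_page(GERESH + GERSHAYIM)

# The byte that cp862 uses for alef, drawn with a Western OEM glyph set
# (cp437), comes out as an accented Latin letter.
alef_as_cp437 = ALEF.encode("cp862").decode("cp437")

print(repr(survived), repr(lost), repr(alef_as_cp437))
```

Once the character has been replaced in the buffer, nothing reading the buffer afterwards can tell what was originally requested, which is the situation Take Command would be in.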
 
Feb 23, 2012
240
3
Hi Charles,
Thanks for your explanation. This makes a lot of sense, and fits the various phenomena that I have been seeing. As you note, it's not always a question mark. For instance, the "per mille" sign (U+2030; it looks like this: ‰) was being displayed as a percent sign, even though my TC font contained a proper "per mille" sign. The percent sign looks similar, but is certainly not the same thing, and it is alarming to consider how the console changes characters in this way.
However, now that I have switched the console to my Hebrew-based "Courier New", I have found an interesting phenomenon: my TC console now displays *all* Unicode characters (as far as I can tell), including many characters found neither in my TCC font nor in my TC font.
Take, for instance, the "postal mark face" (U+3020; it looks like this: 〠). There is no glyph for this character in my TCC font (Courier New) nor in my TC font (Miriam Fixed). In TCC.exe, it appears as an empty box. However, in TC, I see it perfectly; presumably it is using the default Unicode glyph for that character.
So, my conclusion so far is the following:
1] when the console is set to "Raster Fonts", it is limited to the characters in the raster font, or perhaps, it is limited to the 256 characters in the code page.
2] on the other hand, once the console is set to a truetype font, it preserves the full range of unicode characters within the console buffer, even if they cannot be displayed in the current console font, and even if the current console code page does not have a placemarker for them.
3] Within TC, when retrieving a character from the console buffer for which a glyph does not exist within the current TC font, the corresponding default Unicode glyph is displayed instead. (Presumably this switch happens at the level of the ScriptTextOut or TextOut API.)
Does this make sense to you? Do you reproduce my results?
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
4,461
88
Albuquerque, NM
prospero.unm.edu
You may well be right; I can display that "postal face" even though it doesn't occur in my tab font (Consolas) either. Perhaps Windows is substituting a different font that does contain that character? Rex might know; I sure don't....
 
Nov 2, 2009
294
6
Chile
www.farah.cl

I guess Windows is handling composite fonts internally, to be able to show a particular code point.

Can you share the relevant directives and configuration from your system? I'd like to give this a try as well.
 
Feb 23, 2012
240
3
Tcc.exe - Defaults/Font set to "Courier New"
TC - Font set to "Miriam Fixed Regular"
O/S: Windows 7 64-bit.
USP10.dll version: 1.626.7601.17514 (assuming that calls to display text in TC are going through the ScriptTextOut API, and therefore through usp10.dll. Is this correct, Rex?)
mfarah - What other directives and configuration items would you like to know?

 
Nov 2, 2009
294
6
Chile
www.farah.cl

Avi, I assume you've set the code page to 65001 as well.
 
Feb 23, 2012
240
3
No, actually, I'm using my default code page (862 = DOS Hebrew). Oddly, the enabling of Unicode characters within the command prompt hinges entirely on the selected font, rather than on the code page. In any case, 65001 refers specifically to the UTF-8 encoding, rather than to Unicode in general.
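
To illustrate the distinction between UTF-8 and Unicode in general, here is a small Python sketch. It assumes (this is not confirmed anywhere in the thread) that a TrueType console keeps its buffer as UTF-16 code units, which is the encoding used by the Windows wide-character APIs; the helper `fits_code_page` is a made-up name.

```python
# U+3020 POSTAL MARK FACE, the character mentioned earlier in the thread.
POSTAL_MARK_FACE = "\u3020"

# Code page 65001 means "bytes are UTF-8": one particular encoding of Unicode.
utf8_bytes = POSTAL_MARK_FACE.encode("utf-8")

# Assumption: the console buffer holds UTF-16 code units regardless of the
# active code page, so the same code point is stored in a different form.
utf16_bytes = POSTAL_MARK_FACE.encode("utf-16-le")

def fits_code_page(ch: str, code_page: str = "cp862") -> bool:
    """Return True if the character has a slot in the given single-byte code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

print(utf8_bytes.hex(), utf16_bytes.hex())
print(fits_code_page(POSTAL_MARK_FACE), fits_code_page("\u05d0"))
```

Under that assumption, code page 862 having no slot for a character would only matter when text is converted to or from bytes, which would fit with the font, rather than the code page, deciding what survives.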
 