WAD Limitations on display of Unicode punctuation marks

Feb 23, 2012
238
3
#1
I have many files whose names contain punctuation marks from the Hebrew Unicode block; specifically, they include the characters U+05F3 and U+05F4 (the Hebrew single-quote and double-quote marks, geresh and gershayim, respectively). Both are valid characters in Windows filenames.
When I run "dir" on such files (in a Take Command tab), the two aforementioned characters always display as question marks. Similarly, if I try to paste these characters into the command line, they come out as question marks.
Yet I have verified that the font my TC console is set to use (Miriam Fixed Regular) does indeed contain glyphs for these characters.
So why would TC block the display of these valid characters? What causes TC to change a given character to a question mark? Just as I can display and type regular characters from the Hebrew Unicode block in my console (with the Miriam Fixed font), I would expect to be able to display the aforementioned punctuation marks as well.
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
3,376
39
Albuquerque, NM
prospero.unm.edu
#2
Be aware that there are two fonts involved when you're using Take Command: the TC tab font (Options / Configure Take Command / Tabs / Font), and the font used by the underlying console window (detach a tab to check this). The console window is invisible, but its font is still significant! Both must support your characters, or you'll get the question marks.
 
#3
Hi Charles,
In my TCC detached tab, if I set my font to "Courier New", I can indeed see the Hebrew punctuation characters. However, in TC, even if the font is set to "Courier New", the Hebrew punctuation is displayed as question marks.
I wonder, therefore, what the difference is between the two cases; why is TC deciding to replace the character with question mark in these cases, even though the glyph does exist in the TC tab font?
- Avi

 
#5
Hi Charles,
I think you've solved the issue for me. You were correct that my TCC.exe was defaulting to Raster Fonts. I have now changed tcc.exe to default to "Courier New", and now, all of a sudden, TC does indeed display the Hebrew punctuation marks that I referred to earlier; no more question marks!
Thank you for your help. It still seems a bit mysterious to me: why is TC limited to the characters supported by the current default font for tcc.exe? I'd be interested to hear the answer, in order to better understand how TC/TCC works.
By the way, when TCC.exe was set to use the default Raster Fonts, it didn't display Hebrew letters either (even though TC was displaying them). Yet it did map the Hebrew letters to high-ASCII characters (which were displayed as English letters with diacritics). So it seems that something a bit more complex is going on here. TC did use its font to display Hebrew characters, even though tcc.exe couldn't display them, apparently because tcc.exe at least knew how to map them to high ASCII. In contrast, the Hebrew quote marks were simply question-mark characters, without any mapping to high ASCII.
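The high-ASCII mapping described above can be reproduced outside the console. What follows is only a rough Python sketch, not what TCC or Windows actually runs. It assumes the text was being converted to code page 862 and that the raster font draws those bytes with code page 437 glyphs; both are guesses about this particular setup.

```python
# Sketch: how Hebrew text fares in a 256-slot code page (assumed: cp862).
hebrew_letters = "\u05d0\u05d1\u05d2"   # alef, bet, gimel
hebrew_quotes = "\u05f3\u05f4"          # geresh, gershayim

# Hebrew letters have slots in cp862, starting at byte 0x80.
letter_bytes = hebrew_letters.encode("cp862")

# Drawn with cp437 glyphs (an assumption), the same bytes come out
# as Latin letters with diacritics.
as_cp437 = letter_bytes.decode("cp437")

# The two punctuation marks have no slot at all, so a lossy
# conversion can only fall back to '?'.
quote_bytes = hebrew_quotes.encode("cp862", errors="replace")

print(letter_bytes)   # b'\x80\x81\x82'
print(as_cp437)       # Çüé
print(quote_bytes)    # b'??'
```

This would explain why the letters survived in a disguised form while the quote marks were lost outright: the letters still occupy real byte values, whereas the quote marks have nothing to map to.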


If you detach a tab that's showing the issue, is it in fact using Courier New? Or is it perhaps defaulting to e.g. Raster Fonts?
 

Charles Dye

#6
Remember, Take Command doesn't replace the Windows console system; it hides it and provides a nicer interface, but the Windows console is still there, underneath.

I think what's happening is that when you try to print a character that isn't supported by the console font, the Windows console subsystem helpfully substitutes some character that is available. Sometimes it's a question mark, as you've seen; other times Windows substitutes a character that looks similar -- an unaccented Roman letter for an accented one, even a grave accent for a single open quote (Gah! -- fixed in Windows 7, thank goodness.)

Then when Take Command looks in the console buffer for characters to display, it finds the wrong character and dutifully displays it -- even if the correct one is available in the TC tab font. Take Command has no way of knowing that the character in the buffer wasn't the one you requested.
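The explanation above can be pictured with a toy model. This is a hedged Python sketch, not Take Command's or the Windows console's implementation; the function names and the glyph sets are hypothetical and exist only to show where the information is lost.

```python
# Toy model of the explanation above. All names and glyph sets are hypothetical.
CONSOLE_FONT = set("abcdefghijklmnopqrstuvwxyz .%?")   # lacks U+05F3
TAB_FONT = CONSOLE_FONT | {"\u05f3", "\u2030"}          # has the extra marks

def write_to_console(text, font=CONSOLE_FONT):
    """Store text in the buffer, substituting '?' for unsupported characters."""
    return "".join(ch if ch in font else "?" for ch in text)

def show_in_tab(buffer_text, font=TAB_FONT):
    """Draw whatever is in the buffer; the original text is not available here."""
    return "".join(ch if ch in font else "\u25a1" for ch in buffer_text)

original = "file\u05f3.txt"
buffer_text = write_to_console(original)

# The tab font could draw U+05F3, but it only ever receives the '?'.
print(show_in_tab(buffer_text))   # file?.txt
```

The point of the model is that the substitution happens on the way into the buffer, so anything that reads the buffer afterwards has no way to undo it.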
 
#7
Hi Charles,
Thanks for your explanation. This makes a lot of sense, and fits the various phenomena that I have been seeing. As you note, it's not always a question mark. For instance, the "per mille" sign (U+2030; it looks like this: ‰) was being displayed as a percent sign, even though my TC font contained a proper "per mille" sign. The percent sign looks similar, but is certainly not the same thing, and it is alarming to consider how the console changes characters in this way.
However, now that I have switched the console to my Hebrew-based "Courier New", I have found an interesting phenomenon: my TC console now displays *all* Unicode characters (as far as I can tell), including many characters found in neither my TCC font nor my TC font.
Take, for instance, the "postal mark face" (U+3020; it looks like this: 〠). There is no glyph for this character in my TCC font (Courier New) nor in my TC font (Miriam Fixed). In TCC.exe, it appears as an empty box. However, in TC, I see it perfectly; presumably it is using the default Unicode glyph for that character.
So, my conclusion so far is the following:
1] When the console is set to "Raster Fonts", it is limited to the characters in the raster font, or perhaps to the 256 characters in the code page.
2] On the other hand, once the console is set to a TrueType font, it preserves the full range of Unicode characters within the console buffer, even if they cannot be displayed in the current console font, and even if the current console code page has no slot for them.
3] Within TC, when retrieving a character from the console buffer for which a glyph does not exist within the current TC font, the corresponding default Unicode glyph is displayed instead. (Presumably this switch happens at the level of the ScriptOut or TextOut API.)
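The three points above can be summarized as a small model. This is a Python sketch of the hypothesis only, not confirmed Windows behavior: `buffer_contents`, `choose_font`, and the glyph coverage listed in `FONTS` are all made up for illustration, and code page 862 is assumed because that is the default on this system.

```python
def buffer_contents(text, console_font_kind, codepage="cp862"):
    """Points 1 and 2: a raster font squeezes text through the code page;
    a TrueType font keeps the Unicode text as it is."""
    if console_font_kind == "raster":
        return text.encode(codepage, errors="replace").decode(codepage)
    return text

def choose_font(ch, fonts):
    """Point 3: use the first font in the chain that has a glyph for ch."""
    for name, glyphs in fonts:
        if ch in glyphs:
            return name
    return None

# Hypothetical coverage, for illustration only.
FONTS = [
    ("Miriam Fixed", {"\u05d0", "\u05f3", "\u05f4"}),
    ("fallback font", {"\u3020"}),
]

print(buffer_contents("\u05d0\u05f3", "raster") == "\u05d0?")      # True
print(buffer_contents("\u3020", "truetype") == "\u3020")           # True
print(choose_font("\u3020", FONTS))                                 # fallback font
```

In this model the raster case loses the geresh but keeps the alef, while the TrueType case passes everything through and leaves glyph selection to a fallback chain.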
Does this make sense to you? Do you reproduce my results?
 

Charles Dye

#8
You may well be right; I can display that "postal face" even though it doesn't occur in my tab font (Consolas) either. Perhaps Windows is substituting a different font that does contain that character? Rex might know; I sure don't....
 
#9
I guess Windows is handling composite fonts internally, to be able to show a particular code point.

Can you share the relevant directives and configuration from your system? I'd like to give this a try as well.
 
#10
Tcc.exe - Defaults/Font set to "Courier New"
TC - Font set to "Miriam Fixed Regular"
O/S: Windows 7 64-bit.
USP10.dll version: 1.626.7601.17514 (assuming that calls to display text in TC are going through the ScriptOut API, and therefore through usp10.dll. Is this correct, Rex?)
mfarah - What other directives and configuration items would you like to know?

 
#11
Avi, I assume you've set the code page to 65001 as well.
 
#12
No, actually, I'm using my default code page (862 = DOS Hebrew). Oddly, the enabling of Unicode characters within the command prompt hinges entirely on the selected font, rather than on the code page. In any case, 65001 refers specifically to the UTF-8 encoding, rather than to Unicode in general.
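The distinction between Unicode and its encodings is easy to see with one of the characters in question. This is a minimal Python sketch that uses only the standard codecs; it says nothing about how the console itself stores text.

```python
geresh = "\u05f3"   # one Unicode code point

# The same code point, written out in two different Unicode encodings.
print(geresh.encode("utf-8"))       # b'\xd7\xb3'  (what code page 65001 means)
print(geresh.encode("utf-16-le"))   # b'\xf3\x05'

# Code page 862 is not a Unicode encoding and has no slot for it.
try:
    geresh.encode("cp862")
    in_cp862 = True
except UnicodeEncodeError:
    in_cp862 = False
print(in_cp862)                      # False
```

So a character can be fully representable as Unicode, in UTF-8 or UTF-16, while still being absent from the active 8-bit code page.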