Bugs in dealing with Chinese characters?

Dec 22, 2008
I have liked the products of JPSoft very much ever since 4DOS, especially now that 4DOS 7.50 and TCC/LE have been released for free. I often write scripts with them to do a lot of work.

However, I have recently found several strange behaviors in TCC/LE, TCC, and 4NT v8.0+ that I never saw in 4NT 4.00. Can anyone explain why they happen and how to make these commands work correctly?

I am using Windows XP with code page 936 (Simplified Chinese, GBK). I would like to know what is behind these strange results, and whether they are intentional or bugs in the handling of Chinese characters. Even if they are bugs, I hope to find temporary workarounds. I list some of my discoveries below.


Assume that we have the following test file containing Chinese characters (this file is GBK encoded):

----file:test.txt----
Alt+H Bksp X C D E J R O M S A T L
F1/Ctrl-F1:4NT帮助;PgUp:命令历史;Ctrl-PgUp:文件夹历史;Up/Dn:以前命令;
Ctrl-X:展开变量;Ctrl-F:展开别名;Ctrl-A:长短文件名互换;Tab:补齐文件名.
-------------------

Then, let's try the following actions:

1: Run "type test.txt". The output should match the contents of test.txt, but the display is:
-------------------
Alt+H Bksp X C D E J R O M S A T L
F1/Ctrl-F1:4NT帮助;PgUp:命令历史;Ctrl-PgUp:文件夹历史;Up/Dn:以前命令;
Ctrl-X:展开变量;Ctrl-F:展开别名;Ctrl-A:长短文件名互换;Tab:补齐文件名.
簰鎹?Tab:琛ラ綈鏂囦欢鍚?
-------------------
Note the strange characters appended at the end. Different runs of this command sometimes produce different results. Why?

2: Run "list test.txt". The display should match the contents of the file. However, on screen (see the attached test-list-result.jpg) all Chinese characters are missing. At first I guessed that they had been replaced with spaces, but copying the text from the screen shows otherwise. Here is the text I copied:
-------------------
Alt+H Bksp X C D E J R O M S A T L
F1/Ctrl-F1:4NT

;PgUp:



;Ctrl-PgUp:




;Up/Dn:



;
Ctrl-X:



;Ctrl-F:



;Ctrl-A:






;Tab:




.
-------------------
The results above indicate that the Chinese characters are replaced with invisible characters rather than spaces.

3: Now let's test whether TCC can display Chinese characters correctly. I change to a directory containing folders with Chinese names and run "DIR /W". The Chinese characters are displayed correctly, but the second column is not aligned properly. Here is part of the display:
-------------------
[_tools] [列车时刻表]

[文革] [极点微风]
-------------------
Note that all Chinese characters are shown correctly. Strangely, though, a blank line is inserted between the two lines containing Chinese characters, and in the second of those lines the right column is not aligned with the right column of the first.
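One possible factor in the misalignment (an assumption on my part, not something confirmed for TCC) is that each Chinese character occupies two console columns while counting as a single character. The Python sketch below compares the two measures for the folder names above; the helper name `display_width` is purely illustrative.

```python
import unicodedata

def display_width(text: str) -> int:
    """Estimate console columns: wide/fullwidth characters count as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

# Folder names taken from the DIR /W output above.
for name in ["[_tools]", "[列车时刻表]", "[文革]", "[极点微风]"]:
    print(f"{name}: {len(name)} characters, about {display_width(name)} columns")
```

If a program pads columns by character count instead of by display width, entries with Chinese names would end up wider than expected, which would look like the result above.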

4: Now let's see whether the frequently used ECHO command works correctly. I run the following command:

for %a in (@test.txt) do echo "%a"

Then I get the following strange results:
-------------------
"Alt+H Bksp X C D E J R O M S A T L"
"F1/Ctrl-F1:4NT帮助;PgUp:命令历史;Ctrl-PgUp:文件夹历史;Up/Dn:以前命令;"
"/Dn:以前命令;"
"?"
""
"Ctrl-X:展开变量;Ctrl-F:展开别名;Ctrl-A:长短文件名互换;Tab:补齐文件名."
"换;Tab:补齐文件名."
"?"
""
-------------------
Note that the results above contain extra lines. These lines may repeat Chinese characters from the end of the previous line, may contain a stray "?", or may even be empty.
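As a clue that may or may not be relevant, the second line of test.txt is longer in GBK bytes than in characters, because every Chinese character takes two bytes. The Python sketch below only measures that difference; whether TCC actually confuses the two lengths is my guess, not something I have verified.

```python
# Second line of test.txt, as listed above.
line = "F1/Ctrl-F1:4NT帮助;PgUp:命令历史;Ctrl-PgUp:文件夹历史;Up/Dn:以前命令;"

char_count = len(line)                              # length in Unicode characters
byte_count = len(line.encode("gbk"))                # length in bytes as stored on disk
chinese_count = sum(1 for ch in line if ord(ch) > 0x7F)

print("characters:", char_count)
print("GBK bytes: ", byte_count)
print("Chinese characters:", chinese_count)
print("surplus bytes:", byte_count - char_count)
```

The surplus equals the number of Chinese characters in the line. A reader that converted the line to Unicode but kept using the byte length would run past the end of the converted text, which would be consistent with the repeated tail fragments shown above.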

And here are the results of the command:
for %a in (@test.txt) do echo %a
-------------------
Alt+H Bksp X C D E J R O M S A T L
F1/Ctrl-F1:4NT帮助;PgUp:命令历史;Ctrl-PgUp:文件夹历史;Up/Dn:以前命令;
/Dn:以前命令;
?
ECHO is OFF
Ctrl-X:展开变量;Ctrl-F:展开别名;Ctrl-A:长短文件名互换;Tab:补齐文件名.
换;Tab:补齐文件名.
?
ECHO is OFF
-------------------

5: Now let's check whether Chinese chars can be redirected or piped correctly.

I run "type test.txt > aa.txt" and get the expected result: file aa.txt is identical to file test.txt. This is good!

Next I run "type test.txt | y > bb.txt" and find that file bb.txt differs from file test.txt! The file bb.txt contains the following lines:
-------------------
Alt+H Bksp X C D E J R O M S A T L
F1/Ctrl-F1:4NT°??ú;PgUp:?üá?àúê·;Ctrl-PgUp:???t?Dàúê·;Up/DnCtrl-X:?1?a±?á?;Ctrl-F:?1?a±e??;Ctrl-A:3¤?ì???t???¥??;Tab:21?????
-------------------
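The contents of bb.txt look like GBK bytes that were read one byte at a time as Western characters. The Python sketch below reproduces that effect for the word 帮助 from the second line. Using Latin-1 as the wrong code page is my assumption; I do not know which conversion the pipe really applies.

```python
original = "帮助"

raw = original.encode("gbk")        # the bytes as stored in test.txt
misread = raw.decode("latin-1")     # every byte treated as one Western character

print("GBK bytes:", raw.hex())
print("misread:  ", misread)

# Latin-1 maps every byte to exactly one character, so this misreading
# is reversible as long as no character is replaced along the way.
restored = misread.encode("latin-1").decode("gbk")
print("restored: ", restored)
```

In bb.txt some of the characters have additionally become "?", so there the damage is not reversible.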

6: Now let's check whether we can generate a text file by appending lines. I run the following commands:

del cc.txt
for %a in (@test.txt) do (echo %a >> cc.txt)

And I find that file cc.txt contains the following strange lines:
-------------------
Alt+H Bksp X C D E J R O M S A T L
F1/Ctrl-F1:4NT帮助;PgUp:命令历史;Ctrl-PgUp:文件夹历史;Up/Dn:以前命:?
ECHO is OFF
ECHO is OFF变量;Ctrl-F:展开别名;Ctrl-A:长短文件名互换;Tab:补齐文窿?
E J R O M
-------------------
Note that we once again see the kind of strange output shown in 4, although the results are not exactly the same as in 4.

7: Now let's check whether the @line function works correctly. I run the following commands:

echo %@lines[test.txt]
for /L %i in (0,1,%@lines[test.txt]) do echo %@line[test.txt,%i]

I got the following results:

-------------------
8
Alt+H Bksp X C D E J R O M S A T L
F1/Ctrl-F1:4NT帮助;PgUp:命令历史;Ctrl-PgUp:文件夹历史;Up/Dn:以前命令;
/Dn:以前命令;
?
ECHO is OFF
Ctrl-X:展开变量;Ctrl-F:展开别名;Ctrl-A:长短文件名互换;Tab:补齐文件名.
换;Tab:补齐文件名.
?
ECHO is OFF
-------------------

Note that the results above report %@lines[test.txt]=8, which is not correct: test.txt contains only 3 lines. They also show that %@line does not always return the correct text.
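As an independent cross-check of the line count, the Python sketch below writes the three lines of test.txt to a temporary GBK-encoded file and counts the lines when reading it back. The CRLF line endings and the temporary file are assumptions made for the illustration.

```python
import os
import tempfile

lines = [
    "Alt+H Bksp X C D E J R O M S A T L",
    "F1/Ctrl-F1:4NT帮助;PgUp:命令历史;Ctrl-PgUp:文件夹历史;Up/Dn:以前命令;",
    "Ctrl-X:展开变量;Ctrl-F:展开别名;Ctrl-A:长短文件名互换;Tab:补齐文件名.",
]

# Write the content as GBK bytes with CRLF line endings (assumed).
data = "".join(line + "\r\n" for line in lines).encode("gbk")
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(data)
    path = tmp.name

# Read the file back as GBK text and count its lines.
with open(path, encoding="gbk") as f:
    line_count = sum(1 for _ in f)
os.remove(path)

print("lines:", line_count)
```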

8: We can also run the same test with a DO loop. I run the following batch file test-do.btm:
-------------------
rem file test-do.btm
@echo off
do a in @test.txt
echo %a
enddo
-------------------

Again, the results are strange:
-------------------
Alt+H Bksp X C D E J R O M S A T L
F1/Ctrl-F1:4NT帮助;PgUp:命令历史;Ctrl-PgUp:文件夹历史;Up/Dn:以前命令;
/Dn:以前命令;
?
ECHO is OFF
Ctrl-X:展开变量;Ctrl-F:展开别名;Ctrl-A:长短文件名互换;Tab:补齐文件名.
换;Tab:补齐文件名.
?
ECHO is OFF
-------------------

9: Finally, I find a rather strange problem in my batch scripts. To demonstrate this problem, let's run the following batch file:
-------------------
rem file: test-msg.btm
@echo off
alias msg=`if #%_codepage==#%1 (set _msg=%2$)`
msg 437 Do you want to install font xxx? Please choose: //Copy From Local Folder/Extract From Font Archive/Download and Install/Search in CTAN by Net_pkg/Cancel
msg 936 是否安装字体xxx?请选择://从文件夹拷贝字体/从字体压缩包解压字体/在线下载并安装字体/用Net_pkg搜索CTAN/取消
msgbox OKCANCEL %_msg
-------------------
This script is meant to display a different message depending on the current code page. The current code page is 936, so it should display
-------------------
是否安装字体xxx?请选择://从文件夹拷贝字体/从字体压缩包解压字体/在线下载并安装字
体/用Net_pkg搜索CTAN/取消
-------------------

But, the result is not correct. In fact it displays the following message:
-------------------
是否安装字体xxx?请选择://从文件夹拷贝字体/从字体压缩包解压字体/在线下载并安装字
体/用Net_pkg搜索CTAN/取消m Local Folder/Extract From Font Archive/
-------------------

Note that extra characters are appended to the message. They appear to be a fragment of the English message from the preceding msg line. Why does this happen? I cannot make sense of it. This script works well under the earlier version (4.00) of 4NT.



In summary, operations involving Chinese characters run into strange problems that I never saw in the earlier 4NT. I guess they may be caused by the internal conversion between ANSI and Unicode characters, or by some other character-processing mechanism in TCC that I do not know about. Can anybody explain the above results and suggest ways to work around these critical problems? Many thanks!
 
