TYPE command issue

#1
WinXP SP3, TCC 10.00.076, 09.02.157

When _unicode=1 and a large file is written to disk using the TYPE command,
e.g.,

type file1 > file2

there appears to be an output buffer overflow at output offsets 0x0001
e000 - 0x0002 0000, 0x0003 e000 - 0x0004 0000, repeating every 0x0002 0000
(I used spaces in the middle of the 32-bit offsets for readability). In
those areas the output file contains only repeated 0x0000 characters. The
output files for V9 and V10 are identical, both showing the same problem.
The problem is the same whether the source file is ASCII or Unicode. There
is no problem if the target is ASCII (i.e., _unicode=0), regardless of the
type of the source file (Unicode or ASCII).
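The offsets above fit a simple rule: in every 0x20000-byte block of output, the last 0x2000 bytes come out as NULs. The C function below is a sketch of that rule. It lists the byte ranges the rule predicts for an output file of a given size. The name `predict_nul_ranges` and the `base` parameter are hypothetical. `base` allows for a 2-byte BOM at the start of the file. Only complete 0x20000-byte blocks are considered, since nothing here says what happens to a final partial block.

```c
#include <assert.h>
#include <stddef.h>

/* One half-open byte range [start, end) in the output file. */
struct range {
    size_t start;
    size_t end;
};

/* Lists the ranges predicted to be all NULs if the last `tail` bytes of
 * every complete `chunk`-byte block are lost. `base` is the offset where
 * the data begins (0, or 2 to skip a BOM). At most `max` ranges are
 * stored in `out`; the number stored is returned. */
size_t predict_nul_ranges(size_t file_size, size_t base, size_t chunk,
                          size_t tail, struct range *out, size_t max)
{
    assert(chunk > 0 && tail <= chunk);
    size_t n = 0;
    for (size_t blk = base; blk + chunk <= file_size && n < max; blk += chunk) {
        out[n].start = blk + chunk - tail;
        out[n].end = blk + chunk;
        n++;
    }
    return n;
}
```

Suppose `base` is 0, the chunk is 0x20000 and the tail is 0x2000. A 0x40000-byte file then yields [0x1E000, 0x20000) and [0x3E000, 0x40000), the first two ranges listed above.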

I discovered the problem when I was trying to convert a large ASCII file to
Unicode.
--
Steve
 

rconn

Administrator
Staff member
May 14, 2008
#2
TYPE doesn't buffer its output, so that would have to be a Windows problem.
 
#4
rconn wrote:
| TYPE doesn't buffer its output, so that would have to be a Windows
| problem.

When I said "buffer overflow", I did not mean a buffer within TCC.EXE. Since
all disk output is buffered, the issue may be that TCC's TYPE command is
too fast and does not throttle its output when the OS cannot keep up.
Regardless, since I only access the OS through TCC, it shows up here as a
TCC problem, especially in view of Vince's report that CMD.EXE does not have
this problem. I hope you will find a work-around soon and release a new build
of V10.
--
Steve
 
#5
I see the same thing. It does not happen when I start CMD.EXE with "/U" and run type file.txt > ufile.txt. I get a Unicode file without a single 0x0000.
I don't know what the following means, but it does sort of characterize the error.

Using WinDbg, I set a breakpoint on each MultiByteToWideChar (MBTWC) call and executed "type file.txt > ufile.txt". I see that MultiByteToWideChar is called every 131072 (0x20000) bytes. When those bytes are written to the destination file, the last 8 KB (0x2000 bytes) are NULs, every time. I once saw MBTWC called after exactly 120 KB and then again after exactly 8 KB, but that was well after the error had happened several times.
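One way to check a converted file for this damage is to scan it in 2-byte units for long runs of U+0000, which ordinary UTF-16LE text should not contain. The sketch below assumes the whole file has already been read into memory. `find_nul_runs` is a hypothetical helper, not part of TCC or Windows. On the files described in this thread, one would expect a 0x2000-byte run near the end of each 0x20000-byte block, and no runs at all in the CMD /U output.

```c
#include <assert.h>
#include <stddef.h>

/* Scans a UTF-16LE buffer (BOM included) for runs of U+0000 code units
 * at least `min_units` long. Units are read at even byte offsets; a
 * trailing odd byte is ignored. For each qualifying run, the starting
 * byte offset and byte length go to offs[] and lens[] (at most `max`
 * entries). Returns the total number of qualifying runs. */
size_t find_nul_runs(const unsigned char *buf, size_t nbytes, size_t min_units,
                     size_t *offs, size_t *lens, size_t max)
{
    assert(min_units > 0);
    size_t found = 0, run_start = 0, run = 0;
    for (size_t i = 0; ; i += 2) {
        int is_nul = (i + 1 < nbytes) && buf[i] == 0 && buf[i + 1] == 0;
        if (is_nul) {
            if (run == 0)
                run_start = i;
            run++;
            continue;
        }
        /* A run (possibly empty) has just ended. */
        if (run >= min_units) {
            if (found < max) {
                offs[found] = run_start;
                lens[found] = run * 2;
            }
            found++;
        }
        run = 0;
        if (i + 1 >= nbytes)
            break;
    }
    return found;
}
```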
 
#6
vefatica wrote:
| I see the same thing. It does not happen when I start CMD.EXE with
| "/U" and type file.txt > ufile.txt. I get a unicode file without a
| single 0x0000.
| I don't know what it (below) means, but it does sort of characterize
| the error.
|
| Using WinDbg, breaking on each MultiByteToWideChar, and executing
| "type file.txt > ufile.txt", I see that MultiByteToWideChar is
| called every 131072 bytes. When those bytes are written to the
| destination file, the last 8K bytes are NULs, every time. I once
| saw MBTWC called after (exactly) 120KB and then again after
| (exactly) 8KB, but that was well after the error had happened
| several times.

It seems to me the work-around would be to always pad the data to be converted
with 8 KiB of NULs and strip them before writing to the destination file.
That certainly cannot be done from the TCC command line. It would need to be
done internally and dynamically by TakeCommand.dll.
--
Steve
 
#7
That sounds like a god-awful kludge, and it assumes there is a problem and that the problem will always be the same. I doubt a **kludge** is necessary at all. I tried a couple of bare-bones experiments (code below). They certainly do not duplicate the redirection, but they apparently use the same buffer sizes as TCC. One uses Win32, the other uses the C runtime. The tests were blisteringly fast (0.04 sec total for 2.2 MB of input). The files created were identical and error-free.
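Code:
    #include <windows.h>
    #include <stdio.h>

    int wmain(void)
    {
        // Test 1, Win32: read 64 KB of ANSI text per pass and write the resulting
        // 128 KB (0x20000 bytes) of UTF-16, apparently the chunk size TCC uses.
        static CHAR aBuf[65536];            // must hold a full 65536-byte read
        static WCHAR uBuf[65536];
        WCHAR BOM = 0xFEFF;
        DWORD dwRead, dwWritten;
        int nWide;
        SetCurrentDirectoryW(L"e:\\4ntlogs");
        HANDLE hIn = CreateFileW(L"history.txt", GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
        HANDLE hOut = CreateFileW(L"uhistory.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0, NULL);
        if (hIn == INVALID_HANDLE_VALUE || hOut == INVALID_HANDLE_VALUE)
            return 1;
        WriteFile(hOut, &BOM, sizeof BOM, &dwWritten, NULL);
        while ( ReadFile(hIn, aBuf, sizeof aBuf, &dwRead, NULL) && dwRead != 0 )
        {
            // Write as many WCHARs as MultiByteToWideChar actually produced
            nWide = MultiByteToWideChar(CP_ACP, 0, aBuf, (int)dwRead, uBuf, 65536);
            WriteFile(hOut, uBuf, (DWORD)(nWide * sizeof(WCHAR)), &dwWritten, NULL);
        }
        CloseHandle(hIn);
        CloseHandle(hOut);

        // Test 2, C runtime: same input file and buffer sizes, via fread/fwrite
        FILE *fIn = _wfopen(L"history.txt", L"rb");
        FILE *fOut = _wfopen(L"uhistory2.txt", L"wb");
        if (fIn == NULL || fOut == NULL)
            return 1;
        size_t read;
        fwrite(&BOM, sizeof BOM, 1, fOut);
        while ( (read = fread(aBuf, 1, sizeof aBuf, fIn)) > 0 )   // stops at EOF or on error
        {
            nWide = MultiByteToWideChar(CP_ACP, 0, aBuf, (int)read, uBuf, 65536);
            fwrite(uBuf, sizeof(WCHAR), nWide, fOut);
        }
        fclose(fIn);
        fclose(fOut);
        return 0;
    }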
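The Win32 test depends on MultiByteToWideChar and CP_ACP, so it only runs on Windows. The sketch below is a portable stand-in for the same experiment. It widens single-byte text into a memory buffer 65536 input bytes at a time, which gives 0x20000 output bytes per pass. It assumes Latin-1 input, where each byte is the code point of the same value. That substitution replaces MultiByteToWideChar and is not what TCC or CMD does. `widen_chunked` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define IN_CHUNK 65536u   /* input bytes per pass, as in the test above */

/* Writes a UTF-16LE BOM followed by `in` widened as Latin-1, processing
 * IN_CHUNK input bytes per pass. `out` must hold 2 + 2 * n bytes.
 * Returns the number of bytes written. */
size_t widen_chunked(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t w = 0;
    out[w++] = 0xFF;                  /* BOM U+FEFF, little-endian */
    out[w++] = 0xFE;
    for (size_t done = 0; done < n; ) {
        size_t len = n - done < IN_CHUNK ? n - done : IN_CHUNK;
        /* Each pass emits exactly 2 * len bytes, so nothing is dropped
         * at a chunk boundary. */
        for (size_t i = 0; i < len; i++) {
            out[w++] = in[done + i];  /* low byte: the Latin-1 code point */
            out[w++] = 0;             /* high byte */
        }
        done += len;
    }
    assert(w == 2 + 2 * n);
    return w;
}
```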
 
#8
vefatica wrote:
| That's a god-awful sounding kludge and it assumes there is a problem
| and that the problem will always be the same. I doubt a **kludge**
| is necessary at all. I tried a couple of bare-bones experiments
| (code below) that certainly do not duplicate the redirection, but
| use the same buffer sizes (apparently) as TCC. One uses Win32, one
| uses C. The tests were blisteringly fast (total 0.04 sec for 2.2MB
| input). The files created were identical and error free.

Could you try something on the order of 32 MB (ASCII)? That's the size of the
file I had a problem converting to Unicode. Actually, it was intended as a
test: start with Unicode, use TYPE to convert it to ASCII, use TYPE again to
convert it back to Unicode, and compare the results. The NULs proved that the
result will not be a usable file at all until something is fixed.
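The last step of that round trip compares the original file with the twice-converted one. The sketch below assumes both files have already been read into memory. `first_difference` is a hypothetical helper that returns the offset of the first byte that differs. Assume the ASCII step is lossless and the damage is as described in this thread. The first difference would then be expected near offset 0x1E000, at the start of the first NUL region.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset of the first byte at which `a` and `b` differ.
 * If one buffer is a prefix of the other, returns the shorter length.
 * The buffers are identical when the result equals na and na == nb. */
size_t first_difference(const unsigned char *a, size_t na,
                        const unsigned char *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return i;
    return n;
}
```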
--
Steve
 
#9
On Wed, 23 Sep 2009 21:48:20 -0500, Steve Fábián <> wrote:

|Could you try something on the order of 32 MB (ASCII)? That's the size of
|file I had problem converting to unicode. Actually, it was intended as a
|test - start with unicode, use TYPE to convert it to ASCII, and use TYPE
|again to reconvert to unicode, compare results. The NULs proved that the
|result will not be a usable file at all until something is fixed.

Why? The problem shows itself, with striking regularity and exactly as you
described it, using a 2MB file.
--
- Vince
 
#10
Again with WinDbg, this time breaking on WriteFile: after the BOM is written, the first call to WriteFile is asked to write data from buffer XXX ... All 0x20000 bytes of the data are correctly at that location, but WriteFile is asked to write only 0x1E000 bytes. WriteFile is then called again to write (the remaining?) 0x2000 bytes. However, the buffer address passed to it is not where the correct data is (in the previously specified buffer). It is some other location containing all NULs. The actual numbers I see are:

Write 0x1E000 bytes located at 0x00D553B4 (correct 0x20000 bytes there)
Write 0x02000 bytes located at 0x00D913B4 (NULs there).
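The two addresses are worth working out. Continuing correctly after the first write would mean starting at 0x00D553B4 + 0x1E000 = 0x00D733B4. The second write instead starts at 0x00D913B4, which is 0x3C000 bytes (exactly twice 0x1E000) past the start of the buffer. That is what one would get if a byte count were added to a pointer to 2-byte WCHARs, which moves it 0x1E000 characters rather than 0x1E000 bytes. This is only an inference from the numbers, not something confirmed from TCC's source. The C sketch below merely reproduces the arithmetic, with uint16_t standing in for WCHAR. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models 32-bit pointer arithmetic: adding `count` to a pointer whose
 * elements are `elem_size` bytes wide moves it count * elem_size bytes. */
uint32_t advance(uint32_t addr, uint32_t count, uint32_t elem_size)
{
    assert(elem_size > 0);
    return addr + count * elem_size;
}

/* The same effect with a real C pointer; uint16_t stands in for WCHAR.
 * `count` must not exceed the number of elements in `buf`. */
size_t bytes_moved_by_wchar_ptr(const uint16_t *buf, size_t count)
{
    const uint16_t *p = buf + count;   /* scaled by sizeof *buf, i.e. 2 */
    return (size_t)((const unsigned char *)p - (const unsigned char *)buf);
}
```

If that reading is right, the remainder should be taken from a byte pointer (or the count divided by sizeof(WCHAR)) before it is passed to WriteFile.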
 
#11
vefatica wrote:
| Again with WinDbg, this time breaking on WriteFile, after the BOM is
| written, the first call to WriteFile says to write data at buffer
| XXX ... all 0x20000 of the data is correctly at this location, but
| WriteFile is asked to write only 0x1E000 bytes. Then WriteFile is
| called again, asked to write (the remaining?) 0x2000 bytes but the
| buffer location passed to WriteFile is not where the correct data is
| (in the previously spec'd buffer) but rather at some other location
| containing all NULs. The actual numbers I see are:
|
| Write 0x1E000 bytes located at 0x00d553B4 (correct 0x20000 bytes
| there)
| Write 0x02000 bytes located at 0x00d913B4 (NULs there).

Two questions:
1/ what is the program (I presume TCC)?
2/ what is the call hierarchy to get to the WriteFile calls?

For the 2nd question, I would not be surprised if TCC correctly calls a
high-level WinAPI, which in turn does the above mismanipulation...


From the results of your investigation it is obvious that my request to try
really huge files was totally unnecessary.
--
Steve
 
#12
On Thu, 24 Sep 2009 10:11:32 -0500, Steve Fábián <> wrote:

|vefatica wrote:
|| Again with WinDbg, this time breaking on WriteFile, after the BOM is
|| written, the first call to WriteFile says to write data at buffer
|| XXX ... all 0x20000 of the data is correctly at this location, but
|| WriteFile is asked to write only 0x1E000 bytes. Then WriteFile is
|| called again, asked to write (the remaining?) 0x2000 bytes but the
|| buffer location passed to WriteFile is not where the correct data is
|| (in the previously spec'd buffer) but rather at some other location
|| containing all NULs. The actual numbers I see are:
||
|| Write 0x1E000 bytes located at 0x00d553B4 (correct 0x20000 bytes
|| there)
|| Write 0x02000 bytes located at 0x00d913B4 (NULs there).
|
|Two questions:
|1/ what is the program (I presume TCC)?

Yes, TCC.

|2/ what is the call hierarchy to get to the WriteFile calls?

If the stack info is correct,
takecmd.dll:Type_Cmd -> takecmd.dll:wwriteXP -> WriteFile
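Nothing in this thread shows what wwriteXP actually does. Still, the usual way to push a large buffer through a writer that may accept only part of it per call is to keep a byte pointer and advance it by the number of bytes actually written. The portable C sketch below shows that pattern. `write_all`, `sink_fn` and `capped_sink` are hypothetical stand-ins for WriteFile and a caller, not TCC's code. The example sink accepts at most 0x1E000 bytes per call to mimic the split seen in the debugger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A writer that may accept fewer bytes than requested. Returns the number
 * of bytes accepted, or 0 on failure. Hypothetical stand-in for WriteFile. */
typedef size_t (*sink_fn)(void *ctx, const unsigned char *data, size_t len);

/* Sends all `len` bytes through `sink`. The position is a byte pointer,
 * advanced by the byte count each call accepted, so wide-character data
 * is never skipped over. Returns 1 on success, 0 on failure. */
int write_all(sink_fn sink, void *ctx, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len > 0) {
        size_t n = sink(ctx, p, len);
        if (n == 0 || n > len)
            return 0;               /* failure or a bogus count */
        p += n;
        len -= n;
    }
    return 1;
}

/* Example sink: appends to a memory buffer, at most 0x1E000 bytes per call. */
struct mem_sink {
    unsigned char *buf;
    size_t used;
    size_t calls;
};

size_t capped_sink(void *ctx, const unsigned char *data, size_t len)
{
    struct mem_sink *m = ctx;
    size_t n = len < 0x1E000 ? len : 0x1E000;
    assert(m->buf != NULL);
    memcpy(m->buf + m->used, data, n);
    m->used += n;
    m->calls++;
    return n;
}
```

With a 0x20000-byte buffer this makes two calls, of 0x1E000 and 0x2000 bytes, and the copy matches the source byte for byte.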

|For the 2nd question, I would not be surprised if TCC correctly calls a
|high-level WinAPI, which in turn does the above mismanipulation...

I don't think they come any higher than WriteFile. However, the WriteFile
calls could be the result of something in the "C" runtime (like fwrite).
--
- Vince
 
#13
I see something similar happening in a test app with fwrite() (in MSVCRT.DLL), but I don't get a corrupt file. When I call fwrite() with 128 KB in total, I get two calls to WriteFile, one for 4 KB and one for 124 KB. With 64 KB I get 4 KB and 60 KB; 32 KB gives 4 KB and 28 KB; and so on.