Fixed Unexpected result from the @cksum function.

For the first time, I have run into an unexpected result from the @cksum function in TCC. I have a rather large zip file (almost 5 gigabytes), for which the @cksum function produces a different output than the cksum command in Unix systems. On the other hand, the @md5 function does produce the same result as the md5sum command in Unix systems.

Screenshots attached from TCC30 and TCC31 (on different machines), plus screenshots from an Amazon Linux 2023 system and an Ubuntu machine I have lying around here.


(I'm marking this with the "Bug" prefix, but I actually can't say that this IS a bug — only a differing result that needs to be verified as a bug in Take Command's code... heck, the error could lie in the implementation of the function in Unix systems, and that would be a hoot!)
 

Attachments

  • TCC30.png
  • TCC31.png
  • amazon.png
  • ubuntu.png
I don't suppose this monstro file is available for download anywhere?
 
The file is not important. I just whipped some up. TCC and WSL's Ubuntu agree for a file of size 2^32-1 and disagree for a file of size 2^32.

Code:
v:\> echo %@cksum[4gig.rnd] %@filesize[4gig.rnd]
1242243914 4294967295

/v> cksum ./4gig.rnd
1242243914 4294967295 ./4gig.rnd

Code:
v:\> echo %@cksum[4gig.rnd] %@filesize[4gig.rnd]
1633492633 4294967296

/v> cksum ./4gig.rnd
1391020473 4294967296 ./4gig.rnd
 
Seems this is indeed the case. I located another humongous file in my hard drive and repeated the test, with the same result as you.

1704661110036.png
 
You can make big files pretty quickly using something like this (doublefile.btm).

Code:
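setlocal
rem Usage: doublefile.btm file times
set file=%1
set times=%2
do i=1 to %times
    rem Concatenate the file with itself, then replace the original with the result
    copy /q /b %file+%file doubled_file.tmp
    del /q %file
    ren /q doubled_file.tmp %file
enddo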

If you double the size of a 16-byte (2^4) file 28 times you get a 2^32 byte file.

Code:
v:\> echos abcdefghijklmnop > 16bytes.txt

v:\> d 16*
2024-01-07  15:57              16  16bytes.txt

v:\> timer doublefile.btm 16bytes.txt 28
Timer 1 on: 15:57:45
Timer 1 off: 15:58:10  Elapsed: 0:00:25.079

v:\> d 16*
2024-01-07  15:58   4,294,967,296  16bytes.txt
 
I can reproduce this.

From the time it's taking, I think that TCC's @CKSUM is reading through the entire file. I suspect it's the final length checksum that's messing up; treating the file size as a DWORD perhaps?
 
I don't know exactly what these checksums are. Are they otherwise known as "CRC-32" checksums? In Explorer, if you use Send to > Compressed (zipped) folder, open the resulting ZIP file, and add "CRC-32" to the details, you see numbers that don't match TCC or cksum, even in the <4GB case!

TCC

Code:
v:\> echo %@cksum[4gig-1.rnd] %@filesize[4gig-1.rnd]
2953096408 4294967295

cksum

Code:
/home/vefatica> cksum /v/4gig-1.rnd
2953096408 4294967295 /v/4gig-1.rnd

Explorer

1704662183264.png


Code:
v:\> eval 2953096408=h
B004ACD8
 
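It is a CRC-32 checksum. It uses the same generator polynomial (0x04C11DB7) as the one familiar to DOS/Windows users from e.g. PKZIP, but a different algorithm: bits are processed most-significant first, the initial value is zero, and the file length is fed through the CRC after the data.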

cksum - Wikipedia
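
To see the two CRC-32 flavours side by side, here is a minimal Python sketch. It is my own illustration, not TCC's or GNU's code. `crc32_msb_first` is a hypothetical helper name; it covers only the data part of what cksum does (MSB-first, initial value 0, final complement) and leaves out the trailing length. `zlib.crc32` is the PKZIP-style CRC-32 that Explorer shows.

```python
import zlib

POLY = 0x04C11DB7  # generator polynomial used by POSIX cksum

def crc32_msb_first(data: bytes) -> int:
    """cksum-style CRC over the data only (hypothetical helper).

    MSB-first, initial value 0, final complement. The real cksum
    additionally feeds the file length through the CRC afterwards.
    """
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF

data = b"123456789"
zip_style = zlib.crc32(data)           # PKZIP-style: bit-reflected, initial value 0xFFFFFFFF
cksum_style = crc32_msb_first(data)    # cksum-style, without the length
print(f"{zip_style:08X} {cksum_style:08X}")
```

The two numbers differ for the same input, which is why the Explorer column never matches cksum, regardless of file size.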
 
@md5 doesn't have a problem. How is that function treating the file size?

MD5 does fold the message length into its padding, but as a 64-bit bit count, so a 4 GB file is no problem for it. Running the byte count through the CRC after the data is a peculiarity of CKSUM.
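
The "length after the data" step can be sketched in Python roughly like this. It is a sketch of the POSIX algorithm as I understand it, not code taken from TCC or coreutils; `cksum_stream`, `update` and `TABLE` are names I made up for illustration.

```python
import io

POLY = 0x04C11DB7  # generator polynomial used by POSIX cksum

def _build_table():
    """Precompute the MSB-first CRC of every possible byte value."""
    table = []
    for i in range(256):
        crc = i << 24
        for _ in range(8):
            crc = ((crc << 1) ^ POLY) if crc & 0x80000000 else (crc << 1)
            crc &= 0xFFFFFFFF
        table.append(crc)
    return table

TABLE = _build_table()

def update(crc: int, data: bytes) -> int:
    """Feed bytes through the CRC register, most significant bit first."""
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ TABLE[(crc >> 24) ^ byte]
    return crc

def cksum_stream(stream, chunk_size=1 << 20):
    """Return (checksum, length) for a binary stream (hypothetical helper)."""
    crc = 0
    length = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        crc = update(crc, chunk)
        length += len(chunk)
    # After the data, feed the length itself, low-order byte first,
    # using as many bytes as the value needs.
    n = length
    while n:
        crc = update(crc, bytes([n & 0xFF]))
        n >>= 8
    return crc ^ 0xFFFFFFFF, length

print(cksum_stream(io.BytesIO(b"")))  # an empty input yields (4294967295, 0)
```

Python integers do not overflow, so `length` here is always exact. In C the width of the type holding `length` decides what gets fed in at that last step.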
 
Good guess about the file size being treated as a DWORD! I wonder if you're right. I have GNU's source for coreutils (2004). It uses a 32-bit type for the file size, and it computes the file size as it reads the file. There is, however, this check.

Code:
      if (length + bytes_read < length)
        error (EXIT_FAILURE, 0, _("%s: file too long"), file);
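
That test relies on unsigned addition wrapping around in C: if the sum comes out smaller than the old value, the counter has overflowed. Below is a small Python sketch of the same idea. Python integers never wrap, so the 32-bit width is simulated with a mask. The width is an assumption for illustration only, and `add_checked` is a made-up name.

```python
MASK32 = 0xFFFFFFFF  # assumed counter width: 32 bits, for illustration

def add_checked(length: int, bytes_read: int) -> int:
    """Add to a simulated 32-bit unsigned counter, failing on wraparound.

    Assumes both arguments already fit in 32 bits.
    """
    total = (length + bytes_read) & MASK32  # what a 32-bit unsigned add would yield
    if total < length:                      # same comparison as in the C snippet
        raise OverflowError("file too long")
    return total

print(add_checked(4294967280, 15))  # 4294967295, still fits
try:
    add_checked(4294967280, 16)     # would reach 2^32
except OverflowError as exc:
    print("overflow:", exc)
```

A check like this turns a too-long file into an error instead of a silently wrong checksum.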
 
I have never used cksum, on Unix or DOS/Windows. Never once (at least that I can remember), and I've been doing this sort of thing for longer than I care to admit. MD5, SHA, SHA256, certainly. But never once cksum. I found this with a quick Google search:

  • cksum does a 32-bit checksum (CRC-32), while md5sum does a "more reliable" 128-bit checksum.
  • cksum being simpler, it may be faster in some cases, but it may also not be the case because md5sum has been highly optimized for speed.

If it were me, I would not try to "fix" this. I would simply consider it "deprecated", and remove it from the next version of TCC. I'm sure he has more important things to work on.

I checked, and there are two versions of cksum on my present system. Either one should do the trick if you REALLY need a cksum routine.

*which /A cksum
cksum is an external : C:\cygwin64\bin\cksum.exe
cksum is an external : C:\Program Files\Git\usr\bin\cksum.exe
 
One could say the same things about @CRC32. At any rate, I suspect this will be a one-line fix.
 
One-line fix? Maybe, maybe not. I just looked for the source code for the cksum program and found it here:


About 270 lines of C code, with maybe 50 to 75 lines being block comments.

And yes, I would say exactly the same about CRC32. Or even more so, as that one is so obscure it's not included in Cygwin or Git.

ON EDIT: I just noticed, cksum and crc32 are included in BusyBox, if you really needed them.
 
However, that's something to fix in the CURRENT version. A "Hmm ... a bug, well, we will fix that for the next version in some months" approach would be a really bad attitude.

It may not be that important to you, but it could be important to others (even though this checksum routine is of course no longer state of the art, it can still make sense for certain things).
 
I'd bet it's one line also. In the one I built myself, using a 32-bit data type duplicated TCC's error perfectly; using a 64-bit data type fixed it.
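
For anyone curious why the width matters only from 2^32 upward, here is a rough Python sketch of just the final "feed the length" step. It is not TCC's code and I can't say it is exactly what TCC did. `finish`, `feed_byte` and `length_bits` are names invented for the example, and `data_crc = 0` is a placeholder for whatever the CRC register holds after the data.

```python
POLY = 0x04C11DB7  # generator polynomial used by POSIX cksum

def feed_byte(crc: int, byte: int) -> int:
    """Feed one byte through the CRC register, MSB-first."""
    crc ^= byte << 24
    for _ in range(8):
        crc = ((crc << 1) ^ POLY) if crc & 0x80000000 else (crc << 1)
        crc &= 0xFFFFFFFF
    return crc

def finish(data_crc: int, length: int, length_bits: int) -> int:
    """Append the length, held in a counter of the given width, then complement."""
    n = length & ((1 << length_bits) - 1)  # simulate the counter's width
    crc = data_crc
    while n:
        crc = feed_byte(crc, n & 0xFF)
        n >>= 8
    return crc ^ 0xFFFFFFFF

data_crc = 0  # placeholder CRC state; a real run would use the CRC of the file data
for size in (2**32 - 1, 2**32):
    print(size, finish(data_crc, size, 32), finish(data_crc, size, 64))
```

At 2^32 - 1 both widths feed in the same four bytes, so the results agree. At 2^32 the 32-bit counter has wrapped to zero and nothing is fed in, while the 64-bit counter feeds in five bytes, which is the same agree-then-disagree pattern reported earlier in the thread.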
 
I'm gonna have to disagree with you on deprecating cksum. The cksum command is part of POSIX and therefore guaranteed to exist on all POSIX-compliant systems (indeed, it's present on both Amazon Linux and Ubuntu, as shown above), but md5sum is not; while it is present on both systems as shown above, it is not on AIX 7.2 (used on the servers where I work). cksum is, therefore, a good addition to Take Command because it eases the lives of admins who have to move files around differing systems (like me).

Also, requiring the usage of an external command, part of a sizeable package (both cygwin64 and git), kinda defeats the purpose of Take Command... AND that doesn't cover the %@cksum function, which I do use all the time on batch files, even more so than the cksum command itself.
 
When I was supporting large numbers of users, on Linux and Windows, I never once encountered a need for cksum or crc32. YMMV, but that was my experience. And I will add that I was not allowed to install TCC on my PC or anyone else's PC. I could get things added to the standard Red Hat distribution, or Windows for that matter, if I was willing to fill out paperwork and gain approval. But I failed to get permission for TCC, even as an "occasional exception".

BusyBox was allowed. It was never made a part of the official baseline distribution, but I could copy it to the end user's PC and use it there. And I did use it, quite often.

Fixing or not fixing cksum is most definitely not my decision; I just made my opinion known.
 
Code:
C:\>which cksum
cksum is an external : C:\bin\cygwin64\bin\cksum.exe

C:\>cksum test.dat
1279818436 6442450944 test.dat

C:\>echo %@cksum[test.dat]
1279818436

C:\>hash /cksum test.dat
C:\test.dat : 1279818436

C:\>

I can confirm that TCC now gives results matching the Cygwin cksum.
 