DIFFER questions....

#3
Happy holidays Rex. If I want to check if the contents of 2 files are the same, the programs I use report CRC32s, so I guess besides CRC32 being easy to hack - what other reasons are there to use SHA* instead?
 
#5
SHA* will tell you if the files are the same. CRC32 will tell you that they're probably the same -- CRC32 is not guaranteed to be unique.
That's not true. It's mathematically impossible for any hash of limited length (or any finite combination of such hashes) to fail to have "collisions" (different files with the same hash). And the files don't have to be very big.

Take SHA512, for example. There are 2^512 different SHA512 values.

Now consider 65-byte files. There are 256^65 of them. That's 2^(8*65) = 2^520 ... more 65-byte files than possible SHA512 values.

Combine SHA512 and SHA256. There are (2^512) * (2^256) possible pairs of values. That's 2^768 ... same as any 768-bit hash. As above, the 97-byte files number 2^776 ... too many.
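The counting argument above can be checked with exact integer arithmetic. This is only a sketch in Python (not TCC); the variable names are made up for illustration.

```python
# Pigeonhole check: count the possible inputs and the possible digests.
sha512_values = 2 ** 512              # distinct SHA-512 digests
files_65_bytes = 256 ** 65            # distinct 65-byte files

combined_values = 2 ** 512 * 2 ** 256  # distinct (SHA-512, SHA-256) pairs
files_97_bytes = 256 ** 97             # distinct 97-byte files

# Average number of 65-byte files that must share each SHA-512 value.
files_per_digest = files_65_bytes // sha512_values

print(files_65_bytes > sha512_values)    # more files than digests
print(files_per_digest)
print(files_97_bytes > combined_values)  # still more files than digest pairs
```

Collisions must therefore exist. The practical point is that nobody knows how to find one for SHA512 on purpose, whereas for CRC32 it is easy.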
 
#8
I use DoubleKiller Pro to test for duplicate files. Is there a program like DKP that uses SHA* instead? Also, could someone explain how CRC* and SHA* differ, please?
 
#9
At a high level, the difference is that CRCs can be reversed (minus the data lost to the modulus function), so it's relatively trivial to determine what bytes to change and how to change them to create a file which matches a particular CRC. The SHA family is intentionally designed to make this far more difficult, and at least in theory changing one bit/byte/etc will randomize the output in an unpredictable way.

This is largely due to intentional design decisions: CRC32 was designed to detect streaming and/or byte-level errors with a minimum of CPU and memory usage, whereas the SHA family was designed to provide cryptographically secure results.
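One way to see the structural difference is that CRC32 is an affine function of the message bits, so checksums of equal-length messages combine predictably under XOR. The sketch below is Python (not TCC) and uses only the standard `zlib` and `hashlib` modules; the helper names `xor_bytes` and `sha_int` are made up for illustration. It demonstrates the predictability, not a full forgery.

```python
import hashlib
import os
import zlib

def xor_bytes(x: bytes, y: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(x, y))

def sha_int(data: bytes) -> int:
    """SHA-256 digest as an integer, so digests can be XORed."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Three random messages of the same length, and their XOR.
a, b, c = (os.urandom(64) for _ in range(3))
mixed = xor_bytes(xor_bytes(a, b), c)

# CRC32: the checksum of the XOR equals the XOR of the checksums.
crc_is_affine = zlib.crc32(mixed) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)

# SHA-256: no such relation holds (except with negligible probability).
sha_is_affine = sha_int(mixed) == sha_int(a) ^ sha_int(b) ^ sha_int(c)

print(crc_is_affine, sha_is_affine)
```

Because the CRC relation always holds, anyone can work out which bits to change to reach a chosen CRC32. A cryptographic hash is built to have no exploitable structure of this kind.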

SHA1 is approaching the point where it can be broken with a reasonable amount of computing power, probably hundreds of thousands of dollars' worth of compute time, which is within the range of a government or large corporation. It is still useful for things like verifying that files were not unintentionally corrupted in transit or on disk. From a security perspective, though, you should not rely on it to guarantee that a file has not been modified if your adversary has resources, and the expense of such resources is dropping rapidly. If security is important, use SHA256 at a minimum.
 
#10
Why use checksums at all? To compute checksums, you must open both files and read them through to the end. If you're going to do that, why not just do a byte-for-byte comparison?
Even better, if comparing byte-by-byte (or word-by-word) you can fail immediately when a difference is seen. With any checksum algorithm you always need to read the entire files.
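For a single pair of files, the early-exit comparison described above might look like the following. This is a minimal Python sketch (not TCC); the function name `files_identical` and the chunk size are arbitrary choices. Python's standard library offers `filecmp.cmp(a, b, shallow=False)` for the same job.

```python
import os

def files_identical(path_a, path_b, chunk_size=1 << 16):
    """Return True if the two files have identical contents."""
    # Different sizes mean different contents; nothing needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False   # stop at the first differing chunk
            if not chunk_a:
                return True    # both files ended without a difference
```

Identical files still have to be read to the end; the saving only applies when a difference shows up early.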
 
#12
I looked at the DOCS for DIFFER - not what I want. I want to compare 2 directory trees and delete files that are exactly the same in the same sub-folder of the one specified on the command line.

=======================
SAME - Compare folder trees, optionally removing files in 2nd folder tree that are the same as 1st folder tree

/A: attribute select
/S traverse sub-folders
/D delete files in 2nd folder tree that are exact duplicates in 1st folder tree
... same optional DEL flags, /W /B among them.....
/SHA* 1, 512, etc

=========================

Something like this........
 

rconn

#13
Even better, if comparing byte-by-byte (or word-by-word) you can fail immediately when a difference is seen. With any checksum algorithm you always need to read the entire files.
That would be much slower if you have duplicates - DIFFER saves the hash values for all the files it finds & only needs to compare those.
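The general idea of hashing each file once and then comparing only the stored values can be sketched as follows. This is Python (not TCC) and is not a description of how DIFFER is implemented internally; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 16):
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Group files under root by digest; return groups with 2+ members."""
    by_digest = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_digest[file_digest(path)].append(path)  # each file read once
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

With N files, every file is read once and duplicates fall out of a dictionary lookup, whereas pairwise byte-for-byte comparison could need on the order of N*N file reads.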
 
#16
How would I compare two folder trees and make a list of files that are in the same sub-folder and have the same SHA* value and size?
I copied v:\22extract\ to v:\22extractxx. This is crude but it seems to work. It's slow because of all the @SHA512s. It would be easier to polish it up if it were in a BTM. It only does a one-way search, i.e., for each file in v:\22extract\ it looks for one in v:\22extractxx\. I suppose you could do it in both directions.
Code:
do f in /d"v:\22extract" /s * ( set full=%@full[%f] & set target=%@rereplace[(?i)v:\\22extract,v:\\22extractxx,%full] & if exist %target .and. %@sha512[%target] == %@sha512[%full] echo %full = %target )
The first few lines of output are.
Code:
V:\22extract\8FFDD1C = v:\22extractxx\8FFDD1C
V:\22extract\tcmd.exe = v:\22extractxx\tcmd.exe
V:\22extract\8FFDD1C\32-bit = v:\22extractxx\8FFDD1C\32-bit
V:\22extract\8FFDD1C\ANSI32.dll = v:\22extractxx\8FFDD1C\ANSI32.dll
P.S., the @SHA512 will probably choke if %target doesn't exist. That can be patched up.
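For readers who want the same one-way search outside TCC, here is a rough Python equivalent. It is a sketch only: the function names are invented, it lists files but not directories, and it checks that the target exists and has the same size before hashing anything.

```python
import hashlib
import os

def sha512_of(path, chunk_size=1 << 16):
    """SHA-512 of a file, read in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_in_both(tree1, tree2):
    """List (source, target) pairs with the same relative path and contents."""
    matches = []
    for folder, _subdirs, names in os.walk(tree1):
        for name in names:
            source = os.path.join(folder, name)
            relative = os.path.relpath(source, tree1)
            target = os.path.join(tree2, relative)
            if not os.path.isfile(target):
                continue   # nothing to compare against
            if os.path.getsize(source) != os.path.getsize(target):
                continue   # cheap test before any hashing
            if sha512_of(source) == sha512_of(target):
                matches.append((source, target))
    return matches
```

The size check avoids most of the hashing cost when the trees differ, which is where the one-liner above spends its time.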
 
#18
Thank you Vince.

1) What does "(?i)" mean?

2) do f in /d"v:\22extract" /a: /s *

if I wanted to do all files - even hidden / system?
The "(?i)" makes the regular expression case insensitive. I had a lot of trouble at first (without "(?i)") because @FULL[] was returning a string with the drive letter in uppercase and I had specified lowercase in the regular expression.
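The inline `(?i)` flag behaves the same way in other regular expression engines. A small Python illustration (not TCC), using the paths from this thread; the replacement is passed as a function so that the backslashes in it are taken literally:

```python
import re

full = r"V:\22extract\tcmd.exe"   # drive letter in uppercase, as @FULL[] returned it

# With (?i), the lowercase "v:" in the pattern matches the uppercase "V:".
target = re.sub(r"(?i)v:\\22extract", lambda m: r"v:\22extractxx", full, count=1)

# Without (?i), the pattern does not match and the string is unchanged.
no_flag = re.sub(r"v:\\22extract", lambda m: r"v:\22extractxx", full, count=1)

print(target)
print(no_flag)
```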

I imagine "/a:" (or maybe just "/a") will give you everything ... I didn't test it.
 
#20
One last thing - I have some files / folder names with "&" - how best to do the DO from above?
In each directory, I made a directory "a&b" containing a file "c&d". The only change I made to the command was to quote the (only) occurrence of %f. That worked, producing the expected output and no errors.
Code:
do f in /d"v:\22extract" /s * ( set full=%@full["%f"] & set target=%@rereplace[(?i)v:\\22extract,v:\\22extractxx,%full] & if exist %target .and. %@sha512[%target] == %@sha512[%full] echo %full = %target )
Here are two of the output lines.
Code:
"V:\22extract\a&b" = "v:\22extractxx\a&b"
"V:\22extract\a&b\c&d" = "v:\22extractxx\a&b\c&d"
 
#23
Vince,

I redirected the output to a text file so that any errors would be easy to see. Please note the COMMENT block in the attached BTM.

I have also attached the output of DIR /a: /b /s c:\DataBackup\* as a list file, in case it helps.
 


#24
What about the COMMENT block?

I didn't read those (huge) TXT files very carefully, but it seems to work. Apparently you didn't need to quote %f after all. When I recommended putting it in a BTM, I meant something like what's below (new and improved (?) and tested). That makes it much easier to manage. To be more robust, it could use more bullet-proofing ... what if I had to pass it a quoted directory name? ... what if file names have "%" in them (I don't even want to touch that one!)?

Code:
set dir1=%@rereplace[\\,\\\\,%1]
set dir2=%@rereplace[\\,\\\\,%2]
gosub process > %3
quit

:process
do f in /d"%dir1" /s *
    set full=%@full["%f"]
    set target=%@rereplace[(?i)%dir1,%dir2,%full]
    iff exist %target then
        if %@sha512[%target] == %@sha512[%full] echo %full = %target
    endiff
enddo
return
I used it like this.
Code:
Dupfind.btm v:\22extract v:\22extractxx v:\dupfiles.txt
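One bullet-proofing point worth illustrating: doubling the backslashes makes a directory name safe as a regular expression only if it contains no other special characters. A name such as "Program Files (x86)" contains parentheses, which a regex engine reads as a group. The Python sketch below (not TCC; the path is just an example) shows the difference between doubling backslashes and escaping every special character with `re.escape`.

```python
import re

path = r"C:\DataBackup\Program Files (x86)"

# Doubling backslashes only, as in the BTM above.
naive_pattern = path.replace("\\", "\\\\")

# Escaping every regex metacharacter.
safe_pattern = re.escape(path)

naive_matches = re.fullmatch(naive_pattern, path) is not None
safe_matches = re.fullmatch(safe_pattern, path) is not None

print(naive_matches)   # the "(x86)" group does not match the literal parentheses
print(safe_matches)
```

Whether TCC has a ready-made equivalent of `re.escape` is not something this thread establishes, so treat this as a description of the pitfall rather than a fix for the BTM.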
 
#26
Code:
[C:\Users\Galloway\Desktop\DupFind]Dupfind.btm c:\DataBackup i:\ C:\Users\Galloway\Desktop\DupFind\samefile.txt
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\Dupfind.btm [8]  The system cannot find the file specified.
 ""
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\Dupfind.btm [8]  The system cannot find the file specified.
 ""
Any thoughts what might be causing the error messages?
 
#27
Line 8 is empty in the BTM you posted in the ZIP.

It's probably files with special characters in their names. The percent sign is likely because it's not protected by double quotes ... and the ampersand if you're still not quoting %f.

Give the BTM a more elaborate ON ERROR, something like

Code:
on error ( echo %full & echo %target & quit )
That might help you track down the problematic files.

These (and there are more) might be a problem.

C:\DataBackup\Program Files (x86)\Hp\Digital Imaging\help\DJ_AIO_05_F4400_readme\phone_list_urls_lar_weuro_ap-150%.png
C:\DataBackup\Program Files (x86)\Hewlett-Packard\HP Setup\NativeClient\%COREALLUSERPATH%\TCSReminderTrace.log.xml
C:\DataBackup\Program Files (x86)\Hewlett-Packard\HP Setup\Dependencies\%COREALLUSERPATH%\TCSReminderTrace.log.xml
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - Functions~.feed-ms
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - Miscellaneous~.feed-ms
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - Plugins~.feed-ms
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - Scripting~.feed-ms
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - TPIPE~.feed-ms
 
#29
I don't know, Charles. I made such a file in both directories and ran exactly the same BTM. I got no errors and wound up with this in the output file.
Code:
"V:\22extract\DataBackup\Program Files (x86)\Hewlett-Packard\HP Setup\ZH_TW_eula.html" = "v:\22extractxx\DataBackup\Program Files (x86)\Hewlett-Packard\HP Setup\ZH_TW_eula.html"
Then I deleted the target file and ran the BTM again. Again I got no errors and the file name did not show in the output file.

Maybe it's an INI setting. I don't know.
 
#30
I ran the BTM from the "quoted, LFN, maxlen" thread you started earlier today, and I am getting several errors:
Code:
[C:\Users\Galloway\Desktop\DupFind]MaxLen.btm c:\
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [3]  The system cannot find the file specified.
""
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [3]  The system cannot find the file specified.
""
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [3]  The system cannot find the file specified.
""
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\ProgramData\Microsoft\Diagnosis\SoftLanding\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\ProgramData\Microsoft\Diagnosis\SoftLandingStage\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\ProgramData\Microsoft\WwanSvc\Profiles\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\System Volume Information\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Users\All Users\Microsoft\Diagnosis\SoftLanding\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Users\All Users\Microsoft\Diagnosis\SoftLandingStage\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Users\All Users\Microsoft\WwanSvc\Profiles\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\Registration\CRMLog\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\System32\com\dmp\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\System32\LogFiles\WMI\RtBackup\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\System32\spool\PRINTERS\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\SysWOW64\com\dmp\"
It is a recently reloaded Win 7 box....

Guess I could post my TCMD.INI - see if anything there is different...
 
