# DIFFER questions

#### Charles G

Why does DIFFER use the SHA* functions and not CRC32?

Because CRC32 is trivial to hack.

Happy holidays, Rex. When I want to check whether the contents of two files are the same, the programs I use report CRC32s. Besides CRC32 being easy to hack, what other reasons are there to use SHA* instead?

SHA* will tell you if the files are the same. CRC32 will tell you that they're probably the same -- CRC32 is not guaranteed to be unique.

If the programs you use report CRC32s, you should get yourself some new programs.

> SHA* will tell you if the files are the same. CRC32 will tell you that they're probably the same -- CRC32 is not guaranteed to be unique.

That's not true. It's mathematically impossible for any hash of limited length (or any finite combination of such hashes) to avoid "collisions" (different files with the same hash). And the files don't have to be very big.

Take SHA512, for example. There are 2^512 different SHA512 values.

Now consider 65-byte files. There are 256^65 of them. That's 2^(8*65) = 2^520 ... more 65-byte files than possible SHA512 values.

Combine SHA512 and SHA256. There are (2^512) * (2^256) possible pairs of values. That's 2^768 ... same as any 768-bit hash. As above, the 97-byte files number 2^776 ... too many.
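The counting argument can be checked directly. The Python sketch below verifies the exponents above and then finds a real collision on SHA-256 truncated to 3 bytes. Python is just a convenient choice here and isn't used anywhere in the thread. The truncation is a hypothetical stand-in for any short hash: with only 2^24 possible values, a collision is guaranteed, and the birthday effect typically finds one after a few thousand inputs. `truncated_sha256` and `find_collision` are illustrative names, not part of any tool discussed here.

```python
import hashlib
from typing import Dict, Tuple

# The counting argument: more 65-byte files than SHA512 values, and more
# 97-byte files than (SHA512, SHA256) pairs.
assert 256 ** 65 == 2 ** 520 > 2 ** 512
assert 256 ** 97 == 2 ** 776 > 2 ** (512 + 256)

def truncated_sha256(data: bytes, nbytes: int = 3) -> bytes:
    """First `nbytes` of SHA-256: a stand-in for a short (24-bit) hash."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 3) -> Tuple[bytes, bytes]:
    """Hash distinct inputs until two share a truncated digest.

    Pigeonhole guarantees success within 256**nbytes + 1 inputs; the
    birthday effect usually finds one much sooner.
    """
    seen: Dict[bytes, bytes] = {}
    i = 0
    while True:
        data = str(i).encode()
        digest = truncated_sha256(data, nbytes)
        if digest in seen:
            return seen[digest], data
        seen[digest] = data
        i += 1

a, b = find_collision()
print(a, b, truncated_sha256(a).hex())
```

The same search against full 512-bit SHA512 is still guaranteed to succeed in principle. It is simply far too large to run.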

If there were such a thing as SHA32, how would it compare with CRC32?

Why use checksums at all? To compute checksums, you must open both files and read them through to the end. If you're going to do that, why not just do a byte-for-byte comparison?

I use DoubleKiller Pro to test for duplicate files. Is there a program like DKP that uses SHA* instead? And could someone explain how CRC* and SHA* differ, please?

At a high level, the difference is that CRCs can be reversed (minus the data lost to the modulus function), so it's relatively trivial to determine what bytes to change and how to change them to create a file which matches a particular CRC. The SHA family is intentionally designed to make this far more difficult, and at least in theory changing one bit/byte/etc will randomize the output in an unpredictable way.

This is largely due to intentional design decisions: CRC32 was designed to detect streaming and/or byte-level errors with a minimum of CPU and memory usage, whereas the SHA family was designed to provide cryptographically secure results.
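One way to see the "reversible" structure of CRC32 is that it is affine over XOR. For three messages of equal length, the CRC of their byte-wise XOR equals the XOR of their CRCs. This algebra is what lets you calculate which bits to change to hit a chosen CRC. The Python sketch below checks that identity for CRC32 and shows that SHA-256 offers no such shortcut. It uses the standard `zlib.crc32` and `hashlib`; `xor3` and `sha256_int` are illustrative helpers.

```python
import hashlib
import zlib

def xor3(x: bytes, y: bytes, z: bytes) -> bytes:
    """Byte-wise XOR of three equal-length byte strings."""
    return bytes(p ^ q ^ r for p, q, r in zip(x, y, z))

def sha256_int(data: bytes) -> int:
    """SHA-256 digest as an integer, so digests can be XORed."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Three arbitrary messages of the same length (17 bytes each).
x = b"hello world, file"
y = b"HELLO WORLD, FILE"
z = b"0123456789abcdefg"
m = xor3(x, y, z)

# CRC32 is affine over XOR, so this identity holds for any equal-length x, y, z.
crc_lhs = zlib.crc32(m)
crc_rhs = zlib.crc32(x) ^ zlib.crc32(y) ^ zlib.crc32(z)
print(crc_lhs == crc_rhs)  # True

# SHA-256 has no such algebraic structure; the same identity does not hold.
print(sha256_int(m) == sha256_int(x) ^ sha256_int(y) ^ sha256_int(z))
```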

SHA1 is approaching the point where it can be broken with a reasonable amount of computing power, probably in the hundreds of thousands of dollars' worth of compute time, which is within the range of a government or large corporation. It still has limited use for things like verifying that files were not unintentionally corrupted in transit or on disk. From a security perspective, however, you should not rely on it to guarantee that a file has not been modified if your adversary has resources, and the cost of such resources is dropping rapidly. If security is important, use SHA256 (at a minimum).

> Why use checksums at all? To compute checksums, you must open both files and read them through to the end. If you're going to do that, why not just do a byte-for-byte comparison?

Even better, if comparing byte-by-byte (or word-by-word) you can fail immediately when a difference is seen. With any checksum algorithm you always need to read the entire files.

And if the file sizes differ, you don't need to read even a single byte.
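The byte-for-byte approach with both shortcuts (check sizes first, stop at the first difference) might look like this in Python. It reads in 64 KiB blocks rather than single bytes; that block size is an assumption chosen for speed, not something from the thread. Python's standard library already does essentially the same thing with `filecmp.cmp(a, b, shallow=False)`. The hand-written `same_contents` is only there to show the logic.

```python
import os

CHUNK = 64 * 1024  # read in 64 KiB blocks rather than literally byte by byte

def same_contents(path1: str, path2: str) -> bool:
    """Return True if two files have identical contents.

    Reads nothing if the sizes differ; otherwise stops at the first
    differing block.
    """
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False  # different sizes: no need to read a single byte
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            b1 = f1.read(CHUNK)
            b2 = f2.read(CHUNK)
            if b1 != b2:
                return False  # fail as soon as a difference is seen
            if not b1:
                return True  # both files exhausted with no difference
```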

I looked at the docs for DIFFER; it's not what I want. I want to compare two directory trees and delete files in the second tree that are exact duplicates of files in the same sub-folder of the first tree specified on the command line.

=======================
SAME - Compare folder trees, optionally removing files in the 2nd folder tree that are the same as in the 1st folder tree

/A: attribute select
/S traverse sub-folders
/D delete files in the 2nd folder tree that are exact duplicates of files in the 1st folder tree
... the same optional DEL flags, /W and /B among them ...
/SHA* hash to use: 1, 512, etc.

=========================

Something like this.
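For comparison, here is a rough Python sketch of what the proposed SAME might do. It walks the 1st tree and looks for a file at the same relative path in the 2nd tree. It skips the file if the counterpart is missing or a different size, and only then compares SHA-512 digests. Deletion from the 2nd tree happens only when `delete=True`, standing in for the proposed /D. The function names are hypothetical and the search is one-way. No attribute filtering is done, so hidden and system files are included. Because no shell is involved, names containing "&" or "%" need no quoting.

```python
import hashlib
import os
from typing import List, Tuple

def sha512_of(path: str) -> str:
    """Hex SHA-512 of a file, read in 1 MiB chunks so large files are fine."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def find_same(tree1: str, tree2: str, delete: bool = False) -> List[Tuple[str, str]]:
    """List (file1, file2) pairs at the same relative path whose contents match.

    If delete is True, the copy in tree2 is removed (like the proposed /D).
    """
    pairs: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(tree1):
        for name in filenames:
            file1 = os.path.join(dirpath, name)
            file2 = os.path.join(tree2, os.path.relpath(file1, tree1))
            if not os.path.isfile(file2):
                continue  # no counterpart, so nothing to hash
            if os.path.getsize(file1) != os.path.getsize(file2):
                continue  # different sizes can't match; skip the hashing
            if sha512_of(file1) == sha512_of(file2):
                pairs.append((file1, file2))
    # Delete only after the walk has finished.
    for file1, file2 in pairs:
        print(f"{file1} = {file2}")
        if delete:
            os.remove(file2)
    return pairs
```

Usage would be something like `find_same(r"v:\22extract", r"v:\22extractxx")` to list matches, and the same call with `delete=True` once the list looks right.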

> Even better, if comparing byte-by-byte (or word-by-word) you can fail immediately when a difference is seen. With any checksum algorithm you always need to read the entire files.

That would be much slower if you have duplicates - DIFFER saves the hash values for all the files it finds & only needs to compare those.
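The thread doesn't show DIFFER's internals, but the general idea of computing each hash once and comparing stored values can be sketched in Python. First group files by size, then hash only files whose size is shared, then group by (size, digest). `duplicate_groups` and `file_sha512` are hypothetical names used for illustration.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List, Tuple

def file_sha512(path: str) -> str:
    """Hex SHA-512 of a file, read in 1 MiB chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def duplicate_groups(paths: List[str]) -> List[List[str]]:
    """Group files with identical contents.

    Each file is hashed at most once, and only if another file has the same
    size, so N files need at most N hashes rather than N*(N-1)/2 pairwise
    byte-by-byte comparisons.
    """
    by_size: Dict[int, List[str]] = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    by_hash: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for p in group:
            by_hash[(size, file_sha512(p))].append(p)

    return [g for g in by_hash.values() if len(g) > 1]
```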


How would I compare two folder trees and make a list of files that are in the same sub-folder and have the same SHA* value and size?

I copied v:\22extract\ to v:\22extractxx. This is crude, but it seems to work. It's slow because of all the @SHA512s. It would be easier to polish it up if it were in a BTM. It only does a one-way search, i.e., for each file in v:\22extract\ it looks for one in v:\22extractxx\. I suppose you could do it in both directions.
Code:
``do f in /d"v:\22extract" /s * ( set full=%@full[%f] & set target=%@rereplace[(?i)v:\\22extract,v:\\22extractxx,%full] & if exist %target .and. %@sha512[%target] == %@sha512[%full] echo %full = %target )``

The first few lines of output are:
Code:
``````V:\22extract\8FFDD1C = v:\22extractxx\8FFDD1C
V:\22extract\tcmd.exe = v:\22extractxx\tcmd.exe
V:\22extract\8FFDD1C\32-bit = v:\22extractxx\8FFDD1C\32-bit
V:\22extract\8FFDD1C\ANSI32.dll = v:\22extractxx\8FFDD1C\ANSI32.dll``````

P.S., the @SHA512 will probably choke if %target doesn't exist. That can be patched up.

Thank you, Vince.

1) What does "(?i)" mean?

2) do f in /d"v:\22extract" /a: /s *

Is that what I'd use if I wanted to do all files, even hidden/system ones?

The "(?i)" makes the regular expression case insensitive. I had a lot of trouble at first (without "(?i)") because @FULL[] was returning a string with the drive letter in uppercase and I had specified lowercase in the regular expression.
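Python's `re` module supports the same inline `(?i)` flag, so the substitution in the DO command can be checked there. This sketch adds two things not in the original command. It anchors the pattern with `^`, so only the leading directory is replaced, and it uses `re.escape` to double the backslashes. The paths are the ones from this thread.

```python
import re

src_root = r"v:\22extract"
dst_root = r"v:\22extractxx"

# "(?i)" makes the match case-insensitive, so lowercase "v:" matches the
# uppercase "V:" that a full-path lookup may return. re.escape() doubles the
# backslashes so they are matched literally; "^" anchors at the start.
pattern = "(?i)^" + re.escape(src_root)

full = r"V:\22extract\8FFDD1C\ANSI32.dll"
# A function replacement inserts dst_root verbatim, so its backslashes are
# not treated as regex escapes.
target = re.sub(pattern, lambda _m: dst_root, full, count=1)
print(target)  # v:\22extractxx\8FFDD1C\ANSI32.dll
```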

I imagine "/a:" (or maybe just "/a") will give you everything ... I didn't test it.

One last thing: some of my file and folder names contain "&". How should I write the DO from above to handle them?

In each directory, I made a directory "a&b" containing a file "c&d". The only change I made to the command was to quote the (only) occurrence of %f. That worked, producing the expected output and no errors.
Code:
``do f in /d"v:\22extract" /s * ( set full=%@full["%f"] & set target=%@rereplace[(?i)v:\\22extract,v:\\22extractxx,%full] & if exist %target .and. %@sha512[%target] == %@sha512[%full] echo %full = %target )``

Here are two of the output lines.
Code:
``````"V:\22extract\a&b" = "v:\22extractxx\a&b"
"V:\22extract\a&b\c&d" = "v:\22extractxx\a&b\c&d"``````

Thank you again, Vince. I was trying to solve it by playing with SETDOS /C.

If you put it into a BTM, which I recommend, you won't need the "&" characters and SETDOS might work nicely.

Vince,

I redirected the output to a text file so that any errors would be easy to see. Please note the COMMENT block in the attached BTM.

I have also attached the list file produced by DIR /a: /b /s c:\DataBackup\* to help you.

#### Attachments

• DupFind.zip (789.2 KB)

I didn't read those (huge) TXT files very carefully, but it seems to work. Apparently you didn't need to quote %f after all. When I recommended putting it in a BTM, I meant something like what's below (new and improved (?) and tested). That makes it much easier to manage. To be more robust, it could use more bullet-proofing: what if I had to pass it a quoted directory name? What if file names have "%" in them (I don't even want to touch that one!)?

Code:
``````set dir1=%@rereplace[\\,\\\\,%1]
set dir2=%@rereplace[\\,\\\\,%2]
gosub process > %3
quit

:process
do f in /d"%dir1" /s *
set full=%@full["%f"]
set target=%@rereplace[(?i)%dir1,%dir2,%full]
iff exist %target then
if %@sha512[%target] == %@sha512[%full] echo %full = %target
endiff
enddo
return``````

I used it like this.
Code:
``Dupfind.btm v:\22extract v:\22extractxx v:\dupfiles.txt``

I just meant that some of the files processed by the BTM returned the error "The system cannot find the file specified.", so I'm trying to find out which file name or character causes that error message.

Code:
``````[C:\Users\Galloway\Desktop\DupFind]Dupfind.btm c:\DataBackup i:\ C:\Users\Galloway\Desktop\DupFind\samefile.txt
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\Dupfind.btm [8]  The system cannot find the file specified.
""
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\Dupfind.btm [8]  The system cannot find the file specified.
""``````

Any thoughts what might be causing the error messages?

Line 8 is empty in the BTM you posted in the ZIP.

It's probably files with special characters in their names. The percent sign is a likely culprit because it's not protected by double quotes, and so is the ampersand if you're still not quoting %f.

Give the BTM a more elaborate ON ERROR, something like

Code:
``on error ( echo %full & echo %target & quit )``
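You can take the same idea (report the paths when something fails) a step further by recording the error and continuing instead of quitting. The Python sketch below assumes that any `OSError` while opening or reading a file (missing file, access denied, and so on) should be logged and treated as "not the same". `sha512_or_error` and `compare_pair` are hypothetical names.

```python
import hashlib
from typing import List, Optional, Tuple

def sha512_or_error(path: str, errors: List[Tuple[str, str]]) -> Optional[str]:
    """Return the file's SHA-512 hex digest, or None after logging the error."""
    try:
        h = hashlib.sha512()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        return h.hexdigest()
    except OSError as e:
        # Record which file failed and why, then carry on with the next one.
        errors.append((path, e.strerror or str(e)))
        return None

def compare_pair(full: str, target: str, errors: List[Tuple[str, str]]) -> bool:
    """True only if both files could be hashed and their digests match."""
    a = sha512_or_error(full, errors)
    b = sha512_or_error(target, errors)
    return a is not None and a == b

errors: List[Tuple[str, str]] = []
# ... call compare_pair(full, target, errors) for each pair, then report:
for path, reason in errors:
    print(f"{reason}: {path}")
```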

These (and there are more) might be a problem.

C:\DataBackup\Program Files (x86)\Hewlett-Packard\HP Setup\NativeClient\%COREALLUSERPATH%\TCSReminderTrace.log.xml
C:\DataBackup\Program Files (x86)\Hewlett-Packard\HP Setup\Dependencies\%COREALLUSERPATH%\TCSReminderTrace.log.xml
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - Functions~.feed-ms
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - Miscellaneous~.feed-ms
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - Plugins~.feed-ms
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - Scripting~.feed-ms
C:\DataBackup\Users\Galloway\AppData\Local\Microsoft\Feeds\T&T - TPIPE~.feed-ms

I am attaching errors.txt and the BTM.

#### Attachments

• Dupfind.btm (390 bytes)
• errors.txt (254 bytes)

I don't know, Charles. I made such a file in both directories and ran exactly the same BTM. I got no errors and wound up with this in the output file:
Code:
``"V:\22extract\DataBackup\Program Files (x86)\Hewlett-Packard\HP Setup\ZH_TW_eula.html" = "v:\22extractxx\DataBackup\Program Files (x86)\Hewlett-Packard\HP Setup\ZH_TW_eula.html"``

Then I deleted the target file and ran the BTM again. Again I got no errors and the file name did not show in the output file.

Maybe it's an INI setting. I don't know.

I ran the BTM from the "quoted, LFN, maxlen" thread you started earlier today, and I am getting several errors:
Code:
``````[C:\Users\Galloway\Desktop\DupFind]MaxLen.btm c:\
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [3]  The system cannot find the file specified.
""
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [3]  The system cannot find the file specified.
""
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [3]  The system cannot find the file specified.
""
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\ProgramData\Microsoft\Diagnosis\SoftLanding\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\ProgramData\Microsoft\Diagnosis\SoftLandingStage\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\ProgramData\Microsoft\WwanSvc\Profiles\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\System Volume Information\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Users\All Users\Microsoft\Diagnosis\SoftLanding\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Users\All Users\Microsoft\Diagnosis\SoftLandingStage\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Users\All Users\Microsoft\WwanSvc\Profiles\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\Registration\CRMLog\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\System32\com\dmp\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\System32\LogFiles\WMI\RtBackup\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\System32\spool\PRINTERS\"
TCC: (Sys) C:\Users\Galloway\Desktop\DupFind\MaxLen.btm [9]  The system cannot find the path specified.
"C:\Windows\SysWOW64\com\dmp\"``````

It is a recently reloaded Win 7 box.

I guess I could post my TCMD.INI to see whether anything there is different.

#### Attachments

• TCMD.INI (3.5 KB)
