utf8 chcp regression in tcc17?

Dec 10, 2014
63
1
#1
There seems to be a change (or bug?) in how tcc17 handles chcp with UTF-8 or native win1252 encodings, and it breaks all my batch files. Attached are test batch files encoded in either utf8 or win1252. This is what I get (note that the chcp 65001 is ignored on tcc17):

Code:
[C:\Program Files\JPSoft\TCMD16x64]ver
TCC  16.03.55 x64   Windows 7 [Version 6.1.7601]
[C:\Program Files\JPSoft\TCMD16x64]chcp 1252
Active code page: 1252
[C:\Program Files\JPSoft\TCMD16x64]umlaut-1252-le
TeßtÄÖÜäöü
[C:\Program Files\JPSoft\TCMD16x64]umlaut-utf8-le
TeßtÄÖÜäöü
[C:\Program Files\JPSoft\TCMD16x64]chcp 65001
Active code page: 65001
[C:\Program Files\JPSoft\TCMD16x64]umlaut-1252-le
Te�t������
[C:\Program Files\JPSoft\TCMD16x64]umlaut-utf8-le
TeßtÄÖÜäöü
... and ...

Code:
[C:\Program Files\JPSoft\TCMD17x64]ver
TCC  17.00.62 x64   Windows 7 [Version 6.1.7601]
[C:\Program Files\JPSoft\TCMD17x64]chcp 1252
Active code page: 1252
[C:\Program Files\JPSoft\TCMD17x64]umlaut-1252-le
TeßtÄÖÜäöü
[C:\Program Files\JPSoft\TCMD17x64]umlaut-utf8-le
TeßtÄÖÜäöü
[C:\Program Files\JPSoft\TCMD17x64]chcp 65001
Active code page: 65001
[C:\Program Files\JPSoft\TCMD17x64]umlaut-1252-le
TeßtÄÖÜäöü
[C:\Program Files\JPSoft\TCMD17x64]umlaut-utf8-le
TeßtÄÖÜäöü
The bug/change is the garbled output on the last line, while the 1252 output is fine (but it shouldn't be after the chcp to UTF-8).
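For reference, the two ways such a mismatch can show up are easy to reproduce outside TCC. This is a minimal Python sketch (Python is used purely for illustration and is not part of the attached batch files). It assumes the shell simply interprets the file's bytes according to one code page; the helper name `as_seen_under` is made up for this example.

```python
TEXT = "TeßtÄÖÜäöü"

def as_seen_under(file_encoding: str, console_encoding: str) -> str:
    """Encode TEXT the way the batch file is stored, then decode it
    the way a reader using console_encoding would interpret the bytes."""
    raw = TEXT.encode(file_encoding)
    # errors="replace" turns undecodable bytes into U+FFFD (the "�" character)
    return raw.decode(console_encoding, errors="replace")

# File and reader agree: the text survives unchanged.
matching = as_seen_under("utf-8", "utf-8")

# UTF-8 file read as Windows-1252: every umlaut becomes two characters.
utf8_as_1252 = as_seen_under("utf-8", "cp1252")

# Windows-1252 file read as UTF-8: every umlaut becomes a replacement character.
cp1252_as_utf8 = as_seen_under("cp1252", "utf-8")
```

Under these assumptions, `cp1252_as_utf8` is the replacement-character form that the TCC 16 transcript shows for the 1252 file after chcp 65001, and `utf8_as_1252` is the form you get when UTF-8 bytes are read with code page 1252.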
 


rconn

Administrator
Staff member
May 14, 2008
10,096
85
#2
TCC/LE has no UTF8 support. Just like CMD -- none at all. All you're seeing is whatever Windows decides to dump to the console, which is going to be semi-random depending on the font, keyboard code page, and console code page you're using.

UTF8 support (somewhat limited due to Windows not really supporting it internally) was added to TCC a couple of versions ago. I'll take a look at your files, but I certainly don't expect TCC v17's UTF8 support to exactly match TCC/LE's non-UTF8 support.
 
Dec 10, 2014
63
1
#3
TCC/LE has no UTF8 support. Just like CMD -- none at all. All you're seeing is whatever Windows decides to dump to the console, which is going to be semi-random depending on the font, keyboard code page, and console code page you're using.
It's not about writing these strings to the console. I'm using tcc to process image metadata after chcp'ing to 65001: read with exiftool, process, write with exiftool. And with tcc17, this chcp is definitely broken (i.e. ignored).

As to the "LE missing utf" theory: I installed multiple versions and traced it - it works up to tcc16 (writes the correct output to screen or files), but is broken in tcc17. So you can replace "[C:\Program Files\JPSoft\TCCLE13x64]" with "[C:\Program Files\JPSoft\TCMD16x64]".
 

rconn

Administrator
Staff member
May 14, 2008
10,096
85
#4
How is TCC involved if you're using exiftool? Exactly what are you reading / writing from TCC?

I assure you that like CMD, TCC/LE definitely has no UTF8 support, in any sense. TCC v16 has some support, TCC v17 has more. The significant difference in v17 is with handling extended ASCII characters (i.e., 128 - 255).
 
Dec 10, 2014
63
1
#5
How is TCC involved if you're using exiftool? Exactly what are you reading / writing from TCC?
I run exiftool to list image tags with 'set result=%@EXECARRAY[output,exiftool yadayadayada image.jpg]' and pull the 'tagname=tagcontent' result lines from the output array one by one into tcc variables. I then do some nice and cozy checking, rearranging and setting. After that, I assemble an exiftool command line like 'exiftool -xmp-dc:Title="I_löve_UTF" -xmp-dc:Author=%tagname image.jpg'

I assure you that like CMD, TCC/LE definitely has no UTF8 support, in any sense. TCC v16 has some support, TCC v17 has more. The significant difference in v17 is with handling extended ASCII characters (i.e., 128 - 255).
Roger that about LE, but the fact remains that the command sequences above show a difference between tcc16 and tcc17: in tcc17, chcp is simply ignored. The result is that if you're working with a text file encoded in utf8 and containing characters like 'ÄÖÜäöüß', you're hit for six.

You did run the batch file (utf vs. win1252 encoded) and saw the regression for yourself, right?
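To rule out a mislabeled test file before comparing versions, the encoding of each batch file can be checked from its raw bytes. This is a rough Python sketch, not anything TCC does internally. The function name `guess_encoding` is hypothetical, and the fallback to "cp1252" is an assumption that only holds when UTF-8 and Windows-1252 are the sole candidates.

```python
def guess_encoding(data: bytes) -> str:
    """Classify raw file bytes as 'utf-8-sig', 'ascii', 'utf-8' or 'cp1252'."""
    if data.startswith(b"\xef\xbb\xbf"):
        # File starts with a UTF-8 byte order mark.
        return "utf-8-sig"
    if all(byte < 0x80 for byte in data):
        # Pure 7-bit content looks identical in both encodings.
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Windows-1252.
        return "cp1252"
    return "utf-8"
```

Typical use would be `guess_encoding(open(path, "rb").read())`, where `path` points at one of the test batch files.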
 
#6
I am interested in learning more about exiftool - just installed Advanced Renamer and would like to produce a file for each folder that has files with exif tags - like MP3 files - then also know what each tag is.

E.g. MP3 files in C:\Z_UserFiles1\MP3_001\ that have EXIF data would then produce a CSV file in C:\UserFiles\ named MP3_001.CSV, which could be massaged and output to a batch renamer script, or handled with TCC/TCMD outright.

I'll be spending the next few days looking at the HELP for exiftool, but would appreciate any other help, please.
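One possible starting point, sketched in Python rather than TCC: exiftool has a `-csv` option that writes one row per file to standard output, and `-ext` restricts processing to one extension. The sketch below only builds the command line and the target CSV path for a folder. The function name `csv_job` is made up for this example, and the folder layout is taken from the paths mentioned above.

```python
from pathlib import PureWindowsPath

def csv_job(folder: str, out_dir: str):
    """Return (command, csv_path) for dumping a folder's MP3 tags to CSV.

    The CSV is named after the folder, e.g. MP3_001 -> MP3_001.CSV.
    Nothing is executed here; the caller decides how to run the command
    and redirect its standard output into csv_path.
    """
    source = PureWindowsPath(folder)
    csv_path = PureWindowsPath(out_dir) / (source.name + ".CSV")
    command = ["exiftool", "-csv", "-ext", "mp3", str(source)]
    return command, str(csv_path)

command, csv_path = csv_job(r"C:\Z_UserFiles1\MP3_001", r"C:\UserFiles")
```

The command could then be run with `subprocess.run(command, stdout=...)`, with the output written to `csv_path`. Which tags end up as columns depends on what exiftool finds in the files.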
 
Dec 10, 2014
63
1
#7
I am interested in learning more about exiftool - just installed Advanced Renamer and would like to produce a file for each folder that has files with exif tags - like MP3 files - then also know what each tag is.
For what it's worth, this is how I read selected exif tags into tcc ... good luck :-)

setdos /x+6
unsetarray exifoutput >nul 2>nul
option //UnicodeOutput=No
if not isfile tags.txt exiftool -charset exiftool=LATIN -charset iptc=utf8 -f -S -L -n -XMP-plus:Custom1 -XMP-plus:Custom2 -XMP-plus:Custom3 -XMP-plus:Custom4 -XMP-plus:Custom5 image.jpg >!tags.txt
option //UnicodeOutput=Yes
setdos /x-6
setarray /r tags.txt exifoutput

setdos /x-456
do count=0 to %@EVAL[%@ARRAYINFO[exifoutput,5]-1]
setdos /x+7
set variable=%@REPLACE[ ,,%@REPLACE[/,,%@REPLACE[-,,%@INSTR[0,%@EVAL[%@INDEX["%exifoutput[%count]",:]-1],%exifoutput[%count]]]]]
set value=%@TRIMALL[%@INSTR[%@EVAL[ %@INDEX["%exifoutput[%count]",:]+1],%exifoutput[%count]]]
setdos /x-7
if "%value%" eq "-" unset value
if defined value set %variable=%value%
set exifinfo[%counter,%count%,0]=%variable
set exifinfo[%counter,%count%,1]=%value
enddo
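For readers who don't use TCC, the loop above does roughly the following, shown here as a Python sketch. It assumes the input lines have the "TagName: value" shape that exiftool's `-S` option produces, and that `-f` makes missing tags print as "-". The function name `parse_exiftool_lines` is made up for this example and the per-image array bookkeeping is left out.

```python
def parse_exiftool_lines(lines):
    """Turn 'TagName: value' lines into a dict of tag name -> value.

    Mirrors the batch loop: split at the first colon, strip spaces,
    slashes and hyphens from the name, trim the value, and treat a
    lone '-' (a missing tag) as no value.
    """
    tags = {}
    for line in lines:
        name, separator, value = line.partition(":")
        if not separator:
            continue  # no colon on this line, nothing to parse
        for unwanted in " /-":
            name = name.replace(unwanted, "")
        value = value.strip()
        tags[name] = None if value == "-" else value
    return tags
```

Only the first colon is used as the separator, so values that themselves contain a colon stay intact.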
 
#8
@Juppycmd - not sure what the variables should contain:

%srcimg% =
%exifcache% =

For what it's worth, that is how I read select exif tags into tcc ... good luck :-)

setdos /x+6
unsetarray exifoutput >nul 2>nul
option //UnicodeOutput=No
if not isfile tags.txt exiftool -charset exiftool=LATIN -charset iptc=utf8 -f -S -L -n -XMP-dc:Source %srcimg% -XMP-dc:Identifier -File:ImageHeight -File:ImageWidth -XMP-photoshop:Urgency -XMP-xmp:Rating -XMP-xmp:Label -XMP-iptcExt:MaxAvailHeight -XMP-iptcExt:MaxAvailWidth -XMP-dc:Title-de -XMP-dc:Title-en -XMP-dc:Description-de -XMP-dc:Description-en -XMP-plus:Custom1 -XMP-plus:Custom2 -XMP-plus:Custom3 -XMP-plus:Custom4 -XMP-plus:Custom5 >!%exifcache%
option //UnicodeOutput=Yes
setdos /x-6
setarray /r %exifcache% exifoutput

setdos /x-456
do count=0 to %@EVAL[%@ARRAYINFO[exifoutput,5]-1]
setdos /x+7
set variable=%@REPLACE[ ,,%@REPLACE[/,,%@REPLACE[-,,%@INSTR[0,%@EVAL[%@INDEX["%exifoutput[%count]",:]-1],%exifoutput[%count]]]]]
set value=%@TRIMALL[%@INSTR[%@EVAL[ %@INDEX["%exifoutput[%count]",:]+1],%exifoutput[%count]]]
setdos /x-7
if "%value%" eq "-" unset value
if defined value set %variable=%value%
set exifinfo[%counter,%count%,0]=%variable
set exifinfo[%counter,%count%,1]=%value
enddo