Thought someone might like this:

Code:
[TCC]$ commify --help
commify - add commas to sequences of 4 consecutive integers, but only
          to the non-decimal portion of a string of numbers

Usage: commify [-options] [file1 file2 ...]

Options:
  -? = display this help message
  -h = display this help message
  -d = in the decimal portion, insert an underscore after every 3 digits

[TCC]$ echo Test (123456789012) (-098765.234580980) (+2389089808.99934) | commify
Test (123,456,789,012) (-098,765.234580980) (+2,389,089,808.99934)

[TCC]$ echo Test (123456789012) (-098765.234580980) (+2389089808.99934) | commify -d
Test (123,456,789,012) (-098,765.234_580_980) (+2,389,089,808.999_34)

Here is the source:

Code:
@echo off
:: filename: commify.btm
::   author: Eric Pement
::     date: 2012-04-16 14:38 CDT

:: accept the documented -h and -? as well as /h, /? and --help
if "%1" == "-h" .or. "%1" == "-?" .or. "%1" == "/h" .or. "%1" == "/?" .or. "%1" == "--help" goto syntax

:: locate sed  -=[ CUSTOMIZE NEXT LINE ]=-
set SEDEXE=c:\enp\bin\sed.exe
if not exist %SEDEXE% goto no_sed

if "%1" == "-d" goto decimal
if "%1" == "" .and. %_pipe == 0 goto no_args

:main
:: loop: put a comma before the last 3 digits of any run of 4+ digits
:: that is not preceded by a digit or a dot
%SEDEXE% -r ":a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta" %$
goto end

:decimal
shift
:: same as :main, then loop over the decimal part; the outer group \1 keeps
:: every finished "ddd_" group (a repeated group alone keeps only the last one,
:: which drops digits once the fraction has 10 or more digits)
%SEDEXE% -r ":a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta; :b;s/\.(([0-9]{3}_)*)([0-9]{3})([0-9])/.\1\3_\4/;tb" %$
goto end

:no_sed
call error_beep.btm
echo ERROR! - Executable [%SEDEXE%] not found! Quitting ...
goto end

:no_args
call error_beep.btm
echoERR ` ERROR: No file to look for, nor values from STDIN!`
echoERR `SYNTAX: commify [-options] [file1 file2 ...]`
echo.

:syntax
TEXT
commify - add commas to sequences of 4 consecutive integers, but only
          to the non-decimal portion of a string of numbers

Usage: commify [-options] [file1 file2 ...]

Options:
  -? = display this help message
  -h = display this help message
  -d = in the decimal portion, insert an underscore after every 3 digits
ENDTEXT

:end
unset /q SEDEXE

Enjoy!
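For anyone who wants to experiment with the two substitutions without sed, here is a minimal Python sketch of the same looping idea. The function names (`commify`, `_loop_sub`) are my own, not part of the batch file, and the decimal pattern wraps the already-finished `ddd_` groups in a single capture so that fractions of any length keep all their groups.

```python
import re

# Integer side: a run of digits that is NOT preceded by a digit or a dot
# gets a comma inserted before its last three digits.
INT_RE = re.compile(r"(^|[^0-9.])([0-9]+)([0-9]{3})")

# Decimal side: after the dot, skip every "ddd_" group already produced
# (all captured together as group 1), then split the next three digits
# from the digit that follows them.
DEC_RE = re.compile(r"\.((?:[0-9]{3}_)*)([0-9]{3})([0-9])")


def _loop_sub(pattern, repl, text):
    """Repeat the substitution until nothing changes (sed's ':a; s///; ta')."""
    while True:
        new = pattern.sub(repl, text)
        if new == text:
            return text
        text = new


def commify(text, decimals=False):
    """Add commas to integer parts; with decimals=True also add underscores
    after every three digits of the fractional part (like 'commify -d')."""
    text = _loop_sub(INT_RE, r"\1\2,\3", text)
    if decimals:
        text = _loop_sub(DEC_RE, r".\1\2_\3", text)
    return text


if __name__ == "__main__":
    sample = "Test (123456789012) (-098765.234580980) (+2389089808.99934)"
    print(commify(sample))
    print(commify(sample, decimals=True))
```

Run as a script, this prints the same two result lines as the `commify` and `commify -d` examples above.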
Very nice. That said, is there a way to get the "thousands" separator from Windows' settings? That way, it would be easy enough to generalize this batch to convert, say, 123456 to 123.456 in Spanish-speaking countries (and others) and 123,456 in English-speaking countries (ditto for the decimal separator). Thanks.
Actually, you ought to use the ones used by TCC, available via %@option[ThousandsChar], rather than the ones from Windows. But there are a couple of other minor regionalization issues, too: some countries use 4-digit groups instead of 3-digit ones on the integer side, and 5-digit groups on the fractional side; furthermore, many use just the space character between groups. For example, in Hungary the comma is the decimal separator, and the space is both the thousands separator (3-digit groups) and the fractional-part separator (5-digit groups). For full flexibility, you could use your own section in the .INI file, with appropriate keywords.

Other points:
- For tabularized data you also need the total width left and right of the decimal separator.
- I think it would be better for this feature to be a function that deals with individual entries as they are generated, rather than a postprocessor filter for a report file...
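To make the regional rules above concrete, here is a rough Python sketch of a per-entry function with configurable separators and group sizes. The name `group_digits` and all of its parameters are hypothetical, chosen only for illustration, and the input is assumed to be a plain numeric string that uses a dot as its decimal point.

```python
def group_digits(number, int_sep=",", int_size=3,
                 dec_point=".", frac_sep="", frac_size=3):
    """Group one numeric string such as '-1234567.1234567'.

    The integer part is grouped from the right, the fractional part from
    the left. An empty frac_sep leaves the fractional digits ungrouped.
    """
    sign = ""
    if number and number[0] in "+-":
        sign, number = number[0], number[1:]

    whole, _, frac = number.partition(".")

    # Integer part: a short leading group, then full groups of int_size.
    head = len(whole) % int_size
    groups = [whole[:head]] if head else []
    groups += [whole[i:i + int_size] for i in range(head, len(whole), int_size)]
    result = sign + int_sep.join(groups)

    if frac:
        if frac_sep:
            frac = frac_sep.join(frac[i:i + frac_size]
                                 for i in range(0, len(frac), frac_size))
        result += dec_point + frac
    return result


if __name__ == "__main__":
    # Defaults: comma thousands separator, fraction left untouched.
    print(group_digits("1234567.1234567"))
    # Hungarian-style rules as described above.
    print(group_digits("1234567.1234567", int_sep=" ", dec_point=",",
                       frac_sep=" ", frac_size=5))
```

With the defaults this gives `1,234,567.1234567`; with the Hungarian-style arguments it gives `1 234 567,12345 67`.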
I tried using %@option[ThousandsChar], but I get the word "Auto", which is not appropriate here. For good generalization, I think it would be better to define the thousands separator as a variable in the script and then let users set it themselves (just as the path to sed.exe is defined in the script). Then again, non-English speakers wouldn't want to call it "commify" either. (smile)
I thought about some of the issues you mentioned. On using the space character instead of the underscore in post-decimal groups, I would often prefer that myself. But then I realized that I use commify in so many different contexts and occasionally count "words" from a line (à la "cut" or "awk"), so I decided against it. Likewise with tabularized data, there are several contexts where "commify" will modify (say) line #1, but line #2 contains only a 3-digit number, so it's not matched and hence not touched, and the columns fall out of alignment. If you don't mind, could you illustrate succinctly what modifying the .INI file might look like, if one wanted to identify regionally-defined separators or groups? Thanks in advance.
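As one possible illustration of the .INI idea, here is a small Python sketch with a made-up section. The section name `[Commify]` and every keyword in it are hypothetical (they are not TCC directives), and the values are quoted only so that a lone space survives parsing.

```python
import configparser

# Hypothetical section; none of these keywords are real TCC directives.
SAMPLE_INI = """
[Commify]
; Hungarian-style settings; values are quoted so a lone space is preserved
IntSeparator  = " "
IntGroup      = 3
DecimalPoint  = ","
FracSeparator = " "
FracGroup     = 5
"""


def load_settings(ini_text, section="Commify"):
    """Read separators and group sizes, falling back to English-style defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    sec = parser[section]

    def text(key, default):
        # Drop the surrounding quotes that protect whitespace-only values.
        return sec.get(key, default).strip('"')

    return {
        "int_sep": text("IntSeparator", ","),
        "int_size": sec.getint("IntGroup", fallback=3),
        "dec_point": text("DecimalPoint", "."),
        "frac_sep": text("FracSeparator", ""),
        "frac_size": sec.getint("FracGroup", fallback=3),
    }


if __name__ == "__main__":
    print(load_settings(SAMPLE_INI))
```

Keeping the group sizes next to the separators means one section can describe both the 3-digit integer groups and the 5-digit fractional groups mentioned earlier in the thread.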
This is probably too simplistic, but in TCC the current decimal separator is %@instr[1,1,%@eval[1/2]]. Then couldn't one assume that the group separator is comma when the decimal separator is dot, and dot when the decimal separator is comma? See also SETDOS /G
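The same trick can be sketched in Python: format 1/2 under the active locale, take the character after the leading zero, and then guess the group separator from it. The function names are my own. As noted earlier in the thread, the guess fails for locales such as Hungarian, where the decimal separator is a comma but the group separator is a space.

```python
import locale


def decimal_separator():
    """Format 1/2 with the active locale and return the character after
    the leading '0', mirroring %@instr[1,1,%@eval[1/2]]."""
    return locale.str(0.5)[1]


def guess_group_separator(dec_sep):
    """Assume comma groups with a dot decimal and dot groups with a comma
    decimal; return None when the decimal separator is anything else."""
    return {".": ",", ",": "."}.get(dec_sep)


if __name__ == "__main__":
    dec = decimal_separator()
    print("decimal separator:", dec)
    print("guessed group separator:", guess_group_separator(dec))
```

Python starts in the "C" locale unless `locale.setlocale` is called, so without that call the decimal separator reported here is a dot.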