commify

#1
Thought someone might like this:
Code:
[TCC]$ commify --help
commify - add commas to sequences of 4 consecutive integers, but only
   to the non-decimal portion of a string of numbers

Usage:    commify [-options] [file1 file2 ...]

Options:
   -? = display this help message
   -h = display this help message
   -d = in the decimal portion, insert an underscore after every 3 digits

[TCC]$ echo Test (123456789012) (-098765.234580980) (+2389089808.99934) | commify
Test (123,456,789,012) (-098,765.234580980) (+2,389,089,808.99934)

[TCC]$ echo Test (123456789012) (-098765.234580980) (+2389089808.99934) | commify -d
Test (123,456,789,012) (-098,765.234_580_980) (+2,389,089,808.999_34)
Here is the source:

Code:
@echo off
:: filename: commify.btm
::   author: Eric Pement
::     date: 2012-04-16 14:38 CDT

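:: accept the documented -h and -? switches as well as /h, /? and --help
if "%1" == "-h" .or. "%1" == "-?" .or. "%1" == "/h" .or. "%1" == "/?" .or. "%1" == "--help"    goto syntax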
 
:: locate sed  -=[ CUSTOMIZE NEXT LINE ]=-
set SEDEXE=c:\enp\bin\sed.exe

if not exist %SEDEXE%                goto no_sed
if "%1" == "-d"                      goto decimal
if "%1" == "" .and. %_pipe == 0      goto no_args

:main
%SEDEXE% -r ":a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta" %$
goto end

:decimal
shift
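:: outer group \1 keeps all earlier 3-digit groups, so decimals longer than 9 digits are not truncated
%SEDEXE% -r ":a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta; :b;s/\.(([0-9]{3}_)*)([0-9]{3})([0-9])/.\1\3_\4/;tb" %$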
goto end

:no_sed
call error_beep.btm
echo ERROR! - Executable [%SEDEXE%] not found! Quitting ...
goto end

:no_args
call error_beep.btm
echoERR ` ERROR: No file to look for, nor values from STDIN!`
echoERR `SYNTAX: commify [-options] [file1 file2 ...]`
echo.

:syntax
TEXT
commify - add commas to sequences of 4 consecutive integers, but only
   to the non-decimal portion of a string of numbers

Usage:    commify [-options] [file1 file2 ...]

Options:
   -? = display this help message
   -h = display this help message
   -d = in the decimal portion, insert an underscore after every 3 digits
ENDTEXT

:end
unset /q SEDEXE
Enjoy!
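
For readers without TCC or sed at hand, here is a minimal Python sketch of the same two substitution loops. It is a port for illustration, not the author's script: the function name `commify` and the `decimals` flag are assumptions standing in for the batch file and its -d switch. The decimal pattern wraps the repeated group in an outer group so that every earlier "ddd_" group is kept.

```python
import re

# Integer side: same pattern as the sed script's ":a ... ta" loop.
INT_RE = re.compile(r"(^|[^0-9.])([0-9]+)([0-9]{3})")
# Decimal side: the outer group captures every earlier "ddd_" group.
DEC_RE = re.compile(r"\.((?:[0-9]{3}_)*)([0-9]{3})([0-9])")


def commify(line, decimals=False):
    """Add commas to integer parts; optionally add underscores to decimals."""
    # Repeat the global substitution until nothing changes (sed's "ta").
    while True:
        line, count = INT_RE.subn(r"\1\2,\3", line)
        if count == 0:
            break
    if decimals:
        # One substitution per pass, like the sed command without "g".
        while True:
            line, count = DEC_RE.subn(r".\1\2_\3", line, count=1)
            if count == 0:
                break
    return line


if __name__ == "__main__":
    sample = "Test (123456789012) (-098765.234580980) (+2389089808.99934)"
    print(commify(sample))
    print(commify(sample, decimals=True))
```

Each pass inserts one separator per number, which is why both loops run until a pass makes no change.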
 
#3
Very nice.

That said, is there a way to get the "thousands" separator from Windows' settings? That way, it would be easy enough to generalize this batch to convert, say, 123456 to 123.456 in Spanish-speaking countries (and others) and 123,456 in English-speaking countries (ditto for the decimal separator).

Thanks.
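
As a rough illustration of the idea (in Python, not TCC), the regional separators can be read from the active locale. This is only a sketch: `regional_separators` is a made-up name, and it assumes the Python runtime's default locale reflects the regional settings of the operating system.

```python
import locale


def regional_separators():
    """Return (decimal_point, thousands_sep) for the user's default locale."""
    try:
        # Adopt the user's regional settings instead of the default "C" locale.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # keep whatever locale is already active
    conv = locale.localeconv()
    decimal_point = conv["decimal_point"] or "."
    thousands_sep = conv["thousands_sep"]
    if not thousands_sep:
        # The "C" locale reports no thousands separator; assume the usual
        # counterpart of the decimal point (an assumption of this sketch).
        thousands_sep = "." if decimal_point == "," else ","
    return decimal_point, thousands_sep


if __name__ == "__main__":
    print(regional_separators())
```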
 
#4
Actually, you ought to use the separators TCC itself uses, available via %@option[ThousandsChar], rather than the ones from Windows. But there are a couple of other minor regionalization issues, too: some countries use 4-digit groups instead of 3-digit ones on the integer side and 5-digit groups on the fractional side, and many use just a space character between groups. For example, in Hungary the comma is the decimal separator, and the space is both the thousands separator (3-digit groups) and the fractional-part separator (5-digit groups). For full flexibility, you could use your own section in the .INI file, with appropriate keywords.

Other points:
- for tabularized data you also need the total width left and right of the decimal separator.
- I think it would be better for this feature to be a function to deal with individual entries as they are generated rather than a postprocessor filter for a report file...
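
A hedged sketch of both suggestions, written in Python rather than TCC: settings come from an INI section, and a function formats one value at a time. The section name `Commify` and every key in it are invented for this example and are not real TCC.INI directives. The sample values follow the Hungarian conventions described above.

```python
import configparser

# Hypothetical INI section; all names here are made up for illustration.
SAMPLE_INI = """
[Commify]
DecimalChar = ,
GroupChar = space
IntGroupSize = 3
FracGroupChar = space
FracGroupSize = 5
"""


def load_settings(text):
    """Parse the hypothetical [Commify] section into a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    sec = parser["Commify"]
    # INI values are stripped of whitespace, so a space is spelled out.
    named = {"space": " "}
    return {
        "decimal_char": named.get(sec["DecimalChar"], sec["DecimalChar"]),
        "group_char": named.get(sec["GroupChar"], sec["GroupChar"]),
        "int_size": sec.getint("IntGroupSize"),
        "frac_char": named.get(sec["FracGroupChar"], sec["FracGroupChar"]),
        "frac_size": sec.getint("FracGroupSize"),
    }


def group_number(text, s):
    """Format one plain number string such as '-1234567.1234567'."""
    sign = ""
    if text and text[0] in "+-":
        sign, text = text[0], text[1:]
    whole, _, frac = text.partition(".")
    n = s["int_size"]
    head = len(whole) % n  # size of the short leading group, if any
    parts = [whole[:head]] if head else []
    parts += [whole[i:i + n] for i in range(head, len(whole), n)]
    out = sign + s["group_char"].join(parts)
    if frac:
        m = s["frac_size"]
        chunks = [frac[i:i + m] for i in range(0, len(frac), m)]
        out += s["decimal_char"] + s["frac_char"].join(chunks)
    return out


if __name__ == "__main__":
    settings = load_settings(SAMPLE_INI)
    print(group_number("1234567.1234567", settings))
```

Because `group_number` handles a single value, it can be called as each entry is generated instead of filtering a finished report.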
 
#5
. . . is there a way to get the "thousands" separator from Windows' settings? That way, it would be easy enough to generalize this batch to convert, say, 123456 to 123.456 in Spanish-speaking countries (and others) and 123,456 in English-speaking countries (ditto for the decimal separator).
I tried using %@option[ThousandsChar], but it returns the word "Auto", which is not usable here. For good generalization, I think it would be better to define the thousands separator as a variable in the script and let users set it themselves (just as the path to sed.exe is defined in the script).

Then again, non-English speakers wouldn't want to call it "commify" either. (smile)
 
#6
I thought about some of the issues you mentioned. On using the space character instead of the underscore in post-decimal groups, I would often prefer that myself. But I use commify in many different contexts and occasionally count "words" on a line (à la "cut" or "awk"), so I decided against it. Likewise with tabular data: in several contexts "commify" will modify (say) line #1, while line #2 contains only a 3-digit number, so it is not matched and not touched, and the columns end up out of alignment.
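
To make the alignment problem concrete, here is a small Python sketch (not part of commify) that groups each value and right-aligns it in a fixed width, so a short row lines up with a long one. The name `aligned_column` and the default width of 12 are arbitrary choices for the example, and it handles whole numbers only. Leading zeros are dropped by the integer conversion.

```python
def aligned_column(values, width=12):
    """Group each integer with commas and right-align it in `width` columns."""
    # The format spec ">{width}," means: right-align, pad to width, use commas.
    return [format(int(v), f">{width},") for v in values]


if __name__ == "__main__":
    for row in aligned_column(["1234567", "123"]):
        print(row)
```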

If you don't mind, could you illustrate succinctly what modifying the .INI file might look like, if one wanted to identify regionally-defined separators or groups?
Thanks in advance.
 
#7
This is probably too simplistic, but in TCC the current decimal separator is %@instr[1,1,%@eval[1/2]]. Then couldn't one assume that the group separator is comma when the decimal separator is dot, and dot when the decimal separator is comma? See also SETDOS /G
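
The same heuristic, sketched in Python for illustration (the function names are made up): format one half, take the character after the leading zero as the decimal separator, and assume the group separator is the other one of dot and comma. As noted earlier in the thread, this guess is wrong for locales that group with a space.

```python
import locale


def current_decimal_separator():
    """Analogue of %@instr[1,1,%@eval[1/2]]: the character after the '0' in 0.5."""
    return locale.str(0.5)[1]


def guess_group_separator(decimal_sep):
    """Assume comma groups with a dot decimal, and dot groups with a comma decimal."""
    if decimal_sep == ".":
        return ","
    if decimal_sep == ",":
        return "."
    raise ValueError(f"unexpected decimal separator: {decimal_sep!r}")


if __name__ == "__main__":
    dec = current_decimal_separator()
    print(dec, guess_group_separator(dec))
```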