New plugin: SafeChars

Charles Dye

Super Moderator
Staff member
May 20, 2008
3,384
39
Albuquerque, NM
prospero.unm.edu
#1
Here's a new approach to the old problem of reading and writing text which may contain 'dangerous' characters. This plugin reads text and remaps any problem characters to equivalents in the Unicode "Halfwidth and Fullwidth Forms" block ( http://unicode.org/charts/PDF/UFF00.pdf ). These characters have no significance to TCC and may be handled like any other text.

When you write the text back out with //UnicodeOutput=No, TCC's Unicode-to-ASCII translation will automatically restore these remapped characters to their original values. For //UnicodeOutput=Yes, I provide a function which replaces fullwidth punctuation with ASCII equivalents before writing it.

http://www.unm.edu/~cdye/plugins/safechars.html
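
As a rough illustration of the remapping idea (not the plugin's actual code), here is a minimal Python sketch. The fullwidth forms U+FF01 through U+FF5E mirror ASCII 0x21 through 0x7E at a fixed offset of 0xFEE0, so the mapping is simple arithmetic. The DANGEROUS set and the names make_safe and restore are assumptions made up for this example; the plugin's real character list and function names may well differ.

```python
# Characters assumed, for illustration only, to be troublesome on a command line.
DANGEROUS = '%&|<>^`"()[]'

# Fullwidth forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII 0x21..0x7E.
FULLWIDTH_OFFSET = 0xFEE0

# Translation tables: ASCII code point -> fullwidth code point, and the reverse.
TO_SAFE = {ord(c): ord(c) + FULLWIDTH_OFFSET for c in DANGEROUS}
FROM_SAFE = {wide: narrow for narrow, wide in TO_SAFE.items()}


def make_safe(text: str) -> str:
    """Replace each assumed-dangerous character with its fullwidth twin."""
    return text.translate(TO_SAFE)


def restore(text: str) -> str:
    """Map the fullwidth twins back to plain ASCII."""
    return text.translate(FROM_SAFE)


if __name__ == "__main__":
    line = "desca=call %bat\\descript.btm ALL %&"
    safe = make_safe(line)
    print(safe)                   # percent signs and ampersand are now fullwidth
    print(restore(safe) == line)  # the round trip recovers the original text
```

Using an explicit reverse table keeps the round trip from touching any fullwidth characters that were not produced by the remapping in the first place.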
 
#2
Charles Dye wrote:
| Here's a new approach to the old problem of reading and writing text
| which may contain 'dangerous' characters. This plugin reads text and
| remaps any problem characters to equivalents in the Unicode
| "Halfwidth and Fullwidth Forms" block (
| http://unicode.org/charts/PDF/UFF00.pdf ). These characters have no
| significance to TCC and may be handled like any other text.
|
| When you write the text back out with //UnicodeOutput=No, TCC's
| Unicode-to-ASCII translation will automatically restore these
| remapped characters to their original values. For
| //UnicodeOutput=Yes, I provide a function which replaces fullwidth
| punctuation with ASCII equivalents before writing it.
|
| http://www.unm.edu/~cdye/plugins/safechars.html

Thanks, Charles! IMHO this is a much better method than using the "binary"
buffer for text files containing arbitrary characters. It would be nice if
you could enhance it to use e.g. the SETDOS command to handle the actual
special characters, so that batch programs using it would not need a
different selection than the user's "normal" choice. One method that comes
to mind is to internally surround it with the equivalent of setlocal, setdos
"std", @safe..., endlocal.
--
Steve
 
#3
On Thu, 29 Oct 2009 11:17:08 -0500, Charles Dye <>
was claimed to have wrote:


>Here's a new approach to the old problem of reading and writing text
>which may contain 'dangerous' characters. This plugin reads text and
>remaps any problem characters to equivalents in the Unicode "Halfwidth
>and Fullwidth Forms" block ( http://unicode.org/charts/PDF/UFF00.pdf ).
>These characters have no significance to TCC and may be handled like
>any other text.
>
>When you write the text back out with //UnicodeOutput=No, TCC's
>Unicode-to-ASCII translation will automatically restore these remapped
>characters to their original values. For //UnicodeOutput=Yes, I
>provide a function which replaces fullwidth punctuation with ASCII
>equivalents before writing it.
>
>http://www.unm.edu/~cdye/plugins/safechars.html

This looks absolutely beautiful... Thanks!
 
#4
Sorry to tag this to an existing thread, but does any other email
subscriber see my last post to this thread as having two FROM fields?

From: "JP Software Forums" <neil@jpsoft.com>
From: thedave <>

Is this a known issue?
 
#5
thedave wrote:
| Sorry to tag this to an existing thread, but does any other email
| subscriber see my last post to this thread as having two FROM fields?
|
| From: "JP Software Forums" <neil@jpsoft.com>
| From: thedave <>
|
| Is this a known issue?

I received it with two "From" fields. That is how the mail-delivered version
started out; it was fixed, and now happens only occasionally.
--
Steve
 
#6
Charles Dye wrote:
| http://www.unm.edu/~cdye/plugins/safechars.html

I have just used it to find the aliases that invoke a specified batch file:

alias | ffind/vkmt"descript.btm" | ( setdos/x-4 %+ for %l in (@con:) ( set l=%@safeenv[l] %+ echo %@format[-20,%@word["=",0,%l]]%@word["=",1,%l] ) )

Output:

desc*riptions       call %bat\descript.btm DESC %&
descs               call %bat\descript.btm DESCS %&
desca               call %bat\descript.btm ALL %&
descas              call %bat\descript.btm ALLS %&
undesc*ribed        call %bat\descript.btm UNDSC %&
unds                call %bat\descript.btm UNDS %&


Notice:

1/ I use 2 pipes - the first to find the desired aliases, the 2nd to format
the output
2/ the RH (reporting) pipe instance still needs SETDOS/x-4 so that
environment variables in the alias definition would not be expanded, but it
does not need to be cancelled, because it is localized to its own instance
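
For readers who do not use TCC, the formatting half of the pipeline above can be sketched in Python. This is only an approximation under stated assumptions: the helper names format_alias_line and matching_aliases are made up for this example, the case-insensitive match is an assumption about the search step, and the sketch mirrors the @word calls by taking just the first two "="-separated fields.

```python
def format_alias_line(line: str, width: int = 20) -> str:
    """Split 'name=value' at '=' and left-justify the name in a fixed column.

    Like taking fields 0 and 1 with '=' as the separator, anything after a
    second '=' in the value is dropped.
    """
    fields = line.split("=")
    name = fields[0]
    value = fields[1] if len(fields) > 1 else ""
    # '<' pads on the right (left-justifies); it never truncates long names.
    return f"{name:<{width}}{value}"


def matching_aliases(lines, needle):
    """Keep only the lines mentioning `needle`, then format each one."""
    needle = needle.lower()
    return [format_alias_line(l) for l in lines if needle in l.lower()]


if __name__ == "__main__":
    aliases = [
        "desca=call %bat\\descript.btm ALL %&",
        "ll=dir /l",
        "unds=call %bat\\descript.btm UNDS %&",
    ]
    for row in matching_aliases(aliases, "descript.btm"):
        print(row)
```

In Python the percent signs and ampersands are inert, which is exactly the property the remapped characters provide inside TCC.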

Thanks for the neat tool!
--
Steve
 

Charles Dye

#7
Steve Fbin wrote:
| I have just used it to find the aliases that invoke a specified batch file:
|
| alias | ffind/vkmt"descript.btm" | ( setdos/x-4 %+ for %l in (@con: ) ( set l=%@safeenv[l] %+ echo %@format[-20,%@word["=",0,%l]]%@word["=",1,%l] ) )
|
| 2/ the RH (reporting) pipe instance still needs SETDOS/x-4 so that
| environment variables in the alias definition would not be expanded, but it
| does not need to be cancelled, because it is localized to its own instance

I was twisting my brains trying to understand why you need the SETDOS -- that's exactly the kind of thing I'm trying to escape! And it finally dawned on me: You're bumping into one of those cmd.exe compatibility hacks that make Rex so reluctant to modify FOR. FOR variables are only created in the environment if the variable name is more than one character long....

Code:
alias | ffind/vkmt"descript.btm" | ( for %ln in (@con: ) ( set ln=%@safeenv[ln] %+ echo %@format[-20,%@word["=",0,%ln]]%@word["=",1,%ln] ) )
 
#8
Charles Dye wrote:
| ---Quote (Originally by Steve Fbin)---
| I have just used it to find the aliases that invoke a specified
| batch file:
|
| alias | ffind/vkmt"descript.btm" | ( setdos/x-4 %+ for %l in (@con: ) ( set l=%@safeenv[l] %+ echo %@format[-20,%@word["=",0,%l]]%@word["=",1,%l] ) )
|
| 2/ the RH (reporting) pipe instance still needs SETDOS/x-4 so that
| environment variables in the alias definition would not be expanded,
| but it does not need to be cancelled, because it is localized to its
| own instance
| ---End Quote---
| I was twisting my brains trying to understand why you need the
| SETDOS -- that's exactly the kind of thing I'm trying to escape!
| And it finally dawned on me: You're bumping into one of those
| cmd.exe compatibility hacks that make Rex so reluctant to modify
| FOR. FOR variables are only created in the environment if the
| variable name is more than one character long....
|
|
| Code:
| ---------
| alias | ffind/vkmt"descript.btm" | ( for %ln in (@con: ) ( set ln=%@safeenv[ln] %+ echo %@format[-20,%@word["=",0,%ln]]%@word["=",1,%ln] ) )
| ---------

Thanks, Charles! I forgot about that hack. Your version works here, though I
need to change my font to display it. For your information, my V7
installation defaults to raster fonts, and it displayed everything
correctly. In other words, the plugin works well in V7! When I changed from
Lucida to Raster font in my stand-alone V11 screen, the already displayed
result changed from boxes to the correct characters. Great job!
--
Steve
 

Charles Dye

#9
Steve Fbin wrote:
| Thanks, Charles! I forgot about that hack. Your version works here, though I
| need to change my font to display it. For your information, my V7
| installation defaults to raster fonts, and it displayed everything
| correctly. In other words, the plugin works well in V7! When I changed from
| Lucida to Raster font in my stand-alone V11 screen, the already displayed
| result changed from boxes to the correct characters. Great job!

Thank you for that.

The V7 warning is not idle. Wherever I call a built-in function from one of my own, I have to do an internal SETDOS first (to prevent TCC from trying to expand stuff in the returned data). I restore the original SETDOS state afterwards from the saved value of %_EXPANSION. This variable didn't exist prior to V8, so SafeChars will lose your SETDOS settings in v7 under some circumstances....
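
The save, change, restore pattern described above can be sketched in Python as a context manager. This is an analogy, not TCC code: the Shell class, its integer expansion field, and the can_read_state flag are hypothetical stand-ins for the interpreter's SETDOS state and for whether a variable like %_EXPANSION is available.

```python
from contextlib import contextmanager


class Shell:
    """Hypothetical stand-in for an interpreter with an expansion setting."""

    def __init__(self, expansion: int = 0, can_read_state: bool = True):
        self.expansion = expansion
        # False models a version that has no way to query the current state.
        self.can_read_state = can_read_state


@contextmanager
def expansion_disabled(shell: Shell, disabled_mode: int = 4, default: int = 0):
    """Temporarily switch the expansion mode, then put the old value back.

    If the current state cannot be read, the best available fallback is a
    default value, so the user's own setting is lost on exit.
    """
    saved = shell.expansion if shell.can_read_state else default
    shell.expansion = disabled_mode
    try:
        yield shell
    finally:
        shell.expansion = saved


if __name__ == "__main__":
    modern = Shell(expansion=2)
    with expansion_disabled(modern):
        pass                 # call the built-in function here
    print(modern.expansion)  # the user's setting survives

    legacy = Shell(expansion=2, can_read_state=False)
    with expansion_disabled(legacy):
        pass
    print(legacy.expansion)  # the user's setting has been replaced by the default
```

The second case is the situation the warning is about: without a readable state there is nothing to restore from.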
 
#10
Charles Dye wrote:
| The V7 warning is not idle. Wherever I call a built-in function
| from one of my own, I have to do an internal SETDOS first (to
| prevent TCC from trying to expand stuff in the returned data). I
| restore the original SETDOS state afterwards from the saved value of
| %_EXPANSION. This variable didn't exist prior to V8, so SafeChars
| will lose your SETDOS settings in v7 under some circumstances....

Normally the expansion state is 0 here. In my only test in v7 SafeChars was
called only from a transient instance of 4NT (the right side of a pipe), so
restoring the expansion state was irrelevant.

I guess you could either save the SETDOS expansion state (using a redirected
output of the SETDOS command), or just report an error if the user
mistakenly calls SafeChars in v7. I'll unlink it from V7 for now.
--
Steve
 
#11
Charles Dye wrote:
| Here's a new approach to the old problem of reading and writing text
| which may contain 'dangerous' characters.
...

Charles, it would be nice if you could eliminate the limitations of reading
text from internal variables and functions, possibly by adding two new
functions: @SAFEVAR and @SAFEFUNC, the latter of which would probably be of
much greater benefit. I for one often use the clipboard as a very fast file,
and process its content a line at a time using @CLIP[].
--
TIA, Steve
 

Charles Dye

#12
Steve Fbin wrote:
| Charles, it would be nice if you could eliminate the limitations of reading
| text from internal variables and functions, possibly by adding two new
| functions: @SAFEVAR and @SAFEFUNC, the latter of which would probably be of
| much greater benefit. I for one often use the clipboard as a very fast file,
| and process its content a line at a time using @CLIP[].

I'll look into it. (Does any internal variable ever return 'dangerous' characters?)
 
#13
Charles Dye wrote:
| ---Quote (Originally by Steve Fbin)---
| Charles, it would be nice if you could eliminate the limitations of
| reading text from internal variables and functions, possibly by
| adding two new functions: @SAFEVAR and @SAFEFUNC, the latter of which
| would probably be of much greater benefit. I for one often use the
| clipboard as a very fast file, and process its content a line at a
| time using @CLIP[].
| ---End Quote---
| I'll look into it. (Does any internal variable ever return
| 'dangerous' characters?)

Some could, e.g., _winfgwindow _wintitle _winuser _cmdline, and esp.
_selected.
--
Steve
 

Charles Dye

#14
Steve Fbin wrote:
| Charles, it would be nice if you could eliminate the limitations of reading
| text from internal variables and functions, possibly by adding two new
| functions: @SAFEVAR and @SAFEFUNC, the latter of which would probably be of
| much greater benefit. I for one often use the clipboard as a very fast file,
| and process its content a line at a time using @CLIP[].

It looks doable. @SAFEVAR and @SAFEFUNC turn out to be identical, so I'm just calling it @SAFEEXP.

I'm putting up a test build here: http://www.unm.edu/~cdye/plugins/safechars.html