
Extract email addresses from FROM, TO, CC, and BCC in EML files

Code:
[C:\Users\csgal\OneDrive\Desktop\Export]tpipe /input=*.eml,255,1 /simple=28 /output=output.txt
The filename, directory name, or volume label syntax is incorrect.
 "*.eml"

Any idea what I am doing wrong?

/INPUT is:

Code:
/input=filename[,subfolders[,action]]

filename - Filename or folder to read. This can be either a disk file, file list (@filename), or CLIP:. If it is not specified, TPIPE will read from standard input.

subfolders - How many subfolders to include (default 0):
0 - no subfolders
1 to 255 - subfolder(s)
255 - all subfolders

action - the action to take (default 1):
1 - include the files
2 - exclude the files
3 - ignore the files
 
Simply put, "*.eml" is not one of the three things allowed as the filename (a disk file, a file list, or CLIP:).
 
I have fixed the file processing by doing this:

Code:
for /R %fn in (*.eml) (echo Working on %fn... & tpipe /input="%fn" /simple=28 /output=c:\fldr\outfile.txt /outputappend=1)

but that extracts only the email address, not the name associated with it. I am not clear on how to use /REGEX or the related options, so I guess what I'd like is a regex that extracts just the FROM, TO, CC, and BCC names and addresses, where each header might span multiple lines.

Once that is done, I'd like a way to output each name and address pair on its own line.
 
This regex might get you started.

"^(To|From|Cc|Bcc):.*$"

Code:
v:\> tpipe /input="Re A question about pointers.txt" /grep=3,0,0,1,0,0,0,0,"^(To|From|Cc|Bcc):.*$"
From:   Joe Blow <[email protected]>
To: Vincent E Fatica

I don't have .EML files, but I believe the headers should be plain text, with header names at the beginning of a line, in mixed case, followed by a colon.
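For comparison outside TCC, here is a rough Python sketch of what that /grep call does (an illustration, not TPIPE itself; the function name grep_headers is made up). It applies the same regex line by line, and the sample shows the catch: a folded continuation line that starts with whitespace is not matched.

```python
import re

# Same pattern as the TPIPE /grep example. re.MULTILINE makes ^ and $
# match at every line boundary instead of only at the ends of the string.
HEADER_RE = re.compile(r"^(To|From|Cc|Bcc):.*$", re.MULTILINE)


def grep_headers(text: str) -> list[str]:
    """Return every line that starts with To:, From:, Cc: or Bcc:."""
    return [m.group(0) for m in HEADER_RE.finditer(text)]


sample = (
    "From: Joe Blow <joe@example.com>\n"
    "To: Vincent E Fatica\n"
    "Cc: alice@example.com,\n"
    " bob@example.com\n"  # folded continuation line: NOT matched
    "Subject: pointers\n"
)

for line in grep_headers(sample):
    print(line)
```

The match is case-sensitive, just like the TPIPE pattern, so a header written as "TO:" would be missed.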
 
Here are some EML files (basically TXT files). The regex above works well, but not when the To/Cc/Bcc headers are longer than one line.
 

Attachment: HM_CSG.zip (135.4 KB)
Yeah, I thought of that. The rules are (I believe) that if a header is continued on a new line, that line begins with a space or a tab, and that the headers are separated from whatever comes next by a completely empty line. You could extend the regex to also match lines that start with a space or a tab, but that would give you a lot more than you want, notably continuations of headers you're not interested in.

I did a very brief search for software to help and I didn't find anything. Maybe a BTM is in order. It could read the file line by line, keeping track of whether you're inside or outside a header of interest.
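The folding rule above can be sketched in Python (an illustration, not a TCC solution; unfold_headers is a hypothetical name). Strictly speaking, RFC 5322 unfolding removes only the line break and keeps the leading whitespace. This sketch collapses that whitespace to a single space so the output is easier to read.

```python
def unfold_headers(text: str) -> list[str]:
    """Return the logical header lines of a message, with folded
    continuation lines joined back onto the header they belong to."""
    headers: list[str] = []
    for line in text.splitlines():  # handles both CRLF and LF
        if line == "":
            break  # first empty line: end of the header block
        if line[0] in " \t" and headers:
            # Continuation line: glue it onto the previous header.
            headers[-1] += " " + line.strip()
        else:
            headers.append(line.rstrip())
    return headers
```

Once the headers are unfolded, picking out the ones of interest is just a prefix test, e.g. `h.split(":", 1)[0] in ("To", "From", "Cc", "Bcc")`.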
 
Is there a way to /grep for the additional lines, as long as the first character is a space, a tab, or similar whitespace?
 
I don't know. TPIPE has subfilters, but I don't know how to use them.

Maybe this will help. It seems to do the right thing (far below) on my modified version of one of your files.

Code:
c:\users\vefatica\desktop\hm_csg\hm_csg\inbox> type parse.btm
setlocal
setdos /x-1256789A
set inheader=false
do line in @%1
    if "%line" == "" goto done
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
            echo %line
            set inheader=true
    elseiff %inheader == true then
        iff %@regex["^[ \t]",%line] == 1 then
            echo %line
        else
            set inheader=false
        endiff
    endiff
enddo
:done
setdos /x+1256789A

Code:
c:\users\vefatica\desktop\hm_csg\hm_csg\inbox> parse.btm "01-Transfer of Google data requested.eml"
From: Google <[email protected]>
To: [email protected]
 [email protected]
    JOE <[email protected]>
Cc: person <[email protected]>
  MARY <[email protected]>
  [email protected]
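For anyone who wants to cross-check the logic, here is the same inheader state machine as parse.btm, sketched in Python. The names mirror the BTM, and nothing here is TCC-specific.

```python
import re

INTEREST = re.compile(r"^(To|From|Cc|Bcc):")  # same test as the first IFF
CONTINUATION = re.compile(r"^[ \t]")          # same test as the inner IFF


def parse(lines):
    """Yield header lines of interest plus their continuation lines,
    stopping at the first empty line (the end of the header block)."""
    inheader = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            return  # like: if "%line" == "" goto done
        if INTEREST.match(line):
            inheader = True
            yield line
        elif inheader:
            if CONTINUATION.match(line):
                yield line
            else:
                inheader = False
```

With a real file, this could be driven by something like `for line in parse(open(path, encoding="utf-8", errors="replace")): print(line)`. The encoding is an assumption, since EML files vary.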
 
Here's another test of that BTM.

Code:
c:\users\vefatica\desktop\hm_csg\hm_csg\inbox> do f in *.eml ( parse.btm "%f" )

Edit: It did the right thing, but I deleted the output since it's probably not a good idea to post a lot of valid email addresses.
 
@vefatica - I am trying to modify parse.btm so that it writes all addresses for TO, CC, and BCC on the same line. The problem I'm trying to solve now is when FROM and TO appear on consecutive lines, as in "01-Transfer of Google data requested.eml" in post #5 above.

The current parse.btm is below:

Code:
COMMENT

    Trying to put all names and addresses for TO's, CC's and BCC's on one line.

    Works fine except if From, TO appear on consecutive lines...

ENDCOMMENT

setlocal
setdos /x-1256789A
set inheader=false
echo ===============================================================================
echo File: %fn
echo ===============================================================================
do line in @%1
    if "%line" == "" goto done
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
        echos %@trim[%line]
        set inheader=true
    elseiff %inheader == true then
        iff %@regex["^[ \t]",%line] == 1 then
          echos ` `%@trim[%line]
        else
          echo ' '
          set inheader=false
        endiff
    endiff
enddo
:done
setdos /x+1256789A
 
Go to a new line every time you encounter a header of interest. Keep the rest as you have it, but add an echo. first:

Code:
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
            echo.
            echos %@trim[%line]
            set inheader=true
 
You also should use %@char[32]. ` ` won't work because the special meaning of ` has been turned off.

I don't know what you're doing with echo ' ' in the else clause.

I believe two addresses on the same line will be separated by a comma. Add a comma in the other cases, like this:

Code:
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
            echo.
            echos %@trim[%line]
            set inheader=true
    elseiff %inheader == true then
        iff %@regex["^[ \t]",%line] == 1 then
            echos ,%@char[32]%@trim[%line]
        else
            set inheader=false
        endiff
    endiff

I'm getting output like this.



compared to ...

 
@vefatica - thanks for your help! One thing I do see is that, given the command "parse.btm sample.eml > out.txt", the first line of out.txt is blank. Is there any way to keep the first line from being blank?
 
Hmmm! You could pipe to (for example)

Code:
tail /n 10 /n+1

or to

Code:
findstr /v /r "^$"

The first will give 10 lines (enough for one file) and skip the first line. The second will get rid of all empty lines (there should be only one).

Or, depending on your taste,

Code:
set newline=no
do line in @%1
    if "%line" == "" goto done
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
            if %newline == yes (echo.) else (set newline=yes)
            echos %@trim[%line]
            set inheader=true

And if you want a newline at the very end (there isn't one) ...

Code:
:done
echo.
setdos /x+1256789A
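The %newline flag is the usual "separator before every item except the first" pattern. Here is a Python sketch of the whole output step. The function name is hypothetical, and the input is assumed to be the header and continuation lines that the BTM already selects.

```python
def one_line_per_header(header_lines):
    """Put each header of interest on its own line, append its continuation
    lines after a space, and never start the output with a blank line."""
    out = []
    newline = False  # mirrors the BTM's %newline flag
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            out.append(" " + line.strip())  # like echos %@char[32]%@trim[%line]
        else:
            if newline:
                out.append("\n")  # like echo. (only before the 2nd and later headers)
            else:
                newline = True
            out.append(line.strip())  # like echos %@trim[%line]
    out.append("\n")  # like the final echo. after :done
    return "".join(out)
```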
 
parse.btm is currently:

Code:
COMMENT

    Trying to put all names and addresses for TO's, CC's and BCC's on one line.

ENDCOMMENT

setlocal
setdos /x-1256789A
echo ===============================================================================
echo File: %fn
echo ===============================================================================
set inheader=false
set newline=no
do line in @%1
    if "%line" == "" goto done
  iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
    if %newline == yes (echo.) else (set newline=yes)
    echos %@trim[%line]
    set inheader=true
  elseiff %inheader == true then
    iff %@regex["^[ \t]",%line] == 1 then
      echos %@char[32]%@trim[%line]
    else
      set inheader=false
    endiff
  endiff
enddo
:done
echo.
setdos /x+1256789A
 
I created a file, foo.txt, with To, Cc, and Bcc lines of comma-separated addresses, then ran:
Code:
do l in @foo.txt (do i=1 to %((%@words[":,",%l]-1)) (echo %@word[":,",0,%l]: %@word[":,",%i,%l]))
TO:  [email protected]
TO:  [email protected]
TO:  [email protected]
CC:  [email protected]
CC:  [email protected]
CC:  [email protected]
BCC:  [email protected]
BCC:  [email protected]
BCC:  [email protected]
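In case it helps with the "RTM": @WORD[":,",n,%l] treats both the colon and the comma as word separators. Word 0 is therefore the header name and words 1 and up are the addresses, and the inner DO loop prints one "Header: address" line per address. Below is a rough Python sketch of the same idea (split_addresses is a made-up name). It splits at the first colon only and then on commas, so it is not identical to @WORD. Like the DO line, it will split a quoted display name that contains a comma.

```python
def split_addresses(line: str) -> list[str]:
    """Turn 'TO: a, b, c' into ['TO: a', 'TO: b', 'TO: c'].

    Splits at the first colon, then on commas. A display name that
    itself contains a comma (e.g. "Smith, Mary") is split wrongly.
    """
    header, sep, rest = line.partition(":")
    if not sep:
        return []  # no colon: not a header line
    addresses = [a.strip() for a in rest.split(",") if a.strip()]
    return [f"{header.strip()}: {addr}" for addr in addresses]
```

To mimic the DO line over foo.txt, call it on each line of the file and print every element of the returned list.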
 
A huge thank you to @samintz for the above post! Elegant. Now I have to see what the DO line does....

Guess it's RTM
 
The parse.btm above works as expected. However, if I wanted to check for email addresses outside the header, how could I best (and fairly efficiently) use the regex that @EMAIL[] uses? And what if an email address continues onto the next line?
 
I guess what I am looking for is a modification to parse.btm that extracts names/addresses from the From/To/Cc/Bcc headers and then any email addresses in the body. TPIPE /simple=28 extracts from the whole email file. TPIPE is working, but I am only interested in the parts of the email file mentioned above.
 
@samintz - parse.btm above works wonders, but I'd like to add a search for email addresses in the body of the email. The start of the body is defined as:

> Per standard, after two consecutive CRLF (new lines - carriage return and line feed).
> It is not just for EML file - all emails are in that form. If there is a need to break headers
> into lines the next line must start with a space or tab to avoid having two consecutive CRLF.

So basically I'm looking for arbitrary runs of text where @EMAIL[text] is true.
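Per that description, everything up to the first empty line is header and everything after it is body. Here is a minimal Python sketch of the split. The function name is made up. It converts CRLF to LF first so that the same code handles both line endings.

```python
def split_message(raw: str) -> tuple[str, str]:
    """Split a raw message into (header block, body) at the first empty line."""
    text = raw.replace("\r\n", "\n")
    header, sep, body = text.partition("\n\n")
    if not sep:
        return text, ""  # no blank line at all: treat everything as header
    return header, body
```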

From: eWriter help:
@EMAIL TCC internal variable function
@EMAIL uses the regular expression:

"^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$"

to validate the address. This matches 99.99% of valid email addresses, including IP addresses (which are rarely used). It allows a-z0-9_.- in the username, but not ending in a full stop (i.e. [email protected] is invalid), and a-z0-9- as the optional subdomain(s), with a domain name and a 2-7 char (a-z) TLD.
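Here is a hedged Python sketch that applies that same pattern to a message body. The pattern is anchored with ^ and $, so it validates a whole token rather than finding addresses inside running text. The sketch therefore splits the body on whitespace, trims surrounding punctuation, and tests each token. This is one simple approach, not necessarily how @EMAIL or TPIPE work internally. Two assumptions are worth flagging: matching is made case-insensitive (the help text doesn't say whether @EMAIL is), and the punctuation trim list is arbitrary. Note also that [a-z]{2,6}, as written, allows TLDs of 2 to 6 letters. An address broken across two lines will not be found.

```python
import re

# The pattern quoted from the @EMAIL help, unchanged (split only for line length).
EMAIL_RE = re.compile(
    r"^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}"
    r"|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$",
    re.IGNORECASE,  # assumption: case-insensitive matching
)

TRIM = "<>()[]{},;:\"'."  # punctuation stripped from token ends (arbitrary choice)


def is_email(token: str) -> bool:
    return EMAIL_RE.match(token) is not None


def find_emails(body: str) -> list[str]:
    """Return whitespace-separated tokens of body that pass the pattern, in order."""
    found = []
    for token in body.split():
        candidate = token.strip(TRIM)
        if is_email(candidate):
            found.append(candidate)
    return found
```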
 
@vefatica and @samintz:
How would I modify parse.btm in #15 above so that the rest of the file (everything after the two consecutive CRLFs) is sent to

%TMP%\Parse_1.eml

I would basically like to check the body of the EML for any email addresses, where tpipe /simple=18 would return the list of email addresses.
 
As I don't want to read the whole file with parse.btm (#16 above) and then use FFIND and TAIL again, what condition would I need to check so that parse.btm stops when a CRLFCRLF is found? I guess it's when a blank line is found.

Actually, isn't

if "%line" == "" goto done

what I need, and isn't that already there?
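As far as I can tell, that test does stop the loop at the first empty line. If the aim is to avoid a second pass with FFIND and TAIL, one option is to keep reading after the blank line in the same loop and send the remaining lines to the body processing. Here is a Python sketch of that single-pass idea. split_stream is a made-up name, and the line number is counted from 1.

```python
from typing import Iterable


def split_stream(lines: Iterable[str]):
    """Read lines once and return (header_lines, blank_line_number, body_lines).

    blank_line_number is the 1-based number of the first empty line,
    or 0 if there is none (everything is then treated as header).
    """
    header, body = [], []
    blank_at = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if blank_at:
            body.append(line)  # already past the header block
        elif line == "":
            blank_at = number  # first empty line ends the header
        else:
            header.append(line)
    return header, blank_at, body
```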
 
@samintz - how would I modify message #15 to incorporate your message #17, so that FROM / TO / CC / BCC / Reply-To headers with multiple entries appear with each entry on a separate line? Also, wouldn't using FWrite instead of plain output redirection speed up processing? Here is the current parse.btm:

Code:
COMMENT

    Trying to put all names and addresses for TO's, CC's and BCC's on one line.

ENDCOMMENT

setlocal
  setdos /x-1256789A

  set sOut28=%UserProfile%\Desktop\Parse_EMLs\Parse_28.lst
  set sOut43=%UserProfile%\Desktop\Parse_EMLs\Parse_43.lst

  set sTmpFile=%tmp%\TCMD_TPIPE.txt

  IF NOT ISDIR "%UserProfile%\Desktop\Parse_EMLs" MD /S "%UserProfile%\Desktop\Parse_EMLs"

  set nFiles=0

  DEL /S /E /Y /Q %sOut28 %sOut43

    for /r %fn in (*.eml) do ( set nFiles=%@inc[nFiles] & gosub SubParse "%fn" )

  echo Files: %@comma[%nFiles]

  setdos /x+1256789A
  QUIT

:SubParse
  echo ===============================================================================
  echo Processng: %@full[%fn]
  echo ===============================================================================
  set inheader=false
  set newline=no
  set foundsearch=false
  do line in @fn
    if "%line" == "" goto done
    iff %@regex["^(To|From|Cc|Bcc|Reply-To):",%line] == 1 then
      if %newline == yes (echo.) else (set newline=yes)
      echos %@trim[%line]
      set inheader=true
      set foundsearch=true
    elseiff %inheader == true then
      iff %@regex["^[ \t]",%line] == 1 then
        echos %@char[32]%@trim[%line]
      else
        set inheader=false
      endiff
    endiff
  enddo
  :done
  iff "%foundsearch" == "false" then
    echo *** NO Search terms found ***
    echoerr *** NO Search terms found ***
  else
    rem find first blank line
    set nLocation=%@execstr[ffind /k /m /l /e"^$" %1]
    set nLineNumber=%@strip[[],%nLocation]

    rem make sure a blank line was found above
    iff %nLineNumber > 0 then
      tail /n+%nLineNumber "%fn" >! %sTmpFile

      rem tpipe on "%tmp%\TCMD_TPIPE.txt" to get email addresses in the BODY of the EML
      TPIPE /input=%sTmpFile /outputappend=1 /output=%sOut28 /simple=28

      rem tpipe on "%tmp%\TCMD_TPIPE.txt" to get URLs in the BODY of the EML
      TPIPE /input=%sTmpFile /outputappend=1 /output=%sOut43 /simple=43

    else
      Echoerr No blank line: %@full[%fn]
    endiff
  endiff
  echo.
  return
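For the one-address-per-line goal, Python's standard library can unfold headers and split address lists, including commas inside quoted display names, which are hard to handle with @WORD. Here is a sketch using email.message_from_string and email.utils.getaddresses. The field list follows the current parse.btm (including Reply-To). The "Field: Name <addr>" output format and the function name address_lines are assumptions.

```python
import email
from email.utils import getaddresses

FIELDS = ("From", "To", "Cc", "Bcc", "Reply-To")


def address_lines(raw: str) -> list[str]:
    """One 'Field: Name <addr>' line per address in the headers of interest."""
    msg = email.message_from_string(raw)
    out = []
    for field in FIELDS:
        # get_all returns every occurrence of the header, folded lines included.
        for name, addr in getaddresses(msg.get_all(field, [])):
            if not addr:
                continue  # skip empty or unparseable entries
            out.append(f"{field}: {name} <{addr}>" if name else f"{field}: {addr}")
    return out
```

Because getaddresses() receives every occurrence of a header, repeated To: lines in the same message are also handled.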
 
Here is the current parse.btm:

Code:
COMMENT

    Trying to put all names, addresses for TO's, CC's and BCC's all on one line.

ENDCOMMENT

setlocal
  setdos /x-1256789A

  set sOut28=%UserProfile%\Desktop\Parse_EMLs\Parse_28.lst
  set sOut43=%UserProfile%\Desktop\Parse_EMLs\Parse_43.lst

  set sTmpFile=%tmp%\TCMD_TPIPE.txt

  IF NOT ISDIR "%UserProfile%\Desktop\Parse_EMLs" MD /S "%UserProfile%\Desktop\Parse_EMLs"

  set nFiles=0

  DEL /S /E /Y /Q %sOut28 %sOut43

    for /r %fn in (*.eml) do gosub SubParse "%fn"

  echo Files: %@comma[%nFiles]

  setdos /x+1256789A
  QUIT

:SubParse
  echo ===============================================================================
  echo Processng: %@full[%fn]
  echo ===============================================================================
  set nFiles=%@inc[nFiles]
  set inheader=false
  set newline=no
  set foundsearch=false
  do line in @%fn
    rem pause inside do line....
    if "%line" == "" goto done
    iff %@regex["^(To|From|Cc|Bcc|Reply-To):",%line] == 1 then
      if %newline == yes (echo.) else (set newline=yes)
      echos %@trim[%line]
      set inheader=true
      set foundsearch=true
    elseiff %inheader == true then
      iff %@regex["^[ \t]",%line] == 1 then
        echos %@char[32]%@trim[%line]
      else
        set inheader=false
      endiff
    endiff
  enddo
  rem pause :Done
  :done
  iff "%foundsearch" == "false" then
    echo *** NO Search terms found ***
    echoerr *** NO Search terms found ***
  else
    rem find first blank line *** error seems to be in the next few lines
    pause set nLocation=%@execstr[ffind /k /m /l /e"^$" "%fn"]

    set nLocation=%@execstr[ffind /k /m /l /e"^$" "%fn"]

    pause nLocation : [%nLocation]

    set nLineNumber=%@strip[[],%nLocation]

    pause nLineNumber : [%nLineNumber]

    rem make sure a blank line was found above
    iff %nLineNumber > 0 then
      tail /n+%nLineNumber "%fn" >! %sTmpFile

      rem tpipe on "%tmp%\TCMD_TPIPE.txt" to get email addresses in the BODY of the EML
      TPIPE /input=%sTmpFile /outputappend=1 /output=%sOut28 /simple=28

      rem tpipe on "%tmp%\TCMD_TPIPE.txt" to get URLs in the BODY of the EML
      TPIPE /input=%sTmpFile /outputappend=1 /output=%sOut43 /simple=43

    else
      Echoerr No blank line: %@full[%fn]
    endiff
  endiff
  echo.
  pause
  return

The error seems to be at the lines marked with "***".

In execution:

Code:
[C:\Users\csgal\Desktop\GMailCSGalloway]c:\Z_UserFiles\JPSoft\BTM_002\parse.btm
===============================================================================
Processng: C:\Users\csgal\Desktop\GMailCSGalloway\GMail_CSG\123GreetingsCom\[email protected] has sent you a Christmas ecard..eml
===============================================================================
From: 123Greetings.com <[email protected]>
To: [email protected]
Reply-To: [email protected] nLocation=
nLocation : []^C

[C:\Users\csgal\Desktop\GMailCSGalloway]

I'd also like the "set nLocation..." output to start on its own line.

I attached the file it was processing.
 
