Converting XLSX to Tab-separated TXT file

#1
Does anyone know an elegant way to dump out a spreadsheet to the Tab-separated text file format using a command line util? I do a lot of processing on our Bill of Materials that are in spreadsheet format; but I have to convert them to text first using Excel. I am looking for a solution that adds the minimal amount of other software tools as we have several company sites we would like to use this at - and they prefer to use what they have on their machines. I'm currently using TCC BTM files and AWK.

I'm leaning toward Python, to replace AWK, as they seem to have a way to do this; but wanted to see if you all had a better solution.

Here's a link showing the Python code; I just don't know Python yet :)
https://gist.github.com/brendano/22764
 

samintz

Scott Mintz
May 20, 2008
1,240
11
Solon, OH, USA
#2
I was going to suggest using VBScript but the Python solution is pretty good. Basically, a XLSX file is a ZIP file. You can see this by renaming it to have a .ZIP extension and opening it as a ZIP file.

Here is a sample:
Code:
1980-01-01 00:00:00 .....         1816          419  [Content_Types].xml
1980-01-01 00:00:00 .....          588          245  _rels\.rels
1980-01-01 00:00:00 .....         1246          293  xl\_rels\workbook.xml.rels
1980-01-01 00:00:00 .....          732          398  xl\workbook.xml
1980-01-01 00:00:00 .....         6995         1666  xl\theme\theme1.xml
1980-01-01 00:00:00 .....          458          217  xl\worksheets\_rels\sheet1.xml.rels
1980-01-01 00:00:00 .....          439          275  xl\worksheets\sheet2.xml
1980-01-01 00:00:00 .....          439          275  xl\worksheets\sheet3.xml
1980-01-01 00:00:00 .....        12335         2190  xl\worksheets\sheet1.xml
1980-01-01 00:00:00 .....         6538         2450  xl\sharedStrings.xml
1980-01-01 00:00:00 .....          953          447  xl\styles.xml
1980-01-01 00:00:00 .....         1154          446  xl\tables\table1.xml
1980-01-01 00:00:00 .....          377          265  xl\connections.xml
1980-01-01 00:00:00 .....         2900          322  xl\printerSettings\printerSettings1.bin
1980-01-01 00:00:00 .....         1145          482  xl\xmlMaps.xml
1980-01-01 00:00:00 .....          635          324  docProps\core.xml
1980-01-01 00:00:00 .....          917          397  docProps\app.xml
The Python code unzips the relevant files parses the XML, removes embedded tabs and newlines, and spits out a tab separated value list. It's about as simple as you'll come by.

This VBScript does a Save As and saves the spreadsheet as a CSV file
Code:
Set objArgs = WScript.Arguments
InputName = objArgs(0)
OutputName = objArgs(1)
Set objExcel = CreateObject("Excel.application")
objExcel.application.visible=false
objExcel.application.displayalerts=false
set objExcelBook = objExcel.Workbooks.Open(InputName)
objExcelBook.SaveAs OutputName, 23
objExcel.Application.Quit
objExcel.Quit
Invoke this as:
Code:
cscript script.vbs C:\foo\file.xlsx C:\foo\file.csv
 
Likes: Schuby