Saturday, September 27, 2008

Converting XLS to CSV with VBS

I'm working on a project that requires me to write some data to Active Directory from a group of excel files. I'm automating a process that requires a human to edit the excel files manually, convert them to CSV, then use MS Access to do some work on them, then run a script on the output. All of this is just basic data manipulation, there is just no good reason to have a human start a process, wait for an output and then start another and so on.

I've never worked with Excel in VBS and in VBS comma separated values are really easy to work with, so I was hoping to find an existing script or executable that I could call to get that out of the way. It turns out that there are lots of bits and piece of code that people sell for what should be a simple task. I don't pay for such things, so I did a bit of digging. There is plenty of VBS code out there for working with Excel data. I got what I needed, and don't really need to convert to CSV.

I figured I'd create the script to convert from XLS or XLSX to CSV anyway, cause clearly this should be free.

I came across Greg Thatcher's page that has a script to output an excel document to the screen once cell at a time in a long column. From there its a short trip to putting the data in to a CSV.

The script can be found here or below. It's a reminder that most of scripting is actually error and exception handling. I have left Greg's comments and highly commented my code as well.

Once caveat, I'm sure the CreateObject("Excel.Application") code only runs if you have Excel installed.

http://s3.amazonaws.com:80/markgamache/xls2csv.vbs

Option Explicit
' We use "Option Explicit" to help us check for coding mistakes

'get the current dir. Excel objects dont read from current dir so must create explicit ref
Dim currentDirectory
currentDirectory = Left(WScript.ScriptFullName,(Len(WScript.ScriptFullName)) - (Len(WScript.ScriptName)))


' the Excel Application
Dim objExcel
' the path to the excel file
Dim excelPath
' how many worksheets are in the current excel file
Dim worksheetCount
Dim counter
' the worksheet we are currently getting data from
Dim currentWorkSheet
' the number of columns in the current worksheet that have data in them
Dim usedColumnsCount
' the number of rows in the current worksheet that have data in them
Dim usedRowsCount
Dim row
Dim column
' the topmost row in the current worksheet that has data in it
Dim top
' the leftmost row in the current worksheet that has data in it
Dim leftm

Dim Cells
' the current row and column of the current worksheet we are reading
Dim curCol
Dim curRow
' the value of the current row and column of the current worksheet we are reading
Dim word
'the cell data

Dim objFSO
Const ForAppending = 8

'make sure there is a cmd line arg an only one
If Wscript.Arguments.Count <> 1 Then
WScript.Echo("Only one argument please. An XLS or xlsx file")
ELSE
'get the command line argument
Dim infile
infile = Wscript.Arguments(0)
'get extension. chars after last period
Dim fileext
fileext = Right(infile, Len(infile) - InStrRev(infile, "."))


'make sure the cmd arg is an excel file
'erros etc for filetypes
Select Case fileext
Case "xls"
ctvnow() 'call the convert function
Case "xlsx"
ctvnow() ' call the convert function
Case ELSE
WScript.Echo("The input file must be an .xls or .xlsx file")
End select
End If
'end main loop all functions below


'this is the main conversion sub
Sub ctvnow()
Dim logfile 'this is the output file
logfile = currentDirectory & Left(infile, InStrRev(infile, ".")) & "csv"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Dim objLogFile
Set objLogFile = objFSO.OpenTextFile(logfile, ForAppending, True)

' where is the Excel file located? currentDirectory lets us locate the file where the script was run
' could add chekc to the infile to see if its includes a path and act accordingly
excelPath = currentDirectory & infile

' Create an invisible version of Excel
Set objExcel = CreateObject("Excel.Application")

' don't display any messages about documents needing to be converted
' from old Excel file formats
objExcel.DisplayAlerts = 0

' open the excel document as read-only
' open (path, confirmconversions, readonly)
objExcel.Workbooks.open excelPath, false, true

' How many worksheets are in this Excel documents
workSheetCount = objExcel.Worksheets.Count


'For counter = 1 to worksheetCount would loop thorough all sheets. We only wan tthe first
For counter = 1 to 1

WScript.Echo "-----------------------------------------------"
WScript.Echo "Reading data from worksheet " & counter & vbCRLF
' set to first worksheet
Set currentWorkSheet = objExcel.ActiveWorkbook.Worksheets(counter)

' how many columns are used in the current worksheet
usedColumnsCount = currentWorkSheet.UsedRange.Columns.Count
' how many rows are used in the current worksheet
usedRowsCount = currentWorkSheet.UsedRange.Rows.Count

' What is the topmost row in the spreadsheet that has data in it
top = currentWorksheet.UsedRange.Row
' What is the leftmost column in the spreadsheet that has data in it
leftm = currentWorksheet.UsedRange.Column


' create Array For the row. this assumes a max of 255 columns in the row. for XLSX this could be higher
'should add check below to throw error if the sheet had too many columns
Dim rowa(255)

Set Cells = currentWorksheet.Cells
' Loop through each row in the worksheet
For row = 0 to (usedRowsCount-1)
' Loop through each column in the worksheet

' the location in the array we are at, corresponds to a column place in the current row
Dim arloc
arloc = 0
For column = 0 to usedColumnsCount-1
' only look at rows that are in the "used" range
curRow = row+top
' only look at columns that are in the "used" range
curCol = column+leftm
' get the value/word that is in the cell
word = Cells(curRow,curCol).Value

'covert " to ' this is because quotes are a text qualifier in CSV so cells can contain a comma
If InStr(word, Chr(34)) > 0 Then
word = Replace(word, Chr(34), Chr(39))
End if
'if the cell contains a comma, wrap the cell in " so that the cell isn't artificially split
If InStr(word,Chr(44)) > 0 Then
word = chr(34) & word & chr(34)
End if

'put the cell in to the array one cell at at time, stripping leading and trailing white space
rowa(arloc) = trim(word)
arloc = arloc + 1 'increment the column to get next cell
Next
Dim rline
Dim rline2
'we turn the array into a line of comma separated values
rline = Join(rowa,",")

'removes extra commas at the end of array as it is 255 wide
rline2 = removecomma(rline) 'calls customer function below
objLogFile.writeline(rline2) ' write the line of CSV
Next

' We are done with the current worksheet, release the memory
Set currentWorkSheet = Nothing
Next

objExcel.Workbooks(1).Close
objExcel.Quit

Set currentWorkSheet = Nothing
' We are done with the Excel object, release it from memory
Set objExcel = Nothing
WScript.Echo("File converted")
End sub


Function removecomma(csvline)
Dim lastchar
lastchar = 1
'we'll check to see if the last char of the line is a comma
Do Until lastchar = 0
If right(csvline, 1) = Chr(44) Then ' check for comma
csvline = Left(csvline, (Len(csvline) - 1)) ' strip a comma from the end
Else
lastchar = lastchar - 1 ' no more commas set lastchar to 0 so we can get out of loop
End if
Loop
removecomma = csvline ' return the new line
End Function

Enjoy it, leverage from it, improve it or ignore it...
Inputting falsified referrals to this site violates the terms of service of this site and is considered unauthorized access (hacking).