Skip to content

Instantly share code, notes, and snippets.

@QNimbus
Last active September 11, 2023 13:14
Show Gist options
  • Save QNimbus/33f7929546e5ab2eee5b8de22a528591 to your computer and use it in GitHub Desktop.
Save QNimbus/33f7929546e5ab2eee5b8de22a528591 to your computer and use it in GitHub Desktop.
Excel macros

Import and merge CSV files

This VBA code library includes several functions and subroutines designed to manage sheet merging and data hashing in an Excel workbook. It also handles the import of CSV files into separate sheets, which are then merged into a single summary sheet.

Constants

  • IMPORT_PREFIX: The prefix used to identify sheets that need to be imported (default is "imp-").
  • IMPORT_SHEET_NAME: The name of the merged worksheet where all other sheets' data will be consolidated (default is "Imported").

Important: Make sure to have Net Framework 3.5 installed on your Windows 10 64-bit Office setup to avoid hash routine errors.

Functions

MD5Hash(strToHash As String) As String

Purpose:
Calculates the MD5 hash of a given string.

Arguments:

  • strToHash: The string you want to hash.

Returns:
A string representing the MD5 hash of the input string.


ColLetterToNumber(colLetter As String) As Long

Purpose:
Converts a column letter to its corresponding column number.

Arguments:

  • colLetter: The column letter (e.g., "A", "Z", "AA").

Returns:
A long representing the column number.


RowHash(rng As Range) As String

Purpose:
Generates an MD5 hash for an entire row of cells within a given range.

Arguments:

  • rng: The range representing a row.

Returns:
A string representing the hash of the entire row's content.


SanitizeSheetName(sheetName As String, Optional customRegexPattern As String = "", Optional customRegexReplace As String = "") As String

Purpose:
Sanitizes sheet names to make them valid Excel worksheet names.

Arguments:

  • sheetName: The original sheet name.
  • customRegexPattern: Custom regular expression pattern to apply.
  • customRegexReplace: Replacement string for the custom regular expression.

Returns:
A sanitized sheet name as a string.


Subroutines

MergeSheets()

Purpose:
Merges data from all worksheets with names that start with IMPORT_PREFIX into a single sheet named IMPORT_SHEET_NAME.

Operation:

  • Identifies the hash column in each worksheet to be merged.
  • Compares hashes to avoid duplicate rows.
  • Appends new rows to the merged sheet.
  • Sorts the merged sheet by date (assumed to be in Column A).
  • Deletes all original sheets with data that has been merged.

ImportCSVs()

Purpose:
Imports data from one or more selected CSV files into new worksheets in the workbook.

Operation:

  • Shows a file picker dialog for selecting CSV files.
  • Creates new worksheets with sanitized names and imports the CSV data into these sheets.
  • Adds a "Hash" column to each sheet to store the MD5 hash of each row.

ImportAndMerge()

Purpose:
Executes the entire operation of importing CSVs and then merging them into a single worksheet.

Operation:

  • Calls ImportCSVs to import CSV files.
  • Calls MergeSheets to merge the newly created sheets.
  • Displays a message indicating that the process is complete.

Example Usage

To import and merge sheets, simply call the ImportAndMerge subroutine.

Call ImportAndMerge

To only perform sheet merging, you can call the MergeSheets subroutine.

Call MergeSheets

To import CSV files without merging, use the ImportCSVs subroutine.

Call ImportCSVs

The complete code

Const IMPORT_PREFIX As String = "imp-"
Const IMPORT_SHEET_NAME As String = "Imported"
Const IMPORT_SHEET_AMOUNT_COLUMN = "C"

' !!!! IMPORTANT !!!!
' It was found that the hash routines errored in a Windows 10, 64 bit Office setup.
' However, subsequent checking revealed the solution.
' The Windows platform must have intalled the Net Framework 3.5 (includes .Net 2 and .Net 3),
' this older version, and not only the Net Framework 4.8 Advanced Services that was enabled in Turn Windows Features on and off.
' When it was selected there, the routines worked perfectly.
'
' MD5 hash using VBA
Function MD5Hash(strToHash As String) As String
    Dim xml As Object
    Set xml = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    Dim bytes() As Byte
    
    bytes = StrConv(strToHash, vbFromUnicode)
    
    With CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
        Dim byteHash() As Byte
        byteHash = .ComputeHash_2((bytes))
        MD5Hash = ""
        For Each b In byteHash
            MD5Hash = MD5Hash & Right("0" & Hex(b), 2)
        Next
    End With
End Function

Public Function CRC32(strToHash As String) As String
    Dim i As Long, j As Long
    Dim crcValue As Long
    Dim byteValue As Byte
    Dim polynomial As Long
    Dim crcTable(0 To 255) As Long

    ' Polynomial used in the CRC-32 calculation
    polynomial = &HEDB88320
    
    ' Generate the CRC table
    For i = 0 To 255
        crcValue = i
        For j = 8 To 1 Step -1
            If (crcValue And 1) <> 0 Then
                crcValue = ((crcValue And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
                crcValue = crcValue Xor polynomial
            Else
                crcValue = ((crcValue And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
            End If
        Next j
        crcTable(i) = crcValue
    Next i
    
    ' Initialize CRC value
    crcValue = &HFFFFFFFF
    
    ' Calculate CRC
    For i = 1 To Len(strToHash)
        byteValue = Asc(Mid(strToHash, i, 1))
        crcValue = crcTable((crcValue And &HFF) Xor byteValue) Xor ((crcValue \ 256&) And &HFFFFFF)
    Next i
    
    ' Finalize CRC value
    crcValue = Not crcValue
    
    ' Convert the CRC value to a hex string
    CRC32 = Right("00000000" & Hex(crcValue), 8)
End Function

Function ColLetterToNumber(colLetter As String) As Long
    ColLetterToNumber = Range(colLetter & "1").Column
End Function

Sub SetCurrencyFormat(wsName As String, col As String)
    Dim lastRow As Long
    Dim colNum As Long
    Dim ws As Worksheet
    
    Set ws = ThisWorkbook.Sheets(wsName)

    colNum = ColLetterToNumber(col)
    lastRow = ws.Cells(Rows.Count, colNum).End(xlUp).Row
    ws.Range(ws.Cells(2, colNum), ws.Cells(lastRow, colNum)).NumberFormat = "$#,##0.00"

End Sub

Function RowHash(rng As Range, Optional ByRef colArray As Variant) As String
    Dim i As Long
    Dim strToHash As String
    Dim ws As Worksheet
    Dim rowNumber As Long
    Dim cell As Range
    
    Set ws = rng.Worksheet
    rowNumber = rng.Row
    
    If IsMissing(colArray) Then
        ' Use all columns if colArray is empty
        For Each cell In rng
            strToHash = strToHash & cell.Value
        Next cell
    Else
        ' Use specified columns to form the string to hash
        For i = LBound(colArray) To UBound(colArray)
            strToHash = strToHash & ws.Cells(rowNumber, colArray(i)).Value
        Next i
    End If
    
    ' Use the CRC32 function to get the hash of the concatenated string
    RowHash = CRC32(strToHash)
    
End Function

Sub MergeSheets()
    ' Variable Declarations
    Dim wsMerged As Worksheet ' Worksheet where all other sheets will be merged into
    Dim ws As Worksheet ' Worksheet currently being processed
    Dim lastRowMerged As Long ' Last row with data in the merged worksheet
    Dim lastRowWs As Long ' Last row with data in the worksheet currently being processed
    Dim wsToDelete As Collection ' Collection to store names of worksheets to be deleted
    Dim wsName As Variant ' Used to loop through wsToDelete collection
    Dim lastCol As Long ' Last column with data in the merged worksheet
    Dim hashColWs As Long ' Column where the hash values are stored in the worksheet being merged
    Dim newRowsMerged As Long ' Counter for the number of new rows added to the merged sheet
    Dim wsHash As Range ' Range object to find hash values in merged worksheet
    Dim sheetsMerged As Integer ' Counter for the number of sheets merged

    ' Initialize counter to 0
    sheetsMerged = 0

    ' Initialize the collection to hold the names of the worksheets to delete
    Set wsToDelete = New Collection
    newRowsMerged = 0

    ' Initialize or reset the merged sheet
    On Error Resume Next
    Set wsMerged = ThisWorkbook.Sheets(IMPORT_SHEET_NAME)
    On Error GoTo 0

    ' Create a new sheet if wsMerged does not exist
    If wsMerged Is Nothing Then
        Set wsMerged = ThisWorkbook.Sheets.Add
        wsMerged.Name = IMPORT_SHEET_NAME
    End If
    
    ' Move the wsMerged to the end
    wsMerged.Move After:=ThisWorkbook.Sheets(ThisWorkbook.Sheets.Count)
    
    ' Check if the first row is empty in the merged sheet
    Dim firstRowIsEmpty As Boolean ' Flag to check if the first row of the merged sheet is empty
    firstRowIsEmpty = IsEmpty(wsMerged.Cells(1, 1).Value)

    ' Find the last column in the merged worksheet
    If firstRowIsEmpty Then
        lastCol = 1 ' Initialize to 1 if first row is empty
    Else
        lastCol = wsMerged.Cells(1, wsMerged.Columns.Count).End(xlToLeft).Column
    End If

    ' Loop through each worksheet
    For Each ws In ThisWorkbook.Sheets
        ' Only process sheets that match the import prefix
        If ws.Name Like IMPORT_PREFIX & "*" Then
            ' Find the hash column in the current worksheet
            hashColWs = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column
            
            ' Copy headers if the merged sheet is empty
            If firstRowIsEmpty Then
                ws.Rows(1).EntireRow.Copy wsMerged.Rows(1)
                firstRowIsEmpty = False ' Mark that the first row is now populated
            End If
            
            ' Determine last populated row for both sheets
            lastRowMerged = wsMerged.Cells(wsMerged.Rows.Count, "A").End(xlUp).Row
            lastRowWs = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

            ' Loop through each row in the current sheet to compare hash and copy if needed
            For rw = 2 To lastRowWs
                ' Search for the hash value in the merged sheet
                Set wsHash = wsMerged.Range(wsMerged.Cells(2, lastCol), wsMerged.Cells(lastRowMerged, lastCol)).Find(ws.Cells(rw, hashColWs).Value, LookAt:=xlWhole)

                ' Copy the row if hash does not exist in the merged sheet
                If wsHash Is Nothing Then
                    ws.Rows(rw).EntireRow.Copy wsMerged.Cells(lastRowMerged + 1, 1)
                    lastRowMerged = lastRowMerged + 1
                    newRowsMerged = newRowsMerged + 1
                End If
            Next rw

            ' Mark this sheet for deletion
            wsToDelete.Add ws.Name
            
            ' Increment counter for the number of sheets processed
            sheetsMerged = sheetsMerged + 1
        End If
    Next ws
    
    ' Final steps: sort, delete sheets, and display message
    If sheetsMerged > 0 Then
        ' Sort by date (assuming it is in Column A)
        wsMerged.Sort.SortFields.Clear
        wsMerged.Sort.SortFields.Add Key:=Range("A2:A" & lastRowMerged), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
        With wsMerged.Sort
            .SetRange Range("A1:Z" & lastRowMerged)
            .Header = xlYes
            .Orientation = xlTopToBottom
            .Apply
        End With
    
        ' Delete the sheets marked for deletion
        Application.DisplayAlerts = False
        For Each wsName In wsToDelete
            ThisWorkbook.Sheets(wsName).Delete
        Next wsName
        Application.DisplayAlerts = True
    
        ' Display message indicating how many new rows were merged
        MsgBox "Total number of new rows merged: " & newRowsMerged
    Else
        ' If no sheets were merged, display a message
        MsgBox "No sheets were merged. Nothing to do."
    End If
End Sub

Function SanitizeSheetName(sheetName As String, Optional customRegexPattern As String = "", Optional customRegexReplace As String = "") As String
    Dim regEx As New RegExp
    Dim validName As String
    
    ' Initialize the RegExp object
    regEx.Global = True
    regEx.IgnoreCase = True
    
    ' Apply custom regex replacement if provided
    If customRegexPattern <> "" Then
        regEx.Pattern = customRegexPattern
        sheetName = regEx.Replace(sheetName, customRegexReplace)
    End If
    
    ' Replace invalid Excel sheet characters with an underscore
    regEx.Pattern = "[\/\\\*\[\]:\?]"
    validName = regEx.Replace(sheetName, "_")
    
    ' Truncate to 31 characters
    If Len(validName) > 31 Then
        validName = Left(validName, 31)
    End If
    
    ' Return the sanitized sheet name
    SanitizeSheetName = validName
End Function

Sub FormatSheet()
    Call SetCurrencyFormat(IMPORT_SHEET_NAME, IMPORT_SHEET_AMOUNT_COLUMN)
End Sub

Sub ImportCSVs()
    Dim ws As Worksheet
    Dim csvFile As String
    Dim csvFilePath As String
    Dim fileDialog As fileDialog
    Dim selectedFiles() As String ' Declare as a dynamic array
    Dim i As Integer
    Dim lastRowWs As Long, hashColWs As Long
    Dim rw As Long

    ' Create and show the file picker dialog
    Set fileDialog = Application.fileDialog(msoFileDialogFilePicker)
    With fileDialog
        .Title = "Select CSV Files"
        .AllowMultiSelect = True
        .Filters.Add "CSV Files", "*.csv", 1
        
        ' Show the dialog and proceed if files are selected
        If .Show = -1 Then
            ReDim selectedFiles(1 To .SelectedItems.Count)
            For i = 1 To .SelectedItems.Count
                selectedFiles(i) = .SelectedItems(i)
            Next i
        Else
            Exit Sub
        End If
    End With

    ' Loop through each selected CSV file
    For i = 1 To UBound(selectedFiles)
        csvFilePath = selectedFiles(i)
        csvFile = Right(csvFilePath, Len(csvFilePath) - InStrRev(csvFilePath, "\"))

        ' Create a new worksheet with sanitized name
        Set ws = ThisWorkbook.Sheets.Add
        ws.Name = SanitizeSheetName(IMPORT_PREFIX & Replace(csvFile, ".csv", ""), "^(.*?)\s\d{4}-\d{2}-\d{2}.*", "$1")

        ' Import the CSV file into the new worksheet
        With ws.QueryTables.Add(Connection:="TEXT;" & csvFilePath, Destination:=ws.Range("A1"))
            .TextFileParseType = xlDelimited
            .TextFileConsecutiveDelimiter = False
            .TextFileTabDelimiter = False
            .TextFileSemicolonDelimiter = False
            .TextFileCommaDelimiter = True
            .TextFileTextQualifier = xlTextQualifierDoubleQuote
            .TextFileDecimalSeparator = ","
            .TextFileThousandsSeparator = "."
            .Refresh BackgroundQuery:=False
        End With
        
        ' Find the last row and the last column in the imported worksheet
        lastRowWs = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
        hashColWs = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column + 1
        
        ' Add "Hash" as the header for the hash column
        ws.Cells(1, hashColWs).Value = "Hash"
        
        ' Calculate hash for each row in the imported worksheet
        For rw = 2 To lastRowWs
            ' The string to hash are Date, Amount, Counterparty & Description columns concatenated
            ws.Cells(rw, hashColWs).Value = RowHash(ws.Rows(rw).Cells(1, 1).Resize(1, hashColWs - 1), Array(1, 3, 5, 7))
        Next rw
    Next i
End Sub

Sub ImportAndMerge()
    Dim originalWsName As String
    
    ' Save the name of the currently active worksheet
    originalWsName = ActiveSheet.Name
    
    ' Call ImportCSVs subroutine to import the CSV files
    ImportCSVs
    
    ' Call MergeSheets subroutine to merge the sheets
    MergeSheets
    
    ' Perform formatting on the merged sheet
    FormatSheet
    
    ' Return to the original worksheet
    ThisWorkbook.Sheets(originalWsName).Activate
    
    ' Display a message indicating that the process is complete
    MsgBox "Import and merge complete."
End Sub

Functional and Technical Design

Function Name

ImportCSVs

Description

The ImportCSVs function is designed to allow users to import one or multiple semicolon-delimited CSV files into an Excel workbook. After importing, it adds a "Hash" column to the end of each imported data set. Each row of the imported data will have a calculated hash value based on specific columns (Date, Amount, Counterparty, & Description). The function will be implemented as a VBA macro in an Excel workbook.

Functional Design

Input

  1. One or more semicolon-delimited CSV files selected by the user via a file picker dialog.

Process

  1. The user initiates the ImportCSVs function.
  2. A file dialog opens allowing the user to select one or multiple CSV files.
  3. For each selected CSV file:
    • Check if the file is semicolon-delimited.
    • If it is:
      • Create a new worksheet with a sanitized name. The name should be prepended with a short string defined by a predefined constant. (Example: Abonnementen en diensten - 2017-11-29 2017-12-31 - Export Statement.csv becomes IMP:Abonnementen en diensten)
        • If the worksheet already exists, append to it
      • Import the CSV data into the new or already existing worksheet.
      • Append a timestamp column.
      • For each row of data, populate the timestamp column with the calculated unix timestamp based on the date in the first column.
      • Append a "Hash" column.
      • For each row of data, calculate a hash value based on specified columns and populate the "Hash" column.
    • If it isn't:
      • Add the file to the list of skipped files.
  4. Once all files are processed, if there were any skipped files, a message is displayed to the user indicating which files were skipped due to improper delimiters.

Output

  1. Data from the selected CSV files is imported into separate worksheets in the workbook.
  2. Each imported data set has an additional "Timestamp" column populated with unix timestamps for each row based on the date in the first column.
  3. Each imported data set has an additional "Hash" column populated with hash values for each row based on specific columns.
  4. If any files were skipped, a message box informs the user.

Technical Design

Variables

  1. ws: Represents a worksheet in Excel.
  2. csvFile: Stores the name of the current CSV file being processed.
  3. csvFilePath: Stores the complete path of the current CSV file being processed.
  4. fileDialog: Represents the file picker dialog.
  5. selectedFiles(): An array storing the paths of the files selected by the user.
  6. wsLastRow & wsHashCol: Store the last row and column positions in the worksheet, respectively.
  7. rw: Counter used for iterating through rows in the worksheet.
  8. skippedFiles: Keeps track of the file names which were skipped during processing.

Procedures & Functions

  1. fileDialog: This dialog box allows users to select multiple .csv files from their computer.
  2. IsSemicolonDelimited: A function that checks if the provided file is semicolon-delimited.
  3. SanitizeSheetName: A function that sanitizes the sheet name according to the following rules:
    • The name should be prepended with a short string defined by a predefined constant. (Example: Abonnementen en diensten - 2017-11-29 2017-12-31 - Export Statement.csv becomes IMP:Abonnementen en diensten)
    • The name should be truncated to 31 characters.
    • The name should not contain any of the following characters: /\*?[]:
  4. ImportCSV: A function that imports a CSV file with UTF-8 encoding. Assume this is implemented in a separate module.
  5. RowHash: A function (not shown in provided code) that calculates the hash value for a given row based on specified columns.

Gross error checking

  1. Count the number of rows in the CSV file and compare this to the number of rows in the worksheet. Consider that there might be a single header row that is only imported once if the import was appended to an existing worksheet. If they don't match, the import is considered a failure.

Error Handling

  1. If the user cancels the file picker dialog, the subroutine exits immediately.
  2. Files which are not semicolon-delimited are skipped, and their names are accumulated in the skippedFiles string. Once all files are processed, if there were any skipped files, a message is displayed to the user.

Conclusion

The ImportCSVs function facilitates the import of semicolon-delimited CSV files into an Excel workbook, ensuring that data integrity is maintained through the addition of a hash value for each row of the imported data. It provides feedback to the user regarding any files that couldn't be imported due to improper delimiters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment