@julianengel
Last active November 18, 2024 13:49
Searching Large CSV Files on MacOS with Pretty Print

📄 Searching Large CSV Files with ripgrep and Pretty Printing on macOS

🛠️ Installation

  1. Install ripgrep (rg) with Homebrew. column already ships with macOS, so it needs no separate install:
brew install ripgrep
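
Before moving on, it can help to confirm that both tools are on your PATH. A minimal sketch; `missing_tools` is a hypothetical helper name, not part of ripgrep or macOS:

```shell
# Hypothetical helper: print the name of each command that is not on PATH.
missing_tools() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Prints nothing when both tools are available.
missing_tools rg column
```

If `rg` is listed, rerun the `brew install ripgrep` step.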

πŸ” Step 1: Basic Search with ripgrep

Use ripgrep to quickly search for a keyword in your CSV file:

rg "your_search_term" yourfile.csv
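  • rg is a fast, line-oriented search tool similar to grep; it prints every line of the file that matches.
  • "your_search_term" is the keyword or pattern to search for. rg treats it as a regular expression by default; add -F to match it as a literal string or -i to ignore case.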
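
A plain `rg "term"` matches the term anywhere on the line, so searching for a screen name also returns rows where it only appears inside an email address. One way to restrict the match to a single column is to anchor the pattern on the commas before it. This is a sketch: `rg_field` is a hypothetical helper, the CSV is assumed to have no quoted fields containing commas, and the term is assumed to contain no regex metacharacters.

```shell
# Hypothetical helper: print rows whose Nth comma-separated field equals TERM.
# Usage: rg_field COLUMN_NUMBER TERM FILE
# Assumes simple CSV (no quoted commas) and a TERM without regex metacharacters.
rg_field() {
  skip=$(( $1 - 1 ))   # number of fields to skip before the target column
  # ^([^,]*,){skip} consumes the leading fields; (,|$) ends the target field.
  rg -- "^([^,]*,){${skip}}${2}(,|\$)" "$3"
}

# Example: match "jdoe" only in the second column.
# rg_field 2 jdoe yourfile.csv
```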

🧑‍💻 Step 2: Pretty-Print the Search Results

To format the output into a table-like structure, use column:

rg "your_search_term" yourfile.csv | column -s, -t
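  • column -s, -t uses -s, to set the comma as the input delimiter and -t to align the fields into a table. It splits on every comma, including commas inside quoted fields, so it works best on simple CSV files.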
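
One caveat worth checking on your own data: the BSD `column` shipped with macOS appears to merge adjacent delimiters, so a row with an empty field (`1,,x`) can shift its remaining values one column to the left. A possible workaround is to fill empty fields with a placeholder before formatting. The sketch below uses awk; `fill_empty` and the `-` placeholder are illustrative choices, and it assumes no quoted fields containing commas.

```shell
# Hypothetical helper: replace empty comma-separated fields with "-" so that
# every row keeps the same number of visible columns.
fill_empty() {
  awk 'BEGIN { FS = OFS = "," }
       { for (i = 1; i <= NF; i++) if ($i == "") $i = "-"; print }'
}

# Example:
# rg "your_search_term" yourfile.csv | fill_empty | column -s, -t
```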

πŸ“ Step 3: Include the Header Line in the Output

Ensure the header (first line) of the CSV is always included in the output:

{ head -n 1 yourfile.csv && rg "your_search_term" yourfile.csv; } | column -s, -t
  • head -n 1 yourfile.csv grabs the first line (header) of the CSV.
  • rg "your_search_term" yourfile.csv searches the CSV for your term.
  • { ...; } groups the two commands so that their combined output (header first, then matches) goes through a single pipe.
  • column -s, -t formats the combined output into a readable table.
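
One edge case: if the search term also occurs in the header line, the command above prints the header twice, once from head and once from rg. A possible variant, sketched below, searches only from line 2 onward. `csv_search` is a hypothetical name, and the function emits plain CSV so it can be piped into column afterwards.

```shell
# Hypothetical helper: print the header once, then matching data rows only.
# Usage: csv_search PATTERN FILE
csv_search() {
  head -n 1 "$2"
  # tail -n +2 starts at line 2, so the header can never match a second time.
  # The trailing "-" tells rg to read from standard input.
  tail -n +2 "$2" | rg -- "$1" -
}

# Example:
# csv_search "your_search_term" yourfile.csv | column -s, -t
```

The trade-off is that rg reads from a pipe instead of opening the file itself, which may be somewhat slower on very large files.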

📋 Example Output:

If your CSV file looks like this:

ID,ScreenName,Email
1,jdoe,[email protected]
2,asmith,[email protected]
3,jdoe,[email protected]

Running:

{ head -n 1 yourfile.csv && rg "jdoe" yourfile.csv; } | column -s, -t

Would produce:

ID  ScreenName  Email
1   jdoe        [email protected]
3   jdoe        [email protected]

💡 Why Use This Approach?

  • ripgrep (rg) is fast, which makes it well suited to searching large files.
  • Including the header ensures you always know what the columns represent.
  • Using column creates a neatly formatted, readable output.

This setup works well for quick exploration of large CSV files directly from the command line.


@julianengel
Author

This can also be saved and run as a script:


#!/bin/bash

# Prompt for the search term (-r keeps backslashes in the input literal)
read -r -p "Enter the search term: " search_term

# Prompt for the CSV file path
read -r -p "Enter the CSV file path: " csv_file

# Check if the file exists
if [[ ! -f "$csv_file" ]]; then
  echo "Error: File '$csv_file' not found!" >&2
  exit 1
fi

# Perform the search and include the header, then pretty-print
# (-- stops option parsing, so a term starting with "-" is still treated as a pattern)
echo "Searching for '$search_term' in '$csv_file'..."
{ head -n 1 "$csv_file" && rg -- "$search_term" "$csv_file"; } | column -s, -t
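
The prompts make the script awkward to call from other commands or to repeat from shell history. A possible non-interactive variant takes the term and file as arguments instead. This is a sketch rather than the author's script; `csvsearch` is a hypothetical function name that could be added to a shell startup file.

```shell
# Hypothetical non-interactive variant.
# Usage: csvsearch SEARCH_TERM CSV_FILE
csvsearch() {
  if [ "$#" -ne 2 ]; then
    echo "Usage: csvsearch <search_term> <csv_file>" >&2
    return 2
  fi
  if [ ! -f "$2" ]; then
    echo "Error: File '$2' not found!" >&2
    return 1
  fi
  # Same pipeline as the script above: header first, then matches, then align.
  { head -n 1 "$2" && rg -- "$1" "$2"; } | column -s, -t
}

# Example:
# csvsearch jdoe yourfile.csv
```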
