tags: cheatsheet, csv, json, dev
Largeish data wrangling (csv, json)
- Desktop apps
- Web Apps
- CLI Tools
- CLI Snippets
Tad viewer [Win/Linux/MacOS]
- View & analyze data, no exporting.
- Worked (slowly) with 900M / 790k rows file
XTabulator [MacOS] abandoned?
- doesn’t do custom delimiters
- Seems faster than Tad
Table Tool [MacOS]
- A simple CSV editor for OS X
- works on smaller files with comma (,) as delimiter; tested OK on a 50M file, doesn’t work on a 900M file
- A desktop CSV editor for data publishers [Win/Linux/MacOS]
- crashed on a 50M file
Open Refine [Win/Linux/MacOS]
- write SQL to query and visualize gigabytes of CSV files on your local machine.
Talend Data Preparation Tool [Win/MacOS]
- limit of 30K rows
CSV EASY [Win]
Easy Data Transform [Win/macOS]
DB Browser for SQLite [Win/Linux/MacOS]
Tabmega [Win/Linux/MacOS]
- Row Zero - a Google Sheets alternative that can handle large data; tried with a 2 GB CSV file, moves swiftly
Tools for generating CSV and other flat versions of the structured data
- python CLI tools
- A suite of utilities for converting to and working with CSV, the king of tabular file formats
- xsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files
- DerLinkshaender/csv2xlsx - Finally: a simple, single file executable, no runtime libs command line tool to convert a CSV file to XLSX
- mentax/csv2xlsx - Convert CSV data to xlsx - especially the big ones.
Convert xlsx to csv from the command line with LibreOffice (macOS path shown; on Linux the binary is typically soffice or libreoffice)
for i in *.xlsx; do /Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to csv "$i" ; done
- jq – lightweight and flexible command-line JSON processor.
- zed - Zed offers a new approach to data that makes it easier to manipulate and manage your data. With Zed's new super-structured data model, messy JSON data can easily be given the fully-typed precision of relational tables without giving up JSON's uncanny ability to represent eclectic data.
- jello - CLI tool to filter JSON and JSON Lines data with Python syntax. (Similar to jq)
- sqlite-utils - CLI tool and Python utility functions for manipulating SQLite databases
- columnq-cli - Simple CLI to help you query tabular data with support for a rich set of growing formats and data sources.
- htmlq - Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.
- hq - jq, but for HTML
- textql - Execute SQL against structured text like CSV or TSV
- q - Run SQL directly on CSV or TSV files
- simplql - Query csv, xls and json with SQL / Simplql is a private and fast in-browser tool that lets you query your data files with your favourite query language - SQL, without using a database.
- Miller - Like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON (johnkerl/miller)
Get the first line from file1.txt, then concatenate all the files into all.txt
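Concatenate all CSV files, drop the first two columns, and keep only lines that are not repeated on adjacent lines (--complement is GNU cut only)
cat *.csv | cut -d, -f1,2 --complement | uniq -u > output.txt
Concatenate all files in the current directory, newest first (breaks on file names containing whitespace)
cat $(ls -t) > outputfile
Same, with file names quoted (-Q is a GNU ls option)
ls -tQ | xargs cat
Concatenate CSV files, skipping the first (header) line of each
awk 'FNR > 1' file*.csv > newfile.csv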
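The `awk 'FNR > 1'` one-liner above drops the header of every file, including the first. Below is a minimal sketch of a variant that keeps the first file's header once, which appears to be what the note about file1.txt describes. The file names `part1.csv`, `part2.csv`, and `merged.csv` are made up for illustration.

```shell
# Sample inputs (hypothetical file names) created in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id,name\n1,a\n2,b\n' > part1.csv
printf 'id,name\n3,c\n' > part2.csv

# Keep line 1 of the first file only; skip line 1 of every later file.
awk 'FNR == 1 && NR != 1 { next } { print }' part1.csv part2.csv > merged.csv
cat merged.csv
```

When using a glob instead of explicit names, write the output somewhere the glob will not match, otherwise the merged file can end up being read as one of its own inputs.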
Replace double quotes with 'qq' in file.csv; lines that changed are written to newfile.csv
sed -n 's/\"/qq/gpw newfile.csv' file.csv
Replace all double quotes with 'qq' in file.txt, in place (BSD/macOS sed syntax; GNU sed takes -i without the empty '' argument)
sed -i '' 's/\"/qq/g' file.txt
Replace all double quotes with single quotes (tr only does single-character replacement)
tr '"' "'"
Replace header (first line) with 'tralala'
sed -i.bak "1 s/^.*$/tralala/" file.csv
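A small self-contained run of the header replacement, as a sketch. The `-i.bak` form (suffix attached directly to `-i`) is accepted by both GNU and BSD sed and leaves the original in `file.csv.bak`. The sample data and the new header `id,label` are hypothetical.

```shell
# Scratch directory with a tiny sample file (made-up contents).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a,b\n1,x\n2,y\n' > file.csv

# Replace line 1 only; the untouched original is kept as file.csv.bak.
sed -i.bak '1 s/^.*$/id,label/' file.csv

head -n 1 file.csv      # new header
head -n 1 file.csv.bak  # old header
```

Single quotes are used around the sed script here so the shell never has a chance to interpret `$`.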
Escape every double quote with a backslash, in place, in all files in the current directory
for file in *
do
sed -i '' 's/\"/\\"/g' "$file"
done
Print lines 57890000 through 57890010 of a file
awk 'NR >= 57890000 && NR <= 57890010' /path/to/file
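On a very large file, the range filter above keeps reading to the end of the file after the last wanted line has been printed. Below is a sketch of a variant that exits as soon as the range is done. The variable names `first` and `last` and the small sample file are assumptions for illustration.

```shell
# Scratch directory with a 100-line sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 100 > numbers.txt

# Stop reading once past the last wanted line; print lines first..last.
awk -v first=3 -v last=5 'NR > last { exit } NR >= first' numbers.txt > range.txt
cat range.txt
```

For the range in the note, that would be `first=57890000` and `last=57890010`.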
Split 'file.csv' into files of 750k rows each, copying the header line into every part
xfile='file.csv'; tail -n +2 "$xfile" | split -l 750000 - split_; for file in split_*; do head -n 1 "$xfile" > tmp_file; cat "$file" >> tmp_file; mv -f tmp_file "$file"; done
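The same header-preserving split can also be done in a single awk pass, without temporary files. This is a sketch rather than a drop-in replacement: the chunk size `n`, the `chunk_NNN.csv` naming, and the sample `data.csv` are all hypothetical. Like the one-liner above, it assumes no quoted field contains a newline.

```shell
# Scratch directory with a small sample CSV (header + 5 data rows).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n' > data.csv

# n = data rows per output file (750000 in the note above; 2 here for the demo).
awk -v n=2 '
  NR == 1 { header = $0; next }          # remember the header line
  (NR - 2) % n == 0 {                    # first data row of a new chunk
    if (out) close(out)
    out = sprintf("chunk_%03d.csv", ++i)
    print header > out
  }
  { print > out }                        # append the data row to the current chunk
' data.csv

ls chunk_*.csv
```

Closing each chunk before opening the next keeps the number of open files at one, which matters when a large input produces many parts.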
- How to open large .CSV file? 2GB
- How To Open & Manipulate Large >100MB CSV Files On A Mac + HN thread
- Working with CSVs on the Command Line
- Merging Multiple CSV Files without merging the header
- Using Python to Parse Spreadsheet Data
- How can I open large csv file?
- What is the best CSV editor on OS X?
- Lightweight CSV viewer for Mac?
- Keen
- Structured text tools
- Data Science at the Command Line
- HN collection
@pax would you be interested in trying my new desktop app Tabmega for this use case? It's available for Win/Linux/MacOS. You can also read more about the tech stack if you have questions.