@dugjason
Created October 13, 2023 20:07
JavaScript script to split a CSV file into multiple smaller CSVs, sharing the same header

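/**
 * This script will take a .CSV file and split it into multiple files, each
 * with at most a specified number of data rows. Each output file will have
 * the same header row as the input file.
 *
 * Usage:
 * node split-csv.js
 *
 * You can modify the variables in the CONFIGURATION section below to change
 * the input file path, output directory path, output filename prefix, and
 * maximum number of data rows per file.
 *
 * For the sample config values set below, the script will:
 * - Read the file at ./input-file.csv (relative paths are resolved against
 *   the directory this script is saved in)
 * - Split the data rows into chunks of up to 3000 rows each (the header is
 *   not counted)
 * - Write each chunk to a file in the ./output-csvs directory, creating the
 *   directory if it does not exist
 * - Each file will be named file-chunk-1.csv, file-chunk-2.csv, etc.
 */
const fs = require('fs');
const path = require('path');

//* CONFIGURATION *//
const inputFilePath = './input-file.csv';
const outputDirectoryPath = './output-csvs';
const outputFilenamePrefix = 'file-chunk';
const MAX_ROWS_PER_FILE = 3000; // data rows per output file, header excluded
//* END-CONFIGURATION *//

// Resolve relative paths against this script's directory, not the shell's cwd.
const inputPath = path.resolve(__dirname, inputFilePath);
const outputDir = path.resolve(__dirname, outputDirectoryPath);

fs.readFile(inputPath, 'utf8', (err, data) => {
  if (err) throw err;

  // Preserve the input's line endings, and split on either LF or CRLF.
  const eol = data.includes('\r\n') ? '\r\n' : '\n';
  const rows = data.split(/\r?\n/);
  // A trailing newline leaves an empty last element; drop it so the final
  // chunk doesn't end with a blank row (or become a header-only file).
  while (rows.length > 0 && rows[rows.length - 1] === '') rows.pop();
  if (rows.length === 0) {
    console.log(`${inputPath} is empty; nothing to split.`);
    return;
  }

  // Take the header off once, so every chunk holds up to MAX_ROWS_PER_FILE
  // data rows and the header is re-added to each output file.
  const headerRow = rows[0];
  const dataRows = rows.slice(1);
  const numChunks = Math.ceil(dataRows.length / MAX_ROWS_PER_FILE);

  // Create the output directory (and any parents) if needed;
  // writeFile would otherwise fail with ENOENT.
  fs.mkdirSync(outputDir, { recursive: true });

  for (let i = 0; i < numChunks; i++) {
    const start = i * MAX_ROWS_PER_FILE;
    const chunk = dataRows.slice(start, start + MAX_ROWS_PER_FILE);
    const outputFilePath = path.join(outputDir, `${outputFilenamePrefix}-${i + 1}.csv`);
    const contents = [headerRow, ...chunk].join(eol) + eol;
    fs.writeFile(outputFilePath, contents, (err) => {
      if (err) throw err;
      console.log(`Chunk ${i + 1} written to ${outputFilePath}`);
    });
  }
});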
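The splitting logic can also be pulled out into a pure function, which makes the chunk arithmetic easy to check without touching the filesystem. This is a minimal sketch, not part of the original gist: `splitCsvText` is a hypothetical helper name, and it still splits on physical line breaks, so it has the same limitation as the script when quoted values contain newlines.

```javascript
/**
 * Hypothetical helper (not part of the original gist): split CSV text into
 * chunks of at most maxRows data rows, each prefixed with the header row.
 * Returns an array of strings, one per output file.
 */
function splitCsvText(text, maxRows) {
  if (!Number.isInteger(maxRows) || maxRows < 1) {
    throw new RangeError('maxRows must be a positive integer');
  }
  // Keep the input's line-ending style for the output chunks.
  const eol = text.includes('\r\n') ? '\r\n' : '\n';
  const rows = text.split(/\r?\n/);
  // A trailing newline produces an empty last element; drop it (and any blank tail).
  while (rows.length > 0 && rows[rows.length - 1] === '') rows.pop();
  if (rows.length === 0) return [];

  const header = rows[0];
  const dataRows = rows.slice(1);
  const chunks = [];
  for (let start = 0; start < dataRows.length; start += maxRows) {
    const chunk = dataRows.slice(start, start + maxRows);
    chunks.push([header, ...chunk].join(eol) + eol);
  }
  return chunks;
}

// Example: a header plus 5 data rows, split into files of at most 2 data rows.
const sample = 'id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n';
const parts = splitCsvText(sample, 2);
console.log(parts.length); // 3
console.log(JSON.stringify(parts[2])); // "id,name\n5,e\n"
```

Each string in the returned array could then be passed to `fs.writeFile` exactly as the script does, with the header counted separately from the `maxRows` limit.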
@ianroberts

Note for future readers: this will only work if none of your CSV column values contain embedded newlines (yes, that is possible - CSV permits line breaks within a quoted value, so there may be more than one physical line making up a single logical row in the data). If your CSV may contain such values then you’ll need to use a real CSV parser like papaparse.
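If adding a parser such as papaparse isn't an option, the record boundaries alone can be found with a small state machine. Under RFC 4180-style quoting, a line break only ends a record when it falls outside a double-quoted field, and an escaped quote (`""`) flips the in-quotes state twice, so toggling on every `"` is enough. This is a sketch that assumes well-formed input using `"` as the quote character; `splitCsvRecords` is a hypothetical name, and it only finds where each logical row begins and ends, it does not parse individual fields.

```javascript
/**
 * Hypothetical helper: split CSV text into logical records, keeping line
 * breaks that appear inside double-quoted fields. Assumes RFC 4180-style
 * quoting ('"' quotes a field, '""' is an escaped quote inside it).
 * Returns the raw text of each record, without its terminating line break.
 */
function splitCsvRecords(text) {
  const records = [];
  let inQuotes = false;
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '"') {
      // An escaped quote ("") toggles twice, so the state stays correct.
      inQuotes = !inQuotes;
    } else if (ch === '\n' && !inQuotes) {
      // Drop a preceding '\r' so CRLF files produce clean records.
      const end = i > start && text[i - 1] === '\r' ? i - 1 : i;
      records.push(text.slice(start, end));
      start = i + 1;
    }
  }
  // Keep a final record that isn't followed by a newline.
  if (start < text.length) records.push(text.slice(start));
  return records;
}

const csv = 'id,comment\n1,"first line\nsecond line"\n2,"say ""hi"""\n';
const records = splitCsvRecords(csv);
console.log(records.length); // 3
console.log(JSON.stringify(records[1])); // "1,\"first line\nsecond line\""
```

The resulting array can stand in for the `rows` array in the script, since the header is still the first element; a record that spans several physical lines is then kept whole within a single output file.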
