@djedr
Last active August 19, 2021 23:55
// A simplistic parser for an imaginary XSV format -- a simplified, configurable variant of CSV
// where the first 3 characters specify (in order):
// 1. the escape character (a backslash "\" in the example below)
// 2. the column separator (a comma "," in the example below)
// 3. the row separator (a newline character "\n" in the example below)
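const parseXsv = (input) => {
  // reject input that is too short to hold the 3-character header
  if (input.length < 3) throw Error('Header missing: expected at least 3 characters!')
  const escape = input[0]
  const columnSeparator = input[1]
  const rowSeparator = input[2]
  const rows = []
  let row = []
  let column = ''
  let inEscapeMode = false
  for (let i = 3; i < input.length; ++i) {
    const current = input[i]
    if (inEscapeMode) {
      // the character after an escape is always taken literally
      column += current
      inEscapeMode = false
    }
    else if (current === escape) inEscapeMode = true
    else if (current === columnSeparator) {
      row.push(column)
      column = ''
    }
    else if (current === rowSeparator) {
      row.push(column)
      rows.push(row)
      row = []
      column = ''
    }
    else column += current
  }
  // an escape character with nothing after it cannot be interpreted
  if (inEscapeMode) throw Error('Dangling escape character at end of input!')
  // every row, including the last one, must be terminated by the row separator
  if (column.length > 0 || row.length > 0) {
    throw Error('Last row separator missing!')
  }
  return rows
}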
// the example CSV is from Wikipedia: https://en.wikipedia.org/wiki/Comma-separated_values#Example
console.log(parseXsv(`\\,
Year,Make,Model,Description,Price
1997,Ford,E350,ac\\, abs\\, moon,3000.00
1999,Chevy,Venture "Extended Edition",,4900.00
1999,Chevy,Venture "Extended Edition\\, Very Large",,5000.00
1996,Jeep,Grand Cherokee,MUST SELL!\\
air\\, moon roof\\, loaded,4799.00
`))
/* OUTPUTS:
[
  [ 'Year', 'Make', 'Model', 'Description', 'Price' ],
  [ '1997', 'Ford', 'E350', 'ac, abs, moon', '3000.00' ],
  [ '1999', 'Chevy', 'Venture "Extended Edition"', '', '4900.00' ],
  [
    '1999',
    'Chevy',
    'Venture "Extended Edition, Very Large"',
    '',
    '5000.00'
  ],
  [
    '1996',
    'Jeep',
    'Grand Cherokee',
    'MUST SELL!\nair, moon roof, loaded',
    '4799.00'
  ]
]
*/
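The 3-character header is read purely by position, and `parseXsv` does not check that the three characters differ. Because its branches test the escape character first, then the column separator, then the row separator, a character given two roles silently keeps only the first one: a header of `\,,` leaves no working row separator, so every input fails with "Last row separator missing!". Below is a minimal sketch of a validating header reader. `readXsvHeader` is a hypothetical name, not part of the gist, and like the parser's `input[0]` indexing it assumes each header character is a single UTF-16 code unit.

```javascript
// Hypothetical helper (not part of the original gist): reads the 3-character
// XSV header and rejects configurations that parseXsv would treat ambiguously.
const readXsvHeader = (input) => {
  if (typeof input !== 'string' || input.length < 3) {
    throw Error('Expected at least a 3-character header')
  }
  // Same positions parseXsv reads: escape, column separator, row separator.
  const escape = input[0]
  const columnSeparator = input[1]
  const rowSeparator = input[2]
  // parseXsv checks the escape first, then the column separator, then the row
  // separator, so a character used in two roles would only ever act in the first.
  if (new Set([escape, columnSeparator, rowSeparator]).size !== 3) {
    throw Error('Header characters must be distinct')
  }
  return { escape, columnSeparator, rowSeparator }
}

// A one-line configuration: "/" escapes, ";" separates columns, "|" ends rows.
console.log(readXsvHeader('/;|1997;Ford|1999;Chevy|'))
```

Running such a check before parsing turns a confusing "Last row separator missing!" into an error that names the actual problem.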

djedr commented Aug 19, 2021

The nice thing about this format compared to CSV is that the row and column separators (and the escape character itself) can simply be escaped to be interpreted as part of the data. Thus no quoting is necessary, and there is no need to further escape quote marks or deal with trimming whitespace around them. No trimming is done at the parser level, so any whitespace in a field is preserved as-is.
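To illustrate why escaping is enough, here is a minimal sketch of the reverse direction: a hypothetical `stringifyXsv` helper (not part of the gist) that writes rows out in this format. Since the parser takes any character following the escape literally, the serializer only has to prefix the escape character, the column separator and the row separator with an escape; no quoting is involved. It assumes every field is a string and that the three header characters are distinct; the default arguments simply mirror the header used in the example above.

```javascript
// Hypothetical serializer (not part of the original gist): the inverse of parseXsv.
// Assumes every field is a string and the three header characters are distinct.
const stringifyXsv = (rows, escape = '\\', columnSeparator = ',', rowSeparator = '\n') => {
  const special = new Set([escape, columnSeparator, rowSeparator])
  // Prefix every special character with the escape; everything else is copied as-is.
  const escapeField = (field) => {
    let out = ''
    for (const ch of field) out += special.has(ch) ? escape + ch : ch
    return out
  }
  // Each row is terminated by the row separator, because parseXsv
  // rejects input whose last row separator is missing.
  const body = rows
    .map((row) => row.map(escapeField).join(columnSeparator) + rowSeparator)
    .join('')
  // The 3-character header comes first.
  return escape + columnSeparator + rowSeparator + body
}

const text = stringifyXsv([
  ['Year', 'Description'],
  ['1996', 'MUST SELL!\nair, moon roof, loaded'],
])
console.log(JSON.stringify(text))
// → "\\,\nYear,Description\n1996,MUST SELL!\\\nair\\, moon roof\\, loaded\n"
```

Feeding the result back into `parseXsv` should reproduce the original rows, with one caveat: a row with no fields at all serializes to just the row separator, which the parser reads back as a row containing a single empty field.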
