Created September 30, 2020 21:14
csv_column_splitting_headache
I use a little awk one-liner, derived from https://www.datafix.com.au/cookbook/structure1.html,
to verify the structure of client-supplied CSVs (which I convert to TSVs) or TSVs. One client's
table of object data, provided as TSV, used CRLF row endings AND included TAB, CRLF, CR, and LF
characters inside individual fields to format multiline notes.
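The actual one-liner is on the datafix page. As a rough illustration of the same idea, here is a minimal Python sketch that tallies records by their number of TAB-separated fields. The function name `field_count_report` and the sample data are hypothetical. The sketch assumes records end at LF, as they do under awk's default record separator, so a CRLF row ending only leaves a CR at the end of the last field.

```python
from collections import Counter


def field_count_report(text: str, sep: str = "\t") -> list[str]:
    """Tally records by field count, in the spirit of awk's NF with a TAB separator.

    Hypothetical helper for illustration. Records are split on LF only,
    like awk's default record separator, so a CRLF row ending leaves a
    trailing CR inside the last field instead of changing the count.
    """
    records = text.split("\n")
    # awk does not see an extra empty record after the final newline.
    if records and records[-1] == "":
        records.pop()
    counts = Counter(
        # awk reports NF == 0 for an empty record; str.split would say 1.
        0 if rec == "" else len(rec.split(sep))
        for rec in records
    )
    # Most frequent field count first; ties go to the larger field count.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], -kv[0]))
    return [f"{n} rows are broken into {nf} columns" for nf, n in ordered]


# Hypothetical 3-column TSV with CRLF row endings; the note in record 1
# contains an embedded CRLF.
sample = (
    "id\tnote\tdate\r\n"
    "1\tfirst line\r\nsecond line\t2020-09-30\r\n"
    "2\tok\t2020-09-30\r\n"
)
for line in field_count_report(sample):
    print(line)
```

This prints `2 rows are broken into 3 columns` and `2 rows are broken into 2 columns`: the header and record 2 are intact, but the embedded CRLF splits record 1 into two 2-field rows. Sorting by frequency is a choice made for this sketch; the original one-liner may order its output differently.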
The result of my check on this ONE FILE was as follows:
292 rows are broken into 82 columns
606 rows are broken into 1 columns
486 rows are broken into 0 columns
152 rows are broken into 25 columns
130 rows are broken into 58 columns
123 rows are broken into 22 columns
108 rows are broken into 19 columns
96 rows are broken into 64 columns
79 rows are broken into 28 columns
76 rows are broken into 55 columns
62 rows are broken into 59 columns
57 rows are broken into 24 columns
40 rows are broken into 3 columns
39 rows are broken into 26 columns
34 rows are broken into 61 columns
32 rows are broken into 4 columns
32 rows are broken into 2 columns
21 rows are broken into 34 columns
19 rows are broken into 6 columns
19 rows are broken into 57 columns
17 rows are broken into 53 columns
17 rows are broken into 32 columns
17 rows are broken into 30 columns
17 rows are broken into 18 columns
15 rows are broken into 39 columns
15 rows are broken into 17 columns
11 rows are broken into 44 columns
10 rows are broken into 66 columns
8 rows are broken into 36 columns
8 rows are broken into 27 columns
7 rows are broken into 9 columns
7 rows are broken into 5 columns
6 rows are broken into 7 columns
6 rows are broken into 15 columns
5 rows are broken into 37 columns
5 rows are broken into 20 columns
4 rows are broken into 56 columns
3 rows are broken into 12 columns
2 rows are broken into 60 columns
2 rows are broken into 10 columns
1 rows are broken into 8 columns
1 rows are broken into 63 columns
1 rows are broken into 43 columns
1 rows are broken into 23 columns
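To see how a single file can yield this many different field counts, including rows with 0 and 1 columns, here is a small Python sketch that applies LF/TAB splitting, as an awk check with a TAB field separator would, to one 3-column record whose note field contains each of the characters mentioned above. The function name and sample rows are hypothetical, and the sketch assumes awk's default record separator (LF), under which a lone CR does not end a record.

```python
def awk_like_field_counts(text: str) -> list[int]:
    """Return the NF that awk with a TAB field separator would report per record.

    Hypothetical helper. Records are split on LF only (awk's default),
    so a lone CR does not end a record.
    """
    records = text.split("\n")
    if records and records[-1] == "":
        records.pop()  # no extra record after the final newline
    # awk reports NF == 0 for an empty record.
    return [0 if rec == "" else len(rec.split("\t")) for rec in records]


# Hypothetical 3-column record (id, note, date) with a CRLF row ending; each
# variant puts one kind of problem character inside the note field.
cases = {
    "clean": "7\tplain note\t2020-09-30\r\n",
    "embedded TAB": "7\tcol A\tcol B\t2020-09-30\r\n",
    "embedded LF": "7\tline one\nline two\t2020-09-30\r\n",
    "embedded CRLF": "7\tline one\r\nline two\t2020-09-30\r\n",
    "embedded CR": "7\tline one\rline two\t2020-09-30\r\n",
    "blank line + 3 lines": "7\tpara one\n\npara two\nline three\t2020-09-30\r\n",
}

for label, row in cases.items():
    print(f"{label}: {awk_like_field_counts(row)}")
```

The clean and embedded-CR variants report `[3]`, the embedded TAB gives `[4]`, an embedded LF or CRLF gives `[2, 2]`, and the last variant gives `[2, 0, 1, 2]`. Repeated across many multiline notes, effects like these would plausibly scatter rows across field counts the way the tally above does, though the sketch says nothing about the client file beyond what that tally reports.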