Multiple Tab Separated Values file format specification
This document specifies the .mtsv file format.
The field separator is one or more tabs. /\t+/
The record separator is a newline. /\n/
The fields are nonempty escaped text strings as specified in the subdocument. /[^\t\n]+/
The main difference between .mtsv and .tsv is that multiple tabs are considered a single separator. This means we cannot put an empty field in an .mtsv document. Applications may choose a special placeholder value to indicate an empty field, depending on context.
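A minimal Python sketch of a reader for the rules above. The name parse_mtsv is hypothetical, fields are returned still escaped, and ignoring leading or trailing tabs on a line is an assumption, since the specification does not say how to treat them.

```python
import re

def parse_mtsv(text):
    """Split .mtsv text into records, each a list of nonempty raw fields."""
    # Newline separates records; any run of tabs separates fields.
    # Matching /[^\t]+/ inside each line collects exactly the nonempty fields.
    return [re.findall(r"[^\t]+", line) for line in text.split("\n")]

records = parse_mtsv("name\t\t\tsize\nreadme.txt\t12")
```

Because newline is treated strictly as a separator here, an empty line (including the one implied by a trailing newline) comes back as a record with no fields.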
Supplementary formats
We also specify Commented Multiple Tab Separated Values Format .cmtsv as an extension to .mtsv. This file format is useful for config files similar to /etc/fstab.
The differences are:
Blank lines are ignored.
Any line starting with # is treated as a blank line.
If the field in the first column of a record starts with #, it should be escaped as \# so that the line is not treated as a comment.
Limitations: In .cmtsv documents you cannot express records with no fields.
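The .cmtsv differences could be sketched in Python as follows. The name parse_cmtsv is hypothetical, and treating a line that contains only tabs as blank is an assumption. Fields are returned still escaped, so a leading \# stays as written.

```python
import re

def parse_cmtsv(text):
    """Split .cmtsv text into records, skipping comments and blank lines."""
    records = []
    for line in text.split("\n"):
        if line.startswith("#"):
            continue  # comment lines are treated as blank
        fields = re.findall(r"[^\t]+", line)
        if not fields:
            continue  # blank lines are ignored
        records.append(fields)
    return records

config = "# device\tmount\ttype\n\n/dev/sda1\t\t/\text4\n\\#odd\tx"
records = parse_cmtsv(config)
```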
This subdocument specifies an encoding for escaping text. It was designed for the multiple tab separated values format. It uses backslash escaping and tries to be common to shell and most scripting languages. Hopefully this makes the escaped output easy to use in a variety of contexts.
Escape all tabs and newlines, so that the result may be used as a TSV field.
Escape all terminal control sequences so that escaped text will never accidentally affect the terminal state.
Operate correctly on arbitrary text.
Operate correctly on Unicode UTF-8 text.
The input is any string of bytes. When the input is valid UTF-8 text the output will also be valid UTF-8. The output is a backslash-escaped string of characters. It will not contain any of the following bytes:
[0x00-0x1F] (this range includes the NUL byte, tab and newline chars as well as terminal control codes)
DEL (0x7F).
Most of these bytes are escaped as \xXX. Some special ones have a nicer syntax:
\b (0x08)
\f (0x0C)
\n (0x0A)
\r (0x0D)
\t (0x09)
\v (0x0B)
Furthermore the following bytes will not occur alone. They will be escaped and only occur after a backslash:
\ (0x5C)
" (0x22)
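The escaping rules above can be sketched in Python. The function name escape is hypothetical, and uppercase hex digits in \xXX are an assumption, since the text does not fix the case. Bytes at or above 0x80 are passed through unchanged.

```python
# Control bytes that have a short escape instead of \xXX.
NAMED = {0x08: b"\\b", 0x0C: b"\\f", 0x0A: b"\\n",
         0x0D: b"\\r", 0x09: b"\\t", 0x0B: b"\\v"}

def escape(data: bytes) -> bytes:
    """Backslash-escape arbitrary bytes so the result is usable as a field."""
    out = bytearray()
    for byte in data:
        if byte in NAMED:
            out += NAMED[byte]
        elif byte < 0x20 or byte == 0x7F:
            out += b"\\x%02X" % byte       # remaining control bytes and DEL
        elif byte in (0x5C, 0x22):
            out += b"\\" + bytes([byte])   # backslash and double quote
        else:
            out.append(byte)               # printable ASCII and bytes >= 0x80
    return bytes(out)

escaped = escape(b"a\tb\x1b[0m")
```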
About escaping unicode codepoints
Any byte whose high bit is 1 (i.e. in the range [128-255]) can be passed through unchanged. This means multi-byte UTF-8 sequences, which encode non-ASCII Unicode codepoints, are passed through unescaped. An implementation may also choose to escape a set of Unicode codepoints with \uXXXX. This can only express 16-bit codepoints, but Unicode goes up to 21 bits, so for larger codepoints you can either escape each of the bytes using \xXX or use \UXXXXXXXX.
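As an illustration of the two codepoint forms, here is a small Python sketch. The helper names are hypothetical, and uppercase hex digits are an assumption.

```python
def escape_codepoint(ch: str) -> str:
    """Escape one character as \\uXXXX, or \\UXXXXXXXX above 16 bits."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return "\\u%04X" % cp
    return "\\U%08X" % cp

def escape_utf8_bytes(ch: str) -> str:
    """Alternative: escape each UTF-8 byte of the character as \\xXX."""
    return "".join("\\x%02X" % b for b in ch.encode("utf-8"))

short_form = escape_codepoint("\u00e9")      # fits in 16 bits
long_form = escape_codepoint("\U0001F600")   # needs more than 16 bits
byte_form = escape_utf8_bytes("\U0001F600")
```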
About not escaping $
We choose not to escape $ even though it triggers variable expansion inside a double-quoted shell string. This means that one must check for and manually escape any $ in the output when copying and pasting TSV text into a shell script string. Escaping $ as \x24 would be unreadable, so you might prefer to write \$, but while Perl, Ruby and shell treat \$ as an escaped dollar, Python does not. Also, $ is quite rare in filenames and URLs, so it won't often be a problem.
About escaping ASCII characters that don't need escaping
Other than the special escape codes above, any escaped character just denotes that character. For example \# denotes #.
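A decoder following this rule might look like the Python sketch below. The name unescape is hypothetical, and raising ValueError on a dangling backslash or a malformed hex escape is an assumption, since the text does not define error handling. \uXXXX and \UXXXXXXXX are decoded to UTF-8 bytes.

```python
import string

NAMED = {ord("b"): 0x08, ord("f"): 0x0C, ord("n"): 0x0A,
         ord("r"): 0x0D, ord("t"): 0x09, ord("v"): 0x0B}
HEX_LENGTHS = {ord("x"): 2, ord("u"): 4, ord("U"): 8}
HEX_DIGITS = set(string.hexdigits.encode("ascii"))

def unescape(data: bytes) -> bytes:
    """Reverse the backslash escaping, returning the original bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte != 0x5C:                   # not a backslash: copy through
            out.append(byte)
            continue
        if i >= len(data):
            raise ValueError("dangling backslash at end of input")
        code = data[i]
        i += 1
        if code in NAMED:
            out.append(NAMED[code])
        elif code in HEX_LENGTHS:
            digits = data[i:i + HEX_LENGTHS[code]]
            if len(digits) != HEX_LENGTHS[code] or not set(digits) <= HEX_DIGITS:
                raise ValueError("malformed hex escape")
            i += len(digits)
            value = int(digits, 16)
            if code == ord("x"):
                out.append(value)          # a single raw byte
            else:
                out += chr(value).encode("utf-8")
        else:
            out.append(code)               # e.g. \# denotes #

    return bytes(out)

plain = unescape(b"\\#not a comment\\t\\x1B")
```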
Tab Separated Values file format specification version 2.0
This document specifies the .tsv file format.
A TSV file represents a list of lists of strings.
The field separator is /\t/ (tab)
The record separator is /\n/ (newline)
A field is any string not containing tab or newline characters /[^\t\n]*/
Example
For example
Name<TAB>Age<TAB>Address
Paul<TAB>23<TAB>1115 W Franklin
Bessy the Cow<TAB>5<TAB>Big Farm Way
Zeke<TAB>45<TAB>W Main St
represents
(("Name" "Age" "Address")
("Paul" "23" "1115 W Franklin")
("Bessy the Cow" "5" "Big Farm Way")
("Zeke" "45" "W Main St"))
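The same mapping can be reproduced with a short Python sketch. The name parse_tsv is hypothetical. Newline is treated strictly as a separator, so an empty input yields one record containing one empty field.

```python
def parse_tsv(text):
    """Split .tsv text into a list of records, each a list of fields."""
    # Exactly one tab separates fields, so empty fields are preserved.
    return [line.split("\t") for line in text.split("\n")]

table = parse_tsv(
    "Name\tAge\tAddress\n"
    "Paul\t23\t1115 W Franklin\n"
    "Bessy the Cow\t5\tBig Farm Way\n"
    "Zeke\t45\tW Main St"
)
```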
Supplementary formats
This specification aims to improve upon the IANA [1] spec by being precise about what a valid field is.
We also specify the ASCII separated values format, .asv, which uses the ASCII record separator (0x1E) instead of \n and the ASCII unit separator (0x1F) instead of \t. [2]
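A minimal Python sketch of the .asv variant, using the ASCII record separator (0x1E) and unit separator (0x1F). The names dump_asv and parse_asv are hypothetical.

```python
RS = "\x1e"  # ASCII record separator, used in place of newline
US = "\x1f"  # ASCII unit separator, used in place of tab

def dump_asv(records):
    """Serialize a list of lists of strings as .asv text."""
    for record in records:
        for field in record:
            if RS in field or US in field:
                raise ValueError("field contains an ASCII separator")
    return RS.join(US.join(record) for record in records)

def parse_asv(text):
    """Split .asv text back into a list of lists of strings."""
    return [record.split(US) for record in text.split(RS)]

# Tabs and newlines are ordinary field content in .asv.
rows = [["note", "line one\nline two"], ["path", "a\tb"]]
round_trip = parse_asv(dump_asv(rows))
```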
One can consider a looser variation of tsv where multiple tabs /\t+/ are considered as a single field separator. This supports proper alignment in a text editor but means that an empty field cannot be expressed. Use .ttsv for this variation.
Implementation notes
A reader or application using tsv may:
choose to treat the first record as field names.
choose to put a limit on field lengths.
choose to enforce tabular format. (all records having the same number of fields)
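As one possible reading of these options, the Python sketch below treats the first record as field names and enforces tabular format. The name read_table and the choice to return dictionaries are assumptions for illustration only.

```python
def read_table(text):
    """Read .tsv text, using the first record as field names."""
    records = [line.split("\t") for line in text.split("\n")]
    header, rows = records[0], records[1:]
    for number, row in enumerate(rows, start=2):
        # Tabular format: every record has as many fields as the header.
        if len(row) != len(header):
            raise ValueError("record %d has %d fields, expected %d"
                             % (number, len(row), len(header)))
    return [dict(zip(header, row)) for row in rows]

people = read_table("Name\tAge\nPaul\t23\nZeke\t45")
```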
A serializer must:
error if a field contains a tab or newline
error if a field contains an ascii separator (in the case of .asv only)
error if a field is the empty string (in the case of .ttsv only)
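The serializer requirements for .tsv and .ttsv could be sketched in Python as follows. The name dump_tsv and the multi_tab flag are hypothetical, and using ValueError to signal the error is an assumption.

```python
def dump_tsv(records, multi_tab=False):
    """Serialize a list of lists of strings as .tsv (or .ttsv) text."""
    lines = []
    for record in records:
        for field in record:
            if "\t" in field or "\n" in field:
                raise ValueError("field contains a tab or newline: %r" % field)
            if multi_tab and field == "":
                raise ValueError("empty field cannot be expressed in .ttsv")
        lines.append("\t".join(record))
    return "\n".join(lines)

text = dump_tsv([["Name", "Age"], ["Paul", "23"]])
```

In .ttsv mode this sketch still writes a single tab between fields. Padding with extra tabs for alignment would be equally valid under the looser separator.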
here's a conforming implementation! https://github.com/jtolds/tsv-tools