I am sheepish to admit a certain type of routine Microsoft Excel use.
Current example: I am marking for STAT 545. I use R to create a comma delimited marking sheet, by joining the official class list and peer reviews. The sheet contains variables, initially set to NA
, where the TAs and I enter official marks and optional comments.
This is where Excel comes in. I like its visual organization of this comma delimited file much more than, say, using a plain text editor. I use the ability to hide columns, resize columns, wrap text, and (gasp!) even fill rows with grey to indicate I am done.
I keep saving the file as comma delimited and I put up with Excel's incessant freak out about "losing features". This is not a one time thing. I need to save and commit this file many times before it is considered done.
But Excel for the Mac 2011 does a really stupid thing: In a misplaced homage to the classic Mac OS, line endings are actually carriage returns -- the character \r
.
This is the orphan disease of line ending problems. Everyone wants to talk to you about the much more prevalent "Windows vs. Unix" line ending problem, where Windows uses carriage return + line feed ("CRLF") or \r\n
and Unix gets it right with line feed only or \n
. Git is even prepared to help you navigate "Windows vs Unix" line ending problem. But that solution does absolutely zilch if you are suffering from naked carriage returns as line endings.
Luckily, Rich Fitzjohn has suffered before me and written a great blog post about this. He explains very well why this is worth solving, so your Git diffs remain informative.
Bottom line from Rich with one slight twist:
-
Add lines like this to
.git/config
[filter "lfnotcr"] clean = LC_CTYPE=C awk '{printf(\"%s\\n\", $0)}' | LC_CTYPE=C tr '\\r' '\\n' smudge = tr '\\n' '\\r'
-
Create a file
.git/info/attributes
like this*.csv filter=lfnotcr
Now your line endings will be corrected before the csv files enter the repository (that's what the clean
filter does) and restored when you work with them locally (that's what the smudge
filter does). Do NOT omit the smudge
filter, even if you think it's unnecessary because Excel seems perfectly able to read files with line feed as line ending. You need both for diffing to work the way you want.
My only twist on Rich's solution is to use .git/info/attributes
instead of .gitattributes
for a completely local solution. Neither of the files mentioned above are actually tracked in the repository, so your Excel shame can remain a relatively private thing.
Notes to self for future line ending headaches ...
English | OS connotation | character | "vibe" |
---|---|---|---|
carriage return | classic Mac, i.e. OS 9 and earlier | \r |
archaic |
line feed | Unix, including Mac OS X | \n |
The Very Best |
carriage return + line feed, "CRLF" | Windows, going back to DOS | \r\n |
Boo! Windows! |
How to check your line endings in the shell. You could use cat -v
. Here's a comma delimited file, fresh after saving with Excel for the Mac 2011:
$ cat -v months.csv
jan,1,january^Mfeb,2,february^Mmar,3,march
You can see the distinctive ^M
s where the carriage returns are. Note that Excel also does not put a line ending at the end the file.
Another option is to use file
. Here's a look at the same file:
$ file months.csv
months.csv: ASCII text, with CR line terminators
How to correct manually? You could use Homebrew to install dos2unix
, i.e. brew install dos2unix
. Then you will have the mac2unix
command:
$ mac2unix months.csv
mac2unix: converting file months.csv to Unix format...
$ file months.csv
months.csv: ASCII text
You could also use the tr
command to translate carriage returns to line feeds.
$ more months.csv
jan,1,january^Mfeb,2,february^Mmar,3,march
$ file months.csv
months.csv: ASCII text, with CR line terminators
$ tr '\r' '\n' < months.csv > months-unix.csv
$ file months-unix.csv
months-unix.csv: ASCII text
$ more months-unix.csv
jan,1,january
feb,2,february
mar,3,march
While I'm on a roll, if you've somehow got Windows-style line endings and you want Unix-style, use tr
to simply delete carriage returns:
tr -d '\r' < file-name > file-name-unix
This will turn \r\n
into just \n
. The dos2unix
command would also work here.
Random: line ending conversion is one of the reasons it's so important to specify FTP file transfer in ascii
mode for text files.
At first, I considered writing a pre-commit Git hook, but the Git attribute and filter approach was easier to get working. Git hooks are not particularly well documented or exampled. But Git Hooks for Fun and Profit actually gave me some hope.
Rich's group has moved on to a different solution:
dfalster/baad@1620ecb