@shiftkey
Last active July 27, 2022 13:24
WTF happened to my line endings?

This is a quick guide to debug potential line ending weirdness.

Note: I've thrown a lot of concepts in here around Git data structures without going into depth. If there are things that are unclear or you'd like some more details, just leave a comment and I'll either reply or expand on this post accordingly...

What sort of weirdness am I referring to? Consider this commit: https://github.com/dalefrancis88/Windsor/commit/e2543e5573781c7ded83166932c9c415feef11c0

While it looks like a very large commit, the contents of the file are unchanged. But the diffs are very intimidating.

What happened to this commit?

Note: I'm using git bash for this, because I suspect PowerShell is doing inappropriate things with line endings when I'm piping output between commands.

So something happened in commit e2543e5573 and I'm not sure what.

Let's dig.

You can see in the screenshot the parent commit is 5cb67d91b3, but you can get this detail from the command line:

> git rev-list --parents -n 1 e2543e5573
e2543e5573781c7ded83166932c9c415feef11c0 5cb67d91b3e651d0c36a3cabae7f8a05e2baa20a

And what does each commit actually contain?

> git ls-tree 5cb67d91b3
100644 blob 1ff0c423042b46cb1d617b81efb715defbe8054d    .gitattributes
100644 blob 638179ef1702f06e1dcf5dee8681108fde486fab    .gitignore
100644 blob 9e69ffd8a7357d81cb0c9179fc2bd86d6f03928d    BreakingChanges.txt
100644 blob 614cef8de71ca6508dd5af34e3c2bb749258bcd3    Castle.Windsor-SL.sln
100644 blob 7fbf6d371fa31d8f49466756113f1b2bec576dd3    Castle.Windsor.5.0.ReSharper
100644 blob 30b4e084bae448aca2e829832dfc0e3b0ab143a4    Castle.Windsor.5.1.ReSharper
100644 blob cd389611bb2508f2136a2eb0c9c1dbb5dd1c38c6    Castle.Windsor.6.0.ReSharper
100644 blob 9fcad47026d0652c3cb51be14d7d2514cc22f690    Castle.Windsor.sln
100644 blob 81817cb9425c21315b3a81155f67207a365147e0    Castle.Windsor.sln.DotSettings
100644 blob 6494929363793c63ea1799d757cf2999fa6fe9df    Changes.txt
100644 blob 590107c6b469e1ca93ac958af02f2911bd5c22b5    ClickToBuild.cmd
100644 blob b545ae787628978b014c43f7ec9c12c14bf989fa    License.txt
100644 blob ecf2e8736da50d0a1cb4030d303b7289bc38df8d    Readme.txt
100644 blob 5a75ddbe2ddc3c5644e8e63af865fb9af366728a    Settings.proj
100644 blob a83d604083d22943fbbe6ef83723af8eed045e9a    TODO.txt
100644 blob 758dc052fdf55ce045b3107bd49ef9f13de253ac    build.cmd
040000 tree b55d1cef0eabf65b7395a1b2f7df6f3e5c27fea6    buildscripts
040000 tree ec089d8eb77efb3bf0bc44e7c46755d0e716f9b8    lib
040000 tree fb407731fe1fb93e2a599ef97cffe9bfca6f2c4a    src
040000 tree 4c8062130bcc5cdaedf40f59079b95628bd41ee9    tools

And this is my problem commit:

> git ls-tree e2543e5573
100644 blob 1ff0c423042b46cb1d617b81efb715defbe8054d    .gitattributes
100644 blob a8f0e7ea3dcf7294c71c897a79202ecfd77df577    .gitignore
100644 blob 44f3601962d246e5ae06f975a50bc74fe0f6f86d    BreakingChanges.txt
100644 blob 4e6f62b1e5ea7bc7ccca9a0b037a2aa7a2ff1a4a    Castle.Windsor-SL.sln
100644 blob 71e3d920261f92bc1de99568159577fb17738e08    Castle.Windsor.5.0.ReSharper
100644 blob 62c175d5c75bc219f944d1c938670ed3ff297ee2    Castle.Windsor.5.1.ReSharper
100644 blob 89eb62eb9ec55e0004c1063519142c53dc14deb5    Castle.Windsor.6.0.ReSharper
100644 blob 51246e25f70489ec1080a7023aab7ce19c0a1e8a    Castle.Windsor.sln
100644 blob 81817cb9425c21315b3a81155f67207a365147e0    Castle.Windsor.sln.DotSettings
100644 blob 6494929363793c63ea1799d757cf2999fa6fe9df    Changes.txt
100644 blob fffe071c8f5914ebffb3e5d10361fde5a41610a8    ClickToBuild.cmd
100644 blob b545ae787628978b014c43f7ec9c12c14bf989fa    License.txt
100644 blob 8c209efcc5c3489601775dd37cbb2151475f6176    Readme.txt
100644 blob dda8815dd6561e5771708e627d5c4e51bbfbb18e    Settings.proj
100644 blob d0234a8da9a5776b01b7f11c78a915d67660544c    TODO.txt
100644 blob 28d8b420c1a3ffb7d02d5031ca4200c8ddc890c0    build.cmd
040000 tree c48eb44c6da87f46c9961ee5dd86c6c9cc73d772    buildscripts
040000 tree ddd498718b9d2889629753fd43fe5708b4c59927    lib
040000 tree abebaa3fdc7764e2247db94b488bc69cc9ce5f00    src
040000 tree 97c7b22704d9a35b621db68a5e50cb5b96725139    tools

So blobs are how Git represents data, and we can see a lot of different hashes in these lists.
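
Rather than eyeballing the two listings, you can ask Git which paths point at different blobs. This is only a minimal sketch, assuming it runs inside a clone of the repository; the function name changed_paths is made up for illustration, while git diff-tree and its -r and --name-only flags are standard Git.

```shell
# List every path whose blob differs between two commits (or trees).
# -r recurses into sub-trees such as src/ and lib/.
changed_paths() {
  git diff-tree -r --name-only "$1" "$2"
}
```

For the commits above that would be `changed_paths 5cb67d91b3 e2543e5573`. A path that appears in the output has a different blob hash in each commit, which tells you the bytes differ but not how.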

Let's have a look at the .gitignore file:

  • the older blob is 638179ef1702f06e1dcf5dee8681108fde486fab
  • the newer blob is a8f0e7ea3dcf7294c71c897a79202ecfd77df577

And you can look at the content of each blob:

> git show 638179ef1702f06e1dcf5dee8681108fde486fab

/build
AssemblyInfo.cs

# Standard VS.NET and ReSharper Foo
src/*/obj
src/*/bin
samples/*/obj
samples/*/bin
*.csproj.user
*ReSharper.user
_ReSharper*
*resharper*
*.suo
*.cache
* Thumbs.db

#leftovers from merge
*.orig
*.bak
*.sln.DotSettings.user
*.DotSettings.user

But of course that's rather useless, because it's probably the invisible control characters that changed in this commit.

Let's dump out the two blobs to disk:

> git show 638179ef1702f0 > first-gitignore.txt

> git show a8f0e7ea3dcf72 > second-gitignore.txt

At this point, open your favourite hex editor and compare the two files. What? You don't have a favourite? I'm using BeyondCompare for this.

So you should see something like this:

[Screenshot: hex comparison of first-gitignore.txt and second-gitignore.txt]

So when we talk about line-endings, the summary is:

  • Windows - inserts CRLF to represent the end of a line - which is the characters 0x0D, 0x0A
  • Unix - inserts LF to represent the end of a line - which is the character 0x0A

And from this screenshot, we can see that the commit simply converted the text files from Windows to Unix line endings.
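
If you'd rather not reach for a hex editor, the same check can be sketched with standard command-line tools. Both function names below are hypothetical helpers, not Git commands, and the counts assume each file ends with a newline. The first counts CRLF versus bare-LF endings; the second checks whether two files are identical once every CR byte (0x0D) is dropped.

```shell
# Count how many lines end in CRLF and how many in a bare LF.
count_line_endings() (
  cr=$(printf '\r')
  # Every line ending contains exactly one LF byte (0x0A).
  total=$(tr -cd '\n' < "$1" | wc -c)
  # Lines whose last byte before the LF is a CR (0x0D).
  crlf=$(grep -c "${cr}\$" "$1")
  echo "crlf=$((crlf)) lf=$((total - crlf))"
)

# Succeed if two files are identical after removing every CR byte.
same_ignoring_cr() (
  a=$(mktemp) && b=$(mktemp) || exit 2
  tr -d '\r' < "$1" > "$a"
  tr -d '\r' < "$2" > "$b"
  cmp -s "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  exit "$status"
)
```

Run `count_line_endings first-gitignore.txt` and `count_line_endings second-gitignore.txt` to see which style each blob uses, then `same_ignoring_cr first-gitignore.txt second-gitignore.txt` to confirm that nothing else changed.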

Footnote

This might seem silly (and in this world of .gitattributes it's actually less common than it was) but consider a situation where someone has decided to mix line ending changes with code changes. You're basically left to deal with the noise of the line endings change to understand if there's anything valuable in there.

I'll leave this little GitHub secret for anyone who has been down this hell. When browsing a commit or a PR where you get significant whitespace issues, you can mute them by appending ?w=1 to the URL, like this:

https://github.com/dalefrancis88/Windsor/commit/e2543e5573781c7ded83166932c9c415feef11c0?w=1
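
A similar trick works locally. The sketch below builds a throwaway repository (the file name and commit messages are invented for the demo) and counts changed lines with and without git diff's --ignore-cr-at-eol option. That option should be available from Git 2.16 onwards; on older versions, -w is a blunter alternative that ignores all whitespace.

```shell
# Commit a CRLF file, then the same file with LF endings, and compare
# a plain diff against one that ignores CR at end of line.
ignore_cr_demo() (
  repo=$(mktemp -d) && cd "$repo" || exit 1
  git init -q .
  git config core.autocrlf false   # store bytes exactly as written
  git config user.name "Example"
  git config user.email "example@example.invalid"

  printf 'one\r\ntwo\r\n' > file.txt
  git add file.txt && git commit -q -m "CRLF version"
  printf 'one\ntwo\n' > file.txt
  git add file.txt && git commit -q -m "LF version"

  # Count added/removed lines, skipping the ---/+++ file headers.
  plain=$(git diff HEAD~1 HEAD | grep -c '^[-+][^-+]')
  quiet=$(git diff --ignore-cr-at-eol HEAD~1 HEAD | grep -c '^[-+][^-+]')
  echo "plain: $plain, ignoring CR: $quiet"
)
```

On a real repository the equivalent would be something like `git show --ignore-cr-at-eol e2543e5573`, which leaves only the changes that are not line-ending noise.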

@shiftkey
Author

I'm quite surprised why anybody wouldn't use the Unix line endings in the repository.

I can't recommend this approach, because the setting for core.autocrlf doesn't persist with the repository, which is precisely how situations like this arise.

Even for Windows, setting the git option for line endings to auto will do the required conversion (getting Windows line endings when checking out, and converting to unix line endings on commit).

It's 2014 and not all implementations even do core.autocrlf right. Oh, and .gitattributes support is still a way off.

PS: I feel kinda bad for picking on JGit here, but I first got bitten by both of these bugs in early 2012...

ghost commented May 17, 2014

because the setting for core.autocrlf doesn't persist with the repository

This is one very good reason.

Without a lot of effort, I can't be sure the file that's in the repo is actually the correct file if git goes and modifies it.

It also leads to problems where (say) my integration tests suddenly stop working because the integration test was reading a file that's been modified by git, or a file that's been cryptographically signed now no longer passes a signature check.

Moral of the story: Your source control software should never be modifying file content.
And for the nit-pickers, merge tools I don't count here - because they're intended to modify the files.

@AdrianJSClark

The tone of your comment, @shiftkey, seems to suggest you should commit "as-is" to the repo (i.e. create a text file with CRLF, store in Git as CRLF). Is this correct?

I know this is what @willhughes said in his comment and I certainly would prefer to follow this option myself.

All examples of ".gitattributes" files I've seen, even the suggested one on GitHub's "Dealing with line endings" page, have the following setting:

# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

Correct me if I'm interpreting the setting wrong, but doesn't this configure Git to auto-convert line endings? If so, wouldn't it be better to use this file to configure the "as-is" approach instead? I've looked at the manual pages for the .gitattributes file & never quite been able to work out what the inverse of "* text=auto" is.

(Apologies if this comment is taking this thread away from your original post. Happy to move the discussion to a better forum if this one isn't appropriate.)

@shiftkey
Author

@AdrianJSClark

You're right, I haven't suggested a solution to this, yet. I'd rather do that as a separate post (or perhaps try out this new-fangled "blogging" thing the kids these days are raving about) because there's some other things I'd like to touch on around how .gitattributes works...

Yes, .gitattributes is The Right Way to do this, and I recommend the hell out of that. Most of the bizarre stuff I've seen with line endings and .gitattributes was because the user is on an older version of Git (msysgit 1.7.x had some hilarious issues around this space). That might be what's happened here.

# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

Correct me if I'm interpreting the setting wrong, but doesn't this configure Git to auto-convert line endings?

It does, but only for specific files. The wildcard matches every path, but text=auto only converts files that Git thinks are text files (let's go down that rabbit hole another time). If it were to arbitrarily convert line endings in binary files there'd be chaos.

And when does the configured .gitattributes file actually kick in? There's two general points (citation):

  • when a file stored in the repository is copied to the working tree (e.g. git checkout, git merge, etc)
  • when files in the working tree are prepared and added to the git store (e.g. git add, git commit, etc)

That's why you'll see tutorials ask you to add a "normalization" commit after adding the .gitattributes file - because you need to pass the stored objects through the .gitattributes settings.
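
As a rough illustration of such a normalization commit, the sketch below uses a throwaway repository with an invented file. It relies on git add --renormalize, which should be available from Git 2.16 onwards; older tutorials describe other ways to get the same result. The function prints how many CR bytes remain in the stored blob afterwards.

```shell
# Commit a CRLF file, add .gitattributes, then renormalize and commit.
renormalize_demo() (
  repo=$(mktemp -d) && cd "$repo" || exit 1
  git init -q .
  git config core.autocrlf false   # start with no conversion at all
  git config user.name "Example"
  git config user.email "example@example.invalid"

  printf 'one\r\ntwo\r\n' > file.txt
  git add file.txt && git commit -q -m "Add file with CRLF endings"

  printf '* text=auto\n' > .gitattributes
  git add .gitattributes
  # Re-run every tracked file through the new attribute settings.
  git add --renormalize .
  git commit -q -m "Normalize line endings"

  # Number of CR bytes in the blob now stored for file.txt.
  echo "$(( $(git cat-file -p HEAD:file.txt | tr -cd '\r' | wc -c) ))"
)
```

Keeping this in its own commit, separate from any code changes, avoids exactly the kind of noisy diff this post started with.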

But that's not actually what you're asking about.

I've looked at the manual pages for the .gitattributes file & never quite been able to work out what the inverse of "* text=auto" is.

Right, so text=auto means you want Git to normalize CRLF to LF when files it detects as text are added to the repository. If you don't want this, you can "unset" the value.

*  -text

What does "unset" mean anyway? From that page: Unsetting the text attribute on a path tells git not to attempt any end-of-line conversion upon checkin or checkout.

I'm not a huge fan of this either, but feel free to try this out.
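
One low-risk way to try it out is to ask Git what it thinks the text attribute is for a given path. The helper below is a hypothetical wrapper that writes a single rule into the .gitattributes of a scratch repository and queries it with git check-attr, which is a standard Git command.

```shell
# Usage: check_text_attr '<gitattributes rule>' <path>
# Prints how Git resolves the text attribute for that path.
check_text_attr() (
  repo=$(mktemp -d) && cd "$repo" || exit 1
  git init -q .
  printf '%s\n' "$1" > .gitattributes
  # The path does not need to exist; attributes are matched by name.
  git check-attr text -- "$2"
)
```

`check_text_attr '* -text' notes.txt` reports the attribute as unset, while `check_text_attr '* text=auto' notes.txt` reports it as auto. In a real repository you can skip the wrapper and run `git check-attr text -- <path>` directly.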

So while @willhughes is totally right that Git shouldn't care about the file contents that are stored in the repository, we've kinda been painted into a corner because of how computers are basically advanced typewriters.

Oh, and go read this writeup about all the options for working with line endings. BYO hard liquor.

http://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/

@anaisbetts

Set core.autocrlf to true, and set * text=auto in your .gitattributes. Anything else will cause you suffering

@mishacucicea

@shiftkey Thanks, really helpful (feel free to delete this comment as it's mostly spam) :)

@nshibano

Thanks. Your solution,

* -text

helped me a lot.

I wanted LF and CRLF files to coexist in the repository. I also wanted git to inform me when I mistakenly convert LF to CRLF or CRLF to LF. Your solution is the best.

@neman-pcas

Just gonna throw this idea into the mix in case anyone else ends up in the same situation. My key repo for a Windows product I configure using Windows machines and Windows tools and run on a Windows server requires nine different XML configuration files. I've just discovered that one single configuration file must have Unix-style LF and not Windows-style CRLF.

Wow.

Since all my tools are Windows, on pull/merge, Git converts those LFs to CRLFs, my deployment tool picks up the now-CRLF XML file, and upon dropping it into the server, it all comes tumbling down.

Right now the solution is manual conversion of CRLF to LF for that one file before running the deployment tool. Having just learned about hooks and DOS2UNIX, the next step is to see if I can write a commit hook that converts CRLF to LF on commit from the dev's machine, and a merge hook to convert the CRLF to LF on merge/pull on the consuming machine.

@DennisGentry-Zoetis

@neman-pcas That's the trickiest situation, where one file (type) needs a specific ending, but .gitattributes has a good solution. Add a line:

*.xyz text eol=lf

to cause .xyz files to always have LF endings. I don't think these instructions existed when this thread started: https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings#per-repository-settings
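
To see the effect, here is a sketch using a throwaway repository with core.autocrlf set to true, roughly imitating a typical Windows setup. The .xyz extension and file contents are placeholders. Note that Git's documented values for eol are the lowercase lf and crlf. The function prints the number of CR bytes in the file after it has been committed and checked out again.

```shell
# Show that eol=lf keeps LF endings in the working tree even when
# core.autocrlf is true.
eol_lf_demo() (
  repo=$(mktemp -d) && cd "$repo" || exit 1
  git init -q .
  git config core.autocrlf true
  git config user.name "Example"
  git config user.email "example@example.invalid"

  printf '*.xyz text eol=lf\n' > .gitattributes
  printf '<a/>\r\n<b/>\r\n' > config.xyz   # written with CRLF by mistake
  git add .gitattributes config.xyz
  git commit -q -m "Add config with forced LF endings"

  # Throw away the working copy and let Git write it out again.
  rm config.xyz
  git checkout -q -- config.xyz
  echo "$(( $(tr -cd '\r' < config.xyz | wc -c) ))"
)
```

Because the rule lives in .gitattributes, it travels with the repository, so every clone gets the same behaviour without relying on each developer's core.autocrlf setting or on custom hooks.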
