This is a quick guide to debugging potential line-ending weirdness.
Note: I've thrown a lot of concepts in here around Git data structures without going into depth. If anything is unclear or you'd like more detail, just leave a comment and I'll either reply or expand this post accordingly.
What sort of weirdness am I referring to? Consider this commit: https://github.com/dalefrancis88/Windsor/commit/e2543e5573781c7ded83166932c9c415feef11c0
While it looks like a very large commit, the contents of the files haven't actually changed, but the diffs are still very intimidating.
What happened to this commit?
Note: I'm using Git Bash for this, because I suspect PowerShell does inappropriate things with line endings when piping output between commands.
So something happened in commit e2543e5573 and I'm not sure what.
Let's dig.
On GitHub, the commit page shows that the parent commit is 5cb67d91b3, but you can also get this detail from the command line:
> git rev-list --parents -n 1 e2543e5573
e2543e5573781c7ded83166932c9c415feef11c0 5cb67d91b3e651d0c36a3cabae7f8a05e2baa20a
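The output line is the commit itself followed by its parent(s): one parent for an ordinary commit, two or more for a merge. A tiny Python sketch of reading that line, where `parse_rev_list_parents` is a hypothetical helper name, not part of Git:

```python
def parse_rev_list_parents(line: str) -> tuple[str, list[str]]:
    """Split one line of `git rev-list --parents` output.

    The first hash is the commit; any remaining hashes are its parents.
    """
    commit, *parents = line.split()
    return commit, parents


line = ("e2543e5573781c7ded83166932c9c415feef11c0 "
        "5cb67d91b3e651d0c36a3cabae7f8a05e2baa20a")
commit, parents = parse_rev_list_parents(line)
print(commit[:10], "->", [p[:10] for p in parents])
```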
And what does each commit actually contain?
> git ls-tree 5cb67d91b3
100644 blob 1ff0c423042b46cb1d617b81efb715defbe8054d .gitattributes
100644 blob 638179ef1702f06e1dcf5dee8681108fde486fab .gitignore
100644 blob 9e69ffd8a7357d81cb0c9179fc2bd86d6f03928d BreakingChanges.txt
100644 blob 614cef8de71ca6508dd5af34e3c2bb749258bcd3 Castle.Windsor-SL.sln
100644 blob 7fbf6d371fa31d8f49466756113f1b2bec576dd3 Castle.Windsor.5.0.ReSharper
100644 blob 30b4e084bae448aca2e829832dfc0e3b0ab143a4 Castle.Windsor.5.1.ReSharper
100644 blob cd389611bb2508f2136a2eb0c9c1dbb5dd1c38c6 Castle.Windsor.6.0.ReSharper
100644 blob 9fcad47026d0652c3cb51be14d7d2514cc22f690 Castle.Windsor.sln
100644 blob 81817cb9425c21315b3a81155f67207a365147e0 Castle.Windsor.sln.DotSettings
100644 blob 6494929363793c63ea1799d757cf2999fa6fe9df Changes.txt
100644 blob 590107c6b469e1ca93ac958af02f2911bd5c22b5 ClickToBuild.cmd
100644 blob b545ae787628978b014c43f7ec9c12c14bf989fa License.txt
100644 blob ecf2e8736da50d0a1cb4030d303b7289bc38df8d Readme.txt
100644 blob 5a75ddbe2ddc3c5644e8e63af865fb9af366728a Settings.proj
100644 blob a83d604083d22943fbbe6ef83723af8eed045e9a TODO.txt
100644 blob 758dc052fdf55ce045b3107bd49ef9f13de253ac build.cmd
040000 tree b55d1cef0eabf65b7395a1b2f7df6f3e5c27fea6 buildscripts
040000 tree ec089d8eb77efb3bf0bc44e7c46755d0e716f9b8 lib
040000 tree fb407731fe1fb93e2a599ef97cffe9bfca6f2c4a src
040000 tree 4c8062130bcc5cdaedf40f59079b95628bd41ee9 tools
And this is my problem commit:
> git ls-tree e2543e5573
100644 blob 1ff0c423042b46cb1d617b81efb715defbe8054d .gitattributes
100644 blob a8f0e7ea3dcf7294c71c897a79202ecfd77df577 .gitignore
100644 blob 44f3601962d246e5ae06f975a50bc74fe0f6f86d BreakingChanges.txt
100644 blob 4e6f62b1e5ea7bc7ccca9a0b037a2aa7a2ff1a4a Castle.Windsor-SL.sln
100644 blob 71e3d920261f92bc1de99568159577fb17738e08 Castle.Windsor.5.0.ReSharper
100644 blob 62c175d5c75bc219f944d1c938670ed3ff297ee2 Castle.Windsor.5.1.ReSharper
100644 blob 89eb62eb9ec55e0004c1063519142c53dc14deb5 Castle.Windsor.6.0.ReSharper
100644 blob 51246e25f70489ec1080a7023aab7ce19c0a1e8a Castle.Windsor.sln
100644 blob 81817cb9425c21315b3a81155f67207a365147e0 Castle.Windsor.sln.DotSettings
100644 blob 6494929363793c63ea1799d757cf2999fa6fe9df Changes.txt
100644 blob fffe071c8f5914ebffb3e5d10361fde5a41610a8 ClickToBuild.cmd
100644 blob b545ae787628978b014c43f7ec9c12c14bf989fa License.txt
100644 blob 8c209efcc5c3489601775dd37cbb2151475f6176 Readme.txt
100644 blob dda8815dd6561e5771708e627d5c4e51bbfbb18e Settings.proj
100644 blob d0234a8da9a5776b01b7f11c78a915d67660544c TODO.txt
100644 blob 28d8b420c1a3ffb7d02d5031ca4200c8ddc890c0 build.cmd
040000 tree c48eb44c6da87f46c9961ee5dd86c6c9cc73d772 buildscripts
040000 tree ddd498718b9d2889629753fd43fe5708b4c59927 lib
040000 tree abebaa3fdc7764e2247db94b488bc69cc9ce5f00 src
040000 tree 97c7b22704d9a35b621db68a5e50cb5b96725139 tools
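Blobs are how Git stores file contents, and most of the blob hashes differ between these two lists. Only .gitattributes, Castle.Windsor.sln.DotSettings, Changes.txt and License.txt kept the same blob, and all four subtrees changed too.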
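Comparing two 20-line listings by eye is error-prone. The following is a minimal Python sketch that reports which entries changed. It assumes you've captured each `git ls-tree` output as text, and `parse_ls_tree` and `changed_entries` are hypothetical helpers, not part of Git:

```python
def parse_ls_tree(output: str) -> dict[str, tuple[str, str]]:
    """Map each path in `git ls-tree` output to (object type, object id).

    Lines look like "<mode> <type> <object>\t<path>"; splitting on
    whitespace at most three times keeps paths that contain spaces intact.
    """
    entries = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        _mode, obj_type, obj_id, path = line.split(maxsplit=3)
        entries[path] = (obj_type, obj_id)
    return entries


def changed_entries(old_output: str, new_output: str) -> list[str]:
    """Paths whose object id differs, or that exist on only one side."""
    old, new = parse_ls_tree(old_output), parse_ls_tree(new_output)
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# A few entries copied from the two listings above.
old = """100644 blob 1ff0c423042b46cb1d617b81efb715defbe8054d .gitattributes
100644 blob 638179ef1702f06e1dcf5dee8681108fde486fab .gitignore
040000 tree fb407731fe1fb93e2a599ef97cffe9bfca6f2c4a src"""
new = """100644 blob 1ff0c423042b46cb1d617b81efb715defbe8054d .gitattributes
100644 blob a8f0e7ea3dcf7294c71c897a79202ecfd77df577 .gitignore
040000 tree abebaa3fdc7764e2247db94b488bc69cc9ce5f00 src"""
print(changed_entries(old, new))
```

Git can also do this itself. For example, `git diff --name-only 5cb67d91b3 e2543e5573` lists the changed files and recurses into the subtrees as well. This sketch only compares the top-level entries.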
Let's have a look at the .gitignore file:
- the older blob is 638179ef1702f06e1dcf5dee8681108fde486fab
- the newer blob is a8f0e7ea3dcf7294c71c897a79202ecfd77df577
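Why would a file whose text looks the same get a brand-new blob? Git names a blob by hashing a short header (`blob <size>` plus a NUL byte) together with the file's raw bytes. Changing only the line endings therefore changes the hash. The following is a minimal Python sketch that assumes the default SHA-1 object format; repositories created with `--object-format=sha256` use a different hash. `git_blob_id` is a hypothetical helper name:

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the SHA-1 object id Git assigns to a blob with these exact bytes.

    Git hashes the header "blob <size>\\0" followed by the raw content, so any
    byte-level change, including CRLF -> LF, yields a different id.
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


crlf = b"/build\r\nAssemblyInfo.cs\r\n"   # Windows line endings
lf = crlf.replace(b"\r\n", b"\n")          # same text, Unix line endings
print(git_blob_id(crlf))
print(git_blob_id(lf))                     # a different id for the "same" text
```

For raw input this should agree with `git hash-object --stdin`, which hashes standard input as-is without applying any line-ending conversion.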
And you can look at the content of each blob:
> git show 638179ef1702f06e1dcf5dee8681108fde486fab
/build
AssemblyInfo.cs
# Standard VS.NET and ReSharper Foo
src/*/obj
src/*/bin
samples/*/obj
samples/*/bin
*.csproj.user
*ReSharper.user
_ReSharper*
*resharper*
*.suo
*.cache
* Thumbs.db
#leftovers from merge
*.orig
*.bak
*.sln.DotSettings.user
*.DotSettings.user
But of course that's not very useful here: the visible text looks the same, so what has most likely changed are the invisible control characters at the end of each line.
Let's dump out the two blobs to disk:
> git show 638179ef1702f0 > first-gitignore.txt
> git show a8f0e7ea3dcf72 > second-gitignore.txt
At this point, open your favourite hex editor and compare the two files. What? You don't have a favourite? I'm using Beyond Compare for this.
In the comparison, the two files should differ only at the end of each line.
When we talk about line endings, the summary is:
- Windows uses CRLF to mark the end of a line: the two bytes 0x0D 0x0A
- Unix uses LF to mark the end of a line: the single byte 0x0A
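If you'd rather not eyeball a hex dump, you can count each kind of ending in the two dumped files. The following is a minimal Python sketch; `count_line_endings` and the script name `count_eol.py` are hypothetical:

```python
import sys


def count_line_endings(data: bytes) -> dict[str, int]:
    """Count Windows (CRLF), Unix (LF) and stray CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")  # 0x0D 0x0A pairs
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # 0x0A bytes not preceded by 0x0D
        "CR": data.count(b"\r") - crlf,  # 0x0D bytes not followed by 0x0A
    }


if __name__ == "__main__":
    # Hypothetical usage: python count_eol.py first-gitignore.txt second-gitignore.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # binary mode, so Python doesn't translate endings
            print(path, count_line_endings(f.read()))
```

Opening the files in binary mode matters. In text mode, Python's universal-newlines handling would turn CRLF into LF before you ever saw it.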
From the hex comparison, we can see that the commit simply converted the text files from Windows to Unix line endings.
This might seem trivial (and in this world of .gitattributes it's less common than it used to be), but consider a situation where someone has mixed line-ending changes with code changes. You're left to wade through the noise of the line-ending change to work out whether there's anything valuable in there.
I'll leave you with this little GitHub secret for anyone who has been through this hell. When browsing a commit or a PR with significant whitespace noise, you can hide those changes by appending ?w=1 to the URL, like this:
https://github.com/dalefrancis88/Windsor/commit/e2543e5573781c7ded83166932c9c415feef11c0?w=1
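You can run a similar check locally. Git's own diff has switches for it: `git diff --ignore-cr-at-eol`, or `-w` to ignore all whitespace. The following minimal Python sketch confirms that two dumped blobs differ only in their line endings. `same_except_line_endings` and the script name `same_eol.py` are hypothetical:

```python
import sys


def same_except_line_endings(a: bytes, b: bytes) -> bool:
    """Return True if a and b match once every CRLF is normalized to LF.

    Unlike ?w=1 or `git diff -w`, this ignores only line-ending differences,
    so any other whitespace change still counts as a real change.
    """
    return a.replace(b"\r\n", b"\n") == b.replace(b"\r\n", b"\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python same_eol.py first-gitignore.txt second-gitignore.txt
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        print(same_except_line_endings(f1.read(), f2.read()))
```

The narrower check is deliberate. If someone slipped a real edit in alongside the line-ending change, this returns False instead of hiding it.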
The tone of your comment, @shiftkey, seems to suggest you should commit "as-is" to the repo (i.e. create a text file with CRLF and store it in Git as CRLF). Is this correct?
I know this is what @willhughes said in his comment and I certainly would prefer to follow this option myself.
All the examples of .gitattributes files I've seen, even the one suggested on GitHub's "Dealing with line endings" page, have the following setting:
* text=auto
Correct me if I'm interpreting the setting wrong, but doesn't this configure Git to auto-convert line endings? If so, wouldn't it be better to use this file to configure the "as-is" approach instead? I've looked at the manual pages for .gitattributes and have never quite been able to work out what the inverse of "* text=auto" is.
(Apologies if this comment is taking this thread away from your original post. Happy to move the discussion to a better forum if this one isn't appropriate.)