Last active
November 16, 2019 04:45
In Defense of Two Spaces After a Period
I spend more time than is healthy worrying about software documentation. Don't get me wrong, I enjoy coding. I enjoy the process of investigating problems, possibly breaking them down into sub-problems and then searching my toolbox of conceptual solutions to find the one that's *just right*. But communicating a developer's intent clearly is an important part of constructing a solution. If you work in a team with other software developers, communicating intent is of vital importance. Even if you work in isolation, documenting your intent is important so that when you eventually come back to your code several years later, you have a chance of understanding what you were trying to do.
And it was while I was in deep thought about documenting software that it occurred to me that maybe, just maybe, there's justification for two spaces after a period in the modern world.
I frequently use EMACS to edit files; sometimes I use VI, I'm not a zealot. So I like to see text files as just that: text files. Text can be underwhelming visually sometimes, so I also like to use Markdown, AsciiDoc or Emacs Org Mode. Editors exist to make these formats easy to edit; but I still like to have access to the original text. It is always useful to be able to diff two text files, and many visual editors make this harder than it has to be, if they make it possible at all.
But there is a problem with file differs: they often misunderstand the differences in text you're interested in. Many git users will have seen diffs where complete paragraphs have been removed, and then the complete paragraph has been added with small changes. Brandon Rhodes gives a good example on his page regarding Semantic Linefeeds. Go look at it now, it's at: https://rhodesmill.org/brandon/2012/one-sentence-per-line/ .
The problem I have with Brian Kernighan's advice of "start each sentence on a new line," is that it looks ugly. But I can't deny it's good advice; just look at the way the differ mangled the update in Rhodes' example.
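The effect is easy to reproduce with plain `diff`. The following is a minimal sketch with invented sample text and made-up file names (it is not Rhodes' example), and it assumes only that `diff` and `mktemp` are available:

```shell
#!/bin/sh
# Compare how a line-based diff reports a one-word edit under two layouts.
# All file names and the sample text are hypothetical.
tmp=$(mktemp -d)

# Layout 1: the whole paragraph on a single line.
printf '%s\n' 'I enjoy coding. I enjoy documentation. I enjoy diffs.' > "$tmp/para_old.txt"
printf '%s\n' 'I enjoy coding. I enjoy writing documentation. I enjoy diffs.' > "$tmp/para_new.txt"

# Layout 2: the same text with one sentence per line.
printf '%s\n' 'I enjoy coding.' 'I enjoy documentation.' 'I enjoy diffs.' > "$tmp/sent_old.txt"
printf '%s\n' 'I enjoy coding.' 'I enjoy writing documentation.' 'I enjoy diffs.' > "$tmp/sent_new.txt"

# diff exits with status 1 when the files differ, so tolerate that status.
para_diff=$(diff "$tmp/para_old.txt" "$tmp/para_new.txt" || true)
sent_diff=$(diff "$tmp/sent_old.txt" "$tmp/sent_new.txt" || true)

printf 'paragraph per line:\n%s\n\n' "$para_diff"
printf 'sentence per line:\n%s\n' "$sent_diff"

rm -rf "$tmp"
```

In the first layout the diff shows the entire paragraph as changed; in the second it shows only the sentence that was edited.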
My simple solution is to keep each of my paragraphs on a single line, but use a program to break them apart before checking them into source control. But you may be asking, what does this have to do with "two spaces after a period?" In English we use periods for many things other than signifying the end of a sentence. For example, we use periods to denote abbreviations. If I wanted to talk about science fiction writer A. E. van Vogt, who pretty much only used his initials, it would be annoying if my line breaking script put the middle 'E' on its own line. Not impossible to read, of course, but annoying.
So my solution was to assume sentences ended with a period and two spaces instead of a period and one space. My line breaking script will get confused if I ever use only one space after a period, but it won't be the end of the world. And when I say "line breaking script," I really mean "invocation of sed." The `sed` command is a handy utility that will convert a period and two spaces into a period and a line feed with this simple command:
sed 's/\.\ \ /\.\n/g' input_file.txt > output_file.xtx
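To see the split in action without creating any files, here is a small sketch that feeds a made-up paragraph through the same substitution. The sample text is invented, and the `\n` in the replacement is assumed to be handled the way GNU sed handles it; other sed implementations may need a literal newline there.

```shell
#!/bin/sh
# Sample paragraph: two spaces end a sentence, one space follows an initial.
# The text is hypothetical and exists only for this demonstration.
paragraph='A. E. van Vogt wrote science fiction.  He mostly used his initials.  Periods after initials are not sentence ends.'

# Same substitution as the command above, reading from standard input.
# Only a period followed by two spaces is turned into a period and newline.
split=$(printf '%s\n' "$paragraph" | sed 's/\.  /.\n/g')

printf '%s\n' "$split"
```

The initials keep their single spaces, so "A. E. van Vogt" stays on one line while each sentence lands on its own line.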
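Going the other way takes a little more than reversing the substitution, because sed reads one line at a time and never sees the newline between two lines. The lines have to be joined into the pattern space before the substitution can match:
sed -e ':a' -e '$!N' -e '$!ba' -e 's/\.\n/\.\ \ /g' input_file.xtx > output_file.txt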
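One thing to watch when joining lines: sed reads its input a line at a time, so a pattern containing `\n` can only match once several lines have been pulled into the pattern space. The sketch below shows one common way to do that with the `N` command and a loop. The sample sentences are invented, and this is an illustration rather than the exact script used for this gist.

```shell
#!/bin/sh
# One sentence per line, as a splitting step would produce (hypothetical text).
sentences='A. E. van Vogt wrote science fiction.
He mostly used his initials.
Periods after initials are not sentence ends.'

# ':a' sets a label, '$!N' appends the next line unless this is the last
# one, and '$!ba' loops until the whole input is in the pattern space.
# Only then can the substitution see the newlines between sentences.
joined=$(printf '%s\n' "$sentences" |
  sed -e ':a' -e '$!N' -e '$!ba' -e 's/\.\n/.  /g')

printf '%s\n' "$joined"
```

Because every period followed by a newline is rewritten, a paragraph that ends with a period is joined to the paragraph after it when only a single newline separates them. Leaving a blank line between paragraphs avoids this: the first newline is consumed and the second one survives as the line break.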
Now there is a (mostly) automated way to edit text files where paragraphs look "correct" to me, but we split out sentences on individual lines so we have something akin to semantic linefeeds. If you're reading this in a GitHub gist, check out the revision history to see if the diffs look more or less comprehensible to you. Also check out the "xtx" file below, whose content should be the same as this file, but with each sentence broken out into its own line.
And after uploading this as a gist on GitHub, I noticed the default diff algorithm deleted the two lines of the previous paragraph and then added three lines. This isn't exactly what I was expecting, but it did remind me that I was once interested in diff algorithms. Having not used that information in a couple of decades, I have forgotten most of it but was happy to find git provides quite a few options. Look at the '--diff-algorithm' option of the git diff command; documentation can be found at https://git-scm.com/docs/git-diff . For the advanced student, googling "patience diff" and "myers diff" will reveal interesting discussions if you don't already have a textbook that covers them.
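For anyone who wants to try the algorithms side by side, here is a rough sketch. It assumes git is installed, uses `--no-index` so that two plain files can be compared outside a repository, and works on made-up file names and contents. On an input this small the algorithms will most likely agree; differences tend to show up on larger edits with repeated lines.

```shell
#!/bin/sh
# Compare two versions of a file under two of git's diff algorithms.
# File names and contents are hypothetical.
tmp=$(mktemp -d)
printf '%s\n' 'First sentence.' 'Second sentence.' 'Third sentence.' > "$tmp/old.txt"
printf '%s\n' 'First sentence.' 'A new sentence.' 'Second sentence, edited.' 'Third sentence.' > "$tmp/new.txt"

# git diff --no-index exits with status 1 when the files differ,
# so tolerate that status instead of treating it as a failure.
myers_diff=$(git diff --no-index --diff-algorithm=myers "$tmp/old.txt" "$tmp/new.txt" || true)
patience_diff=$(git diff --no-index --diff-algorithm=patience "$tmp/old.txt" "$tmp/new.txt" || true)

printf '== myers ==\n%s\n\n' "$myers_diff"
printf '== patience ==\n%s\n' "$patience_diff"

rm -rf "$tmp"
```

The same option also accepts `minimal` and `histogram`, and the `diff.algorithm` configuration setting selects one of them as the default for a repository.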
Finally, I'm adding a "ztz" file to the gist which is the same content, but with newlines at the end of each line. I suspect it will be easier for humans to read, but the diffs will be less easy to follow.
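How the wrapped file was produced is not stated here. One way to get a similar layout, offered purely as an assumption, is the standard `fold` utility; the width of 60 and the sample text below are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hard-wrap a one-line paragraph at a fixed width, breaking at spaces.
# The sample text and the width are hypothetical.
paragraph='I spend more time than is healthy worrying about software documentation.  I enjoy coding.  But communicating intent clearly is part of constructing a solution.'

# -s breaks each line after the last blank that fits; -w sets the maximum width.
wrapped=$(printf '%s\n' "$paragraph" | fold -s -w 60)

printf '%s\n' "$wrapped"
```

Wrapping this way moves the line breaks whenever a word is added near the start of a paragraph, which is why diffs of hard-wrapped text tend to be noisy.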
In Defense of Two Spaces After a Period
I spend more time than is healthy worrying about software documentation.
Don't get me wrong, I enjoy coding.
I enjoy the process of investigating problems, possibly breaking them down into sub-problems and then searching my toolbox of conceptual solutions to find the one that's *just right*.
But communicating a developer's intent clearly is an important part of constructing a solution.
If you work in a team with other software developers, communicating intent is of vital importance.
Even if you work in isolation, documenting your intent is important so that when you eventually come back to your code several years later, you have a chance of understanding what you were trying to do.
And it was while I was in deep thought about documenting software that it occurred to me that maybe, just maybe, there's justification for two spaces after a period in the modern world.
I frequently use EMACS to edit files; sometimes I use VI, I'm not a zealot.
So I like to see text files as just that: text files.
Text can be underwhelming visually sometimes, so I also like to use Markdown, AsciiDoc or Emacs Org Mode.
Editors exist to make these formats easy to edit; but I still like to have access to the original text.
It is always useful to be able to diff two text files, and many visual editors make this harder than it has to be, if they make it possible at all.
But there is a problem with file differs: they often misunderstand the differences in text you're interested in.
Many git users will have seen diffs where complete paragraphs have been removed, and then the complete paragraph has been added with small changes.
Brandon Rhodes gives a good example on his page regarding Semantic Linefeeds.
Go look at it now, it's at: https://rhodesmill.org/brandon/2012/one-sentence-per-line/ .
The problem I have with Brian Kernighan's advice of "start each sentence on a new line," is that it looks ugly.
But I can't deny it's good advice; just look at the way the differ mangled the update in Rhodes' example.
My simple solution is to keep each of my paragraphs on a single line, but use a program to break them apart before checking them into source control.
But you may be asking, what does this have to do with "two spaces after a period?" In English we use periods for many things other than signifying the end of a sentence.
For example, we use periods to denote abbreviations.
If I wanted to talk about science fiction writer A. E. van Vogt, who pretty much only used his initials, it would be annoying if my line breaking script put the middle 'E' on its own line.
Not impossible to read, of course, but annoying.
So my solution was to assume sentences ended with a period and two spaces instead of a period and one space.
My line breaking script will get confused if I ever use only one space after a period, but it won't be the end of the world. And when I say "line breaking script," I really mean "invocation of sed." The `sed` command is a handy utility that will convert a period and two spaces into a period and a line feed with this simple command:
sed 's/\.\ \ /\.\n/g' input_file.txt > output_file.xtx
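Going the other way takes a little more than reversing the substitution, because sed reads one line at a time and never sees the newline between two lines.
The lines have to be joined into the pattern space before the substitution can match:
sed -e ':a' -e '$!N' -e '$!ba' -e 's/\.\n/\.\ \ /g' input_file.xtx > output_file.txt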
Now there is a (mostly) automated way to edit text files where paragraphs look "correct" to me, but we split out sentences on individual lines so we have something akin to semantic linefeeds.
If you're reading this in a GitHub gist, check out the revision history to see if the diffs look more or less comprehensible to you.
Also check out the "xtx" file below, whose content should be the same as this file, but with each sentence broken out into its own line.
And after uploading this as a gist on GitHub, I noticed the default diff algorithm deleted the two lines of the previous paragraph and then added three lines.
This isn't exactly what I was expecting, but it did remind me that I was once interested in diff algorithms.
Having not used that information in a couple of decades, I have forgotten most of it but was happy to find git provides quite a few options.
Look at the '--diff-algorithm' option of the git diff command; documentation can be found at https://git-scm.com/docs/git-diff .
For the advanced student, googling "patience diff" and "myers diff" will reveal interesting discussions if you don't already have a textbook that covers them.
Finally, I'm adding a "ztz" file to the gist which is the same content, but with newlines at the end of each line.
I suspect it will be easier for humans to read, but the diffs will be less easy to follow.
In Defense of Two Spaces After a Period
I spend more time than is healthy worrying about software documentation. Don't get me wrong, I enjoy
coding. I enjoy the process of investigating problems, possibly breaking them down into sub-problems and
then searching my toolbox of conceptual solutions to find the one that's *just right*. But communicating
a developer's intent clearly is an important part of constructing a solution. If you work in a team with
other software developers, communicating intent is of vital importance. Even if you work in isolation,
documenting your intent is important so that when you eventually come back to your code several years
later, you have a chance of understanding what you were trying to do.
And it was while I was in deep thought about documenting software that it occurred to me that maybe, just
maybe, there's justification for two spaces after a period in the modern world.
I frequently use EMACS to edit files; sometimes I use VI, I'm not a zealot. So I like to see text files
as just that: text files. Text can be underwhelming visually sometimes, so I also like to use Markdown,
AsciiDoc or Emacs Org Mode. Editors exist to make these formats easy to edit; but I still like to have
access to the original text. It is always useful to be able to diff two text files, and many visual
editors make this harder than it has to be, if they make it possible at all.
But there is a problem with file differs: they often misunderstand the differences in text you're
interested in. Many git users will have seen diffs where complete paragraphs have been removed, and then
the complete paragraph has been added with small changes. Brandon Rhodes gives a good example on his page
regarding Semantic Linefeeds. Go look at it now, it's at:
https://rhodesmill.org/brandon/2012/one-sentence-per-line/ .
The problem I have with Brian Kernighan's advice of "start each sentence on a new line," is that it looks
ugly. But I can't deny it's good advice; just look at the way the differ mangled the update in Rhodes'
example.
My simple solution is to keep each of my paragraphs on a single line, but use a program to break them
apart before checking them into source control. But you may be asking, what does this have to do with
"two spaces after a period?" In English we use periods for many things other than signifying the end of
a sentence. For example, we use periods to denote abbreviations. If I wanted to talk about science
fiction writer A. E. van Vogt, who pretty much only used his initials, it would be annoying if my line
breaking script put the middle 'E' on its own line. Not impossible to read, of course, but annoying.
So my solution was to assume sentences ended with a period and two spaces instead of a period and one
space. My line breaking script will get confused if I ever use only one space after a period, but it
won't be the end of the world. And when I say "line breaking script," I really mean "invocation of sed."
The `sed` command is a handy utility that will convert a period and two spaces into a period and a line
feed with this simple command:
sed 's/\.\ \ /\.\n/g' input_file.txt > output_file.xtx
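Going the other way takes a little more than reversing the substitution, because sed reads one line at a
time and never sees the newline between two lines. The lines have to be joined into the pattern space
before the substitution can match:
sed -e ':a' -e '$!N' -e '$!ba' -e 's/\.\n/\.\ \ /g' input_file.xtx > output_file.txt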
Now there is a (mostly) automated way to edit text files where paragraphs look "correct" to me, but we
split out sentences on individual lines so we have something akin to semantic linefeeds. If you're
reading this in a GitHub gist, check out the revision history to see if the diffs look more or less
comprehensible to you. Also check out the "xtx" file below, whose content should be the same as this
file, but with each sentence broken out into its own line.
And after uploading this as a gist on GitHub, I noticed the default diff algorithm deleted the two lines
of the previous paragraph and then added three lines. This isn't exactly what I was expecting, but it did
remind me that I was once interested in diff algorithms. Having not used that information in a couple of
decades, I have forgotten most of it but was happy to find git provides quite a few options. Look at the
'--diff-algorithm' option of the git diff command; documentation can be found at
https://git-scm.com/docs/git-diff . For the advanced student, googling "patience diff" and "myers diff"
will reveal interesting discussions if you don't already have a textbook that covers them.
Finally, I'm adding a "ztz" file to the gist which is the same content, but with newlines at the end of
each line. I suspect it will be easier for humans to read, but the diffs will be less easy to follow.