Gist bbtdev/aa623b4b25902ba925303170a20e1cb1
Last active May 28, 2020 03:06
In the bash man page, we find out that word splitting is an expansion:
"There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, WORD SPLITTING, and pathname expansion."
It also says that this expansion acts on the "results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes".
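A minimal sketch of the rule just quoted, assuming a POSIX sh or bash. `show_args` is a hypothetical helper (not part of bash) that prints each argument in brackets so the word boundaries are visible. The results of all three expansions are split when unquoted and kept whole inside double quotes:

```shell
#!/bin/sh
# show_args (hypothetical helper): prints each argument it receives in
# brackets, so the number of words passed is visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]' "$arg"
  done
  printf '\n'
}

var='one two'

# Parameter expansion: split when unquoted, kept whole when double-quoted.
show_args $var                    # [one][two]
show_args "$var"                  # [one two]

# Command substitution follows the same rule.
show_args $(echo 'one two')       # [one][two]
show_args "$(echo 'one two')"     # [one two]

# Arithmetic expansion: an integer result never contains the default IFS
# characters, so IFS is set to '0' here only to make the split visible.
old_ifs=$IFS
IFS=0
show_args $((100 + 1))            # [1][1]
show_args "$((100 + 1))"          # [101]
IFS=$old_ifs
```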
The BashGuide confirms this: "Word splitting is performed on the results of almost all unquoted expansions."
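The "almost" matters. Here is a sketch, assuming a POSIX sh or bash, of two well-known places where an unquoted expansion is not split: the right-hand side of a variable assignment and the word after `case`. The same expansion used as an ordinary command argument is split:

```shell
#!/bin/sh
src='alpha   beta'

# Right-hand side of an assignment: no word splitting, even though $src is
# unquoted, so the three spaces survive.
dst=$src
printf '[%s]\n' "$dst"        # [alpha   beta]

# The word after 'case' is not split either, so the whole value is matched.
case $src in
  'alpha   beta') echo 'case saw the whole value' ;;
  *)              echo 'case saw something else' ;;
esac

# As an ordinary command argument, the same unquoted expansion IS split.
printf '[%s]\n' $src          # [alpha] and [beta], one per line
```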
Both the guide and the man page specify that word splitting depends on the IFS variable:
"The result of the expansion is broken into separate words based on the characters of the IFS variable."
So far, everything is well explained, makes sense, and is interesting.
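To see that IFS decides where the result of an expansion is cut, here is a small sketch, assuming a POSIX sh or bash. `count_words` is a hypothetical helper that just prints `$#`. It also shows the difference between whitespace and non-whitespace IFS characters:

```shell
#!/bin/sh
# count_words (hypothetical helper): prints how many arguments it received.
count_words() { echo "$#"; }

data='a,b  c,,d'

# Default IFS (space, tab, newline): a run of blanks acts as one separator,
# and commas are ordinary characters.
count_words $data      # 2   ("a,b" and "c,,d")

# IFS=',': every comma ends a field, so two adjacent commas yield an empty
# field, and the blanks are now ordinary characters.
old_ifs=$IFS
IFS=','
count_words $data      # 4   ("a", "b  c", "" and "d")
IFS=$old_ifs
```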
But the BashGuide repeatedly describes the process in which the command line is initially split into words on whitespace as word splitting, for example:
"The shell takes your line of code and cuts it up into bits wherever there are sequences of syntactical whitespace.
The command above would be split up into the following:

    rm myfile myotherfile
      ^      ^

    [rm] [myfile] [myotherfile]

As you can see, all syntactical whitespace has been removed. There is no more whitespace left after word splitting is done with your line."
Many people call this process tokenization, and rightly so: the IFS variable is not involved here, and the process does not fit the man page's definition of word splitting.
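A quick way to check that IFS plays no part in this stage, sketched for a POSIX sh or bash: remove every blank from IFS and the command line is still cut at blanks, while the same text produced by an expansion is not. `show_args` is a hypothetical helper standing in for `rm`, so nothing is deleted:

```shell
#!/bin/sh
# show_args (hypothetical helper): prints each argument in brackets.
# It stands in for rm so that nothing is actually deleted.
show_args() {
  for arg in "$@"; do
    printf '[%s]' "$arg"
  done
  printf '\n'
}

# Take every blank out of IFS: it now holds only a comma.
old_ifs=$IFS
IFS=','

# Tokenization does not consult IFS, so the line is still cut at blanks.
# Prints: [rm][myfile][myotherfile]
show_args rm myfile myotherfile

# Word splitting does use IFS: the expanded value contains no comma,
# so it stays one word. Prints: [rm myfile myotherfile]
line='rm myfile myotherfile'
show_args $line

IFS=$old_ifs
```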
I wish this terminology (tokenization) had been used instead, because giving the same name to two different processes whose differences are not obvious can be confusing.
At least it was confusing for me; it took me a few hours to sort out.
FROM IRC:
the behaviour of word splitting is well documented, but the tokenisation of the shell's input - which is not "word splitting" in any formal sense and certainly does not hinge upon the value of IFS - occurs much earlier and is not so well documented.
even the info pages gloss over the details of tokenisation. it's briefly touched upon here: https://www.gnu.org/software/bash/manual/html_node/Shell-Syntax.html#Shell-Syntax
that's the "cuts it up into bits" stage, well before expansions may occur. and word splitting, for that matter.
on the upside, the node that talks about word splitting is very specific as to how it works. (Note: he is talking about http://mywiki.wooledge.org/WordSplitting.)
the key thing to remember is that, if no other forms of documented expansion have occurred up to the point at which word splitting is on the cards, then no splitting will occur. if it does, then the value of IFS is, of course, relevant.
as an aside, I think it's perhaps not ideal that word splitting is initially presented as if it were a form of expansion. if you look at the explanation of it, it becomes clear that it's something that acts on other expansions and the tone shifts.
if you haven't read the documentation for the Shell Command Language, you probably should. it explains the tokenisation process in a satisfactory manner and should apply to bash, for the most part.
they also use the term "field splitting" rather than "word splitting" (shrug).
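Two points from the IRC discussion can be checked directly. Tokenization runs before expansion, so an operator such as `;` inside an expanded value is never re-tokenized. A literal word is never word-split, because no expansion produced it. A minimal sketch, assuming a POSIX sh or bash, with the hypothetical `show_args` helper printing each argument in brackets:

```shell
#!/bin/sh
# show_args (hypothetical helper): prints each argument in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]' "$arg"
  done
  printf '\n'
}

# 1) The line was tokenized before $cmd was expanded, so the ';' in the
#    value is not a command separator. Word splitting (default IFS) cuts
#    the value at blanks, and the pieces become plain words.
cmd='show_args hello; echo world'
$cmd                 # [hello;][echo][world]

# 2) No expansion, no word splitting: a literal word containing an IFS
#    character is left alone, while the same text from an unquoted
#    parameter expansion is split on IFS.
old_ifs=$IFS
IFS=','
show_args a,b        # [a,b]
v='a,b'
show_args $v         # [a][b]
IFS=$old_ifs
```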