Created
October 16, 2015 03:51
-
-
Save nyarly/7187c109a58118621ffa to your computer and use it in GitHub Desktop.
Parse unparse
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Imagine I've defined a grammar G. It's a ... predicate grammar (I think that's the right term) | |
| such that some of it's rules look like: | |
| KeyValue -> KeyName ': ' ValueString { kv_pair: [key, value] } | |
| KeyName -> [A-Za-z]+ { key: <matchstring> } | |
| ValueString -> [^ ]+ { value: <matchstring> } | |
| A fragment from a string in the language is like | |
| judson: confused | |
| When I parse that string with the grammar: | |
| parse("judson: confused", G) -> ["judson", "confused"] | |
| So, far, reasonably textbook stuff, probably made a hash of by half-rememberedness. | |
| What I'd like to be able to do is: | |
| Given the parsed result ["judson", "confused"] and the grammar G, | |
| unparse(["judson", "confused"], G) -> "judson: confused" // for example | |
| I believe but have not undertaken to prove that, given an appropriately restricted perdicate language | |
| (specifically, every {} would need to be a reversible - not necessarily bijective, though), and for any(?) grammar that | |
| can be parsed, it should be possible to produce such an "unparse()" function. | |
| Mostly, I'm not 100% on the restrictions I'd need to place, but I'm close to certain that for some set of restrictions, | |
| it should be possible. | |
| Ultimately, I'm looking to be able to parse(unparse(data)) =~ data, | |
| accepting that | |
| unparse(parse(string) might not = string | |
| I'd like for unparse(data) to run quickly, and produce short output, but it needn't be the shortest output. | |
| This seems like a straightforward question, and given that the formal grammars domain is described in terms of | |
| productions, it seems like a natural one, but I'm stymied trying to find related work. | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
That means your grammar isn't injective because two different strings can result in the same parse tree (e.g.
foo_barandfoo__bar. So both the semantic actions and the grammar need to be injective.WRT white space you could define an equivalence class between strings that are the same without whitespace and then your grammar might be injective for representatives of those equivalence classes.
https://en.wikipedia.org/wiki/Equivalence_class