Skip to content

Instantly share code, notes, and snippets.

@nyarly
Created October 16, 2015 03:51
Show Gist options
  • Save nyarly/7187c109a58118621ffa to your computer and use it in GitHub Desktop.
Save nyarly/7187c109a58118621ffa to your computer and use it in GitHub Desktop.
Parse unparse
Imagine I've defined a grammar G. It's a ... predicate grammar (I think that's the right term)
such that some of it's rules look like:
KeyValue -> KeyName ': ' ValueString { kv_pair: [key, value] }
KeyName -> [A-Za-z]+ { key: <matchstring> }
ValueString -> [^ ]+ { value: <matchstring> }
A fragment from a string in the language is like
judson: confused
When I parse that string with the grammar:
parse("judson: confused", G) -> ["judson", "confused"]
So, far, reasonably textbook stuff, probably made a hash of by half-rememberedness.
What I'd like to be able to do is:
Given the parsed result ["judson", "confused"] and the grammar G,
unparse(["judson", "confused"], G) -> "judson: confused" // for example
I believe but have not undertaken to prove that, given an appropriately restricted perdicate language
(specifically, every {} would need to be a reversible - not necessarily bijective, though), and for any(?) grammar that
can be parsed, it should be possible to produce such an "unparse()" function.
Mostly, I'm not 100% on the restrictions I'd need to place, but I'm close to certain that for some set of restrictions,
it should be possible.
Ultimately, I'm looking to be able to parse(unparse(data)) =~ data,
accepting that
unparse(parse(string) might not = string
I'd like for unparse(data) to run quickly, and produce short output, but it needn't be the shortest output.
This seems like a straightforward question, and given that the formal grammars domain is described in terms of
productions, it seems like a natural one, but I'm stymied trying to find related work.
@johnbender
Copy link

One thing that strikes me as a possible issue: if all the semantic actions in a grammar are bijections, does that make the grammar itself bijective wrt parse/unparse? Well, obviously not, since not every symbol in the string is required to be semantic (e.g. whitespace).

That means your grammar isn't injective because two different strings can result in the same parse tree (e.g. foo_bar and foo__bar. So both the semantic actions and the grammar need to be injective.

WRT white space you could define an equivalence class between strings that are the same without whitespace and then your grammar might be injective for representatives of those equivalence classes.

https://en.wikipedia.org/wiki/Equivalence_class

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment