Created
April 4, 2012 02:13
Rust deserialize XML first crack
use std;
import io::reader_util;

enum node {
    tag_node({
        name: str,
        attributes: [attribute],
        children: [node]
    }),
    text_node(str)
}

type attribute = {
    name: str,
    value: str
};

fn is_eof(c: char) -> bool { c == -1 as char }

fn parse_tag_name(rdr: io::reader, first_c: char) -> str {
    let mut c = rdr.read_char();
    let mut tag_name = str::from_char(first_c);
    while !is_eof(c) && c != '>' {
        tag_name += str::from_char(c);
        c = rdr.read_char();
    }
    ret tag_name;
}

#[doc = "Deserializes an xml node value from an io::reader"]
fn from_reader(rdr: io::reader) -> node {
    let mut c = rdr.read_char();
    let mut tag_name = "";
    while !is_eof(c) {
        if c == '<' {
            c = rdr.read_char();
            if c != '/' {
                tag_name = parse_tag_name(rdr, c);
            }
        }
        c = rdr.read_char();
    }
    ret tag_node({ name: tag_name, attributes: [], children: [] });
}

#[doc = "Deserializes an xml node value from a string"]
fn from_str(s: str) -> node {
    io::with_str_reader(s, from_reader)
}

#[test]
fn test_empty_tag() {
    assert from_str("<tag></tag>") == tag_node({ name: "tag", attributes: [], children: [] });
}
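For anyone reading this with a current toolchain: the listing above is pre-1.0 Rust and will not compile today. Below is a rough sketch of the same empty-tag parser in modern Rust. The type and function names (Node, Attribute, from_str, parse_tag_name) are my own renderings of the gist's names, and iterating over chars() stands in for the old io::reader, so treat it as an approximation rather than a faithful port.

```rust
// Modern-Rust sketch of the gist's parser (assumed names, not the original API).
#[derive(Debug, PartialEq)]
pub struct Attribute {
    pub name: String,
    pub value: String,
}

#[derive(Debug, PartialEq)]
pub enum Node {
    Tag {
        name: String,
        attributes: Vec<Attribute>,
        children: Vec<Node>,
    },
    Text(String),
}

/// Collects a tag name starting with `first`, consuming input up to and
/// including the closing '>' (or until the input runs out).
fn parse_tag_name(chars: &mut impl Iterator<Item = char>, first: char) -> String {
    let mut name = String::from(first);
    while let Some(c) = chars.next() {
        if c == '>' {
            break;
        }
        name.push(c);
    }
    name
}

/// Roughly mirrors the gist: remembers the name of the last opening tag,
/// skips closing tags, and never fills in attributes or children.
pub fn from_str(s: &str) -> Node {
    let mut chars = s.chars();
    let mut tag_name = String::new();
    while let Some(c) = chars.next() {
        if c == '<' {
            match chars.next() {
                // A closing tag or end of input: nothing to record.
                Some('/') | None => {}
                Some(first) => tag_name = parse_tag_name(&mut chars, first),
            }
        }
    }
    Node::Tag {
        name: tag_name,
        attributes: Vec::new(),
        children: Vec::new(),
    }
}
```

Calling from_str("<tag></tag>") should give a Node::Tag named "tag" with empty attributes and children, which is the same thing the gist's test_empty_tag checks.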
Author
Note that some of the middle of my last comment is out of date now, as I've changed the node type to be an enum, but you should compare the two approaches. Both of them seem to introduce one extra type that it doesn't feel like I should need - children in the old version and tag_node in the new version.
I see list([json]) in the json implementation. Is list a builtin type, or would that be useful instead of children?
Author
list is actually just the name of the subtype, and the [json] syntax means that the list subtype is an alias for an array (or vector maybe, not clear on the difference yet) of json types. You can reference json in the subtype because it's an enum. If it were a type, it would give you a recursive type error.
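The same idea can be sketched in modern Rust, where it still holds that an enum may refer to itself through a vector. The Json type and depth function here are illustrative names, not copied from libstd's json.rs. One caveat: in current Rust a named struct can also hold a Vec of itself, so the restriction discussed here is specific to the 2012 record type aliases.

```rust
// A JSON-like value; variant names are illustrative assumptions.
#[derive(Debug, PartialEq)]
pub enum Json {
    Num(f64),
    Text(String),
    // Allowed: Vec<Json> keeps its elements on the heap, so Json itself
    // still has a fixed, known size.
    List(Vec<Json>),
}

// By contrast, a directly self-referential alias is rejected by the compiler:
// type Nested = Vec<Nested>;

/// Nesting depth of a value: 0 for scalars, 1 + deepest child for lists.
pub fn depth(value: &Json) -> usize {
    match value {
        Json::List(items) => 1 + items.iter().map(depth).max().unwrap_or(0),
        _ => 0,
    }
}
```

For example, depth applied to a list containing a number and an empty list counts two levels of nesting.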
Pretty ugly, but actually parses an empty tag "correctly". I wish gist let you make comments like you can on code in repos; I want to annotate some of this:
This is pretty interesting - below where I have an io::reader object and want to use #read_char (oops! adopting ruby naming convention!), I can't, because #read_char is not defined directly on io::reader but rather on the io::reader_util "implementation" for an io::reader object. It reminds me a lot of mixing in functionality to a ruby object only when you need it. Not sure yet whether I can do that import only in the functions that actually use #read_char, but I'm guessing I can. Here's where it's implemented - https://github.com/mozilla/rust/blob/master/src/libcore/io.rs#L42-163
These are basically C structs, but you can implement behavior for them with an implementation, similarly to the reader_util thing from before, so then they behave more like typical objects (though I have no idea yet about polymorphism).
This is how you make a type that can be any of the given sub-types (now that I think of it, this is how you do polymorphism). So in this case I've made a type called nodes that is an array of nodes and a subtype of children, and a type called none that is basically untyped (not sure what that means), and is also a subtype of children. I didn't actually want or need this children type, except that the node type can't contain any fields with type derived from itself. So I can't just give node a children field that is an array of nodes, like I wanted to. Not sure yet whether that is totally lame or somewhat acceptable. It does seem like tree-like structures are less convenient because of that restriction. Here's the json data type, which I've been cribbing off of - https://github.com/mozilla/rust/blob/master/src/libstd/json.rs#L28-35. It feels a little odd that my types are somewhat less simple in code by virtue of being more simple conceptually (in XML, there really is only one type of node, but I have to have this inconvenient separate type for "children", which should really just be an array of nodes). I guess now that I think of it, there is also a text node, so maybe the node type needs to be an enum anyway and things are simpler.
This is pretty nifty and built into the language - if you compile with --test it creates an executable that runs anything annotated with #[test] instead of running the main function like it normally would. Much more convenient for library code.
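Two of the points above (methods that only exist once an "implementation" is imported, and #[test] functions) carry over to modern Rust, so here is a small sketch of both. The ReadChar trait and first_char function are hypothetical names made up for illustration. They are loosely analogous to io::reader_util, not the real 2012 API.

```rust
mod reader_util {
    /// Hypothetical extension trait: adds `read_char` to existing types.
    pub trait ReadChar {
        fn read_char(&mut self) -> Option<char>;
    }

    // Implemented for any iterator over chars, e.g. the result of `str::chars()`.
    impl<I: Iterator<Item = char>> ReadChar for I {
        fn read_char(&mut self) -> Option<char> {
            self.next()
        }
    }
}

pub fn first_char(s: &str) -> Option<char> {
    // The import is scoped to this function only; outside it,
    // `.read_char()` would not resolve on a `Chars` iterator.
    use self::reader_util::ReadChar;
    s.chars().read_char()
}

// Compiled and run only by the test harness (`rustc --test` or `cargo test`).
#[test]
fn test_first_char() {
    assert_eq!(first_char("<tag>"), Some('<'));
    assert_eq!(first_char(""), None);
}
```

So at least in current Rust the guess holds: the import can live inside just the function that needs the method. The #[test] function is left out of a normal build and only runs under the test harness.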