Created
April 4, 2012 02:13
Rust deserialize XML first crack
use std;
import io::reader_util;

enum node {
    tag_node({
        name: str,
        attributes: [attribute],
        children: [node]
    }),
    text_node(str)
}

type attribute = {
    name: str,
    value: str
};

fn is_eof(c: char) -> bool { c == -1 as char }

fn parse_tag_name(rdr: io::reader, first_c: char) -> str {
    let mut c = rdr.read_char();
    let mut tag_name = str::from_char(first_c);
    while !is_eof(c) && c != '>' {
        tag_name += str::from_char(c);
        c = rdr.read_char();
    }
    ret tag_name;
}

#[doc = "Deserializes an xml node value from an io::reader"]
fn from_reader(rdr: io::reader) -> node {
    let mut c = rdr.read_char();
    let mut tag_name = "";
    while !is_eof(c) {
        if c == '<' {
            c = rdr.read_char();
            if c != '/' {
                tag_name = parse_tag_name(rdr, c);
            }
        }
        c = rdr.read_char();
    }
    ret tag_node({ name: tag_name, attributes: [], children: [] });
}

#[doc = "Deserializes an xml node value from a string"]
fn from_str(s: str) -> node {
    io::with_str_reader(s, from_reader)
}

#[test]
fn test_empty_tag() {
    assert from_str("<tag></tag>") == tag_node({ name: "tag", attributes: [], children: [] });
}
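The file above is written in 2012-era Rust and will not compile with a current toolchain. As a rough point of comparison, here is a minimal sketch of the same first-crack logic in present-day Rust. The names `Node`, `Attribute`, `parse_tag_name`, and `from_str` are my own renderings of the gist's identifiers, and iterating over `str::chars` stands in for the old `io::reader`; treat it as an illustration under those assumptions, not as the author's code. Like the original, it only records the name of the last opening tag it sees.

```rust
#[derive(Debug, PartialEq)]
pub struct Attribute {
    pub name: String,
    pub value: String,
}

#[derive(Debug, PartialEq)]
pub enum Node {
    Tag {
        name: String,
        attributes: Vec<Attribute>,
        children: Vec<Node>,
    },
    Text(String),
}

/// Collects characters up to (not including) the closing '>'.
fn parse_tag_name(chars: &mut std::str::Chars<'_>, first: char) -> String {
    let mut name = String::from(first);
    for c in chars.by_ref() {
        if c == '>' {
            break;
        }
        name.push(c);
    }
    name
}

/// Scans the input and returns a tag node named after the last opening tag.
/// Attributes, children, and text are not parsed, mirroring the gist.
pub fn from_str(s: &str) -> Node {
    let mut chars = s.chars();
    let mut tag_name = String::new();
    while let Some(c) = chars.next() {
        if c == '<' {
            match chars.next() {
                // Closing tag or end of input: nothing to record.
                Some('/') | None => {}
                Some(first) => tag_name = parse_tag_name(&mut chars, first),
            }
        }
    }
    Node::Tag {
        name: tag_name,
        attributes: Vec::new(),
        children: Vec::new(),
    }
}

#[test]
fn test_empty_tag() {
    let expected = Node::Tag {
        name: "tag".to_string(),
        attributes: vec![],
        children: vec![],
    };
    assert_eq!(from_str("<tag></tag>"), expected);
}
```

One deliberate difference: end of input is modelled with `Option` rather than a sentinel character like `-1 as char`, so there is no `is_eof` helper.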
Note that some of the middle of my last comment is out of date now, as I've changed the `node` type to be an `enum`, but you should compare the two approaches. Both of them seem to introduce one extra type that it doesn't feel like I should need - `children` in the old version and `tag_node` in the new version.
I see `list([json])` in the json implementation. Is `list` a builtin type, or would that be useful instead of `children`?
`list` is actually just the name of the subtype, and the `[json]` syntax means that the `list` subtype is an alias for an array (or vector maybe, not clear on the difference yet) of `json` values. You can reference `json` inside its own subtype because `json` is an `enum`. If it were a `type`, it would give you a recursive type error.
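To make the enum-versus-alias distinction concrete, here is a small sketch in present-day Rust. `Json` below is a simplified stand-in that I made up for illustration, not the actual libstd definition, and `Element` and `depth` are likewise hypothetical. The restriction described in this thread is from 2012; as far as I know, a current compiler accepts a self-referential `enum` or `struct` as long as the recursion goes through a heap-allocated container such as `Vec`, while a self-referential `type` alias is still rejected.

```rust
// An enum variant may hold a Vec of the enum itself: Vec keeps its
// elements on the heap, so the enum still has a known, finite size.
#[derive(Debug, PartialEq)]
pub enum Json {
    Num(f64),
    Str(String),
    List(Vec<Json>),
}

// In current Rust a struct can do the same, so no wrapper type
// like `children` is required.
#[derive(Debug, PartialEq)]
pub struct Element {
    pub name: String,
    pub children: Vec<Element>,
}

// A plain alias cannot refer to itself; uncommenting the next line
// makes the compiler reject the program.
// type Nested = Vec<Nested>;

/// Nesting depth of a Json value: 0 for scalars, 1 + deepest child for lists.
pub fn depth(j: &Json) -> usize {
    match j {
        Json::List(items) => 1 + items.iter().map(depth).max().unwrap_or(0),
        _ => 0,
    }
}
```

For example, `depth(&Json::List(vec![Json::Num(1.0), Json::List(vec![])]))` evaluates to 2: one level for the outer list plus one for the empty inner list.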
Pretty ugly, but it actually parses an empty tag "correctly". I wish gist let you comment on code like you can in repos; I want to annotate some of this:

This is pretty interesting - below, where I have an `io::reader` object and want to use `#read_char` (oops! adopting ruby naming convention!), I can't, because `#read_char` is not defined directly on `io::reader` but rather on the `io::reader_util` "implementation" for an `io::reader` object. It reminds me a lot of mixing in functionality to a ruby object only when you need it. Not sure yet whether I can do that import only in the functions that actually use `#read_char`, but I'm guessing I can. Here's where it's implemented - https://github.com/mozilla/rust/blob/master/src/libcore/io.rs#L42-163

These are basically C structs, but you can implement behavior for them with an implementation, similarly to the `reader_util` thing from before, so then they behave more like typical objects (though I have no idea yet about polymorphism).

This is how you make a type that can be any of the given sub-types (now that I think of it, this is how you do polymorphism). So in this case I've made a type called `nodes` that is an array of `node`s and a subtype of `children`, and a type called `none` that is basically untyped (not sure what that means) and is also a subtype of `children`. I didn't actually want or need this `children` type, except that the `node` type can't contain any fields with a type derived from itself. So I can't just give `node` a field that is an array of `node`s, like I wanted to. Not sure yet whether that is totally lame or somewhat acceptable. It does seem like tree-like structures are less convenient because of that restriction. Here's the json data type, which I've been cribbing off of - https://github.com/mozilla/rust/blob/master/src/libstd/json.rs#L28-35. It feels a little odd that my types are somewhat less simple in code by virtue of being more simple conceptually (in XML, there really is only one type of node, but I have to have this inconvenient separate type for "children", which should really just be an array of nodes). I guess now that I think of it, there is also a text node, so maybe the `node` type needs to be an enum anyway and things are simpler.

This is pretty nifty and built into the language - if you compile with `--test`, it creates an executable that runs anything annotated with `#[test]` instead of running the `main` function like it normally would. Much more convenient for library code.
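On the earlier question in this comment about whether the `reader_util` import can be limited to the functions that call `read_char`: in present-day Rust the analogous mechanism is a trait, and a `use` declaration can indeed sit inside a single function body. The sketch below is an assumption-laden illustration, not the 2012 libcore API: the `reader_util` module, the `ReaderUtil` trait, and `read_ascii_char` are all hypothetical names, and the method only handles single-byte characters.

```rust
mod reader_util {
    use std::io::Read;

    /// Hypothetical extension trait, loosely analogous to the gist's
    /// `io::reader_util` implementation.
    pub trait ReaderUtil {
        /// Reads one byte and returns it as a char, or None at end of input.
        /// ASCII-only for brevity; this is not real UTF-8 decoding.
        fn read_ascii_char(&mut self) -> Option<char>;
    }

    // Blanket impl: every type that implements Read gains the method,
    // but only where the trait has been brought into scope.
    impl<R: Read> ReaderUtil for R {
        fn read_ascii_char(&mut self) -> Option<char> {
            let mut buf = [0u8; 1];
            match self.read(&mut buf) {
                Ok(1) => Some(buf[0] as char),
                _ => None,
            }
        }
    }
}

/// Returns the first character of the input, if any.
pub fn first_char(input: &str) -> Option<char> {
    // The import is scoped to this function body only. Without this line,
    // the call to `read_ascii_char` below would not compile.
    use reader_util::ReaderUtil;

    let mut rdr: &[u8] = input.as_bytes();
    rdr.read_ascii_char()
}
```

Here `first_char("<tag>")` yields `Some('<')` and `first_char("")` yields `None`. The byte slice works as a reader because the standard library implements `std::io::Read` for `&[u8]`.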