jenny Sat May 21 15:09:10 2016
Venting about default XML namespaces.
library(xml2)
xml_lines <- c(
"<root ",
"xmlns=\"http://default.com\" ",
"xmlns:foo=\"http://foo.com\">",
" <thing>one</thing>",
" <thing>two</thing>",
" <foo:stuff>three</foo:stuff>",
"</root>"
)
This is the typical situation today when a document has a default namespace. You can't write the "simple" XPath you expect, nor can you presume d1 as an alias for the default namespace. You have to pass the namespace explicitly, though I realize xml2 will soon make this less clunky.
x <- read_xml(paste(xml_lines, collapse = ""))
xml_find_all(x, "thing")
#> {xml_nodeset (0)}
xml_find_all(x, "d1:thing") # I realize this will work very soon
#> Warning in xpath_search(x$node, x$doc, xpath = xpath, nsMap = ns,
#> num_results = Inf): Undefined namespace prefix [1219]
#> {xml_nodeset (0)}
xml_find_all(x, "d1:thing", xml_ns(x))
#> {xml_nodeset (2)}
#> [1] <thing>one</thing>
#> [2] <thing>two</thing>
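Since the d1 prefix is made up by xml2 anyway, the namespace map can be renamed before it is passed in. A minimal sketch, assuming the same document as above; the prefix def and the object names doc, ns, and things are arbitrary choices for illustration, not anything the document defines.

```r
library(xml2)

# Same structure as the example above, written as one literal string.
doc <- read_xml(
  '<root xmlns="http://default.com" xmlns:foo="http://foo.com">
     <thing>one</thing><thing>two</thing><foo:stuff>three</foo:stuff>
   </root>'
)

# xml_ns() invents "d1" for the default namespace; swap in a prefix of
# our own choosing ("def" is arbitrary).
ns <- xml_ns_rename(xml_ns(doc), d1 = "def")

# The XPath still needs a prefix, but at least it is one we picked.
things <- xml_find_all(doc, "def:thing", ns)
xml_text(things)
#> [1] "one" "two"
```

This does not remove the need for a prefix; it only makes the prefix predictable across documents.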
I have fantasies about burning the default namespace with fire. If we make up an arbitrary prefix for it anyway, how is it any worse to simply make it disappear?
x2 <- read_xml(paste(xml_lines[-2], collapse = ""))
xml_find_all(x2, "thing") # OMG YESSSSSS
#> {xml_nodeset (2)}
#> [1] <thing>one</thing>
#> [2] <thing>two</thing>
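When editing the source text is not an option, a namespace-agnostic XPath gets part of the way there without touching the document. A rough sketch using the standard XPath 1.0 local-name() function; doc and things are illustrative names. The catch is that it matches thing in any namespace, not only the default one.

```r
library(xml2)

doc <- read_xml(
  '<root xmlns="http://default.com" xmlns:foo="http://foo.com">
     <thing>one</thing><thing>two</thing><foo:stuff>three</foo:stuff>
   </root>'
)

# Match children by local name only, ignoring whatever namespace they are in.
things <- xml_find_all(doc, "*[local-name() = 'thing']")
xml_text(things)
#> [1] "one" "two"
```

It works, but the XPath is hardly the "simple" one you wanted to write.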
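A similar effect can be achieved by declaring the default namespace's URI with a prefix, in the same "real" way the other namespaces are declared. The unprefixed elements then belong to no namespace at all, so the simple XPath finds them.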
xml_lines_patched <- xml_lines
xml_lines_patched[2] <- "xmlns:whatever=\"http://default.com\" "
x3 <- read_xml(paste(xml_lines_patched, collapse = ""))
xml_find_all(x3, "thing") # also deeply satisfying
#> {xml_nodeset (2)}
#> [1] <thing>one</thing>
#> [2] <thing>two</thing>
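Both tricks above edit the XML text before parsing. Later releases of xml2 offer xml_ns_strip(), which drops default namespace declarations from a document that is already parsed. A hedged sketch, assuming xml2 1.2.0 or later is installed (a release that postdates this note); doc and things are illustrative names.

```r
library(xml2)

# Assumption: xml_ns_strip() is only available in xml2 >= 1.2.0.
stopifnot(packageVersion("xml2") >= "1.2.0")

doc <- read_xml(
  '<root xmlns="http://default.com" xmlns:foo="http://foo.com">
     <thing>one</thing><thing>two</thing><foo:stuff>three</foo:stuff>
   </root>'
)

# Removes the default namespace declaration; modifies doc in place.
xml_ns_strip(doc)

# The prefix-free XPath should now find the nodes directly.
things <- xml_find_all(doc, "thing")
xml_text(things)
```

Prefixed namespaces such as foo are left alone, so foo:stuff still needs its prefix and a namespace map.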
Weird little example of why this would be handy: the xmlview package provides an htmlwidget for pretty-printing XML. It even lets you enter an XPath expression, see the result, and generate the corresponding xml2::xml_find_all(...) code. Unfortunately, the package itself does not finesse the default namespace (hrbrmstr/xmlview#10), so it currently can't be used to access nodes in the default namespace. But if I could exert my will over the default namespace on the R/xml2 side, I could work around this.