
@jennybc
Created May 21, 2016 22:12
xml default namespace rage

xml-default-namespace-rage.R

jenny Sat May 21 15:09:10 2016

Venting about default XML namespaces.

library(xml2)
xml_lines <- c(
  "<root ",
  "xmlns=\"http://default.com\" ",
  "xmlns:foo=\"http://foo.com\">",
  "  <thing>one</thing>",
  "  <thing>two</thing>",
  "  <foo:stuff>three</foo:stuff>",
  "</root>"
)

The typical current situation when there is a default namespace. You can't write the "simple" XPath you expect, nor can you presume d1 as an alias for the default namespace. You have to specify the namespace explicitly, though I realize xml2 will soon make this less clunky.

x <- read_xml(paste(xml_lines, collapse = ""))
xml_find_all(x, "thing")
#> {xml_nodeset (0)}
xml_find_all(x, "d1:thing") # I realize this will work very soon
#> Warning in xpath_search(x$node, x$doc, xpath = xpath, nsMap = ns,
#> num_results = Inf): Undefined namespace prefix [1219]
#> {xml_nodeset (0)}
xml_find_all(x, "d1:thing", xml_ns(x))
#> {xml_nodeset (2)}
#> [1] <thing>one</thing>
#> [2] <thing>two</thing>
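
Two ways to make the explicit version a little less clunky, as a minimal sketch. It assumes an xml2 version that exports `xml_ns_rename()`; the prefix `def` and the object names are arbitrary choices for illustration, not anything from the original gist. The first query renames the auto-generated `d1` prefix, the second sidesteps namespaces entirely by matching on the local element name.

```r
library(xml2)

# Same document as above, rebuilt here so the chunk runs on its own.
doc <- read_xml(paste0(
  "<root xmlns=\"http://default.com\" xmlns:foo=\"http://foo.com\">",
  "<thing>one</thing><thing>two</thing>",
  "<foo:stuff>three</foo:stuff></root>"
))

# xml_ns() invents the prefix d1 for the default namespace;
# xml_ns_rename() swaps it for a prefix of our choosing ("def" is arbitrary).
ns <- xml_ns_rename(xml_ns(doc), d1 = "def")
by_prefix <- xml_find_all(doc, "def:thing", ns)

# Namespace-agnostic alternative: match on the local name only.
by_local_name <- xml_find_all(doc, "*[local-name() = 'thing']")

xml_text(by_prefix)
xml_text(by_local_name)
```

The `local-name()` form needs no namespace map at all, but it will also match a `thing` element from any other namespace, so it is only safe when that ambiguity does not matter.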

I have fantasies about burning the default namespace with fire. If we make up an arbitrary prefix for it anyway, how is it any worse to simply make it disappear?

x2 <- read_xml(paste(xml_lines[-2], collapse = ""))
xml_find_all(x2, "thing") # OMG YESSSSSS
#> {xml_nodeset (2)}
#> [1] <thing>one</thing>
#> [2] <thing>two</thing>
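
For what it's worth, the same "make it disappear" effect may not require editing the XML text. The sketch below assumes an xml2 release that provides `xml_ns_strip()`, which did not exist when this was written; it removes default namespace declarations from an already-parsed document, modifying it in place. The object names are made up for the example.

```r
library(xml2)

# Same document as above, rebuilt here so the chunk runs on its own.
doc <- read_xml(paste0(
  "<root xmlns=\"http://default.com\" xmlns:foo=\"http://foo.com\">",
  "<thing>one</thing><thing>two</thing>",
  "<foo:stuff>three</foo:stuff></root>"
))

# Before stripping, the plain XPath finds nothing.
before <- xml_find_all(doc, "thing")

# Drop the default namespace declaration; doc is modified in place.
xml_ns_strip(doc)

# After stripping, the "simple" XPath works.
after <- xml_find_all(doc, "thing")

length(before)
xml_text(after)
```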

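A similar effect can be achieved by declaring it with an explicit prefix, in the same "real" way as the other namespaces are handled. The unprefixed thing elements then belong to no namespace at all, so the plain XPath matches them.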

xml_lines_patched <- xml_lines
xml_lines_patched[2] <- "xmlns:whatever=\"http://default.com\" "
x3 <- read_xml(paste(xml_lines_patched, collapse = ""))
xml_find_all(x3, "thing") # also deeply satisfying
#> {xml_nodeset (2)}
#> [1] <thing>one</thing>
#> [2] <thing>two</thing>
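
A small check of what the patched declaration actually does, as a sketch with illustrative object names. Once the URI is bound to the prefix `whatever`, the unprefixed `thing` elements are in no namespace, so the plain query finds them and the prefixed query finds nothing.

```r
library(xml2)

# The patched document from above, rebuilt so the chunk runs on its own.
doc <- read_xml(paste0(
  "<root xmlns:whatever=\"http://default.com\" xmlns:foo=\"http://foo.com\">",
  "<thing>one</thing><thing>two</thing>",
  "<foo:stuff>three</foo:stuff></root>"
))

# Both declared prefixes show up in the namespace map.
ns <- xml_ns(doc)
ns

# Unprefixed elements are in no namespace, so the plain XPath matches them...
plain <- xml_find_all(doc, "thing")

# ...while asking for them under the "whatever" prefix matches nothing.
prefixed <- xml_find_all(doc, "whatever:thing", ns)

length(plain)
length(prefixed)
```

So this patch changes which namespace the elements belong to, which is fine for querying but matters if the document is written back out and consumed elsewhere.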

Weird little example of why this would be handy: The xmlview package provides an htmlwidget for pretty printing of XML. It even allows you to enter an XPath expression, see the result, and generate the corresponding xml2::xml_find_all(...) code. But unfortunately the package itself does not finesse the default namespace (hrbrmstr/xmlview#10) and so it currently can't be used to access nodes in the default namespace. But if I could exert my will over the default namespace on the R/xml2 side, I could work around this.

#' ---
#' output: github_document
#' ---
#+ setup, include = FALSE
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  error = TRUE
)
#' Venting about default XML namespaces.
library(xml2)
xml_lines <- c(
  "<root ",
  "xmlns=\"http://default.com\" ",
  "xmlns:foo=\"http://foo.com\">",
  "  <thing>one</thing>",
  "  <thing>two</thing>",
  "  <foo:stuff>three</foo:stuff>",
  "</root>"
)
#' The typical current situation when there is a default namespace. You can't
#' write the "simple" XPath you expect, nor can you presume `d1` as an alias for
#' the default namespace. You have to specify the namespace explicitly, though I
#' realize `xml2` will soon make this less clunky.
x <- read_xml(paste(xml_lines, collapse = ""))
xml_find_all(x, "thing")
xml_find_all(x, "d1:thing") # I realize this will work very soon
xml_find_all(x, "d1:thing", xml_ns(x))
#' I have fantasies about burning the default namespace with fire. If we make up
#' an arbitrary prefix for it anyway, how is it any worse to simply make
#' it disappear?
x2 <- read_xml(paste(xml_lines[-2], collapse = ""))
xml_find_all(x2, "thing") # OMG YESSSSSS
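#' A similar effect can be achieved by declaring it with an explicit prefix, in
#' the same "real" way as the other namespaces are handled. The unprefixed
#' `thing` elements then belong to no namespace at all, so the plain XPath
#' matches them.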
xml_lines_patched <- xml_lines
xml_lines_patched[2] <- "xmlns:whatever=\"http://default.com\" "
x3 <- read_xml(paste(xml_lines_patched, collapse = ""))
xml_find_all(x3, "thing") # also deeply satisfying
#' Weird little example of why this would be handy: The
#' [`xmlview`](https://github.com/hrbrmstr/xmlview) package provides an
#' htmlwidget for pretty printing of XML. It even allows you to enter an XPath
#' expression, see the result, and generate the corresponding
#' `xml2::xml_find_all(...)` code. But unfortunately the package itself does not
#' finesse the default namespace (https://github.com/hrbrmstr/xmlview/issues/10)
#' and so it currently can't be used to access nodes in the default
#' namespace. But if I could exert my will over the default namespace on the
#' R/xml2 side, I could work around this.