@sguzman
Created March 9, 2022 16:35
Script to parse US Code files
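(* Import every US Code XML file and build one record per element that has an
   "identifier" attribute and a content, heading, or chapeau child. *)
Map[
  Cases[
    Import[#, "XML"],
    XMLElement[tag_,
      (* as written, this matches only when "identifier" is the last attribute *)
      {___, "identifier" -> id_},
      {
        ___,
        XMLElement[
          Pattern[container, "content" | "heading" | "chapeau"],
          _,
          (* the container's first child must be an element with exactly one child *)
          {Pattern[content, XMLElement[_, _, {text_}]], ___}],
        ___
      }]
    :> {
      (* flatten the identifier path into a dash-separated key *)
      "id" -> StringReplace[StringJoin[id, "-", tag, "-", container], "/" -> "-"],
      "categories" -> StringSplit[id, "/"],
      "tag" -> tag,
      "container" -> container,
      "text" -> text
    },
    \[Infinity]] &,
  FileNames[__ ~~ ".xml", "/home/sguzman/Code/docs/us-code"]]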
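
To try the pattern without the full US Code download, the sketch below applies the same Cases rule to a small inline XML string. The helper name extractUnits and the sample markup are assumptions made for illustration and do not appear in the original script; the sample is only loosely modeled on the identifier attribute the pattern looks for.

```wolfram
(* Hypothetical helper: the script's Cases rule wrapped as a function
   of an already-parsed XML document. *)
extractUnits[doc_] := Cases[doc,
  XMLElement[tag_, {___, "identifier" -> id_},
    {___,
     XMLElement[container : ("content" | "heading" | "chapeau"), _,
       {XMLElement[_, _, {text_}], ___}],
     ___}] :>
   {"id" -> StringReplace[StringJoin[id, "-", tag, "-", container], "/" -> "-"],
    "categories" -> StringSplit[id, "/"],
    "tag" -> tag,
    "container" -> container,
    "text" -> text},
  Infinity];

(* Made-up sample, written without whitespace between tags so no
   whitespace-only text nodes can get in the way of the match. *)
sample = "<section identifier=\"/us/usc/t1/s1\"><content><p>Some text</p></content></section>";

(* Parse the string as XML and list the extracted records. *)
rows = extractUnits[ImportString[sample, "XML"]];
rows
```

Note that the rule only fires when the container's first child is itself an element holding a single node, as with the p element here. A heading that contains plain text directly would be skipped by this pattern.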