Frameworks like llama.cpp support context-free grammars to restrict the output of a large language model to a specific format.
The specification for the xml2rfc format comes with a RELAX NG schema that describes this particular format.
The RELAX NG specification defines its semantics in terms of a reduced form called the simple syntax. The more advanced constructs of the full syntax are essentially syntactic sugar that can be rewritten into it.
There are tools that convert the full format into the simple syntax.
The simple syntax is very easy to work with for all kinds of purposes.
From it, a formal grammar for a concrete XML format is easy to write, for example:
html = start-html (head body) final-html
head = start-head (title) final-head
title = start-title "" final-title
body = start-body *(div / p) final-body
div = start-div *(div / p) final-div
p = start-p "" final-p
Throw in attributes and such as appropriate.
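As a rough sketch of how such rules could be generated rather than written by hand, the following Python turns a small content model into a grammar in the GBNF notation used by llama.cpp. The CONTENT table and the to_gbnf helper are made up for illustration, and the output allows no whitespace between tags and no attributes, so treat it as a starting point only.

```python
# Hypothetical content model mirroring the example grammar above.
# Each element name maps to a grammar expression for its children.
CONTENT = {
    "html": "head body",
    "head": "title",
    "title": "text",
    "body": "(div | p)*",
    "div": "(div | p)*",
    "p": "text",
}


def to_gbnf(content, root):
    """Emit one GBNF-style rule per element: start tag, children, end tag."""
    lines = [f"root ::= {root}"]
    for name, children in content.items():
        lines.append(f'{name} ::= "<{name}>" {children} "</{name}>"')
    # Character data: anything except the markup-significant characters.
    lines.append("text ::= [^<&]*")
    return "\n".join(lines)


grammar = to_gbnf(CONTENT, "html")
print(grammar)
```

The start-x and final-x terminals from the listing become literal tag strings here; attributes would need additional rules inside the start tag.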
So, take the RNC schema from the xml2rfc RFC. Convert it from the compact syntax to the XML-based simple syntax. Transform that into a context-free grammar of the form shown above. Write a system prompt for the large language model tasking it with converting a plain-text RFC into xml2rfc XML. The priorities would be to preserve the wording exactly, and to preserve the formatting of ASCII art diagrams and similar constructs exactly.
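The transformation step might look roughly like the sketch below, which reads a tiny schema in RELAX NG simple syntax and prints rules in the notation used above. It is a sketch under assumptions: the sample schema is invented, only ref, text, empty, group, choice and oneOrMore are handled, attributes are ignored, and every element is assumed to have a plain name as its name class. The helper names expr and rules_from_simple_rng are hypothetical.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Invented sample in simple syntax: one element per define, no zeroOrMore
# (it is expressed as a choice between oneOrMore and empty).
SIMPLE_RNG = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="body"/></start>
  <define name="body">
    <element><name ns="">body</name>
      <choice>
        <oneOrMore><choice><ref name="div"/><ref name="p"/></choice></oneOrMore>
        <empty/>
      </choice>
    </element>
  </define>
  <define name="div">
    <element><name ns="">div</name>
      <choice>
        <oneOrMore><choice><ref name="div"/><ref name="p"/></choice></oneOrMore>
        <empty/>
      </choice>
    </element>
  </define>
  <define name="p">
    <element><name ns="">p</name><text/></element>
  </define>
</grammar>
"""


def expr(node):
    """Translate one pattern node into a grammar expression string."""
    tag = node.tag.replace(RNG, "")
    kids = [expr(kid) for kid in node]
    if tag == "ref":
        return node.get("name")
    if tag == "text":
        return "text"
    if tag == "empty":
        return '""'
    if tag == "group":
        return "(" + " ".join(kids) + ")"
    if tag == "choice":
        return "(" + " / ".join(kids) + ")"
    if tag == "oneOrMore":
        return "1*" + kids[0]
    raise ValueError(f"unsupported pattern: {tag}")


def rules_from_simple_rng(xml_text):
    """Map each define to 'start-NAME pattern final-NAME'."""
    root = ET.fromstring(xml_text.strip())
    rules = {}
    for define in root.iter(RNG + "define"):
        element = define[0]             # the single element pattern
        name = element[0].text.strip()  # assumed to be a <name> name class
        rules[define.get("name")] = f"start-{name} {expr(element[1])} final-{name}"
    return rules


rules = rules_from_simple_rng(SIMPLE_RNG)
for rule_name, body in rules.items():
    print(f"{rule_name} = {body}")
```

The real xml2rfc schema also uses attributes, data types and possibly interleave, which would each need their own case in expr.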
Constrain its output with the CFG for the XML format. If the grammar mechanism works properly, the result should be a valid xml2rfc file (modulo some issues, such as there possibly not being an ID for each IDREF, or whatever internal linking mechanism xml2rfc uses). That can then be put through the converter to generate plain text files again.
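Since a context-free grammar cannot guarantee that references resolve, a small check after generation could catch dangling ones. The sketch below assumes that identifiers live in an anchor attribute and that references are xref elements with a target attribute; these names are passed as parameters because they are assumptions about the format, and dangling_refs is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def dangling_refs(xml_text, id_attr="anchor", ref_element="xref", ref_attr="target"):
    """Return referenced identifiers that no element in the document defines."""
    root = ET.fromstring(xml_text)
    # Collect every defined identifier in the document.
    defined = {el.get(id_attr) for el in root.iter() if el.get(id_attr)}
    # Collect every identifier that a reference element points at.
    used = {el.get(ref_attr) for el in root.iter(ref_element) if el.get(ref_attr)}
    return sorted(used - defined)


sample = (
    '<rfc><section anchor="intro"><t>See <xref target="intro"/> and '
    '<xref target="missing"/>.</t></section></rfc>'
)
print(dangling_refs(sample))
```

Any identifiers it reports could be fed back to the model for a retry or fixed by hand.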
Diff the result against the original, possibly with settings that tolerate re-wrapping.
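One way to make the comparison tolerant of re-wrapping is to collapse each paragraph to a single line before diffing. This is a minimal sketch using Python's difflib; the paragraphs and rewrap_tolerant_diff helpers are made up for illustration, and paragraphs are assumed to be separated by blank lines.

```python
import difflib
import re


def paragraphs(text):
    """Split on blank lines and collapse all whitespace inside each paragraph."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks]


def rewrap_tolerant_diff(original, regenerated):
    """Unified diff over paragraphs, so pure line-wrapping changes vanish."""
    return list(
        difflib.unified_diff(
            paragraphs(original),
            paragraphs(regenerated),
            fromfile="original",
            tofile="regenerated",
            lineterm="",
        )
    )


a = "The quick brown\nfox.\n\nSecond para.\n"
b = "The quick\nbrown fox.\n\nSecond para.\n"
c = "The quick\nbrown fox.\n\nThird para.\n"
print(rewrap_tolerant_diff(a, b))
print("\n".join(rewrap_tolerant_diff(a, c)))
```

Collapsing whitespace also hides changes to ASCII art, so diagrams and similar blocks would need a separate, exact comparison.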