@flbuddymooreiv
Last active November 21, 2018 22:18
Logical XML Diff

If you need to compare two XML documents whose elements appear in a different order, and the order of elements does not matter (i.e. the elements hold unordered collections rather than ordered lists), here is a nice way to do it.

Consider the following files:

user@host:~/xmldiff$ cat a.xml
<?xml version="1.0"?>
<Element xmlns="https://mynamespace.com/path">
  <SubElement>
    <Items>
      <Item>
        <Field1>Value 1</Field1>
        <Field2>Value 2</Field2>
        <Field3>Value 3</Field3>
      </Item>
    </Items>
  </SubElement>
</Element>

user@host:~/xmldiff$ cat b.xml 
<?xml version="1.0"?>
<Element xmlns="https://mynamespace.com/path">
  <SubElement>
    <Items>
      <Item>
        <Field1>Value 1</Field1>
        <Field3>Value 3</Field3>
        <Field2>Value 2</Field2>
      </Item>
    </Items>
  </SubElement>
</Element>

They do not diff cleanly:

user@host:~/xmldiff$ diff a.xml b.xml 
7d6
<         <Field2>Value 2</Field2>
8a8
>         <Field2>Value 2</Field2>

But what if the order of Field1, Field2, and Field3 within Item does not matter? Then diff is reporting a difference that is not a real one.

We can use this XSL template to rewrite the XML with the child elements of every element in sorted order:

user@host:~/xmldiff$ cat xmlsort.xsl 
<?xml version="1.0"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" />
  <!-- Drop whitespace-only text nodes so indentation cannot affect the result -->
  <xsl:strip-space elements="*" />

  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <!-- Keep the element's own text; without this, values are silently dropped -->
      <xsl:copy-of select="text()" />
      <!-- Emit child elements sorted by their name attribute or first text node -->
      <xsl:apply-templates select="*">
        <xsl:sort select="(@name | text())[1]" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Apply it with xsltproc:

user@host:~/xmldiff$ xsltproc xmlsort.xsl a.xml 
<?xml version="1.0"?>
<Element xmlns="https://mynamespace.com/path"><SubElement><Items><Item><Field1>Value 1</Field1><Field2>Value 2</Field2><Field3>Value 3</Field3></Item></Items></SubElement></Element>

user@host:~/xmldiff$ xsltproc xmlsort.xsl b.xml 
<?xml version="1.0"?>
<Element xmlns="https://mynamespace.com/path"><SubElement><Items><Item><Field1>Value 1</Field1><Field2>Value 2</Field2><Field3>Value 3</Field3></Item></Items></SubElement></Element>
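
The sorted output above lands on a single line, so when two documents really do differ, diff can only tell us that the whole line changed. Here is a minimal sketch that pipes the result through xmllint --format to get one element per line. It assumes xmllint (part of libxml2) is installed alongside xsltproc, and the function name sorted_pretty is made up for this example:

```shell
# sorted_pretty is a hypothetical helper name, not part of xsltproc or xmllint.
# Usage: sorted_pretty STYLESHEET FILE.xml > FILE.sorted.xml
sorted_pretty() {
    # xsltproc sorts the elements; "xmllint --format -" reads that result
    # from stdin and re-indents it.
    xsltproc "$1" "$2" | xmllint --format -
}
```

Diffing two files produced this way should point at the individual elements that differ rather than at one long line.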

Now feed the outputs to diff:

user@host:~/xmldiff$ diff <(xsltproc xmlsort.xsl a.xml)  <(xsltproc xmlsort.xsl b.xml)
user@host:~/xmldiff$ echo $?
0
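
The <(...) syntax used above is process substitution, which bash, zsh, and ksh support but plain POSIX sh does not. If this has to run from a script under sh, the following is a rough sketch that uses temporary files instead. The function name xml_logical_diff is hypothetical, and mktemp is assumed to be available (it is widespread but not part of POSIX):

```shell
# xml_logical_diff is a hypothetical helper, not an existing tool.
# Usage: xml_logical_diff STYLESHEET A.xml B.xml
# Returns 0 if the sorted documents match, 1 if they differ, 2 on error.
xml_logical_diff() {
    xsl=$1 left=$2 right=$3
    # Temporary files stand in for the <(...) process substitution.
    tmp_left=$(mktemp) || return 2
    tmp_right=$(mktemp) || { rm -f "$tmp_left"; return 2; }

    if xsltproc "$xsl" "$left" > "$tmp_left" &&
       xsltproc "$xsl" "$right" > "$tmp_right"; then
        status=0
        # diff exits 0 when the files match and 1 when they differ.
        diff "$tmp_left" "$tmp_right" || status=$?
    else
        status=2    # xsltproc could not transform one of the inputs
    fi

    rm -f "$tmp_left" "$tmp_right"
    return "$status"
}
```

With the files from this example, xml_logical_diff xmlsort.xsl a.xml b.xml should print nothing and return 0, just like the one-liner above.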

And we're done! Remember that in some XML documents order does matter, and for those this will not be a valid diff. But if all of your data is a collection data structure rather than a list, this may work well for you!
