@btray77
Last active April 15, 2021 23:56
RegEX to parse XML to JSON
// Original from: http://killzonekid.com/worlds-smallest-fastest-xml-to-json-javascript-converter/
// Thanks to Loamhoof for helping get this working!
// http://stackoverflow.com/questions/15675352/regex-convert-xml-to-json/15680000
// Load the XML string into the `xml` variable before running the code below.
var regex = /(<\w+[^<]*?)\s+([\w-]+)="([^"]+)">/;
while (xml.match(regex)) xml = xml.replace(regex, '<$2>$3</$2>$1>');
xml = xml.replace(/\s/g, ' ').
replace(/< *\?[^>]*?\? *>/g, '').
replace(/< *!--[^>]*?-- *>/g, '').
replace(/< *(\/?) *(\w[\w-]+\b):(\w[\w-]+\b)/g, '<$1$2_$3').
replace(/< *(\w[\w-]+\b)([^>]*?)\/ *>/g, '< $1$2>').
replace(/(\w[\w-]+\b):(\w[\w-]+\b) *= *"([^>]*?)"/g, '$1_$2="$3"').
replace(/< *(\w[\w-]+\b)((?: *\w[\w-]+ *= *" *[^"]*?")+ *)>( *[^< ]*?\b.*?)< *\/ *\1 *>/g, '< $1$2 value="$3">').
replace(/< *(\w[\w-]+\b) *</g, '<$1>< ').
replace(/> *>/g, '>').
replace(/"/g, '\\"').
replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":"$2",').
replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":[{$2}],').
replace(/< *(\w[\w-]+\b) *>(?=("\w[\w-]+\b)":\{.*?\},\2)(.*?)< *\/ *\1 *>/, '"$1":{}$3},').
replace(/],\s*?".*?": *\[/g, ',').
replace(/< \/(\w[\w-]+\b)\},\{\1>/g, '},{').
replace(/< *(\w[\w-]+\b)[^>]*?>/g, '"$1":{').
replace(/< *\/ *\w[\w-]+ *>/g, '},').
replace(/\} *,(?= *(\}|\]))/g, '}').
replace(/] *,(?= *(\}|\]))/g, ']').
replace(/" *,(?= *(\}|\]))/g, '"').
replace(/ *, *$/g, '');
xml = '{' + xml + '}';
ghost commented Aug 13, 2020

I guess this only works in very specific, rare cases where the XML is built only of nodes,
since values and attributes get jumbled up by this algorithm.

var xml = '<xml><liquids><liquid color="white">milk</liquid></liquids></xml>'

gives:

"{"xml":{"liquids":[{"color":"white","liquid":"milk"}]}}"
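
The jumbling can be traced to the first step of the gist, which rewrites each attribute as an element placed in front of its tag before any JSON is produced. A minimal sketch isolating that step (the function name `hoistAttributes` is hypothetical; the regex and replacement string are copied from the gist):

```javascript
// Isolates the attribute-rewriting loop from the gist.
// `hoistAttributes` is a hypothetical name, not part of the original code.
function hoistAttributes(xml) {
  var regex = /(<\w+[^<]*?)\s+([\w-]+)="([^"]+)">/;
  // Each pass rewrites one attribute as an element placed in front of its tag.
  while (regex.test(xml)) xml = xml.replace(regex, '<$2>$3</$2>$1>');
  return xml;
}

var hoisted = hoistAttributes('<liquid color="white">milk</liquid>');
console.log(hoisted); // <color>white</color><liquid>milk</liquid>
```

After this step the attribute `color` is indistinguishable from a sibling node of `liquid`, which matches the flat output shown above.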

The output needs to distinguish a node's value from its attributes.
This requires adding some more structure to the tree; starting names with @ marks them as having a "special job" rather than being node names:

{"xml": {"liquids": [
  {"liquid": {"@value": "milk",
              "@attributes": {"color": "white"}}}
]}}

I took the idea for the attributes from the way PHP does it (https://stackoverflow.com/questions/8830599/php-convert-xml-to-json/19391553#19391553).
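
As a rough illustration of that shape, here is a minimal sketch that converts a single leaf element into the "@value" / "@attributes" form. The helper `leafToJson` is hypothetical and not part of the gist; it assumes one element with double-quoted attributes and no child elements, and returns null for anything else:

```javascript
// Hypothetical helper: converts ONE leaf element (no child elements) into the
// "@value" / "@attributes" shape described above. Not part of the gist.
function leafToJson(xml) {
  var m = /^<([\w-]+)((?:\s+[\w-]+="[^"]*")*)\s*>([^<]*)<\/\1>$/.exec(xml);
  if (!m) return null; // anything other than a single leaf element is out of scope

  // Collect name="value" pairs from the attribute portion of the opening tag.
  var attributes = {};
  var attrRegex = /([\w-]+)="([^"]*)"/g;
  var a;
  while ((a = attrRegex.exec(m[2])) !== null) attributes[a[1]] = a[2];

  var node = {};
  node[m[1]] = { '@value': m[3], '@attributes': attributes };
  return node;
}

console.log(JSON.stringify(leafToJson('<liquid color="white">milk</liquid>')));
// {"liquid":{"@value":"milk","@attributes":{"color":"white"}}}
```

Nested elements would need a real parser or a recursive approach; this sketch only shows the target structure for one node.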


Eventually I wrote a PHP script for this. It includes both @attributes for inline attributes and @text for the inline text of a tag, and works around PHP's built-in JSON-encoding behavior: https://gist.github.com/eladkarako/6047cffb825a067524f9dcb65536c23f#file-xml2json-that-always-add-attributes-to-json-using-simplexml_load_string-and-json_encode-with-no-external-files-workaround-php

@vryzhevsky

I propose a small change. If there are self-closing nodes like <node/>, the code will not work. Add this as the first line:
var regex1 = /<(\w+)\/>/;

And this:
while (xml.match(regex1)) xml = xml.replace(regex1, '<$1></$1>');
before
while (xml.match(regex)) xml = xml.replace(regex, '<$2>$3</$2>$1>');
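
The proposed pre-pass can be tried in isolation. A minimal sketch, wrapping the two lines above in a hypothetical function named `expandSelfClosing`:

```javascript
// Expands attribute-free self-closing tags such as <b/> into <b></b>.
// `expandSelfClosing` is a hypothetical wrapper; regex1 is copied from the comment above.
function expandSelfClosing(xml) {
  var regex1 = /<(\w+)\/>/;
  while (regex1.test(xml)) xml = xml.replace(regex1, '<$1></$1>');
  return xml;
}

console.log(expandSelfClosing('<a><b/><c/></a>')); // <a><b></b><c></c></a>
```

Note that regex1 requires the tag name to be followed immediately by `/>`, so forms with a space or attributes, such as `<b />`, pass through this step unchanged.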

@addvisor-app

To deal with the ASCII carriage-return character reference (&#13;), add this as the first line:

        var rx = new RegExp("&#13;  ", 'g');
        xml = xml.replace(rx, "");
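
A minimal sketch of what this replacement does, using a hypothetical wrapper named `stripCarriageReturnEntities` and a made-up sample string. The pattern matches the literal text `&#13;` followed by exactly two spaces, as written in the snippet above:

```javascript
// Removes every literal "&#13;" that is followed by two spaces.
// `stripCarriageReturnEntities` is a hypothetical wrapper around the snippet above.
function stripCarriageReturnEntities(xml) {
  var rx = new RegExp("&#13;  ", 'g');
  return xml.replace(rx, "");
}

console.log(stripCarriageReturnEntities('<a>&#13;  <b>1</b>&#13;  </a>'));
// <a><b>1</b></a>
```

If the input uses a different amount of whitespace after `&#13;`, the pattern would need adjusting, for example to `/&#13;\s*/g`; that variant is an assumption and not part of the original comment.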
