Last active
April 15, 2021 23:56
Save btray77/5264672 to your computer and use it in GitHub Desktop.
RegEX to parse XML to JSON
// Original from: http://killzonekid.com/worlds-smallest-fastest-xml-to-json-javascript-converter/
// Thanks to Loamhoof for helping get this working!
// http://stackoverflow.com/questions/15675352/regex-convert-xml-to-json/15680000
// Load XML into XML variable
var regex = /(<\w+[^<]*?)\s+([\w-]+)="([^"]+)">/;
while (xml.match(regex)) xml = xml.replace(regex, '<$2>$3</$2>$1>');
xml = xml.replace(/\s/g, ' ').
  replace(/< *\?[^>]*?\? *>/g, '').
  replace(/< *!--[^>]*?-- *>/g, '').
  replace(/< *(\/?) *(\w[\w-]+\b):(\w[\w-]+\b)/g, '<$1$2_$3').
  replace(/< *(\w[\w-]+\b)([^>]*?)\/ *>/g, '< $1$2>').
  replace(/(\w[\w-]+\b):(\w[\w-]+\b) *= *"([^>]*?)"/g, '$1_$2="$3"').
  replace(/< *(\w[\w-]+\b)((?: *\w[\w-]+ *= *" *[^"]*?")+ *)>( *[^< ]*?\b.*?)< *\/ *\1 *>/g, '< $1$2 value="$3">').
  replace(/< *(\w[\w-]+\b) *</g, '<$1>< ').
  replace(/> *>/g, '>').
  replace(/"/g, '\\"').
  replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":"$2",').
  replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":[{$2}],').
  replace(/< *(\w[\w-]+\b) *>(?=("\w[\w-]+\b)":\{.*?\},\2)(.*?)< *\/ *\1 *>/, '"$1":{}$3},').
  replace(/],\s*?".*?": *\[/g, ',').
  replace(/< \/(\w[\w-]+\b)\},\{\1>/g, '},{').
  replace(/< *(\w[\w-]+\b)[^>]*?>/g, '"$1":{').
  replace(/< *\/ *\w[\w-]+ *>/g, '},').
  replace(/\} *,(?= *(\}|\]))/g, '}').
  replace(/] *,(?= *(\}|\]))/g, ']').
  replace(/" *,(?= *(\}|\]))/g, '"').
  replace(/ *, *$/g, '');
xml = '{' + xml + '}';
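To try the snippet on small inputs, it can be wrapped in a function. The sketch below is only a convenience wrapper: the function name `xmlToJsonText` is hypothetical and not part of the gist, and the two sample documents are made up for illustration. The regex chain itself is copied unchanged. Note that most patterns use `\w[\w-]+`, so tag names need at least two characters to be recognized.

```javascript
// Hypothetical wrapper name; the body is the gist's regex chain, unchanged.
function xmlToJsonText(xml) {
  // Move the last attribute of a tag into its own element, placed BEFORE that
  // tag, and repeat until no tag has attributes left.
  var regex = /(<\w+[^<]*?)\s+([\w-]+)="([^"]+)">/;
  while (xml.match(regex)) xml = xml.replace(regex, '<$2>$3</$2>$1>');
  xml = xml.replace(/\s/g, ' ').
    replace(/< *\?[^>]*?\? *>/g, '').
    replace(/< *!--[^>]*?-- *>/g, '').
    replace(/< *(\/?) *(\w[\w-]+\b):(\w[\w-]+\b)/g, '<$1$2_$3').
    replace(/< *(\w[\w-]+\b)([^>]*?)\/ *>/g, '< $1$2>').
    replace(/(\w[\w-]+\b):(\w[\w-]+\b) *= *"([^>]*?)"/g, '$1_$2="$3"').
    replace(/< *(\w[\w-]+\b)((?: *\w[\w-]+ *= *" *[^"]*?")+ *)>( *[^< ]*?\b.*?)< *\/ *\1 *>/g, '< $1$2 value="$3">').
    replace(/< *(\w[\w-]+\b) *</g, '<$1>< ').
    replace(/> *>/g, '>').
    replace(/"/g, '\\"').
    replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":"$2",').
    replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":[{$2}],').
    replace(/< *(\w[\w-]+\b) *>(?=("\w[\w-]+\b)":\{.*?\},\2)(.*?)< *\/ *\1 *>/, '"$1":{}$3},').
    replace(/],\s*?".*?": *\[/g, ',').
    replace(/< \/(\w[\w-]+\b)\},\{\1>/g, '},{').
    replace(/< *(\w[\w-]+\b)[^>]*?>/g, '"$1":{').
    replace(/< *\/ *\w[\w-]+ *>/g, '},').
    replace(/\} *,(?= *(\}|\]))/g, '}').
    replace(/] *,(?= *(\}|\]))/g, ']').
    replace(/" *,(?= *(\}|\]))/g, '"').
    replace(/ *, *$/g, '');
  return '{' + xml + '}';
}

// Element-only input (made-up sample): nesting is kept.
console.log(xmlToJsonText('<root><item>1</item></root>'));
// → {"root":[{"item":"1"}]}

// Attribute plus text (made-up sample): the attribute ends up as a key
// next to its own element instead of inside it.
console.log(xmlToJsonText('<item id="1">text</item>'));
// → {"id":"1","item":"text"}
```

The second call shows why the output is best treated as a rough conversion: once an attribute has been hoisted, nothing in the result says which element it belonged to.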
Proposing a small change. If there are self-closing nodes (the form matched by regex1 below), it will not work. You should add this as the first line:
var regex1 = /<(\w+)\/>/;
And add this:
while (xml.match(regex1)) xml = xml.replace(regex1, '<$1></$1>');
before
while (xml.match(regex)) xml = xml.replace(regex, '<$2>$3</$2>$1>');
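A minimal sketch of what this proposed pre-pass does on its own, assuming a made-up sample document. The function name `expandSelfClosing` is hypothetical. The pattern only matches tags written exactly as a name followed by `/>`, so a tag with attributes or with a space before the slash is left untouched.

```javascript
// Hypothetical helper name; the regex and loop are the ones proposed above.
function expandSelfClosing(xml) {
  var regex1 = /<(\w+)\/>/;
  // Rewrite one self-closing tag per iteration into an explicit open/close pair.
  while (xml.match(regex1)) xml = xml.replace(regex1, '<$1></$1>');
  return xml;
}

console.log(expandSelfClosing('<root><empty/></root>'));
// → <root><empty></empty></root>

// Not matched by regex1: there is whitespace before the slash.
console.log(expandSelfClosing('<root><empty /></root>'));
// → <root><empty /></root>
```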
Attaching a fix for an ASCII character.
Add as the first lines:
var rx = new RegExp(" ", 'g');
xml = xml.replace(rx, "");
I guess this only works in very specific, rare cases where the XML is built only of nodes,
since values and attributes get jumbled up by this algorithm.
gives:
There is a need to distinguish the node's value from its attributes.
This requires adding some more structure to the tree; starting names with @ helps mark them as having a "special job" rather than being a node name.
I took the idea for the attributes from the way PHP does it (https://stackoverflow.com/questions/8830599/php-convert-xml-to-json/19391553#19391553).
Eventually I created a PHP script for that. It includes both @attributes for inline attributes and @text for the inline text of a tag, and works around PHP's built-in JSON-encoding behavior: https://gist.github.com/eladkarako/6047cffb825a067524f9dcb65536c23f#file-xml2json-that-always-add-attributes-to-json-using-simplexml_load_string-and-json_encode-with-no-external-files-workaround-php
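To make the @attributes / @text idea concrete, here is a rough JavaScript sketch of the target shape. It is not a port of the linked PHP script, whose exact output layout is not reproduced here; the helper name `makeNode` and the sample book data are made up for illustration.

```javascript
// Hypothetical helper (not from the gist or the linked PHP script): builds a
// JSON-ready node that keeps attributes and inline text under reserved keys.
function makeNode(attributes, text, children) {
  const node = {};
  if (attributes && Object.keys(attributes).length > 0) {
    node['@attributes'] = { ...attributes };
  }
  if (typeof text === 'string' && text.trim() !== '') {
    node['@text'] = text;
  }
  // Child elements keep their plain names; an XML name cannot start with "@",
  // so they never collide with the reserved keys.
  return Object.assign(node, children || {});
}

// Made-up sample: <book id="42">Dune<author>Frank Herbert</author></book>
const book = makeNode({ id: '42' }, 'Dune', {
  author: makeNode({}, 'Frank Herbert'),
});

console.log(JSON.stringify({ book }));
// → {"book":{"@attributes":{"id":"42"},"@text":"Dune","author":{"@text":"Frank Herbert"}}}
```

With this layout a consumer can always tell an attribute from a child element and from the element's own text, which the flat regex output cannot express.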