Command line that prints entity declarations (for entities such as &amp;amp; or &amp;quot;) in a form that can be pasted into <!DOCTYPE when an XML file does not define them and the parser reports that the entity is not defined. The command skips any entity that is already defined in the file.
Created
September 16, 2012 13:30
Print missing HTML entities from an XML file in a DOCTYPE-ready, paste-friendly format
ALIGN=18;grep -oE '&[^#][^;]+;' foo.xml | sort | uniq | while read entity; do name=$(echo -n $entity | sed -e 's/[;&]//g');grep ENTITY foo.xml | grep " $name " > /dev/null || echo '<!ENTITY '$(echo -n $entity | html2text | perl -e "printf '$name %'.($ALIGN-(length '$name')).'s;\'>','\'&#'.(ord <>);"); done
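The first half of the pipeline (collect named references, then drop those that already have a declaration) can be read more easily when pulled out on its own. The following is only a sketch under assumptions: the function name `missing_entities` is hypothetical and not part of the original one-liner, and the already-defined check matches `<!ENTITY name` followed by whitespace instead of the original `grep " $name "`.

```shell
#!/bin/sh
# Hypothetical helper: print the names of entities that are referenced in an
# XML file but have no <!ENTITY ...> declaration in that same file.
missing_entities() {
    file=$1
    # Collect named references such as &eacute; (numeric ones like &#233; are
    # skipped), de-duplicate them, and strip the leading & and trailing ;.
    grep -oE '&[^#][^;]+;' "$file" | sort -u | sed -e 's/[;&]//g' |
    while read -r name; do
        # Keep the name only if the file contains no declaration for it.
        grep -qE "<!ENTITY[[:space:]]+$name[[:space:]]" "$file" || printf '%s\n' "$name"
    done
}
```

Running `missing_entities foo.xml` would list one undeclared entity name per line, which can then be fed to whatever step produces the declarations.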
<!ENTITY aacute '&#195;'>
<!ENTITY acute '&#194;'>
<!ENTITY aelig '&#195;'>
<!ENTITY agrave '&#195;'>
<!ENTITY alpha '&#206;'>
<!ENTITY amp '&#38;'>
<!ENTITY apos '&#38;'>
<!ENTITY auml '&#195;'>
<!ENTITY eacute '&#195;'>
<!ENTITY Eacute '&#195;'>
<!ENTITY ecirc '&#195;'>
<!ENTITY egrave '&#195;'>
<!ENTITY empty '&#38;'>
<!ENTITY gt '&#62;'>
<!ENTITY hArr '&#38;'>
<!ENTITY iota '&#206;'>
<!ENTITY iuml '&#195;'>
<!ENTITY kappa '&#206;'>
<!ENTITY lambda '&#206;'>
<!ENTITY Lambda '&#206;'>
<!ENTITY lang '&#38;'>
<!ENTITY larr '&#226;'>
<!ENTITY ldquo '&#38;'>
<!ENTITY lsquo '&#38;'>
<!ENTITY lt '&#60;'>
<!ENTITY mdash '&#38;'>