beniwohli/798546, created January 27, 2011 14:06
XSLT to convert http://www.w3.org/Math/characters/unicode.xml into a Python dictionary that maps Unicode characters to their LaTeX commands
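<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- replace() and tokenize() are XPath 2.0 functions, so this stylesheet
     needs an XSLT 2.0 processor such as Saxon. -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0">
  <!-- Plain-text output: no XML declaration is written, and characters
       such as & in the LaTeX strings are not turned into entities. -->
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/charlist">
    <xsl:text>
unicode_to_latex = {
</xsl:text>
    <xsl:for-each select="character">
      <xsl:variable name="codepoint" select="./@id"/>
      <xsl:if test="string-length(latex)>1">
        <xsl:text>    u"</xsl:text>
        <!-- ids are "U" plus hex digits (e.g. U1D400), with "-" between the
             codepoints of composed characters (e.g. U02242-00338). Emit one
             \U escape per codepoint, left-padded to 8 hex digits. -->
        <xsl:for-each select="tokenize(substring($codepoint, 2), '-')">
          <xsl:text>\U</xsl:text>
          <xsl:value-of select="concat(substring('00000000', string-length(.) + 1), .)"/>
        </xsl:for-each>
        <xsl:text>": "</xsl:text>
        <!-- Escape backslashes first, then double quotes. -->
        <xsl:value-of select="replace(replace(latex, '\\', '\\\\'), '&quot;', '\\&quot;')"/>
        <xsl:text>",
</xsl:text>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>}
</xsl:text>
  </xsl:template>
</xsl:stylesheet>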
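To check what a single emitted line looks like, the escaping can be mirrored in Python and the result fed back through ast.literal_eval. This is a minimal sketch, not part of the gist: codepoint_escape and python_entry are hypothetical helpers, and it assumes ids are a leading U followed by hex digits, with a hyphen between the codepoints of composed characters. It uses the eight-digit \U escape because a four-digit \u escape cannot express codepoints above U+FFFF.

```python
import ast


def codepoint_escape(char_id):
    """Turn an id such as 'U1D400' or 'U02242-00338' into Python \\U escapes,
    one per codepoint, each left-padded to 8 hex digits (hypothetical helper)."""
    hex_parts = char_id[1:].split("-")  # drop the leading "U"
    return "".join("\\U" + part.rjust(8, "0") for part in hex_parts)


def python_entry(char_id, latex):
    """Format one dictionary line: backslashes are doubled first, then
    double quotes are escaped, so the value is a valid string literal."""
    escaped = latex.replace("\\", "\\\\").replace('"', '\\"')
    return '    u"%s": "%s",' % (codepoint_escape(char_id), escaped)


line = python_entry("U1D400", "\\mathbf{A}")  # illustrative LaTeX value
print(line)  # prints:     u"\U0001D400": "\\mathbf{A}",

# Parse the emitted line back to confirm it round-trips.
key, value = ast.literal_eval("{" + line + "}").popitem()
print(key == "\U0001D400", value == "\\mathbf{A}")  # prints: True True
```

Doubling backslashes before escaping quotes matters: doing it in the other order would double the backslash just added in front of each quote.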
FYI, this gives incorrect Unicode characters for codepoints >= 0x10000, because substring($codepoint, 3) drops a significant hex digit along with the leading "U". For example:
should instead be:
It also fails for composed characters with more than one codepoint. I don't know enough XSLT to fix it, unfortunately. I may just have to use xml.etree.ElementTree to do this.
-Geoff
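Following the suggestion to use xml.etree.ElementTree, here is a minimal sketch of the same conversion in Python 3. It builds each key from the parsed codepoints, so characters above U+FFFF and composed characters need no special handling, and it writes the output with pprint, whose dict representation is already valid Python source. The function names are hypothetical. Like the stylesheet, it assumes a charlist root whose character children carry an id attribute and a latex element, and it assumes hyphen-joined ids for composed characters. It has not been run against the full unicode.xml file.

```python
import pprint
import xml.etree.ElementTree as ET


def build_mapping(root):
    """Map each character's Unicode string to its LaTeX form.

    Assumes <character id="U1D400"> children of <charlist>, with ids such as
    U02242-00338 for composed characters (hypothetical helper).
    """
    mapping = {}
    for char in root.findall("character"):
        latex = char.findtext("latex")
        # Same filter as the stylesheet: skip missing or 1-character LaTeX.
        if latex is None or len(latex) <= 1:
            continue
        # Drop the leading "U" and decode each hex codepoint directly.
        codepoints = char.get("id")[1:].split("-")
        key = "".join(chr(int(cp, 16)) for cp in codepoints)
        mapping[key] = latex
    return mapping


def load_mapping(path):
    """Parse a unicode.xml file from disk and build the mapping."""
    return build_mapping(ET.parse(path).getroot())


def to_python_source(mapping):
    # pformat() of a dict of str is a valid Python literal, so the
    # backslash and quote escaping done in the XSLT is not needed here.
    return "unicode_to_latex = " + pprint.pformat(mapping) + "\n"
```

For example, print(to_python_source(load_mapping("unicode.xml")), end="") would write the dictionary to standard output.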