@edouard-lopez
Last active December 22, 2015 05:49
Slow processing when using saxon9HE.jar (20 files generated in ~40 s, i.e. one every 2 s).
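scriptDir="$(dirname "$0")" # directory containing this script
. "$scriptDir"/envrc # project variables
inputFile="${2:-"$HPF_UNIHAN_READING_SHORT"}" # second argument, else the default reading list
# Each record is "unicode;hanzi;pinyin", e.g. "U+3400;㐀;qiū"
while IFS=';' read -r unicode hanzi pinyin;
do
# Strip the "U+" prefix so the file name ends with the bare hex code point
outputFile="$HPF_SVGTEXT_DIR/$hanzi-x${unicode#U+*}.svg"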
printf "creating: %s\n" "$outputFile"
xsltproc -o "$outputFile" \
--stringparam unicode "$unicode" \
--stringparam hanzi "$hanzi" \
--stringparam pinyin "$pinyin" \
"$HPF_XSLT_CSV2SVG" "$HPF_TPL_SVGTEXT"
done < "$inputFile"
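The loop above derives each output file name from one `unicode;hanzi;pinyin` record. As a minimal sketch of that step in isolation, the hypothetical helper `svg_name` below (not part of the original script) splits a record with plain parameter expansion and prints the file name without the `$HPF_SVGTEXT_DIR` prefix. It assumes the record always starts with a `U+` code point.

```shell
# Hypothetical helper, for illustration only: build the SVG file name
# for one "unicode;hanzi;pinyin" record.
# The ( ... ) body runs in a subshell, so the variables stay local.
svg_name() (
  record=$1
  unicode=${record%%;*}   # text before the first ';'  -> U+3400
  rest=${record#*;}       # text after the first ';'   -> hanzi;pinyin
  hanzi=${rest%%;*}       # second field               -> hanzi
  # ${unicode#U+} drops the literal "U+" prefix, leaving the hex digits.
  printf '%s-x%s.svg\n' "$hanzi" "${unicode#U+}"
)

svg_name 'U+3400;㐀;qiū'   # prints 㐀-x3400.svg
```

The original script uses the pattern `U+*`; because `#` removes the shortest matching prefix, the trailing `*` matches nothing and the result is the same as with `U+`.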
<?xml version="1.0" encoding="UTF-8"?>
<!--
@description
Place hanzi and pinyin in the correct element
@upstream: true
-->
<xsl:stylesheet version="2.0"
xmlns="http://www.w3.org/2000/svg"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#default xsl xs"
>
<!-- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
<!-- ++++++++++++++++++++++++++ CONSTANT +++++++++++++++++++++++++ -->
<!-- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
<xsl:variable name="emptyString" select="''" />
<!-- Values injected from the command line; the defaults are quoted so XPath reads them as string literals, not element names -->
<xsl:param name="hanzi" select="'NO-HANZI'" />
<xsl:param name="pinyin" select="'NO-PINYIN'" />
<!-- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
<!-- ++++++++++++++++++++++++++ TEMPLATE +++++++++++++++++++++++++ -->
<!-- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
<xsl:template match="/">
<xsl:message>WIP: <xsl:value-of select="$hanzi" />/<xsl:value-of select="$pinyin" />/</xsl:message>
<!-- font/glyph/@glyph-name -->
<!-- font/glyph/@unicode -->
<xsl:apply-templates />
</xsl:template>
<xsl:template match="*[@id='hanzi-glyph']/text()">
<xsl:value-of select="$hanzi" />
</xsl:template>
<xsl:template match="*[@id='pinyin-text']">
<xsl:value-of select="$pinyin" />
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
U+3400;㐀;qiū
U+3401;㐁;tiàn
U+3404;㐄;kuà
U+3405;㐅;wǔ
U+3406;㐆;yǐn
U+340C;㐌;yí
U+3416;㐖;xié
U+341C;㐜;chóu
U+3421;㐡;nuò
U+3424;㐤;dān
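Since the loop trusts every line to hold exactly three `;`-separated fields, a quick sanity check before launching the transformations may save a run. The sketch below is an assumption about what "well-formed" means here (a `U+` hex code point, then a non-empty character and reading); `check_records` is a hypothetical name, not part of the project.

```shell
# Hypothetical pre-flight check, for illustration only.
# Returns 0 when every line looks like "U+<HEX>;<hanzi>;<pinyin>",
# otherwise lists the offending lines and returns 1.
check_records() {
  awk -F';' '
    NF != 3 || $1 !~ /^U\+[0-9A-F]+$/ || $2 == "" || $3 == "" {
      printf "line %d is malformed: %s\n", NR, $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

It could be called as `check_records "$inputFile" || exit 1` just before the `while` loop.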
@edouard-lopez
Author

I ended up using xsltproc instead of Saxon:

xsltproc -o "$outputFile" \
  --stringparam unicode "$unicode" \
  --stringparam hanzi "$hanzi" \
  --stringparam pinyin "$pinyin" \
  "$HPF_XSLT_CSV2SVG" "$HPF_TPL_SVGTEXT"
