@patrickmj
Created March 5, 2015 19:37
Extract plain text from ALTO XML files
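<?xml version="1.0" encoding="UTF-8"?>
<!-- Plain-text extraction for ALTO v2 files. Only elements in the ALTO v2
     namespace are matched; a file that declares another ALTO namespace URI
     (e.g. ns-v3#) produces no text. -->
<xsl:stylesheet
    xmlns:a="http://www.loc.gov/standards/alto/ns-v2#"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="text"/>
  <!-- Drop whitespace-only text nodes so the source file's indentation
       does not leak into the output. -->
  <xsl:strip-space elements="*"/>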
  <!-- Start from the printed area of each page; the page margins
       (TopMargin, LeftMargin, etc.) are skipped. -->
  <xsl:template match="/">
    <xsl:apply-templates select="/a:alto/a:Layout/a:Page/a:PrintSpace"/>
  </xsl:template>
  <xsl:template match="a:PrintSpace">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Layout separators: ComposedBlock and TextLine each start with one
       newline and TextBlock starts with two, so consecutive text blocks
       are separated by blank lines. -->
  <xsl:template match="a:ComposedBlock">
    <xsl:text>&#xa;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="a:TextBlock">
    <xsl:text>&#xa;&#xa;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="a:TextLine">
    <xsl:text>&#xa;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
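  <!-- A word: its OCR text, followed by any substitution (for example the
       full form of a hyphenated word) in square brackets. Child elements
       of String are not processed. -->
  <xsl:template match="a:String">
    <xsl:value-of select="@CONTENT"/>
    <xsl:apply-templates select="@SUBS_CONTENT"/>
  </xsl:template>
  <!-- Inter-word space. -->
  <xsl:template match="a:SP">
    <xsl:text> </xsl:text>
  </xsl:template>
  <!-- Line-end hyphen, emitted as recorded in the file. -->
  <xsl:template match="a:HYP">
    <xsl:value-of select="@CONTENT"/>
  </xsl:template>
  <!-- Substitution text, written as " [ ... ] ". -->
  <xsl:template match="@SUBS_CONTENT">
    <xsl:text> [ </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text> ] </xsl:text>
  </xsl:template>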
</xsl:stylesheet>
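To check what the stylesheet emits without an XSLT processor, here is a minimal Python sketch that mirrors its templates one for one, using only the standard-library `xml.etree.ElementTree`. It is an illustrative port, not the author's code. The names `alto_to_text`, `_walk` and `SAMPLE` are hypothetical. It assumes schema-valid ALTO v2, where every word lives in the `CONTENT`/`SUBS_CONTENT` attributes, so text nodes are ignored here. The XSLT built-in rules would copy them.

```python
import xml.etree.ElementTree as ET

# ALTO v2 namespace: the same URI the stylesheet binds to the "a" prefix.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"
NS = {"a": ALTO_NS}


def _tag(name: str) -> str:
    """Return the Clark-notation tag ({namespace}name) for an ALTO v2 element."""
    return f"{{{ALTO_NS}}}{name}"


def _walk(elem: ET.Element, out: list[str]) -> None:
    """Append the text for one element, mirroring the stylesheet's templates.

    Assumption: text nodes are skipped, since schema-valid ALTO keeps the
    words in attributes.
    """
    if elem.tag == _tag("ComposedBlock"):
        out.append("\n")
    elif elem.tag == _tag("TextBlock"):
        out.append("\n\n")
    elif elem.tag == _tag("TextLine"):
        out.append("\n")
    elif elem.tag == _tag("String"):
        out.append(elem.get("CONTENT", ""))
        subs = elem.get("SUBS_CONTENT")
        if subs is not None:
            out.append(f" [ {subs} ] ")
        return  # like the XSLT, do not descend into String's children
    elif elem.tag == _tag("SP"):
        out.append(" ")
        return
    elif elem.tag == _tag("HYP"):
        out.append(elem.get("CONTENT", ""))
        return
    # Container elements (PrintSpace, ComposedBlock, TextBlock, TextLine and
    # any others) then process their children in document order.
    for child in elem:
        _walk(child, out)


def alto_to_text(xml_source: str) -> str:
    """Extract plain text from an ALTO v2 document given as a string."""
    root = ET.fromstring(xml_source)
    if root.tag != _tag("alto"):
        return ""  # other namespaces (e.g. ALTO v3) match nothing, as in the XSLT
    out: list[str] = []
    for print_space in root.findall("a:Layout/a:Page/a:PrintSpace", NS):
        _walk(print_space, out)
    return "".join(out)


# Hypothetical sample: one text block with a word hyphenated across two lines.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace>
    <TextBlock>
      <TextLine>
        <String CONTENT="Hello"/><SP/>
        <String CONTENT="wor" SUBS_TYPE="HypPart1" SUBS_CONTENT="world"/>
        <HYP CONTENT="-"/>
      </TextLine>
      <TextLine>
        <String CONTENT="ld" SUBS_TYPE="HypPart2" SUBS_CONTENT="world"/>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(repr(alto_to_text(SAMPLE)))
```

On the sample, the hyphenated word comes out as `wor [ world ] -` on one line and `ld [ world ] ` on the next. This is the same trade-off the stylesheet makes. It keeps both OCR halves and shows the `SUBS_CONTENT` reconstruction beside each one, rather than joining them into a single word.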