@kanemu
Created July 1, 2011 03:57
[groovy] Using nekohtml with jdom
@Grab(group='nekohtml', module='nekohtml', version='1.9.6.2')
@Grab(group='org.jdom', module='jdom', version='1.1')
import org.cyberneko.html.HTMLConfiguration
import org.apache.xerces.parsers.DOMParser
import org.xml.sax.InputSource
import org.jdom.input.DOMBuilder
import org.jdom.Document
import org.jdom.Element
import org.jdom.filter.ElementFilter
File htmlFile = new File('/path/to/form.html')
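//prepare the xerces parser with the nekohtml configuration
DOMParser parser = new DOMParser(new HTMLConfiguration())
//report element names in lower case so that the 'form' filter below matches
parser.setProperty("http://cyberneko.org/html/properties/names/elems","lower")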
//feed the HTML file to xerces
InputSource source = new InputSource(htmlFile.newReader('UTF-8'))
parser.parse(source)
//build a jdom Document from the parsed DOM
DOMBuilder builder = new DOMBuilder()
Document doc = builder.build(parser.getDocument())
//pick out the form elements and iterate over them
Element root = doc.rootElement
ElementFilter formFilter = new ElementFilter('form')
root.getDescendants(formFilter).each{form->
    println form
}
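//A possible next step, sketched here and not part of the original gist:
//print each form's attributes and the names of its input fields.
//Assumption: the HTML uses 'action', 'method' and 'name' attributes;
//getAttributeValue returns null when an attribute is absent.
ElementFilter inputFilter = new ElementFilter('input')
root.getDescendants(formFilter).each{form->
    println "action=${form.getAttributeValue('action')} method=${form.getAttributeValue('method')}"
    form.getDescendants(inputFilter).each{input->
        println "  input: ${input.getAttributeValue('name')}"
    }
}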