@osima
Created October 11, 2010 23:43
Get the title of a web page
@Grab(group='nekohtml', module='nekohtml', version='1.9.6')
import org.cyberneko.html.parsers.SAXParser
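// Parse the HTML with NekoHTML and return the trimmed text of the first
// TITLE element, or null if the page has none.
// NekoHTML upper-cases element names by default, hence 'TITLE'.
def title = { reader ->
    def node = new XmlSlurper(new SAXParser()).parse(reader).'**'.find{ it.name() == 'TITLE' }
    node?.text()?.trim()
}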
// Scan the raw HTML line by line for a <meta ... charset=...> declaration
// and return the declared charset name, or null if none is found.
def encode = { URL url->
    def charset = null
    def r = new BufferedReader(new InputStreamReader(url.openStream()) )
    try {
        def line
        while( (line = r.readLine()) != null ){
            // Case-insensitive; capture only the charset name so that a
            // trailing quote or later attributes are not included.
            def m = line =~ /(?i)<meta[^>]*charset\s*=\s*["']?([\w.:-]+)/
            if( m.find() ){
                charset = m.group(1)
                break
            }
        }
    } finally {
        r.close()
    }
    charset
}
def url = new URL(args[0])
// Fall back to UTF-8 when the page declares no charset.
def enc = encode(url) ?: 'UTF-8'
def r = new BufferedReader(new InputStreamReader(url.openStream(),enc) )
try {
    println title( r )
} finally {
    r.close()
}