@mpfund
Created April 14, 2015 05:31
Parse HTML and extract the href of every link
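```haskell
import Network.HTTP
import Text.HTML.TagSoup

-- Fetch a page body. Network.HTTP speaks plain HTTP only (no TLS),
-- and simpleHTTP does not follow redirects.
openURL :: String -> IO String
openURL x = getResponseBody =<< simpleHTTP (getRequest x)

-- Collect the attribute list of every opening <a> tag.
extractLinks :: [Tag String] -> [[(String, String)]]
extractLinks (TagOpen "a" attrs : xs) = attrs : extractLinks xs
extractLinks (_ : xs)                 = extractLinks xs
extractLinks []                       = []

-- Keep the value of every "href" attribute.
extractHref :: [(String, String)] -> [String]
extractHref (("href", k) : xs) = k : extractHref xs
extractHref (_ : xs)           = extractHref xs
extractHref []                 = []

main :: IO ()
main = do
  src <- openURL "http://www.heise.de"
  let tags  = parseTags src
      attrs = concat (extractLinks tags)
      hrefs = extractHref attrs
  -- show writes the result as a Haskell list literal.
  writeFile "hrefs.txt" (show hrefs)
```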
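The two recursive helpers can also be collapsed into a single list comprehension over TagSoup's `Tag` values. This is a sketch of an alternative, not the gist's original code: `hrefsOf` and `lower` are hypothetical helper names, it parses an in-memory sample string (made-up markup) so it runs without network access, and it lowercases tag and attribute names so `<A HREF=...>` is matched too, which is an assumption about the desired behavior.

```haskell
import Data.Char (toLower)
import Text.HTML.TagSoup (Tag (..), parseTags)

-- Hypothetical helper: lowercase a tag or attribute name so matching
-- is case-insensitive.
lower :: String -> String
lower = map toLower

-- Hypothetical helper: every href value on an opening <a> tag, in
-- document order. Equivalent to extractHref (concat (extractLinks tags))
-- for lowercase markup, but also matches <A HREF=...>.
hrefsOf :: String -> [String]
hrefsOf html =
  [ v
  | TagOpen name attrs <- parseTags html
  , lower name == "a"
  , (k, v) <- attrs
  , lower k == "href"
  ]

main :: IO ()
main = do
  -- Made-up sample markup standing in for a fetched page body.
  let sample = "<p><a href=\"/news\">News</a> <A HREF=\"https://example.org/\">x</A> <a name=\"top\">t</a></p>"
  -- prints "/news" and then "https://example.org/", one per line
  mapM_ putStrLn (hrefsOf sample)
```

In a list comprehension, an element that fails the pattern `TagOpen name attrs` is simply skipped, so the catch-all equations of the recursive version are not needed.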