@juanbono
Forked from snoyberg/html-cleanup.hs
Created December 27, 2017 07:48
Small example of xml-conduit for cleaning up some HTML: remove unneeded <span>s and convert <br>s to \n
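#!/usr/bin/env stack
-- stack --resolver lts-10.0 script
{-# LANGUAGE OverloadedStrings #-}
import Text.XML
import qualified Data.Map.Strict as Map

-- Read foo.html, rewrite the root element's children, and write foo2.html.
-- Text.XML is an XML parser, so the input must be well-formed XML.
main :: IO ()
main = do
  Document x (Element n a nodes) y <- Text.XML.readFile def "foo.html"
  Text.XML.writeFile def "foo2.html" $ Document x (Element n a $ concatMap goN nodes) y

-- Rewrite one node into zero or more replacement nodes.
goN :: Node -> [Node]
-- An empty <br> becomes a newline.
goN (NodeElement (Element "br" _ [])) = [NodeContent "\n"]
-- A <span> whose only attribute is this exact style is replaced by its cleaned children.
goN (NodeElement (Element "span" attrs inner))
  | attrs == Map.singleton "style" "font-weight: 400;" = concatMap goN inner
-- Any other element is kept, with its children cleaned recursively.
goN (NodeElement (Element name attrs inner)) = [NodeElement $ Element name attrs $ concatMap goN inner]
-- Non-element nodes (content, comments, instructions) pass through unchanged.
goN n = [n]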
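
To see what the rules do without touching the filesystem, here is a minimal sketch that applies them to a small in-memory tree. The names `cleanNode`, `sample`, and `expected` are made up for this illustration and are not part of the gist; `cleanNode` repeats the `goN` clauses so the block stands alone. It assumes the same xml-conduit package as the script above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.XML
import qualified Data.Map.Strict as Map

-- Same rewriting rules as goN, repeated under a hypothetical name.
cleanNode :: Node -> [Node]
cleanNode (NodeElement (Element "br" _ [])) = [NodeContent "\n"]
cleanNode (NodeElement (Element "span" attrs inner))
  | attrs == Map.singleton "style" "font-weight: 400;" = concatMap cleanNode inner
cleanNode (NodeElement (Element name attrs inner)) =
  [NodeElement $ Element name attrs $ concatMap cleanNode inner]
cleanNode n = [n]

-- Illustrative input: <p><span style="font-weight: 400;">one<br/>two</span></p>
sample :: Node
sample =
  NodeElement $ Element "p" Map.empty
    [ NodeElement $ Element "span" (Map.singleton "style" "font-weight: 400;")
        [ NodeContent "one"
        , NodeElement (Element "br" Map.empty [])
        , NodeContent "two"
        ]
    ]

-- The span is unwrapped and the <br> becomes a newline content node.
-- Adjacent content nodes are left as separate nodes, not merged.
expected :: Node
expected =
  NodeElement $ Element "p" Map.empty
    [NodeContent "one", NodeContent "\n", NodeContent "two"]

main :: IO ()
main = print (cleanNode sample == [expected])
```

Note that the `<span>` rule compares the whole attribute map, so a span with any other or additional attribute fails the guard, falls through to the generic element clause, and is kept.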