Skip to content

Instantly share code, notes, and snippets.

@ashfurrow
Created January 14, 2014 02:19
Show Gist options
  • Save ashfurrow/8411942 to your computer and use it in GitHub Desktop.
Save ashfurrow/8411942 to your computer and use it in GitHub Desktop.
Docbook parser I wrote in my universities days.
import Text.XML.Light.Input
import Text.XML.Light.Types
{-
Dear Markers:
Some known issues:
1. if an ordered list item contains a sublist, that list item doesn't get numbered
2. given an itemized list, the asterisks align with the start of the letters, not two spaces before them
3. there's an extra newline inserted at the end of the file
Those are all the issues I can think of and they are illustrated in the test cases, which are included at the end of this file. After 10 hours of working on this assignment, I'm calling it "done enough".
Cheers,
Ash
-}
--sets up the rest of the program
formatFile name = do
str <- readFile name
let xml = parseXML str
putStrLn $ printElements "" 0 xml
main = formatFile "xml"
--prefix is a list marker like "* " or "2 "
--level is the indentation level
--this method loops through a Content list and calls printElement on each item, if it's not empty.
printElements :: String -> Int -> [Content] -> String
printElements prefix level (n:xs) | isEmpty (printElement prefix level n) = printElements prefix level xs
| otherwise = printElement prefix level n ++ printElements "" level xs
printElements _ _ [] = ""
--ditto on this for prefix and level, they cannonical throughout my program
--sorts into different cases depending on the tag's qualified name and calls appropriate functions to display the tag's contents
printElement :: String -> Int -> Content -> String
printElement prefix level (Elem (Element (QName "para" qURI qPrefix) elAttribs ns elLine)) = printElements prefix level ns ++ "\n"
printElement prefix level (Elem (Element (QName "itemizedlist" qURI qPrefix) elAttribs ns elLine)) = printItemizedListElements prefix (level +1) ns
printElement prefix level (Elem (Element (QName "orderedlist" qURI qPrefix) elAttribs ns elLine)) = printOrderedListElements prefix (level +1) 1 ns
printElement prefix level (Elem (Element (QName "listitem" qURI qPrefix) elAttribs ns elLine)) = printElements prefix level ns -- ++ "\n"
printElement prefix level (Text (CData cdVerbatim (n) cdLine)) | isEmpty n = ""
| otherwise = gimmeSomeTabs level ++ prefix ++ removeLeadingTabs n ++ "\n"
printElement _ _ _ = "Something went wrong"
printItemizedListElements :: String -> Int -> [Content] -> String
printItemizedListElements prefix level (n:xs) = printItemizedListElement prefix level n ++ printItemizedListElements prefix level xs
printItemizedListElements _ _ [] = "" --don't know why it's sometimes called with an empty [Content], but whatever
printItemizedListElement :: String -> Int -> Content -> String
printItemizedListElement prefix level (Elem (Element (QName "listitem" qURI qPrefix) elAttribs ns elLine)) = printElements "* " level ns -- ++ "\n"
printItemizedListElement prefix level (Text (CData cdVerbatim (n) cdLine)) = ""
printOrderedListElements :: String -> Int -> Int -> [Content] -> String
printOrderedListElements prefix level k (n:xs) | isOrderedListElementEmpty n = (printOrderedListElements "" level k xs)
| otherwise = (printOrderedListElement prefix level k n) ++ (printOrderedListElements prefix level (k+1) xs)
printOrderedListElements _ _ k [] = "" --don't know why it's sometimes called with an empty [Content], but whatever
printOrderedListElement :: String -> Int -> Int -> Content -> String
printOrderedListElement prefix level k (Elem (Element (QName "listitem" qURI qPrefix) elAttribs ns elLine)) = printElements (show k ++ " ") level ns -- ++ "\n"
printOrderedListElement _ _ k (Text (CData cdVerbatim (n) cdLine)) = ""
--returns if a Content item has empty text
isOrderedListElementEmpty :: Content -> Bool
isOrderedListElementEmpty (Elem (Element (QName "listitem" qURI qPrefix) elAttribs ns elLine)) = isEmpty (printElements "" 0 ns)
isOrderedListElementEmpty (Text (CData cdVerbatim (n) cdLine)) = isEmpty n
--returns if a String is empty (contains only newlines, tabs, or spaces)
isEmpty :: String -> Bool
isEmpty (n:xs) | n == '\n' = isEmpty xs
| n == '\t' = isEmpty xs
| n == ' ' = isEmpty xs
| otherwise = False
isEmpty [] = True
--removes leading tabs that are present in non-flat XML files
removeLeadingTabs :: String -> String
removeLeadingTabs (n:xs) | n == '\t' = removeLeadingTabs xs
| n == '\n' = (removeLeadingTabs xs)
| n == ' ' = removeLeadingTabs xs
| otherwise = (n:xs)
removeLeadingTabs a = a
--for indenting to a specified level, we pass an integer in and it gives us that many tabs in a String
gimmeSomeTabs :: Int -> String
gimmeSomeTabs 0 = ""
gimmeSomeTabs n = "\t" ++ gimmeSomeTabs (n-1)
{-
Your Test Cases:
*Main> formatFile "xml0"
The component suffered from three failings:
* It was slow
* It ran hot
* It didn't actually work
Of these three, the last was probably the most important.
*Main> formatFile "xml1"
Level 0
* Level 1
*Main> formatFile "xml2"
Level 0
* Level 1
* Level 2
*Main> formatFile "xml3"
Level 0
* Level 1
* Level 2
Level 2
* Level 1
*Main> formatFile "xml4"
Level 0
* Level 1
1 Level 2
*Main> formatFile "xml5"
Level 0
1 Level 1
* Level 2
Level 2
* Even more Level 2
3 Level 1
Level 0
*Main> formatFile "xml6"
The component suffered from three failings:
* It was slow
1 It made molasses look fast.
2 I mean, so slow, we hired geologists to watch it.
3 Fundamentally, this thing was not fast.
* It ran hot.
* It didn't actually work.
Of these three, the last was probably the most important.
*Main>
*Main> formatFile "myxml0"
Ash
*Main> formatFile "myxml1"
Ash
*Main> formatFile "myxml2"
* it seems superfluous to have all these test cases
*Main> formatFile "myxml3"
Ash
is my name
*Main>
BadWolf:~ ash$ cat myxml*
<para><para>Ash</para></para>
<para>Ash</para>
<para>
<itemizedlist>
<listitem><para>it seems superfluous to have all these test cases</para></listitem>
</itemizedlist>
</para>
<para>Ash</para>
<para>is my name</para>
BadWolf:~ ash$
-}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment