Created
October 13, 2012 21:44
-
-
Save kirelagin/3886243 to your computer and use it in GitHub Desktop.
Very simple implementation of Huffman coding in Haskell
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
> module Huffman where | |
> import Control.Arrow | |
> import Data.List | |
> import qualified Data.Map as M | |
> import Data.Function | |
This typeclass is supposed to make life _a bit_ easier. | |
> class Eq a => Bits a where | |
> zer :: a | |
> one :: a | |
> | |
> instance Bits Int where | |
> zer = 0 | |
> one = 1 | |
> | |
> instance Bits Bool where | |
> zer = False | |
> one = True | |
Codemap is generated from a Huffman tree. It is used for fast encoding. | |
> type Codemap a = M.Map Char [a] | |
Huffman tree is a simple binary tree. Each leaf contains a Char and its weight. | |
Fork (node with children) also has weight = sum of weights of its children. | |
> data HTree = Leaf Char Int | |
> | Fork HTree HTree Int | |
> deriving (Show) | |
> | |
> weight :: HTree -> Int | |
> weight (Leaf _ w) = w | |
> weight (Fork _ _ w) = w | |
The only useful operation on Huffman trees is merging, that is we take | |
two trees and make them children of a new Fork-node. | |
> merge t1 t2 = Fork t1 t2 (weight t1 + weight t2) | |
`freqList` is an utility function. It takes a string and produces a list | |
of pairs (character, number of occurences of this character in the string). | |
> freqList :: String -> [(Char, Int)] | |
> freqList = M.toList . M.fromListWith (+) . map (flip (,) 1) | |
`buildTree` builds a Huffman tree from a list of character frequencies | |
(obtained, for example, from `freqList` or elsewhere). | |
It sorts the list in ascending order by frequency, turns each (char, freq) pair | |
into a one-leaf tree and keeps merging two trees with the smallest frequencies | |
until only one tree is remaining. | |
> buildTree :: [(Char, Int)] -> HTree | |
> buildTree = bld . map (uncurry Leaf) . sortBy (compare `on` snd) | |
> where bld (t:[]) = t | |
> bld (a:b:cs) = bld $ insertBy (compare `on` weight) (merge a b) cs | |
The next function traverses a Huffman tree to obtain a list of codes for | |
all characters and converts this list into a `Map`. | |
> buildCodemap :: Bits a => HTree -> Codemap a | |
> buildCodemap = M.fromList . buildCodelist | |
> where buildCodelist (Leaf c w) = [(c, [])] | |
> buildCodelist (Fork l r w) = map (addBit zer) (buildCodelist l) ++ map (addBit one) (buildCodelist r) | |
> where addBit b = second (b :) | |
Simple functions to get a Huffman tree or a `Codemap` from a `String`. | |
> stringTree :: String -> HTree | |
> stringTree = buildTree . freqList | |
> | |
> stringCodemap :: Bits a => String -> Codemap a | |
> stringCodemap = buildCodemap . stringTree | |
Time to do the real encoding and decoding! | |
Encoding function just represents each character of a string by corresponding | |
sequence of `Bit`s. | |
> encode :: Bits a => Codemap a -> String -> [a] | |
> encode m = concat . map (m M.!) | |
> | |
> encode' :: Bits a => HTree -> String -> [a] | |
> encode' t = encode $ buildCodemap t | |
Decoding is a little trickier. We have to traverse the tree until | |
we reach a leaf which means we've just finished reading a sequence | |
of `Bit`s corresponding to a single character. | |
We keep doing this to process the whole list of `Bit`s. | |
> decode :: Bits a => HTree -> [a] -> String | |
> decode tree = dcd tree | |
> where dcd (Leaf c _) [] = [c] | |
> dcd (Leaf c _) bs = c : dcd tree bs | |
> dcd (Fork l r _) (b:bs) = dcd (if b == zer then l else r) bs |
How do you use encode and decode?
In case someone will stumble accross this nice gist, you can use ghci (interactive prompt) to work with the encode
and decode
functions:
*> let greeting = "Hello World"
*> let tree = stringTree greeting
*> (encode' tree greeting) :: [Int] -- or you can use the Bool-type
[1,1,0,1,0,0,0,1,0,1,0,1,1,1,1,1,0,0,0,1,0,1,1,1,0,0,1,1,0,0,1,1]
If you want to compile the file, you have to add a main-function and rename the module to "Main", e.g.
> module Main where
.....
> main :: IO ()
> main = putStrLn . concatMap show $ ((encode' tree greeting) :: [Int])
> where greeting = "Hello World"
> tree = stringTree greeting
and compile the file into an executable ghc <filename>
, so you can execute it afterwards.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
How does decode work?