Skip to content

Instantly share code, notes, and snippets.

@adusak
Last active August 29, 2015 14:18
Show Gist options
  • Select an option

  • Save adusak/d3bd46bb48ef476a912c to your computer and use it in GitHub Desktop.

Select an option

Save adusak/d3bd46bb48ef476a912c to your computer and use it in GitHub Desktop.
du utility implemented in haskell
{- | This is the third assignment for IB016, semester spring 2015.
Name: Adam Melkus
UID: 374010
== Implementing du
In this assignment you will be implementing simplified version of Unix
utility @du@. This utility can be used to detect filesystem usage by
files and directories. Your task is for the first time to implement whole
compilable module, including commandline handling.
=== du usage / commandline options
Your program usage should be:
@
./du [options] [files-or-directories]
@
You should implement following options:
@
-a, --all write counts for all files, not just directories
-h, --human-readable print sizes in human readable format (e.g., 1KiB 234MiB 2GiB)
--si like -h, but use powers of 1000 not 1024 (e.g., 1KB, 245MB, 2.1GB)
-s, --summarize display only a total for each argument
-c, --total produce a grand total
--help display help and exit
@
You don't need to handle invalid options, you can ignore them.
Optionally, as a bonus, you can also implement:
@
-d, --max-depth print the total for a directory (or file, with --all)
only if it is N or fewer levels below the command line
argument; --max-depth=0 is the same as --summarize
@
In this case, you can assume that option and its value is not separated by space
— for short version number follows immediately, while in long option it
is separated by @'='@: @-d0 -d10 --max-depth=0 --max-depth=10@.
=== du behaviour
When @du@ is run without any options, it prints sizes of all its commandline
arguments: for files size is printed directly, for directories their
size is summarized recursively. Files inside directories are not printed
by default. If any options are given, they must precede all files and
directories. If no files or directories are given, @du@ should work with
current working directory (./).
File size can be obtained by the function 'hFileSize' from @System.IO@
(alternatively by 'fileSize' from @System.Posix.Files@, but this is not
multiplatform). Without --human-readable or -h or --si sizes are printed
in kilobytes without unit (1 KiB = 1024 B).
With --human-readable or -h sizes are printed with appropriate
unit (using binary prefixes <https://en.wikipedia.org/wiki/Binary_prefix>)
such that the value is between 1 and 1023. If --si is given, sizes
are handled similarly but using 1000-based SI prefixes (and value
should be between 1 and 999). Sizes should be integral, as a bonus
you can add one decimal place if value is less then 10, as done by
@du@ on Linux.
All the other commandline options should be handled according to usage given
above. If --help is given a short help summarizing options and usage should
be given and all other options should be ignored.
Combination of @-s@ and @-a@ is not valid, and need not be handled, the same
holds for --max-depth=0 and --all.
For recursive traversal you will probably need functions from @System.Directory@
module. Feel free to use any other modules in standard Haskell distribution
you find suitable (that means you can use any packages from Hackage working with
GHC-4.8 and GHC-4.6).
==== Notes
* in basic execution (without @-s@) subdirectories are printed.
* in case of error (such as permission error or directory vanishing before
it can be explored) program should not stop but print an error message
(on 'stderr'), you should handle only 'IOException' and you can use
that it is an instance of 'Show'
* you can use module @Text.Printf@ for formating
* you can ignore anything for which is neither file, nor directory (such
as devices, symlinks, pipes,…)
* you should not ignore hidden files (on Unix beginning with '.')
* on Linux @du@ is calculating file sizes based on disk allocation,
sizes reported by 'hFileSize' can differ
==== Examples
Order of files and directories on same level in hierarchy is not relevant and can
differ on your system. Also the output in case of error need not match literally.
> $ ./du --help
> usage: du [options] [files]
> -a, --all write counts for all files, not just directories
> -h, --human-readable print sizes in human readable format (e.g., 1K 234M 2G)
> --si like -h, but use powers of 1000 not 1024
> -s, --summarize display only a total for each argument
> -d, --max-depth print the total for a directory (or file, with --all) only if it is N or fewer levels below the command line argument; --max-depth=0 is the same as --summarize
> -c, --total produce a grand total
> --help display this help and exit
>
> $ mkdir test; cd test
> $ mkdir -p first/second third
> $ dd if=/dev/zero of=a bs=1024 count=100 &> /dev/null
> $ dd if=/dev/zero of=first/b bs=1024 count=200 &> /dev/null
> $ dd if=/dev/zero of=first/c bs=1024 count=300 &> /dev/null
> $ dd if=/dev/zero of=first/second/d bs=1024 count=1024 &> /dev/null
>
> $ ../du
> 0 ./third
> 1024 ./first/second
> 1524 ./first
> 1624 .
>
> $ ../du first third
> 1024 first/second
> 1524 first
> 0 third
>
> $ ../du -c first third
> 1024 first/second
> 1524 first
> 0 third
> 1524 total
>
> $ ../du first
> 1024 first/second
> 1524 first
>
> $ ../du -s first
> 1524 first
>
> $ ../du --summarize first
> 1524 first
>
> $ ../du -h first
> 1.0 MiB first/second
> 1.4 MiB first
>
> $ ../du --si first
> 1.1 MB first/second
> 1.5 MB first
>
> $ ./du -h -s -c first a
> 1.4 MiB first
> 100 KiB a
> 1.5 MiB total
>
> $ ./du -a -h first
> 200 KiB first/b
> 300 KiB first/c
> 1.0 MiB first/second/d
> 1.0 MiB first/second
> 1.4 MiB first
>
> $ mkdir fourth && chmod -r fourth
> $ ../du fourth first
> error: fourth: getDirectoryContents: permission denied (Permission denied)
> 1024 first/second
> 1524 first/
>
> $ ./du fifth first
> error: fifth: openFile: does not exist (No such file or directory)
> 1024 first/second
> 1524 first
>
> $ ../du -d1
> error: ./fourth: getDirectoryContents: permission denied (Permission denied)
> 0 ./third
> 1524 ./first
> 1624 .
>
> $ ../du --max-depth=1 --human-readable
> error: ./fourth: getDirectoryContents: permission denied (Permission denied)
> 0.0 B ./third
> 1.4 MiB ./first
> 1.5 MiB
-}
module Main ( main ) where
import Control.Exception
import Control.Monad (foldM, forM_, liftM, when)
import Data.List
import Data.Prefix.Units
import Numeric
import System.Console.GetOpt
import System.Directory
import System.Environment
import System.FilePath
import System.IO
import Text.Printf
------------------------------------- Option handling -------------------------------------------
data ObjectType = Directory | File | Error deriving (Eq, Show)
data Options = Options
{ optHelp :: Bool
, optAll :: Bool
, optReadable :: Bool
, optReadableSi :: Bool
, optSumarize :: Bool
, optTotal :: Bool
} deriving Show
defaultOptions = Options
{ optHelp = False
, optAll = False
, optReadable = False
, optReadableSi = False
, optSumarize = False
, optTotal = False
}
options :: [OptDescr (Options -> Options)]
options =
[ Option "a" ["all"]
(NoArg (\ opts -> opts { optAll = True }))
"write counts for all files, not just directories"
, Option "h" ["human-readable"]
(NoArg (\ opts -> opts { optReadable = True }))
"print sizes in human readable format (e.g., 1KiB 234MiB 2GiB)"
, Option "" ["si"]
(NoArg (\ opts -> opts { optReadableSi = True }))
"like -h, but use powers of 1000 not 1024 (e.g., 1KB, 245MB, 2.1GB)"
, Option "s" ["summarize"]
(NoArg (\ opts -> opts { optSumarize = True }))
"display only a total for each argument"
, Option "c" ["total"]
(NoArg (\ opts -> opts { optTotal = True }))
"produce a grand total"
, Option "" ["help"]
(NoArg (\opts -> opts { optHelp = True }))
"display help and exit"
]
header = "usage: du [options] [files]\n"
compilerOpts :: [String] -> IO (Options, [String])
compilerOpts argv =
case getOpt RequireOrder options argv of
(o, n, [] ) -> return (foldl (flip id) defaultOptions o, n)
(_, _, errs) -> ioError (userError (concat errs ++ usageInfo header options))
------------------------------------- Main ----------------------------------------------
main :: IO ()
main = do
argv <- getArgs
(opts, args) <- compilerOpts argv
let folders = if null args then ["."]
else reverse args
if optHelp opts then putStr (usageInfo header options)
else do result <- buildTree folders
if optSumarize opts then printSumarization opts folders result
else printNormal opts folders result
when (optTotal opts) $ printTotal opts folders result
printSumarization :: Options -> [String] -> [(String, Integer, ObjectType)] -> IO ()
printSumarization opts folders result = forM_ (filterByNames folders result) $ \(name, size, kind) ->
if kind == Error
then putStrLn name
else putStrLn (formatSize opts size ++ "\t\t" ++ name)
printNormal :: Options -> [String] -> [(String, Integer, ObjectType)] -> IO ()
printNormal opts folders result = forM_ result $ \(name, size, kind) ->
if kind == Error
then putStrLn name
else when (optAll opts || kind == Directory || elem name folders) $
putStrLn (formatSize opts size ++ "\t\t" ++ name)
printTotal :: Options -> [String] -> [(String, Integer, ObjectType)] -> IO ()
printTotal opts folders result = putStrLn (formatSize opts (total folders folders result) ++ "\t\tTotal")
formatSize :: Options -> Integer -> String
formatSize opts size
| not (optReadable opts || optReadableSi opts) = show (size `div` 1024)
| optReadable opts = showValue (Left FormatBinary) size ++"B"
| optReadableSi opts = showValue (Left FormatSiKMGT) size ++"B"
{-- | Calculates and formats file siz to human readable form
formatSize :: Options -> Integer -> String
formatSize opts size
| not (optReadable opts || optReadableSi opts) = show (size `div` 1024)
| optReadable opts = humanReadableByteCount False size
| optReadableSi opts = humanReadableByteCount True size
-- | Calculates and formats file siz to human readable form
humanReadableByteCount :: Bool -> Integer -> String
humanReadableByteCount si size
| size < unit = show size ++ " B"
| otherwise = stripChars "\"'" (round' 1 formula ++ " " ++ pre ++ "B")
where
unit = if si then 1000 else 1024
exp = round (logBase (fI unit) (fI size))
chars = if si then "kMGTPE" else "KMGTPE"
theI = if si then "" else "i"
pre = show (chars !! (exp - 1)) ++ theI
formula = fI size / (fI unit ^ exp)
-- | Rounds a float to specified number of decimal places
round' :: RealFloat a => Int -> a -> String
round' numDecimals num = showGFloat (Just numDecimals) num ""
-- | Removes specified characters from string
stripChars :: String -> String -> String
stripChars = filter . flip notElem
-- | Shortcut for fromIntegral
fI :: (Num a) => Integer -> a
fI = fromIntegral
-}
-- | Calculates total size of specified files/directories
total :: [String] -> [String] -> [(String, Integer, ObjectType)] -> Integer
total _ _ [] = 0
total [] _ _ = 0
total (a:as) args list = case findByName a list of
Nothing -> 0 + total as args list
(Just (n, s, _)) -> if not (isInFolder n args) then s + total as args list
else 0 + total as args list
-- | Gets a tupples from result set based on the name
findByName :: String -> [(String, Integer, ObjectType)] -> Maybe (String, Integer, ObjectType)
findByName _ [] = Nothing
findByName name ((x,y,z):xs) = if name == x then Just (x,y,z)
else findByName name xs
-- | Filters result set with specified files/directories
filterByNames :: [String] -> [(String, Integer, ObjectType)] -> [(String, Integer, ObjectType)]
filterByNames [] _ = []
filterByNames (n:ns) list = case findByName n list of
Nothing -> [] ++ filterByNames ns list
Just something -> something : filterByNames ns list
-- | Checks if file is contained in specified folder
isInFolder :: String -> [String] -> Bool
isInFolder _ [] = False
isInFolder object (f:fs) = (isPrefixOf f object && f /= object) || isInFolder object fs
-- | Recursively goes through the directory structure, calucating file and directory sizes
-- Returns an IO list of tupples in the following format (File/Directory name, size, indicator wheather it is a directory or file)
buildTree :: [String] -> IO [(String, Integer, ObjectType)]
buildTree = foldM f []
where
f acc object = handle handler $ do
isDir <- doesDirectoryExist object
if isDir
then do
entries <- liftM (filter isNormalFolder) (getDirectoryContents object)
moreRes <- buildTree (map (object </>) entries)
let folderSize = countSize object moreRes
return (moreRes ++ [(object, folderSize, Directory)] ++ acc )
else do
fileHandle <- openFile object ReadMode
intSize <- hFileSize fileHandle
hClose fileHandle
return (acc ++ [(object, intSize, File)])
handler :: SomeException -> IO [(String, Integer, ObjectType)]
handler e = return [(show e, 0, Error)]
-- | Test to check if the object is not . or ..
isNormalFolder :: String -> Bool
isNormalFolder f = f /= "." && f /= ".."
-- | Calcualtes directory size from a list
countSize :: String -> [(String, Integer, ObjectType)] -> Integer
countSize _ [] = 0
countSize root ((x,y,_):xs)
| root == takeDirectory x = y + countSize root xs
| otherwise = 0 + countSize root xs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment