In slack today, I said
i’ve never been a fan of using integers to indicate ordering. there are billions of values in an int32 that all map onto the same meaning (greater than) or (less than). so much possibility for bugs
i like the way Haskell does it — use an enum (LT, EQ, GT) — that way there’s no confusion and the compiler can ensure you’ve handled all possibilities.
I thought I should explain this.
I reworked the ruby sort code to look like this:
def sort_file_entries
file_entries.sort do |a, b|
if (a.container? && b.container?) || (!a.container? && !b.container?)
a.name.downcase <=> b.name.downcase
elsif a.container? && !b.container?
-1
else
1
end
end
end
Ruby's array sort function takes an optional block that specifies how to sort a pair of elements. This function returns -1 if a < b, 0 if a == b, or 1 if a > b. This is a common pattern; Java's Comparable interface has the same contact.
Here's why I don't like this. A 32-bit integer has 4.2 billion values. This interface says that [1, MaxInt] all have the same meaning (a > b). Likewise, [MinInt, -1] have the same meaning (that a < b). Having billions of values map to the same semantic meaning is asking for bugs. It also means the compiler can't help you if you make a mistake.
I very much prefer how Haskell does it. Haskell's base library defines an enum for Ordering with three values: LT, EQ, GT (less-than, equal, greater-than). I think these are easier to understand ("LT" is more easily understood that "-1"). Also, there is only one symbol for each value. Combined with Haskell's exhaustive pattern matching, the compiler works with us to catch errors.
The Ruby code above sorts files and directories. It sorts directories before files. Directories and files are both sorted alphabetically. I don't think that pattern is immediately obvious from the Ruby code (at least it took me several readings to figure it out). Here's how I'd model the problem in Haskell:
module CompareDemo where
-- a 'Path' is either a File or a Dir(ectory)
data Path a = File a | Dir a deriving (Eq, Show)
-- make Path comparable
instance Ord a => Ord (Path a) where
compare (Dir a ) (Dir b ) = compare a b
compare (File a) (File b) = compare a b
compare (Dir _ ) (File _) = LT
compare (File _) (Dir _ ) = GT
We define two data constructors, File
and Dir
, so immediately we know what we're dealing with. The compare
function pattern matches on those. If we have two Dir
's or File
's, we just compare the values. If we have a Dir
and a File
, the Dir
always sorts first.
Let's look at some examples in the REPL:
% /opt/ghc/bin/ghci -Wall
GHCi, version 8.0.2: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /home/derek/.ghci
λ> :load CompareDemo.hs
[1 of 1] Compiling CompareDemo ( CompareDemo.hs, interpreted )
Ok, modules loaded: CompareDemo.
λ> compare (Dir "/tmp") (Dir "/etc")
GT
λ> compare (File "/etc/resolv.conf") (File "/tmp/stuff")
LT
So far, so good. Let's try sorting some data.
λ> import Data.List (sort)
λ> sort [Dir "/tmp", Dir "/etc"]
[Dir "/etc", Dir "/tmp"]
λ> sort [Dir "/tmp", Dir "/etc", File "/tmp/foo", File "/home/derek/stuff"]
[Dir "/etc", Dir "/tmp", File "/home/derek/stuff", File "/tmp/foo"]
Yup, that's what we want. Directories first, sorted alphabetically, then files, sorted alphabetically.
Now, let's pretend we made a mistake. Let's edit the file and comment out a line:
instance Ord a => Ord (Path a) where
compare (Dir a ) (Dir b ) = compare a b
compare (File a) (File b) = compare a b
-- compare (Dir _ ) (File _) = LT -- Woops, we forgot this one!
compare (File _) (Dir _ ) = GT
If we save this and try to load it in the REPL, we get a compiler error:
λ> :reload
[1 of 1] Compiling CompareDemo ( CompareDemo.hs, interpreted )
CompareDemo.hs:8:3: warning: [-Wincomplete-patterns]
Pattern match(es) are non-exhaustive
In an equation for ‘compare’:
Patterns not matched: (Dir _) (File _)
This is actually A Good Thing. The compiler has prevented us from deploying unsafe code. It analyzed all possible inputs to the compare
function and found one was missing, and it even tells us exactly what we forgot: if a Dir
is compared to a File
. By modelling our problem this way (with Files and Dirs) the compiler has our back. We know we haven't forgotten anything because it won't let us.
Tests are great. They're necessary for robust systems. But I don't think they're sufficient. Tests pair up nicely with a strong type system. The type system lets the compiler catch a lot of bugs for you, and you use tests to catch bugs it's not feasible to prevent via types.
Nice write up! You could define your own LT and GT constants in Ruby, if wanted to. I like the Haskell approach too. Just say no to Magic Numbers.