Skip to content

Instantly share code, notes, and snippets.

@derekmorr
Created March 11, 2017 02:39
Show Gist options
  • Save derekmorr/b6863e50edc0e69714b93c0c15ec109f to your computer and use it in GitHub Desktop.
Save derekmorr/b6863e50edc0e69714b93c0c15ec109f to your computer and use it in GitHub Desktop.
Thoughts on types & tests in light of a sorting problem.

In slack today, I said

i’ve never been a fan of using integers to indicate ordering. there are billions of values in an int32 that all map onto the same meaning (greater than) or (less than). so much possibility for bugs

i like the way Haskell does it — use an enum (LT, EQ, GT) — that way there’s no confusion and the compiler can ensure you’ve handled all possibilities.

I thought I should explain this.

I reworked the ruby sort code to look like this:

def sort_file_entries
  file_entries.sort do |a, b|
    if (a.container? && b.container?) || (!a.container? && !b.container?)
      a.name.downcase <=> b.name.downcase
    elsif a.container? && !b.container?
      -1
    else
      1
    end
  end
end

Ruby's array sort function takes an optional block that specifies how to sort a pair of elements. This function returns -1 if a < b, 0 if a == b, or 1 if a > b. This is a common pattern; Java's Comparable interface has the same contact.

Here's why I don't like this. A 32-bit integer has 4.2 billion values. This interface says that [1, MaxInt] all have the same meaning (a > b). Likewise, [MinInt, -1] have the same meaning (that a < b). Having billions of values map to the same semantic meaning is asking for bugs. It also means the compiler can't help you if you make a mistake.

I very much prefer how Haskell does it. Haskell's base library defines an enum for Ordering with three values: LT, EQ, GT (less-than, equal, greater-than). I think these are easier to understand ("LT" is more easily understood that "-1"). Also, there is only one symbol for each value. Combined with Haskell's exhaustive pattern matching, the compiler works with us to catch errors.

The Ruby code above sorts files and directories. It sorts directories before files. Directories and files are both sorted alphabetically. I don't think that pattern is immediately obvious from the Ruby code (at least it took me several readings to figure it out). Here's how I'd model the problem in Haskell:

module CompareDemo where

-- a 'Path' is either a File or a Dir(ectory)
data Path a = File a | Dir a deriving (Eq, Show)

-- make Path comparable
instance Ord a => Ord (Path a) where
  compare (Dir a ) (Dir b ) = compare a b
  compare (File a) (File b) = compare a b
  compare (Dir _ ) (File _) = LT
  compare (File _) (Dir _ ) = GT

We define two data constructors, File and Dir, so immediately we know what we're dealing with. The compare function pattern matches on those. If we have two Dir's or File's, we just compare the values. If we have a Dir and a File, the Dir always sorts first.

Let's look at some examples in the REPL:

% /opt/ghc/bin/ghci -Wall
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/derek/.ghci
λ> :load CompareDemo.hs
[1 of 1] Compiling CompareDemo      ( CompareDemo.hs, interpreted )
Ok, modules loaded: CompareDemo.
λ> compare (Dir "/tmp") (Dir "/etc")
GT
λ> compare (File "/etc/resolv.conf") (File "/tmp/stuff")
LT

So far, so good. Let's try sorting some data.

λ> import Data.List (sort)
λ> sort [Dir "/tmp", Dir "/etc"]
[Dir "/etc", Dir "/tmp"]
λ> sort [Dir "/tmp", Dir "/etc", File "/tmp/foo", File "/home/derek/stuff"]
[Dir "/etc", Dir "/tmp", File "/home/derek/stuff", File "/tmp/foo"]

Yup, that's what we want. Directories first, sorted alphabetically, then files, sorted alphabetically.

Now, let's pretend we made a mistake. Let's edit the file and comment out a line:

instance Ord a => Ord (Path a) where
  compare (Dir a ) (Dir b ) = compare a b
  compare (File a) (File b) = compare a b
--   compare (Dir _ ) (File _) = LT   -- Woops, we forgot this one!
  compare (File _) (Dir _ ) = GT

If we save this and try to load it in the REPL, we get a compiler error:

λ> :reload
[1 of 1] Compiling CompareDemo      ( CompareDemo.hs, interpreted )

CompareDemo.hs:8:3: warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In an equation for ‘compare’:
        Patterns not matched: (Dir _) (File _)

This is actually A Good Thing. The compiler has prevented us from deploying unsafe code. It analyzed all possible inputs to the compare function and found one was missing, and it even tells us exactly what we forgot: if a Dir is compared to a File. By modelling our problem this way (with Files and Dirs) the compiler has our back. We know we haven't forgotten anything because it won't let us.

Tests are great. They're necessary for robust systems. But I don't think they're sufficient. Tests pair up nicely with a strong type system. The type system lets the compiler catch a lot of bugs for you, and you use tests to catch bugs it's not feasible to prevent via types.

@awead
Copy link

awead commented Mar 13, 2017

Nice write up! You could define your own LT and GT constants in Ruby, if wanted to. I like the Haskell approach too. Just say no to Magic Numbers.

@derekmorr
Copy link
Author

Is there a standard way to make enums in Ruby ? I always thought that enums were an anti-pattern in Ruby -- since a enum limits the number of values for a type. Ruby doesn't really put types front-and-center, and the idea of limiting the values for something seems anti-ruby (you can always make a new class or monkey-patch behavior in).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment