Why should programmers care about Monoids? Because Monoids are a common pattern that shows up over and over in programming. And when patterns show up, we can abstract them and leverage work we've done in the past. This allows us to quickly develop solutions on top of proven, stable code.
Add Commutative Property to a Monoid (Commutative Monoid) and you have something that can be executed in parallel. With the end of Moore's Law, parallelism is our only hope to increasing processing speeds.
What follows is what I've learned after studying Monoids. It is hardly complete, but hopefully will prove to be helpful as an introduction for others.
A Monoid is from mathematics and is in a lineage of other constructs. So it helps to start at the beginning and work our way up to Monoids. (Actually, we'll go one step further to Groups)
A Magma is a Set and a single Binary Operator which must be closed:
∀ a, b ∈ M : a • b ∈ M
A binary operator is closed if when applied to any 2 elements of the set, it produces another member of the set. (Here · is the binary operator)
An example of a Magma is the set of Boolean and the AND operator.
A Semigroup is a Magma with 1 addition requirement. The Binary Operator must be Associative for all members of the set:
∀ a, b, c ∈ S : a · (b · c) = (a · b) · c
An example of a Semigroup is the set of Non-empty Strings and the Concatenation operator.
A Monoid is a Semigroup with 1 addition requirement. There exists a Neutral Element in the set that can be combined using the binary operator with any member of the set resulting in that same member of the set.
e ∈ M : ∀ a ∈ M, a · e = e · a = a
An example of a Monoid is the set of Strings and the Concatenation operator. Note the addition of the empty string to the set is the Neutral Element and turns the Semigroup into a Monoid.
Another example of a Monoid is the set of Non-negative Integers and the Addition operator. The Neutral Element is 0.
A Group is Monoid with 1 addition requirement. There exist Inverses in the set such that:
∀ a, b, e ∈ G : a · b = b · a = e
Here e
is the Neutral Element.
An example of a Group is the set of Integers and the Addition operator. The Inverses are Negative Numbers and the Neutral Element is 0.
By allowing Negative Numbers, we've turned the second example of a Monoid above into a Group.
Reference: Math StackExchange question: What's the difference between a monoid and a group?
In Haskell's Prelude (via GHC.Base), the Monoid typeclass is defined:
class Monoid a where
mempty :: a
-- ^ Identity of 'mappend'
mappend :: a -> a -> a
-- ^ An associative operation
mconcat :: [a] -> a
-- ^ Fold a list using the monoid.
-- For most types, the default definition for 'mconcat' will be
-- used, but the function is included in the class definition so
-- that an optimized version can be provided for specific types.
mconcat = foldr mappend mempty
Here mempty
is the Neutral Element, mappend
is the Associative Binary operator.
This is enough to be a Monoid, but mconcat
is added as a convenience. It has a default implementation to fold over the list using the Binary operator, mappend
, starting with the Neutral Element, mempty
.
Instances can override this default implementation as we shall see later.
A trivial example of a set which only contains ()
:
instance Monoid () where
mempty = ()
_ `mappend` _ = ()
mconcat _ = ()
Here the set contains only the Neutral Element, ()
. So mappend
doesn't really care about the parameters and will just return ()
. Turns out that the only valid parameters are always ()
since our set only contains ()
.
Also, mconcat
is overridden for efficiency to ignore the list of elements of the set since they're all ()
and so it just returns ()
. Note that if mconcat
was omitted here, the default implementation would produce the same result thanks to mappend
's implementation.
You cannot really do much with this Monoid by itself.
n :: ()
n = () `mappend` ()
ns :: ()
ns = mconcat [(), (), ()]
The Monoid of all possible lists:
instance Monoid [a] where
mempty = []
mappend = (++)
mconcat xss = [x | xs <- xss, x <- xs]
mappend
is the concatenation operator which means the Neutral Element, mempty
can only be the empty list, []
.
It's important to realize that mconcat
takes a List of elements from the set which here are Lists. Therefore, it takes a List of Lists, hence the parameter name xss
.
I suspect that List Comprehensions are more efficient than foldr
otherwise there's no reason to implement mconcat
.
If we think about it, foldr
will call mappend
with 2 lists repeatedly which isn't efficient due to repeated processing of elements in the intermediate list returned each interation.
Using a List Comprehension will be a low-level operation that most likely visits each element of each sublist only once.
as :: [Int]
as = [1, 2, 3]
bs :: [Int]
bs = [4, 5, 6]
asbs :: [Int]
asbs = mconcat [as, bs] -- [1, 2, 3, 4, 5, 6]
The Monoid of all possible 2-tuples of Monoids:
instance (Monoid a, Monoid b) => Monoid (a,b) where
mempty = (mempty, mempty)
(a1,b1) `mappend` (a2,b2) = (a1 `mappend` a2, b1 `mappend` b2)
At first mempty
's definition can seem confusing. At initial glance, the definition can be misunderstood as a recursive definition.
Turns out the the first mempty
of the pair is the mempty
of the a
type.
The second mempty
is the mempty
of the b
type.
Imagine a
is ()
and b
is [Int]
. Then mempty
would be ( (), [] )
, i.e. the first is the mempty
for ()
and the second is the mempty
for [Int]
.
mappend
's implemenation is pretty straightforward. It performs a mappend
for the a
's and the b
's returning a 2-tuple of (a, b)
. Since a
and b
are both Monoids, the closure requirement of Magmas and hence Monoids is maintained.
p1 :: ((), [Int])
p1 = ((), [1, 2, 3])
p2 :: ((), [Int])
p2 = ((), [4, 5, 6])
p1p2 :: ((), [Int])
p1p2 = mconcat [p1, p2] -- ((), [1, 2, 3, 4, 5, 6])
The Monoid of all possible functions of 1 or many parameters that returns a Monoid:
instance Monoid b => Monoid (a -> b) where
mempty _ = mempty
mappend f g x = f x `mappend` g x
It isn't obvious how this definition handles functions that take multiple parameters. A little reminder may help.
Function annotations are right associative, i.e. they associate on the right:
f :: Int -> (Bool -> String) -- unnecessary parenthesis
f s1 s2 = s1 ++ s2
Int -> (Bool -> String)
is equivalent to Int -> Bool -> String
, which is why we don't include the parenthesis. Right associativity implies them.
Remembering that String
is equivalent to [Char]
, we know that f
eventually returns a Monoid since we've seen Monoid [a]
above.
But not so fast. We first must breakdown the annotation in terms of a -> b
as it's defined in the Monoid instance:
Int -> (Bool -> String)
a -> b
Here b
must be a Monoid and thanks to Monoid (a -> b)
, it is.
Now looking into b
we get:
(Bool -> String)
( a -> b )
So reapplication of Monoid (a -> b)
takes care of functions with multiple parameters, e.g.:
Int -> (String -> (Int -> String))
a -> ( b )
a -> (a' -> ( b' ))
a -> (a' -> (a'' -> b'' )
Here b
is a Monoid because b'
is a Monoid because b''
is String
which is a Monoid because String
is [Char]
and we saw earlier that all lists are Monoids.
Once again, the definition:
instance Monoid b => Monoid (a -> b) where
mempty _ = mempty
mappend f g x = f x `mappend` g x
Hopefully, mempty
's definition makes more sense now. mempty
is of type a -> b
which is why it takes a single parameter. It disregards the parameter and simply returns the mempty
of type b
.
In the case of a function of type Bool -> String
, mempty
is []
, i.e. the mempty
for Monoid [a]
.
In the case of a function of type Int -> Bool -> String
, mempty
is recursive, i.e. it first returns itself with the type Bool -> String
, which in turn returns []
.
Notice that a
is immaterial here. In fact, all input types to the function are immaterial. The only thing that matters here is the type of the return value. That's why only b
must be a Monoid.
So the following functions types will have mempty
eventually return []
since they all return String
:
Int -> String
Int -> Int -> String
Int -> Bool -> Int -> Double -> String
Similarly, mappend
applies a single parameter to BOTH functions then calls b
's mappend
.
In the case of a function of type String -> String
, mappend
calls BOTH functions with the input String
and then calls mappend
for String
which is Monoid [a]
, i.e. (++)
.
In the case of a function of type String -> String -> String
, mappend
calls BOTH functions with the FIRST input String
and then calls mappend
for String -> String
which is Monoid (a -> b)
, i.e. itself.
Which in turn, calls BOTH functions with the SECOND input String
and then calls mappend
for String
which is Monoid [a]
, i.e. (++)
.
import Data.Monoid ((<>))
parens :: String -> String
parens str = "(" ++ str ++ ")"
curlyBrackets :: String -> String
curlyBrackets str = "{" ++ str ++ "}"
squareBrackets :: String -> String
squareBrackets str = "[" ++ str ++ "]"
pstr :: String -> String
pstr = parens <> curlyBrackets <> squareBrackets
astr :: String
astr = pstr "abc"
Notice the <>
operator is used in pstr
. This operator is imported from Data.Monoid
and is the mappend
operation.
If you look back at the class
definition of Monoid, you'll see that mappend
has the type a -> a -> a
.
Since parens
and curlyBrackets
both have type -> String -> String
, then parens <> curlyBrackets
will have the type String -> String
as will parens <> curlyBrackets <> squareBrackets
.
pstr
will take a String
and apply it to parens
, curlyBrackets
and squareBrackets
concatenating the results of those calls.
Therefore, astr
is (abc){abc}[abc]
.
If the number of functions to be applied gets large, using the <>
approach can get tedious. This is why the Monoid class has the helper function mconcat
.
We can rewrite the code as such:
pstr :: String -> String
pstr = mconcat [parens, curlyBrackets, squareBrackets]
astr :: String
astr = pstr "abc"
Looking back at the Monoid definition, we have to choose an Associative Binary operator, but for numbers it can be either Addition or Multiplication.
If we pick Addition then there will be times where Multiplication would be missed and visa versa.
Unfortunately, we can only have 1 Monoid per type.
The way to solve this is to create a new type that contains a Num
for Addition and another type for Multiplication.
These types can be found in Data.Monoid
:
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import GHC.Generics
newtype Sum a = Sum { getSum :: a }
deriving (Eq, Ord, Read, Show, Bounded, Generic, Generic1, Num)
newtype Product a = Product { getProduct :: a }
deriving (Eq, Ord, Read, Show, Bounded, Generic, Generic1, Num)
Now we can make Monoids for each.
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Coerce
instance Num a => Monoid (Sum a) where
mempty = Sum 0
mappend = coerce ((+) :: a -> a -> a)
mempty
is 0
wrapped in Sum
.
Here coerce
is used to safely coerce Sum a
to its Representational type, e.g. Sum Integer
will be coerced to Integer
and use the appropriate +
operator.
ScopedTypeVariables
pragma allows us to equate a
in a -> a -> a
to the scope of the instance
and hence the a
in Num a
.
sum :: Sum Integer
sum = mconcat [Sum 1, Sum 2] -- Sum 3
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Coerce
instance Num a => Monoid (Product a) where
mempty = Product 1
mappend = coerce ((*) :: a -> a -> a)
mempty
is 0
wrapped in Product
.
Here coerce
is used to safely coerce Product a
to its Representational type, e.g. Product Integer
will be coerced to Integer
and use the appropriate *
operator.
ScopedTypeVariables
pragma allows us to equate a
in a -> a -> a
to the scope of the instance
and hence the a
in Num a
.
product :: Product Integer
product = mconcat [Product 2, Product 3] -- Product 6
Before we look at the Monoid, let's review Ordering and comparisons:
data Ordering = LT | EQ | GT
This type is used when using compare
from class Ord
, e.g.:
compare :: a -> a -> Ordering
An example of its use:
compare "abcd" $ "abed" -- LT
Now there's a great helper function in Data.Ord
for comparisons, called comparing
:
comparing :: (Ord a) => (b -> a) -> b -> b -> Ordering
comparing p x y = compare (p x) (p y)
This helper applies a function to each element BEFORE comparing. This is great for things like tuples:
comparing fst (1, 2) (1, 3) -- EQ
comparing snd (1, 2) (1, 3) -- LT
Now for the Monoid:
-- lexicographical ordering
instance Monoid Ordering where
mempty = EQ
LT `mappend` _ = LT
EQ `mappend` y = y
GT `mappend` _ = GT
This implemenation looks random. Why would anyone implement Monoid Ordering
this way?
Well, if you want to append comparisons as part of a sortBy
then you want this implementation.
Looking at sortBy:
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
Notice that the first parameter is the same type as compare
, comparing fst
, comparing snd
andcomparing fst `mappend` comparing snd
.
Why? Because mappend
's type is a -> a -> a
, here a
is (a, b) -> (a, b) -> Ordering
.
So we can combine or mappend
comparision functions and we will have an overall comparison function.
Remember that Monoid (a -> b)
requires b to also be a Monoid
.
So if we want to be able to mappend
our comparison functions, we'll have to make Ordering
a Monoid
, which we did above.
But we still haven't answered why it has this seemingly bizzare definition.
Well, the comment is a bit of a clue, i.e. lexicographical ordering
. What this essentially means is alphabetical order
, or left precedence
, i.e. if the leftmost is GT
or LT
then all comparisons to the right no longer matter.
If, however, the leftmost is EQ
then we need to look to the right to determine the final result of the combined comparison.
And that's exactly what the implementation does. Here it is again with some extra comments to illustrate this:
-- lexicographical ordering
instance Monoid Ordering where
mempty = EQ -- EQ as far left or far right has no effect on final result
LT `mappend` _ = LT -- if left is LT then ignore right
EQ `mappend` y = y -- if left is EQ then use right
GT `mappend` _ = GT -- if left is GT then ignore right
Take a minute to really understand this. Once you do, this will be easier to understand:
sortBy (comparing fst <> comparing snd) [(1,0),(2,1),(1,1),(2,0)]
-- [(1,0),(1,1),(2,0),(2,1)]
To understand how this works, you have to remember Monoid (a -> b)
.
We are mappend
ing functions with type (a, b) -> (a, b) -> Ordering
. And once both functions have executed, we mappend
the 2 Ordering
s that were returned following our lexicographical ordering
.
This means that comparing fst
has precidence over comparing snd
which is why all (1, x)
will precede all (2, y)
even when x > y
.
We can do a different comparison where we only care about comparing snd
:
sortBy (comparing snd) [(1,0),(2,1),(1,1),(2,0)]
-- [(1,0),(2,0),(2,1),(1,1)]
Here fst
is in an unpredictable order while snd
is in ascending order.
Just for fun we can control ascending and descending order independently. First lets define some helper functions:
asc, desc :: Ord b => (a -> b) -> a -> a -> Ordering
asc = comparing
desc = flip . asc
Now we can sort fst
descending and snd
ascending:
sortBy (desc fst <> asc snd) [(1,0),(2,1),(1,1),(2,0)]
-- [(2,0),(2,1),(1,0),(1,1)]
The example sorts all use only a few comparisons. In fact, most sorts will only use a small handful of comparisons.
Even so, the mappend
s all have to be evaluated even if the first one returns LT
or GT
. This may not seem like a big deal when there's only few comparisons. But it can add up over a large list.
What we want our comparisons to do is short-circuit
as is commonly done with Boolean Binary operators, &&
and ||
.
Short-circuting is not possible with the current definition for Monoid Ordering
because it relies on the default mconcat
implementation which uses foldr
which visits each list element.
If we write our own Moniod Ordering
and implement an mconcat
that exits early, we will have a more efficient sort.
import Prelude hiding (Monoid, mempty, mappend, mconcat)
import Data.List
import Data.Maybe
import Control.Arrow
instance Monoid Ordering where
mempty = EQ
LT `mappend` _ = LT
EQ `mappend` y = y
GT `mappend` _ = GT
mconcat = find (/= EQ) >>> fromMaybe EQ
This implementation allows us to rewrite our previous sort:
sortBy (mconcat [desc fst, asc snd]) [(1,0),(2,1),(1,1),(2,0)]
-- [(2,0),(2,1),(1,0),(1,1)]
Same result, but any time dest fst
returns LT
or GT
, then asc snd
will be skipped.
N.B. our implementation requires Data.List
, Data.Maybe
and Control.Arrow
which if implemented in the standard would unnecessarily couple Data.Monoid
. This limitation could be overcome by writing a specialized function (not very DRY).
But, the biggest problem with overriding the standard implementation is that we have to hide all Monoid definitions.
These are some pretty hefty downsides to optimizing for an edge case. But it's a nice exercise just the same. Also, if the list we're trying to sort was huge, then it just might be worth it.
References:
As stated at the beginning, if we add just 1 more constraint to Monoid
(or a Group
), we can do things in parallel.
That restriction is the Commutative Property.
∀ a, b ∈ M : a · b = b · a
By requiring this law, we can process the list in ANY order. This can get parallelized by the compiler, via a library or even doled out to other machines.
Here is the definition:
class Monoid m => CommutativeMonoid m
It may seem strange that there are no functions, but it's interface is the same as Monoid
with only the requirement that Commutativity is supported by the Binary operator.
Unfortunately, there is no way to require these laws in Haskell.
Here's the definition:
instance Num a => CommutativeMonoid (Sum a)
The reasons to use CommutativeMonoid
over Monoid
for Sum
(or `Product):
- Better communicate how the
Monoid
is to be used - Call functions that require a
CommutativeMonoid
Monoids are great abstractions for combining like things and these abstractions can be found over and over again in programming.
Hopefully, this proved to be a good introduction to Monoids
. There are lots of other types of Monoid
s but once you have the general understanding, investigating these other speciality Monoids
should be much easier.
Great!
I think your quantifiers in the axioms for Monoids and Groups are slightly off.
Monoid identity: Ǝ e ∈ M : ∀ a ∈ M, a · e = e · a = a
I.e. "There is an 'identity' element e in M such that for all a in M, a · e = e · a = a"
Group inverses: Ǝ e ∈ G: ∀ a ∈ G: Ǝ b ∈ G : a · b = b · a = e
I.e. "there is an 'identity' element e in G such that for all a in G there exists its "inverse" b such that a · b = b · a = e"
Also for groups, you would have to link the identity element in the two axioms.