Version: 2022.03.02
This is the “Effective Scala Case Class Patterns” guide I wished I had read years ago when starting my Scala journey. I’ve spent many hundreds of hours on futile tangents to earn the simplified actionable nuggets described below.
Because it is a fantastic integration of both OOP and FP, the case class is a key workhorse in any Scala software engineering project. Within Scala, it is primarily designed and intended (but not exclusively) to be used as an (immutable) FP Product Type.
Unfortunately, the DCCP (Default Case Class Pattern)…
case class Longitude(value: Double)
…while terse, flexible, convenient, and extensively utilized, suffers from a number of issues which fall into the following categories:
- Extension confusion
- Elevated reasoning complexity
- Poor design for FP
- Future technical debt
- Security vulnerabilities
- Performance impact
This article aims to propose several new boilerplate patterns that you can use to replace the defaults provided by the Scala compiler to address the above categories. I’ll step through an evolutionary process that will produce patterns of increasing detail, any of which can be an “upgrade” stage you might prefer.
DELAYED (bug in IntelliJ’s Template engine):
Additionally, I will be providing the patterns as IntelliJ File Templates making them as easy to instantiate while coding as using the DCCP.
After many years of using case classes, it has become obvious that extending one via inheritance is just a bad idea.
Thus, our very first pattern is to merely prepend final to the DCCP:
final case class Longitude(value: Double)
This ensures no descendants, well-intended or malicious, can be defined inheriting from the case class where they inadvertently or purposefully abuse the “Liskov Substitution Principle”.
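To make the risk concrete, here is a minimal sketch of what a non-final case class permits. The names Point and TaggedPoint are hypothetical and are not part of the patterns in this article.

```scala
object NonFinalDemo {
  // A non-final case class: nothing prevents subclassing.
  case class Point(x: Int)

  // A regular class may extend it and add state that the generated
  // equals and toString know nothing about.
  class TaggedPoint(x: Int, val tag: String) extends Point(x)

  val plain: Point = Point(1)
  val tagged: Point = new TaggedPoint(1, "surprise")

  def main(args: Array[String]): Unit = {
    println(plain == tagged) // the tag plays no part in equality
    println(tagged)          // the subclass still prints as a Point
  }
}
```

Marking Point as final turns the TaggedPoint definition into a compile error, which is the whole point of this first pattern.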
While some people might not bother with this, it is possible to configure the Scala compiler to issue a warning or error when encountering a non-final case class (still trying to find the compiler option[s]). IntelliJ offers something similar. I prefer to trust the compiler to point out my mistakes over my own capacity to always remember this rule.
An additional performance reason exists for marking a (case) class as final. There are optimizations, both in the compiler and in the JVM at runtime, which fire only when the (case) class is marked final.
Why?
Because when a (case) class isn't marked final, the JVM runtime must assume a class loader could eventually load a class that extends the non-final (case) class.
More recent JVMs have more sophisticated strategies in this area: they optimize as if no subclass exists and correct course later. In other words, when a class loader does load such a subclass, any current code that could be impacted has all of its optimizations rolled back and is then reoptimized. The rollback and re-optimization also have a (hopefully smaller) performance impact.
Implementing this desirably affects the Overview Categories…
- 2: Elevated reasoning complexity
- 3: Poor design for FP
- 4: Future technical debt
- 5: Security vulnerabilities
- 6: Performance impact
One of the biggest frustrations a Scala newcomer faces is when they want to “enhance” the DCCP by making the companion object explicit. For example, if one were to naively prepend the object like so…
object Longitude {
}
final case class Longitude(value: Double)
The compiler-generated code extended the FunctionN trait on the “default companion object”. The code above does not. And any code dependent upon the missing trait will now fail to compile (ex: the tupled method).
As a software engineer new to Scala, this can be quite confronting and disorienting. There isn’t any “official guidance” on how one might go about making the default companion object explicit. Googling for this isn’t trivial. Here’s how I explored this problem space on StackOverflow in 2014.
So, the solution is to extend the companion object with FunctionN so it looks like this:
object Longitude extends (Double => Longitude) {
  def apply(value: Double): Longitude =
    new Longitude(value)
}
final case class Longitude(value: Double)
Scastie Snippet and IntelliJ Code Template Gist
To better understand how FunctionN is represented by (Double => Longitude), here’s a 2013 post on StackOverflow where I had to explore this in more depth myself.
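As a rough illustration of what extending the function type buys, the sketch below uses the explicit companion as an ordinary Function1 value. The wrapper object CompanionDemo and the vals toValue and longitudes are hypothetical names used only for this example.

```scala
object CompanionDemo {
  // Explicit companion that is also a Function1[Double, Longitude].
  object Longitude extends (Double => Longitude) {
    def apply(value: Double): Longitude =
      new Longitude(value)
  }
  final case class Longitude(value: Double)

  // Function1 combinators are available on the companion itself.
  val toValue: Double => Double = Longitude.andThen(_.value)

  // The companion can be passed wherever a Double => Longitude is expected.
  val longitudes: List[Longitude] = List(1.0d, 2.0d).map(Longitude)
}
```

With two or more parameters the same approach also restores the tupled and curried methods, since those are defined on Function2 and above.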
And this pattern will now be the basis upon which we fill out the remainder of the template.
Implementing this desirably affects the Overview Category:
- 1: Extension confusion
One of the maxims in OOP’s DbC (Design by Contract - Eiffel originated) and FP is to prevent invalid states from being representable. If one successfully aims at and achieves this goal, it dramatically reduces the “guard” code (i.e. preconditions checks) for any clients making use of the case class instances.
A first pass naive implementation (which is also found and recommended in almost all Scala textbooks) is to use the require functionality within the case class’s constructor. It looks like this:
final case class Longitude(value: Double) {
  require(value >= -180.0d, s"value [$value] must be greater than or equal to -180.0d")
  require(value <= 180.0d, s"value [$value] must be less than or equal to 180.0d")
}
Scastie Snippet and IntelliJ Code Template Gist
This implementation throws an exception at the first require that fails.
There are three problems with this approach:
- If there are other parameters that also need to be validated, it requires multiple passes to check other parameters which could have been correctly checked in a prior pass if multiple errors were allowed to be returned.
- The require implementation forces the client to deal with exceptions (avoid using exceptions for expected errors, like really). In most cases, the performance overhead of the exception infrastructure (both CPU effort and memory pressure/churn) is both significant and essentially unoptimizable (this remains contested). And even outside the poor performance reasons, FP implementations strongly prefer “error by value” as opposed to “error by exception”.
- It doesn’t allow the client to “check the preconditions” prior to instantiation. This mixing of concerns prevents optimization opportunities where the constructor, and therefore the memory allocation for the instance, is never invoked because the preconditions are already known to have not been met.
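The first problem is easy to reproduce. In the sketch below, LatLong is a hypothetical two-parameter case class (it does not appear elsewhere in this article) validated with require. Both arguments are invalid, yet the caller only learns about the first one.

```scala
import scala.util.Try

object RequireDemo {
  // Hypothetical two-parameter case class validated with require.
  final case class LatLong(latitude: Double, longitude: Double) {
    require(latitude >= -90.0d && latitude <= 90.0d, s"latitude [$latitude] is out of range")
    require(longitude >= -180.0d && longitude <= 180.0d, s"longitude [$longitude] is out of range")
  }

  // Both values are invalid, but construction stops at the first failed require.
  val attempt: Try[LatLong] = Try(LatLong(100.0d, 200.0d))

  def main(args: Array[String]): Unit =
    println(attempt.failed.map(_.getMessage))
}
```

The failure is an IllegalArgumentException whose message mentions only the latitude. The longitude problem surfaces only after the caller fixes the first error and tries again.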
We can address all of these concerns in one fell swoop using a standard separation of concerns pattern. In this case, it’s OOP’s Factory (a.k.a. Builder, Smart Constructor, etc.) Pattern.
First, we move all of the validation logic into its own method, generateInvalidStateErrors. Then, we ensure the apply method first validates the passed parameter value(s) and only then invokes the new operator, accepting or rejecting the instantiation. It should now look like this…
object Longitude extends (Double => Longitude) {
  def generateInvalidStateErrors(value: Double): List[String] =
    if (value < -180.0d)
      List(s"value of value [$value] must not be less than -180.0d")
    else
      if (value > 180.0d)
        List(s"value of value [$value] must not be greater than 180.0d")
      else
        Nil
  def apply(value: Double): Longitude =
    generateInvalidStateErrors(value) match {
      case Nil =>
        new Longitude(value)
      case invalidStateErrors =>
        throw new IllegalStateException(invalidStateErrors.mkString("|"))
    }
}
final case class Longitude(value: Double)
Scastie Snippet and IntelliJ Code Template Gist
While the pattern is now in place, there is still a hole where a client can just use the new operator to bypass the apply method in the companion object. That is fixed by marking the case class constructor as private. That looks like this…
final case class Longitude private(value: Double)
Scastie Snippet and IntelliJ Code Template Gist
It looks like we’re done, right?
Oops! Sneaky attack vectors ahead!
It turns out there are two other compiler-generated constructor pathways we must address:
- readResolve method - Supports the compiler-generated Serializable interface. This is especially pernicious because it instantiates the memory for the case class, and then directly injects the (possibly malicious) deserialized contents into the instance’s memory. This completely bypasses both the apply method and the object constructor. And this means no validation takes place whatsoever.
- copy method - Uses the new operator, and can do so because the method is within the private scope of the constructor. This bypasses the validation we moved into the companion object and invoke via the apply method.
In each of these cases, we want to reroute the method to the companion object’s apply method. It should look like this…
final case class Longitude private(value: Double) {
  private def readResolve(): Object =
    Longitude(value)
  def copy(value: Double = value): Longitude =
    Longitude(value)
}
Scastie Snippet and IntelliJ Code Template Gist
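A short usage sketch of the rerouted copy method follows. The wrapper object ValidatedCopyDemo and the vals base, preCheck, moved, and rejected are hypothetical names. The Longitude definitions repeat the pattern above so the example stands alone.

```scala
import scala.util.Try

object ValidatedCopyDemo {
  object Longitude extends (Double => Longitude) {
    def generateInvalidStateErrors(value: Double): List[String] =
      if (value < -180.0d) List(s"value of value [$value] must not be less than -180.0d")
      else if (value > 180.0d) List(s"value of value [$value] must not be greater than 180.0d")
      else Nil

    def apply(value: Double): Longitude =
      generateInvalidStateErrors(value) match {
        case Nil => new Longitude(value)
        case invalidStateErrors =>
          throw new IllegalStateException(invalidStateErrors.mkString("|"))
      }
  }

  final case class Longitude private (value: Double) {
    // Both pathways are rerouted through the validating apply method.
    private def readResolve(): Object = Longitude(value)
    def copy(value: Double = value): Longitude = Longitude(value)
  }

  val base: Longitude = Longitude(10.0d)
  // A client can check the preconditions without instantiating anything.
  val preCheck: List[String] = Longitude.generateInvalidStateErrors(500.0d)
  // A valid copy succeeds; an invalid copy is rejected just like apply.
  val moved: Longitude = base.copy(value = 20.0d)
  val rejected: Try[Longitude] = Try(base.copy(value = 500.0d))
}
```

Here rejected is a Failure wrapping an IllegalStateException, so copy can no longer be used to produce an out-of-range instance.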
If you know the case class will never be used anywhere that utilizes Java serialization, then feel free to remove the readResolve method.
While I, too, hate Java Serialization, remember that some platforms such as Kafka and Spark continue to depend upon Java serialization. (You might also encounter old Akka code that uses it, though it isn't Akka's default and isn't recommended.) And this means when they do so, if the readResolve method is missing, you’ve left your case class open to a malicious attack that bypasses your case class’s immutable invariant encoded in the precondition check implemented in the generateInvalidStateErrors method.
We have now ensured there are no reasonable ways to instantiate this case class without going through the precondition check (validation of state prior to invoking instantiation overhead). There are pathological pathways that can be used that involve the illicit use of the Java reflection API, and there is no real way for us to protect against those.
The fully expressed pattern should now look like this…
object Longitude extends (Double => Longitude) {
  def generateInvalidStateErrors(value: Double): List[String] =
    if (value < -180.0d)
      List(s"value of value [$value] must not be less than -180.0d")
    else
      if (value > 180.0d)
        List(s"value of value [$value] must not be greater than 180.0d")
      else
        Nil
  def apply(value: Double): Longitude =
    generateInvalidStateErrors(value) match {
      case Nil =>
        new Longitude(value)
      case invalidStateErrors =>
        throw new IllegalStateException(invalidStateErrors.mkString("|"))
    }
}
final case class Longitude private(value: Double) {
  private def readResolve(): Object =
    Longitude(value)
  def copy(value: Double = value): Longitude =
    Longitude(value)
}
Scastie Snippet and IntelliJ Code Template Gist
Implementing this desirably affects the Overview Categories…
- 2: Elevated reasoning complexity
- 4: Future technical debt
- 5: Security vulnerabilities
- 6: Performance impact
The default strategy with case classes is to use “error by exception”. That is what using require amounts to. If the Boolean condition is false, it throws an exception wrapping the error string you provide.
From a proper FP design perspective, exceptions are considered a poor way to manage known error conditions, like a case class’s preconditions. Exceptions are acceptable for exceptional things like running out of memory or failing to open a database connection. However, they should be avoided when the error is just part of the method’s domain.
For example, it is an inappropriate use of exceptions for a square root method to throw an exception when passed a negative number. The square root method should be defined to return either an error (String) if the input number is negative, or the actual result if the number is non-negative.
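The square root case can be sketched in a few lines. The method name squareRootE and the wording of its error message are assumptions made for this illustration.

```scala
object SquareRootDemo {
  // Hypothetical helper: a negative input is reported as a value, not thrown.
  def squareRootE(x: Double): Either[String, Double] =
    if (x < 0.0d) Left(s"x [$x] must not be negative")
    else Right(math.sqrt(x))

  def main(args: Array[String]): Unit = {
    println(squareRootE(9.0d))
    println(squareRootE(-1.0d))
  }
}
```

The caller is forced by the return type to acknowledge both outcomes, and no stack trace is ever constructed for the expected failure.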
To add “error by value”, we will add an additional applyE method (where E is for Error) which uses an Either to cover both the correct and the erred input parameter cases. The method looks like this…
def applyE(value: Double): Either[List[String], Longitude] =
  generateInvalidStateErrors(value) match {
    case Nil =>
      Right(new Longitude(value))
    case invalidStateErrors =>
      Left(invalidStateErrors)
  }
This looks remarkably similar to the apply method. In fact, it is so similar, it is essentially code duplication. So, to remove code duplication, we will reimplement the apply method to use the applyE method which now looks like this…
def apply(value: Double): Longitude =
  applyE(value) match {
    case Right(longitude) =>
      longitude
    case Left(invalidStateErrors) =>
      throw new IllegalStateException(invalidStateErrors.mkString("|"))
  }
Scastie Snippet and IntelliJ Code Template Gist
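From the client’s side, consuming applyE might look like the sketch below. The wrapper object ApplyEUsage, the describe method, and the doubled val are hypothetical. The Longitude definitions are repeated only to keep the example self-contained.

```scala
object ApplyEUsage {
  object Longitude extends (Double => Longitude) {
    def generateInvalidStateErrors(value: Double): List[String] =
      if (value < -180.0d) List(s"value of value [$value] must not be less than -180.0d")
      else if (value > 180.0d) List(s"value of value [$value] must not be greater than 180.0d")
      else Nil

    def applyE(value: Double): Either[List[String], Longitude] =
      generateInvalidStateErrors(value) match {
        case Nil => Right(new Longitude(value))
        case invalidStateErrors => Left(invalidStateErrors)
      }

    def apply(value: Double): Longitude =
      applyE(value) match {
        case Right(longitude) => longitude
        case Left(invalidStateErrors) =>
          throw new IllegalStateException(invalidStateErrors.mkString("|"))
      }
  }

  final case class Longitude private (value: Double) {
    def copy(value: Double = value): Longitude = Longitude(value)
  }

  // Hypothetical client code: both outcomes are handled as plain values.
  def describe(raw: Double): String =
    Longitude.applyE(raw) match {
      case Right(longitude) => "accepted: " + longitude.value
      case Left(invalidStateErrors) => "rejected: " + invalidStateErrors.mkString("|")
    }

  // Either is right-biased, so map only touches the success case.
  val doubled: Either[List[String], Double] =
    Longitude.applyE(45.0d).map(_.value * 2)
}
```

No try/catch appears anywhere in the client code, and the error list can be passed along or combined with other validations as ordinary data.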
Implementing this desirably affects the Overview Category…
- 3: Poor design for FP
With this new pattern in place, we have now ensured all precondition checking travels through a single method. And the same with instantiation. Assuming immutability has been retained, adding a memoization (a.k.a. caching) strategy is now trivial.
Here’s an example of the companion object modified to incorporate memoization.
object Longitude extends (Double => Longitude) {
  private var cachedInvalidStateErrorss: Map[Double, List[String]] = Map.empty
  private var cachedInstances: Map[Double, Longitude] = Map.empty
  def generateInvalidStateErrors(value: Double): List[String] = {
    cachedInvalidStateErrorss.get(value) match {
      case Some(invalidStateErrors) => invalidStateErrors
      case None =>
        val invalidStateErrors =
          if (value < -180.0d)
            List(s"value of value [$value] must not be less than -180.0d")
          else if (value > 180.0d)
            List(s"value of value [$value] must not be greater than 180.0d")
          else
            Nil
        val newItem = (value, invalidStateErrors)
        cachedInvalidStateErrorss = cachedInvalidStateErrorss + newItem
        invalidStateErrors
    }
  }
  …
  def applyE(value: Double): Either[List[String], Longitude] =
    generateInvalidStateErrors(value) match {
      case Nil =>
        Right(
          cachedInstances.get(value) match {
            case Some(longitude) => longitude
            case None =>
              val longitude = new Longitude(value)
              val newItem = (value, longitude)
              cachedInstances = cachedInstances + newItem
              longitude
          }
        )
      case invalidStateErrors =>
        Left(invalidStateErrors)
    }
}
Scastie Snippet and IntelliJ Code Template Gist
The memoization strategy shown in the above code snippet is for EXAMPLE PURPOSES ONLY because it’s a terrible default strategy.
Please use one of the many other options available. And specifically, investigate ScalaCache. It is a great generalized caching library that allows choosing between different specialized backing implementations.
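For readers who want something between the example above and a full caching library, here is a sketch that swaps the two vars for a single TrieMap from the Scala standard library. This is a substituted technique, not the approach recommended above: the cache is still unbounded and never evicts, and the wrapper object MemoizationSketch and the vals first and second are hypothetical.

```scala
import scala.collection.concurrent.TrieMap

object MemoizationSketch {
  object Longitude extends (Double => Longitude) {
    // Assumption: a concurrent map stands in for a real cache.
    // Nothing is ever evicted, so memory use grows with distinct inputs.
    private val cachedInstances: TrieMap[Double, Longitude] = TrieMap.empty

    def generateInvalidStateErrors(value: Double): List[String] =
      if (value < -180.0d) List(s"value of value [$value] must not be less than -180.0d")
      else if (value > 180.0d) List(s"value of value [$value] must not be greater than 180.0d")
      else Nil

    def applyE(value: Double): Either[List[String], Longitude] =
      generateInvalidStateErrors(value) match {
        case Nil =>
          // Reuse the cached instance, or create and remember a new one.
          Right(cachedInstances.getOrElseUpdate(value, new Longitude(value)))
        case invalidStateErrors =>
          Left(invalidStateErrors)
      }

    def apply(value: Double): Longitude =
      applyE(value) match {
        case Right(longitude) => longitude
        case Left(invalidStateErrors) =>
          throw new IllegalStateException(invalidStateErrors.mkString("|"))
      }
  }

  final case class Longitude private (value: Double) {
    def copy(value: Double = value): Longitude = Longitude(value)
  }

  val first: Longitude = Longitude(45.0d)
  val second: Longitude = Longitude(45.0d)
}
```

Because every construction path funnels through applyE, first and second end up being the same instance. Only valid instances are cached here; the validation itself is cheap enough in this example that caching its result is skipped.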
Implementing this desirably affects the Overview Category…
- 6: Performance impact
To reduce the amount of text to read through, I’ve kept all of the above case class examples limited to a single property (a.k.a. member). Below is a version generalized to three properties.
object GeoCoordinate3d extends ((Double, Double, Double) => GeoCoordinate3d) {
  val equatorialRadiusInMeters: Double = 6378137.0d
  private var cachedInvalidStateErrorss: Map[(Double, Double, Double), List[String]] = Map.empty
  private var cachedInstances: Map[(Double, Double, Double), GeoCoordinate3d] = Map.empty
  def generateInvalidStateErrors(
      longitude: Double
    , latitude: Double
    , altitudeInMeters: Double
  ): List[String] = {
    val tuple3 = (longitude, latitude, altitudeInMeters)
    cachedInvalidStateErrorss.get(tuple3) match {
      case Some(invalidStateErrors) => invalidStateErrors
      case None =>
        val invalidStateErrors = {
          List(
              if (longitude < -180.0d)
                s"value of longitude [$longitude] must not be less than -180.0d"
              else if (longitude > 180.0d)
                s"value of longitude [$longitude] must not be greater than 180.0d"
              else
                ""
            , if (latitude < -90.0d)
                s"value of latitude [$latitude] must not be less than -90.0d"
              else if (latitude > 90.0d)
                s"value of latitude [$latitude] must not be greater than 90.0d"
              else
                ""
            , if (altitudeInMeters < -equatorialRadiusInMeters)
                s"value of altitudeInMeters [$altitudeInMeters] must not be less than -${equatorialRadiusInMeters}d"
              else
                ""
          ).filter(_.nonEmpty)
        }
        val newItem = (tuple3, invalidStateErrors)
        cachedInvalidStateErrorss = cachedInvalidStateErrorss + newItem
        invalidStateErrors
    }
  }
  def apply(
      longitude: Double
    , latitude: Double
    , altitudeInMeters: Double
  ): GeoCoordinate3d =
    applyE(longitude, latitude, altitudeInMeters) match {
      case Right(geoCoordinate3d) =>
        geoCoordinate3d
      case Left(invalidStateErrors) =>
        throw new IllegalStateException(invalidStateErrors.mkString("|"))
    }
  def applyE(
      longitude: Double
    , latitude: Double
    , altitudeInMeters: Double
  ): Either[List[String], GeoCoordinate3d] =
    generateInvalidStateErrors(longitude, latitude, altitudeInMeters) match {
      case Nil =>
        val tuple3 = (longitude, latitude, altitudeInMeters)
        Right(
          cachedInstances.get(tuple3) match {
            case Some(geoCoordinate3d) => geoCoordinate3d
            case None =>
              val geoCoordinate3d = new GeoCoordinate3d(longitude, latitude, altitudeInMeters)
              val newItem = (tuple3, geoCoordinate3d)
              cachedInstances = cachedInstances + newItem
              geoCoordinate3d
          }
        )
      case invalidStateErrors =>
        Left(invalidStateErrors)
    }
}
final case class GeoCoordinate3d private(
    longitude: Double
  , latitude: Double
  , altitudeInMeters: Double
) {
  private def readResolve(): Object =
    GeoCoordinate3d(longitude, latitude, altitudeInMeters)
  def copy(
      longitude: Double = longitude
    , latitude: Double = latitude
    , altitudeInMeters: Double = altitudeInMeters
  ): GeoCoordinate3d =
    GeoCoordinate3d(longitude, latitude, altitudeInMeters)
}
Scastie Snippet and IntelliJ Code Template Gist
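The payoff of the multi-property version is that all invalid parameters are reported in one pass, and that the companion behaves as a Function3. The sketch below is a condensed variant without memoization. The wrapper object GeoCoordinate3dUsage, the private helper rangeErrors, and the vals bothWrong and fromTuple are hypothetical and not part of the pattern above.

```scala
object GeoCoordinate3dUsage {
  object GeoCoordinate3d extends ((Double, Double, Double) => GeoCoordinate3d) {
    val equatorialRadiusInMeters: Double = 6378137.0d

    // Hypothetical helper: at most one error per parameter.
    private def rangeErrors(name: String, v: Double, min: Double, max: Double): List[String] =
      if (v < min) List(s"value of $name [$v] must not be less than $min")
      else if (v > max) List(s"value of $name [$v] must not be greater than $max")
      else Nil

    def generateInvalidStateErrors(
        longitude: Double, latitude: Double, altitudeInMeters: Double
    ): List[String] =
      rangeErrors("longitude", longitude, -180.0d, 180.0d) ++
        rangeErrors("latitude", latitude, -90.0d, 90.0d) ++
        // No upper bound on altitude, matching the validation above.
        rangeErrors("altitudeInMeters", altitudeInMeters, -equatorialRadiusInMeters, Double.PositiveInfinity)

    def applyE(
        longitude: Double, latitude: Double, altitudeInMeters: Double
    ): Either[List[String], GeoCoordinate3d] =
      generateInvalidStateErrors(longitude, latitude, altitudeInMeters) match {
        case Nil => Right(new GeoCoordinate3d(longitude, latitude, altitudeInMeters))
        case invalidStateErrors => Left(invalidStateErrors)
      }

    def apply(longitude: Double, latitude: Double, altitudeInMeters: Double): GeoCoordinate3d =
      applyE(longitude, latitude, altitudeInMeters) match {
        case Right(geoCoordinate3d) => geoCoordinate3d
        case Left(invalidStateErrors) =>
          throw new IllegalStateException(invalidStateErrors.mkString("|"))
      }
  }

  final case class GeoCoordinate3d private (
      longitude: Double, latitude: Double, altitudeInMeters: Double
  ) {
    def copy(
        longitude: Double = longitude,
        latitude: Double = latitude,
        altitudeInMeters: Double = altitudeInMeters
    ): GeoCoordinate3d =
      GeoCoordinate3d(longitude, latitude, altitudeInMeters)
  }

  // Two invalid parameters produce two errors in a single pass.
  val bothWrong: Either[List[String], GeoCoordinate3d] =
    GeoCoordinate3d.applyE(200.0d, 100.0d, 0.0d)

  // Extending Function3 makes tupled available on the companion.
  val fromTuple: GeoCoordinate3d =
    GeoCoordinate3d.tupled((10.0d, 20.0d, 30.0d))
}
```

Note that tupled still routes through the validating apply method, so it is not a way around the precondition checks.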
Again, the memoization strategy shown in the above code snippet is for EXAMPLE PURPOSES ONLY because it’s a terrible default strategy.
As suggested previously, please use one of the many other options available, like ScalaCache.
- Never override the equals and hashCode methods
  - When you think you need to do so, use a normal class, and then ensure you very carefully follow the non-trivial full override pattern captured in this StackOverflow post
- Avoid using a case class if any state mutability is needed because the default assumption is the case class is representing a concurrency safe immutable value
  - When state mutability is required, use a normal class and be sure to specify if it is concurrency safe
- Avoid using the sealed case objects/classes pattern for enumerations (a.k.a. FP Sum Type) and instead, use automated code generation, as the more code that is generated by the compiler, the smaller the number of defects, the accumulation of technical debt, and the security vulnerability surface areas
  - In Scala 2.x, prefer using the Enumeratum library
  - In Scala 3.x, prefer using the new Enum type
  - All enumerations are an FP Sum Type, but not all FP Sum Types are an enumeration
Even if you find some of the above “boilerplate” undesirable, I hope you enjoyed and learned something about case classes such that it makes them more useful to you in your future Scala software engineering challenges.
- v2022.03.02
  - Changed section “Updates” to “Version History” and moved to end
  - Added new section “Multiple Property Example”
  - Clarified “Overview Category” lists appending each section
  - Added new “Performance impact” category
  - Clarified use of and reference to FP Sum Type for enumerations
  - Added Seth’s clarification around Akka’s use of Serialization
  - Introduced acronym DCCP (Default Case Class Pattern)
- v2022.02.28
  - Adjust based on feedback from various forums
  - Added “Comments” section
- v2022.02.25a
  - Initial version
@chaotic3quilibrium I don't think the new wording you chose is clear, and I also don't think it's accurate, as it still states that Akka "depends" on Java serialization, and that simply isn't the case.
I think you could simply omit mention of Akka entirely.
If you feel it's important to mention Akka, I feel this wording would be accurate: