Effective Scala Case Class Patterns

Version: 2022.03.02

Available As

Overview

This is the “Effective Scala Case Class Patterns” guide I wished I had read years ago when starting my Scala journey. I’ve spent many hundreds of hours on futile tangents to earn the simplified actionable nuggets described below.

Because it is a fantastic integration of both OOP and FP, the case class is a key workhorse in any Scala software engineering project. Within Scala, it is primarily (but not exclusively) designed and intended to be used as an (immutable) FP Product Type.

Unfortunately, the DCCP (Default Case Class Pattern)…

    case class Longitude(value: Double)

…while terse, flexible, convenient, and extensively utilized, suffers from a number of issues which fall into the following categories:

1: Extension confusion
2: Elevated reasoning complexity
3: Poor design for FP
4: Future technical debt
5: Security vulnerabilities
6: Performance impact
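To make the DCCP concrete, here is a small sketch of what it gives you for free, and of the gap the rest of this article closes: nothing prevents an out-of-range longitude from being constructed. The sample values are arbitrary, chosen only for illustration.

```scala
// The DCCP: the compiler generates apply, unapply, copy, equals, hashCode,
// and toString for us.
case class Longitude(value: Double)

// Structural equality, copy, toString, and pattern matching all come for free...
val greenwich: Longitude = Longitude(0.0d)
val moved: Longitude = greenwich.copy(value = 2.35d)
val extracted: Double = moved match {
  case Longitude(v) => v
}

// ...but so does an invalid state: nothing rejects an out-of-range longitude.
val nonsense: Longitude = Longitude(999.0d)
```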

This article proposes several new boilerplate patterns you can use in place of the defaults generated by the Scala compiler to address the above categories. I’ll step through an evolutionary process that produces patterns of increasing detail, any of which can be the “upgrade” stage you prefer.

DELAYED (bug in IntelliJ’s Template engine):

~~Additionally, I will be providing the patterns as IntelliJ File Templates making them as easy to instantiate while coding as using the DCCP.~~

Always Mark as Final

After many years of using case classes, it has become obvious that extending one via inheritance is just a bad idea.

Thus, our very first pattern is to merely prepend final to the DCCP:

    final case class Longitude(value: Double)

This ensures no descendant, well-intended or malicious, can inherit from the case class and then, inadvertently or purposefully, violate the “Liskov Substitution Principle”.

While some people might not bother with this, it is possible to configure the Scala compiler to issue a warning or error when encountering a non-final case class (I’m still trying to find the compiler option[s]). IntelliJ offers something similar. I prefer trusting the compiler to point out my mistakes over relying on my own capacity to always remember this rule.

There is also a performance reason for marking a (case) class as final: some optimizations, both in the compiler and in the JVM runtime, fire only when the (case) class is marked final.

Why?

Because when a (case) class isn't marked final, the JVM runtime must assume a class loader could eventually load a class that extends the non-final (case) class.

More recent JVMs use more sophisticated strategies in this area: they optimize speculatively, and when the class loader later loads a class that invalidates those assumptions, any affected code has its optimizations rolled back and is then re-optimized. This rollback and re-optimization also carries a (hopefully smaller) performance cost.

Implementing this desirably affects the Overview Categories…

2: Elevated reasoning complexity
3: Poor design for FP
4: Future technical debt
5: Security vulnerabilities
6: Performance impact

Reproducing Compiler Generated Code

One of the biggest frustrations a Scala newcomer faces arises when they want to “enhance” the DCCP by making the companion object explicit. For example, if one were to naively prepend the object like so…

    object Longitude {
    }
    final case class Longitude(value: Double)

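In Scala 2, the compiler-generated “default companion object” extends the FunctionN trait (here, Function1[Double, Longitude]). The explicit object above does not. Any code dependent upon the missing trait will now fail to compile (e.g., passing Longitude where a Double => Longitude is expected or, for case classes with two or more parameters, calling the tupled method).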

For a software engineer new to Scala, this can be quite confronting and disorienting. There isn’t any “official guidance” on how to make the default companion object explicit, and googling for it isn’t trivial. Here’s how I explored this problem space on StackOverflow in 2014.

So, the solution is to extend the companion object with FunctionN so it looks like this:

    object Longitude extends (Double => Longitude) {
      def apply(value: Double): Longitude =
        new Longitude(value)
    }
    final case class Longitude(value: Double)

Scastie Snippet and ~~IntelliJ Code Template Gist~~

To better understand how FunctionN is represented by (Double => Longitude), here’s a 2013 post on StackOverflow where I had to explore this in more depth myself.

And this pattern will now be the basis upon which we fill out the remainder of the template.
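As a quick check of why extends (Double => Longitude) matters, here is a usage sketch: the explicit companion object is itself a Double => Longitude, so it can be passed wherever such a function is expected and composed with the standard Function1 combinators. The coordinate values are arbitrary.

```scala
object Longitude extends (Double => Longitude) {
  def apply(value: Double): Longitude =
    new Longitude(value)
}
final case class Longitude(value: Double)

// The companion object is a Double => Longitude, so it can be passed directly...
val longitudes: List[Longitude] = List(-73.9857d, 2.3522d).map(Longitude)

// ...and composed like any other Function1.
val toValue: Double => Double = Longitude.andThen(_.value)
```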

Implementing this desirably affects the Overview Category:

1: Extension confusion

Preventing the Creation of Instances Containing an Invalid State

One of the maxims shared by OOP’s DbC (Design by Contract, which originated in Eiffel) and FP is to make invalid states unrepresentable. Achieving this goal dramatically reduces the “guard” code (i.e., precondition checks) needed by any client making use of the case class instances.

A first-pass, naive implementation (found and recommended in almost all Scala textbooks) is to call require within the case class’s constructor. It looks like this:

    final case class Longitude(value: Double) {
      require(value >= -180.0d, s"value [$value] must be greater than or equal to -180.0d")
      require(value <= 180.0d, s"value [$value] must be less than or equal to 180.0d")
    }

Scastie Snippet and ~~IntelliJ Code Template Gist~~

This implementation throws an IllegalArgumentException at the first require that fails.

There are three problems with this approach:

Because it stops at the first failure, validating several parameters requires multiple passes to surface every error, when all of them could have been reported in a single pass if multiple errors were allowed to be returned.
The require implementation forces the client to deal with exceptions (avoid using exceptions for expected errors, like really). In most cases, the performance overhead of the exception infrastructure (both CPU effort and memory pressure/churn) is significant and essentially unoptimizable (though this remains contested). And even setting performance aside, FP implementations strongly prefer “error by value” over “error by exception”.
It doesn’t allow the client to “check the preconditions” prior to instantiation. This mixing of concerns prevents optimizations in which the constructor, and therefore the memory allocation for the instance, is never invoked because the preconditions are already known to be unmet.
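To illustrate the first problem, here is a sketch using a hypothetical two-parameter GeoPoint (not one of this article’s types) validated with require. Both arguments are out of range, yet the client only learns about the first one.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical type, validated the textbook way.
final case class GeoPoint(longitude: Double, latitude: Double) {
  require(longitude >= -180.0d && longitude <= 180.0d, s"longitude [$longitude] out of range")
  require(latitude >= -90.0d && latitude <= 90.0d, s"latitude [$latitude] out of range")
}

// Both values are invalid, but only the first failing require is reported.
val firstErrorOnly: String =
  Try(GeoPoint(500.0d, 500.0d)) match {
    case Failure(e: IllegalArgumentException) => e.getMessage
    case Failure(e)                           => s"unexpected failure: $e"
    case Success(point)                       => s"unexpectedly valid: $point"
  }
```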

We can address all of these concerns in one fell swoop using a standard separation of concerns pattern. In this case, it’s OOP’s Factory (a.k.a. Builder, Smart Constructor, etc.) Pattern.

First, we move all of the validation logic into its own method, generateInvalidStateErrors. Then, we ensure the apply method first validates the passed parameter value(s), invoking the new operator only when there are no errors and rejecting the instantiation otherwise. It should now look like this…

    object Longitude extends (Double => Longitude) {
      def generateInvalidStateErrors(value: Double): List[String] =
        if (value < -180.0d)
          List(s"value of value [$value] must not be less than -180.0d")
        else if (value > 180.0d)
          List(s"value of value [$value] must not be greater than 180.0d")
        else
          Nil

      def apply(value: Double): Longitude =
        generateInvalidStateErrors(value) match {
          case Nil =>
            new Longitude(value)
          case invalidStateErrors =>
            throw new IllegalStateException(invalidStateErrors.mkString("|"))
        }
    }
    final case class Longitude(value: Double)

Scastie Snippet and ~~IntelliJ Code Template Gist~~

While the pattern is now in place, there is still a hole where a client can just use the new operator to bypass the apply method in the companion object. That is fixed by marking the case class constructor as private. That looks like this…

    final case class Longitude private(value: Double)

Scastie Snippet and ~~IntelliJ Code Template Gist~~

It looks like we’re done, right?

Oops! Sneaky attack vectors ahead!

It turns out there are two other compiler-generated construction pathways we must address:

Java deserialization - The compiler makes every case class extend Serializable. Deserialization is especially pernicious because it allocates the memory for the case class instance and then directly injects the (possibly malicious) deserialized contents into that memory. This completely bypasses both the apply method and the constructor, which means no validation takes place whatsoever.
copy method - The compiler-generated copy uses the new operator directly, which it can do because the method is within the private scope of the constructor. This bypasses the validation we moved into the companion object’s apply method.

In both cases, we want to reroute instantiation through the companion object’s apply method: for deserialization by adding a private readResolve method, and for copy by defining it explicitly. It should look like this…

    final case class Longitude private(value: Double) {
      private def readResolve(): Object =
        Longitude(value)

      def copy(value: Double = value): Longitude =
        Longitude(value)
    }

Scastie Snippet and ~~IntelliJ Code Template Gist~~

If you know the case class will never be used anywhere that utilizes Java serialization, then feel free to remove the readResolve method.

While I, too, hate Java Serialization, remember that some platforms such as Kafka and Spark continue to depend upon it. (You might also encounter old Akka code that uses it, though it isn't Akka's default and isn't recommended.) When they do, a missing readResolve method leaves your case class open to a malicious attack that bypasses the invariant encoded in the precondition check implemented in the generateInvalidStateErrors method.

We have now ensured there are no reasonable ways to instantiate this case class without going through the precondition check (which validates state before incurring any instantiation overhead). Pathological pathways involving illicit use of the Java reflection API remain, and there is no real way for us to protect against those.

The fully expressed pattern should now look like this…

    object Longitude extends (Double => Longitude) {
      def generateInvalidStateErrors(value: Double): List[String] =
        if (value < -180.0d)
          List(s"value of value [$value] must not be less than -180.0d")
        else if (value > 180.0d)
          List(s"value of value [$value] must not be greater than 180.0d")
        else
          Nil

      def apply(value: Double): Longitude =
        generateInvalidStateErrors(value) match {
          case Nil =>
            new Longitude(value)
          case invalidStateErrors =>
            throw new IllegalStateException(invalidStateErrors.mkString("|"))
        }
    }
    final case class Longitude private(value: Double) {
      private def readResolve(): Object =
        Longitude(value)

      def copy(value: Double = value): Longitude =
        Longitude(value)
    }

Scastie Snippet and ~~IntelliJ Code Template Gist~~
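Here is a sketch exercising the fully expressed pattern (copied so the snippet is self-contained): invalid input is rejected by apply, and copy can no longer smuggle an invalid value past the check. The specific values are arbitrary.

```scala
import scala.util.Try

object Longitude extends (Double => Longitude) {
  def generateInvalidStateErrors(value: Double): List[String] =
    if (value < -180.0d)
      List(s"value of value [$value] must not be less than -180.0d")
    else if (value > 180.0d)
      List(s"value of value [$value] must not be greater than 180.0d")
    else
      Nil

  def apply(value: Double): Longitude =
    generateInvalidStateErrors(value) match {
      case Nil =>
        new Longitude(value)
      case invalidStateErrors =>
        throw new IllegalStateException(invalidStateErrors.mkString("|"))
    }
}
final case class Longitude private(value: Double) {
  // Reroutes Java deserialization through apply (and therefore validation).
  private def readResolve(): Object =
    Longitude(value)

  // Reroutes copy through apply instead of calling new directly.
  def copy(value: Double = value): Longitude =
    Longitude(value)
}

// Valid input goes through apply as before.
val paris: Longitude = Longitude(2.3522d)

// Invalid input is rejected by apply...
val viaApply: Try[Longitude] = Try(Longitude(200.0d))

// ...and copy is rejected the same way.
val viaCopy: Try[Longitude] = Try(paris.copy(value = -500.0d))
```

Note that writing new Longitude(...) anywhere outside the companion object would not compile at all, which is exactly what the private constructor is for.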

Implementing this desirably affects the Overview Categories…

2: Elevated reasoning complexity
4: Future technical debt
5: Security vulnerabilities
6: Performance impact

Adding an Error-By-Value Constructor

The default strategy with case classes is “error by exception”, which is exactly what require does: if its Boolean condition is false, it throws an IllegalArgumentException wrapping the error string you provide.

From a proper FP design perspective, exceptions are a poor way to manage known error conditions, like a case class’s preconditions. Exceptions are acceptable for exceptional things like running out of memory or failing to open a database connection. However, they should be avoided when the error is just part of the method’s domain.

For example, a square root method should not throw an exception when passed a negative number. Instead, it should be defined to return either an error (String) if the input number is negative, or the actual result if the number is non-negative.
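A minimal sketch of such an error-by-value square root follows. The name sqrtE is my own, borrowing the E-for-Error suffix convention this article uses, and treating NaN as invalid input is an extra assumption.

```scala
// Error by value: the failure is part of the return type, so callers must
// handle it explicitly instead of catching an exception.
def sqrtE(x: Double): Either[String, Double] =
  if (x.isNaN || x < 0.0d)
    Left(s"x [$x] must be a non-negative number")
  else
    Right(math.sqrt(x))

// Callers can chain on the success path and decide later what to do with errors.
val fourthRootOf16: Either[String, Double] = sqrtE(16.0d).flatMap(r => sqrtE(r))
val failure: Either[String, Double] = sqrtE(-1.0d)
```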

To add “error by value”, we will add an applyE method (where E is for Error) which uses an Either to cover both the valid and the invalid input parameter cases. The method looks like this…

    def applyE(value: Double): Either[List[String], Longitude] =
      generateInvalidStateErrors(value) match {
        case Nil =>
          Right(new Longitude(value))
        case invalidStateErrors =>
          Left(invalidStateErrors)
      }

This looks remarkably similar to the apply method; it is so similar that it is essentially code duplication. So, to remove the duplication, we reimplement the apply method in terms of applyE, which now looks like this…

    def apply(value: Double): Longitude =
      applyE(value) match {
        case Right(longitude) =>
          longitude
        case Left(invalidStateErrors) =>
          throw new IllegalStateException(invalidStateErrors.mkString("|"))
      }

Scastie Snippet and ~~IntelliJ Code Template Gist~~
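Here is the companion assembled with both constructors, followed by a sketch of what error by value buys a client: a batch of raw inputs can be split into valid instances and error lists without any try/catch. The input values are arbitrary, and partitionMap assumes Scala 2.13 or later.

```scala
object Longitude extends (Double => Longitude) {
  def generateInvalidStateErrors(value: Double): List[String] =
    if (value < -180.0d)
      List(s"value of value [$value] must not be less than -180.0d")
    else if (value > 180.0d)
      List(s"value of value [$value] must not be greater than 180.0d")
    else
      Nil

  // Error by value: invalid input is reported in a Left, never thrown.
  def applyE(value: Double): Either[List[String], Longitude] =
    generateInvalidStateErrors(value) match {
      case Nil =>
        Right(new Longitude(value))
      case invalidStateErrors =>
        Left(invalidStateErrors)
    }

  // Error by exception, now implemented in terms of applyE.
  def apply(value: Double): Longitude =
    applyE(value) match {
      case Right(longitude) =>
        longitude
      case Left(invalidStateErrors) =>
        throw new IllegalStateException(invalidStateErrors.mkString("|"))
    }
}
final case class Longitude private(value: Double) {
  private def readResolve(): Object =
    Longitude(value)

  def copy(value: Double = value): Longitude =
    Longitude(value)
}

// Validate a batch without try/catch: error lists on the left, instances on the right.
val (errors, valid) =
  List(-73.9857d, 200.0d, 2.3522d).map(Longitude.applyE).partitionMap(identity)
```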

Implementing this desirably affects the Overview Category…

3: Poor design for FP

Adding Memoization/Caching

With this new pattern in place, all precondition checking now travels through a single method, and so does all instantiation. Assuming immutability has been retained, this makes adding a memoization (a.k.a. caching) strategy trivial.

Here’s an example of the companion object modified to incorporate memoization.

    object Longitude extends (Double => Longitude) {
      private var cachedInvalidStateErrors: Map[Double, List[String]] = Map.empty
      private var cachedInstances: Map[Double, Longitude] = Map.empty

      def generateInvalidStateErrors(value: Double): List[String] = {
        cachedInvalidStateErrors.get(value) match {
          case Some(invalidStateErrors) => invalidStateErrors
          case None =>
            val invalidStateErrors =
              if (value < -180.0d)
                List(s"value of value [$value] must not be less than -180.0d")
              else if (value > 180.0d)
                List(s"value of value [$value] must not be greater than 180.0d")
              else
                Nil
            val newItem = (value, invalidStateErrors)
            cachedInvalidStateErrors = cachedInvalidStateErrors + newItem
            invalidStateErrors
        }
      }

      …

      def applyE(value: Double): Either[List[String], Longitude] =
        generateInvalidStateErrors(value) match {
          case Nil =>
            Right(
              cachedInstances.get(value) match {
                case Some(longitude) => longitude
                case None =>
                  val longitude = new Longitude(value)
                  val newItem = (value, longitude)
                  cachedInstances = cachedInstances + newItem
                  longitude
              }
            )
          case invalidStateErrors =>
            Left(invalidStateErrors)
        }
    }

Scastie Snippet and ~~IntelliJ Code Template Gist~~

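The memoization strategy shown in the above code snippet is for EXAMPLE PURPOSES ONLY because it’s a terrible default strategy: the caches grow without bound, and the unsynchronized vars are not safe to update from multiple threads.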

Please use one of the many other options available. And specifically, investigate ScalaCache. It is a great generalized caching library that allows choosing between different specialized backing implementations.
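If pulling in ScalaCache isn’t an option, here is one hedged, JDK-only alternative: a size-bounded LRU map guarded by synchronized. LruCache, getOrElseUpdate, and entryCount are hypothetical names of my own (this is not ScalaCache’s API), and least-recently-used eviction is just one reasonable policy.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Hypothetical size-bounded, thread-safe LRU cache built on java.util.LinkedHashMap.
final class LruCache[K, V](maxEntries: Int) {
  require(maxEntries > 0, s"maxEntries [$maxEntries] must be greater than 0")

  // accessOrder = true orders entries from least to most recently accessed;
  // removeEldestEntry evicts the least recently used entry once over the bound.
  private val underlying: JLinkedHashMap[K, V] =
    new JLinkedHashMap[K, V](16, 0.75f, true) {
      override protected def removeEldestEntry(eldest: java.util.Map.Entry[K, V]): Boolean =
        this.size() > maxEntries
    }

  // All access is synchronized on the map, so concurrent callers cannot corrupt it.
  def getOrElseUpdate(key: K, compute: => V): V =
    underlying.synchronized {
      if (underlying.containsKey(key))
        underlying.get(key)
      else {
        val computed = compute
        underlying.put(key, computed)
        computed
      }
    }

  def entryCount: Int =
    underlying.synchronized(underlying.size())
}

// Usage: count how often the (stand-in) expensive computation actually runs.
var computations = 0
val cache = new LruCache[Double, String](maxEntries = 2)
def describe(value: Double): String =
  cache.getOrElseUpdate(value, { computations += 1; s"value [$value]" })

describe(1.0d) // miss
describe(1.0d) // hit
describe(2.0d) // miss
describe(3.0d) // miss; evicts 1.0, the least recently used entry
describe(1.0d) // miss again, because 1.0 was evicted
```

The compute block runs while the lock is held, which keeps the sketch simple and avoids duplicate computation, at the cost of serializing cache misses.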

Implementing this desirably affects the Overview Category…

6: Performance impact

Multiple Property Example

To reduce the amount of text to read through, I’ve kept all of the above case class examples limited to a single property (a.k.a. member). Below is a version generalized to three properties.

    object GeoCoordinate3d extends ((Double, Double, Double) => GeoCoordinate3d) {
      val equatorialRadiusInMeters: Double = 6378137.0d

      private var cachedInvalidStateErrors: Map[(Double, Double, Double), List[String]] = Map.empty
      private var cachedInstances: Map[(Double, Double, Double), GeoCoordinate3d] = Map.empty

      def generateInvalidStateErrors(
          longitude: Double
        , latitude: Double
        , altitudeInMeters: Double
      ): List[String] = {
        val tuple3 = (longitude, latitude, altitudeInMeters)
        cachedInvalidStateErrors.get(tuple3) match {
          case Some(invalidStateErrors) => invalidStateErrors
          case None =>
            val invalidStateErrors = {
              List(
                  if (longitude < -180.0d)
                    s"value of longitude [$longitude] must not be less than -180.0d"
                  else if (longitude > 180.0d)
                    s"value of longitude [$longitude] must not be greater than 180.0d"
                  else
                    ""
                , if (latitude < -90.0d)
                    s"value of latitude [$latitude] must not be less than -90.0d"
                  else if (latitude > 90.0d)
                    s"value of latitude [$latitude] must not be greater than 90.0d"
                  else
                    ""
                , if (altitudeInMeters < -equatorialRadiusInMeters)
                    s"value of altitudeInMeters [$altitudeInMeters] must not be less than -${equatorialRadiusInMeters}d"
                  else
                    ""
              ).filter(_.nonEmpty)
            }
            val newItem = (tuple3, invalidStateErrors)
            cachedInvalidStateErrors = cachedInvalidStateErrors + newItem
            invalidStateErrors
        }
      }

      def apply(
          longitude: Double
        , latitude: Double
        , altitudeInMeters: Double
      ): GeoCoordinate3d =
        applyE(longitude, latitude, altitudeInMeters) match {
          case Right(geoCoordinate3d) =>
            geoCoordinate3d
          case Left(invalidStateErrors) =>
            throw new IllegalStateException(invalidStateErrors.mkString("|"))
        }

      def applyE(
          longitude: Double
        , latitude: Double
        , altitudeInMeters: Double
      ): Either[List[String], GeoCoordinate3d] =
        generateInvalidStateErrors(longitude, latitude, altitudeInMeters) match {
          case Nil =>
            val tuple3 = (longitude, latitude, altitudeInMeters)
            Right(
              cachedInstances.get(tuple3) match {
                case Some(geoCoordinate3d) => geoCoordinate3d
                case None =>
                  val geoCoordinate3d = new GeoCoordinate3d(longitude, latitude, altitudeInMeters)
                  val newItem = (tuple3, geoCoordinate3d)
                  cachedInstances = cachedInstances + newItem
                  geoCoordinate3d
              }
            )
          case invalidStateErrors =>
            Left(invalidStateErrors)
        }
    }

    final case class GeoCoordinate3d private(
        longitude: Double
      , latitude: Double
      , altitudeInMeters: Double
    ) {
      private def readResolve(): Object =
        GeoCoordinate3d(longitude, latitude, altitudeInMeters)

      def copy(
          longitude: Double = longitude
        , latitude: Double = latitude
        , altitudeInMeters: Double = altitudeInMeters
      ): GeoCoordinate3d =
        GeoCoordinate3d(longitude, latitude, altitudeInMeters)
    }

Scastie Snippet and ~~IntelliJ Code Template Gist~~

Again, the memoization strategy shown in the above code snippet is for EXAMPLE PURPOSES ONLY because it’s a terrible default strategy.

As suggested previously, please use one of the many other options available, like ScalaCache.
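The payoff of generateInvalidStateErrors returning a List becomes visible with multiple properties: all errors are reported in a single pass, which addresses the first problem with require. Below is a stripped-down sketch (memoization removed) that also swaps the empty-string sentinels for Option values that are flattened; that swap is my own variation, not part of the pattern above, and the coordinates are arbitrary.

```scala
object GeoCoordinate3d extends ((Double, Double, Double) => GeoCoordinate3d) {
  val equatorialRadiusInMeters: Double = 6378137.0d

  // Each check yields Some(error) or None; flatten keeps only the errors.
  def generateInvalidStateErrors(
      longitude: Double
    , latitude: Double
    , altitudeInMeters: Double
  ): List[String] =
    List(
        if (longitude < -180.0d) Some(s"value of longitude [$longitude] must not be less than -180.0d")
        else if (longitude > 180.0d) Some(s"value of longitude [$longitude] must not be greater than 180.0d")
        else None
      , if (latitude < -90.0d) Some(s"value of latitude [$latitude] must not be less than -90.0d")
        else if (latitude > 90.0d) Some(s"value of latitude [$latitude] must not be greater than 90.0d")
        else None
      , if (altitudeInMeters < -equatorialRadiusInMeters)
          Some(s"value of altitudeInMeters [$altitudeInMeters] must not be less than -${equatorialRadiusInMeters}d")
        else None
    ).flatten

  def applyE(
      longitude: Double
    , latitude: Double
    , altitudeInMeters: Double
  ): Either[List[String], GeoCoordinate3d] =
    generateInvalidStateErrors(longitude, latitude, altitudeInMeters) match {
      case Nil => Right(new GeoCoordinate3d(longitude, latitude, altitudeInMeters))
      case invalidStateErrors => Left(invalidStateErrors)
    }

  def apply(
      longitude: Double
    , latitude: Double
    , altitudeInMeters: Double
  ): GeoCoordinate3d =
    applyE(longitude, latitude, altitudeInMeters) match {
      case Right(geoCoordinate3d) => geoCoordinate3d
      case Left(invalidStateErrors) => throw new IllegalStateException(invalidStateErrors.mkString("|"))
    }
}
final case class GeoCoordinate3d private(
    longitude: Double
  , latitude: Double
  , altitudeInMeters: Double
) {
  private def readResolve(): Object =
    GeoCoordinate3d(longitude, latitude, altitudeInMeters)

  def copy(
      longitude: Double = longitude
    , latitude: Double = latitude
    , altitudeInMeters: Double = altitudeInMeters
  ): GeoCoordinate3d =
    GeoCoordinate3d(longitude, latitude, altitudeInMeters)
}

// Every out-of-range property is reported in one pass.
val allErrors: Either[List[String], GeoCoordinate3d] =
  GeoCoordinate3d.applyE(500.0d, -100.0d, -1.0e8d)
```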

Tips & Tricks

Never override the equals and hashCode methods
- When you think you need to do so, use a normal class, and then ensure you very carefully follow the non-trivial full override pattern captured in this StackOverflow post
Avoid using a case class if any state mutability is needed, because the default assumption is that a case class represents a concurrency-safe immutable value
- When state mutability is required, use a normal class and be sure to specify whether it is concurrency safe
Avoid hand-writing enumerations (a kind of FP Sum Type) with the sealed case objects/classes pattern; instead, use automated code generation, because the more code that is generated by the compiler, the fewer the defects, the less the accumulated technical debt, and the smaller the security vulnerability surface area
- In Scala 2.x, prefer using the Enumeratum library
- In Scala 3.x, prefer using the new enum construct
- All enumerations are an FP Sum Type, but not all FP Sum Types are an enumeration

Summary

Even if you find some of the above “boilerplate” undesirable, I hope you enjoyed this and learned something about case classes that makes them more useful in your future Scala software engineering challenges.

Comments

Reddit
Gist (at the bottom)
Twitter

Version History

v2022.03.02
- Changed section “Updates” to “Version History” and moved to end
- Added new section “Multiple Property Example”
- Clarified “Overview Category” lists appended to each section
  - Added new “Performance impact” category
- Clarified use of and reference to FP Sum Type for enumerations
- Added Seth’s clarification around Akka’s use of Serialization
- Introduced acronym DCCP (Default Case Class Pattern)
v2022.02.28
- Adjust based on feedback from various forums
- Added “Comments” section
v2022.02.25a
- Initial version

Contact

[email protected]

chaotic3quilibrium/Effective Scala Case Class Patterns.md

