@frenchy64
Created March 20, 2026 05:27
Malli: ref schema Performance Optimizations (Clojurists Together 2026 proposal)


In my previous Clojurists Together work on Malli, I improved the performance of validating recursive refs, bounding the memory required for validation regardless of the depth of input values. This works by eagerly expanding recursive schemas until their recursion points are discovered, instead of lazily realizing and caching (an unbounded number of) new levels of recursion as inputs require them.
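To make "eagerly expanding until recursion points are discovered" concrete, here is a minimal sketch over a hypothetical toy schema language. The names `compile-validator` and `:demo/list`, and the `:int`/`:nil`/`:tuple`/`:or` forms, are assumptions for illustration only, not Malli's API or its actual implementation. Refs are expanded as soon as they are seen; when a ref is reached while it is still being compiled, the sketch reuses a promise of the in-progress validator instead of expanding another level, so the compiled validator has a fixed size however deep the input is.

```clojure
(ns example.eager-refs)

;; Hypothetical toy schema language (NOT Malli's API):
;;   :int, :nil           leaf schemas
;;   [:tuple s1 s2 ...]   vector with exactly one element per child schema
;;   [:or s1 s2 ...]      valid if any child schema is valid
;;   :some/name           qualified keyword: a reference into `registry`

(defn compile-validator
  "Compiles `schema` into a predicate, expanding refs eagerly.
  `in-progress` maps each ref currently being expanded to a promise of its
  validator. Meeting such a ref again marks a recursion point: the promise
  is reused instead of expanding another level, so the compiled validator
  has a fixed size no matter how deep the inputs are."
  ([registry schema] (compile-validator registry schema {}))
  ([registry schema in-progress]
   (cond
     (= :int schema) int?
     (= :nil schema) nil?

     (qualified-keyword? schema)
     (if-let [p (get in-progress schema)]
       ;; recursion point: call whatever validator `p` is eventually given
       (fn [x] (@p x))
       (let [p (promise)
             v (compile-validator registry (get registry schema)
                                  (assoc in-progress schema p))]
         (deliver p v)
         v))

     (vector? schema)
     (let [[op & children] schema
           vs (mapv #(compile-validator registry % in-progress) children)]
       (case op
         :tuple (fn [x]
                  (and (vector? x)
                       (= (count x) (count vs))
                       (every? boolean (map (fn [v c] (v c)) vs x))))
         :or (fn [x] (boolean (some (fn [v] (v x)) vs)))))

     :else (throw (ex-info "Unknown schema" {:schema schema})))))
```

For example, `(compile-validator {:demo/list [:or :nil [:tuple :int :demo/list]]} :demo/list)` yields one predicate that accepts `nil`, `[1 nil]` or `[1 [2 [3 nil]]]`. Note that this sketch still expands a non-recursive ref once per occurrence; that duplication is what drives up the upfront memory cost described in this proposal.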

While this improves the reliability of long-running systems by preventing memory leaks when validating large inputs, it comes with a drawback: more memory is required upfront, during validator compilation. Metabase has run into this obstacle while testing the optimization. While they are excited that validating nested structures now uses constant memory, the amount of upfront memory required was uncomfortably high.

There are two main ways to tackle this problem.

The first is to discover recursion points lazily. This would reduce the initial memory use of recursive validators. However, while maximum memory use would remain bounded, it would still grow over time as large inputs are validated. This makes it difficult to predict how much memory to reserve for the JVM, hurting the reliability of long-running systems.

The second is to reduce the maximum memory usage itself. In fact, the optimization interfered with a custom optimization Metabase introduced to share schemas across their usages. This points to both the problem and the solution: memory increased because identical schemas got a distinct validator each time they were used, so the solution is to ensure that references to the same schema point to the same Schema object and validator.
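A minimal sketch of that idea, using a hypothetical toy compiler rather than Malli's internals: schemas are `[:tuple child ...]` vectors or qualified keywords naming registry entries, and `make-compiler`, `share?` and the `compiled` counter are illustrative names only. With sharing enabled, each registry key is compiled at most once and every reference reuses the cached validator. Recursive refs and scoping are deliberately left out.

```clojure
(ns example.shared-refs)

;; Hypothetical toy compiler (NOT Malli's implementation). Schemas are
;; [:tuple child ...] vectors or qualified keywords naming registry entries.
;; Recursive refs and scoping are intentionally not handled here.

(defn make-compiler
  "Returns a function that compiles schemas against `registry`.
  When `share?` is true, each registry key is compiled at most once and
  every reference to it reuses the same cached validator. The atom
  `compiled` is incremented once per :tuple validator built."
  [registry share? compiled]
  (let [cache (atom {})]
    (letfn [(compile-tuple [schema]
              (let [vs (mapv compile* (rest schema))]
                (swap! compiled inc)
                (fn [x]
                  (and (vector? x)
                       (= (count x) (count vs))
                       (every? boolean (map (fn [v c] (v c)) vs x))))))
            (compile* [schema]
              (cond
                (not (keyword? schema)) (compile-tuple schema)
                (not share?) (compile* (get registry schema))
                :else (or (get @cache schema)
                          (let [v (compile* (get registry schema))]
                            (swap! cache assoc schema v)
                            v))))]
      compile*)))
```

On a three-level registry where each level references the one below four times, compiling the top level builds 21 tuple validators without sharing and 3 with it.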

This problem can be quite severe in the general case (which is probably why Metabase needed to patch it). In the following reproduction, the schema `[:tuple]` can be compiled over 4 million times while creating a single validator:

```clojure
(require '[malli.core :as m])

(def registry {::creates-1-validator [:tuple]
               ::creates-4-validators [:tuple ::creates-1-validator ::creates-1-validator ::creates-1-validator ::creates-1-validator]
               ::creates-16-validators [:tuple ::creates-4-validators ::creates-4-validators ::creates-4-validators ::creates-4-validators]
               ::creates-64-validators [:tuple ::creates-16-validators ::creates-16-validators ::creates-16-validators ::creates-16-validators]
               ::creates-256-validators [:tuple ::creates-64-validators ::creates-64-validators ::creates-64-validators ::creates-64-validators]
               ::creates-1024-validators [:tuple ::creates-256-validators ::creates-256-validators ::creates-256-validators ::creates-256-validators]
               ::creates-4096-validators [:tuple ::creates-1024-validators ::creates-1024-validators ::creates-1024-validators ::creates-1024-validators]
               ::creates-16384-validators [:tuple ::creates-4096-validators ::creates-4096-validators ::creates-4096-validators ::creates-4096-validators]
               ::creates-65536-validators [:tuple ::creates-16384-validators ::creates-16384-validators ::creates-16384-validators ::creates-16384-validators]
               ::creates-262144-validators [:tuple ::creates-65536-validators ::creates-65536-validators ::creates-65536-validators ::creates-65536-validators]
               ::creates-1048576-validators [:tuple ::creates-262144-validators ::creates-262144-validators ::creates-262144-validators ::creates-262144-validators]
               ::creates-4194304-validators [:tuple ::creates-1048576-validators ::creates-1048576-validators ::creates-1048576-validators ::creates-1048576-validators]})

;; Each level references the level below four times, so building this one
;; validator compiles the [:tuple] validator 4^11 = 4,194,304 times.
(m/validator ::creates-4194304-validators {:registry (merge (m/default-schemas) registry)})
```

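With this registry, the schema at depth N (counting `::creates-1-validator` as depth 0) compiles `(m/validator ::creates-1-validator)` 4^N times, because each level references the level below it four times.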

For example, `(m/validator ::creates-4194304-validators)` compiles `(m/validator ::creates-1-validator)` 4,194,304 (4^11) times.
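The counts follow from a short recurrence, assuming every reference is compiled independently (no sharing). Writing $C(N)$ for the number of times `::creates-1-validator` is compiled when building the depth-$N$ validator, and $T(N)$ for the total number of `:tuple` validators built across all levels:

$$
C(0) = 1, \qquad C(N) = 4\,C(N-1) \;\Longrightarrow\; C(N) = 4^{N}, \qquad C(11) = 4^{11} = 4{,}194{,}304
$$

$$
T(N) = \sum_{k=0}^{N} 4^{k} = \frac{4^{N+1} - 1}{3}, \qquad T(11) = \frac{4^{12} - 1}{3} = 5{,}592{,}405
$$

If each of the 12 registry entries instead compiled to one shared validator, building the same validator would create only $N + 1 = 12$ of them.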

Reproduction: https://github.com/frenchy64/malli/pull/36/files

Plumatic Schema would compile it only once. This is less trivial to achieve with Malli's dynamically scoped refs, but it relies on the same idea as detecting ref cycles, which we can now do reliably.
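Why dynamic scoping complicates sharing can be sketched with another hypothetical toy compiler (again not Malli's implementation; `compile-scoped` and the `[:scope {local-registry} child]` form are invented for illustration, loosely standing in for a schema that carries its own local registry). Because a scope can rebind a name, the same keyword may denote different schemas in different places, so this sketch keys its cache on the pair of registry-in-scope and ref name rather than on the name alone.

```clojure
(ns example.scoped-refs)

;; Hypothetical toy compiler (NOT Malli's implementation). Schemas:
;;   :int, :string                 leaf schemas
;;   [:tuple child ...]            vector with one element per child
;;   [:scope {ref schema} child]   compile child with extra local refs
;;   :some/name                    ref, resolved in the registry in scope
;; Since :scope can rebind a name, the cache key is [registry-in-scope ref],
;; not the ref name alone. Recursive refs are not handled here.

(defn compile-scoped
  "Compiles `schema` against `root-registry`, sharing one validator per
  distinct (registry, ref) pair. The atom `compiled` is incremented once
  per ref validator actually built (i.e. per cache miss)."
  [root-registry schema compiled]
  (let [cache (atom {})]
    (letfn [(compile* [registry schema]
              (cond
                (= :int schema) int?
                (= :string schema) string?

                (qualified-keyword? schema)
                (let [k [registry schema]]
                  (or (get @cache k)
                      (let [v (compile* registry (get registry schema))]
                        (swap! compiled inc)
                        (swap! cache assoc k v)
                        v)))

                (and (vector? schema) (= :scope (first schema)))
                (let [[_ local child] schema]
                  (compile* (merge registry local) child))

                (and (vector? schema) (= :tuple (first schema)))
                (let [vs (mapv #(compile* registry %) (rest schema))]
                  (fn [x]
                    (and (vector? x)
                         (= (count x) (count vs))
                         (every? boolean (map (fn [v c] (v c)) vs x)))))

                :else (throw (ex-info "Unknown schema" {:schema schema}))))]
      (compile* root-registry schema))))
```

Compiling `[:tuple :demo/id :demo/id [:scope {:demo/id :string} [:tuple :demo/id :demo/id]]]` against `{:demo/id :int}` builds two ref validators: one for `:demo/id` as `:int` and one for `:demo/id` rebound to `:string`. Keying on the whole registry map keeps the sketch correct, but hashing and comparing large registry maps is not free; a real implementation would likely need a cheaper way to identify scopes.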

In this project, I intend to investigate ways to improve this situation so that systems like Metabase can take advantage of the constant memory usage of recursive ref validators without incurring prohibitive upfront memory costs, ideally in a way that allows custom workarounds for Malli's high memory use to be removed.
