I've read about the subject, but I want to take a closer look. The following analysis is an ECA (Exploratory Clojure Analysis) of data readers, and I hope more people can benefit from it.
Data readers are functions used to read specific tagged literals. For example, #inst "2020-10-05" is a built-in literal in Clojure, and if you type it in the REPL, you will get a java.util.Date instance back.
The magic happens because Clojure has a mechanism that lets us define a loose contract between a tagged literal and its implementation, that is, the code that is executed to transform the self-describing data into something else.
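To make that contract visible, here is a small REPL sketch. It assumes nothing beyond clojure.core: default-data-readers is the built-in map from tag symbols to the vars that implement them, and the name parsed is only there for illustration.

```clojure
;; Read the literal from a string, the same way the REPL reader would.
(def parsed (read-string "#inst \"2020-10-05\""))

(type parsed)
;; => java.util.Date

;; The built-in tags and the var that currently implements #inst.
(sort (keys default-data-readers))
;; => (inst uuid)

(get default-data-readers 'inst)
;; => #'clojure.instant/read-instant-date
```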
For example, if you want to change the default implementation that handles the #inst literal, you can proceed like this:
(binding [*data-readers* {'inst #'clojure.instant/read-instant-timestamp}]
  (type (read-string "#inst \"2020-10-05\"")))
;; => java.sql.Timestamp

(binding [*data-readers* {'inst #'clojure.instant/read-instant-calendar}]
  (type (read-string "#inst \"2020-10-05\"")))
;; => java.util.GregorianCalendar
You can also add one of these options to the data_readers.clj file, or even create a completely new one. If you do so, your whole application will use the new definition for the conversion.
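As a rough sketch of what that could look like, a data_readers.clj at the root of the classpath might contain the map below. The file location (src/) is an assumption about your project layout; the docstring of *data-readers* states that default tags may be overridden this way.

```clojure
;; src/data_readers.clj (assumed location: any classpath root works)
;; Maps the tag symbol to the fully qualified symbol of the reader function.
{inst clojure.instant/read-instant-timestamp}
```

With this file in place, every #inst literal read by the Clojure reader in the application should come back as a java.sql.Timestamp instead of a java.util.Date.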
Next, we will look at how to create custom data readers.
Clojure enables you to create your own data readers too. This can be useful for sharing context between services.
Currently, I work in the financial industry with Private Credit, and there are some special values that I wish I could encode differently to add more context and validation around them. Let's see some examples and how to achieve it.
(All these rules to create the context are purely illustrative...)
- Internal Rate of Return (IRR)
  - Always BigDecimals
  - Never negative
- Credit Score
  - Always Integer
  - Never larger than 1000
- Credit Risk
  - Always a Single Letter
  - Letter in set {A,B,C,D,E}
First, create the following file, src/data_readers.clj:
{company/irr          wand.util.readers/irr
 company/credit-score wand.util.readers/credit-score
 company/credit-risk  wand.util.readers/credit-risk}
(wand is the name of my prototype project)
As a best practice, you should always use namespace-qualified tags for your own literals; unqualified tags are reserved for Clojure.
The content of the file src/wand/util/readers.clj:
(ns wand.util.readers)

(defn irr [form]
  form)

(defn credit-score [form]
  form)

(defn credit-risk [form]
  form)
You can now connect to your REPL and, in the src/wand/core.clj namespace, try:
(ns wand.core
  (:require [wand.util.readers]))

#company/credit-score 100
;; => 100
Do not forget to require the implementations. Now, we can implement the rules behind the tagged literals.
(defn irr
  [form]
  ;; "Never negative": zero is allowed, only values below zero are rejected
  (if (neg? form)
    (throw (ex-info (str form " is negative!") {}))
    (BigDecimal. form)))

(defn credit-score
  [form]
  (if (<= form 1000)
    (Integer. form)
    (throw (ex-info (str form " is larger than 1000!") {}))))

(defn credit-risk
  [form]
  (if (contains? #{"A", "B", "C", "D", "E"} form)
    form
    (throw (ex-info (str form " is not in the set{A,B,C,D,E}!") {}))))
Cool!! Now, if you need to create a map with some custom data, you can offload the burden of checking the validity of each value to the tagged literal.
{:score 232
 :risk  "V"
 :irr   2.1}
is a perfectly valid map. However,
{:score #company/credit-score 232
 :risk  #company/credit-risk "V"
 :irr   #company/irr 2.1}
is not!
1. Caused by clojure.lang.ExceptionInfo
V is not in the set{A,B,C,D,E}!
{}
Great!!
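If you want to reproduce this check at a REPL without a data_readers.clj on the classpath, a minimal sketch is to bind *data-readers* by hand and catch the failure. The helpers root-cause and try-read are hypothetical names introduced only for this example, and credit-risk is a local copy of the reader function.

```clojure
;; Local copy of the credit-risk reader so the example is self-contained.
(defn credit-risk [form]
  (if (contains? #{"A" "B" "C" "D" "E"} form)
    form
    (throw (ex-info (str form " is not in the set{A,B,C,D,E}!") {}))))

;; Hypothetical helper: walk to the innermost cause, in case the reader
;; wraps the exception thrown by the reader function.
(defn root-cause [^Throwable t]
  (if-let [cause (.getCause t)]
    (recur cause)
    t))

;; Hypothetical helper: read a string with the custom tag installed and
;; return either the value or a map describing the failure.
(defn try-read [s]
  (binding [*data-readers* {'company/credit-risk credit-risk}]
    (try
      (read-string s)
      (catch Exception e
        {:error (.getMessage (root-cause e))}))))

(try-read "#company/credit-risk \"B\"")
;; => "B"

(try-read "#company/credit-risk \"V\"")
;; => {:error "V is not in the set{A,B,C,D,E}!"}
```

The point is that validation happens at read time, before the rest of your code ever sees the value.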
Now, let's say that we want to read an EDN file that was produced by another service that shares the same meaning for our custom literals.
The content of our credit.edn file:
{:score #company/credit-score 232
 :risk  #company/credit-risk "B"
 :irr   #company/irr 2.1}
All great! Let's read it
(require '[clojure.edn :as edn] '[clojure.java.io :as io])
(edn/read-string (slurp (io/resource "credit.edn")))
And... boom!
1. Unhandled java.lang.RuntimeException
No reader function for tag company/credit-score
What? We defined the data readers and they all work just fine. Yes, but life is not so easy... There are two nice resources that discuss the safety problems of reading data and evaluating untrusted code.
The first is the great documentation entry provided for the clojure.edn/read function, and the second is this great talk from Steve Miner (The Data-Reader's Guide to the Galaxy).
So, I will assume you have at least read the documentation entry, and let's be confident that no malicious code was added to the implementation of our three custom literals.
We should explicitly pass our readers to edn/read-string:
(edn/read-string {:readers *data-readers*}
                 (slurp (io/resource "credit.edn")))
;; => {:score 232, :risk "B", :irr 2.100000000000000088817841970012523233890533447265625M}
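Two details of this call may be worth a closer look. First, instead of handing over everything in *data-readers*, you can pass a map with only the tags you trust. Second, the long decimal expansion in the output above comes from constructing a BigDecimal directly from the double 2.1; bigdec goes through BigDecimal/valueOf and keeps the short form. The sketch below assumes a local, simplified irr function and a hypothetical trusted-readers map, and reads from a string instead of a resource so it runs on its own.

```clojure
(require '[clojure.edn :as edn])

;; Simplified stand-in for wand.util.readers/irr that uses bigdec.
(defn irr [form]
  (if (neg? form)
    (throw (ex-info (str form " is negative!") {}))
    (bigdec form)))

;; Hypothetical allow-list: only tags listed here are accepted
;; (plus the built-in #inst and #uuid).
(def trusted-readers {'company/irr irr})

(def result
  (edn/read-string {:readers trusted-readers}
                   "{:irr #company/irr 2.1}"))
;; result => {:irr 2.1M}

;; Unknown tags still fail unless a :default handler is supplied.
;; tagged-literal keeps the tag and the form as plain data.
(def unknown
  (edn/read-string {:readers trusted-readers
                    :default tagged-literal}
                   "#other/tag 1"))
;; (:tag unknown)  => other/tag
;; (:form unknown) => 1
```

Whether an allow-list or *data-readers* is the better choice depends on how much you trust the source of the EDN.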
And if someone tries to cheat you and hands over the following EDN file:
{:score #company/credit-score 2000
 :risk  #company/credit-risk "B"
 :irr   #company/irr 2.1}
You will spot it right away when reading:
1. Unhandled clojure.lang.ExceptionInfo
2000 is larger than 1000!
{}
Let's get fancy! Let's say we want to provide special printing for our Score, Risk, and IRR values. After all, when you print an #inst literal, you get a nice #inst ... value back at your REPL. Also, we will encode these company-specific values into their own types.
To enable such functionality, I will encode our values into custom records and extend the print-method multimethod to them.
(ns wand.util.readers)

(defrecord Irr [value])
(defrecord CreditScore [value])
(defrecord CreditRisk [value])

(defn irr
  [form]
  ;; "Never negative": zero is allowed, only values below zero are rejected
  (if (neg? form)
    (throw (ex-info (str form " is negative!") {}))
    (->Irr (BigDecimal. form))))

(defn credit-score
  [form]
  (if (<= form 1000)
    (->CreditScore (Integer. form))
    (throw (ex-info (str form " is larger than 1000!") {}))))

(defn credit-risk
  [form]
  (if (contains? #{"A", "B", "C", "D", "E"} form)
    (->CreditRisk form)
    (throw (ex-info (str form " is not in the set{A,B,C,D,E}!") {}))))

;; Custom REPL printing for each record type
(defmethod print-method wand.util.readers.Irr [irr ^java.io.Writer w]
  (.write w (format "#company/irr %s%%" (:value irr))))

(defmethod print-method wand.util.readers.CreditRisk [credit-risk ^java.io.Writer w]
  (.write w (format "#company/credit-risk %s Level" (:value credit-risk))))

(defmethod print-method wand.util.readers.CreditScore [credit-score ^java.io.Writer w]
  (.write w (format "#company/credit-score %s Points" (:value credit-score))))
And our example in the core namespace becomes:
{:score #company/credit-score 232
 :risk  #company/credit-risk "C"
 :irr   #company/irr 2.1}
;; => {:score #company/credit-score 232 Points,
;;     :risk #company/credit-risk C Level,
;;     :irr #company/irr 2.100000000000000088817841970012523233890533447265625%}
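The printed forms above are pleasant for humans, but the extra "Points", "Level", and "%" mean the output cannot be read back in, unlike #inst. If round-tripping matters, one possible variation is to print exactly the tag followed by the raw value. The sketch below is an assumption about how you might do that, shown for a single record and using bigdec so the value prints as 2.1M.

```clojure
(defrecord Irr [value])

;; Reader function: validates and wraps the value in a record.
(defn irr [form]
  (if (neg? form)
    (throw (ex-info (str form " is negative!") {}))
    (->Irr (bigdec form))))

;; Print the tag and then delegate to print-method for the wrapped value,
;; so the output is itself a valid tagged literal.
(defmethod print-method Irr [x ^java.io.Writer w]
  (.write w "#company/irr ")
  (print-method (:value x) w))

(def original (irr 2.1M))

(def printed (pr-str original))
;; printed => "#company/irr 2.1M"

;; Reading the printed form back yields an equal record.
(def round-tripped
  (binding [*data-readers* {'company/irr irr}]
    (read-string printed)))

(= original round-tripped)
;; => true
```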
I don't know if I find this sooo useful now. But we can do that. :)
I can see the benefits of data readers for ensuring a shared understanding of a piece of data between (and within) services, and for enabling different contexts to handle the data transformation in their own way.
However, as we currently do not leverage this approach much at work, I don't know exactly what the shortcomings of this decision are in the long run.
I would love to hear from people about their experiences with a codebase that heavily leverages custom tagged literals.