This is a demo of experimental work into ClojureScript parameter type inference.
The work is in a branch, and if you'd like to try out any of this yourself, you can start up a REPL based on the experimental branch using the following command, and play along at home:
clj -Srepro -Sdeps '{:deps {github-mfikes/gist-1e2341b48b882587500547f6ba19279d {:git/url "https://gist.github.com/mfikes/1e2341b48b882587500547f6ba19279d" :sha "55d6c6e9a1c4fc9b58fec74aef1af4aba57bad2a"}}}' -m cljs.main -co @compile-opts.edn -re node -r
To date, all ClojureScript inference "flows" in a certain direction, essentially from primitive values or type-hinted locals to their use sites. For example, consider this expression:
(let [x 1
      y "a"]
  (+ x y))
The locals x and y are inferred to be of type number and string, so when they are passed to the + macro, a type check is done and you see a warning:
WARNING: cljs.core/+, all arguments must be numbers, got [number string] instead at line 3 <cljs repl>
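The forward direction can be pictured with a toy model. The following JavaScript sketch is not the ClojureScript analyzer; the AST shape and the names inferType and checkPlus are hypothetical. It assumes that a local's type is simply the type of the literal it is bound to, and that an unknown ("any") type never triggers the check.

```javascript
// Toy forward type inference: types flow from literals, through
// let-bound locals, to the place where they are used.
// The AST shape and function names are hypothetical, for illustration only.

// Infer the type of an expression given an environment of local types.
function inferType(expr, env) {
  switch (expr.op) {
    case "const":
      return typeof expr.val; // "number", "string", "boolean", ...
    case "local":
      return env[expr.name] ?? "any"; // unknown locals are "any"
    default:
      return "any";
  }
}

// Mimic the + check: every argument whose type is known must be a number.
function checkPlus(args, env) {
  const types = args.map((a) => inferType(a, env));
  const ok = types.every((t) => t === "number" || t === "any");
  return ok
    ? null
    : `WARNING: cljs.core/+, all arguments must be numbers, got [${types.join(" ")}] instead`;
}

// (let [x 1 y "a"] (+ x y))
const env = {};
env.x = inferType({ op: "const", val: 1 }, env);
env.y = inferType({ op: "const", val: "a" }, env);
const warning = checkPlus(
  [{ op: "local", name: "x" }, { op: "local", name: "y" }],
  env
);
console.log(warning);
```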
The same holds for a new feature that has landed on master, which infers function return types by having them "flow" out of the function body. For example, with
(defn foo []
  "abc")
then (+ 1 (foo)) will emit the same kind of warning, indicating that you are adding a number and a string. In this case, the types flow in the same direction: from the place where they are known to the place where they are used.
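Return-type inference fits the same toy model: the type of a function's last body expression becomes the type of every call to it. This is another hypothetical JavaScript sketch (analyzeDefn and fns are illustrative names, not compiler APIs), using a zero-argument foo whose body is the string literal "abc".

```javascript
// Toy return-type inference: a function's return type "flows out" of the
// last expression in its body. Hypothetical AST shape, illustration only.

function inferType(expr, env, fns) {
  switch (expr.op) {
    case "const":
      return typeof expr.val;
    case "local":
      return env[expr.name] ?? "any";
    case "invoke": {
      // A call's type is whatever the callee's body was inferred to return.
      const fn = fns[expr.fn];
      return fn ? fn.ret : "any";
    }
    default:
      return "any";
  }
}

// Analyze a function definition: its return type is the type of the
// last body expression. Parameters start out unknown ("any").
function analyzeDefn(name, params, body, fns) {
  const env = Object.fromEntries(params.map((p) => [p, "any"]));
  const ret = inferType(body[body.length - 1], env, fns);
  fns[name] = { params, ret };
  return fns[name];
}

const fns = {};
// foo takes no arguments and its body is the literal "abc".
analyzeDefn("foo", [], [{ op: "const", val: "abc" }], fns);

// (+ 1 (foo)): both argument types are now known at the call site.
const argTypes = [
  { op: "const", val: 1 },
  { op: "invoke", fn: "foo", args: [] },
].map((e) => inferType(e, {}, fns));
console.log(JSON.stringify(argTypes)); // ["number","string"]
```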
The work in the experimental branch adds the ability for types to "flow" in the opposite direction, by inferring the types of function parameters based on their use.
For example, in
(defn bar [x]
  (inc x))
we know that x must be of type number simply because it is being passed to inc. This experimental branch recognizes this situation and effectively treats it as if you had type hinted x like this:
(defn bar [^number x]
  (inc x))
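A minimal sketch of the reverse flow, again as a toy JavaScript model rather than the branch's actual implementation: a hypothetical signatures table stands in for known parameter types (inc takes a number), and a parameter passed directly to such a function takes on that argument position's type. The sketch only inspects top-level calls and lets the first use win; nested calls, conflicting uses, and shadowing are ignored.

```javascript
// Toy backward (parameter) inference: a parameter's type is taken from the
// parameter type of the function it is passed to.
// The signature table and AST shape are hypothetical.

// Known parameter types for some callees.
const signatures = {
  inc: ["number"],
  dec: ["number"],
};

// body is a list of top-level calls: { op: "invoke", fn, args }.
function inferParamTypes(params, body) {
  // Every parameter starts out unknown ("any").
  const types = Object.fromEntries(params.map((p) => [p, "any"]));
  for (const call of body) {
    const sig = signatures[call.fn] || [];
    call.args.forEach((arg, i) => {
      // Only still-unknown parameters are refined; first use wins.
      if (arg.op === "local" && types[arg.name] === "any" && sig[i] !== undefined) {
        types[arg.name] = sig[i];
      }
    });
  }
  return params.map((p) => types[p]);
}

// bar's body is (inc x)
const barTypes = inferParamTypes(["x"], [
  { op: "invoke", fn: "inc", args: [{ op: "local", name: "x" }] },
]);
console.log(JSON.stringify(barTypes)); // ["number"]
```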
Additionally, this branch adds a new type mismatch warning to let you know if the inferred type of an argument doesn't match the type of the corresponding parameter. (The parameter's type can be hinted as above, or inferred by the new code in the branch.)
So, if you evaluate the following:
(let [x "abc"]
  (bar x))
you will get a type mismatch warning:
WARNING: Type mismatch calling bar: expected [number], got [string] instead at line 2 <cljs repl>
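The call-site check can be sketched the same way. In this hypothetical JavaScript helper (checkCall is illustrative, not part of the compiler), a mismatch is reported only when both the parameter type and the argument type are known and differ; the warning text copies the shape of the message above, without the location. Arity errors are out of scope here.

```javascript
// Toy call-site check: compare inferred argument types with the callee's
// hinted or inferred parameter types. Name and data shapes are hypothetical.
function checkCall(fnName, paramTypes, argTypes) {
  // "any" on either side means "unknown", which is never a mismatch.
  const mismatch = paramTypes.some(
    (p, i) => p !== "any" && argTypes[i] !== "any" && p !== argTypes[i]
  );
  return mismatch
    ? `WARNING: Type mismatch calling ${fnName}: expected [${paramTypes.join(" ")}], got [${argTypes.join(" ")}] instead`
    : null;
}

// bar's parameter was inferred as number; the local x is a string.
console.log(checkCall("bar", ["number"], ["string"]));

// The same helper applied to swapped subs'-style arguments.
console.log(checkCall("subs'", ["string", "number"], ["number", "string"]));
```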
Inference can take you pretty far. Let's say that subs had its arguments hinted. It doesn't, so let's make our own for the purpose of experimentation:
(defn subs' [^string s ^number n]
  (.substring s n))
Check that this works by swapping the argument order: evaluate (subs' 1 "a") and you'll get
WARNING: Type mismatch calling subs': expected [string number], got [number string] instead
Now let's try something a bit more complicated:
(defn baz [x y]
  (+ x y))
(defn quux [x y z]
  (subs' z (baz x y)))
With no types in sight, let's see how it fares. Evaluating
(quux "a" 3 true)
yields
WARNING: Type mismatch calling quux: expected [number number string], got [string number boolean] instead
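What makes quux work is that inference is transitive: once baz has an inferred signature, quux's call to baz can use it. Here is a toy JavaScript sketch of that idea; the signatures table, the defn helper, and the AST shape are hypothetical, and + is modeled as taking exactly two numbers even though the real cljs.core/+ is variadic.

```javascript
// Toy transitive parameter inference: each inferred signature is recorded
// so that later functions can rely on it. All names are hypothetical.

const signatures = {
  "+": ["number", "number"],
  "subs'": ["string", "number"], // hinted: [^string s ^number n]
};

function inferParamTypes(params, body) {
  const types = Object.fromEntries(params.map((p) => [p, "any"]));
  function walk(expr) {
    if (expr.op !== "invoke") return;
    const sig = signatures[expr.fn] || [];
    expr.args.forEach((arg, i) => {
      const isParam = arg.op === "local" && arg.name in types;
      // First use wins; conflicting uses are not handled in this sketch.
      if (isParam && types[arg.name] === "any" && sig[i] !== undefined) {
        types[arg.name] = sig[i];
      }
      walk(arg); // descend into nested calls such as (baz x y)
    });
  }
  body.forEach(walk);
  return params.map((p) => types[p]);
}

// Analyze a definition and record its inferred signature for later callers.
function defn(name, params, body) {
  signatures[name] = inferParamTypes(params, body);
  return signatures[name];
}

const local = (name) => ({ op: "local", name });
const invoke = (fn, ...args) => ({ op: "invoke", fn, args });

// baz: (+ x y)
defn("baz", ["x", "y"], [invoke("+", local("x"), local("y"))]);
// quux: (subs' z (baz x y))
const quuxTypes = defn("quux", ["x", "y", "z"], [
  invoke("subs'", local("z"), invoke("baz", local("x"), local("y"))),
]);
console.log(JSON.stringify(quuxTypes)); // ["number","number","string"]
```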
While that's cool, it is really just a natural extension of ClojureScript type inference along a new dimension: useful, but arguably covered more powerfully by Spec. If this makes it into ClojureScript, it would still be nice to have, helping catch simple mistakes when you are not using Spec.
But here's the real reason I was motivated to look into this:
Consider a function like the following:
(defn xyzzy [x]
  (subs' x (dec (count x))))
In this case, because x is being passed to subs', we know that it is of type string. Because of this, we can make use of that information inside of xyzzy.
In general, while type inference in ClojureScript can help you catch type errors, it is really there so that the ClojureScript compiler can leverage type information to generate more optimal JavaScript. Perhaps the oldest optimization of this kind is the elision of checks in the code generated for if constructs. (See Boolean Type Hints in ClojureScript.)
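The reason the if check exists at all: ClojureScript treats only nil and false as falsey, while JavaScript also treats 0, "" and NaN as falsey, so a test of unknown type has to go through a truthiness helper. The hypothetical truth function below mirrors those semantics (it is not copied from cljs.core) and shows why a test known to be boolean can be used directly.

```javascript
// ClojureScript truthiness: only nil (null/undefined) and false are falsey.
// This local helper mirrors those semantics; it is not the cljs.core source.
function truth(x) {
  return x != null && x !== false;
}

// Values where JavaScript and ClojureScript truthiness disagree.
const disagreements = [0, "", NaN].filter((v) => Boolean(v) !== truth(v));

// For booleans the two agree, so a boolean-typed test can skip the helper.
const booleansAgree = [true, false].every((b) => truth(b) === b);

console.log(disagreements.length, booleansAgree); // 3 true
```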
There is another similar optimization on the table that eliminates unneeded string coercions in the generated code when the compiler can infer that an argument to the str macro is of type string.
In particular, x is being passed to count in xyzzy, and since it is known to be of type string, we need not actually invoke the count runtime function (which needlessly employs a cond checking for various types): we can just cut to the chase and call (.-length x).
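To see what the count elision saves, here is a simplified, hypothetical generic count in JavaScript. It is not cljs.core.count, only an illustration of a function that must test what it was given before it can answer, compared with the direct .length access that suffices once the argument is known to be a string.

```javascript
// Simplified stand-in for a generic count that must check what kind of
// collection it received before counting (hypothetical, not cljs.core.count).
function genericCount(coll) {
  if (coll == null) return 0;
  if (typeof coll.count === "function") return coll.count(); // "counted" objects
  if (Array.isArray(coll)) return coll.length;
  if (typeof coll === "string") return coll.length;
  if (typeof coll[Symbol.iterator] === "function") {
    let n = 0;
    for (const _ of coll) n++; // walk anything else that is iterable
    return n;
  }
  throw new Error("count not supported on this type");
}

// When the argument is known to be a string, every check above is dead
// weight: the answer is just the .length property.
const countString = (s) => s.length;

const s = "hello";
console.log(genericCount(s) === countString(s)); // true
```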
There is a candidate optimization that would leverage the inferred type for count and do exactly this. That experimental optimization is on this branch, so if you look at the code generated for xyzzy, you'll see that it uses x.length instead of cljs.core.count(x):
function cljs$user$xyzzy(x){
  return cljs.user.subs_SINGLEQUOTE_.call(null,x,(x.length - (1)));
}
Note: You can see the JavaScript for xyzzy by doing (set! *print-fn-bodies* true) and then evaluating xyzzy.
If you are using :advanced optimizations, this JavaScript will be further optimized, with subs' being inlined and the code ending up looking like:
function cljs$user$xyzzy(x){
  return x.substring(x.length - 1);
}
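Both JavaScript shapes can be exercised in plain JavaScript. In this sketch, subsPrime, xyzzyElided, and xyzzyInlined are hypothetical stand-ins for the compiled ClojureScript, and the sample calls show that xyzzy returns the last character of its argument.

```javascript
// Plain-JavaScript renditions of the two xyzzy shapes, so their behavior
// can be compared outside ClojureScript. All names are hypothetical.

// Stand-in for subs': (.substring s n)
function subsPrime(s, n) {
  return s.substring(n);
}

// Shape after count elision: subs' is still called, count became .length.
function xyzzyElided(x) {
  return subsPrime(x, x.length - 1);
}

// Shape after :advanced inlining of subs'.
function xyzzyInlined(x) {
  return x.substring(x.length - 1);
}

console.log(xyzzyElided("hello"), xyzzyInlined("hello")); // o o
```

On the empty string, x.length - 1 is -1, and String.prototype.substring clamps negative indices to 0, so both versions return "".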
This kind of optimization can help existing code that uses count when it can be inferred that the thing being counted is a string. Here is an example in cljs.pprint.
In general, the more we can infer within ClojureScript, especially around primitive types like boolean, number, string, array, etc., the more opportunities we will be able to identify for generating optimal code. I think the ideal situation would be for the compiler to automatically generate optimal code whenever it can see that doing so is possible (without any manual type hinting), while still retaining the Lispy dynamic runtime goodness that we all love about the language.