Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save coldnew/0ac9d73432d0ebe1b77411018e8a6135 to your computer and use it in GitHub Desktop.
Save coldnew/0ac9d73432d0ebe1b77411018e8a6135 to your computer and use it in GitHub Desktop.
Clojure Design Decisions

Rich Already Answered That!

A list of commonly asked questions, design decisions, reasons why Clojure is the way it is as they were answered directly by Rich (even when from many years ago, those answers are pretty much valid today!). Feel free to point friends and colleagues here next time they ask (again). Answers are pasted verbatim (I've made small adjustments for readibility, but never changed a sentence) from mailing lists, articles, chats. The link points back at them.

If you are talking about the aspect of pattern matching that acts as a conditional based upon structure, I'm not a big fan. I feel about them the way I do about switch statements - they're brittle and inextensible. I find the extensive use of them in some functional languages an indication of the lack of polymorphism in those languages. Some simple uses are fine - i.e. empty list vs not, but Clojure's if handles that directly. More complex pattern matches end up being switches on type, with the following negatives:

  • They are closed, enumerated in a single construct.
  • They are frequently replicated.
  • The binding aspect is used to rename structure components because the language throws the names away, presuming the types are enough. Again, redundantly, all over the app, with plenty of opportunity for error.

Polymorphic functions and maps are, IMO, preferable because:

  • They are open, no single co-located enumeration of possibilities.
  • There is no master switch, each 'type' (really situation, with Clojure's multimethods) owner takes care of themselves.
  • Maps allow for structures with named components where the names don't get lost.

I'd much rather use (defrecord Color [r g b a]) and end up with maps with named parts than do color of real*real*real*real, and have to rename the parts in patterns matches everywhere (was it rgba or argb?)

I'm not saying pattern matching is wrong or has no utility, just that I would hope it would be less needed in idiomatic Clojure code. Certainly I'd want the binding part of pattern matching independent of the conditional part, and I would make any destructuring-bind for Clojure work with all of the data structures, not just lists, i.e. vectors and maps and metadata.

When speaking about general TCO, we are not just talking about recursive self-calls, but also tail calls to other functions. Full TCO in the latter case is not possible on the JVM at present whilst preserving Java calling conventions (i.e without interpreting or inserting a trampoline etc).

While making self tail-calls into jumps would be easy (after all, that's what recur does), doing so implicitly would create the wrong expectations for those coming from, e.g. Scheme, which has full TCO. So, instead we have an explicit recur construct.

Essentially it boils down to the difference between a mere optimization and a semantic promise. Until I can make it a promise, I'd rather not have partial TCO.

Some people even prefer 'recur' to the redundant restatement of the function name. In addition, recur can enforce tail-call position.

Full TCO optimizes all calls in the tail position, whether to the same function or another. No language that uses the JVM stack and calling convention (e.g. Clojure and Scala) can do TCO since it would require either stack manipulation capabilities not offered in the bytecode spec or direct support in the JVM. The latter is the best hope for functional languages on the JVM, but I'm not sure is a priority for Sun as tail-calls are not idiomatic for JRuby/Jython/ Groovy.

IMO, either a language has full TCO or it simply doesn't support using function calls for control flow, and shouldn't use the term. I didn't implement self-tail-call optimizations because I think, for people who rely on TCO, having some calls be optimized and some not is more a recipe for error than a feature.

So, Clojure has recur, which clearly labels the only situations that won't grow the stack, apologizes for no real TCO, doesn't pretend to support function calls for control flow, and anxiously awaits TCO in the JVM, at which point it will fully support it.

In addition to recur, there are lazy sequences and lazy-cons. To the extent you can structure your processing using the lazy sequence model, including making your own calls to lazy cons, you can write traditional-looking functional and recursive solutions that not only don't grow the stack, but also don't create fully-realized result lists on the heap, a terrific efficiency boost. I am not contending that this is as general as TCO, but it has other nice properties, like the ability to make results that are not tail calls (e.g. list builders) space efficient.

I prefer patches. I understand some people don't. Can't we just agree to disagree? Why must this be repeatedly gone over?

Here's how I see it. I've spent at least 100,000x as much time on Clojure as you will in the difference between producing a patch vs a pull request. The command is:

git format-patch master --stdout > your-patch-file.diff 

There are two sides to change management - production/submission, and management/assessment/application/other-stewardship. People who argue that the process should be optimized for ease of submission are simply wrong. What if I said patches were twice as efficient for me to assess and manage as pull requests? (it's more than that) Do the math and figure out how the effort should best be distributed.

I don't think asking for patches is asking too much, and I truly appreciate the people who are going to the extra effort. And, I respect the fact that people disagree, and they might decide not to participate as a result. However, I don't need to justify my decision over and over. How about a little consideration for me, and the other list participants? There is a real diluting effect of get-it-off-my-chest messages such as these on the value of the list and the utility of reading it for myself and others.

Sometimes it's better to save it for your bartender :)

It can be very difficult to enumerate (or even remember :) all of the contending tradeoffs around something like Clojure's nil handling.

The is no doubt nil punning is a form of complecting. But you don't completely remove all issues merely by using empty collections and empty?, you need something like Maybe and then things get really gross (IMO, for a concise dynamic language).

I like nil punning, and find it to be a great source of generalization and reduction of edge cases overall, while admitting the introduction of edges in specific cases. I am with Tim in preferring CL's approach over Scheme's, and will admit to personal bias and a certain comfort level with its (albeit small) complexity.

However, it couldn't be retained everywhere. In particular, two things conspire against it. One is laziness. You can't actually return nil on rest without forcing ahead. Clojure old timers will remember when this was different and the problems it caused. I disagree with Mark that this is remains significantly complected, nil is not an empty collection, nil is nothing.

Second, unlike in CL where the only 'type' of empty collection is nil and cons is not polymorphic, in Clojure conj is polymorphic and there can only be one data type created for (conj nil ...), thus we have [], {}, and empty?. Were data structures to collapse to nil on emptying, they could not be refilled and retain type.

At this point, this discussion is academic as nothing could possibly change in this area.

The easiest way to think about is is that nil means nothing, and an empty collection is not nothing. The sequence functions are functions of collection to (possibly lazy) collection, and seq/next is forcing out of laziness. No one is stopping you from using rest and empty?, nor your friend from using next and conditionals. Peace!

It is quite easy to come up to some part of Clojure, with some need, and some current perspective, and find it unsatisfying. It certainly is not perfect. But a good presumption is that there might be more context, more constraints, or more issues in play than one might recognize at first glance.

Mark was most correct in identifying that the algorithmic complexity dominates this decision. As far as the functions being defined in terms of abstractions: while true, the abstractions are still something that could be refactored. They are a means, not an end.

History: 'last' is a Lisp function. In Common Lisp it operates on lists only. It is a function that, there, advertises slowness by its mere presence. Code using it has a presumption it will only be used on short lists, and in fact it generally is; in macros and things that process code. People reading that code can presume the same.

I do think that, in general, it is bad to have a single polymorphic function that, depending on the target arguments, transitions between different algorithmic complexity categories (N.B. that is not equivalent to using polymorphism to get better performance by leveraging some implementation specifics, it is a specific subset). 'nth' for seqs is a counterexample, was requested by the community, and I still think is somewhat of a mistake. At least, it would be good to also have an 'elt' with specific non-linear complexity (for the reasons I give below).

Usually, there is a 'fast' function and someone wants it to work on something that would be in a slower category. This case is the opposite, 'last' is slow and people want it to be faster where possible. The end result is the same.

Why not just be as polymorphic as possible? In Common Lisp, there is a set of functions that lets you use lists as associative maps. The performance is not good as things get larger (i.e the sky falls). While this made sense at one point in time (when there were only lists), is it still a good idea now? Should we have functions that expect maps accept lists?

In my experience, having everything work on/with lists made it really easy to get a system together that worked on small data sets, only to have it crash and burn on larger ones. And then, making it perform well was extremely challenging. Some of that was because of the lack of interface-conforming implementations of e.g. hashmaps to swap in. but another category of problem was not being able to see what calls mattered for performance, when lists would still be ok etc. Thus Clojure 'forces' you to make some of these decisions earlier, and make them explicit. Even so, it is still highly polymorphic.

At the moment, I'm not sure it would be bad to have 'last' behave better for vectors et al while still promising in documentation to be not necessarily better than linear. But I do know this - it is seriously unimportant. We all have much better things to do, I hope, than argue over things like this.

There is a perfectly adequate and well documented way to get the last element of a vector quickly, and code for which that matters can (and should, even if last were made faster) use it. Code for which it doesn't matter can use 'last' on vectors because - ipso facto it doesn't matter! People reading either code will know what is and isn't important, and that seems to matter most.

MIT and BSD are not reciprocal licenses. I want a reciprocal license. But I don't want the license to apply to, or dictate anything about, non-derivative work that is combined with mine, as GPL does. I think doing so is fundamentally wrong.

The fact that GPL is not compatible with that approach is a problem with GPL, and for users of GPL software.

I will not be using any 'customized' license. Using a well known license intact is the only way to make it easy for users to vet the license for use. Anything else requires lawyers for me and them.

I will not be dual licensing with GPL or LGPL. Both licenses allow the creation of derived works under GPL, a license I cannot use in my work. Allowing derived works I cannot use is not reciprocal and make no sense for me.

Everyone's experience will be different, of course. Here's my 2 cents:

I don't think SICP is a book about a programming language. It's a book about programming. It uses Scheme because Scheme is in many ways an atomic programming language. Lambda calculus + TCO for loops + Continuations for control abstraction + syntactic abstraction (macros) + mutable state for when you need it. It is very small. It is sufficient.

The book really deals with the issues in programming. Modularity, abstraction, state, data structures, concurrency etc. It provides descriptions and toy implementations of generic dispatch, objects, concurrency, lazy lists, (mutable) data structures, 'tagging' etc, designed to illuminate the issues.

Clojure is not an atomic programming language. I'm too tired/old/lazy to program with atoms. Clojure provides production implementations of generic dispatch, associative maps, metadata, concurrency infrastructure, persistent data structures, lazy seqs, polymorphic libraries etc etc. Much better implementations of some of the things you would be building by following along with SICP are in Clojure already.

So the value in SICP would be in helping you understand programming concepts. If you already understand the concepts, Clojure lets you get on with writing interesting and robust programs much more quickly, IMO. And I don't think the core of Clojure is appreciably bigger than Scheme's. What do Schemers think?

I think the Lisps prior to Clojure lead you towards a good path with functional programming and lists, only to leave you high and dry when it comes to the suite of data structures you need to write real programs, such data structures, when provided, being mutable and imperative. Prior Lisps were also designed before pervasive in-process concurrency, and before the value of high-performance polymorphic dispatch (e.g. virtual functions) as library infrastructure was well understood. Their libraries have decidedly limited polymorphism.

Alas, there is no book on Clojure yet. But, to the extent Schemes go beyond the standard to provide more complete functionality (as most do), there are no books on that either. Just docs in both cases.

Learning Scheme or Common Lisp on your way to Clojure is fine. There will be specifics that don't translate (from Scheme - no TCO, false/ nil/() differences, no continuations; from CL - Lisp-1, symbol/var dichotomy). But I personally don't think SICP will help you much with Clojure. YMMV.

Thank goodness it doesn't (and first/rest etc) or speculative destructuring (and many other Clojure idioms) would be impossible.

It is a CL influence to have semantics for many functions when applied to nil rather than generate errors, and, once one knows the rules, it can be used to great effect to produce elegant code. I've mentioned this before: http://people.cs.uchicago.edu/~wiseman/humor/large-programs.html

Also answers: why only some functions are more polymorphic (e.g. into, conj, count) while others are more type specific (e.g. contains?, assoc)?

It is an important aspect of Clojure that, in general, performance guarantees are part of the semantics of functions. In particular, functions are not supported on data structures where they are not performant.

You can't just swap out sequences or lists for maps or sets in a program. So, you need to use the right data structure for the job, and then the right functions will be available. When it makes sense, some functions are maximally polymorphic (e.g. seq, or into). But lookup, under any name, shouldn't be, IMO, so it isn't in Clojure. Similarly there is no insert at the beginning of vectors, append to end of lists, lookup for values of maps, insertion in the middle of sequences etc. I simply don't want to provide the tools for people to write programs in Clojure with N^2 performance - that benefits no one. But I've seen such performance degradation in many a Lisp program as people build upon O(n) functions on lists.

If you have a lookup table in your program, please use a set or map, or, if by index, a vector. You will have get, contains? or nth available, performance will be good, and that will be clear from the code.

seq-contains? exists because sometimes, say in a macro, you have a known short sequence and need to test for the presence of something. There's no need to copy it into a set. And people have been rolling their own includes? etc for this job. Now everyone can use seq- contains? for that. Its poor performance is indicated by its name and documentation. Should someone try to take a piece of code based on seq- contains? and try to scale it, they will see that function as an obvious bottleneck.

Your type-check argument is a straw man that doesn't hold up in Clojure practice. People tend to use the right data structures for the job and don't choose algorithms via type checks.

Clojure is not particularly duck-type-y, in general things are not unified by the presence of same-named functions but by underlying shared abstractions. Just because protocols can be used to make things reach many types doesn't mean they should be used to unify things that aren't uniform.

The issue is not single-pass vs multi-pass. It is instead, what constitutes a compilation unit, i.e., a pass over what? Clojure, like many Lisps before it, does not have a strong notion of a compilation unit. Lisps were designed to receive a set of interactions/forms via a REPL, not to compile files/modules/programs etc. This means you can build up a Lisp program interactively in very small pieces, switching between namespaces as you go, etc. It is a very valuable part of the Lisp programming experience. It implies that you can stream fragments of Lisp programs as small as a single form over sockets, and have them be compiled and evaluated as they arrive. It implies that you can define a macro and immediately have the compiler incorporate it in the compilation of the next form, or evaluate some small section of an otherwise broken file. Etc, etc. That "joke from the 1980's" still has legs, and can enable things large-unit/multi-unit compilers cannot. FWIW, Clojure's compiler is two-pass, but the units are tiny (top-level forms).

What Yegge is really asking for is multi-unit (and larger unit) compilation for circular reference, whereby one unit can refer to another, and vice versa, and the compilation of both units will leave hanging some references that can only be resolved after consideration of the other, and tying things together in a subsequent 'pass'. What would constitute such a unit in Clojure? Should Clojure start requiring files and defining semantics for them? (it does not now) Forward reference need not require multi-pass nor compilation units. Common Lisp allows references to undeclared and undefined things, and generates runtime errors should they not be defined by then. Clojure could have taken the same approach. The tradeoffs with that are as follows:

  1. less help at compilation time
  2. interning clashes

While #1 is arguably the fundamental dynamic language tradeoff, there is no doubt that this checking is convenient and useful. Clojure supports 'declare' so you are not forced to define your functions in any particular order.

#2 is the devil in the details. Clojure, like Common Lisp, is designed to be compiled, and does not in general look things up by name at runtime. (You can of course design fast languages that look things up, as do good Smalltalk implementations, but remember these languages focus on dealing with dictionary-carrying objects, Lisps do not). So, both Clojure and CL reify names into things whose addresses can be bound in the compiled code (symbols for CL, vars for Clojure). These reified things are 'interned', such that any reference to the same name refers to the same object, and thus compilation can proceed referring to things whose values are not yet defined.

But, what should happen here, when the compiler has never before seen bar?

(defn foo [] (bar))

or in CL:

(defun foo () (bar))

CL happily compiles it, and if bar is never defined, a runtime error will occur. Ok, but, what reified thing (symbol) did it use for bar during compilation? The symbol it interned when the form was read. So, what happens when you get the runtime error and realize that bar is defined in another package you forgot to import. You try to import other-package and, BAM!, another error - conflict, other-package:bar conflicts with read-in-package:bar. Then you go learn about uninterning.

In Clojure, the form doesn't compile, you get a message, and no var is interned for bar. You require other-namespace and continue. I vastly prefer this experience, and so made these tradeoffs. Many other benefits came about from using a non-interning reader, and interning only on definition/declaration. I'm not inclined to give them up, nor the benefits mentioned earlier, in order to support circular reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment