A list of commonly asked questions, design decisions, reasons why Clojure is the way it is as they were answered directly by Rich (even when from many years ago, those answers are pretty much valid today!). Feel free to point friends and colleagues here next time they ask (again). Answers are pasted verbatim (I've made small adjustments for readibility, but never changed a sentence) from mailing lists, articles, chats. The link points back at them.
If you are talking about the aspect of pattern matching that acts as a conditional based upon structure, I'm not a big fan. I feel about them the way I do about switch statements - they're brittle and inextensible. I find the extensive use of them in some functional languages an indication of the lack of polymorphism in those languages. Some simple uses are fine - i.e. empty list vs not, but Clojure's if handles that directly. More complex pattern matches end up being switches on type, with the following negatives:
- They are closed, enumerated in a single construct.
- They are frequently replicated.
- The binding aspect is used to rename structure components because the language throws the names away, presuming the types are enough. Again, redundantly, all over the app, with plenty of opportunity for error.
Polymorphic functions and maps are, IMO, preferable because:
- They are open, no single co-located enumeration of possibilities.
- There is no master switch, each 'type' (really situation, with Clojure's multimethods) owner takes care of themselves.
- Maps allow for structures with named components where the names don't get lost.
I'd much rather use (defrecord Color [r g b a])
and end up with maps
with named parts than do color of real*real*real*real
, and have to
rename the parts in patterns matches everywhere (was it rgba or argb?)
I'm not saying pattern matching is wrong or has no utility, just that I would hope it would be less needed in idiomatic Clojure code. Certainly I'd want the binding part of pattern matching independent of the conditional part, and I would make any destructuring-bind for Clojure work with all of the data structures, not just lists, i.e. vectors and maps and metadata.
When speaking about general TCO, we are not just talking about recursive self-calls, but also tail calls to other functions. Full TCO in the latter case is not possible on the JVM at present whilst preserving Java calling conventions (i.e without interpreting or inserting a trampoline etc).
While making self tail-calls into jumps would be easy (after all, that's what recur does), doing so implicitly would create the wrong expectations for those coming from, e.g. Scheme, which has full TCO. So, instead we have an explicit recur construct.
Essentially it boils down to the difference between a mere optimization and a semantic promise. Until I can make it a promise, I'd rather not have partial TCO.
Some people even prefer 'recur' to the redundant restatement of the function name. In addition, recur can enforce tail-call position.
Full TCO optimizes all calls in the tail position, whether to the same function or another. No language that uses the JVM stack and calling convention (e.g. Clojure and Scala) can do TCO since it would require either stack manipulation capabilities not offered in the bytecode spec or direct support in the JVM. The latter is the best hope for functional languages on the JVM, but I'm not sure is a priority for Sun as tail-calls are not idiomatic for JRuby/Jython/ Groovy.
IMO, either a language has full TCO or it simply doesn't support using function calls for control flow, and shouldn't use the term. I didn't implement self-tail-call optimizations because I think, for people who rely on TCO, having some calls be optimized and some not is more a recipe for error than a feature.
So, Clojure has recur, which clearly labels the only situations that won't grow the stack, apologizes for no real TCO, doesn't pretend to support function calls for control flow, and anxiously awaits TCO in the JVM, at which point it will fully support it.
In addition to recur, there are lazy sequences and lazy-cons. To the extent you can structure your processing using the lazy sequence model, including making your own calls to lazy cons, you can write traditional-looking functional and recursive solutions that not only don't grow the stack, but also don't create fully-realized result lists on the heap, a terrific efficiency boost. I am not contending that this is as general as TCO, but it has other nice properties, like the ability to make results that are not tail calls (e.g. list builders) space efficient.
I prefer patches. I understand some people don't. Can't we just agree to disagree? Why must this be repeatedly gone over?
Here's how I see it. I've spent at least 100,000x as much time on Clojure as you will in the difference between producing a patch vs a pull request. The command is:
git format-patch master --stdout > your-patch-file.diff
There are two sides to change management - production/submission, and management/assessment/application/other-stewardship. People who argue that the process should be optimized for ease of submission are simply wrong. What if I said patches were twice as efficient for me to assess and manage as pull requests? (it's more than that) Do the math and figure out how the effort should best be distributed.
I don't think asking for patches is asking too much, and I truly appreciate the people who are going to the extra effort. And, I respect the fact that people disagree, and they might decide not to participate as a result. However, I don't need to justify my decision over and over. How about a little consideration for me, and the other list participants? There is a real diluting effect of get-it-off-my-chest messages such as these on the value of the list and the utility of reading it for myself and others.
Sometimes it's better to save it for your bartender :)
It can be very difficult to enumerate (or even remember :) all of the contending tradeoffs around something like Clojure's nil handling.
The is no doubt nil punning is a form of complecting. But you don't completely remove all issues merely by using empty collections and empty?, you need something like Maybe and then things get really gross (IMO, for a concise dynamic language).
I like nil punning, and find it to be a great source of generalization and reduction of edge cases overall, while admitting the introduction of edges in specific cases. I am with Tim in preferring CL's approach over Scheme's, and will admit to personal bias and a certain comfort level with its (albeit small) complexity.
However, it couldn't be retained everywhere. In particular, two things conspire against it. One is laziness. You can't actually return nil on rest without forcing ahead. Clojure old timers will remember when this was different and the problems it caused. I disagree with Mark that this is remains significantly complected, nil is not an empty collection, nil is nothing.
Second, unlike in CL where the only 'type' of empty collection is nil and cons is not polymorphic, in Clojure conj is polymorphic and there can only be one data type created for (conj nil ...), thus we have [], {}, and empty?. Were data structures to collapse to nil on emptying, they could not be refilled and retain type.
At this point, this discussion is academic as nothing could possibly change in this area.
The easiest way to think about is is that nil means nothing, and an empty collection is not nothing. The sequence functions are functions of collection to (possibly lazy) collection, and seq/next is forcing out of laziness. No one is stopping you from using rest and empty?, nor your friend from using next and conditionals. Peace!
It is quite easy to come up to some part of Clojure, with some need, and some current perspective, and find it unsatisfying. It certainly is not perfect. But a good presumption is that there might be more context, more constraints, or more issues in play than one might recognize at first glance.
Mark was most correct in identifying that the algorithmic complexity dominates this decision. As far as the functions being defined in terms of abstractions: while true, the abstractions are still something that could be refactored. They are a means, not an end.
History: 'last' is a Lisp function. In Common Lisp it operates on lists only. It is a function that, there, advertises slowness by its mere presence. Code using it has a presumption it will only be used on short lists, and in fact it generally is; in macros and things that process code. People reading that code can presume the same.
I do think that, in general, it is bad to have a single polymorphic function that, depending on the target arguments, transitions between different algorithmic complexity categories (N.B. that is not equivalent to using polymorphism to get better performance by leveraging some implementation specifics, it is a specific subset). 'nth' for seqs is a counterexample, was requested by the community, and I still think is somewhat of a mistake. At least, it would be good to also have an 'elt' with specific non-linear complexity (for the reasons I give below).
Usually, there is a 'fast' function and someone wants it to work on something that would be in a slower category. This case is the opposite, 'last' is slow and people want it to be faster where possible. The end result is the same.
Why not just be as polymorphic as possible? In Common Lisp, there is a set of functions that lets you use lists as associative maps. The performance is not good as things get larger (i.e the sky falls). While this made sense at one point in time (when there were only lists), is it still a good idea now? Should we have functions that expect maps accept lists?
In my experience, having everything work on/with lists made it really easy to get a system together that worked on small data sets, only to have it crash and burn on larger ones. And then, making it perform well was extremely challenging. Some of that was because of the lack of interface-conforming implementations of e.g. hashmaps to swap in. but another category of problem was not being able to see what calls mattered for performance, when lists would still be ok etc. Thus Clojure 'forces' you to make some of these decisions earlier, and make them explicit. Even so, it is still highly polymorphic.
At the moment, I'm not sure it would be bad to have 'last' behave better for vectors et al while still promising in documentation to be not necessarily better than linear. But I do know this - it is seriously unimportant. We all have much better things to do, I hope, than argue over things like this.
There is a perfectly adequate and well documented way to get the last element of a vector quickly, and code for which that matters can (and should, even if last were made faster) use it. Code for which it doesn't matter can use 'last' on vectors because - ipso facto it doesn't matter! People reading either code will know what is and isn't important, and that seems to matter most.
MIT and BSD are not reciprocal licenses. I want a reciprocal license. But I don't want the license to apply to, or dictate anything about, non-derivative work that is combined with mine, as GPL does. I think doing so is fundamentally wrong.
The fact that GPL is not compatible with that approach is a problem with GPL, and for users of GPL software.
I will not be using any 'customized' license. Using a well known license intact is the only way to make it easy for users to vet the license for use. Anything else requires lawyers for me and them.
I will not be dual licensing with GPL or LGPL. Both licenses allow the creation of derived works under GPL, a license I cannot use in my work. Allowing derived works I cannot use is not reciprocal and make no sense for me.
Everyone's experience will be different, of course. Here's my 2 cents:
I don't think SICP is a book about a programming language. It's a book about programming. It uses Scheme because Scheme is in many ways an atomic programming language. Lambda calculus + TCO for loops + Continuations for control abstraction + syntactic abstraction (macros) + mutable state for when you need it. It is very small. It is sufficient.
The book really deals with the issues in programming. Modularity, abstraction, state, data structures, concurrency etc. It provides descriptions and toy implementations of generic dispatch, objects, concurrency, lazy lists, (mutable) data structures, 'tagging' etc, designed to illuminate the issues.
Clojure is not an atomic programming language. I'm too tired/old/lazy to program with atoms. Clojure provides production implementations of generic dispatch, associative maps, metadata, concurrency infrastructure, persistent data structures, lazy seqs, polymorphic libraries etc etc. Much better implementations of some of the things you would be building by following along with SICP are in Clojure already.
So the value in SICP would be in helping you understand programming concepts. If you already understand the concepts, Clojure lets you get on with writing interesting and robust programs much more quickly, IMO. And I don't think the core of Clojure is appreciably bigger than Scheme's. What do Schemers think?
I think the Lisps prior to Clojure lead you towards a good path with functional programming and lists, only to leave you high and dry when it comes to the suite of data structures you need to write real programs, such data structures, when provided, being mutable and imperative. Prior Lisps were also designed before pervasive in-process concurrency, and before the value of high-performance polymorphic dispatch (e.g. virtual functions) as library infrastructure was well understood. Their libraries have decidedly limited polymorphism.
Alas, there is no book on Clojure yet. But, to the extent Schemes go beyond the standard to provide more complete functionality (as most do), there are no books on that either. Just docs in both cases.
Learning Scheme or Common Lisp on your way to Clojure is fine. There will be specifics that don't translate (from Scheme - no TCO, false/ nil/() differences, no continuations; from CL - Lisp-1, symbol/var dichotomy). But I personally don't think SICP will help you much with Clojure. YMMV.
Thank goodness it doesn't (and first/rest etc) or speculative destructuring (and many other Clojure idioms) would be impossible.
It is a CL influence to have semantics for many functions when applied to nil rather than generate errors, and, once one knows the rules, it can be used to great effect to produce elegant code. I've mentioned this before: http://people.cs.uchicago.edu/~wiseman/humor/large-programs.html
Also answers: why only some functions are more polymorphic (e.g. into
, conj
, count
) while others are more type specific (e.g. contains?
, assoc
)?
It is an important aspect of Clojure that, in general, performance guarantees are part of the semantics of functions. In particular, functions are not supported on data structures where they are not performant.
You can't just swap out sequences or lists for maps or sets in a program. So, you need to use the right data structure for the job, and then the right functions will be available. When it makes sense, some functions are maximally polymorphic (e.g. seq, or into). But lookup, under any name, shouldn't be, IMO, so it isn't in Clojure. Similarly there is no insert at the beginning of vectors, append to end of lists, lookup for values of maps, insertion in the middle of sequences etc. I simply don't want to provide the tools for people to write programs in Clojure with N^2 performance - that benefits no one. But I've seen such performance degradation in many a Lisp program as people build upon O(n) functions on lists.
If you have a lookup table in your program, please use a set or map, or, if by index, a vector. You will have get, contains? or nth available, performance will be good, and that will be clear from the code.
seq-contains? exists because sometimes, say in a macro, you have a known short sequence and need to test for the presence of something. There's no need to copy it into a set. And people have been rolling their own includes? etc for this job. Now everyone can use seq- contains? for that. Its poor performance is indicated by its name and documentation. Should someone try to take a piece of code based on seq- contains? and try to scale it, they will see that function as an obvious bottleneck.
Your type-check argument is a straw man that doesn't hold up in Clojure practice. People tend to use the right data structures for the job and don't choose algorithms via type checks.
Clojure is not particularly duck-type-y, in general things are not unified by the presence of same-named functions but by underlying shared abstractions. Just because protocols can be used to make things reach many types doesn't mean they should be used to unify things that aren't uniform.