Scala Collections CanBuildFrom explanation by Stefan Zeiger @szeiger

Taken from https://gitter.im/scala/contributors?at=5c0981af80986419d54dd08d

Stefan Zeiger @szeiger:

I'll try to give a high-level explanation (with imprecise types): In order to build something (like the result of 1.to(10).map(identity)) you use a Builder[E, To] to which you add elements of type E and eventually get a result To. That way a single implementation of map can build different result types. In order to get such a Builder you need a factory, i.e. a () => Builder[E, To]. We call this type CanBuild and pass an implicit instance of it to map. You usually don't care who's doing the building (Range in this case) but for the sake of finding an implicit CanBuild you want the source collection to determine the result type To (e.g. calling map on a List should build another List). This means the primary mechanism for lookup of these implicits is the source collection type, which brings us to CanBuildFrom[-From, -E, +To]. Now CanBuild[E, To] is simply a special case CanBuildFrom[Nothing, E, To]. There is complicated machinery in place that gives you a CanBuildFrom[From, E, To] for every sensible combination of E and From. Not only does this give you the factory for your Builder at runtime, it also determines the type To at compile-time. In some cases you don't want the source to determine what to build, that's where breakOut enters the scene. breakOut is a (non-implicit) method that gives you a CanBuildFrom[From, E, To] provided there is an implicit CanBuild[E, To] (a.k.a. CanBuildFrom[Nothing, E, To]). Usually there are many of those because for each source collection type you get one or more implicits to build the matching result types. These implicits would be ambiguous. But if you also add a type annotation for the To type, the complicated machinery is supposed to give you exactly one matching instance. Maintaining this machinery is a nightmare. The smallest change can create ambiguities or mess with prioritization.

There's an interesting aspect I haven't covered yet. Why does CBF have a method def apply(from: From): Builder[Elem, To]? What could it possibly do with the source collection object? Surely if you want to build a List it doesn't matter if you're building it from a Range or from another List. I didn't see the use for this until I worked on the new collections design and had to fix bugs in the old one. Indeed if you consider code like Vector(1,2,3).map(identity) there is no need for it. You implicitly get a CanBuildFrom[Vector[Int], Int, ?] which builds a Vector[Int]. But what about (Vector(1,2,3): Iterable[Int]).map(identity)? The expectation is that it works similar to dynamic dispatch. Even though the return type will be Iterable[Int] we still want it to build a Vector, not a List (the default implementation of Iterable). The CBF will be one of type CanBuildFrom[Iterable[Int], Int, ?] so there is no way to make the implicit lookup determine a CBF that builds a Vector. Instead you get a generic CBF implementation for Iterable which uses the source collection object to create the right Builder. I don't remember the exact names from the old collections, in the new design we call the object that does this an IterableFactory and every Iterable has a field iterableFactory to retrieve it. At one point I fixed the old collections to make this work consistently for all collections of kind * -> * (via IterableFactory) but it's still broken for other kinds (like maps which are of kind * -> * -> *) because the old collections do not have factory types for these kinds. The new design has them (MapFactory in this case), so as long as you only upcast within the same kind you still get the dynamic dispatch.

soronpo/CanBuildFrom.md