Scala 2.12 is the new big thing! It has 33 new features, but more importantly it is optimized for Java 8, using Java 8 language features such as lambdas and default methods, while the 2.11 series remains the one that is backwards compatible with Java 6.
Scala 2.13 is more focused on improving Scala's standard libraries, including the collections library. In particular, the collections will become more in line with Spark's collections and gain more robust lazy collection support. There are also plans to split Scala into a core and a platform module.
Neat things happening with Scala:
- Scala.js
- Scala Native
- Dotty compiler
A lot of effort has gone into creating DOT (Dependent Object Types) and a DOT calculus. With the DOT calculus, formal statements can be made and proven, and can therefore be used to reason about the correctness of certain language features. The rest of the language can be encoded in it.
Dotty is double the speed and half the size of the current Scala compiler! It uses TASTY (typed ASTs) as an intermediate representation, which allows more compiler insights and optimizations.
Things that will be removed:
- procedure syntax
- macros
- early initializers (in favor of trait parameters)
- existential types
- general type projection (but class projection stays, C#T)
New features:
- Intersection types (`T & U`) replace `with` -- unlike `with`, `&` is commutative! (See the sketch after this list.)
- Union types (`T | U`) avoid huge lubs (least upper bounds)
- function arity adaptation
- trait parameters
- static methods and fields (mainly for Java interop)
- non-blocking lazy vals -- originally, the object would be locked during the evaluation of a lazy val, which can cause deadlocks; instead, lazy vals will be made thread-local, and the `@volatile` annotation can be used if we want thread-safe lazy vals
- multiversal equality, with compile-time type checking
- Named type parameters
- Scala.meta -- a more principled approach to metaprogramming and macros -- eliminates a lot of boilerplate
- an effect system -- side-effect checking using implicits (a new model of doing side effects by passing implicit parameters)
- null safety -- model nullability as a union with `Null` (`T | Null`) to make types pure
- generic programming that can abstract over arity; basically a faster `shapeless`
- better record types -- also shapeless-inspired, implemented via hashmaps
- `@infix` to denote infix operators, such as `min`; everything else will be required to use dot notation
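A quick sketch of the intersection and union types from the list above (Dotty/Scala 3 syntax; the trait and method names are illustrative):

```scala
trait Resettable { def reset(): Unit }
trait Growable[T] { def add(t: T): Unit }

// Intersection: x has both traits; A & B is the same type as B & A (commutative)
def f(x: Resettable & Growable[String]): Unit = {
  x.reset()
  x.add("first")
}

// Union: id is an Int or a String, with no need to compute a huge lub
def label(id: Int | String): String = id match {
  case i: Int    => s"user #$i"
  case s: String => s"user $s"
}
```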
Scala is functional, strict, and pragmatic, and it is a great foundation for implementing cool new PL advancements.
A neat new way to discover new libraries! Try it out.
In order to build fast Scala code, we need to find out why code is slow. Scala code is slow for two reasons:
- the libraries are inefficient
- user code misuses efficient libraries in inefficient ways
An example: Scala's `average` implementation is much slower than Java's.
Java's is just 3 checks per iteration.
Scala's is implemented in terms of `foreach`, working over boxed objects.
Boxing is very expensive, a single boxing (allocation) costs as much as 5 dynamic dispatches, 15 static dispatches, and 20 additions.
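A minimal sketch of the contrast (hypothetical code, not the benchmark from the talk):

```scala
// Idiomatic Scala: List[Int] stores boxed values, so building the list boxes
// every element and summing unboxes every element
def averageScala(xs: List[Int]): Int = xs.sum / xs.size

// Java-style loop over a primitive array: a few checks and an add per
// iteration, and no allocation at all
def averageJava(xs: Array[Int]): Int = {
  var sum = 0
  var i = 0
  while (i < xs.length) {
    sum += xs(i)
    i += 1
  }
  sum / xs.length
}
```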
But this is because Scala has a lot of useful features to help programmers be productive that the JVM doesn't support directly. The compiler is forced to do conservative lowering to the most generic object representation to make sure the code still runs correctly, and this causes Scala to run slower.
An alternative is having different methods for each primitive type.
BUT Scala has 9 primitive types. With n class type parameters and m method type parameters, we would get 10^(n+m) specializations (the 9 primitive types plus the generic reference case, per parameter)!
We can use the `@specialized` annotation to mark the type parameters to be specialized.
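A minimal sketch of opt-in specialization (the class and value names are illustrative):

```scala
// The compiler generates dedicated Int and Double variants of this class,
// so reading `first` on those variants involves no boxing
class Box[@specialized(Int, Double) T](val first: T)

val b = new Box(42) // resolves to the Int-specialized variant
```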
Another alternative is `@miniboxed`, which provides some specialization but has substantial slowdowns for arrays.
However, all specialization breaks modularity, because some choices have to be made about what to specialize. This requires people to know in advance what has been specialized in order to write performant code!
Instead, we can analyze how the code is actually used and specialize specifically for that code base!
Performance becomes closer to Java's, with a hit in compilation time. The drawbacks:
- Slows down compilation
- Requires dependencies to have TASTY
- Does not help if the library does tricks, e.g. -- depends on boxing behaviour -- uses `null` as a special value -- has a typed API but completely untyped internals (casting everything to `Any`, then casting back)
We can also perform user-code optimizations by replacing inefficient code with efficient implementations. In an ideal world, we would be able to write code without worrying about efficiency immediately, or choose more maintainable implementations without taking a performance hit (e.g. using a global reduce on a collection).
We also need library-specific optimizations, for example:
- Replace the linear-time `x.size` (when used to check emptiness) with the constant-time `x.isEmpty`
- Rewrite any division by a power of two as a bit shift
- Merge multiple filters into a single filter (see the sketch below)
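As a sketch, the filter-fusion rewrite would turn the first query into the second (an illustrative source-level view of what the linker could do on the IR):

```scala
val xs = List(-2, -1, 0, 1, 2, 3, 4)

// Two traversals, one intermediate collection...
val twoPasses = xs.filter(_ > 0).filter(_ % 2 == 0)

// ...fused into a single traversal -- valid only if both predicates are pure
val onePass = xs.filter(x => x > 0 && x % 2 == 0)
```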
There are naturally some complications, and we can’t make optimizations blindly. For example, we can’t necessarily join two filters because there may be side-effects in the filters.
The linker has to check that the functions are PURE, in the sense that there are no OBSERVABLE side-effects.
Ideally, we would also have custom warnings and error messages for nonsensical code, such as `collection.toPar.reduceLeft`.
Let’s not write things twice when we don’t have to!
Scala.js can be used easily with an sbt workflow: `sbt fastOptJS` produces the JavaScript files and the source maps.
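A minimal setup sketch (the plugin version shown is illustrative for the time of the talk):

```scala
// project/plugins.sbt
addSbtPlugin("org.scala-js" % "sbt-scalajs" % "0.6.13")

// build.sbt
enablePlugins(ScalaJSPlugin)
```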
Scala.js provides us with everything we need to work with front-end:
- Dom manipulation
- Type safe CSS and HTML
- Type safe client server interactions!
It can be used with wrappers for existing client libraries such as React and Angular.
But is Scala.js all or nothing? What about mixed teams? Should front-end teams learn Scala?
Instead, we can create services that compile to JS that can be consumed by the front-end team. The services can have type-safe AJAX calls.
How to use the actor model within Scala.js:
- code re-use
- code portability
- high modularity
- same programming model everywhere
- transparent communication (between platforms)
- concurrency management
https://github.com/typesafehub/akka-js
The Scala.js compiler has three parts:
- A compiler plugin (around 5000 loc): transforms the scalac tree into an intermediate representation
- Emitter: Takes the IR and generates JavaScript
- Optimizer (around 4418 loc)
The optimizer is the most interesting! Usually, optimizers are applied repeatedly until they reach a fixed point; the Scala.js optimizer, however, is single-pass.
As a result, it has to be a little clever not to miss optimizations. Most optimizations go through a pre-transform phase, which is a virtualization of the transformations: it attempts an optimization, and if it fails it rolls back and attempts the next best one. Backtracking is enabled by using CPS (continuation-passing style).
Some improvements include optimizing multiplication by a power of two into a bit shift, and tuple destructuring. Operators always work the same, but for built-in methods the optimizer has to verify whether they have been overridden.
- What is scalability?
- What characterizes a scalable architecture and design?
- What is perfect scalability?
Performance and scalability are related, but not the same. Increasing scalability means increasing the number of requests handled. Increasing performance means processing the same load in less time, but this does not necessarily increase the number of requests handled.
When we add more resources, we can handle a linearly higher load, provided there is:
- No state
- No contention (i.e. share nothing)
- Independent computations
BUT this is still flawed. A single HTTP namespace might mean a single hardware load balancer, and we might eventually hit network limits. The bottleneck is the shared hardware.
We can fix this by breaking up the namespace, that is, by sharing less!
From Amdahl's law and Gunther's Universal Scalability Law, we know there are fundamental limits to any design. Past a certain point, additional resources result in a decrease in the ability to handle load, not an increase (coordination time keeps increasing and increasing). The best we can do is prevent a negative return, and aim for zero return instead.
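For reference, the standard formulations of those two laws (textbook forms, not from the talk): with $N$ processors, parallelizable fraction $p$, contention coefficient $\alpha$, and coherency (coordination) coefficient $\beta$,

$$
S_{\text{Amdahl}}(N) = \frac{1}{(1-p) + p/N},
\qquad
S_{\text{USL}}(N) = \frac{N}{1 + \alpha(N-1) + \beta N(N-1)}.
$$

The quadratic coordination term $\beta N(N-1)$ is what eventually makes throughput decrease as $N$ grows -- the negative return described above.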
- Contention/shared resources -- we can avoid sharing using eventual consistency, event sourcing, and CQRS, keeping only private state; the state is updated using the deltas from domain events -- however, this requires communication, and therefore creates overhead
- Communication is another enemy of scalability! -- see it as a cost; we should limit communication, especially point-to-point communication, which is a form of coupling
- Ordering -- sequencing leads to shared state, which leads to contention -- stay commutative
- Linear time sequences -- use FSMs and single-use actors to avoid linear processing -- communication between services must be async and non-blocking
Designing for perfect scalability must be done upfront. Build services designed to adhere to these principles.
- Elastic -- elasticity allows a reduction in cost when possible -- spike load is not solved by being scalable: your system needs to be elastic and predictive
- Command sourcing -- a command is a request that can fail; an event is something that already happened -- if we persist commands and handle them asynchronously, load can spike above capacity and we can always queue/retry
- Degrade gracefully -- "An escalator can never break; it can only become stairs"
- Microservices
- Simple is good -- Simple patterns, consistently applied are easier to scale
- No global "now" -- causal ordering is better than clock-time ordering -- the worst kind of coordination is temporal
- Persistence is (not) futile -- Often systems have too much persistence -- Only required when you need to recover!!!
- Don't share databases! -- If your services share a database, your database is a monolith
- Distributed transactions are an anti-pattern. Don't stop the world!
- Idempotence avoids the need for sequencing to some degree, because it lets events be handled in any order -- it also avoids the need for persistence, because we can reprocess using the original command
Monitoring is SUPER IMPORTANT; the log is not enough. However, monitoring is also a cost, so we must be prudent.
Perfect scalability is achievable -- but not without design. Avoid the enemies of scalability, and find patterns that don’t use the enemies of scalability. We also must monitor and adjust during runtime.
Compile-time language-integrated queries
Functions are converted to an AST, which is converted to a normalised SQL query string. Normalisation happens inside the quotation (the `quote { … }` function), so running each query string is as performant as running vanilla SQL.
You can abstract over predicates and pass parameters at runtime.
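A minimal sketch in the style of Quill's quoted queries (the context setup and names are illustrative):

```scala
import io.getquill._

case class Person(name: String, age: Int)

object Example {
  val ctx = new SqlMirrorContext(MirrorSqlDialect, Literal)
  import ctx._

  // `quote` captures the function as an AST; normalisation to SQL happens at compile time
  val adults = quote {
    query[Person].filter(p => p.age >= 18)
  }

  // abstracting over the predicate's threshold: `lift` injects a runtime parameter
  def olderThan(min: Int) = quote {
    query[Person].filter(p => p.age > lift(min))
  }
}
```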
Lifted embedding builds a Slick AST that reifies the computation. The AST is then compiled into SQL.
Toy slick implementation: szeiger/slick/tree/toy-slick-scaladays2016
Lifted: every type `T` is lifted into `Rep[T]`. Embedding: because it's embedded in Scala.
Given some type parameters, we can do an implicit search to resolve another, unknown type, provided the resolution is unambiguous.
`CanBuildFrom` uses functional dependencies, as it is uniquely determined by its first two type parameters.
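A minimal sketch of the same functional-dependency idea, simplified from the real `CanBuildFrom` (the `CanBuild` trait here is hypothetical):

```scala
import scala.collection.mutable.Builder

// Given the element type Elem, implicit search uniquely determines To
trait CanBuild[Elem, To] {
  def builder: Builder[Elem, To]
}

implicit def canBuildList[A]: CanBuild[A, List[A]] =
  new CanBuild[A, List[A]] { def builder = List.newBuilder[A] }

def build[A, To](xs: A*)(implicit cb: CanBuild[A, To]): To = {
  val b = cb.builder
  xs.foreach(b += _)
  b.result()
}

val xs: List[Int] = build(1, 2, 3) // To = List[Int], resolved by implicit search
```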
- It should be comprehensible -- x86 assembly, for example, is not
- But it should also be concise -- the lambda calculus is Turing complete and built from very few rules, yet programs written in it are not concise and are still hard to read
Programs we write must be quick and easy to understand.
For a distributed system, location transparency is very important, i.e. code will run correctly no matter where it's deployed and regardless of its location relative to other nodes.
The actor model does this very well...but there is a problem.
The actor model is not composable. The `receive` blocks of composed actors will override each other. We can try using multiple receives, but messages intended for one receive may be prematurely caught by another.
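A hypothetical sketch of the problem (the trait and message names are illustrative):

```scala
import akka.actor.Actor

trait Logging { this: Actor =>
  def logReceive: Actor.Receive = {
    case msg => println(s"saw: $msg") // matches everything, shadowing later cases
  }
}

class Worker extends Actor with Logging {
  def workReceive: Actor.Receive = {
    case "work" => println("working")
  }
  // "work" never reaches workReceive: logReceive catches it first
  def receive = logReceive orElse workReceive
}
```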
Working with the JVM can be hard.
- Making benchmarks is hard because of the warm up time
- It can be TOO safe -- if you want to manipulate RAM directly, it becomes really hard
- Interop with anything that is not on the JVM is hard
Scala Native, in contrast, promises:
- The code runs immediately, without a warmup period
- Lower-level data structures, such as structs allocated on the stack
- More control over memory management
- Easier calls to other languages
Imagine rewriting something from C++ to Scala that uses vectors: suddenly it becomes MUCH slower and consumes memory like crazy, because vectors of boxed objects exert extreme memory pressure.
However, if we had structs, we would no longer allocate them on the heap, and things would be much faster because there would be no need for GC.
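A sketch of stack-allocated structs in Scala Native (using the `scala.scalanative.unsafe` API; exact names and signatures vary across versions):

```scala
import scala.scalanative.unsafe._

def distanceSquared(): Int = {
  // the struct lives on the stack: no heap allocation, nothing for the GC to trace
  val p = stackalloc[CStruct2[CInt, CInt]]()
  p._1 = 3
  p._2 = 4
  p._1 * p._1 + p._2 * p._2
}
```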
It is an LLVM-based compiler and produces native binaries. It can optimise tail calls, even mutual tail calls.
Some questions:
- Is it the same language? Yes, mostly. Except with extra low-level primitives.
- Is it just a back end? Not quite.
- Will it use GC? Yes, for now -- a slower one than the JVM's GC, but it can only get better -- of course, you don't need GC all the time if you use the lower-level constructs
- Hardware support? 64-bit Intel.
- Libraries? Java libraries are being ported to make them work and to give the least surprising experience.
- When? Developer preview in the near future.
- How can you profile GC? GC is just a library so you can just profile it as you would normally.
Meta-programming: takes programs as input, gives programs as output
- unified and extensible data structure
- immutable and higher order
- configurable visualization (Graphviz)
- functions (lambdas) as DAGs
- virtualized user-defined types
- domain-specific isomorphisms
- domain-specific converters
You can really define your own types as first-class citizens, not just prettifying.
- based on standard Scala compiler
- systematic transformation to make code abstract
- Process the AST of the source code
- Associate nodes of the AST with calls to a virtualised API
- Generate virtualised code containing only calls to the virtualised API
Instead of making method calls on data values, make calls on symbols (nodes of the graph) that are mapped to some data (or anything!). Staged evaluation can be understood as a self-reproducing process: when a program is stage-evaluated, it reproduces itself in a graph-based IR.
- Vendor-neutral ASTs (a tree interchange format): not IntelliJ PSI trees, nor the Scala compiler's internal AST
- Trees are designed such that no syntactic details, such as formatting and comments, get left out
- Contains a new abstraction: tokens, which represent the elementary parts of Scala's grammar, such as whitespace/comments/identifiers. Each token carries associated metadata, such as its position on the line.
It's very easy to use: `import scala.meta._`, then `"x + y".parse[Term]`.
Because macros are bad.
- they have a lot of boilerplate
- IntelliJ needs out-of-band support from the compiler to support macros
- macro code changes won't trigger an sbt recompile
BUT they still enable unique functionality that is leveraged by many library authors. So we can’t just get rid of it!
A new future for macros! With a lot of thought and work, it was found that complexity with macros was largely incidental. There are two orthogonal concepts in the essence of macros:
- Meta programming at compile time
- Inlining code at a call site
- Scala.meta will have a lot less boilerplate
- Scala IntelliJ plugin for macros -- with in-editor expansion of macros -- works by converting PSI trees into Scala.meta trees, which are then run through Scala.meta to expand them
Macros will be replaced with a better version based on Scala.meta: easier to write, with better IDE support.
- Dotty linker can use Scala.meta to implement rewrite rules
- Codacy (static code analysis tool for github/bitbucket) uses the output of Scala.meta parser, since it produces a very precise model of the Scala code.
- Scalafmt (code formatter for Scala) -- works by breaking a line up into tokens and inserting/deleting things as needed
"Combining simplicity, power, and a certain ineffable grace of design" Intuitive, readable code Typesafety - robustness and confidence Often typesafety and readability don't combine well
- Scala is more complex, has more features
- transcends different user abilities; can be used quickly by beginners and also can do pretty advanced stuff
- inventiveness is encouraged
- easier to learn, use, maintain, understand
- easier to compose
- avoid polluting public APIs with types and terms not intended for end-users; hide details if it's not relevant, keep internals internal
- expect users to want to use wildcard imports; make sure everything in your package is relevant
- use fewer, generic methods instead of many specific methods
- make use of private and protected modifiers
- use typeclasses instead of overloading (see the sketch after this list)
- nest things more deeply
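A minimal sketch of the typeclasses-over-overloading guideline (the `Show` typeclass here is illustrative):

```scala
// One generic method plus instances, instead of display(Int), display(String), ...
trait Show[A] { def show(a: A): String }

implicit val showInt: Show[Int] = i => s"Int($i)"
implicit val showString: Show[String] = s => s"String($s)"

def display[A](a: A)(implicit sh: Show[A]): String = sh.show(a)

display(42)      // "Int(42)"
display("hello") // "String(hello)"
```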
- classes, types, values, methods all need names
- name should communicate something
- consider: does the method implement a familiar concept?
- short names are good for pervasive types/methods
- if in doubt, use longer names
- especially important for implicits to avoid accidental shadowing from duplicate names
- see Haoyi Li's blog post Conciseness and Names
- empowers us with constraints
- types give us confidence to reason about our code
- avoid primitive types like Int and String; use types to express semantic ideas
- avoid structural types like Option, Either, and tuples (use case classes instead)
- really low bar to introducing new types
- promote values to types where possible; let the compiler help you!
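A minimal sketch of promoting a primitive to a semantic type (the `Email` type is illustrative):

```scala
// A value class: compiler-checked meaning, with no runtime allocation in most cases
final case class Email(value: String) extends AnyVal

def send(to: Email, body: String): Unit =
  println(s"sending to ${to.value}")

// send("oops", "hi")               // no longer compiles: a String is not an Email
send(Email("[email protected]"), "hi") // intent is explicit and checked
```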
- use many of the same principles that apply to UX design
- your users are all programmers
- users have different abilities and expectations
- take advantage of familiar expectations (use empathy)
- but educate where necessary (pre-empt common misunderstandings, explain if you are going against standard practice)
- keep boilerplate minimal
- each line should be significant and meaningful
- casual users should be able to understand what code does
- draw parallels with the real world, try to associate real things with your code
- consistency and standards
- error prevention
- recognition, not recall (recognise what a method does by its signature and shape, not by its name)
- flexibility and efficiency of use
- aesthetic and minimalist design
- optimize for the use site
- write some sample code you would like to compile
- then try to write the definitions to make your samples compile
- only compromise on sample code when all possibilities in the library layer have been exhausted
- it's like test-first... but different
- OO gives us "is a" and "has a"; implicits give us "is viewable as" -- a different use for the object
- examples: the implicit execution context in Futures, the implicit sender in Akka
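A minimal sketch of "is viewable as" via an implicit conversion (the unit types are illustrative):

```scala
import scala.language.implicitConversions

case class Meters(value: Double)
case class Feet(value: Double)

// Meters is viewable as Feet wherever Feet is expected
implicit def metersAsFeet(m: Meters): Feet = Feet(m.value * 3.28084)

def printFeet(f: Feet): Unit = println(s"${f.value} ft")

printFeet(Meters(10)) // the conversion is applied automatically
```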
- consistency is overrated
- businesses used to run OK without ACID
- ask: "how much time would be acceptable to allow between the consistency of this data or the consistency of this other data?"
- eventual consistency: happens organically, usually matches business needs
A community of projects and individuals organised around:
- pure, typeful, functional programming in Scala
- independent, free/libre and open source software; open, accessible, model for best practices
- a desire to share ideas and code; recognise that people want to use software with other people, other projects
- accessible and idiomatic learning resources
- an inclusive, welcoming, and safe environment
http://typelevel.org
http://github.com/typelevel/general
http://gitter.im/typelevel/general