1995-98 Pizza (written by Odersky) - brought FP features into Java; led to GJ (Generic Java)
1998-99 GJ; its compiler became javac
2000-02 Functional Nets, Funnel
Motivations for Scala
Grew out of Funnel
Wanted to show practical combination of OOP and FP
What got dropped
Concurrency relegated to libraries
No tight connection between language and core calculus
What got added
Native object and class model, Java interop, XML literals (hides behind podium)
Why?
Wanted Scala to have hipster syntax
What makes Scala Scala?
functional
object-oriented / modular
statically typed
strict
closest predecessor: OCaml
Differences: OCaml separates object and module system, Scala unifies them
OCaml uses Hindley/Milner, Scala subtyping + local type inference.
1st Invariant: A Scalable Language
Instead of providing lots of features in the language, have the right abstractions so that they can be provided in libraries
This has worked quite well so far
It implicitly trusts programmers and library designers to "do the right thing", or at least the community to sort things out.
2nd Invariant: It's all about the types
Scala's core is its type system
Most of the advanced type concepts are about flexibility, less so about safety
Goals: safety vs. flexibility / ease of use
Scala's initial main goal was to make the type system good enough for people who would otherwise choose a dynamic language (focus on flexibility over safety)
The goal now is for Scala to catch up with other typed languages in terms of type safety
The Present (highlights)
Emergent Ecosystem
Chart of all the Scala libraries
New environment: Scala.js - no longer experimental, beats hand-written JS in some benchmarks, great interop with JS libraries
Works well because it plays to the strengths of Scala
Libraries instead of primitives
Flexible type system
Geared for interoperating with a host language
Tool improvements:
Incremental compiler, available in sbt and IDEs
New IDEs
Eclipse IDE 4.0
IntelliJ 13.0
Ensime: makes the Scala compiler available to help with editing
Coursera stats:
400,000 enrollments
Success rate of ~10% (higher than the industry average)
Where is Scala going?
Emergence of a platform
Core libraries
Specifications:
Futures
Reactive Streams
Spores
Common vocabulary
Beginnings of a reactive platform, analogous to Java EE
JDK the core of the Java Platform
Java source -> Classfiles -> Native Code
What are class files good for?
Make your software portable across hardware, across OSs, versions
What's the equivalent for Scala?
Scala piggybacks on the JDK
Adds Scala Signatures for the Scala compiler to link the symbol table to the generated class files
Challenges for Scala
Binary compatibility
scalac has way more transformations to do than javac
Compilation schemes change
Many implementation techniques are non-local and require co-compilation of library and client (e.g. trait composition)
Having to pick a platform
Previously platform is "The JDK"
In the future: Which JDK? 7, 8, 9, 10? And what about JS?
Exploring/proposing: a Scala-specific platform
scalac compiles source into "TASTY", and then a packaging tool/linker generates JS / classfiles
The core
TASTY file format: Serialized Typed Abstract Syntax Trees
[slide: a simple statement expanded into a complex typed tree, a TAST]
TASTY trees take up ~25% of classfile size (but carry much more information)
Higher-kinded types: a type constructor like List becomes a type with an uninstantiated type member
Type parameters as Syntactic Sugar
example
General higher-kinded types through typed lambdas
example: type Two[T] = (T,T)
Two[String]
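The slide code went by fast; here's my own rough illustration (standard Scala) of what abstracting over a type constructor buys you, reusing the Two alias:

object HKDemo {
  trait Functor[F[_]] {                 // F is a type constructor: higher-kinded
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  type Two[T] = (T, T)                  // the talk's example alias

  val twoFunctor = new Functor[Two] {   // Two itself is passed where F[_] is expected
    def map[A, B](fa: Two[A])(f: A => B): Two[B] = (f(fa._1), f(fa._2))
  }

  val pair: Two[String] = twoFunctor.map((1, 2))(_.toString)  // ("1", "2")
}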
New concepts
Type intersections (T & U) and unions (T | U)
Make the type system cleaner and more regular (e.g. intersection and union are commutative)
Pose new challenges for compilation
class A { def x = 1 }
class B { def x = 2 }
val ab: A | B = ???
ab.x
dotc compiler close to completion, hopefully alpha release by ScalaDays Amsterdam
Plan to use TASTY for merging dotc and scalac
Plans for Exploration
Cleaned-up language, new compiler - let's add new stuff, right?
Ideas worth exploring:
Implicits that compose
Already have implicit lambdas: implicit x => t, e.g. implicit transaction => body
What if we also allowed implicit function types? implicit Transaction => Result
Then we can abstract over implicits: type Transactional[R] = implicit Transaction => R
Types like these compose, e.g. type TransactionalIO[R] = Transactional[IO[R]]
New rule: if the expected type of an expression E is an implicit function type, E is automatically expanded to an implicit closure
That's all you need - with that you get implicits that compose (a sketch of the idea below)
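My own sketch of how the pieces fit together, written in the talk's proposed syntax (not valid Scala today - purely illustrative, all names invented):

class Transaction { def commit(): Unit = () }

type Transactional[R] = implicit Transaction => R  // an implicit function type

def atomically[R](body: Transactional[R]): R = {
  implicit val tx: Transaction = new Transaction  // supplies the implicit
  val result = body                               // by the new rule, body is applied as an implicit closure
  tx.commit()
  result
}

def transfer(amount: Int): Transactional[Unit] = {
  val tx = implicitly[Transaction]  // the Transaction is in scope implicitly
  ()                                // ... do the work
}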
Better Treatment of effects
So far purity in Scala is by convention, not by coercion. In that sense, Scala is not a pure functional language (for FP extremists)
We'd like to explore "scalarly" ways to express effects of functions
Effects can be quite varied: Mutation, IO, Exceptions, Null-dereferencing
All have two essential properties: they are additive, and they propagate along the call graph
Hascalator says "thou shalt use monads for effects"
Monads are cool, but for Scala I hope we find something even better
Monads don't commute
Require monad transformers for composition, but these confuse even Odersky!
Use implicits to model effects as capabilities
instead of: def f: R throws Exc = ...
use: def f(implicit t: CanThrow[Exc]): R = ...
or add an alias: type throws[R, Exc] = implicit CanThrow[Exc] => R (a sketch below)
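Again in the proposed syntax (not compilable today; my own sketch of the idea):

class CanThrow[E <: Exception]  // a capability: evidence that you are allowed to throw E

type throws[R, E <: Exception] = implicit CanThrow[E] => R

def readInt(s: String): Int throws NumberFormatException =
  s.toInt  // only callable where a CanThrow[NumberFormatException] is in scope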
In summary
Scala established FP for the mainstream
showed that a fusion of OOP and FP is both possible and useful
promoted the adoption of strong static typing
has lots of enthusiastic users, conference attendees included
Despite it being 10 years old, it has few close competitors
Our Aims
Make platform more powerful
make the language simpler
work on foundations to get to the essence of Scala
Q&A
Q: Will generics be replaced by dependent types or what's the interaction?
A: They will both be supported going forward. Generics can be mapped to dependent types
Q: In the future with type trees, will they be distributed as something other than JARs
A: Let's see what's in Java 9, because they are proposing new distribution mechanisms, and he'd like to keep using what the JDK does. The current proposal is that the TASTY sections will be annotations in the classfile.
notes by Steven Gangstead
Note: there are 4 sessions at a time so my notes are just for the one session I was able to attend.
Tuesday March 17th
Key Note: Why Open Languages Win
Danese Cooper @DivaDanese
Backstory: worked at Apple, Symantec...
Worked at Sun to "open source" Java
Sun didn't want to open source Java; they weren't sure how to monetize it
SCSL - Sun Community Source License - not actually a good license, not open; everything had to go through Sun
Parlayed that into being the Head of Open Source at Sun. They still didn't want to OS Java, but she got other things out there, like Open Office.
C++ & Java were competing in the marketplace; Sun's big play was to open source Tomcat by putting it into the Apache foundation
Sun was willing to sacrifice a part of Java to keep people from using ASPs
Lessons learned at Sun: Apache started Geronimo => Sun open sourced GlassFish; Apache created Harmony => Sun started the OpenJDK project
Left Sun for Intel. Worked on a project over 3 years to get Linux desktop adoption over Windows adoption.
Learned that Windows desktop penetration was too deep; they had to compete on new devices instead
"If I were to pick a language to use today other than Java, it would be Scala" - James Gosling, 2011 (inventor of Java)
R is a language based on S. S was commercial and extremely proprietary; R came from academia and is totally open. Everyone knows R now - it's a complete ripoff of S. All quants learn R, and all graduate work in statistics is done in R.
Pie chart of GitHub pushes: open languages make up the bulk - JavaScript, Python, Ruby, PHP, Java, then C++
Miguel de Icaza - famous for porting C# (and the rest of .NET) to Mono; Microsoft has since open sourced .NET
Node.js: PayPal works on Node and is heavily invested. She was initially worried about the io.js fork, but sees it as where all the Node.js engineers have gone, because Joyent did a bad job as benevolent dictator (Joyent not mentioned by name)
Takeaways:
Open source is now a requirement to drive language adoption
Don't try to monetize it
Have a permissive license
It's quite possible to get it WRONG
Listen to your developers
They just want stuff to work
Open Standards != Open Source
Lots of big companies will tell you otherwise
Open Standards isn't clearly defined like Open Source
Look at Richard Stallman's 10 requirements for open source
Open Standards bodies are worried about De Facto standards
Questions
Q: One of the challenges of open source is all the other overhead you have to do in addition to just releasing the source. Are there any good patterns for doing that well?
A: See the book Producing Open Source Software (Karl Fogel); it's free. It is hard, but it's healthy for companies to get a feel for how outside developers do things, so you need to set it up so that the outside world has an equal chance of contributing. At PayPal she asks these questions about how/when to open source a project: 1) Does anyone care? 2) Do we still use it (some companies are tempted to open source junk they aren't using anymore)? 3) Is there a resource (so people can research it)? If the first three are yes: can we continue to work on the project after we open source it (modularize the "secret sauce" parts and keep them closed while working on the rest in the open)?
Q: How do you overcome company's resistance to OS because they are afraid of devaluing their product?
A: Companies open source for three reasons: for their reputation, to disrupt, (I missed the third thing, fear?). You have to find out which of the reasons will motivate a particular company.
Q: How do you finance open source?
A: Dual license has been really effective for a few projects. MySQL traditionally and now MongoDB. MongoDB has a license that makes IaaS difficult without going to get a commercial license. Her favorite is the foundation. It keeps the books open, keeps transparency high and gives everyone an equal chance at contributing. Sponsorship is really hard to do well, she mainly sees it going poorly and it creates a lot of forks. She likes the idea of crowdfunding, but there's a problem with fulfillment. Grassroots suffers when it gets big enough for anyone to care, the monetizing wolves tear it to shreds.
Q: How do you feel about Contributor License Agreements?
A: For all you Californians: the idea of a pre-invention agreement does not apply in California because of a lot of case law; just document that you did the work with your own resources on your own time. Projects are working on updating CLAs to be more permissive while still protecting the project from copyright claims, possibly getting rid of CLAs entirely. The debate is not over, but the hip projects are looking at no longer aggregating copyrights and instead having the developers attest.
Q: At companies that want to open source stuff someday, just not yet because it's not ready or whatever. What do you think is the tradeoff?
A: You have to have at least something ready before releasing. It's too hard to get going otherwise. Outside of that she recommends "release early, release often". That may mean finding parts of code that you want to rewrite and just tagging it as such first and then rewriting it later.
Q: Are open source concepts taking hold in other industries?
A: Yes, I wrote a book about that in 2007 - go read it. I've seen it in things like science, medicine, extreme sports. Open source is everywhere you want to be.
Scala Collections Performance
Craig Motlin @motlin
Steven's note: Craig flew through his slides - way too much info to type down at speed. Find his slides online.
Works at Goldman
One library he works on is GS Collections
Doesn't see people using Scala at Goldman for general code.
Sees people switch back to Java for performance reasons.
Goals:
Scala programs ought to perform as well as Java but Scala is a little slower
Highlight a few performance problems that matter to me
GS Collections (GSC) and Scala collections are similar
mutable and immutable interfaces with common parent
similar iteration patterns at hierarchy root Traversable and RichIterable
Lazy evaluation (view, asLazy)
Parallel-lazy evaluation (par, asParallel)
GS Collections and Scala Collections are different:
Persistent data structures
Hash tables
(other stuff, I couldn't keep up)
Persistent Data Structures
Data Structure that always preserves the previous version of itself when it is modified
Examples: List, SortedSet
"Add" by constructing new nodes from leaf to a new root
Important in purely functional languages
All collections are immutable
Must have good runtime complexity
No one seems to miss them in GS Collections
Proposal: mutable, persistent immutable, and plain immutable
Mutable same as always
Persistent: use when you want structural sharing
Plain immutable
not persistent
"adding" is much slower
speed benefits for everything else
huge memory benefits
Performance assumptions:
Iterating through an array should be faster than a linked list
linked lists won't parallelize well with .par
no surprises in the results - so we'll skip
Immutable array-backed sorted set
immutable -> trimmed, sorted array
no nodes -> ~1/8 memory
array backed -> cache locality
Assumptions about contains
may be faster when it's a binary search in an array (good cache locality)
will be about the same in mutable / immutable tree
Assumptions not quite correct: the immutable sorted set is slower, but only a tiny bit. He's only interested in 2x or 0.5x differences.
Testing serial arrays, about the same performance as Scala.
Testing parallel-lazy evaluation
Assumption: Scala's trees should parallelize well & GSC's arrays should parallelize very well
Surprise result: the Scala collection is much slower in parallel than in serial
Scala's immutable.TreeSet doesn't override .par, so parallel is slower than serial
Some tree ops like filter are hard to parallelize
TreeSet.count should be easy to parallelize using fork/join with some work
Persistent Data Structures wrap-up:
proposal : mutable, persistent and immutable in same library
Hash Tables
Scala's immutable.HashMap is a hash array mapped trie (pronounced "tree")
"achieves almost hash table-like speed while using memory much more economically" - wikipedia
Scala's mutable.HashMap is backed by an array of Entry pairs
java.util.HashMap's Entry caches the hashcode: takes more memory, but gets a speed benefit when resizing the array
GSC's UnifiedMap is backed by a flattened Object[]; ImmutableUnifiedMap is backed by a UnifiedMap
Testing memory: Scala's immutable HashMap's size increases linearly, but everyone else's is much lower, with a step each time the array doubles
Testing hashmap get: Scala's immutable HashMap is way slower than everything else; mutable map performance is similar to the GSC maps
Testing hashmap put: Same results.
HashSets:
Scala's immutable HashSet is backed by an array
java.util.HashSet is implemented by delegating to a HashMap
GSC's UnifiedSet is backed by Object[] - either elements or arrays of collisions
GSC's ImmutableUnifiedSet is backed by a UnifiedSet
Memory usage of hashsets: Scala's immutable HashSet uses lots of memory, increasing linearly; Scala's mutable HashSet does well, increasing as the array expands
Primitive Specialization
Boxing is expensive
costs: reference + object header + alignment
Scala has specialization, but most of the collections are not specialized
If you cannot afford wrappers you can:
use primitive arrays (only for lists)
use a Java collections library (a micro-example of the difference below)
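A tiny illustration of the boxing difference (my own example, not from the slides):

val boxed: List[Int] = List(1, 2, 3)        // every element is a heap-allocated java.lang.Integer
val primitive: Array[Int] = Array(1, 2, 3)  // a flat int[] on the JVM: no wrappers, no object headers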
Proposal: Primitive lists, sets and maps in Scala
Not Traversable - outside the collections hierarchy
fix specialization on functions (lambdas) so that for-comprehensions can work well
Fork-join
[test code for scala, java and GS collections]
[performance results flashed up on screen too fast to understand]
Fork join is general purpose but always requires merge work
We can get better performance through specialized data structures meant for combining.
"Time is a device that was invented to keep everything from happening at once" - Graffiti on a wall at Cambridge University
Newtonian physics - the simplified model of time is very appealing to us
von Neumann architecture - a single processor mutating state, in full control of the present
Concurrency comes along and makes everything difficult
Jim Gray gave us transactions: the illusion of order within a transaction, giving us our linear time back.
Distribution comes along and makes life miserable again. Transactions don't distribute well.
This is not surprising; the world doesn't work in transactions either. There is no absolute, single, globally consistent present.
You can construct a local present and work with that.
"The future is a function of the past" - A J Robertson
"The (local) present is a merge function of multiple concurrent pasts" - Boner
[joke involving a foldleft in scala]
Information is always from the past, the present is relative.
The truth is actually closer to Einstein's physics, where everything is relative to the observer
Information travels at (at most) the speed of light, which puts a cap on the speed of information: information has latency, contrary to Newton's model.
The cost of maintaining this illusion is increased contention and coherency
Adding participants eventually slows down the system [someone's law]
As latency gets higher, the illusion cracks even more.
Classic quote: "If a tree falls in the forest..." - Charles Riborg Mann
This directly affects computer systems: information can get lost, and it will get lost.
How do we deal with information loss in real life?
We use a simple protocol of confirm or wait / repeat.
We don't wait for guaranteed delivery
We take educated guesses to fill in the blanks
and if we are wrong we take compensating action
Can we rewrite the past?
Winners write the history books, and the history books even get rewritten
We can do this in CS, but should we?
Usually a bad idea, but we can add more information
There is a path forward
Treat time as a first class construct
What is time really?
It's not wall-clock time: hours, minutes, seconds
Time is the succession of causally related events
Embrace this and things fall into place
How to manage time? Thinking in FACTS
Facts have values, they are not variables. They accrue either as new information or derived from previous information.
Immutability is a core requirement
Not a part of classic traditional object orientation
They conflate identity with value
There is a time and a place for mutability, but immutable should be the default
Do variables have a purpose in life?
"The assignment statement is the von Neumann bottleneck of programming languages and keeps us thinking in word-at-a-time terms ..." John Backus (Turing Award lecture 1977)
Mutable state needs to be contained, not exposed to the rest of the world. Only expose immutable values.
How do we manage facts? Functional Programming
"you put facts in and out comes new facts"
Dataflow graphs, model time through data dependencies
First rule of facts: never delete facts
Facts represent the past and the past is the only way to the present
Disk is so cheap, there's no reason to delete.
[Long Jim Gray Quote about accountants not altering the books, but taking new notes]
CRUD becomes CR - no update, no delete
"database is a cache of a subset of the log" - Pat Helland (2007)
Store facts in an event log. The log is like a database of the past.
The log id is like the ticking forward of time
The log allows time travel
Constructing a sufficiently consistent local present means employing consistency mechanisms
an agreement across processes
consistency means employing some sort of coordination
too little coordination can violate correctness, too much means reduced availability.
Inside data: our current present / Outside data: blast from the past / Between services: hope for the future - Pat Helland (2011?)
Event sourcing - practical tool to capture state changing events in the log. Replay history to reconstruct present.
Queries can be hard
Microservices map to consistency boundaries.
Decoupling in space / time can give you the isolation to have fault tolerance.
In reactive systems this is called Location transparency
Strong consistency - the wrong default
It has an extremely high price
We most often don't need it
Eventual consistency
Loosen up the guarantees and focus on availability
gives us room for scalability
has a loose meaning and is not as useful - how eventual? how consistent?
Tracking Time is tracking causality, don't rely on timestamps.
lead to write locks
Alternative: Lamport Clocks
gives us global causal ordering between events
Vector Clocks
Partial causal ordering between events.
Logical time allows causal consistency
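A minimal sketch of a Lamport clock, my own illustration (all names made up):

final case class LamportClock(time: Long = 0L) {
  def tick: LamportClock = copy(time = time + 1)   // before any local or send event
  def onReceive(senderTime: Long): LamportClock =  // merge on message receive
    LamportClock(math.max(time, senderTime) + 1)
}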
What consistency guarantees do you really need, and when?
Sometimes events go outside your system and are then causally related
Expensive to track all the metadata
Mine for confluence
Your component produces the same set of outputs for a given set of inputs, regardless of their order
Powerful property, you don't have to coordinate.
ACID 2.0
Associative
Commutative
Idempotent
Distributed
CRDTs - conflict-free replicated data types
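A hedged sketch of one of the simplest CRDTs, a grow-only counter, to show why ACID 2.0 matters (my own example):

final case class GCounter(counts: Map[String, Long] = Map.empty) {
  def increment(node: String): GCounter =
    copy(counts.updated(node, counts.getOrElse(node, 0L) + 1))

  def value: Long = counts.values.sum

  // merge is associative, commutative, and idempotent, so replicas converge without coordination
  def merge(that: GCounter): GCounter =
    GCounter((counts.keySet ++ that.counts.keySet).map { k =>
      k -> math.max(counts.getOrElse(k, 0L), that.counts.getOrElse(k, 0L))
    }.toMap)
}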
Experiences Using Scala in Apache Spark
Patrick Wendell, Databricks @pwendell
Spark is an execution engine for large-scale data analytics on clusters of machines. Written in Scala, with APIs for Java, Python, R.
Most active project in the Apache Foundation
Simple example from Spark REPL (fork of the Scala REPL)
Databricks was founded by Spark's creators
Databricks Cloud - basically Spark as a Service
Internal components written in Scala
Databricks' overall impressions of Scala
Using a new Programming Language is like Falling in Love
honeymoon phase gives way to quirks
key to success is investing in the relationship
Why we chose Scala
Wanted to work with Hadoop, which is JVM-based, and wanted a concise programming interface
Compatible with JVM ecosystem (big legacy codebase in big data)
DSL support
Concise syntax (rapid prototype, but still typesafe)
Thinking functionally (encourages immutability and good practices)
Perspective of a software platform
Users make multi-year investments in Spark
large ecosystem of third party libraries
hundreds of developers on the project (who come and go)
Source compatibility is a big step towards improving this
Binary compatibility still far off in the future
Announcing the Databricks Style Guide
"Code is written once by the author and modified multiple times by lots of other engineers"
it's on github.com/databricks/scala-style-guide
Example: Symbolic Names
Symbolic method names make the intent of a function hard to understand:
channel ! msg
stream1 >>= stream2
Not as clear as:
channel.send(msg)
stream1.append(stream2)
Example: Monadic Chaining
Example of getting a value from a map with a deeply nested chain of get, get, flatMap, get, flatMap...
Refactored to not be so deep (a hedged reconstruction below)
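I didn't capture the slide's exact code; my own reconstruction of the kind of chain the guide warns about (names invented):

val users  = Map("alice" -> Map("age" -> "30"))
val config = Map("user" -> "alice")

// Deeply chained - hard for the next engineer to follow:
val age: Option[Int] =
  config.get("user").flatMap(users.get).flatMap(_.get("age")).map(_.toInt)

// Refactored - name the intermediate steps:
val userName = config.get("user")
val profile  = userName.flatMap(users.get)
val age2: Option[Int] = profile.flatMap(_.get("age")).map(_.toInt)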
Question from the audience: what about for comprehensions?
Answer: yeah, those are fine - they make it more readable
Subjective rule re. Monadic Chaining:
Do not chain / nest more than 3 operations deep
Non-Scala devs in particular have a hard time understanding code nested more deeply than that
Less obvious things that break binary compatibility
adding concrete members to traits
trait Person {
  def name: String
}
trait Person {
  def name: String
  def age: Option[Int] = None
}
Making this an abstract class instead works
Might change in future versions of scala where this won't break binary compatibility
Return Types
Explicitly list return types in public APIs; otherwise type inference can silently change them between releases
This is good practice anyway
Verifying binary compatibility
Typesafe tool called MiMa - outdated but useful
We've built tooling around it to support package-private visibility
Building a better compatibility checker would be a great community contribution
Java API's
Conjecture: the most popular Scala projects of the future will have Java APIs
because the user base is so much bigger
need to unit test everything from Java at runtime
with some encouragement, Scala team has helped fix Java compatibility bugs
Have to do things like avoid some features (default implementations) and return Java collections instead of Scala collections.
Performance in Scala
Understand when low-level performance is important
prefer java collections over scala collections
prefer while loops over for loops
prefer private[this] over private (a micro-example below)
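A micro-example of that low-level style (my own sketch, not Databricks code):

final class Stats {
  private[this] var calls = 0L  // private[this] compiles to a direct field access, no accessor call

  def sum(xs: Array[Int]): Long = {  // raw array: no boxing, no wrapper objects
    calls += 1
    var total = 0L
    var i = 0
    while (i < xs.length) {  // while instead of a for comprehension (which desugars to closures)
      total += xs(i)
      i += 1
    }
    total
  }
}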
IDEs - they prefer IntelliJ
Build tools
They used to support SBT and Maven builds.
Now the "official" build is Maven, but they use sbt-pom-reader plugin to support SBT
SBT has improved substantially since they made that decision
The SBT vs. Maven differences boil down to: do you prefer Scala or XML as the build language? And sbt plugins are better than Mojo plugins (Maven)
Overall he'd make a stronger case for SBT over Maven now
Getting help with Scala
Hipster beginner Scala book: Atomic Scala - learn programming in a language of the future. Hipster because it's hard to get hold of a copy. The second edition was published online a week ago.
Scala has a large surface area. For best results, we've constrained our use of Scala
Keeping your internals and (especially) API simple is really important
Spark is unique in its scale, our conventions may not apply to your project
Q: How is their Scala style different from Typesafe's, and how do they enforce it programmatically?
A: They use the Scalastyle tool to enforce it automatically. Databricks' guide fills in some gaps that Typesafe leaves in their style guide.
Type-level Programming in Scala 101
Joe Barnes, Senior Software Architect at Mentor Graphics, @joescii
New name: Type-Level Programming, the Subspace of Scala
Not an expert on the subject, here to share his Aha! moment
Programming in Scala is like Super Mario 2
In normal value programming there's a lot of stuff that will kill you; grab a flask of type-level programming and a door appears into the bizarro world. That world is Subspace.
Basic stuff
Value programming
val num = 1 + 2 + 3
happens at run time
lazy val str = "a" + "b" + "c"
happens later, when you access it
def now = new java.util.Date
Even lazier than lazy: happens even later, because it runs every time you access it
type MyMap = Map[Int, String]
Like typedefs back in C; happens in the compiler
Code examples: defining boolean values with traits
Then recreate it in types: instead of a case object FalseVal there's now a trait called FalseType
Everything is the same, but we're replacing def and val with type - this moves everything from run time into compile time (a sketch below)
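My own rough reconstruction of the two versions shown:

// Value level: the computation happens at run time
sealed trait BoolVal { def not: BoolVal }
case object TrueVal  extends BoolVal { def not = FalseVal }
case object FalseVal extends BoolVal { def not = TrueVal }

// Type level: the same structure, but the "computation" happens in the compiler
sealed trait BoolType { type Not <: BoolType }
trait TrueType  extends BoolType { type Not = FalseType }
trait FalseType extends BoolType { type Not = TrueType }

implicitly[TrueType#Not =:= FalseType]  // compiles, proving the compiler did the work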
Example working with Options, you can put them in for comprehensions, map them.
Example in a for comprehension of Options, when one fails, you don't know which one.
Scalaz has Maybe[A] which is similar to Option but it is invariant (instead of covariant)
Working with Try example. Very similar to Option except you have a Failure case (which contains a throwable) instead of a None.
Tip: you can incrementally migrate Options to Trys. Take the piece you are working on and put it in a Try; then at the end call .get and it will throw whatever it was going to throw anyway (sketch below).
Try only catches non-fatal exceptions.
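A sketch of the migration tip (my own example; lookup stands in for existing Option-based code):

import scala.util.Try

def lookup(key: String): Option[String] = sys.env.get(key)  // stand-in for existing Option code

val port: Try[Int] = Try { lookup("PORT").get.toInt }
// port.get throws the same NoSuchElementException the bare .get would have thrown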
Cool kids on scalaz use a Disjunction (scalaz's version of Either): \/ (note: not a V)
Gives you explicit exception types
\/ has a .fromTryCatchThrowable method to catch only specific exceptions
[ten slides of category theory, I lost focus]
Monadic Laws
Left identity
Right identity
Associativity
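Spelled out for Option (my own illustration; pure stands in for the talk's unit):

def pure[A](a: A): Option[A] = Some(a)  // "unit" for Option
val f: Int => Option[Int] = x => Some(x + 1)
val g: Int => Option[Int] = x => Some(x * 2)
val m: Option[Int] = Some(21)

assert(pure(21).flatMap(f) == f(21))                               // left identity
assert(m.flatMap(pure) == m)                                       // right identity
assert(m.flatMap(f).flatMap(g) == m.flatMap(x => f(x).flatMap(g))) // associativity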
Lists of monads: start with a list of names and map with a Try inside; you have a List[Try[Person]], but you really wanted Try[List[Person]] - so sequence!
val people: Try[List[Person]] = peopleList.sequence
Sequence will only return the first failure
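What sequence does, as a hand-rolled stand-in (scalaz provides the real .sequence):

import scala.util.Try

def sequence[A](xs: List[Try[A]]): Try[List[A]] =
  xs.foldRight(Try(List.empty[A])) { (t, acc) =>
    for (x <- t; rest <- acc) yield x :: rest
  }
// Success(List(...)) if all succeed, otherwise the first Failure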
Futures
Example of two futures that take different times to complete.
Every operation on a Future takes an execution context; you can't force it to complete, you can only wait (possibly with a timeout)
Execution contexts are basically thread pools
Example of getting the first Success from a list of futures (a hedged sketch below)
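My own reconstruction of "first Success wins" using a Promise:

import scala.concurrent.{ExecutionContext, Future, Promise}

def firstSuccess[A](fs: Seq[Future[A]])(implicit ec: ExecutionContext): Future[A] = {
  val p = Promise[A]()
  fs.foreach(_.foreach(p.trySuccess))  // the first future to succeed wins the race
  p.future                             // caveat: never completes if every future fails
}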
Scalaz version of this is Task
Will attempt to reuse thread and wait until you tell it to run
Tasks also have many features defined for you already (like gatherUnordered)
Co- / Contra- / In-variance
Most collections are covariant
Example of how covariance gives you some wonkiness in everyday collections (sketch below)
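A classic instance of that wonkiness (my own example):

val ints: List[Int] = List(1, 2, 3)
val anys: List[Any] = ints  // fine: List is covariant
ints.contains("hello")      // compiles! contains takes A1 >: Int, so Any sneaks in; returns false
// val arr: Array[Any] = Array(1, 2, 3)  // by contrast, Array is invariant: does not compile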
Summary
We should all strive for better tools, but until they get here, we must also improve the ones we were given.
Akka in Production: Why and How
Evan Chan, Socrata Inc., @Evanfchan
Reactive applications - event driven, scalable, resilient and responsive
For most people this means akka and play
Lots of companies and frameworks are using Akka
Ingestion Architectures with Akka
Typical Akka stack:
We want some standard behavior around actors - but we need to wrap the actor's Receive block:
Start with a base trait trait ActorStack extends Actor { ...}
then wrap your receive block with functionality
your actor code stays nice and clean, and you get your boilerplate functionality mixed in easily (a hedged reconstruction below)
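A reconstruction of the ActorStack pattern from memory, not verbatim from the slides:

import akka.actor.Actor

trait ActorStack extends Actor {
  def wrappedReceive: Receive  // actors implement this instead of receive

  def receive: Receive = {
    case x => if (wrappedReceive.isDefinedAt(x)) wrappedReceive(x) else unhandled(x)
  }
}

// a stackable trait that wraps every message with timing
trait Instrumented extends ActorStack {
  override def receive: Receive = {
    case x =>
      val start = System.nanoTime()
      super.receive(x)
      val elapsed = System.nanoTime() - start  // record(elapsed) - the metrics hook is illustrative
  }
}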
Productionizing Akka
Akka Performance Metrics
define a trait that adds two metrics for every actor
frequency of messages
time spent in receive block
all metrics exposed via a Spray route /metricz
daemon polls /metrics to aggregate data
[examples of charts you get with this data]
VisualVM and Akka
With bounded mailboxes you can see time spent enqueueing messages
A way to provide backpressure
Stack traces don't work for akka apps [example]
What we want is a way to track the message flows
Trait sends an Edge(source, dest, messageInfo) to a local Collector actor, Trakkar
Aggregate edges across nodes then graph
Akka service discovery
Akka remote - need to know the remote nodes
Akka cluster - need to know the seed nodes
Use ZooKeeper or etcd
Be careful - akka is very picky about IP addresses. Beware of AWS, Docker, etc. Test, test, test.
Akka instrumentation libraries
kamon.io, uses aspectj to "weave" in instrumentation, metrics, logging, tracing
akka-tracing, zipkin distributed tracking for Akka
Backpressure and Reliability
Backpressure - the ability to tell senders to slow down / stop
Must look at entire system, individual components having flow control does not mean the system behaves well
by default, actor mailboxes are unbounded
using bounded mailboxes
when the mailbox is full, messages go to DeadLetters (a config sketch follows this list)
mailbox-push-timeout-time: setting for how long to wait when mailbox is full
doesn't work for distributed systems
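For reference, roughly how a bounded mailbox is configured - settings from the Akka docs as I remember them, so check your Akka version (MyActor is illustrative):

import akka.actor.{ActorSystem, Props}
import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseString("""
  bounded-mailbox {
    mailbox-type = "akka.dispatch.BoundedMailbox"
    mailbox-capacity = 1000
    mailbox-push-timeout-time = 10s
  }
""")
val system = ActorSystem("demo", config.withFallback(ConfigFactory.load()))
// val ref = system.actorOf(Props[MyActor].withMailbox("bounded-mailbox"))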
Real flow control: pull, push with acks, etc
works anywhere but more work.
Backpressure in action
a working backpressure system causes the rates of all actor components to be in sync
witness this message-flow-rate graph of the start of event processing (they all go at the same rate)
Akka streams
very conservative - Pull Based
consumer must first give permission to publisher to send data
Backpressure for fan-in
multiple input streams going to a single resource (a DB, say)
may come and go
pressure comes from each stream
Three messages Register, Ready for data, Data
High overhead: lots of streams to notify "Ready"
At least once delivery
let every message have a unique id; the ack returns with the unique id. What happens when one message doesn't get acked?
Resend unacked messages until confirmed == "at least once"
Requires keeping message history around
unless the source is Kafka - then just replay from the last successful offset + 1
Use akka-persistence - it has at-least-once semantics
Combining fan-in and at-least-once
give the client an upper limit on unacked messages
Messages are Msg ###, Ack ### and Reject.
Use an actor to limit # of outstanding futures
[example code] - keeps memory from filling up with futures
Good Akka development practices
Don't put things that can fail into Actor constructor
default supervision strategy stops an actor which cannot initialize itself
instead use an initialize message
learn akka testkit!
The Scalactic Way
Bill Venners
Scalactic grew out of Scalatest
Scalatest - quality through tests
"The Scalactic Way" - quality through types
SuperSafe - quality through static analysis
Scalactic Anyvals
PosInt, PosLong ...
PosInt(1) works, but PosInt(-42) caught at compile time.
val x = 1; PosInt(x) can't be checked at compile time, so you have to use PosInt.from(x), which gives you an Option[PosInt]
They are called AnyVals because at runtime a PosInt is just an Int
val x: PosInt = -1 is also caught at compile time - assignment uses a macro implicit conversion (usage sketch below)
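A usage sketch (the package path is from memory; check your Scalactic version):

import org.scalactic.anyvals.PosInt

val a = PosInt(42)                      // literal checked at compile time by a macro
// val b = PosInt(-42)                  // does not compile
val n = util.Random.nextInt()
val c: Option[PosInt] = PosInt.from(n)  // runtime values must go through from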
PropertyCheckConfig has a bunch of requires to ensure you have a valid PropertyCheckConfig, but they are checked at runtime
Changed in ScalaTest 2.3 to use AnyVals: less code, and errors are caught at compile time
This also removes some other runtime requires and assertions
You can roll your own compile-time assertions
requires writing a macro
Macros can be hairy, but they have ways of making that easier for you
Example: a half function with a require assertion that the input is even
replaced with an EvenInt type
Now the requirement is checked in one place - the type - instead of in requires and assertions all over your code (sketch below)
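A hand-rolled sketch of the idea (EvenInt here is my stand-in, not a Scalactic type):

// Before: the requirement is re-checked (and re-tested) at every call site
def half(n: Int): Int = { require(n % 2 == 0, "n must be even"); n / 2 }

// After: the requirement lives in one place - the type
final class EvenInt private (val value: Int)
object EvenInt {
  def from(n: Int): Option[EvenInt] =
    if (n % 2 == 0) Some(new EvenInt(n)) else None
}
def half(e: EvenInt): Int = e.value / 2  // overload: no require needed, can't be called with an odd Int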
The Scalactic way - Use types where practical to focus and reduce the need for tests (both assert and requires)
Sits on comfy chair to tell stories
ScalaTest 2.0 -> 3.0: he attempted to add typesafe equality, but it comes at the cost of complexity
In the end he decided the added compile time to get better equality checking was not worth it, and they're not releasing it
This halted effort, combined with some forum discussion with Odersky, gave way to the solution being done in the SuperSafe project, which does static analysis instead of type checking and doesn't even make compilation noticeably slower
This was a Gordian-knot solution, where types were not the right answer
Type-safe contains was another problem that was tricky to do in Scalactic but much easier in SuperSafe
Lesson: the type system does not offer the best solution to every problem
The SuperSafe Way
Can run all the time
Doesn't hurt compile time
No warnings, only errors
Not a linter; a "scala subset policy enforcer"
Free as in free beer, but there's a premium licensed version at $60 / seat / year
Q: Does SuperSafe affect the runtime binary?
A: No, other than the code changes it causes you to make
notes by Steven Gangstead
Note: there are 4 sessions at a time so my notes are just for the one session I was able to attend.
Wednesday March 18th
Announcements
The Scalawags (Josh Suereth and Daniel)
CodeMash - family-friendly conference in Ohio, hosted at a water park
http://www.codemash.org/
Keynote: Technical Leadership from wherever you are
Dianne Marsh, Director of Engineering Tools at Netflix, @dmarsh
Own It - own the decision making process, as if you own the company, and you own your job satisfaction
What is leadership? Kids on a playground form leadership organically; writing out the rules and having people sign them isn't leadership. Leadership means understanding the difference between leaders and management.
Leaders aren't appointed.
Are early adopters leaders? Not necessarily. Early adopters are learning and sharing what they're learning. It's compelling to think of early adopters as leaders - often they are - but it's not something to follow blindly. Assuming early adopters are always leaders leads to "Shiny Object Syndrome".
Leaders emerge from great organizations - the more your opinions are valued the more freely you give them.
You Ride - You Decide: as a manager you don't want to make decisions for your team that limit their success. Showed a picture of coworkers mountain biking - they are the ones riding down the mountain; you don't want a manager who makes the calls but doesn't ride.
Value courage - leader has to have courage to deliver unpalatable news
Netflix Culture: Freedom & Responsibility
Responsibility doesn't get as much press as the Freedom part, but it's integral
One of the most important things that leaders bring to the table is Vision
If you own a company, you'd better understand where the company is going
Strategize - Leaders spend a lot of times figuring out how to get that vision to reality
Communicate - If you don't communicate an idea it's as if you never had it
Great leaders Inspire their team / community. If you don't inspire your team / community they will start following someone who does
Remain Flexible - You have to have a plan, but you don't have to act on it. The strategy should evolve as new information is revealed
Listening is one of those things that's really hard to do well. Leaders have a proclivity to dominate the conversation and not give the silent people a chance to join the conversation.
Story - some people process information differently. They're deep thinkers and need time to process info before they will give their opinion. She started giving them information before the meeting so they could prep ahead of time.
Managing up - a leader's job is to make other people look great. Give the people above you all the information they need to make good decisions.
Managing Sideways - similar but with the distinction that when you are talking about your peers you know their strengths and goals. How can you help improve the situation for everyone on the team?
Feedback - the biggest gift you can give someone
stories about people soliciting "360" reviews from their friends, daughters, etc because they value feedback so much
Challenges of technical managers - they feel like they aren't contributing if they don't get their hands on some code, and that's not always the best thing to do
"I didn't get anything done today, I just went to meetings" - but that is actually your job as a technical manager
Technical managers still need to exercise their technical muscle
Ways to do that:
Internal Hackathons
Off critical path projects (not on any timeline, usually something like internal tools)
Learn by listening - podcasts
Where do you recharge?
Winter Tech Forum (formerly Java Posse Roundup)
Find your Place to go to immerse yourself and refresh, engage
Types of leaders and what they do:
Project leaders:
project management groups
follow through
communicate with team
People leaders:
give honest, frequent feedback
protect the company culture
help recruit coworkers
Idea Leaders:
Speak at conferences
Write
Organize Internal Events
Scala Leaders:
Be welcoming of newcomers
Recognize there's not Just One Way
Respect the journey - help new developers
"You can't spend half a career as someone else's employee and then, suddenly, one day, start thinking like an owner. Think like an owner from the very first day of the job."
The Unreasonable Effectiveness of Scala for Big Data
Dean Wampler @deanwampler
Claimed "Hadoop is the Enterprise JavaBeans of our time"
Hadoop history
Started to get traction mainstream around 2008 when Yahoo posted an article with some big numbers for how much data they were processing.
HDFS architecture explained
Example: Inverted Index
Web crawlers fetch a bunch of pages; a MapReduce job builds an inverted index where the keys are words and the values are lists of tuples of (file containing the word, word count)
Problems
Hard to implement anything more than simple algorithms in map reduce
Hadoop API is horrible
Example Java code:
lots of method calls just to set properties, lots of other ceremony to declare types and set the code up; the actual core of the algorithm is only a few lines, and even that is more complicated than it needs to be
You get lost in trivial details, you implement everything that matters yourself
This is the state of things around 2011/2012
Twitter had this problem and was using Scalding, a Scala API built on Cascading (Java), which is built on MapReduce
The same example is now less than half the code
The code uses basic Scala methods - flatMap, groupBy - without much ceremony (a canonical sketch below)
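The slide showed an inverted index; the canonical Scalding example in the same style is word count (this is roughly the Scalding README example, from memory):

import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.split("""\s+""") }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}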
This still has problems since it is using Map Reduce
Still uses a bad mapreduce api
Only works in "Batch mode"
What's next: Event Streams
Storm
You have to do all your logic and querying twice: once in the batch layer and once in the streaming layer
Twitter came up with Summingbird, an API that sits on top of Scalding and Storm
Spark - answers the problem of having one api for batch and streaming mode
very concise, elegant functional api
flexible for algorithms
composable primitives
Efficient: builds a dataflow directed acyclic graph (DAG) and caches data in memory
Process streams in "mini batches"
Reuse "batch" code
Adds "window" functions - a stream is just a very small batch over a short window of time (down to 1 second)
Inverted Index example in Spark:
Spark code looks just like Scala collections code with a few additions for big data conventions like reduceByKey, groupByKey, mapValues
Shows how great the Spark API is to work with (a hedged sketch below)
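My own sketch of what the inverted index looks like in Spark (paths and names invented):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("InvertedIndex"))
val docs = sc.wholeTextFiles("hdfs://...")  // (path, contents) pairs
val index = docs
  .flatMap { case (path, text) =>
    text.split("""\W+""").map(word => ((word, path), 1)) }
  .reduceByKey(_ + _)                                  // count of each word per document
  .map { case ((word, path), n) => (word, (path, n)) }
  .groupByKey()                                        // word -> list of (document, count)
index.saveAsTextFile("hdfs://...")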
This could have been something that Clojure could have taken over
but Clojure didn't really go after the community like Scala did
and Scala is an easier transition for Java developers
The point is not that Scala is so great, but that Functional Programming is so great
Working with Data is Mathematics
Numbers come in, you work on them and numbers go out
Mathematics libraries in Scala
Algebird
Addition
Associativity explained, important when you are adding billions of numbers
Identity explained
Generalize addition: Monoid
A set of elements, an associative operation, and an identity element (sketch below)
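A sketch of the definition (Algebird's Monoid trait is shaped roughly like this, with zero and plus):

trait Monoid[A] {
  def zero: A              // identity: plus(zero, x) == x
  def plus(x: A, y: A): A  // must be associative: plus(plus(a, b), c) == plus(a, plus(b, c))
}

object IntAddition extends Monoid[Int] {
  def zero = 0
  def plus(x: Int, y: Int) = x + y
}
// associativity is what lets you sum billions of numbers in any grouping, across many machines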
Other monoids
Top K
Average
Max/Min
These break down at Twitter scale
But approximations are ok
Algebird lets you tune performance vs accuracy
Monoid approximations:
hyperloglog for cardinality (how many distinct items in a set)
minhash for set similarity
bloom filter for set membership
... and more
"Hash, don't Sample" - mantra at twitter
Hash uses all of the data, unlike sampling
Spire
Numeric library
Some overlap with Algebird
SQL is also functional
it also has functional combinators, like ORDER BY and GROUP BY
Example collecting stats on airline flights, taken from a Typesafe training course
Conclusions
Scala has won in big data
Akka HTTP: The Reactive Web Toolkit
Roland Kuhn, Akka Tech Lead at Typesafe, @rolandkuhn
Akka HTTP is the bridge between actors and HTTP
It's the port of the Spray project, which came out of Akka in the first place
It's "Spray 2.0" - with quotes, because they changed the actor-based model to a stream-based model
Live demo
In addition to an actor system you need an ActorFlowMaterializer()
Source is a blueprint for what shall happen (for a file, list, http connection or whatever it's a source of) but doesn't actually run it. It's a source of an event stream.
Sink is the end of an event stream.
When you call .run() on a stream with a source and a sink it turns them into Actors
ActorFlowMaterializer() is what turns a stream with a source and sink into actors; presumably there are other materializers that turn streams into other things
Sources can be composed from other sources with an implicit builder function.
He makes a composite source by zipping together two other sources
It looks like all the actors are under the covers with the ActorFlowMaterializer - you don't actually write any (a minimal sketch below)
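A minimal sketch of the demo's shape, against the 1.0-milestone API mentioned in the talk (from memory, may not match exactly):

import akka.actor.ActorSystem
import akka.stream.ActorFlowMaterializer
import akka.stream.scaladsl.{Sink, Source}

implicit val system = ActorSystem("demo")
implicit val materializer = ActorFlowMaterializer()  // turns blueprints into running actors

val source = Source(1 to 10)             // a blueprint - nothing runs yet
val sink   = Sink.foreach[Int](println)  // the end of the event stream
source.to(sink).run()                    // materialization: actors are created here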
He had been using Thread.sleep to make it artificially slow. Now he changes map to mapAsync plus Akka's after pattern; these are executed in parallel and show up in order, but buffered in groups of 4
To have a stream asynchronously complete one future per second, he creates a Flow with OperationAttributes.inputBuffer set to 1 and inserts that flow into the previous stream with a .via(...) stage
We can also make Http connections.
Create a StreamTcp() and bind it to an address, and you get a source of incoming connections
To run this source we add a new stage .to(...) and put a Sink in there. In the Sink we join the connection's flow to a ByteString flow and run it with .run()
Then he creates an outgoingConnection to the address created by StreamTcp
Then he binds a list to it; when it runs, the program sends the string to itself over the connection and prints out the list as a ByteString
End Demo
API Design
goals: no magic, compositionality
Why add an HTTP module? Akka is about building distributed applications
distribution implies integration
between internal sub-systems => actors (akka remote)
loosely coupled systems => HTTP is the lingua franca
Stream pipelines
The SSL stage is almost, but not quite, ready
HTTP Live demo
Defines a route with path directives, looks just like Spray routing
Bind that route to an Http() connection. Looks just like Spray-can
Adds an upload path to the route and gets the entity as a Source and sends it to a previously defined Flow.
Q: Release schedule?
A: RC1 hopefully in 4 weeks
Q: Will HTTP2 be a problem?
A: We have a proposal for how to handle some of the new streaming
Q: Will Akka remoting be based on streams?
A: Yes, it will simplify the remoting layer a lot