by Jeff Smith
- introducing the components of machine learning systems
- understanding the reactive systems design paradigm
- the reactive approach to building machine learning systems
summary
- even simple machine learning systems can fail
- machine learning should be viewed as an application, not as a technique
- a machine learning system is composed of five components, or phases:
- the data-collection component ingests data from the outside world into the machine learning system
- the data-transformation component transforms raw data into useful derived representations of the data: features and concepts
- the model-learning component learns models from the features and concepts
- the model-publishing component makes a model available to make predictions
- the model-serving component connects models to requests for predictions
- the reactive systems design paradigm is a coherent approach to building better systems
- reactive systems are responsive, resilient, elastic, and message-driven
- reactive systems use the strategies of replication, containment, and supervision as concrete approaches to maintaining the reactive traits
- reactive machine learning is an extension of the reactive systems approach that addresses the specific challenges of building machine learning systems
- data in a machine learning system is effectively infinite. laziness, or delay of execution, is a way of conceiving of data as infinite flows rather than finite batches. pure functions without side effects help manage infinite data by ensuring that functions behave predictably, regardless of context (see the sketch after this summary)
- uncertainty is intrinsic and pervasive in the data of a machine learning system. writing all data in the form of immutable facts makes it easier to reason about views of uncertain data at points in time. different views of uncertain data can be thought of as possible worlds that can be queried across.
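To make the laziness and pure-function points above concrete, here is a minimal sketch assuming Scala 2.13 (for LazyList); the SensorReading type and the anomaly threshold are invented for illustration and are not from the book's example system.

```scala
// treat data as an effectively infinite, lazily evaluated flow
final case class SensorReading(sensorId: String, value: Double, timestamp: Long)

object LazyFlows {
  // an effectively infinite stream of readings; nothing is computed until demanded
  val readings: LazyList[SensorReading] =
    LazyList.from(0).map { i =>
      SensorReading(s"sensor-$i", math.random(), System.currentTimeMillis())
    }

  // a pure function: same input, same output, no side effects
  def isAnomalous(reading: SensorReading): Boolean = reading.value > 0.99

  // only the first 100 readings are ever materialized
  val firstAnomalies: List[SensorReading] =
    readings.take(100).filter(isAnomalous).toList
}
```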
- managing uncertainty using Scala
- implementing supervision and fault tolerance with Akka
- using Spark and MLlib as frameworks for distributed machine learning pipelines
summary
- Scala gives you constructs to help you reason about uncertainty:
- options abstract over the uncertainty of something being present or not
- futures abstract over the uncertainty of actions that take time
- futures give you the ability to implement timeouts, which help ensure responsiveness by bounding response times (sketched after this summary)
- with Akka, you can build protections against failure into the structure of your application using the power of the actor model:
- communication via message passing helps you keep system components contained
- supervisory hierarchies can help ensure resilience of components
- one of the best ways to use the power of the actor model is through libraries that use it behind the scenes, rather than defining actor systems directly in your own code
- Spark gives you reasonable components to build data-processing pipelines:
- Spark pipelines are constructed using pure functions and immutable transformations (see the pipeline sketch after this summary)
- Spark uses laziness to ensure efficient, reliable execution
- MLlib provides useful tools for building and evaluating models with a minimum of code
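A minimal sketch of the Option, Future, and timeout points above; lookupUser and scoreUser are hypothetical stand-ins for real operations, and the 100-millisecond bound is arbitrary.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object UncertaintyExamples {
  // Option abstracts over the uncertainty of something being present or not
  def lookupUser(id: Long): Option[String] =
    if (id == 42L) Some("user-42") else None

  // Future abstracts over the uncertainty of actions that take time
  def scoreUser(name: String): Future[Double] =
    Future { name.length * 0.1 } // placeholder for an expensive computation

  // bounding response time keeps the caller responsive; Await is used here
  // only to keep the example short
  val boundedScore: Try[Double] =
    Try(Await.result(scoreUser("user-42"), 100.milliseconds))
}
```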
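The Spark and MLlib points can be sketched just as briefly; the tiny dataset, column names, and local[*] master below are illustrative and assume the DataFrame-based spark.ml API.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object PipelineSketch extends App {
  val spark = SparkSession.builder()
    .appName("pipeline-sketch")
    .master("local[*]")
    .getOrCreate()

  // a tiny, made-up training set: (label, feature vector)
  val raw = spark.createDataFrame(Seq(
    (1.0, Vectors.dense(0.2, 0.7)),
    (0.0, Vectors.dense(0.9, 0.1)),
    (1.0, Vectors.dense(0.3, 0.8))
  )).toDF("label", "features")

  // each transformation returns a new, immutable DataFrame; thanks to laziness,
  // nothing is executed until an action (here, fitting the model) requires it
  val training = raw.filter("label >= 0.0")

  // MLlib learns a model with very little code
  val model = new LogisticRegression().setMaxIter(10).fit(training)
  println(model.coefficients)

  spark.stop()
}
```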
- collecting inherently uncertain data
- handling data collection at scale
- querying aggregates of uncertain data
- avoiding updating data after it's been written to a database
summary
- facts are immutable records of something that happened and the time that it happened:
- transforming facts during data collection results in information loss and should never be done
- facts should encode any uncertainty about that information
- data collection can't work at scale with shared mutable state and locks
- fact databases solve the problems of collecting data at scale:
- facts can always be written without blocking or using locks
- facts can be written in any order
- futures-based programming handles the possibility that operations can take time and even fail
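As a minimal sketch of the fact-based approach summarized above: the LocationFact record and the writeFact operation are invented for illustration, and a real implementation would append to a fact database rather than return a stubbed result.

```scala
import java.util.UUID
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// an immutable record of something that happened, when it happened,
// and how certain the system is about it
final case class LocationFact(
  factId: UUID,
  subjectId: Long,
  latitude: Double,
  longitude: Double,
  confidence: Double, // uncertainty is encoded in the fact itself
  timestamp: Long
)

object FactCollection {
  // facts are only ever appended, never updated, so writes need no locks;
  // the Future acknowledges an operation that may take time or even fail
  def writeFact(fact: LocationFact): Future[Boolean] =
    Future {
      // a real implementation would append to a fact database; this stub
      // just pretends the write succeeded
      true
    }
}
```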
- extracting features from raw data
- transforming features to make them more useful
- selecting among the features you've created
- how to organize feature-generation code
summary
- like chicks cracking through eggs and entering the world of real birds, features are our entry points into the process of building intelligence into a machine learning system. although they haven't always gotten the attention they deserve, features are a large and crucial part of a machine learning system
- it's easy to begin writing feature-generation functionality. but that doesn't mean your feature-generation pipeline should be implemented with anything less than the same rigor you'd apply to your real-time predictive application. feature-generation pipelines can and should be awesome applications that live up to all the reactive traits
- feature extraction is the process of producing semantically meaningful, derived representations of raw data
- features can be transformed in various ways to make them easier to learn from
- you can select among all the features you have to make the model-learning process easier and more successful
- feature extractors and transformers should be well structured for composition and reuse
- feature-generation pipelines should be assembled into a series of immutable transformations (pure functions) that can easily be serialized and reused, as sketched after this summary
- features that rely on external resources should be built with resilience in mind
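A small sketch of feature extraction and transformation as composable pure functions; the Tweet type and the particular features are made up for illustration.

```scala
final case class Tweet(text: String, followers: Int)

object FeaturePipeline {
  // extraction: derive a semantically meaningful value from raw data
  val extractWordCount: Tweet => Int =
    tweet => tweet.text.split("\\s+").count(_.nonEmpty)

  // transformation: make a feature easier to learn from (here, simple binarization)
  val binarizeAboveTen: Int => Double =
    count => if (count > 10) 1.0 else 0.0

  // pure functions compose into a reusable pipeline stage
  val wordCountFeature: Tweet => Double = extractWordCount andThen binarizeAboveTen
}
```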
- implementing model-learning algorithms
- using Spark's model-learning capabilities
- handling third-party code
summary
- a model is a program that can make predictions about the future
- model learning consists of processing features and returning a model
- model learning must be implemented with an expectation of failure modes (for example, timeouts)
- containment, using the facade pattern, is a crucial technique for integrating third-party code
- contained code wrapped in a facade can be integrated with the rest of your data pipeline using standard reactive-programming techniques
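A hedged sketch of containment via a facade: ThirdPartyLearner stands in for an arbitrary external library whose training call may be slow or throw, and the five-second timeout is an arbitrary choice.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object ThirdPartyLearner {
  // imagine this is external code you don't control: slow, and it may throw
  def train(features: Array[Array[Double]], labels: Array[Double]): String = {
    Thread.sleep(50)
    "opaque-model-handle"
  }
}

object LearningFacade {
  // the facade exposes a narrow, asynchronous interface to the rest of the system
  def learn(features: Array[Array[Double]], labels: Array[Double]): Future[String] =
    Future(ThirdPartyLearner.train(features, labels))

  // callers bound how long they are willing to wait for a model
  def learnOrGiveUp(features: Array[Array[Double]], labels: Array[Double]): Try[String] =
    Try(Await.result(learn(features, labels), 5.seconds))
}
```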
- calculating model metrics
- training versus testing data
- recording model metrics as messages
summary
- models can be evaluated over hold-out data to assess their performance
- statistics like accuracy, precision, recall, f-measure, and area under the curve can quantify model performance
- failing to separate data used in training from testing can result in models that lack predictive capability
- recording the provenance of models allows you to pass messages to other systems about their performance
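A minimal sketch of computing these statistics over hold-out predictions; the (prediction, label) pairs are invented purely for illustration, and the formulas are the standard definitions.

```scala
object ModelMetrics {
  // pairs of (predicted label, true label) from data the model never trained on
  val holdOut: Seq[(Double, Double)] =
    Seq((1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 1.0), (1.0, 1.0))

  val truePositives  = holdOut.count { case (p, l) => p == 1.0 && l == 1.0 }
  val falsePositives = holdOut.count { case (p, l) => p == 1.0 && l == 0.0 }
  val falseNegatives = holdOut.count { case (p, l) => p == 0.0 && l == 1.0 }
  val trueNegatives  = holdOut.count { case (p, l) => p == 0.0 && l == 0.0 }

  val accuracy  = (truePositives + trueNegatives).toDouble / holdOut.size
  val precision = truePositives.toDouble / (truePositives + falsePositives)
  val recall    = truePositives.toDouble / (truePositives + falseNegatives)
  val fMeasure  = 2 * precision * recall / (precision + recall)
}
```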
- persisting learned models
- building model microservices using Akka HTTP
- containerization of services using Docker
summary
- models, and even entire training pipelines, can be persisted for later use
- microservices are simple services that have very narrow responsibilities
- models, as pure functions, can be encapsulated into microservices (a service sketch follows this summary)
- you can contain failure of a predictive service by communicating only via message passing
- you can use an actor hierarchy to ensure resilience within a service
- applications can be containerized using tools like Docker
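A hedged sketch of a model microservice, assuming Akka HTTP 10.2+ and a model that is just a pure function; the route, port, and decision threshold are illustrative.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._

object PredictiveService extends App {
  implicit val system: ActorSystem = ActorSystem("predictive-service")

  // a persisted model could be loaded here; this stand-in is a pure function
  def model(feature: Double): Double = if (feature > 0.5) 1.0 else 0.0

  // a narrow responsibility: answer prediction requests, nothing else
  val route =
    path("predict") {
      get {
        parameters("feature".as[Double]) { feature =>
          complete(model(feature).toString)
        }
      }
    }

  Http().newServerAt("localhost", 8080).bind(route)
}
```

The resulting application can then be packaged and containerized with a tool like Docker.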
- using models to respond to user requests
- managing containerized services
- designing for failure
summary
- tasks are useful lazy primitives for structuring expensive computations
- structuring models as services makes elastic architectures easier to build
- failing model services can be handled by a model supervisor
- the principles of containment and supervision can be applied at several levels of systems design to ensure reactive properties
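A hedged sketch of supervision using Akka's classic actor API; the ModelService and ModelSupervisor names, the Score message, and the restart policy are all illustrative choices.

```scala
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.Restart
import scala.concurrent.duration._

final case class Score(features: Vector[Double])

// wraps calls to a model; a broken model or bad request makes it throw
class ModelService extends Actor {
  def receive = {
    case Score(features) => sender() ! features.sum // placeholder scoring
  }
}

// the supervisor contains failure: a crashing child is restarted, not the system
class ModelSupervisor extends Actor {
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: Exception => Restart
    }

  private val modelService = context.actorOf(Props[ModelService], "model-service")

  def receive = {
    case request => modelService.forward(request)
  }
}

object Serving extends App {
  val system = ActorSystem("serving")
  system.actorOf(Props[ModelSupervisor], "model-supervisor")
}
```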
- building Scala code using sbt
- evaluating applications for deployment
- strategies for deployments
summary
- Scala applications can be packaged into archives called JARs using sbt
- build pipelines can be used to execute evaluations of machine learning functionality, like models
- the decision to deploy a model can be made based on comparisons with meaningful values, like the performance of a random model, previous models' performance, or some known parameter
- deploying applications continuously can allow a team to deliver new functionality quickly
- using metrics to determine whether new applications are deployable can make a deployment system fully autonomous
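A small sketch of a metric-based deployment gate along these lines; the f-measure comparison and the 0.5 random baseline are illustrative choices, not a prescribed policy.

```scala
object DeploymentGate {
  final case class ModelReport(modelId: String, fMeasure: Double)

  // deploy only if the candidate beats both a random baseline and the current model
  def shouldDeploy(candidate: ModelReport,
                   current: Option[ModelReport],
                   randomBaseline: Double = 0.5): Boolean = {
    val beatsBaseline = candidate.fMeasure > randomBaseline
    val beatsCurrent  = current.forall(candidate.fMeasure > _.fMeasure)
    beatsBaseline && beatsCurrent
  }
}
```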
- understanding artificial intelligence
- working with agents
- evolving the complexity of agents
summary
- an agent is a software application that can act on its own
- a reflex agent acts according to statically defined behavior
- an intelligent agent acts according to knowledge that it has
- a learning agent is capable of learning: it can improve its performance on a task given exposure to more data
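A minimal sketch of this progression of agents as plain Scala traits; the percepts, actions, and knowledge representation are toy choices for illustration.

```scala
trait Agent {
  // an agent can act on its own, given a percept of the world
  def act(percept: String): String
}

// a reflex agent acts according to statically defined behavior
class ReflexAgent extends Agent {
  def act(percept: String): String =
    if (percept.contains("obstacle")) "turn" else "go forward"
}

// an intelligent agent acts according to knowledge that it has
class IntelligentAgent(knowledge: Map[String, String]) extends Agent {
  def act(percept: String): String =
    knowledge.getOrElse(percept, "do nothing")
}

// a learning agent improves with exposure to more data
class LearningAgent(initialKnowledge: Map[String, String]) extends Agent {
  private var knowledge = initialKnowledge
  def act(percept: String): String = knowledge.getOrElse(percept, "explore")
  def learn(percept: String, betterAction: String): Unit =
    knowledge += (percept -> betterAction)
}
```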