Entities and cardinality:
[Entity-relationship diagram, garbled in extraction: it shows the stream_dtc entity and its 1:n relationships to the related entities.]
// THIS IS NOT WORKING!
type A = {
  type: "A";
  aprop: number;
};
type B = {
  type: "B";
  otherprops: number;
};
From Uber's Michelangelo engineering blog post: https://eng.uber.com/michelangelo/
Finding good features is often the hardest part of machine learning and we have found that building and managing data pipelines is typically one of the most costly pieces of a complete machine learning solution.
A platform should provide standard tools for building data pipelines to generate feature and label data sets for training (and re-training) and feature-only data sets for predicting. These tools should have deep integration with the company’s data lake or warehouses and with the company’s online data serving systems. The pipelines need to be scalable and performant, incorporate integrated monitoring for data flow and data quality, and support both online and offline training and predicting. Ideally, they should also generate the features in a way that is shareable across teams to reduce duplicate work and increase data quality. They should also provide strong guard rails and controls to encourage and empower users to adopt best practices.
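As a rough sketch of that split between training and prediction data sets, here is a hypothetical pandas pipeline (this is not Blurr's API; the column names user_id, amount, and converted are made up for illustration):

```python
import pandas as pd

# Hypothetical raw event log: one row per user event (made-up schema).
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 15.0, 7.5, 2.5, 30.0],
    "converted": [0, 1, 0, 0, 1],
})

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    # One shared feature-generation step, used for training and prediction alike.
    return df.groupby("user_id").agg(
        total_spend=("amount", "sum"),
        event_count=("amount", "count"),
    )

# Training data set: features joined with labels.
labels = events.groupby("user_id")["converted"].max().rename("label")
train = build_features(events).join(labels)

# Prediction data set: features only, produced by the same pipeline.
predict = build_features(events)

print(train)
print(predict)
```

The point of the sketch is that build_features is the single shared step, so the data preparation used at training time cannot drift apart from the one used at prediction time.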
Type: Blurr:Transform:Streaming
Version: '2018-03-01'
Description: New York Stock Exchange Transformations
Name: nyse
Import:
  - { Module: datetime, Identifiers: [ datetime ] }
Identity: source.symbol
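The header above is plain YAML, so it can be inspected with any YAML parser before it is handed to Blurr. The sketch below uses PyYAML and only the fields shown in the example; it is not Blurr's own loading code:

```python
import yaml  # PyYAML

DTC_HEADER = """
Type: Blurr:Transform:Streaming
Version: '2018-03-01'
Description: New York Stock Exchange Transformations
Name: nyse
Import:
  - { Module: datetime, Identifiers: [ datetime ] }
Identity: source.symbol
"""

dtc = yaml.safe_load(DTC_HEADER)

# Sanity-check the fields shown in the example above.
assert dtc["Type"] == "Blurr:Transform:Streaming"
assert dtc["Name"] == "nyse"
assert dtc["Identity"] == "source.symbol"

for imp in dtc.get("Import", []):
    print("import", imp["Identifiers"], "from", imp["Module"])
```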
echo "postprocessing documentation..." | |
PACKAGE=`cat blurr/PACKAGE` # PACKAGE and VERSION are generated during pypy package build | |
VERSION=`cat blurr/VERSION` | |
BRANCH=`git branch | sed -n -e 's/^\* \(.*\)/\1/p'` | |
sed -e "s/\@BRANCH@/$BRANCH/" binder/README-template.md > binder/README.md | |
sed -e "s/\@PACKAGE@/$PACKAGE/" -e "s/\@VERSION@/$VERSION/" binder/requirements-template.txt > binder/requirements.txt |
It's a serverless tool that extends traditional event logging, introducing support for advanced real-time processing and ML training scenarios.
Because it enables advanced data engineering and ML scenarios for your project in a lightweight and affordable fashion.
Because we're open to collaborating on your proposal and building the features you need.