@anjijava16
Created March 29, 2021 20:21
Database pioneer and Turing Award winner Jim Gray is known for a famous adage: when you have lots of data, bring the [machine learning] computations to the data, rather than the data to the computations.
According to him, nothing is closer to the data than the database, so the computations should be done inside the database.
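As a rough illustration of the adage, the sketch below compares pulling every row into the client with aggregating inside the database. It uses an in-memory SQLite database as a stand-in for a data warehouse; the `events` table, its columns, and the function names are hypothetical and chosen only for this example.

```python
import sqlite3

# In-memory SQLite stands in for a data warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(1000)],
)


def total_by_moving_data(conn):
    """Bring the data to the computation: fetch every row, aggregate in the client."""
    rows = conn.execute("SELECT user_id, amount FROM events").fetchall()
    totals = {}
    for user_id, amount in rows:
        totals[user_id] = totals.get(user_id, 0.0) + amount
    return totals, len(rows)


def total_in_database(conn):
    """Bring the computation to the data: only the aggregated result leaves the database."""
    rows = conn.execute(
        "SELECT user_id, SUM(amount) FROM events GROUP BY user_id"
    ).fetchall()
    return dict(rows), len(rows)


client_totals, rows_moved = total_by_moving_data(conn)
db_totals, rows_returned = total_in_database(conn)
print(rows_moved, rows_returned)
```

Both functions produce the same totals, but the second one transfers one row per user instead of one row per event. On a real warehouse the gap grows with the table size.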
Now all major cloud and database vendors are:
🔸 offering SQL data pipelines in the data warehouse
🔸 expanding their in-database ML offerings
ML and analytics in the data warehouse can be cheaper and more efficient, because the data does not have to be moved to a separate system first.
🔹 Google Cloud BigQuery
- Build SQL data pipeline with Dataform, and
- ML models with BigQuery ML
🔹 Amazon Web Services (AWS) Redshift
- Build serverless data pipelines in SQL with Datacoral, and
- ML models using SQL with Amazon Redshift ML
🔹 Microsoft Azure Data Factory
- Run data and machine learning pipelines in Azure Data Factory
IBM Db2 and Oracle also offer in-database machine learning.
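To make the idea of in-database ML concrete, here is a minimal sketch that fits a one-variable linear regression entirely in SQL, so only the two fitted coefficients leave the database. It is not the syntax of BigQuery ML, Redshift ML, or any other vendor service listed above; it uses SQLite from Python as a stand-in, and the `samples` table and its synthetic data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (x REAL, y REAL)")
# Hypothetical training data lying exactly on y = 2x + 1.
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [(float(x), 2.0 * x + 1.0) for x in range(20)],
)

# Closed-form least squares for y = slope * x + intercept,
# computed from aggregates inside the database.
FIT_SQL = """
WITH stats AS (
    SELECT COUNT(*)   AS n,
           SUM(x)     AS sx,
           SUM(y)     AS sy,
           SUM(x * x) AS sxx,
           SUM(x * y) AS sxy
    FROM samples
),
fit AS (
    SELECT (n * sxy - sx * sy) / (n * sxx - sx * sx) AS slope, n, sx, sy
    FROM stats
)
SELECT slope, (sy - slope * sx) / n AS intercept
FROM fit
"""

slope, intercept = conn.execute(FIT_SQL).fetchone()

# Scoring can also stay in SQL: predict y for a new x without exporting rows.
prediction = conn.execute(
    "SELECT ? * 5.0 + ?", (slope, intercept)
).fetchone()[0]
print(slope, intercept, prediction)
```

The managed services go much further (model types, training infrastructure, deployment), but the shape is the same: the training data stays where it is and the model is produced by a SQL statement.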
Check out the resources for each below, and share your feedback and experiences.
👉 Click #ML4Devs and follow for more content on Machine Learning for Developers.
Resources:
1. Google BigQuery:
- Data transformation with SQL in BigQuery: https://cloud.google.com/blog/products/data-analytics/welcoming-dataform-to-bigquery
- BigQuery ML: https://cloud.google.com/bigquery-ml/docs/introduction
2. AWS Redshift:
- SQL pipelines with Datacoral: https://aws.amazon.com/blogs/apn/building-serverless-data-pipelines-on-amazon-redshift-by-writing-sql-with-datacoral/
- ML models using SQL in Redshift: https://aws.amazon.com/blogs/big-data/create-train-and-deploy-machine-learning-models-in-amazon-redshift-using-sql-with-amazon-redshift-ml/
3. Microsoft Azure Data Factory:
- Data pipelines: https://docs.microsoft.com/en-us/azure/data-factory/transform-data
- ML pipelines: https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-machine-learning
Jim Gray's Laws:
- https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
- https://cacm.acm.org/magazines/2008/11/549-jim-gray-astronomer/fulltext
Bonus:
- 8 databases supporting in-database machine learning: https://www.infoworld.com/article/3607762/8-databases-supporting-in-database-machine-learning.html