Created
March 29, 2021 20:21
Database pioneer and Turing Award winner Jim Gray is known for a famous adage: when you have lots of data, bring the [machine learning] computations to the data, rather than the data to the computations.
According to him, nothing is closer to the data than the database, so the computations should be done inside the database.
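As a rough illustration of bringing the computation to the data, the sketch below fits a straight line by least squares using only SQL aggregates, so just five numbers leave the database instead of every row. It uses Python's built-in `sqlite3` as a stand-in for a data warehouse; the `readings` table, its columns, and the helper names are hypothetical and chosen only for this example.

```python
import sqlite3


def make_demo_db(n_rows=1000):
    """Create an in-memory table whose rows follow y = 2x + 1 (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(float(i), 2.0 * i + 1.0) for i in range(n_rows)],
    )
    return conn


def fit_line_in_database(conn):
    """Least-squares fit of y = slope * x + intercept.

    The database computes the sums; only one row of aggregates is
    transferred to the client, regardless of table size.
    """
    n, sx, sy, sxx, sxy = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM readings"
    ).fetchone()
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def fit_line_in_client(conn):
    """Same fit, but every row is first pulled out of the database."""
    rows = conn.execute("SELECT x, y FROM readings").fetchall()
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

Both functions return the same coefficients; the difference is where the work happens and how much data crosses the connection, which is the point of the adage.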
Now all major cloud and database vendors are:
🔸 offering SQL data pipelines in the data warehouse
🔸 expanding their in-database ML offerings
Running ML and analytics in the data warehouse can be cheaper and more efficient, because the data does not have to be moved to a separate system first.
🔹 Google Cloud BigQuery
- Build SQL data pipelines with Dataform, and
- ML models with BigQuery ML
🔹 Amazon Web Services (AWS) Redshift
- Build serverless data pipelines in SQL with Datacoral, and
- ML models using SQL with Amazon Redshift ML
🔹 Microsoft Azure Data Factory
- Run data and machine learning pipelines in Azure Data Factory
IBM Db2 and Oracle also offer in-database machine learning.
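To give a feel for what "ML models using SQL" looks like in practice, here is a minimal sketch that only builds the text of BigQuery ML statements; it does not connect to BigQuery or run anything. The statement shapes follow the `CREATE MODEL` and `ML.PREDICT` forms described in the BigQuery ML documentation, while the helper names and the dataset, table, and column names are hypothetical. The helpers do no escaping or validation, so they are not suitable for untrusted input.

```python
def bigquery_ml_create_model(model, source_table, label, model_type="linear_reg"):
    """Return a BigQuery ML statement that trains a model from a table.

    All arguments are placeholders supplied by the caller; nothing is
    validated or escaped in this sketch.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def bigquery_ml_predict(model, input_table):
    """Return a BigQuery ML statement that scores rows with a trained model."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{input_table}`))"
    )


# Hypothetical names, for illustration only.
train_sql = bigquery_ml_create_model(
    "demo_dataset.trip_cost_model", "demo_dataset.trips", "cost"
)
predict_sql = bigquery_ml_predict(
    "demo_dataset.trip_cost_model", "demo_dataset.new_trips"
)
```

Training and prediction are both ordinary SQL statements here, so they can sit in the same pipeline as the transformations that prepare the data.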
Check out the resources for each below, and share your feedback and experiences.
Click #ML4Devs and follow for more content on Machine Learning for Developers.
Resources:
1. Google BigQuery:
- Data transformation with SQL in BigQuery
https://cloud.google.com/blog/products/data-analytics/welcoming-dataform-to-bigquery
- BigQuery ML: https://cloud.google.com/bigquery-ml/docs/introduction
2. AWS Redshift:
- SQL pipelines with Datacoral: https://aws.amazon.com/blogs/apn/building-serverless-data-pipelines-on-amazon-redshift-by-writing-sql-with-datacoral/
- ML models using SQL in Redshift: https://aws.amazon.com/blogs/big-data/create-train-and-deploy-machine-learning-models-in-amazon-redshift-using-sql-with-amazon-redshift-ml/
3. Microsoft Azure Data Factory:
- Data pipelines: https://docs.microsoft.com/en-us/azure/data-factory/transform-data
- ML pipelines: https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-machine-learning
Jim Gray's Laws:
- https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
- https://cacm.acm.org/magazines/2008/11/549-jim-gray-astronomer/fulltext
Bonus:
- 8 databases supporting in-database machine learning
https://www.infoworld.com/article/3607762/8-databases-supporting-in-database-machine-learning.html