Cloud Spanner has three types of replicas:
i. read-write replicas,
ii. read-only replicas,
iii. witness replicas.
Azure:
ADF/Databricks with Spark - Ingestion framework
ADLS - Data storage
ADB - Transformations
Data Flows/PolyBase - Load data into the warehouse
Synapse - Data warehouse
Azure SQL - Metadata storage
ADF - Orchestration
Logic Apps - Alerts/Email
Azure DevOps - Code deployment
https://www.boredapi.com/api/activity
https://www.mockaroo.com/help/terms_of_use
https://randomuser.me/api/
https://stackoverflow.com/questions/40838036/what-is-the-difference-between-split-by-and-boundary-query-in-sqoop
https://discuss.itversity.com/t/using-boundary-query/18673
https://stackoverflow.com/questions/37206232/sqoop-import-composite-primary-key-and-textual-primary-key
import org.apache.spark.sql.types._
// Create an RDD
val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")
// The schema is encoded in a string
val schemaString = "name age"
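The idea of encoding a schema in a string can be sketched without Spark. The following is only a plain-Python analogy, not the Spark API: field names are split out of the schema string and paired with the values of each text line. The sample lines and the helper `to_record` are assumptions made for illustration; nothing here reads people.txt.

```python
# Plain-Python analogy (no Spark): derive field names from a schema string.
schema_string = "name age"
field_names = schema_string.split(" ")

# Hypothetical sample lines in a "name, age" layout (assumed, not read from a file).
lines = ["Michael, 29", "Andy, 30"]


def to_record(line, names):
    """Split one comma-separated line and pair each value with a field name."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(names, values))


records = [to_record(line, field_names) for line in lines]
print(records)
```

All values stay strings in this sketch; a real schema would also carry a type per field.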
conda info
conda update -n base -c defaults conda
conda create --name data_ingestion python=3.6
# OR, without pinning a Python version:
conda create --name data_ingestion
conda activate data_ingestion
conda list
https://www.esg-global.com/validation/esg-technical-review-analyzing-the-performance-of-mapr-db
https://medium.com/hackernoon/interacting-with-mapr-db-58c4f482efa1
https://www.linkedin.com/pulse/hbase-mapr-db-designed-distribution-scale-speed-chaaranpall-lambba/
https://stackoverflow.com/questions/30254134/difference-between-mapr-db-and-hbase
Understand the unique processing characteristics of stream processing:
This includes the difference between event time and processing time, sliding and tumbling windows, late-arriving data and watermarks,
and missing data.
i. Event time is the time that something occurred at the place where the data is generated.
ii. Processing time is the time that data arrives at the endpoint where data is ingested.
iii. Sliding windows are used when you want to show how an aggregate, such as the average of the last three values, changes over time,
and you want to update that stream of averages each time a new value arrives in the stream.
iv. Tumbling windows are used when you want to aggregate data over a fixed period of time, for example, the last one minute.
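The two window types above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how any particular streaming engine implements windowing: events are assumed to be `(event_time_seconds, value)` pairs, and the function names `tumbling_window_sums` and `sliding_averages` are hypothetical.

```python
from collections import deque


def tumbling_window_sums(events, window_seconds):
    """Sum values in fixed, non-overlapping windows keyed by window start time."""
    sums = {}
    for event_time, value in events:
        # Each event falls into exactly one window, chosen by its event time.
        window_start = (event_time // window_seconds) * window_seconds
        sums[window_start] = sums.get(window_start, 0) + value
    return dict(sorted(sums.items()))


def sliding_averages(values, size):
    """Return the average of the last `size` values, updated as each value arrives."""
    window = deque(maxlen=size)  # oldest value drops out automatically
    averages = []
    for value in values:
        window.append(value)
        if len(window) == size:
            averages.append(sum(window) / size)
    return averages


# Assumed sample data: (event_time_seconds, value)
events = [(0, 1), (30, 2), (61, 3), (125, 4)]
print(tumbling_window_sums(events, 60))
print(sliding_averages([1, 2, 3, 4, 5], 3))
```

The sketch groups by event time and ignores late-arriving data and watermarks; handling those would require deciding how long a window stays open before its result is emitted.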
i. GCS Transfer Tools (for small transfers, up to a few TB)
gsutil
rsync - fast multi-threaded mode (gsutil -m rsync)
ii. Transfer Service
Tools: UI, client libraries, HTTP REST API
Transfer Service for cloud data:
Transfer Service enables you to quickly and securely transfer data into Google Cloud Storage from a variety of online sources, such as Amazon S3 and Azure Blob Storage, or to move data between Cloud Storage buckets.
# az vm create command to create a Linux VM:
az vm create \
  --resource-group learn-85594f60-ef0f-4f1e-ad12-08bf2ea66630 \
  --name myvmanji \
  --image UbuntuLTS \
  --admin-username azureuser \
  --generate-ssh-keys
# Run the following az vm extension set command to configure Nginx on your VM: