Anjaiah Methuku anjijava16

Real-time Streaming ■ BigData ■ Machine Learning ■ CLOUD ■ JAVA ■ PYTHON ■ Blog ■ Remain Curious And Keep Learning .....

anjijava16 / slot_utilization.sql

Created July 11, 2021 15:30

	SELECT
	COUNT(*) TOTAL_QUERIES,
	SUM(total_slot_ms/TIMESTAMP_DIFF(end_time,creation_time,MILLISECOND)) AVG_SLOT_USAGE,
	SUM(TIMESTAMP_DIFF(end_time,creation_time,SECOND)) TOTAL_DURATION_IN_SECONDS,
	AVG(TIMESTAMP_DIFF(end_time,creation_time,SECOND)) AVG_DURATION_IN_SECONDS,
	SUM(total_bytes_processed*1e-12) TOTAL_PROCESSED_TB,
	EXTRACT (DATE FROM creation_time) AS EXECUTION_DATE,
	user_email as USER
	FROM `iwinner-data-318822.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
	WHERE state='DONE'
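
The per-job arithmetic behind the query above can be sketched in Python. This is a minimal illustration, not part of the original gist: the helper names `avg_slots` and `bytes_to_tb` are hypothetical, and the guard against a zero-length job is an added assumption (the SQL as shown would divide by zero for such a job).

```python
def avg_slots(total_slot_ms: int, elapsed_ms: int) -> float:
    """Average number of slots a job held: slot-milliseconds per elapsed millisecond.

    Hypothetical helper. Returns 0.0 for zero or negative durations
    instead of raising ZeroDivisionError (an assumption, not in the gist).
    """
    if elapsed_ms <= 0:
        return 0.0
    return total_slot_ms / elapsed_ms


def bytes_to_tb(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 1e12 bytes)."""
    return num_bytes * 1e-12


# A job that consumed 120,000 slot-ms over 60,000 ms of wall time
# held two slots on average.
job_slots = avg_slots(120_000, 60_000)
job_tb = bytes_to_tb(2_500_000_000_000)
```

Note that `10e-12` equals `1e-11`, so a byte count multiplied by it would come out ten times too large for terabytes.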

anjijava16 / Replicas_In_Cloud.scala

Created July 11, 2021 04:13

	Cloud Spanner has three types of replicas:
	i. read-write replicas,
	ii. read-only replicas,
	iii. witness replicas.

anjijava16 / AWS_AZUE_GCP.sh

Created July 10, 2021 21:29

	Azure:
	ADF / Databricks with Spark - Ingestion framework
	ADLS - Data storage
	ADB (Azure Databricks) - Transformations
	Data Flows / PolyBase - Loading data into the warehouse
	Synapse - Data warehouse
	Azure SQL - Metadata storage
	ADF - Orchestration
	Logic Apps - Alerts / email
	Azure DevOps - Code deployment

anjijava16 / Free_Rest_API.txt

Last active June 23, 2021 12:56

	https://www.boredapi.com/api/activity
	https://www.mockaroo.com/help/terms_of_use
	https://randomuser.me/api/

anjijava16 / boundary_query_vs_split_by_sqoop.sh

Created June 21, 2021 03:51

	https://stackoverflow.com/questions/40838036/what-is-the-difference-between-split-by-and-boundary-query-in-sqoop


	https://discuss.itversity.com/t/using-boundary-query/18673



	https://stackoverflow.com/questions/37206232/sqoop-import-composite-primary-key-and-textual-primary-key

anjijava16 / Generic_Schema.python

Created June 21, 2021 03:33



	import org.apache.spark.sql.types._

	// Create an RDD
	val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

	// The schema is encoded in a string
	val schemaString = "name age"

anjijava16 / conda_install.txt

Last active December 2, 2021 03:56

	conda info

	conda update -n base -c defaults conda

	conda create --name data_ingestion python=3.6
	(OR)
	conda create --name data_ingestion

	conda activate data_ingestion
	conda list

anjijava16 / maprdb_imp.scala

Created June 13, 2021 04:54

	https://www.esg-global.com/validation/esg-technical-review-analyzing-the-performance-of-mapr-db
	https://medium.com/hackernoon/interacting-with-mapr-db-58c4f482efa1
	https://www.linkedin.com/pulse/hbase-mapr-db-designed-distribution-scale-speed-chaaranpall-lambba/
	https://stackoverflow.com/questions/30254134/difference-between-mapr-db-and-hbase

anjijava16 / streaming_processing.scala

Created May 30, 2021 12:44

	Understand the unique processing characteristics of stream processing:

	These include the difference between event time and processing time, sliding and tumbling windows, late-arriving data and watermarks,
	and missing data.

	i. Event time is the time at which something occurred at the place where the data is generated.
	ii. Processing time is the time at which data arrives at the endpoint where it is ingested.
	iii. Sliding windows are used when you want to show how an aggregate, such as the average of the last three values, changes over time,
	and you want to update that stream of averages each time a new value arrives in the stream.
	iv. Tumbling windows are used when you want to aggregate data over fixed, non-overlapping periods of time, for example, one minute at a time.
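
The two window types described above can be sketched in plain Python. This is a minimal, framework-free illustration, not taken from the gist: the function names `sliding_average` and `tumbling_sum` are hypothetical, events are assumed to be `(event_time_seconds, value)` pairs, and late-data handling and watermarks are left out.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def sliding_average(values: Iterable[float], size: int = 3) -> List[float]:
    """Emit the average of the last `size` values each time a new value arrives.

    No output is produced until the window has filled once.
    """
    window = deque(maxlen=size)  # the oldest value drops out automatically
    averages = []
    for value in values:
        window.append(value)
        if len(window) == size:
            averages.append(sum(window) / size)
    return averages


def tumbling_sum(events: Iterable[Tuple[int, float]], width_s: int = 60) -> Dict[int, float]:
    """Sum values per fixed, non-overlapping window keyed by window start time.

    Each event falls into exactly one window, chosen by its event time.
    """
    buckets: Dict[int, float] = {}
    for event_time, value in events:
        window_start = (event_time // width_s) * width_s
        buckets[window_start] = buckets.get(window_start, 0) + value
    return dict(sorted(buckets.items()))


averages = sliding_average([1, 2, 3, 4, 5])
per_minute = tumbling_sum([(0, 1), (30, 2), (61, 5), (125, 1)])
```

The sliding window overlaps with its predecessor (each value contributes to up to three averages), while every event in the tumbling case is counted in exactly one one-minute bucket.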

anjijava16 / GCP_Data_Trasnfer_Services.scala

Last active May 30, 2021 12:56


	i. GCS Transfer Tools (for small transfers, up to a few TB)
	gsutil
	rsync -- fast multi-threaded mode (gsutil -m)
	ii. Transfer Service
	Tools: UI, client libraries, HTTP REST API

	Transfer Service for cloud data:
	Transfer Service enables you to quickly and securely transfer data into Google Cloud Storage from a variety of online sources, such as Amazon S3 and Azure Blob Storage, or to move data between Cloud Storage buckets.