Anjaiah Methuku anjijava16

Real-time Streaming ■ BigData ■ Machine Learning ■ CLOUD ■ JAVA ■ PYTHON ■ Blog ■ Remain Curious And Keep Learning .....

anjijava16 / gcloud_cheat_sheet.md

Created January 19, 2021 15:50 — forked from pydevops/gcloud-cheat-sheet.md

gcp gcloud cheat sheet

anjijava16 / Spark_Strcutured_Streaming_Write_MySQL.scala

Created January 9, 2021 21:44

	package com.mts.matrix.spark.stream

	import com.mts.matrix.spark.utils.SparkUtils
	import org.apache.spark.sql.{DataFrame, SaveMode}
	import org.apache.spark.sql.functions.{col, from_json, lit}
	import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}
	import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

anjijava16 / read_jdbc_parquet_write_mongo.scala

Last active December 22, 2020 23:30

	def getSparkSessionMongoDbConfig(parms: Map[String, String]): SparkSession = {
	  val spark = SparkSession
	    .builder
	    .appName(parms("JOB_NAME"))
	    .master("local[*]")
	    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/retaildb.orders?authSource=admin")
	    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/retaildb.orders?authSource=admin")
	    .getOrCreate()

	  val isS3Enable = parms("S3_OPERATION_ENABLE").toBoolean

anjijava16 / udf_udaf_udtf.sql

Created December 11, 2020 00:45


	####################################################################################
	UDF vs. UDAF vs. UDTF

	1. UDF: A user-defined function works on a single row and produces a single row as output, so input and output are in a one-to-one relationship, e.g. Hive's built-in TRIM() function.
	The class extends UDF.
	We have to implement a method called evaluate() in our class; it can be overloaded for different argument types.
	2. UDAF: A user-defined aggregate function works on more than one row and produces a single row as output, e.g. Hive's built-in MAX() or COUNT() functions.
	The class extends UDAF.
	We need to implement five methods: init(), iterate(), terminatePartial(), merge() and terminate().
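
The two call shapes described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not depend on Hive, and the class names `HiveFunctionShapes`, `TrimUdf`, `MaxUdaf` and the driver `maxAcrossPartitions` are hypothetical. In a real Hive function the classes would extend `org.apache.hadoop.hive.ql.exec.UDF` and `org.apache.hadoop.hive.ql.exec.UDAF`, and Hive itself would drive the calls.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the UDF and UDAF call shapes; no Hive dependency.
final class HiveFunctionShapes {

    // UDF shape: one input row in, one output value out, through evaluate().
    static final class TrimUdf {
        String evaluate(String input) {
            return input == null ? null : input.trim();
        }
    }

    // UDAF shape: many input rows in, one output value out.
    static final class MaxUdaf {
        private Integer max; // partial state; null means no rows seen yet

        // Reset the aggregation state.
        void init() {
            max = null;
        }

        // Fold one input row into the state.
        boolean iterate(Integer value) {
            if (value != null && (max == null || value > max)) {
                max = value;
            }
            return true;
        }

        // Hand out the partial result of this partition.
        Integer terminatePartial() {
            return max;
        }

        // Fold another partition's partial result into this state.
        boolean merge(Integer otherPartial) {
            return iterate(otherPartial);
        }

        // Produce the final result.
        Integer terminate() {
            return max;
        }
    }

    // Hypothetical driver: aggregate two partitions separately, then merge.
    static Integer maxAcrossPartitions(List<Integer> left, List<Integer> right) {
        MaxUdaf a = new MaxUdaf();
        a.init();
        for (Integer v : left) {
            a.iterate(v);
        }
        MaxUdaf b = new MaxUdaf();
        b.init();
        for (Integer v : right) {
            b.iterate(v);
        }
        a.merge(b.terminatePartial());
        return a.terminate();
    }

    public static void main(String[] args) {
        System.out.println(new TrimUdf().evaluate("  hive  "));
        System.out.println(maxAcrossPartitions(Arrays.asList(3, 9, 4), Arrays.asList(7, 2)));
    }
}
```

The split into `terminatePartial()` and `merge()` is what lets each partition be aggregated independently before the partial results are combined into one value.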

anjijava16 / mongo_db_windows_setup.txt

Created December 11, 2020 00:41

	MongoDB:
	Host: localhost
	Port: 27017

	Username: admin
	Password: admin
	Database name: meetup
	Collection name (table name): meetup_rsvp_tbl

anjijava16 / Spark_write_Nosql.py

Created December 11, 2020 00:39

	Write to Cassandra using foreachBatch() in Scala

	import org.apache.spark.sql._
	import org.apache.spark.sql.cassandra._

	import com.datastax.spark.connector.cql.CassandraConnectorConf
	import com.datastax.spark.connector.rdd.ReadConf
	import com.datastax.spark.connector._

	val host = "<ip address>"

anjijava16 / spark_cassndra_filter_query

Created September 2, 2020 03:31

	Spark Cassandra Filter

	CREATE TABLE data_storage.stack_overflow_test_table (
	id int,
	text_id text,
	clustering date,
	some_other text,
	PRIMARY KEY (( id, text_id ), clustering)
	)

anjijava16 / Deep_Learning.scala

Created June 28, 2020 15:40

	https://www.guru99.com/deep-learning-libraries.html

	################################################################

	TensorFlow
	Created by Google
	Version 1.0 was released in February 2017
	TensorFlow is an open-source software library for dataflow programming across a range of tasks.
	It is a symbolic math library that is used for machine learning applications like neural networks.

anjijava16 / DataStreamReader.scala

Created June 26, 2020 18:20

	/*
	* Licensed to the Apache Software Foundation (ASF) under one or more
	* contributor license agreements. See the NOTICE file distributed with
	* this work for additional information regarding copyright ownership.
	* The ASF licenses this file to You under the Apache License, Version 2.0
	* (the "License"); you may not use this file except in compliance with
	* the License. You may obtain a copy of the License at
	*
	* http://www.apache.org/licenses/LICENSE-2.0
	*

anjijava16 / DataScience_youtube

Last active June 25, 2021 08:32

	https://www.youtube.com/playlist?list=PLZoTAELRMXVPUyxuK8AphGMuIJHTyuWna
	https://www.youtube.com/watch?v=p_tpQSY1aTs&list=PLZoTAELRMXVPUyxuK8AphGMuIJHTyuWna&index=3&t=0s
