Anjaiah Methuku anjijava16

package com.mts.matrix.spark.stream
import com.mts.matrix.spark.utils.SparkUtils
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.{col, from_json, lit}
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
import org.apache.spark.sql.SparkSession

// Builds a local SparkSession configured for the MongoDB Spark connector.
def getSparkSessionMongoDbConfig(parms: Map[String, String]): SparkSession = {
  val spark = SparkSession
    .builder
    .appName(parms("JOB_NAME"))
    .master("local[*]")
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/retaildb.orders?authSource=admin")
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/retaildb.orders?authSource=admin")
    .getOrCreate()
  val isS3Enable = parms("S3_OPERATION_ENABLE").toBoolean
####################################################################################
UDF vs UDAF vs UDTF
1. UDF: A user-defined function works on a single row and produces a single row as output, so input and output have a one-to-one relationship. Example: Hive's built-in TRIM() function.
To write one, extend the UDF class
and implement (optionally overload) a method called evaluate() inside the class.
2. UDAF: A user-defined aggregate function works on more than one row and produces a single row as output. Examples: Hive's built-in MAX() and COUNT() functions.
To write one, extend the UDAF class.
We need to implement five methods: init(), iterate(), terminatePartial(), merge() and terminate().
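The difference between the two contracts can be sketched in plain Scala. This is only an illustration: TrimUdf and MaxUdaf are hypothetical names, and the classes mimic the method names from the notes above without extending Hive's real UDF or UDAF classes.

```scala
// UDF contract: one input row in, one output value out (like TRIM()).
object TrimUdf {
  def evaluate(s: String): String = if (s == null) null else s.trim
}

// UDAF contract: many input rows in, one output value out (like MAX()).
class MaxUdaf {
  private var current: Option[Int] = None

  // init(): reset the aggregation state.
  def init(): Unit = {
    current = None
  }

  // iterate(): fold one input row into the state.
  def iterate(value: Int): Boolean = {
    current = Some(current.fold(value)(c => math.max(c, value)))
    true
  }

  // terminatePartial(): expose the partial result of one partition.
  def terminatePartial(): Option[Int] = current

  // merge(): combine a partial result computed on another partition.
  def merge(other: Option[Int]): Boolean = {
    other.foreach(v => iterate(v))
    true
  }

  // terminate(): return the final aggregated value.
  def terminate(): Option[Int] = current
}

object UdfVsUdafDemo {
  def main(args: Array[String]): Unit = {
    // UDF: two rows in, two rows out.
    println(Seq("  a ", "b  ").map(TrimUdf.evaluate))

    // UDAF: two partitions aggregated separately, then merged into one value.
    val left = new MaxUdaf
    left.init()
    Seq(3, 9, 4).foreach(v => left.iterate(v))

    val right = new MaxUdaf
    right.init()
    Seq(7, 12).foreach(v => right.iterate(v))

    left.merge(right.terminatePartial())
    println(left.terminate())
  }
}
```

The split between terminatePartial() and merge() is what lets an aggregate run on several partitions independently before the partial results are combined.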
MongoDB:
Host: localhost
Port: 27017
Username: admin
Password: admin
Database name: meetup
Collection name (table name): meetup_rsvp_tbl
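As a rough sketch, the settings above could be assembled into a connection URI of the same database.collection form used in the spark.mongodb.input.uri value earlier in these notes. MongoSettings and MongoUri are hypothetical helpers, and authSource=admin is an assumption carried over from that earlier URI, not something these settings state.

```scala
import java.net.URLEncoder

// Hypothetical holder for the connection settings listed above.
final case class MongoSettings(
  host: String,
  port: Int,
  user: String,
  password: String,
  database: String,
  collection: String
)

object MongoUri {
  // Percent-encode credentials so characters such as '@' or ':' stay valid in a URI.
  // URLEncoder emits '+' for spaces, so that case is rewritten to %20.
  private def enc(s: String): String =
    URLEncoder.encode(s, "UTF-8").replace("+", "%20")

  // Builds mongodb://user:password@host:port/database.collection?authSource=...
  // The authSource default is an assumption, not taken from the settings.
  def build(s: MongoSettings, authSource: String = "admin"): String =
    s"mongodb://${enc(s.user)}:${enc(s.password)}@${s.host}:${s.port}/" +
      s"${s.database}.${s.collection}?authSource=$authSource"
}

object MongoUriDemo {
  def main(args: Array[String]): Unit = {
    val settings = MongoSettings("localhost", 27017, "admin", "admin", "meetup", "meetup_rsvp_tbl")
    println(MongoUri.build(settings))
  }
}
```

Outside a local test setup, the credentials would normally come from configuration or a secret store rather than from source code.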
Write to Cassandra using foreachBatch() in Scala
import org.apache.spark.sql._
import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql.CassandraConnectorConf
import com.datastax.spark.connector.rdd.ReadConf
import com.datastax.spark.connector._
val host = "<ip address>"
Spark Cassandra Filter
CREATE TABLE data_storage.stack_overflow_test_table (
    id int,
    text_id text,
    clustering date,
    some_other text,
    PRIMARY KEY ((id, text_id), clustering)
);
https://www.guru99.com/deep-learning-libraries.html
################################################################
TensorFlow
Created by Google.
Version 1.0 was released in February 2017.
TensorFlow is an open-source software library for dataflow programming across a range of tasks.
It is a symbolic math library used for machine learning applications such as neural networks.
https://www.youtube.com/playlist?list=PLZoTAELRMXVPUyxuK8AphGMuIJHTyuWna
https://www.youtube.com/watch?v=p_tpQSY1aTs&list=PLZoTAELRMXVPUyxuK8AphGMuIJHTyuWna&index=3&t=0s