@j-thepac
j-thepac / BUILD
Last active May 16, 2022 09:47
Bazel build file to add multiple class files to a jar file
#---------------------------
#Bazel build parent.jar
# This adds Parent.scala, Child1.scala, and Child2.scala to the created jar file
#---------------------------
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_binary", "scala_library")
package(default_visibility = ["//visibility:public"])
@j-thepac
j-thepac / notebook.ipynb
Last active May 6, 2022 11:24
Azure Synapse Receiving and Passing Values from Notebook
@j-thepac
j-thepac / DP-900.md
Last active December 26, 2025 09:47
Material for DP-900 Exam

DP 900

Azure Data Fundamentals: Explore core data concepts

  • Data types: structured, semi-structured, or unstructured.
  • Data stores: file stores, databases
  • File formats:
    • Delimited text files
    • JavaScript Object Notation (JSON)
    • Extensible Markup Language (XML)
  • Optimized File Format
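
To make the three text formats listed above concrete, here is a minimal sketch that serializes one record as delimited text, JSON, and XML using only the Python standard library. The record, its field names, and the `customer` element name are hypothetical and chosen purely for illustration; they are not taken from the exam material.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record (not from the source notes).
record = {"id": "1", "name": "Ann", "city": "Oslo"}


def to_delimited(rec):
    """Serialize the record as comma-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


def to_json(rec):
    """Serialize the record as a JSON object."""
    return json.dumps(rec)


def to_xml(rec):
    """Serialize the record as an XML element with one child per field."""
    root = ET.Element("customer")  # hypothetical element name
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


delimited_text = to_delimited(record)
json_text = to_json(record)
xml_text = to_xml(record)

print(delimited_text)
print(json_text)
print(xml_text)
```

The delimited form keeps field names only in the header row, while JSON and XML repeat them with every record, which is part of why they count as semi-structured.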
@j-thepac
j-thepac / indexing.md
Last active April 27, 2022 12:18
Indexes in SQL
  • Indexes are used to query data faster, speed up sort operations, and enforce unique constraints.
  • In a DB table, each row has a rowid (a sequence number) that identifies the row.
  • E.g., a table = a list of pairs (rowid, row)
  • An index is created as a separate table that has the opposite relationship: (row, rowid)
  • SQLite stores indexes in a B-tree (i.e., a balanced tree); the index table has the same number of rows as the actual table.
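
The points above can be tried out with a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table `contacts`, its columns, and the index name `idx_contacts_email` are hypothetical names used only for illustration.

```python
import sqlite3


def build_demo():
    """Create an in-memory table with a unique index on one column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO contacts (name, email) VALUES (?, ?)",
        [
            ("Ann", "ann@example.com"),
            ("Bob", "bob@example.com"),
            ("Cy", "cy@example.com"),
        ],
    )
    # The index maps email values back to rowids and enforces uniqueness.
    conn.execute("CREATE UNIQUE INDEX idx_contacts_email ON contacts (email)")
    return conn


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(str(row[-1]) for row in rows)


conn = build_demo()

# Every row gets an implicit rowid.
rowids = [r[0] for r in conn.execute("SELECT rowid FROM contacts ORDER BY rowid")]

# Ask SQLite how it would look up a row by email.
plan = query_plan(
    conn, "SELECT name FROM contacts WHERE email = ?", ("bob@example.com",)
)

print(rowids)
print(plan)
```

The exact wording of the plan differs between SQLite versions, but it should mention the index name when the lookup goes through the index. Inserting a second row with an existing email would raise `sqlite3.IntegrityError`, which is the unique-constraint role of the index.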

Mac:

package ch06
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType
// Netcat listener for test input: nc -l -p 9999
object SimpleStream extends App {
  val spark = SparkSession.builder
    .master("local")
    .appName("StructuredNetworkWordCount")
@j-thepac
j-thepac / DStreamSoacket.scala
Created April 5, 2022 03:47
Data into Socket
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
// Start a netcat listener on port 9999 first: nc -l -p 9999
object DStreamSoacket extends App {
  val conf = new SparkConf().setMaster("local[*]").setAppName("App name")
  val sc = new SparkContext(conf)
  // Micro-batches are formed every 5 seconds
  val ssc = new StreamingContext(sc, Seconds(5))
  val lines: DStream[String] = ssc.socketTextStream(hostname = "localhost", port = 9999)
  // Split each line into words and count occurrences per batch
  val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
@j-thepac
j-thepac / DStreamSimpleText.scala
Created April 5, 2022 03:45
Send simple text files as a DStream
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream
object DStreamSimpleText extends App {
  val conf = new SparkConf().setMaster("local[*]").setAppName("App name")
  val sc = new SparkContext(conf)
  val ssc = new StreamingContext(sc, Seconds(5))
  val filestream: DStream[String] = ssc.textFileStream(
    "/Users/deepakjayaprakash/Downloads/testing"
  ) // reads new files that appear in this directory
@j-thepac
j-thepac / DStreamSendRDD.scala
Created April 5, 2022 03:44
Send RDD in Streaming
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
object DStreamSendRDD extends App {
  val conf = new SparkConf().setMaster("local[*]").setAppName("StreamingTransformExample")
  val ssc = new StreamingContext(conf, Seconds(5))
  val rdd1 = ssc.sparkContext.parallelize(Array(1, 2, 3))
  // For custom element types: scala.collection.mutable.Queue[RDD[MyObject]]()
  val rddQueue = scala.collection.mutable.Queue[RDD[Int]]()
@j-thepac
j-thepac / stream.scala
Created April 4, 2022 10:17
Spark read stream data
import org.apache.spark.sql.SparkSession
object SimpleStream extends App {
  val spark = SparkSession.builder
    .master("local")
    .appName("StructuredNetworkWordCount")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")
@j-thepac
j-thepac / sttpApiClient.scala
Created March 12, 2022 06:10
Simple Http Client Implementation in Scala
// Ref: https://sttp.softwaremill.com/en/latest/examples.html
/*
sbt dependency:
libraryDependencies += "com.softwaremill.sttp.client3" %% "core" % "3.5.1"
Bazel dependencies:
com.softwaremill.sttp.client3.core
com.softwaremill.sttp.client3.model
com.softwaremill.sttp.client3.shared
*/