- Data types: structured, semi-structured, or unstructured.
- Data stores: file stores, databases
- File formats:
  - Delimited text files
  - JavaScript Object Notation (JSON)
  - Extensible Markup Language (XML)
  - Optimized file formats (e.g., Avro, ORC, Parquet)
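A delimited text file stores one record per line. The fields are separated by a delimiter such as a comma or a tab, and the first line often holds the field names. The sketch below uses plain Scala with no CSV library and turns such text into one `Map` per record. It assumes a simple format in which fields are never quoted and never contain the delimiter. Real CSV readers handle both of those cases.

```scala
import java.util.regex.Pattern

object DelimitedText {
  // Parse delimited text whose first non-empty line is a header of field names.
  // Assumption: fields are never quoted and never contain the delimiter itself.
  def parse(text: String, delimiter: Char = ','): List[Map[String, String]] = {
    val sep = Pattern.quote(delimiter.toString) // treat the delimiter literally, not as a regex
    text.split("\\r?\\n").toList.filter(_.trim.nonEmpty) match {
      case Nil => Nil
      case header :: rows =>
        val names = header.split(sep).map(_.trim).toList
        rows.map { row =>
          val values = row.split(sep, -1).map(_.trim).toList // -1 keeps trailing empty fields
          names.zip(values).toMap
        }
    }
  }

  def main(args: Array[String]): Unit = {
    val csv = "id,name,city\n1,alice,london\n2,bob,paris"
    parse(csv).foreach(println)
  }
}
```

The same function reads tab-separated files when called with `'\t'` as the delimiter.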
```starlark
#---------------------------
# Bazel build for parent.jar
# Adds Parent.scala, Child1.scala and Child2.scala to the created jar file
#---------------------------
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_binary", "scala_library")

package(default_visibility = ["//visibility:public"])

# ...
```
- Indexes are used to query data faster, speed up sort operations, and enforce unique constraints.
- In a DB table, each row has a rowid, a sequence number that identifies the row.
- E.g., table = list of pairs (rowid, row)
- An index is created as a separate table with the opposite relationship: (row, rowid), i.e. (indexed column values, rowid)
- SQLite organizes indexes as B-trees (B = balanced, not binary), and the index table holds one entry per row of the actual table.
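The table/index relationship above can be sketched with plain Scala collections. In this sketch the table maps rowid to row, and the index maps the indexed column value back to rowid. As a simplifying assumption, the standard library's `TreeMap` (a red-black tree) stands in for SQLite's B-tree. Both structures keep their keys sorted, which is what gives fast lookups and ordered scans. The `User` row type and the `email` column are hypothetical.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical row type, for illustration only.
final case class User(email: String, name: String)

// Table: rowid -> row. Index: indexed column value (email) -> rowid, kept sorted.
// TreeMap (a red-black tree) stands in for SQLite's B-tree here.
final case class UserTable(
    rows: TreeMap[Long, User] = TreeMap.empty[Long, User],
    emailIndex: TreeMap[String, Long] = TreeMap.empty[String, Long],
    nextRowId: Long = 1L
) {
  // Insert a row and its index entry; the index also enforces UNIQUE(email).
  def insert(u: User): Either[String, UserTable] =
    if (emailIndex.contains(u.email)) Left(s"UNIQUE constraint failed: ${u.email}")
    else
      Right(
        UserTable(
          rows.updated(nextRowId, u),
          emailIndex.updated(u.email, nextRowId),
          nextRowId + 1
        )
      )

  // Lookup via the index: a logarithmic tree search, then a fetch by rowid.
  def findByEmail(email: String): Option[User] =
    emailIndex.get(email).flatMap(id => rows.get(id))

  // Rows ordered by email come straight from the index order, with no extra sort.
  def orderedByEmail: List[User] =
    emailIndex.values.toList.flatMap(id => rows.get(id))
}

object IndexDemo {
  def main(args: Array[String]): Unit = {
    val table = for {
      t1 <- UserTable().insert(User("bob@example.com", "Bob"))
      t2 <- t1.insert(User("alice@example.com", "Alice"))
    } yield t2
    table.foreach(t => println(t.orderedByEmail.map(_.name)))
  }
}
```

The sketch has one index entry per table row, which matches the note above. A second insert with an existing email is rejected, and this is how a unique index enforces its constraint.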
Mac:
```scala
package ch06

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// Start a text server first: nc -l -p 9999
object SimpleStream extends App {
  val spark = SparkSession.builder
    .master("local")
    .appName("StructuredNetworkWordCount")
    // ...
```
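The app name matches the socket word-count example in Spark's Structured Streaming guide. In that example, each micro-batch of lines read from the socket updates a running count per word. In `complete` output mode, the whole result table is emitted again after every batch. The plain-Scala sketch below models only that running-aggregation behavior with ordinary collections. It does not call Spark, and the batch contents are hypothetical.

```scala
object RunningWordCount {
  type Counts = Map[String, Long]

  // Count the words in one micro-batch (a batch is a sequence of lines).
  def countBatch(lines: Seq[String]): Counts =
    lines
      .flatMap(_.split(" "))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  // Merge one batch into the running totals, like a stateful aggregation.
  def update(state: Counts, batch: Seq[String]): Counts =
    countBatch(batch).foldLeft(state) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0L) + n)
    }

  // "Complete" mode: after every batch, the full result table is emitted.
  def completeModeOutputs(batches: Seq[Seq[String]]): Seq[Counts] =
    batches.scanLeft(Map.empty[String, Long])(update).tail

  def main(args: Array[String]): Unit =
    completeModeOutputs(Seq(Seq("apache spark"), Seq("spark streaming")))
      .zipWithIndex
      .foreach { case (table, i) => println(s"Batch $i: $table") }
}
```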
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

// Start a text server first: nc -l -p 9999
object DStreamSocket extends App {
  val conf = new SparkConf().setMaster("local[*]").setAppName("App name")
  val sc = new SparkContext(conf)
  // Micro-batch interval: 5 seconds
  val ssc = new StreamingContext(sc, Seconds(5))
  val lines: DStream[String] = ssc.socketTextStream(hostname = "localhost", port = 9999)
  // Word count per batch: split lines into words, pair each word with 1, sum per word
  val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
  // ...
```
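In the DStream version, `reduceByKey` runs on each 5-second batch on its own, so counts are not carried over from one batch to the next. Carrying them over would need a stateful operator such as `updateStateByKey`. The sketch below runs the same `flatMap` → `map` → `reduceByKey` chain on plain Scala collections for a single batch. It is a Spark-free stand-in with hypothetical input, meant for checking the transformation logic.

```scala
object BatchWordCount {
  // Mirrors: lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
  // applied to the lines received in ONE batch interval.
  def wordCounts(batch: Seq[String]): Map[String, Int] =
    batch
      .flatMap(_.split(" "))  // flatMap: lines -> words
      .map(word => (word, 1)) // map: word -> (word, 1)
      .groupBy(_._1)          // group the pairs by key...
      .map { case (word, pairs) => word -> pairs.map(_._2).reduce(_ + _) } // ...and sum, like reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Each inner Seq stands for the lines that arrived during one 5-second interval.
    val batches = Seq(Seq("to be or", "not to be"), Seq("be quick"))
    batches.map(wordCounts).foreach(println)
  }
}
```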
```scala
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream

object DStreamSimpleText extends App {
  val conf = new SparkConf().setMaster("local[*]").setAppName("App name")
  val sc = new SparkContext(conf)
  val ssc = new StreamingContext(sc, Seconds(5))
  // Monitors the directory and reads only files newly added to it
  val filestream: DStream[String] = ssc.textFileStream(
    "/Users/deepakjayaprakash/Downloads/testing"
  )
  // ...
```
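At each batch interval, `textFileStream` reads only the files that have appeared in the watched directory. Files that were already processed are not read again, and later updates to them are ignored. The sketch below imitates this "new files only" bookkeeping with `java.io.File` and a set of file names already seen. It is a simplified, hypothetical stand-in: Spark's own selection also takes file modification times into account, and this is not Spark's implementation. `NewFileWatcher` is an illustrative name.

```scala
import java.io.File

// Hypothetical helper: tracks which files in a directory have already been seen.
final class NewFileWatcher(dir: File) {
  // Files already present when watching starts are treated as old and skipped.
  private var seen: Set[String] = listNames()

  private def listNames(): Set[String] =
    Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(_.isFile)
      .map(_.getName)
      .toSet

  // Called once per batch interval: returns only the files not seen before.
  def poll(): Seq[String] = {
    val current = listNames()
    val fresh = (current -- seen).toSeq.sorted
    seen = seen ++ current
    fresh
  }
}

object NewFileWatcherDemo {
  def main(args: Array[String]): Unit = {
    val dir = java.nio.file.Files.createTempDirectory("watch").toFile
    val watcher = new NewFileWatcher(dir)
    new File(dir, "part-1.txt").createNewFile()
    println(watcher.poll()) // the newly created file
    println(watcher.poll()) // nothing new since the previous poll
  }
}
```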
```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DStreamSendRDD extends App {
  val conf = new SparkConf().setMaster("local[*]").setAppName("StreamingTransformExample")
  val ssc = new StreamingContext(conf, Seconds(5))
  val rdd1 = ssc.sparkContext.parallelize(Array(1, 2, 3))
  // Queue of RDDs to feed into the stream; for a custom type use Queue[RDD[MyObject]]()
  val rddQueue = scala.collection.mutable.Queue[RDD[Int]]()
  // ...
```
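A queue of RDDs like `rddQueue` is usually passed to `ssc.queueStream(rddQueue)`, which is mainly a testing aid. By default (`oneAtATime = true`), it takes one RDD from the queue per batch interval and uses it as that batch's data. When the queue is empty, the batch is empty. The sketch below models only this one-dequeue-per-interval behavior plus a per-batch transformation, in the spirit of `DStream.transform`. It uses a `scala.collection.mutable.Queue` of plain `Seq[Int]` batches instead of RDDs. `QueueStreamSketch` and `runBatches` are hypothetical names, and nothing here calls Spark.

```scala
import scala.collection.mutable

object QueueStreamSketch {
  // Stand-in for queueStream(queue, oneAtATime = true): each batch interval
  // dequeues at most one element; an empty queue yields an empty batch.
  def runBatches[A, B](queue: mutable.Queue[Seq[A]], intervals: Int)(
      transform: Seq[A] => Seq[B]): Seq[Seq[B]] =
    (1 to intervals).map { _ =>
      val batch = if (queue.nonEmpty) queue.dequeue() else Seq.empty[A]
      transform(batch) // like DStream.transform, applied to each batch
    }

  def main(args: Array[String]): Unit = {
    val q = mutable.Queue[Seq[Int]](Seq(1, 2, 3), Seq(4, 5))
    runBatches(q, intervals = 3)(_.map(_ * 10)).foreach(println)
  }
}
```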
```scala
import org.apache.spark.sql.SparkSession

object SimpleStream extends App {
  val spark = SparkSession.builder
    .master("local")
    .appName("StructuredNetworkWordCount")
    .getOrCreate()
  // Reduce console noise: log only errors
  spark.sparkContext.setLogLevel("ERROR")
  // ...
```
```scala
// Ref: https://sttp.softwaremill.com/en/latest/examples.html
/*
sbt:
  libraryDependencies += "com.softwaremill.sttp.client3" %% "core" % "3.5.1"
Bazel dependencies:
  com.softwaremill.sttp.client3.core
  com.softwaremill.sttp.client3.model
  com.softwaremill.sttp.client3.shared
*/
```