Hưng Vũ (Hungsiro506)
// Akka Streams: emit one SomeEvent every 250 ms, then batch with groupedWithin.
import scala.concurrent.duration._
import akka.stream.scaladsl.{Flow, Source}

case class SomeEvent(value: Long)

val events = Source
  .tick(0.seconds, 250.millis, "")
  .zipWithIndex
  .map { case (_, l) => SomeEvent(l) }

// Emits a group every 500 ms or once 100 elements are buffered, whichever
// comes first — at this tick rate, roughly 2 events per group.
val group = Flow[SomeEvent].groupedWithin(100, 500.millis)
<component name="libraryTable">
  <library name="SBT: com.fasterxml.jackson.core:jackson-annotations:2.8.5:jar">
    <CLASSES>
      <root url="jar://$USER_HOME$/.ivy2/cache/com.fasterxml.jackson.core/jackson-annotations/bundles/jackson-annotations-2.8.5.jar!/" />
    </CLASSES>
    <JAVADOC />
    <SOURCES>
      <root url="jar://$USER_HOME$/.ivy2/cache/com.fasterxml.jackson.core/jackson-annotations/srcs/jackson-annotations-2.8.5-sources.jar!/" />
    </SOURCES>
  </library>
If you see this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: ...

it is usually because you initialized a variable on the driver (master) but then tried to use it on one of the workers. Spark serializes the closures it ships to the workers, and fails if an object captured by a closure is not serializable. Note that a class holding references to Spark structures (i.e. SparkSession, SparkConf, etc.) as attributes cannot be Serializable at all. Consider the following code snippet:
NotSerializable notSerializable = new NotSerializable();
JavaRDD<String> rdd = sc.textFile("/tmp/myfile");
// The closure below captures notSerializable, so Spark must serialize it to
// ship the task to the workers — this is what throws NotSerializableException.
// (doSomething is a hypothetical method standing in for any use of the object.)
rdd.map(s -> notSerializable.doSomething(s)).collect();
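The underlying JVM rule can be seen without Spark. This is a minimal plain-Scala sketch (the class names are ours, for illustration) showing that java.io serialization rejects any object whose class does not implement Serializable — the same check Spark runs on captured closures:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// A class that does NOT implement Serializable — like a class
// holding a SparkSession or SparkConf as an attribute.
class NotSerializable(val n: Int)

// A case class is Serializable out of the box in Scala.
case class Ok(n: Int)

// Returns true if the object survives JVM serialization.
def trySerialize(obj: AnyRef): Boolean =
  try {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    oos.writeObject(obj)
    oos.close()
    true
  } catch {
    case _: NotSerializableException => false
  }

val okWorks  = trySerialize(Ok(1))                  // true
val badFails = trySerialize(new NotSerializable(1)) // false
```

The common fixes are the same in both settings: make the class Serializable, mark the offending field transient, or construct the object inside the closure (on the worker) instead of capturing it from the driver.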
Use

df.select( df("year").cast(IntegerType).as("year"), ... )

to cast to the requested type. As a neat side effect, values that are not castable / "convertible" in that sense become null.
In case you need this as a helper method, use:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DataType

object DFHelper {
  // Returns a copy of df with column `cn` cast to `tpe`,
  // e.g. castColumnTo(df, "year", IntegerType).
  def castColumnTo(df: DataFrame, cn: String, tpe: DataType): DataFrame =
    df.withColumn(cn, df(cn).cast(tpe))
}
val todayLog_DF = loadData(today, Extensions.ALL)
  .withColumn("Ref", sqlBoolFunc(col("Date")))
  .withColumn("Date", from_unixtime(unix_timestamp(col("Date"), "MMM dd yyyy HH:mm:ss"), "MM dd yyyy HH:mm:ss"))
  .selectExpr("Name", "Date", "SessID", "Input", "Output", "Ref")
todayLog_DF
Hungsiro506 / Redis TestApp
Created October 30, 2017 07:52 — forked from uromahn/Redis TestApp
Jedis test with Redis Cluster
date -d '1 hour ago' '+%Y-%m-%d'
Result: 2017-11-13
date -d '1 hour ago' '+%Y-%m-%d %H'
Result: 2017-11-13 22
date -d '1 hour ago' '+%H'
Result: 22
date '+%H'
Result: 23
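The same GNU-date trick can build an hour-bucketed directory name like the one read in the spark-shell session below. This is a sketch; the path is just an example mirroring that session, not a required layout:

```shell
#!/usr/bin/env bash
# GNU date's -d option evaluates a relative time expression,
# so this yields the bucket for the previous hour.
bucket=$(date -d '1 hour ago' '+%Y-%m-%d-%H')

# Example: point a job at the previous hour's output directory.
dir="/data/dns/dns-extracted-two-hours/${bucket}/out/"
echo "$dir"
```

Using the shifted time for the whole format string (rather than formatting fields separately) avoids a race at hour boundaries, where two separate `date` calls could straddle the rollover.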
scala> val dns = spark.sqlContext.read.parquet("/data/dns/dns-extracted-two-hours/2017-11-22-02/out/")
dns: org.apache.spark.sql.DataFrame = [value: string]
scala> val splited = dns.withColumn("temp",split(col("value"),"\\t"))
splited: org.apache.spark.sql.DataFrame = [value: string, temp: array<string>]
scala> val df = splited.select((0 until 25).map(i => col("temp").getItem(i).as(s"col$i")): _*)
df: org.apache.spark.sql.DataFrame = [col0: string, col1: string ... 23 more fields]
scala> val npic = df.where("col24 = '-1'").select("col2")
scala> val df = spark.sqlContext.read.csv("/data/dns/cached_ip/*")
df: org.apache.spark.sql.DataFrame = [_c0: string]
scala> val cached = df
cached: org.apache.spark.sql.DataFrame = [_c0: string]
scala> val npic = spark.sqlContext.read.csv("/data/dns/npic_dns/*")
npic: org.apache.spark.sql.DataFrame = [_c0: string]
scala> val allo = spark.sqlContext.read.csv("/user/hungvd8/internet_user_profile_duration/Allocated-IPs2017-11-21.csv/*")
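The split-and-project step in the transcript above — split on "\\t", then getItem(i) for 25 columns — can be sketched without Spark. A minimal plain-Scala analogue (the sample record is made up):

```scala
// A tab-delimited record like the raw dns lines (values are made up):
// 4 named fields plus 21 "-" placeholders = 25 fields total.
val value = "ts\tsrc\t1.2.3.4\tquery" + "\t-" * 21

// Analogue of split(col("value"), "\\t"); limit -1 keeps trailing empty fields.
val temp: Array[String] = value.split("\t", -1)

// Analogue of (0 until 25).map(i => col("temp").getItem(i).as(s"col$i")).
val cols: Map[String, String] =
  (0 until 25).map(i => s"col$i" -> temp(i)).toMap

// cols("col2") corresponds to select("col2") in the transcript.
```

In Spark the `(0 until 25).map(...): _*` varargs expansion does the same projection column-by-column, with `getItem(i)` returning null for rows whose array is shorter than 25.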
// Purpose:
// - Ensures that a resource is deterministically disposed of once it goes out of scope
// - Use this pattern when working with resources that should be closed or managed after use
//
// The benefit of this pattern is that it frees the developer from the responsibility of
// explicitly managing resources
import scala.io.Source
import java.io._
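A minimal sketch of the loan pattern the comments above describe (the helper name `using` is our choice — it is not a Scala 2 standard-library API):

```scala
import scala.io.Source
import java.io._

// Loan pattern: the caller "borrows" the resource inside f;
// close() runs deterministically even if f throws.
def using[A <: AutoCloseable, B](resource: A)(f: A => B): B =
  try f(resource)
  finally resource.close()

// Usage: write then read a temp file, with no manual close() calls.
val file = File.createTempFile("loan", ".txt")
using(new PrintWriter(file))(_.print("hello"))
val contents = using(Source.fromFile(file))(_.mkString)
// contents == "hello"
file.delete()
```

Because `using` takes the body as a function, the resource's lifetime is exactly the function call: it cannot leak out of scope un-closed, which is the deterministic disposal the comments call out.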