- https://stackoverflow.com/a/9576170/1442961
- https://dba.stackexchange.com/q/59006/134391
timestamp means timestamp without time zone (SQL spec, 6.1, point 35):
- If <with or without time zone> is not specified, then WITHOUT TIME ZONE is implicit
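A quick way to see this against a live database is to declare one column as bare timestamp, one as timestamp with time zone, and ask information_schema what was actually recorded. A minimal Scala/JDBC sketch, assuming a Postgres instance at the hypothetical URL jdbc:postgresql://localhost/test and the driver on the classpath:

import java.sql.DriverManager

object TimestampCheck {
  def main(args: Array[String]): Unit = {
    // hypothetical connection URL -- point this at your own database
    val conn = DriverManager.getConnection("jdbc:postgresql://localhost/test")
    try {
      val stmt = conn.createStatement()
      stmt.execute("create table ts_demo (plain timestamp, tz timestamp with time zone)")
      val rs = stmt.executeQuery(
        "select column_name, data_type from information_schema.columns where table_name = 'ts_demo'")
      // expect: plain -> timestamp without time zone, tz -> timestamp with time zone
      while (rs.next()) {
        println(rs.getString("column_name") + " -> " + rs.getString("data_type"))
      }
    } finally {
      conn.close()
    }
  }
}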
val dictionary = Map(
  "a" -> Set("apple", "ant"),
  "b" -> Set("banana", "barn")
)
// let's count how many times each letter occurs in all words in our dictionary
val letters = dictionary.values.flatMap { x => x.flatMap { _.toCharArray } }
val letterCounts = letters.groupBy(identity).mapValues(_.size)
letterCounts.toArray.sorted.foreach { println }
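// prints (a,6), (b,2), (e,1), (l,1), (n,4), (p,2), (r,1), (t,1), one pair per line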
#!/bin/bash
# this is a demo of how to remove an argument given with the [-arg value] notation for a specific
# [arg] (-T in this case, but easy to modify)
echo "$@"
echo $#
i=0
ORIGINAL_ARGS=("$@")
TRIMMED_ARGS=()
while [ $i -lt $# ]; do
  if [ "${ORIGINAL_ARGS[$i]}" = "-T" ]; then
    i=$((i + 2))   # skip -T and the value that follows it
  else
    TRIMMED_ARGS+=("${ORIGINAL_ARGS[$i]}"); i=$((i + 1))
  fi
done
echo "${TRIMMED_ARGS[@]}"
# really, plus https://github.com/squito/spark/commit/8ce85969b680424ebda51ff9fe8f6e9ab9a9c4a9, b/c otherwise
# it's getting unfairly penalized for my stupid framework
# but it really should also have 8b41649 (offers.toIndexedSeq) to be a fair comparison
[info] SchedulerPerformanceSuite:
Iteration 0 finished in 470 ms
Iteration 1 finished in 150 ms
Iteration 2 finished in 122 ms
Iteration 3 finished in 122 ms
Iteration 4 finished in 101 ms
#!/bin/bash
my_func() {(
  # this takes a big shortcut around doing testing & unsetting -- because this entire function
  # is wrapped in "()", it executes in a subshell, so we can unconditionally unset, without
  # affecting vars outside
  unset MASTER
  echo "do something with MASTER=${MASTER-unset}"
)}
import java.lang.reflect.Method
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.sources.{HadoopFsRelation, BaseRelation}
import org.apache.spark.sql.DataFrame

def getPaths(relation: BaseRelation): Iterator[String] = {
  relation match {
    case hr: HadoopFsRelation =>
      hr.paths.toIterator
    case _ =>
      // other relation types (jdbc, in-memory, etc.) don't expose file paths
      Iterator.empty
  }
}
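Given the DataFrame and LogicalPlan imports above, one way to drive getPaths is to walk a DataFrame's analyzed plan and pull the BaseRelation out of each LogicalRelation node. This is a hedged sketch of my own, not part of the original snippet: it matches on the node's simple class name and reflects the relation accessor, since LogicalRelation moved packages across Spark 1.x releases.

def pathsOf(df: DataFrame): Seq[String] = {
  df.queryExecution.analyzed.collect {
    // LogicalRelation isn't importable here, so match by class name and use reflection
    case node if node.getClass.getSimpleName == "LogicalRelation" =>
      val relationAccessor: Method = node.getClass.getMethod("relation")
      getPaths(relationAccessor.invoke(node).asInstanceOf[BaseRelation]).toSeq
  }.flatten
}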
import traceback
import sys

def a(x): b(x)
def b(x): c(x)
def c(x): d(x)
def d(x):
    # dump the current call stack (a -> b -> c -> d) to stdout
    traceback.print_stack(file=sys.stdout)
a(1)
/* For example, I want to do this:
 *
 * sqlContext.catalog.client.getTable("default", "blah").properties
 *
 * but none of that is public to me in the shell. Using this, I can now do:
 *
 * sqlContext.reflectField("catalog").reflectField("client").reflectMethod("getTable", Seq("default", "blah")).reflectField("properties")
 *
 * not perfect, but usable.
 */
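The helper implementation itself isn't part of the excerpt above. As a rough sketch of the shape it could take (my own assumption, built on plain java.lang.reflect rather than the original code):

import java.lang.reflect.{Field, Method}

object ReflectHelpers {
  // in a spark-shell session, bring the implicit into scope with: import ReflectHelpers._
  implicit class ReflectOps(obj: AnyRef) {

    // walk up the class hierarchy so fields declared on superclasses are found too
    private def findField(cls: Class[_], name: String): Field =
      try cls.getDeclaredField(name)
      catch {
        case _: NoSuchFieldException if cls.getSuperclass != null =>
          findField(cls.getSuperclass, name)
      }

    def reflectField(name: String): AnyRef = {
      val f = findField(obj.getClass, name)
      f.setAccessible(true)
      f.get(obj)
    }

    // takes the first declared method with a matching name -- fine for shell poking,
    // but it ignores overload resolution entirely
    def reflectMethod(name: String, args: Seq[AnyRef]): AnyRef = {
      val m: Method = obj.getClass.getDeclaredMethods.find(_.getName == name)
        .getOrElse(throw new NoSuchMethodException(name))
      m.setAccessible(true)
      m.invoke(obj, args: _*)
    }
  }
}

The setAccessible(true) calls are what make otherwise-private members like catalog and client reachable from the shell.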
import java.io._
object CanIReadOpenDeletedFile {
  def main(args: Array[String]): Unit = {
    try {
      val f = new File("deleteme")
      val out = new FileOutputStream(f)
      out.write(1)
      out.close()
      // now open it for reading, delete it, and see whether the open handle still works
      val in = new FileInputStream(f)
      println("deleted: " + f.delete())
      println("read after delete: " + in.read())
      in.close()
    } catch { case e: Exception => e.printStackTrace() }
  }
}