/* Part 0: Initialise */
def randomInt = scala.util.Random.nextInt(10000)
val dataframe = sc.parallelize(
Seq.fill(100000){(randomInt,randomInt,randomInt)}
).toDF("cID", "c2", "c3")
val anotherDataframe = sc.parallelize(
  Seq.fill(100000){(randomInt,randomInt,randomInt)}
).toDF("cID", "c4", "c5")
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.bz2
tar xvf protobuf-2.5.0.tar.bz2
cd protobuf-2.5.0
./configure CC=clang CXX=clang++ CXXFLAGS='-std=c++11 -stdlib=libc++ -O3 -g' LDFLAGS='-stdlib=libc++' LIBS="-lc++ -lc++abi"
make -j 4
sudo make install
protoc --version
/**
 * Returns a new `DataFrame` that replaces null or NaN values in the specified
 * numeric and string columns. If a specified column is not a numeric, string,
 * or boolean column, it is ignored.
 */
private def fillValue[T](value: T, cols: Seq[String]): DataFrame = {
  // fill[T], where T is Long/Double, should apply to all NumericType
  // columns, for example:
  //   val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a", "b")
  //   input.na.fill(3.1)
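Outside of Spark, the fill semantics described above can be sketched in plain Scala: replace nulls (modeled as None) and NaNs with a supplied value, leaving other cells untouched. This is a minimal sketch; `fillNaN` is an illustrative name, not a Spark API.

```scala
// Minimal sketch of na.fill semantics over a column of nullable doubles:
// both nulls (modeled as None) and NaN become the replacement value.
def fillNaN(rows: Seq[Option[Double]], value: Double): Seq[Double] =
  rows.map {
    case Some(d) if !d.isNaN => d
    case _                   => value // null (None) or NaN
  }

val cleaned = fillNaN(Seq(Some(164.3), None, Some(Double.NaN)), 3.1)
// cleaned == Seq(164.3, 3.1, 3.1)
```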
scala> widetable.explain()
== Physical Plan ==
*SortMergeJoin [customer_id#2], [customer_id#17], Inner
:- *Sort [customer_id#2 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(customer_id#2, 200)
: +- *Scan JDBCRelation(foodmart.sales_fact_1998) [numPartitions=1] [product_id#0,time_id#1,customer_id#2,promotion_id#3,store_id#4,store_sales#5,store_cost#6,unit_sales#7] ReadSchema: struct<product_id:int,time_id:int,customer_id:int,promotion_id:int,store_id:int,store_sales:decim...
+- *Sort [customer_id#17 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(customer_id#17, 200)
+- *Scan JDBCRelation(foodmart.customer) [numPartitions=1] [customer_id#17,fullname#45] ReadSchema: struct
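The plan sorts both inputs on `customer_id` and then merges them. The core merge step of a sort-merge inner join can be sketched in plain Scala (illustrative only; Spark's implementation adds whole-stage code generation, spilling, and null ordering):

```scala
// Sketch of a sort-merge inner join over pre-sorted sequences of (key, value).
// Both inputs must already be sorted by key, as the Sort nodes in the plan ensure.
def sortMergeJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)])
                          (implicit ord: Ordering[K]): Seq[(K, A, B)] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[(K, A, B)]
  var i = 0
  var j = 0
  while (i < left.length && j < right.length) {
    val c = ord.compare(left(i)._1, right(j)._1)
    if (c < 0) i += 1          // left key too small: advance left
    else if (c > 0) j += 1     // right key too small: advance right
    else {
      // Keys match: emit the cross product of the matching runs on both sides.
      val key = left(i)._1
      val jStart = j
      while (i < left.length && ord.equiv(left(i)._1, key)) {
        j = jStart
        while (j < right.length && ord.equiv(right(j)._1, key)) {
          out += ((key, left(i)._2, right(j)._2))
          j += 1
        }
        i += 1
      }
    }
  }
  out.toSeq
}
```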
/**
 * A tiny class that extends a list with four combinatorial operations:
 * ''combinations'', ''subsets'', ''permutations'', ''variations''.
 *
 * You can find all the ideas behind this code at the blog post:
 *
 * http://vkostyukov.ru/posts/combinatorial-algorithms-in-scala/
 *
 * How to use this class:
 */
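The class body and its usage examples are not shown here; for orientation, the four operations can be approximated with the standard library. This is a hedged sketch — the blog post's class may differ in API and result ordering.

```scala
val xs = List(1, 2, 3)

// combinations: k-element selections, order-insensitive
val combs = xs.combinations(2).toList

// subsets: all 2^n subsets, via Set#subsets
val subs = xs.toSet.subsets().map(_.toList.sorted).toList

// permutations: all orderings of the whole list
val perms = xs.permutations.toList

// variations: k-element ordered selections, i.e. permutations of combinations
def variations[A](ys: List[A], k: Int): List[List[A]] =
  ys.combinations(k).flatMap(_.permutations).toList
```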
import scala.reflect.ClassTag
import scala.collection.mutable.WrappedArray
import scala.collection.mutable.ArrayLike

// Wraps varargs in an IndexedSeq with element-wise equality (plain Arrays
// compare by reference). Note: `deep` and `WrappedArray` are 2.12-era APIs
// that were removed in Scala 2.13.
def ASeq[T](elt: T*)(implicit ct: ClassTag[T]): IndexedSeq[T] = {
  val a = elt.toArray.clone
  a.deep.asInstanceOf[IndexedSeq[T]]
}

val a = Array(1,2,3)  //> a : Array[Int] = Array(1, 2, 3)
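On Scala 2.13+, where `deep` no longer exists, the same value-equality behavior for flat element types can be had by converting to an immutable `IndexedSeq` (a minimal sketch; `aSeq` is an illustrative name):

```scala
// 2.13-friendly variant: IndexedSeq already compares element-wise,
// unlike Array, which uses reference equality.
def aSeq[T](elt: T*): IndexedSeq[T] = elt.toIndexedSeq

val x = aSeq(1, 2, 3)
val y = aSeq(1, 2, 3)
// x == y is true, whereas Array(1, 2, 3) == Array(1, 2, 3) is false
```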
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
/* 007 */ private scala.collection.Iterator[] inputs;
/* 008 */ private scala.collection.Iterator inputadapter_input;
Disjoint subtyping is a scenario that is often encountered in data modeling. In one frequently used modeling approach, an entity of a certain type is represented by a database table, and each subtype of this entity is represented by another table. Subtyping is disjoint if an instance of a type corresponds to at most one instance of a subtype.
When querying a data model with subtypes, a common (and verbose) way of doing this is with case expressions. The idea of this post is to introduce an alternative to that approach using coalesce. I will illustrate this across multiple examples.
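In Scala terms, coalesce's pick-the-first-non-null behavior corresponds to chaining `Option#orElse`. A minimal sketch, with the subtype "tables" modeled as maps (all names here are illustrative):

```scala
// Disjoint subtypes: an order's counterparty is either a vendor or a
// customer, never both, so at most one lookup returns a value.
val vendors   = Map(1 -> "Acme Supply")
val customers = Map(2 -> "Jane Doe")

// coalesce(vendor.name, customer.name): first non-null wins.
def counterparty(orderId: Int): Option[String] =
  vendors.get(orderId).orElse(customers.get(orderId))
```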
In the toy schema depicted below, list all orders with their corresponding vendors and customers.
Order Table
Id | Status | Quantity | Type
In MariaDB, a query with ORDER BY in a FROM subquery produces an unordered result; in effect, ORDER BY is ignored in FROM subqueries. MySQL does not ignore ORDER BY in FROM subqueries.
Older versions of MariaDB (< 10.2.0) did not have window functions such as rank(), dense_rank(), and row_number(), among others. To understand where you would use such a function, dense_rank() for instance, consider the following example:
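For reference, dense_rank() assigns consecutive ranks with no gaps after ties: equal values share a rank, and the next distinct value gets the next rank. A minimal sketch of these semantics in plain Scala (ranking descending, as a top-N query would):

```scala
// dense_rank over a column of values, descending: ties share a rank,
// and the next distinct value gets the next consecutive rank (no gaps).
def denseRank(values: Seq[Int]): Seq[(Int, Int)] = {
  val ranks = values.distinct
    .sorted(Ordering[Int].reverse)
    .zipWithIndex
    .map { case (v, i) => v -> (i + 1) }
    .toMap
  values.map(v => (v, ranks(v)))
}

denseRank(Seq(300, 100, 300, 200))
// Seq((300,1), (100,3), (300,1), (200,2))
```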