/* Part 0: Initialise */
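// `def` rather than `val`, so each reference below draws a fresh random number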
def randomInt = scala.util.Random.nextInt(10000)
val dataframe = sc.parallelize(
  Seq.fill(100000){(randomInt,randomInt,randomInt)}
  ).toDF("cID", "c2", "c3")
val anotherDataframe = sc.parallelize(
  Seq.fill(100000){(randomInt, randomInt, randomInt)}
  ).toDF("cID", "c4", "c5") // the original line was truncated; column names here are assumed

# Build protobuf 2.5.0 from source with clang/libc++, install it, and verify:
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.bz2
tar xvf protobuf-2.5.0.tar.bz2
cd protobuf-2.5.0
./configure CC=clang CXX=clang++ CXXFLAGS='-std=c++11 -stdlib=libc++ -O3 -g' LDFLAGS='-stdlib=libc++' LIBS="-lc++ -lc++abi"
make -j 4
sudo make install
protoc --version
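If the build and install succeeded, the final version check should report the freshly built compiler:

$ protoc --version
libprotoc 2.5.0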
/**
 * Returns a new `DataFrame` that replaces null or NaN values in the specified
 * numeric and string columns. If a specified column is not a numeric, string,
 * or boolean column, it is ignored.
 */
private def fillValue[T](value: T, cols: Seq[String]): DataFrame = {
  // fill[T], where T is Long or Double, should apply to all NumericType
  // columns. For example:
  //   val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a", "b")
  //   input.na.fill(3.1)
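A runnable illustration of the behaviour the comment describes, assuming a spark-shell session (so `spark` and its implicits are available): fill(3.1) touches every numeric column, so the null boxed Integer in column "a" is filled with the value cast to the column's type.

import spark.implicits._

val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a", "b")

// 3.1 is cast to each numeric column's type: the null in "a" (int) becomes 3.
input.na.fill(3.1).show()
// +---+-----+
// |  a|    b|
// +---+-----+
// |  3|164.3|
// +---+-----+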
scala> widetable.explain()
== Physical Plan ==
*SortMergeJoin [customer_id#2], [customer_id#17], Inner
:- *Sort [customer_id#2 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(customer_id#2, 200)
:     +- *Scan JDBCRelation(foodmart.sales_fact_1998) [numPartitions=1] [product_id#0,time_id#1,customer_id#2,promotion_id#3,store_id#4,store_sales#5,store_cost#6,unit_sales#7] ReadSchema: struct<product_id:int,time_id:int,customer_id:int,promotion_id:int,store_id:int,store_sales:decim...
+- *Sort [customer_id#17 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(customer_id#17, 200)
      +- *Scan JDBCRelation(foodmart.customer) [numPartitions=1] [customer_id#17,fullname#45] ReadSchema: struct<...
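The post does not show how `widetable` was built; the following is a sketch that would produce a plan of this shape, assuming plain JDBC reads (the URL and credentials are placeholders) and an inner equi-join on customer_id.

// Hypothetical JDBC options; only the table names appear in the plan above.
val jdbcOpts = Map(
  "url"      -> "jdbc:mysql://localhost:3306/foodmart",
  "user"     -> "foodmart",
  "password" -> "secret")

val sales = spark.read.format("jdbc")
  .options(jdbcOpts + ("dbtable" -> "foodmart.sales_fact_1998"))
  .load()

val customer = spark.read.format("jdbc")
  .options(jdbcOpts + ("dbtable" -> "foodmart.customer"))
  .load()
  .select("customer_id", "fullname")

// With single-partition JDBC sources on both sides, Spark shuffles (Exchange)
// and sorts each side, then plans the SortMergeJoin shown above.
val widetable = sales.join(customer, "customer_id")
widetable.explain()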
/**
 * A tiny class that extends a list with four combinatorial operations:
 * ''combinations'', ''subsets'', ''permutations'', ''variations''.
 *
 * You can find all the ideas behind this code in the blog post:
 *
 * http://vkostyukov.ru/posts/combinatorial-algorithms-in-scala/
 *
 * How to use this class:
 */
import scala.reflect.ClassTag
import scala.collection.mutable.WrappedArray
import scala.collection.mutable.ArrayLike

// Copies the varargs into a private array, then wraps that copy in an
// IndexedSeq. Note: relies on the pre-2.13 `deep` member of ArrayOps.
def ASeq[T](elt: T*)(implicit ct: ClassTag[T]): IndexedSeq[T] = {
  val a = elt.toArray.clone
  a.deep.asInstanceOf[IndexedSeq[T]]
}

val a = Array(1, 2, 3)                          //> a : Array[Int] = Array(1, 2, 3)
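A quick usage sketch, worksheet-style results shown as comments (requires Scala 2.12 or earlier, where Array#deep still exists). Because ASeq clones its arguments, the result is effectively immutable from the caller's point of view.

val s = ASeq(1, 2, 3)                           //> s : IndexedSeq[Int] = Array(1, 2, 3)
s(1)                                            //> res0: Int = 2
s.updated(1, 42)                                //> returns a new IndexedSeq; s itself is unchanged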
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator[] inputs;
/* 008 */   private scala.collection.Iterator inputadapter_input;

Disjoint subtyping is a scenario that is often encountered in data modeling. In one frequently used modeling approach, an entity of a certain type is represented by a database table, and each subtype of this entity is represented by another table. Subtyping is disjoint if an instance of a type corresponds to at most one instance of a subtype.
When querying a data model with subtypes, the common but verbose approach is to use CASE expressions. The idea of this post is to introduce an alternative based on COALESCE, which I will illustrate across multiple examples.
In the toy schema depicted below, list all orders with their corresponding vendors and customers
Order Table
Id | Status | Quantity | Type
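A minimal sketch of the COALESCE approach, shown through Spark SQL for consistency with the rest of this document. The subtype tables vendor_order(order_id, vendor_name) and customer_order(order_id, customer_name) are assumptions made for illustration; only the Order header above comes from the post. Because subtyping is disjoint, at most one of the outer joins matches per order, so COALESCE picks the single non-null name without a CASE on Type.

// `orders`, `vendor_order`, and `customer_order` are assumed to be registered
// as temp views; the subtype table names and columns are hypothetical.
val withCounterparty = spark.sql("""
  SELECT o.id, o.status, o.quantity,
         COALESCE(v.vendor_name, c.customer_name) AS counterparty
  FROM orders o
  LEFT JOIN vendor_order   v ON v.order_id = o.id
  LEFT JOIN customer_order c ON c.order_id = o.id
""")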
In MariaDB, a query with an ORDER BY in a FROM subquery produces an unordered result; in effect, ORDER BY is ignored in FROM subqueries. MySQL does not ignore ORDER BY in FROM subqueries.
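A small sketch of the pitfall; the table t and column c1 are made-up names. The first form relies on subquery ordering and is not portable; the second sorts where the ordering is guaranteed to apply.

// Unreliable on MariaDB: the inner ORDER BY in a FROM subquery is ignored.
val unreliable = """SELECT * FROM (SELECT * FROM t ORDER BY c1) AS sub"""

// Portable: put the ORDER BY on the outermost query (or pair the inner
// ORDER BY with a LIMIT, which MariaDB honors).
val portable = """SELECT * FROM (SELECT * FROM t) AS sub ORDER BY c1"""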
Older versions of MariaDB (< 10.2.0) did not have window functions such as rank(), dense_rank(), and row_number(), among others. To understand where you would use such a function, dense_rank() for instance, consider the following example: