/* Part 0: Initialise */
def randomInt = scala.util.Random.nextInt(10000)
val dataframe = sc.parallelize(
Seq.fill(100000){(randomInt,randomInt,randomInt)}
).toDF("cID", "c2", "c3")
val anotherDataframe = sc.parallelize(
  Seq.fill(100000){(randomInt,randomInt,randomInt)}
).toDF("cID", "c4", "c5")
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.bz2
tar xvf protobuf-2.5.0.tar.bz2
cd protobuf-2.5.0
./configure CC=clang CXX=clang++ CXXFLAGS='-std=c++11 -stdlib=libc++ -O3 -g' LDFLAGS='-stdlib=libc++' LIBS="-lc++ -lc++abi"
make -j 4
sudo make install
protoc --version
/**
 * Returns a new `DataFrame` that replaces null or NaN values in the specified
 * numeric and string columns. If a specified column is not a numeric, string,
 * or boolean column, it is ignored.
 */
private def fillValue[T](value: T, cols: Seq[String]): DataFrame = {
  // fill[T], where T is Long/Double, should apply to all NumericType
  // columns, for example:
  //   val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a", "b")
  //   input.na.fill(3.1)
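Outside of Spark, the fill semantics described above can be sketched in plain Scala: replace nulls (modeled as None) and NaNs with a supplied value, leaving other cells untouched. This is a minimal sketch; `fillNaN` is an illustrative name, not a Spark API.

```scala
// Minimal sketch of na.fill semantics over a column of nullable doubles:
// both nulls (modeled as None) and NaN become the replacement value.
def fillNaN(rows: Seq[Option[Double]], value: Double): Seq[Double] =
  rows.map {
    case Some(d) if !d.isNaN => d
    case _                   => value // null (None) or NaN
  }

val cleaned = fillNaN(Seq(Some(164.3), None, Some(Double.NaN)), 3.1)
// cleaned == Seq(164.3, 3.1, 3.1)
```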
scala> widetable.explain()
== Physical Plan ==
*SortMergeJoin [customer_id#2], [customer_id#17], Inner
:- *Sort [customer_id#2 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(customer_id#2, 200)
: +- *Scan JDBCRelation(foodmart.sales_fact_1998) [numPartitions=1] [product_id#0,time_id#1,customer_id#2,promotion_id#3,store_id#4,store_sales#5,store_cost#6,unit_sales#7] ReadSchema: struct<product_id:int,time_id:int,customer_id:int,promotion_id:int,store_id:int,store_sales:decim...
+- *Sort [customer_id#17 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(customer_id#17, 200)
+- *Scan JDBCRelation(foodmart.customer) [numPartitions=1] [customer_id#17,fullname#45] ReadSchema: struct
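The plan sorts both inputs on `customer_id` and then merges them. The core merge step of a sort-merge inner join can be sketched in plain Scala (illustrative only; Spark's implementation adds whole-stage code generation, spilling, and null ordering):

```scala
// Sketch of a sort-merge inner join over pre-sorted sequences of (key, value).
// Both inputs must already be sorted by key, as the Sort nodes in the plan ensure.
def sortMergeJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)])
                          (implicit ord: Ordering[K]): Seq[(K, A, B)] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[(K, A, B)]
  var i = 0
  var j = 0
  while (i < left.length && j < right.length) {
    val c = ord.compare(left(i)._1, right(j)._1)
    if (c < 0) i += 1          // left key too small: advance left
    else if (c > 0) j += 1     // right key too small: advance right
    else {
      // Keys match: emit the cross product of the matching runs on both sides.
      val key = left(i)._1
      val jStart = j
      while (i < left.length && ord.equiv(left(i)._1, key)) {
        j = jStart
        while (j < right.length && ord.equiv(right(j)._1, key)) {
          out += ((key, left(i)._2, right(j)._2))
          j += 1
        }
        i += 1
      }
    }
  }
  out.toSeq
}
```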
/**
 * A tiny class that extends a list with four combinatorial operations:
 * ''combinations'', ''subsets'', ''permutations'', ''variations''.
 *
 * You can find all the ideas behind this code at the blog post:
 *
 * http://vkostyukov.ru/posts/combinatorial-algorithms-in-scala/
 *
 * How to use this class:
 */
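The class body and its usage examples are not shown here; for orientation, the four operations can be approximated with the standard library. This is a hedged sketch — the blog post's class may differ in API and result ordering.

```scala
val xs = List(1, 2, 3)

// combinations: k-element selections, order-insensitive
val combs = xs.combinations(2).toList

// subsets: all 2^n subsets, via Set#subsets
val subs = xs.toSet.subsets().map(_.toList.sorted).toList

// permutations: all orderings of the whole list
val perms = xs.permutations.toList

// variations: k-element ordered selections, i.e. permutations of combinations
def variations[A](ys: List[A], k: Int): List[List[A]] =
  ys.combinations(k).flatMap(_.permutations).toList
```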
import scala.reflect.ClassTag
import scala.collection.mutable.WrappedArray
import scala.collection.mutable.ArrayLike

// Wraps varargs in an IndexedSeq with element-wise equality (plain Arrays
// compare by reference). Note: `deep` and `WrappedArray` are 2.12-era APIs
// that were removed in Scala 2.13.
def ASeq[T](elt: T*)(implicit ct: ClassTag[T]): IndexedSeq[T] = {
  val a = elt.toArray.clone
  a.deep.asInstanceOf[IndexedSeq[T]]
}

val a = Array(1,2,3)  //> a : Array[Int] = Array(1, 2, 3)
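On Scala 2.13+, where `deep` no longer exists, the same value-equality behavior for flat element types can be had by converting to an immutable `IndexedSeq` (a minimal sketch; `aSeq` is an illustrative name):

```scala
// 2.13-friendly variant: IndexedSeq already compares element-wise,
// unlike Array, which uses reference equality.
def aSeq[T](elt: T*): IndexedSeq[T] = elt.toIndexedSeq

val x = aSeq(1, 2, 3)
val y = aSeq(1, 2, 3)
// x == y is true, whereas Array(1, 2, 3) == Array(1, 2, 3) is false
```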
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
/* 007 */ private scala.collection.Iterator[] inputs;
/* 008 */ private scala.collection.Iterator inputadapter_input;
Disjoint subtyping is a scenario that is often encountered in data modeling. In one frequently used modeling approach, an entity of a certain type is represented by a database table, and each subtype of this entity is represented by another table. Subtyping is disjoint if an instance of a type corresponds to at most one instance of a subtype.
When querying a data model with subtypes, a common (and verbose) way of doing this is with case expressions. The idea of this post is to introduce an alternative to that approach using coalesce. I will illustrate this across multiple examples.
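In Scala terms, coalesce's pick-the-first-non-null behavior corresponds to chaining `Option#orElse`. A minimal sketch, with the subtype "tables" modeled as maps (all names here are illustrative):

```scala
// Disjoint subtypes: an order's counterparty is either a vendor or a
// customer, never both, so at most one lookup returns a value.
val vendors   = Map(1 -> "Acme Supply")
val customers = Map(2 -> "Jane Doe")

// coalesce(vendor.name, customer.name): first non-null wins.
def counterparty(orderId: Int): Option[String] =
  vendors.get(orderId).orElse(customers.get(orderId))
```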
In the toy schema depicted below, list all orders with their corresponding vendors and customers.
Order Table
Id | Status | Quantity | Type
In MariaDB, a query with ORDER BY in a FROM subquery produces an unordered result; in effect, ORDER BY is ignored in FROM subqueries. MySQL does not ignore ORDER BY in FROM subqueries.
Older versions of MariaDB (< 10.2.0) did not have window functions such as rank(), dense_rank(), and row_number(), among others. To understand where you would use such a function, dense_rank() for instance, consider the following example:
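For reference, dense_rank() assigns consecutive ranks with no gaps after ties: equal values share a rank, and the next distinct value gets the next rank. A minimal sketch of these semantics in plain Scala (ranking descending, as a top-N query would):

```scala
// dense_rank over a column of values, descending: ties share a rank,
// and the next distinct value gets the next consecutive rank (no gaps).
def denseRank(values: Seq[Int]): Seq[(Int, Int)] = {
  val ranks = values.distinct
    .sorted(Ordering[Int].reverse)
    .zipWithIndex
    .map { case (v, i) => v -> (i + 1) }
    .toMap
  values.map(v => (v, ranks(v)))
}

denseRank(Seq(300, 100, 300, 200))
// Seq((300,1), (100,3), (300,1), (200,2))
```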