spark notes
sc.[tab] in spark-shell - tab completion shows the methods available on the SparkContext
- dependency issues? parquet avro, ... specificRecord not present...
- Spark's sort-based shuffle is affected by a kernel bug (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2cb4b05e7647891b46b91c07c9a60304803d1688). The kernel bug was fixed in RHEL/CentOS 6.2. Note: CDH defaults to hash-based shuffle.
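Sketch of pinning the shuffle implementation explicitly instead of relying on the distribution default. Assumption (not from the notes above): a Spark 1.x release that still has the spark.shuffle.manager property, set in conf/spark-defaults.conf.

```
# conf/spark-defaults.conf (assumes Spark 1.x, where this property exists)
# hash = hash-based shuffle; sort = sort-based shuffle (the one hit by the kernel bug)
spark.shuffle.manager   hash
```

The same key can also be set per application on the SparkConf before the SparkContext is created.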
not nice: Type mismatch in Spark shell when using case class defined in shell https://issues.apache.org/jira/browse/SPARK-1199
don't care: for python only: https://issues.apache.org/jira/browse/SPARK-5363
case class cannot be used as key for reduce: not nice, but not that horrible. Just don't define the case class in the REPL.
Don't apply accumulator updates multiple times for tasks in result stages