@vidma
Last active August 29, 2015 14:16

spark notes

in spark-shell: sc. [tab] - show the methods available on the SparkContext

issues

  • dependency issues? parquet avro, ... specificRecord not present...

Spark's sort-based shuffle is affected by a kernel bug (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2cb4b05e7647891b46b91c07c9a60304803d1688). The kernel bug was fixed in RHEL/CentOS 6.2. Note: CDH defaults to hash-based shuffle.
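If a cluster runs an affected kernel and does not already default to hash-based shuffle, one possible workaround is to select the shuffle implementation explicitly. This is a minimal config sketch, assuming Spark 1.x, where the `spark.shuffle.manager` setting accepts `hash` or `sort`. The app name and the local master are hypothetical placeholders, not taken from these notes.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: Spark 1.x, where spark.shuffle.manager accepts "hash" or "sort".
val conf = new SparkConf()
  .setAppName("shuffle-manager-example") // hypothetical app name
  .setMaster("local[2]")                 // hypothetical master, for local testing only
  .set("spark.shuffle.manager", "hash")  // select hash-based shuffle instead of sort-based

// For a standalone application; spark-shell already provides its own `sc`.
val sc = new SparkContext(conf)
```

In spark-shell the context already exists, so the same setting would presumably be passed at launch instead, e.g. `spark-shell --conf spark.shuffle.manager=hash`.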

not nice: type mismatch in the Spark shell when using a case class defined in the shell: https://issues.apache.org/jira/browse/SPARK-1199

don't care (Python only): https://issues.apache.org/jira/browse/SPARK-5363

case class cannot be used as a key for reduce: not nice, but not that horrible. Just don't define the case class in the REPL.

Don't apply accumulator updates multiple times for tasks in result stages
