Josh Rosen (JoshRosen)
diff --git a/.generated-mima-class-excludes b/generate-class-excludes-new
index 68d31fa..7d3b0b6 100644
--- a/.generated-mima-class-excludes
+++ b/generate-class-excludes-new
@@ -6,17 +6,13 @@ org.apache.spark.AccumulatorParam$StringAccumulatorParam$
org.apache.spark.AccumulatorParam$UpdatedBlockStatusesAccumulatorParam$
org.apache.spark.Accumulators
org.apache.spark.Accumulators#
-org.apache.spark.Accumulators#
org.apache.spark.Accumulators$
diff --git a/.generated-mima-member-excludes b/member-excludes-new
index 1ebc496..5c4b58c 100644
--- a/.generated-mima-member-excludes
+++ b/member-excludes-new
@@ -1,3 +1,7 @@
+akka.actor.Actor.aroundPostStop
+akka.actor.Actor.aroundPreRestart
+akka.actor.Actor.aroundPreStart
+akka.actor.Actor.aroundReceive
com.esotericsoftware.reflectasm.shaded.org.objectweb.asm.ClassVisitor.api
JoshRosen / scala-lambda-serialization-with-lifted-local-defs.md
Last active June 12, 2021 16:35
Serialization of Scala closures that contain local defs

Several Apache Spark APIs rely on the ability to serialize Scala closures. Closures may reference non-Serializable objects, preventing them from being serialized. In some cases (SI-1419 and others), however, these references are unnecessary and can be nulled out, allowing otherwise-unserializable closures to be serialized (in Spark, this nulling is performed by the ClosureCleaner).

Scala 2.12's use of Java 8 lambdas for implementing closures appears to have broken our ability to serialize closures which contain local defs. If we cannot resolve this problem, Spark will be unable to support Scala 2.12 and will be stuck on 2.10 and 2.11 forever.

As an example that illustrates this problem, the following closure has a nested localDef and is defined inside a non-serializable class:

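The gist's code listing is truncated in this excerpt. The sketch below is a minimal reconstruction of the shape described above, not the original snippet: only localDef is named in the text, while NonSerializableClass, makeClosure, and Demo are illustrative.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Illustrative stand-in: the enclosing class is deliberately NOT Serializable.
class NonSerializableClass {
  def makeClosure: Int => Int = {
    // A local def nested inside the enclosing method.
    def localDef(x: Int): Int = x + 1
    // Under Scala 2.12, localDef is lifted to a method on the enclosing
    // class, so this lambda captures `this` and drags the non-serializable
    // outer instance into the serialized closure.
    (x: Int) => localDef(x)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val closure = (new NonSerializableClass).makeClosure
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    // Fails with java.io.NotSerializableException: NonSerializableClass
    out.writeObject(closure)
  }
}
```

For 2.10/2.11-style anonymous-class closures, Spark's ClosureCleaner can null out an unnecessary captured outer reference like this one; it cannot do so here because the lifted localDef call genuinely needs `this`.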

Flask-SQLAlchemy Caching

The following gist is an extract of the article Flask-SQLAlchemy Caching. Among other features, it provides automatic caching of simple queries and event-driven invalidation of cached relations.

Usage

Retrieve one object:

# Pull one User object; with the caching layer active, repeating the
# same query can be served from the cache instead of the database.
user = User.query.get(1)

Keybase proof

I hereby claim:

  • I am joshrosen on github.
  • I am joshrosen (https://keybase.io/joshrosen) on keybase.
  • I have a public key whose fingerprint is DD10 7726 FCC2 C9F2 6BBE 0688 5CDE B147 5FD1 9FBE

To claim this, I am signing this object:

[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.spark:spark-core_2.11:jar:2.0.0-SNAPSHOT
[WARNING] The expression ${pom.version} is deprecated. Please use ${project.version} instead.
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ------------------------------------------------------------------------
diff --git before after
index bdf27c9..88b5282 100644
--- before
+++ after
@@ -1,3253 +1,3241 @@
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.spark:spark-core_2.11:jar:2.0.0-SNAPSHOT
[WARNING] The expression ${pom.version} is deprecated. Please use ${project.version} instead.
[WARNING]

Casting structs - Databricks (HTML export of a Databricks notebook)
JoshRosen / apply-patch.sh
Created June 24, 2016 23:10 — forked from kfish/apply-patch.sh
Apply a patch file that was produced with "git format-patch" using the patch command, and commit it using the message from the original commit.
#!/bin/bash
apply () {
    filename=$1       # patch file produced by "git format-patch"
    shift
    patch_args=$*     # remaining arguments are passed through to patch(1)
    gotSubject=no     # tracks whether the Subject: header has been seen yet
    msg=""            # accumulates the original commit message

Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool for determining where CPU time is being spent. Using the Java Flight Recorder, you can do this for Java processes without adding significant runtime overhead.
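This excerpt does not include the article's commands. As a rough sketch, on JDK 8 a recording can be enabled on Spark executors with JVM flags like the following, passed through Spark's spark.executor.extraJavaOptions setting (flag names are JDK-8-era; later JDKs drop -XX:+UnlockCommercialFeatures):

```
# Hypothetical invocation: start a 60-second JFR recording on each executor;
# the application JAR and its arguments are elided.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=duration=60s,filename=recording.jfr" \
  <application JAR and arguments>
```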

When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right-hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to speed up…