Skip to content

Instantly share code, notes, and snippets.

@petrovg
Created August 12, 2015 15:50
Show Gist options
  • Select an option

  • Save petrovg/d5e705e710efeb822141 to your computer and use it in GitHub Desktop.

Select an option

Save petrovg/d5e705e710efeb822141 to your computer and use it in GitHub Desktop.
How to broadcast your JSON ObjectMapper across a Spark cluster, so that it's only instantiated once on each node. The ObjectMapper is supposed to be reused, but this is not trivial in Spark. Fortunately, Scala's lazy values make this super easy...
class MapperHolder extends Serializable {
lazy val mapper = {
val m = new ObjectMapper with ScalaObjectMapper
m.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
m.registerModule(DefaultScalaModule)
m
}
}
val holderBcast = sc.broadcast(new MapperHolder())
// This function will then use the broadcast mapper, which would be instantiated on the
// local node when it's first requested - saving time and memory
def toJson(l: String): LogEntry = {
holderBcast.value.mapper.readValue[LogEntry](l)
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment