Created August 12, 2015 15:50
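How to broadcast your Jackson JSON ObjectMapper across a Spark cluster, so that it's only instantiated once on each node (strictly, once per executor JVM) instead of once per record or per task. The ObjectMapper is supposed to be reused, but this is not trivial in Spark, because everything a task's closure references has to be serialized on the driver and shipped to the executors. Fortunately, Scala's lazy values make this super easy...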
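import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
// Package path depends on your jackson-module-scala version; newer releases
// expose ScalaObjectMapper directly under com.fasterxml.jackson.module.scala.
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

// Only this small holder is serialized and broadcast, never the mapper itself.
class MapperHolder extends Serializable {
  // @transient keeps the mapper out of the serialized form even if it has already
  // been touched on the driver; each JVM builds its own copy on first access.
  @transient lazy val mapper = {
    val m = new ObjectMapper with ScalaObjectMapper
    m.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    m.registerModule(DefaultScalaModule)
    m
  }
}

// sc is the SparkContext; LogEntry is the target case class, defined elsewhere.
val holderBcast = sc.broadcast(new MapperHolder())

// This function uses the broadcast holder. The mapper is instantiated on the local
// node the first time it is requested, which saves time and memory.
// Despite the name, it parses a JSON line into a LogEntry.
def toJson(l: String): LogEntry = {
  holderBcast.value.mapper.readValue[LogEntry](l)
}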
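
The lazy-holder trick can be seen without Spark or Jackson. The sketch below is only an illustration under stated assumptions: `ExpensiveParser`, `ParserHolder` and `LazyHolderDemo` are hypothetical names that are not part of the gist, and a plain Java serialization round trip stands in for what a broadcast roughly does when it ships the holder to another JVM.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an expensive, reusable object such as ObjectMapper.
// It is deliberately NOT Serializable, so it can never travel with the holder.
class ExpensiveParser {
  ExpensiveParser.instances.incrementAndGet()
  def parse(s: String): Int = s.trim.toInt
}

object ExpensiveParser {
  // Counts how many parsers have been constructed in this JVM.
  val instances = new AtomicInteger(0)
}

// Same shape as MapperHolder: a Serializable shell around a lazily built value.
class ParserHolder extends Serializable {
  @transient lazy val parser: ExpensiveParser = new ExpensiveParser
}

object LazyHolderDemo {
  // Serialize and deserialize a value in memory, as a rough stand-in for
  // sending it from the driver to an executor.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val holder = new ParserHolder
    val copy = roundTrip(holder)
    // Nothing has been constructed yet: the lazy val was never touched.
    println(ExpensiveParser.instances.get) // prints 0

    // The first access builds the parser; later accesses reuse it.
    val parsed = Seq("1", "2", "3").map(s => copy.parser.parse(s))
    println(parsed)                        // prints List(1, 2, 3)
    println(ExpensiveParser.instances.get) // prints 1
  }
}
```

Three records are parsed but only one parser is ever built, and it is built on the receiving side of the round trip. That is the property the broadcast `MapperHolder` relies on.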