AST-free JSON parsing

Provisional benchmarks of AST-free serialization put my WIP branch of uPickle about 40% faster than circe on my current set of ad-hoc benchmarks, if the encoders/decoders are cached (bigger numbers are better):

playJson Read 2761067
playJson Write 3412630
circe Read 6005895
circe Write 5205007
upickleDefault Read 4543628
upickleDefault Write 3814459
upickleLegacy Read 8393416
upickleLegacy Write 7431523

Circe is still significantly faster in the case where encoders/decoders are not cached, but I assume I just need to spend a bit of time micro-optimizing the encoder/decoder instantiation code, and that this is not a fundamental limitation. More time spent optimizing should help the cached-encoder benchmark as well:

playJson Read 1975992
playJson Write 2811139
circe Read 4701980
circe Write 4252224
upickleDefault Read 2724334
upickleDefault Write 2443416
upickleLegacy Read 3142672
upickleLegacy Write 2878934
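
To illustrate what "cached" versus "not cached" could mean here, the following is a minimal sketch and not the actual benchmark code. `Encoder`, `Point`, `newPointEncoder`, and the `instantiations` counter are all hypothetical names; the assumption is that the uncached case rebuilds the encoder at every call site, while the cached case builds it once and reuses it.

```scala
object CachingSketch {
  // Hypothetical minimal type class, standing in for a real encoder.
  trait Encoder[A] { def encode(a: A): String }

  case class Point(x: Int, y: Int)

  // Counts how often an encoder instance is built (illustration only).
  var instantiations = 0

  def newPointEncoder(): Encoder[Point] = {
    instantiations += 1
    new Encoder[Point] {
      def encode(p: Point): String = s"""{"x":${p.x},"y":${p.y}}"""
    }
  }

  // Cached: the instance is created once, when `Cached` is first used.
  object Cached {
    implicit val pointEncoder: Encoder[Point] = newPointEncoder()
  }

  // Uncached: a fresh instance is created on every implicit lookup.
  object Uncached {
    implicit def pointEncoder: Encoder[Point] = newPointEncoder()
  }

  def encode[A](a: A)(implicit e: Encoder[A]): String = e.encode(a)

  def encodeCached(ps: Seq[Point]): Seq[String] = {
    import Cached._
    ps.map(p => encode(p))
  }

  def encodeUncached(ps: Seq[Point]): Seq[String] = {
    import Uncached._
    ps.map(p => encode(p))
  }
}
```

Both variants produce the same output; only the number of encoder instantiations differs, which is the cost the uncached benchmark would be measuring under this assumption.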

Jackson-module-scala is not included in the benchmarks because I couldn't figure out how to stop it from corrupting my data structure during a serialize/deserialize round trip.

Note that in that branch, String -> Case Class and Case Class -> String are both AST-free. My upickle Readers simply implement jawn.Facade, and the upickle Writers effectively extend jawn.Facade => Unit. As a result, the actual definition of reader/writer instances for various types looks pretty similar to what you would see if you pattern matched over the AST (Reader example, Writer example), but it can be driven directly by the parser without any intermediate AST being constructed.
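
To make the Reader/Writer shape concrete, here is a minimal sketch. The `Facade` trait below is a hypothetical, flattened stand-in and does not reproduce the real jawn.Facade signature; `Point`, `PointReader`, `writePoint`, and `simulateParse` are likewise illustrative names, and only flat objects with numeric fields are handled.

```scala
object FacadeSketch {
  // Hypothetical, simplified callback interface (NOT jawn.Facade's real API):
  // a parser invokes these methods as it encounters JSON tokens.
  trait Facade[T] {
    def objectStart(): Unit
    def key(name: String): Unit
    def number(value: Double): Unit
    def objectEnd(): Unit
    def result: T
  }

  case class Point(x: Double, y: Double)

  // Reader: implements the callbacks and builds a Point directly from them,
  // much like a pattern match over an AST, but without the AST.
  class PointReader extends Facade[Point] {
    private var currentKey = ""
    private var x = 0.0
    private var y = 0.0
    def objectStart(): Unit = ()
    def key(name: String): Unit = { currentKey = name }
    def number(value: Double): Unit = currentKey match {
      case "x" => x = value
      case "y" => y = value
      case _   => () // ignore unknown fields
    }
    def objectEnd(): Unit = ()
    def result: Point = Point(x, y)
  }

  // Writer: given a value, drives any Facade by calling its callbacks.
  def writePoint[T](p: Point, out: Facade[T]): T = {
    out.objectStart()
    out.key("x"); out.number(p.x)
    out.key("y"); out.number(p.y)
    out.objectEnd()
    out.result
  }

  // The calls a streaming parser would make for {"x":1,"y":2},
  // written out by hand because no real parser is included here.
  def simulateParse(): Point = {
    val reader = new PointReader
    reader.objectStart()
    reader.key("x"); reader.number(1.0)
    reader.key("y"); reader.number(2.0)
    reader.objectEnd()
    reader.result
  }
}
```

In this sketch, `FacadeSketch.writePoint(p, new FacadeSketch.PointReader)` copies a `Point` by feeding the Writer's events straight into the Reader, with no tree built in between.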

The patched version of jawn.Facade also gives you workflows like Case Class => Case Class, String => String (e.g. re-formatting your JSON), AST => Case Class, Case Class => AST, String => AST, and AST => String, all basically for free and also without any intermediate JSON AST.
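
A rough sketch of why these combinations come almost for free: once every source can drive a callback interface and every target implements it, any source can be plugged into any target. Everything below is an assumption made for illustration (the `Facade` trait is a simplified stand-in, not the patched jawn.Facade; `JsObj`, `User`, and the helper names are hypothetical), it handles only flat objects with string values, and it performs no string escaping. A String source would be a parser making the same calls, which is omitted here.

```scala
object WorkflowSketch {
  // Hypothetical simplified callback interface (not jawn.Facade's real API).
  trait Facade[T] {
    def objectStart(): Unit
    def key(name: String): Unit
    def string(value: String): Unit
    def objectEnd(): Unit
    def result: T
  }

  // A tiny AST, built only when the caller explicitly asks for one.
  case class JsObj(fields: Vector[(String, String)])
  case class User(name: String, city: String)

  // Target 1: compact JSON text.
  class StringRenderer extends Facade[String] {
    private val sb = new StringBuilder
    private var first = true
    def objectStart(): Unit = { sb.append('{'); first = true }
    def key(name: String): Unit = {
      if (!first) sb.append(',')
      first = false
      sb.append('"').append(name).append("\":")
    }
    def string(value: String): Unit = { sb.append('"').append(value).append('"') }
    def objectEnd(): Unit = { sb.append('}') }
    def result: String = sb.toString
  }

  // Target 2: the AST.
  class AstBuilder extends Facade[JsObj] {
    private var pendingKey = ""
    private val fields = Vector.newBuilder[(String, String)]
    def objectStart(): Unit = ()
    def key(name: String): Unit = { pendingKey = name }
    def string(value: String): Unit = { fields += (pendingKey -> value) }
    def objectEnd(): Unit = ()
    def result: JsObj = JsObj(fields.result())
  }

  // Target 3: a case class.
  class UserReader extends Facade[User] {
    private var pendingKey = ""
    private var name = ""
    private var city = ""
    def objectStart(): Unit = ()
    def key(k: String): Unit = { pendingKey = k }
    def string(value: String): Unit = pendingKey match {
      case "name" => name = value
      case "city" => city = value
      case _      => ()
    }
    def objectEnd(): Unit = ()
    def result: User = User(name, city)
  }

  // Source 1: a case class driving any target.
  def fromUser[T](u: User, out: Facade[T]): T = {
    out.objectStart()
    out.key("name"); out.string(u.name)
    out.key("city"); out.string(u.city)
    out.objectEnd()
    out.result
  }

  // Source 2: an AST driving any target.
  def fromAst[T](ast: JsObj, out: Facade[T]): T = {
    out.objectStart()
    ast.fields.foreach { case (k, v) => out.key(k); out.string(v) }
    out.objectEnd()
    out.result
  }
}
```

With two sources and three targets, this sketch already covers Case Class => String, Case Class => AST, Case Class => Case Class, AST => String, AST => Case Class, and AST => AST, each by choosing one `from...` function and one `Facade` instance.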

This looks great! I'll be curious to see how it compares to circe-algebra, which at least makes it possible to write an interpreter for circe decoders that doesn't require instantiating any AST (although I haven't actually done that yet).

In the meantime, a slightly fairer comparison would be against circe-derivation, which avoids the runtime overhead of going through Shapeless's generic representation (in addition to the AST). It's a drop-in replacement for io.circe.generic.semiauto, but when I tried changing the deps and imports here I got a bunch of compilation errors in codegen-ed code.

(Update: I was using the sbt build instead of mill, which works. I will try to get circe-derivation working here later today.)

lihaoyi/gist:5ae1d6b544d65fb8190534e8c13b8de7

travisbrown commented Mar 12, 2018 (edited)
