Last active
December 1, 2022 20:44
Slowness of parsing a string read from a gzipped file
truffleruby 22.3.0, like ruby 3.0.3, GraalVM CE Native [x86_64-darwin]
Base unit is an array of 20 integers

JSON-decode a string encoding an array of 10 of those base units.
Calculating -------------------------------------
       JSON 10 plain      6.817k (±14.0%) i/s -    133.364k in  19.995768s
      JSON 10 gzdata      1.204k (±28.0%) i/s -     15.611k in  20.047997s
   JSON 10 flattened      6.954k (± 9.6%) i/s -    138.006k in  20.090469s

Comparison:
   JSON 10 flattened:     6954.1 i/s
       JSON 10 plain:     6817.4 i/s - same-ish: difference falls within error
      JSON 10 gzdata:     1203.7 i/s - 5.78x  (± 0.00) slower

JSON-decode a string encoding an array of 500 of those base units.
Calculating -------------------------------------
      JSON 500 plain    111.994  (± 8.9%) i/s -      2.224k in  20.040061s
     JSON 500 gzdata      0.643  (± 0.0%) i/s -     13.000  in  20.250865s
  JSON 500 flattened    105.381  (±10.4%) i/s -      2.079k in  20.000481s

Comparison:
      JSON 500 plain:      112.0 i/s
  JSON 500 flattened:      105.4 i/s - same-ish: difference falls within error
     JSON 500 gzdata:        0.6 i/s - 174.20x  (± 0.00) slower
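
The two result sets above can also be compared on time per parse rather than iterations per second. The following is a rough sketch, not part of the original benchmark: it copies four rates from the output above and computes how much the time per parse grows when the input grows from 10 to 500 base units (50x). The names RATES, growth_factor and GROWTH are made up for this illustration.

```ruby
# Iterations per second, copied from the benchmark output above.
# Keys 10 and 500 are the number of base units in the parsed array.
RATES = {
  "plain"  => { 10 => 6817.4, 500 => 111.994 },
  "gzdata" => { 10 => 1203.7, 500 => 0.643 }
}

# Time per parse is 1 / rate, so the growth in time per parse between
# the 10-unit and 500-unit runs is rate(10) / rate(500).
def growth_factor(rates)
  rates[10] / rates[500]
end

GROWTH = RATES.transform_values { |rates| growth_factor(rates) }

GROWTH.each do |label, factor|
  puts format("%-7s time per parse grows %.0fx for 50x more input", label, factor)
end
```

With these figures the plain string's parse time grows by about 61x, close to the 50x growth of the input, while the string read through GzipReader grows by about 1,872x. That suggests the slowdown gets worse with input size rather than being a fixed overhead.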
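require 'json'
require 'zlib'
require 'tmpdir'
require 'benchmark'
require 'oj'

# Optional arguments: number of base units, then number of repetitions.
LINES   = (ARGV.shift || 300).to_i
BMTIMES = (ARGV.shift || 3).to_i

PLAIN = File.join(Dir.tmpdir, "test.json")
GZIP  = File.join(Dir.tmpdir, "test.json.gz")

# Each base unit is an array of 20 consecutive integers
# (exclusive range, so i..i+19; an inclusive range would give 21).
arr = (1..LINES).map { |i| (i...(i + 20)).to_a }

File.open(PLAIN, "w:utf-8") { |f| f.puts arr.to_json }
Zlib::GzipWriter.open(GZIP) { |f| f.puts arr.to_json }

plain_data = File.read(PLAIN)
# Block form closes the gzip file once its contents have been read.
gzdata = Zlib::GzipReader.open(GZIP) { |gz| gz.read }
# TruffleRuby-specific debug helper; this line fails on other Rubies.
flattened = Truffle::Debug.flatten_string(gzdata)
# Concatenation builds a new string, intended to force a copy.
forced_copy = gzdata + " "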

puts "\n" + RUBY_DESCRIPTION
puts "Benchmarking with an array of #{LINES} 20-element arrays (repeat #{BMTIMES} times)"

puts "\nBEGIN STDLIB JSON"
Benchmark.bm do |x|
  BMTIMES.times do
    puts "\n"
    x.report("%-25s" % "Plain") do
      3.times do
        json = JSON.parse(plain_data)
      end
    end

    x.report("%-25s" % "Previously gzipped") do
      3.times do
        json = JSON.parse(gzdata)
      end
    end

    x.report("%-25s" % "Flattened") do
      3.times do
        json = JSON.parse(flattened)
      end
    end

    x.report("%-25s" % "Gzipped/forced copy") do
      3.times do
        json = JSON.parse(forced_copy)
      end
    end
  end
end

puts "\n\nBEGIN Oj"
Oj.default_options = { :mode => :compat }
BMTIMES.times do
  Benchmark.bm do |x|
    puts "\n"
    x.report("%-25s" % "Plain") do
      3.times do
        json = Oj.load(plain_data)
      end
    end

    x.report("%-25s" % "Previously gzipped") do
      3.times do
        json = Oj.load(gzdata)
      end
    end

    x.report("%-25s" % "Gzipped/forced copy") do
      3.times do
        json = Oj.load(forced_copy)
      end
    end
  end
end
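
The script above needs the oj gem and a TruffleRuby-only helper, so it will not run everywhere. As a hedged, stdlib-only sketch for repeating the core comparison on any Ruby, the snippet below times JSON.parse on the string read from the gzip file and on a forced copy, and adds the flattened variant only when Truffle::Debug.flatten_string is available. The constant names (SKETCH_ARR, SKETCH_GZDATA, SKETCH_TIMINGS), the temp file name and the size of 50 units are assumptions made for illustration, and a single Benchmark.realtime call is far noisier than the measurements above.

```ruby
require 'json'
require 'zlib'
require 'tmpdir'
require 'benchmark'

# 50 base units of 20 consecutive integers each (size chosen arbitrarily).
SKETCH_ARR = (1..50).map { |i| (i...(i + 20)).to_a }

# Round-trip the JSON text through a gzip file, then remove the file.
path = File.join(Dir.tmpdir, "gz_parse_sketch_#{Process.pid}.json.gz")
Zlib::GzipWriter.open(path) { |gz| gz.write(SKETCH_ARR.to_json) }
SKETCH_GZDATA = Zlib::GzipReader.open(path) { |gz| gz.read }
File.delete(path)

# Strings to compare; each holds the same JSON document.
inputs = {
  "Previously gzipped"  => SKETCH_GZDATA,
  "Gzipped/forced copy" => SKETCH_GZDATA + " "
}

# Only add the flattened variant where the TruffleRuby helper exists.
if defined?(Truffle::Debug) && Truffle::Debug.respond_to?(:flatten_string)
  inputs["Flattened"] = Truffle::Debug.flatten_string(SKETCH_GZDATA)
end

# Wall-clock seconds for three parses of each string.
SKETCH_TIMINGS = inputs.transform_values do |str|
  Benchmark.realtime { 3.times { JSON.parse(str) } }
end

SKETCH_TIMINGS.each do |label, seconds|
  puts format("%-25s %.6f s", label, seconds)
end
```

On Rubies without the helper only two rows are printed, which still shows whether the gzip-read string and its copy parse at similar speeds there.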