@mikeschinkel
Last active August 23, 2024 11:51
Go json.Unmarshal() vs. PHP json_decode() performance on a large JSON file

GoLang json.Unmarshal() vs. PHP json_decode()

To evaluate whether Go's json.Unmarshal() is faster or slower than PHP's json_decode() for arbitrary JSON, I decided to run a quick benchmark on my 2015 MacBook Pro (Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz).

I used this ~25 MB JSON file and ran the two attached programs under Go 1.22 and PHP 8.3.2, respectively.

My benchmarks were very unscientific, but I was just looking for orders of magnitude.

Here are the results from three runs of each:

Go (avg 30.82)

  1. Unmarshalling took 31.082830437s for 100 iterations
  2. Unmarshalling took 30.713377126s for 100 iterations
  3. Unmarshalling took 30.665393891s for 100 iterations

PHP (avg 16.59)

  1. Decoding took 16.25067782402 seconds for 100 iterations
  2. Decoding took 17.178377866745 seconds for 100 iterations
  3. Decoding took 16.336436033249 seconds for 100 iterations

PHP is faster than Go?!? (86% faster, or takes only 54% as long)
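For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the averages and the two percentages from the six timings listed above. The helper names (mean, percentFaster, percentOfTime) are just illustrative and not part of the attached programs.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs (xs must be non-empty).
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// percentFaster reports how much faster the quick run is than the slow one,
// as a percentage: (slow/fast - 1) * 100.
func percentFaster(slow, fast float64) float64 {
	return (slow/fast - 1) * 100
}

// percentOfTime reports the fast run's time as a percentage of the slow run's.
func percentOfTime(slow, fast float64) float64 {
	return fast / slow * 100
}

func main() {
	// Timings in seconds, copied from the three runs of each program.
	goRuns := []float64{31.082830437, 30.713377126, 30.665393891}
	phpRuns := []float64{16.25067782402, 17.178377866745, 16.336436033249}

	goAvg, phpAvg := mean(goRuns), mean(phpRuns)
	fmt.Printf("Go avg:  %.2fs\n", goAvg)
	fmt.Printf("PHP avg: %.2fs\n", phpAvg)
	fmt.Printf("PHP is %.0f%% faster\n", percentFaster(goAvg, phpAvg))
	fmt.Printf("PHP takes %.0f%% as long\n", percentOfTime(goAvg, phpAvg))
}
```

Note that "86% faster" and "54% as long" are the same ratio read in opposite directions (30.82 / 16.59 versus 16.59 / 30.82).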

You may be surprised because you know PHP is interpreted and Go is compiled, and you might assume that would make Go much faster. But the fact is that PHP's json_decode() is written in C, while Go's json.Unmarshal() uses reflection, especially when loading into a type of any.
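To give a feel for what decoding into any involves, here is a small sketch that prints the dynamic Go types json.Unmarshal picks when the target is an empty interface. The sample document and the helper names (describe, decodeAny) are made up for illustration; the real benchmark file is an array of much larger objects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describe returns the dynamic Go type of v as a string.
func describe(v any) string {
	return fmt.Sprintf("%T", v)
}

// decodeAny unmarshals raw JSON into an empty interface value.
func decodeAny(raw string) (any, error) {
	var v any
	err := json.Unmarshal([]byte(raw), &v)
	return v, err
}

func main() {
	// Made-up sample input, loosely shaped like one event record.
	v, err := decodeAny(`[{"id": "1", "public": true, "actor": {"id": 42}}]`)
	if err != nil {
		fmt.Printf("failed to unmarshal json: %v\n", err)
		return
	}

	items := v.([]any)
	first := items[0].(map[string]any)
	actor := first["actor"].(map[string]any)

	fmt.Println(describe(v))               // []interface {}
	fmt.Println(describe(first))           // map[string]interface {}
	fmt.Println(describe(first["id"]))     // string
	fmt.Println(describe(first["public"])) // bool
	fmt.Println(describe(actor["id"]))     // float64
}
```

Every JSON object becomes a freshly allocated map and every value is boxed in an interface, which is likely a good part of the cost when the input is ~25 MB of nested objects.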

Optimizing Go

There are ways of making Go's JSON performance faster. Just follow the link to learn more.

I decided to explore a few of them and found that main2.go, which decodes into a slice of structs rather than any, shaved 22% off the any time (Unmarshalling took 23.9677021s for 100 iterations).
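If you want to try the any vs. struct comparison without downloading the large file, something like the following sketch may help. It is not one of the attached programs: it assumes a small synthetic payload built by a hypothetical sampleJSON helper, uses a cut-down Item struct, and relies on testing.Benchmark from the standard library to do the timing. Numbers from a synthetic payload will not match the results above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"testing"
)

// Item is a cut-down, hypothetical version of the struct in main2.go.
type Item struct {
	Id     string `json:"id"`
	Type   string `json:"type"`
	Public bool   `json:"public"`
}

// sampleJSON builds a synthetic JSON array containing n small objects.
func sampleJSON(n int) []byte {
	var sb strings.Builder
	sb.WriteByte('[')
	for i := 0; i < n; i++ {
		if i > 0 {
			sb.WriteByte(',')
		}
		fmt.Fprintf(&sb, `{"id":"%d","type":"PushEvent","public":true}`, i)
	}
	sb.WriteByte(']')
	return []byte(sb.String())
}

// decodeAny unmarshals into an empty interface, as main.go does.
func decodeAny(data []byte) (any, error) {
	var v any
	err := json.Unmarshal(data, &v)
	return v, err
}

// decodeItems unmarshals into a typed slice, as main2.go does.
func decodeItems(data []byte) ([]Item, error) {
	var items []Item
	err := json.Unmarshal(data, &items)
	return items, err
}

func main() {
	// Init registers the testing flags; it is needed when calling
	// testing.Benchmark outside of "go test".
	testing.Init()
	data := sampleJSON(1000)

	anyResult := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := decodeAny(data); err != nil {
				b.Fatal(err)
			}
		}
	})
	typedResult := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := decodeItems(data); err != nil {
				b.Fatal(err)
			}
		}
	})

	fmt.Println("any:   ", anyResult)
	fmt.Println("[]Item:", typedResult)
}
```

testing.Benchmark picks the iteration count itself and reports time per operation, which avoids the fixed 100-iteration loop and makes the two variants easier to compare.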

Surprisingly, at least to me, switching to json.RawMessage in main3.go for the sub-structs Actor, Repo and Payload made effectively no difference from main2.go (Unmarshalling took 23.286783374s for 100 iterations).

Further, dropping the fields Actor, Repo and Payload in main4.go also made effectively no difference from main2.go or main3.go (Unmarshalling took 22.251595239s for 100 iterations).

So it seems that ~22 seconds for 100 iterations is a floor for json.Unmarshal() for the given JSON file. That tells me if you need better performance then you'll need to evaluate one of the "fast JSON" packages mentioned here.

Hope this helps.

-Mike

P.S. Someone claimed the results would be different if I had used "object" in PHP, so I added a main2.php to test it. I had to increase the memory limit though, which I doubled to 512M from 256M.

The results were:

  1. Decoding took 17.953987836838 seconds for 100 iterations
  2. Decoding took 17.878257989883 seconds for 100 iterations
  3. Decoding took 17.94038105011 seconds for 100 iterations

With objects, decoding was about 8% slower: 17.92 vs. 16.59 seconds on average.

main.go

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// Adjust the number of iterations as needed
const iterations = 100

func main() {
	filename := "large-file.json"

	// Read the JSON file outside of the timing loop
	data, err := os.ReadFile(filename)
	if err != nil {
		fmt.Printf("failed to read file: %v\n", err)
		return
	}

	start := time.Now() // Start the timer
	for i := 0; i < iterations; i++ {
		var jsonData any // Use this for JSON arrays
		// var jsonData map[string]interface{} // Use this instead if the JSON root is an object
		if err := json.Unmarshal(data, &jsonData); err != nil {
			fmt.Printf("failed to unmarshal json: %v\n", err)
			return
		}
	}
	duration := time.Since(start)
	fmt.Printf("Unmarshalling took %v for %d iterations\n", duration, iterations)
}
```
main.php

```php
<?php
ini_set('memory_limit', '256M');

$filename = 'large-file.json';
$iterations = 100; // Adjust the number of iterations as needed

// Read the JSON file outside of the timing loop
$data = file_get_contents($filename);
if ($data === false) {
    echo "Failed to read file\n";
    return;
}

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $jsonData = json_decode($data, true); // Pass true for associative array, false for object
    if ($jsonData === null) {
        echo "Failed to decode JSON\n";
        return;
    }
}
$duration = microtime(true) - $start;
echo "Decoding took $duration seconds for $iterations iterations\n";
```
main2.go

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// Adjust the number of iterations as needed
const iterations = 100

// Item models one element of the top-level JSON array.
type Item struct {
	Id    string `json:"id"`
	Type  string `json:"type"`
	Actor struct {
		Id         int    `json:"id"`
		Login      string `json:"login"`
		GravatarId string `json:"gravatar_id"`
		Url        string `json:"url"`
		AvatarUrl  string `json:"avatar_url"`
	} `json:"actor"`
	Repo struct {
		Id   int    `json:"id"`
		Name string `json:"name"`
		Url  string `json:"url"`
	} `json:"repo"`
	Payload struct {
		Ref          string `json:"ref"`
		RefType      string `json:"ref_type"`
		MasterBranch string `json:"master_branch"`
		Description  string `json:"description"`
		PusherType   string `json:"pusher_type"`
	} `json:"payload"`
	Public    bool      `json:"public"`
	CreatedAt time.Time `json:"created_at"`
}

func main() {
	filename := "large-file.json"

	// Read the JSON file outside of the timing loop
	data, err := os.ReadFile(filename)
	if err != nil {
		fmt.Printf("failed to read file: %v\n", err)
		return
	}

	start := time.Now() // Start the timer
	for i := 0; i < iterations; i++ {
		var jsonData []Item // Use this for JSON arrays
		// var jsonData map[string]interface{} // Use this instead if the JSON root is an object
		if err := json.Unmarshal(data, &jsonData); err != nil {
			fmt.Printf("failed to unmarshal json: %v\n", err)
			return
		}
	}
	duration := time.Since(start)
	fmt.Printf("Unmarshalling took %v for %d iterations\n", duration, iterations)
}
```
main2.php

```php
<?php
ini_set('memory_limit', '512M');

$filename = 'large-file.json';
$iterations = 100; // Adjust the number of iterations as needed

// Read the JSON file outside of the timing loop
$data = file_get_contents($filename);
if ($data === false) {
    echo "Failed to read file\n";
    return;
}

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $jsonData = json_decode($data, false); // Pass true for associative array, false for object
    if ($jsonData === null) {
        echo "Failed to decode JSON\n";
        return;
    }
}
$duration = microtime(true) - $start;
echo "Decoding took $duration seconds for $iterations iterations\n";
```
main3.go

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// Adjust the number of iterations as needed
const iterations = 100

// Item keeps the nested objects as undecoded json.RawMessage values.
type Item struct {
	Id        string          `json:"id"`
	Type      string          `json:"type"`
	Actor     json.RawMessage `json:"actor"`
	Repo      json.RawMessage `json:"repo"`
	Payload   json.RawMessage `json:"payload"`
	Public    bool            `json:"public"`
	CreatedAt time.Time       `json:"created_at"`
}

func main() {
	filename := "large-file.json"

	// Read the JSON file outside of the timing loop
	data, err := os.ReadFile(filename)
	if err != nil {
		fmt.Printf("failed to read file: %v\n", err)
		return
	}

	start := time.Now() // Start the timer
	for i := 0; i < iterations; i++ {
		var jsonData []Item // Use this for JSON arrays
		// var jsonData map[string]interface{} // Use this instead if the JSON root is an object
		if err := json.Unmarshal(data, &jsonData); err != nil {
			fmt.Printf("failed to unmarshal json: %v\n", err)
			return
		}
	}
	duration := time.Since(start)
	fmt.Printf("Unmarshalling took %v for %d iterations\n", duration, iterations)
}
```
main4.go

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// Adjust the number of iterations as needed
const iterations = 100

// Item omits the nested Actor, Repo and Payload fields entirely.
type Item struct {
	Id        string    `json:"id"`
	Type      string    `json:"type"`
	Public    bool      `json:"public"`
	CreatedAt time.Time `json:"created_at"`
}

func main() {
	filename := "large-file.json"

	// Read the JSON file outside of the timing loop
	data, err := os.ReadFile(filename)
	if err != nil {
		fmt.Printf("failed to read file: %v\n", err)
		return
	}

	start := time.Now() // Start the timer
	for i := 0; i < iterations; i++ {
		var jsonData []Item // Use this for JSON arrays
		// var jsonData map[string]interface{} // Use this instead if the JSON root is an object
		if err := json.Unmarshal(data, &jsonData); err != nil {
			fmt.Printf("failed to unmarshal json: %v\n", err)
			return
		}
	}
	duration := time.Since(start)
	fmt.Printf("Unmarshalling took %v for %d iterations\n", duration, iterations)
}
```
@Bayashka1234

D

@mfriedenhagen

Hi, thanks for the write-up. Probably just a typo, on my machine the JSON file has 25 MB not GB :-)

@mfriedenhagen

mfriedenhagen commented Mar 11, 2024

Just another data point with Python 3 (MacBook Pro 2019, 2.4 GHz Quad-Core Intel Core i5).

```
❯ uname -a
Darwin XXXX 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:28:58 PST 2023; root:xnu-10002.81.5~7/RELEASE_X86_64 x86_64

❯ python3 --version; for i in 1 2 3; do ./read100.py; done
Python 3.12.2
Decoding took 25.59744 seconds for 100 iterations
Decoding took 26.563069 seconds for 100 iterations
Decoding took 26.196574 seconds for 100 iterations
```

```python
#!/usr/bin/env python3

import json
from datetime import datetime

iterations = 100

with open("large-file.json", "r") as h:
    raw = h.read()

start = datetime.now()

for i in range(iterations):
    data = json.loads(raw)

end = datetime.now()
duration = end - start

print(f"Decoding took {duration.total_seconds()} seconds for {iterations} iterations")
```

@mfriedenhagen

Results for the above main.go on my machine are roughly the same as in Python, although Python's json module is implemented in C as well.

```
❯ for i in 1 2 3; do ./main; done
Unmarshalling took 25.782570472s for 100 iterations
Unmarshalling took 25.999600283s for 100 iterations
Unmarshalling took 26.393734441s for 100 iterations
```

@mikeschinkel
Author

mikeschinkel commented Mar 11, 2024

@mfriedenhagen — Thank you for the comments, for catching my typo (fixed!), and especially for your contribution of your Python and Go timings.
