@Midnighter
Last active August 29, 2015 13:57
object serialization for beanstalkc
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Speed Tests for Object Serialization"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pickle\n",
"import cPickle\n",
"import json\n",
"\n",
"import yaml\n",
"import beanstalkc\n",
"import umsgpack\n",
"\n",
"from uuid import uuid4\n",
"\n",
"from yaml import (CLoader, CDumper)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, `beanstalkc` uses `PyYAML` to turn YAML strings into Python objects. `beanstalkd` itself only transmits strings, so Python objects have to be serialized to strings and back again, and there are a couple of options for doing so. Keep in mind that both `pickle` (through `cPickle`) and `yaml` (via `libyaml`) have `C` implementations and that `JSON` is limited to basic types."
]
},
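{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate what \"limited to basic types\" means in practice, here is a minimal sketch that uses only the standard library. The `sample` dictionary is a made-up example and not part of the benchmark: `pickle` should restore it unchanged, whereas `json` turns the tuple into a list and rejects the set."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import json\n",
"import pickle\n",
"\n",
"# Hypothetical sample data (not part of the benchmark below).\n",
"sample = {\"tuple\": (43, 34), \"tags\": set([\"a\", \"b\"])}\n",
"\n",
"# pickle round-trips the tuple and the set unchanged.\n",
"restored = pickle.loads(pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL))\n",
"assert restored == sample\n",
"\n",
"# json has no tuple type, so the tuple comes back as a list.\n",
"assert json.loads(json.dumps({\"tuple\": (43, 34)})) == {\"tuple\": [43, 34]}\n",
"\n",
"# json has no set type either and raises a TypeError.\n",
"try:\n",
"    json.dumps(sample)\n",
"except TypeError as err:\n",
"    print(\"json cannot serialize the sample: %s\" % err)"
],
"language": "python",
"metadata": {},
"outputs": []
},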
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First start a local `beanstalkd` queue:\n",
"\n",
"    beanstalkd -l localhost\n",
"\n",
"or adjust the connection arguments in the following cells to match your setup."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"queue = beanstalkc.Connection(host=\"127.0.0.1\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"lib_q = beanstalkc.Connection(host=\"127.0.0.1\", parse_yaml=CLoader)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Putting and reserving jobs is equally fast with and without `libyaml`, since the transmitted job bodies are passed through as-is and never parsed."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"content = \"\"\"\n",
"- This\n",
"- is\n",
"- a\n",
"- yaml\n",
"- list\n",
"\"\"\""
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"queue.put(content)\n",
"job = queue.reserve()\n",
"job.delete()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 73.9 \u00b5s per loop\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"lib_q.put(content)\n",
"job = lib_q.reserve()\n",
"job.delete()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 75.1 \u00b5s per loop\n"
]
}
],
"prompt_number": 6
},
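{
"cell_type": "markdown",
"metadata": {},
"source": [
"Getting information from `beanstalkd`, however, appears to be sped up by a factor of about 100 (6.32 ms versus 64.1 \u00b5s below). One caveat: as far as the `beanstalkc` source suggests, `parse_yaml` expects a callable that parses a string, and calling the `CLoader` class only constructs a loader object without parsing anything. Part of the difference therefore probably comes from skipped parsing rather than from `libyaml`; passing `lambda body: yaml.load(body, Loader=CLoader)` instead would measure the actual `libyaml` parsing time."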
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit queue.stats()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100 loops, best of 3: 6.32 ms per loop\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit lib_q.stats()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 64.1 \u00b5s per loop\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, consider a larger job description that has to be serialized to a string before it can be put on the queue."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"unique_id = str(uuid4())\n",
"unique_id"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"'caadbd6d-9155-4809-a639-6070f0a72b53'"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"job_desc = {\n",
"    \"name\": \"fitzroy\",\n",
"    \"id\": unique_id,\n",
"    \"discrete\": 2435466673,\n",
"    \"real\": 34.23552,\n",
"    \"tuple\": (43, 34),\n",
"    \"dict\": {\"yes\": True, \"no\": False, \"I don't know\": None},\n",
"    \"list\": [4, 2, None, (True, False), [234.2534, \"yay\"]]\n",
"}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"`pickle`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"p_dump = pickle.dumps(job_desc, protocol=pickle.HIGHEST_PROTOCOL)\n",
"queue.put(p_dump)\n",
"job = queue.reserve()\n",
"p_load = pickle.loads(job.body)\n",
"job.delete()\n",
"p_load"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"{'dict': {\"I don't know\": None, 'no': False, 'yes': True},\n",
" 'discrete': 2435466673,\n",
" 'id': 'caadbd6d-9155-4809-a639-6070f0a72b53',\n",
" 'list': [4, 2, None, (True, False), [234.2534, 'yay']],\n",
" 'name': 'fitzroy',\n",
" 'real': 34.23552,\n",
" 'tuple': (43, 34)}"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(p_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"219"
]
}
],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit pickle.dumps(job_desc, protocol=pickle.HIGHEST_PROTOCOL)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 74.1 \u00b5s per loop\n"
]
}
],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit pickle.loads(p_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 38.7 \u00b5s per loop\n"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit \n",
"queue.put(pickle.dumps(job_desc, protocol=pickle.HIGHEST_PROTOCOL))\n",
"job = queue.reserve()\n",
"p_load = pickle.loads(job.body)\n",
"job.delete()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000 loops, best of 3: 233 \u00b5s per loop\n"
]
}
],
"prompt_number": 15
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"`cPickle`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"cp_dump = cPickle.dumps(job_desc, protocol=pickle.HIGHEST_PROTOCOL)\n",
"queue.put(cp_dump)\n",
"job = queue.reserve()\n",
"cp_load = cPickle.loads(job.body)\n",
"job.delete()\n",
"cp_load"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 16,
"text": [
"{'dict': {\"I don't know\": None, 'no': False, 'yes': True},\n",
" 'discrete': 2435466673,\n",
" 'id': 'caadbd6d-9155-4809-a639-6070f0a72b53',\n",
" 'list': [4, 2, None, (True, False), [234.2534, 'yay']],\n",
" 'name': 'fitzroy',\n",
" 'real': 34.23552,\n",
" 'tuple': (43, 34)}"
]
}
],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(cp_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
"211"
]
}
],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit cPickle.dumps(job_desc, protocol=pickle.HIGHEST_PROTOCOL)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000 loops, best of 3: 3.55 \u00b5s per loop\n"
]
}
],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit cPickle.loads(cp_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000 loops, best of 3: 3.14 \u00b5s per loop\n"
]
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit \n",
"queue.put(cPickle.dumps(job_desc, protocol=pickle.HIGHEST_PROTOCOL))\n",
"job = queue.reserve()\n",
"cp_load = cPickle.loads(job.body)\n",
"job.delete()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 160 \u00b5s per loop\n"
]
}
],
"prompt_number": 20
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"`json`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"j_dump = json.dumps(job_desc)\n",
"queue.put(j_dump)\n",
"job = queue.reserve()\n",
"j_load = json.loads(job.body)\n",
"job.delete()\n",
"j_load"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 21,
"text": [
"{u'dict': {u\"I don't know\": None, u'no': False, u'yes': True},\n",
" u'discrete': 2435466673,\n",
" u'id': u'caadbd6d-9155-4809-a639-6070f0a72b53',\n",
" u'list': [4, 2, None, [True, False], [234.2534, u'yay']],\n",
" u'name': u'fitzroy',\n",
" u'real': 34.23552,\n",
" u'tuple': [43, 34]}"
]
}
],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(j_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 22,
"text": [
"240"
]
}
],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit json.dumps(job_desc)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000 loops, best of 3: 8.44 \u00b5s per loop\n"
]
}
],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit json.loads(j_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000 loops, best of 3: 9.69 \u00b5s per loop\n"
]
}
],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit \n",
"queue.put(json.dumps(job_desc))\n",
"job = queue.reserve()\n",
"j_load = json.loads(job.body)\n",
"job.delete()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000 loops, best of 3: 231 \u00b5s per loop\n"
]
}
],
"prompt_number": 25
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"`umsgpack`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"u_dump = umsgpack.dumps(job_desc)\n",
"queue.put(u_dump)\n",
"job = queue.reserve()\n",
"u_load = umsgpack.loads(job.body)\n",
"job.delete()\n",
"u_load"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 26,
"text": [
"{'dict': {\"I don't know\": None, 'no': False, 'yes': True},\n",
" 'discrete': 2435466673,\n",
" 'id': 'caadbd6d-9155-4809-a639-6070f0a72b53',\n",
" 'list': [4, 2, None, [True, False], [234.2534, 'yay']],\n",
" 'name': 'fitzroy',\n",
" 'real': 34.23552,\n",
" 'tuple': [43, 34]}"
]
}
],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(u_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 27,
"text": [
"159"
]
}
],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit umsgpack.dumps(job_desc)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 42.7 \u00b5s per loop\n"
]
}
],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit umsgpack.loads(u_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 51.1 \u00b5s per loop\n"
]
}
],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit \n",
"queue.put(umsgpack.dumps(job_desc))\n",
"job = queue.reserve()\n",
"u_load = umsgpack.loads(job.body)\n",
"job.delete()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000 loops, best of 3: 178 \u00b5s per loop\n"
]
}
],
"prompt_number": 30
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"`yaml` (without `libyaml`)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"y_dump = yaml.dump(job_desc)\n",
"queue.put(y_dump)\n",
"job = queue.reserve()\n",
"y_load = yaml.load(job.body)\n",
"job.delete()\n",
"y_load"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 31,
"text": [
"{'dict': {\"I don't know\": None, 'no': False, 'yes': True},\n",
" 'discrete': 2435466673,\n",
" 'id': 'caadbd6d-9155-4809-a639-6070f0a72b53',\n",
" 'list': [4, 2, None, (True, False), [234.2534, 'yay']],\n",
" 'name': 'fitzroy',\n",
" 'real': 34.23552,\n",
" 'tuple': (43, 34)}"
]
}
],
"prompt_number": 31
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(y_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 32,
"text": [
"245"
]
}
],
"prompt_number": 32
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit yaml.dump(job_desc)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000 loops, best of 3: 1.13 ms per loop\n"
]
}
],
"prompt_number": 33
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit yaml.load(y_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100 loops, best of 3: 1.95 ms per loop\n"
]
}
],
"prompt_number": 34
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"queue.put(yaml.dump(job_desc))\n",
"job = queue.reserve()\n",
"y_load = yaml.load(job.body)\n",
"job.delete()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100 loops, best of 3: 3.48 ms per loop\n"
]
}
],
"prompt_number": 35
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"`yaml` (with `libyaml`)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"cy_dump = yaml.dump(job_desc, Dumper=CDumper)\n",
"queue.put(cy_dump)\n",
"job = queue.reserve()\n",
"cy_load = yaml.load(job.body, Loader=CLoader)\n",
"job.delete()\n",
"cy_load"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 36,
"text": [
"{'dict': {\"I don't know\": None, 'no': False, 'yes': True},\n",
" 'discrete': 2435466673,\n",
" 'id': 'caadbd6d-9155-4809-a639-6070f0a72b53',\n",
" 'list': [4, 2, None, (True, False), [234.2534, 'yay']],\n",
" 'name': 'fitzroy',\n",
" 'real': 34.23552,\n",
" 'tuple': (43, 34)}"
]
}
],
"prompt_number": 36
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(cy_dump)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 37,
"text": [
"245"
]
}
],
"prompt_number": 37
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit yaml.dump(job_desc, Dumper=CDumper)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000 loops, best of 3: 315 \u00b5s per loop\n"
]
}
],
"prompt_number": 38
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit yaml.load(cy_dump, Loader=CLoader)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000 loops, best of 3: 284 \u00b5s per loop\n"
]
}
],
"prompt_number": 39
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"queue.put(yaml.dump(job_desc, Dumper=CDumper))\n",
"job = queue.reserve()\n",
"cy_load = yaml.load(job.body, Loader=CLoader)\n",
"job.delete()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000 loops, best of 3: 997 \u00b5s per loop\n"
]
}
],
"prompt_number": 40
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Conclusion"
]
},
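{
"cell_type": "markdown",
"metadata": {},
"source": [
"In terms of raw serialization speed, `cPickle` is the clear winner (about 3.5 \u00b5s to dump and 3.1 \u00b5s to load), with `json` a surprisingly close second (about 8.4 \u00b5s and 9.7 \u00b5s). Unlike `json`, `cPickle` can also serialize most Python objects and preserves tuples, which `json` and `umsgpack` return as lists. In terms of size, `umsgpack` wins (159 bytes versus 211 for `cPickle` and 240 for `json`), but its pure-Python implementation (as of this writing) cannot compete in speed. Note that the full round trip through the queue is dominated by the queue itself: the `cPickle` round trip takes 160 \u00b5s, of which less than 7 \u00b5s is serialization."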
]
}
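,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Following up on the conclusion, the `cPickle` round trip can be wrapped in two small helpers. This is only a sketch: `put_object` and `reserve_object` are hypothetical names, not part of `beanstalkc`, and the code assumes that `reserve` returns `None` when the timeout expires. Keep in mind that unpickling data from an untrusted source can execute arbitrary code, so this approach only suits queues where every producer is trusted."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import cPickle\n",
"\n",
"# Hypothetical helper: pickle obj and put it on the queue; returns the job id.\n",
"def put_object(conn, obj, **kwargs):\n",
"    body = cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)\n",
"    return conn.put(body, **kwargs)\n",
"\n",
"# Hypothetical helper: reserve a job and unpickle its body.\n",
"# Returns (job, obj), or (None, None) if no job arrived within the timeout.\n",
"def reserve_object(conn, timeout=None):\n",
"    job = conn.reserve(timeout=timeout)\n",
"    if job is None:\n",
"        return None, None\n",
"    return job, cPickle.loads(job.body)\n",
"\n",
"# Usage with the connection and job description defined above.\n",
"put_object(queue, job_desc)\n",
"job, obj = reserve_object(queue, timeout=1)\n",
"assert obj == job_desc\n",
"job.delete()"
],
"language": "python",
"metadata": {},
"outputs": []
}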
],
"metadata": {}
}
]
}