Warning
This was written in 2005 so may be out of date. Originally published on the old Plone documentation section: https://plone.org/documentation/kb-old/debug-zodb-bloat/
Having spent a lot of time tracking down the cause of ZODB bloat in an Archetypes application I thought I'd share my experience in case it was useful to anyone else.
First step was to analyse the extent of the bloat. The 'analyze.py' script in the Zope 'bin' directory (found in the ZODB.scripts egg) allows you to see what sort of objects are using up space in your database and how many are current or old revisions. If you pack the db then add an object and analyze.py you can see how much bloat is being caused by the number of old object revisions around. add objects and analyze a few times and you can see what sort of objects are the cause of the bloat.
(The version of analyze.py shipped with Zope 2.10 (Plone < 4) is broken, download the latest versions of ZODB scripts from Zope svn if you are using Plone 3.):
$ bin/zopepy -m ZODB.scripts.analyze var/filestorage/Data.fs Processed 10213 records in 162 transactions Average record size is 409.77 bytes Average transaction size is 25833.09 bytes Types used: Class Name Count TBytes Pct AvgSize ---------------------------------------------- ------- --------- ----- ------- AccessControl.User.User 1 137 0.0% 137.00 AccessControl.User.UserFolder 1 65 0.0% 65.00 App.ApplicationManager.ApplicationManager 2 480 0.0% 240.00 App.Product.Product 150 174890 4.2% 1165.93 App.Product.ProductFolder 86 223839 5.3% 2602.78 BTrees.IIBTree.IIBTree 1300 138462 3.3% 106.51 BTrees.IIBTree.IISet 6 1526 0.0% 254.33 BTrees.IIBTree.IITreeSet 142 10144 0.2% 71.44 BTrees.IOBTree.IOBTree 586 149196 3.6% 254.60 BTrees.IOBTree.IOBucket 461 423508 10.1% 918.67 ... ============================================== ======= ========= ===== ======= Total Transactions 162 25.23k Total Records 10213 4086k 100.0% 409.77 Current Objects 8550 2870k 70.2% 343.77 Old Objects 1663 1216k 29.8% 749.08
From this I could see that BTrees.IOBtree.IOBucket
objects were the culprit. I know they're used in the catalog, but where?
fsdump.py will print (hexadecimal) oid, size and class for each record of each transaction in your ZODB:
$ bin/zopepy -c "import sys; from ZODB.FileStorage import fsdump; fsdump.fsdump(sys.argv[3])" var/filestorage/Data.fs Trans #00000 tid=0382427a3a7f7022 time=2009-11-23 19:06:13.710423 offset=52 status=' ' user='' description='initial database creation' data #00000 oid=0000000000000000 size=66 class=persistent.mapping.PersistentMapping Trans #00001 tid=0382427a3a91e411 time=2009-11-23 19:06:13.727317 offset=215 status=' ' user='' description='Created Zope Application' data #00000 oid=0000000000000000 size=92 class=persistent.mapping.PersistentMapping data #00001 oid=0000000000000001 size=207 class=OFS.Application.Application data #00002 oid=0000000000000003 size=81 class=App.ApplicationManager.ApplicationManager data #00003 oid=0000000000000004 size=39 class=App.Product.ProductFolder data #00004 oid=0000000000000002 size=65 class=AccessControl.User.UserFolder data #00005 oid=0000000000000005 size=63 class=Persistence.mapping.PersistentMapping ...
analyze.py shows how to open a file storage and inspect the contents. So I iterated to the transaction in question and listed the records:
fs = FileStorage(path_to_Data_fs, read_only=1) fsi = fs.iterator() TCOUNT = 2000 # or whatever for n in xrange(TCOUNT): fsi.next() txn = fsi.next() records = list(txn)
You can get the size of each record and the oid using [(len(rec.data),rec.oid) for rec in records]
From a Zope debug console you can get the object with ob = app._p_jar[oid]
For some objects though (like IOBuckets) all you get is the c data structure back, which is not all that helpful for our purposes.
I needed to get the path of the object represented. Fortunately ZODB gives you the tools to make a good guess at it. Using the attached utility methods you can build first a map of object references and then try to reconstruct the object path:
from inspectZodbUtils import buildRefmap, doSearch target = rec.oid # assuming rec is the record your interested in refmap = buildRefmap(fs) path, additionals = doSearch(target,refmap) print path
use app._p_jar[oid]
from a zope debug console to see what sort of object it is. (This is the packed oid, use ZODB.utils.p64(0x00000000000123f)
when working with a hex oid from fsdump.py.