First a little background info. The Berkeley DB Java Edition library used by the pools (if so configured) uses a log-structured file format. This means that the files of the database (called log segments) are only ever appended to. Once a segment reaches a certain size (10 MB by default), a new log segment is created and the previous segments are never modified again. Modifying or deleting existing data therefore leaves obsolete fragments in these database files. Once the utilization (the fraction of data still in use) falls below a certain level, the remaining live data is copied to the end of the last segment and the original segment is deleted (this is all textbook log-structured database design).
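To make the log-structured idea concrete, here is a toy model in Java. It is a hypothetical sketch, not how JE implements its cleaner: segments are append-only, an overwrite leaves the old entry behind as garbage, and once a finished segment's utilization drops below a threshold (a simplified per-segment rule) its live entries are copied to the end of the log and the segment is dropped. ToyLogStore and its parameters are made-up names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a log-structured store, not JE code.
class ToyLogStore {
    static final class Entry {
        final String key;
        final int size;
        Entry(String key, int size) { this.key = key; this.size = size; }
    }

    final int segmentCapacity;   // bytes per segment (JE's default is 10 MB)
    final double minUtilization; // e.g. 0.5 for 50%
    final TreeMap<Integer, List<Entry>> segments = new TreeMap<>();
    final Map<String, Entry> latest = new HashMap<>(); // key -> its live (newest) entry
    int current = 0;      // id of the segment being appended to
    int currentBytes = 0; // bytes written to the current segment

    ToyLogStore(int segmentCapacity, double minUtilization) {
        this.segmentCapacity = segmentCapacity;
        this.minUtilization = minUtilization;
        segments.put(current, new ArrayList<>());
    }

    void put(String key, int size) {
        append(key, size);
        clean();
    }

    // Only the last segment is ever written; a full segment is closed for good.
    private void append(String key, int size) {
        if (currentBytes + size > segmentCapacity) {
            current++;
            currentBytes = 0;
            segments.put(current, new ArrayList<>());
        }
        Entry e = new Entry(key, size);
        segments.get(current).add(e);
        currentBytes += size;
        latest.put(key, e); // any older entry for this key is now garbage
    }

    // Fraction of a segment's bytes that still belong to the newest version of a key.
    double utilization(List<Entry> entries) {
        int total = 0, live = 0;
        for (Entry e : entries) {
            total += e.size;
            if (latest.get(e.key) == e) live += e.size;
        }
        return total == 0 ? 1.0 : (double) live / total;
    }

    // Copies live data forward and deletes finished segments below the threshold.
    private void clean() {
        for (Integer id : new ArrayList<>(segments.keySet())) {
            if (id == current) continue; // never clean the segment being written
            List<Entry> entries = segments.get(id);
            if (utilization(entries) >= minUtilization) continue;
            segments.remove(id);
            for (Entry e : entries) {
                if (latest.get(e.key) == e) append(e.key, e.size);
            }
        }
    }
}
```

Overwriting five keys a thousand times keeps only a couple of segments around, while the segment ids keep growing because the log only ever moves forward.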
Berkeley DB uses a btree structure, i.e. the data is organized as a tree, with the actual data in the leaves and the internal nodes allowing fast searches of the data.
Berkeley DB maintains an in-memory cache of the database (the btree nodes). By default, dCache uses 20% of the maximum heap size as the Berkeley DB cache. Oracle recommends that the cache be big enough to hold all the internal btree nodes.
Anything that is not cached has to be read from disk (or at least from the file system cache). Although only the last file is ever appended to, reads can hit any of the database files. By default the library keeps up to 100 files open, so once the database grows beyond that, it has to close one file before it can open another. This is particularly noticeable during pool startup, when the entire content is read in lexicographic order, which may cause a lot of file open/close cycles.
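The effect of the 100-file limit can be illustrated with a small simulation. This is a hypothetical model, not JE code: it treats the file cache as a plain LRU cache with a fixed capacity (JE's actual eviction policy may differ) and assumes a worst-case startup scan that visits the files round-robin. FileHandleCacheSim and its methods are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU model of a fixed-size cache of open files (cf. je.log.fileCacheSize).
class FileHandleCacheSim {
    // Returns how many times a file had to be opened for the given access sequence.
    static int countOpens(int[] fileAccesses, int capacity) {
        Map<Integer, Boolean> open = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity; // "close" the least recently used file
            }
        };
        int opens = 0;
        for (int file : fileAccesses) {
            if (open.get(file) == null) { // get() also marks an open file as recently used
                opens++;
                open.put(file, Boolean.TRUE);
            }
        }
        return opens;
    }

    // Worst-case stand-in for a key-order scan whose records are spread over many files.
    static int[] roundRobin(int files, int passes) {
        int[] seq = new int[files * passes];
        for (int i = 0; i < seq.length; i++) seq[i] = i % files;
        return seq;
    }

    public static void main(String[] args) {
        int[] scan = roundRobin(150, 3);
        System.out.println("capacity 100: " + countOpens(scan, 100) + " opens"); // 450
        System.out.println("capacity 200: " + countOpens(scan, 200) + " opens"); // 150
    }
}
```

With 150 files and room for only 100, every single access needs a fresh open; with room for all of them, each file is opened exactly once.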
The Berkeley DB Java Edition library has a lot of configuration settings that can be set by placing a je.properties file inside the meta directory. The settings in this file are read when the pool is restarted. There are two settings I want to point out here:
je.log.fileCacheSize=100
This is the number of files to keep open. It defaults to 100, but if you have significantly more jdb files in the meta directory, it may be worth increasing this limit. Be aware, though, that each open file needs a file descriptor, so make sure the OS limit on open file descriptors for the pool process is raised accordingly.
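To see whether you are anywhere near the limit, count the .jdb files in the meta directory. The sketch below does that with java.nio. suggestFileCacheSize and its 25% headroom are a made-up rule of thumb, not a JE or dCache recommendation, and the OS file descriptor limit (e.g. ulimit -n) still has to be raised separately.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class JdbFileCount {
    // Counts the *.jdb log segments in a Berkeley DB JE environment directory.
    static long countJdbFiles(Path metaDir) throws IOException {
        try (Stream<Path> files = Files.list(metaDir)) {
            return files.filter(p -> p.getFileName().toString().endsWith(".jdb")).count();
        }
    }

    // Hypothetical rule of thumb: cover every segment plus some headroom,
    // but never go below JE's default of 100.
    static long suggestFileCacheSize(long jdbFiles, double headroom) {
        return Math.max(100, (long) Math.ceil(jdbFiles * (1 + headroom)));
    }

    public static void main(String[] args) throws IOException {
        Path meta = Paths.get(args.length > 0 ? args[0] : ".");
        long n = countJdbFiles(meta);
        System.out.println(n + " .jdb files; je.log.fileCacheSize=" + suggestFileCacheSize(n, 0.25)
                + " would cover them");
    }
}
```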
The other relevant settings are:
je.maxMemory=
je.maxMemoryPercent=20
These are two ways of setting the same thing. The former sets the number of bytes used for the btree cache, while the latter defines it as a percentage of the max heap size; a non-zero je.maxMemory takes precedence over je.maxMemoryPercent. E.g. if dcache.java.memory.heap is set to 2048m, then about 410 MB is used for the btree cache.
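The relationship between the two settings and the 2048m example can be written out in a few lines of Java. This is only a sketch of the arithmetic, not JE's code. It assumes the documented JE rule that a non-zero je.maxMemory wins and otherwise je.maxMemoryPercent of the max heap is used, and that the JVM's max heap is exactly the configured 2048m. JeCacheSize and effectiveCacheBytes are made-up names.

```java
import java.util.Locale;

// Sketch of how je.maxMemory and je.maxMemoryPercent relate (assumed rule, not JE code).
class JeCacheSize {
    static long effectiveCacheBytes(long maxHeapBytes, long maxMemory, int maxMemoryPercent) {
        if (maxMemory > 0) {
            return maxMemory;                         // explicit size in bytes
        }
        return maxHeapBytes * maxMemoryPercent / 100; // share of the max heap
    }

    public static void main(String[] args) {
        long heap = 2048L * 1024 * 1024;               // dcache.java.memory.heap=2048m
        long cache = effectiveCacheBytes(heap, 0, 20); // je.maxMemoryPercent=20
        System.out.printf(Locale.ROOT, "%d bytes = %.1f MiB%n", cache, cache / (1024.0 * 1024));
        // prints "429496729 bytes = 409.6 MiB"
    }
}
```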
The question then is how to determine a good size for the cache. There are some lesser-known utilities in the JE jar that you can use for this:
/srv/ore_ndgf_org_002/pool/meta$ java -cp /usr/share/dcache/classes/je-*.jar com.sleepycat.je.util.DbPrintLog -h . -S
<DbPrintLog>
Log statistics:
type             total  provisional      total    min     max     avg     entries
                 count        count      bytes  bytes   bytes   bytes  as % of log
MapLN               26            0      2,211     49     125      85           0
NameLN               4            0        141     31      38      35           0
FileSummaryLN       23            0    331,258     24  74,250  14,402         2.3
IN                  21            0     35,582     44   4,383   1,694         0.2
BIN                394          394    242,633     70   2,130     615         1.7
DbTree              10            0      1,174    100     134     117           0
Commit          69,561            0  2,225,952     32      32      32        15.4
CkptStart            5            0        147     28      31      29           0
CkptEnd              5            0        761     61     177     152           0
Trace               12            0      1,392     47     383     116           0
FileHeader           3            0        114     38      38      38           0
DEL_LN_TX       18,833            0  1,261,811     67      67      67         8.7
INS_LN_TX       18,837            0  4,362,303     95     391     231        30.2
UPD_LN_TX       31,891            0  5,454,607    100     396     171        37.7
UPD_LN           2,340            0    542,409     18     522     231         3.7
NewBINDelta         17           17      2,487    121     364     146           0
key bytes       71,901               2,802,302      1      61      38      (19.4)
data bytes      53,068               6,910,747      1     502     130      (47.8)
Total bytes in portion of log read: 14,464,982
Total number of entries: 141,982
Per checkpoint interval info:
 lnTxn    ln  mapLNTxn  mapLN  end to end  end to start  start to end  maxLNReplay  ckptEnd
21,276   790         0      8  14,711,616    14,469,073       242,543       22,066  0x41/0x47e4c0
48,261 1,550         0      8   9,724,864     9,520,804       204,060       49,813  0x42/0x43b200
     0     0         0      4       1,339           664           675            0  0x42/0x43b73b
     8     0         0      3      12,182         1,999        10,183            8  0x42/0x43e6d1
    16     0         0      3   5,564,955     5,553,901        11,054           16  0x43/0x3a6c
     0     0         0      0           0             0             0            0  0x43/0x3a6c
</DbPrintLog>
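This prints some statistics about the database. The two rows called key bytes and data bytes are the relevant ones: for the following step you need their values from the avg column, i.e. 38 and 130 in this case. You also need the value of “Total number of entries” (141,982 in this case). Note that this counts log entries (it is the sum of the total count column, including commit and deletion entries), so it overstates the number of records actually stored and makes the resulting estimate conservative.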
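If you do this for many pools, the three numbers can be picked out of the DbPrintLog output mechanically. The following Java sketch (DbPrintLogStats is a made-up name) assumes the layout shown above, where the avg value is the second-to-last field of the key bytes and data bytes rows, and builds the DbCacheSize command line used in the next step.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DbPrintLogStats {
    // "avg" of the "key bytes" / "data bytes" row: the fields are
    // count, total, min, max, avg, (pct), so avg is the second-to-last one.
    static long avgBytes(String output, String rowLabel) {
        for (String line : output.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(rowLabel + " ")) {
                String[] fields = trimmed.substring(rowLabel.length()).trim().split("\\s+");
                return parseNumber(fields[fields.length - 2]);
            }
        }
        throw new IllegalArgumentException("row not found: " + rowLabel);
    }

    // Value printed after "Total number of entries:".
    static long totalEntries(String output) {
        Matcher m = Pattern.compile("Total number of entries:\\s*([\\d,]+)").matcher(output);
        if (!m.find()) throw new IllegalArgumentException("total number of entries not found");
        return parseNumber(m.group(1));
    }

    // DbPrintLog prints thousands separators, e.g. "141,982".
    static long parseNumber(String s) {
        return Long.parseLong(s.replace(",", ""));
    }

    static String dbCacheSizeCommand(String output) {
        return "java -cp /usr/share/dcache/classes/je-*.jar com.sleepycat.je.util.DbCacheSize"
                + " -records " + totalEntries(output)
                + " -key " + avgBytes(output, "key bytes")
                + " -data " + avgBytes(output, "data bytes");
    }

    public static void main(String[] args) {
        // Reads the output of "DbPrintLog -h . -S" from standard input.
        String output = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8))
                .lines().collect(Collectors.joining("\n"));
        System.out.println(dbCacheSizeCommand(output));
    }
}
```

Pipe the DbPrintLog output into it after compiling it with javac. Like the manual recipe, it passes the total number of log entries as -records.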
Now you take those values and put them into this command:
/srv/ore_ndgf_org_002/pool/meta$ java -cp /usr/share/dcache/classes/je-*.jar com.sleepycat.je.util.DbCacheSize -records 141982 -key 38 -data 130
=== Environment Cache Overhead ===
3,157,213 minimum bytes
=== Database Cache Size ===
 Number of Bytes  Description
 ---------------  -----------
      11,473,200  Internal nodes only
      37,341,168  Internal nodes and leaf nodes
For this very small pool, it tells us that we need a bit more than 11 MB to keep all the internal btree nodes cached. Oracle recommends that, if the database is updated often, the cache be at least big enough to hold the internal nodes. Obviously you want to make it a bit bigger to leave room for growth. One could take this number and configure the cache using the je.maxMemory setting. If the size is below the 20% of the max heap size you already use, then you don't need to do anything (I do not suggest lowering it).
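The decision described above fits in a few lines. This is a hypothetical helper, not part of dCache or JE: it takes the "Internal nodes only" figure from DbCacheSize, adds a growth margin you have to choose yourself (50% here is an arbitrary assumption), and only suggests a je.maxMemory value if the current percentage-based cache is not already enough.

```java
import java.util.Optional;

// Hypothetical helper: turns the DbCacheSize "Internal nodes only" figure into a suggestion.
class CacheSizingDecision {
    static long requiredCacheBytes(long internalNodesBytes, double growthHeadroom) {
        return (long) Math.ceil(internalNodesBytes * (1 + growthHeadroom));
    }

    // Empty if the current share of the heap already covers the requirement
    // (keep the default then, do not lower it); otherwise a je.maxMemory line.
    static Optional<String> suggestMaxMemory(long internalNodesBytes, double growthHeadroom,
                                             long maxHeapBytes, int maxMemoryPercent) {
        long required = requiredCacheBytes(internalNodesBytes, growthHeadroom);
        long current = maxHeapBytes * maxMemoryPercent / 100;
        return required <= current ? Optional.empty() : Optional.of("je.maxMemory=" + required);
    }

    public static void main(String[] args) {
        // Figures from the example pool: 11,473,200 bytes of internal nodes, 2048m heap.
        System.out.println(suggestMaxMemory(11_473_200L, 0.5, 2048L * 1024 * 1024, 20)
                .orElse("the default 20% is already enough"));
    }
}
```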
There is one caveat, though: if you increase the cache size, less free space is left on the heap. If the pool is pushed to the limit, this may actually slow it down as garbage collection overhead increases. You need to ensure that enough space is left in addition to the cache (possibly by increasing the max heap size). An alternative to adjusting je.maxMemory is of course to adjust only the max heap size: if you make it large enough that 20% of it covers the internal btree nodes, then all is well. This may, however, mean that you assign significantly more memory to the pool than it really needs. Yeah, there are lots of things to consider :-)
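The alternative of growing the heap instead can be sketched the same way: find the smallest heap whose 20% share covers the required cache, rounded up to whole MiB (the unit of the JVM's m suffix). HeapForCache is a made-up name, the 50% margin on top of the example's 11,473,200 bytes is an assumption, and the calculation deliberately ignores everything else the pool keeps on the heap, which is exactly the caveat above.

```java
// Hypothetical helper: smallest heap whose je.maxMemoryPercent share covers the cache.
class HeapForCache {
    // Ceiling of requiredCacheBytes * 100 / maxMemoryPercent.
    static long heapBytesFor(long requiredCacheBytes, int maxMemoryPercent) {
        return (requiredCacheBytes * 100 + maxMemoryPercent - 1) / maxMemoryPercent;
    }

    // Rounds up to whole MiB and formats it like a heap setting, e.g. "2048m".
    static String asHeapSetting(long heapBytes) {
        long mib = (heapBytes + (1L << 20) - 1) >> 20;
        return mib + "m";
    }

    public static void main(String[] args) {
        long required = 11_473_200L * 3 / 2; // internal nodes plus an assumed 50% margin
        long heap = heapBytesFor(required, 20);
        System.out.println("dcache.java.memory.heap=" + asHeapSetting(heap)); // ...=83m
    }
}
```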
If all of this is confusing and you are happy with your pools, then simply ignore all I said.
A little bonus: There is also this command:
/srv/ore_ndgf_org_002/pool/meta$ java -cp /usr/share/dcache/classes/je-*.jar com.sleepycat.je.util.DbSpace -h .
  File    Size (KB)  % Used
--------  ---------  ------
00000041       9765       4
00000042       4345      11
00000043         14      86
TOTALS        14125       6
It tells you, for each of the database files (the log segments), how big it is and what its utilization is. Berkeley DB tries to keep the total utilization above 50%, but for a small pool like this one it cannot. I figured you might find this interesting after reading about utilization above.
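The TOTALS line can be reproduced as a size-weighted average of the per-file utilization. The Java sketch below is my reconstruction, assuming a plain weighted average; since DbSpace rounds the per-file percentages, the result is approximate. LogUtilization is a made-up name.

```java
import java.util.Locale;

class LogUtilization {
    // Size-weighted utilization from DbSpace's "Size (KB)" and "% Used" columns.
    static double totalUtilizationPercent(long[] sizesKb, int[] percentUsed) {
        double usedKb = 0;
        long totalKb = 0;
        for (int i = 0; i < sizesKb.length; i++) {
            usedKb += sizesKb[i] * percentUsed[i] / 100.0;
            totalKb += sizesKb[i];
        }
        return totalKb == 0 ? 0.0 : 100.0 * usedKb / totalKb;
    }

    public static void main(String[] args) {
        long[] sizes = {9765, 4345, 14}; // the three segments from the example
        int[] used = {4, 11, 86};
        System.out.printf(Locale.ROOT, "total utilization ~ %.1f%% (goal: above 50%%)%n",
                totalUtilizationPercent(sizes, used)); // ~ 6.2%
    }
}
```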
The log segment size and the utilization goal can be adjusted too (je.log.fileMax and je.cleaner.minUtilization), but I cannot give any sound advice on whether that's a good idea or how to determine good values.
All of the above is only relevant if you use the Berkeley DB backend for pools.