@hyc
Created April 9, 2016 11:49
DB migration test
Some results from working with the blockchain DB on a 5400rpm HDD (WD20EARX), starting with blockchain.raw from 2015-12-19 (874830 blocks).
The blockchain.raw file is on a separate drive, an SSD.
Import using v0.9.4
2016-Apr-08 16:46:38.540175 End of file reached
2016-Apr-08 16:46:39.038292 Number of blocks imported: 874829
2016-Apr-08 16:46:39.038366 Finished at block: 874829 total blocks: 874830
2016-Apr-08 16:46:39.038984 Closing IO Service.
Command being timed: "./blockchain_import --data-dir /mnt/1/bitmo --database lmdb#nosync --verify off --input-file /home/hyc/Public/blockchain.raw"
User time (seconds): 235.51
System time (seconds): 61.95
Percent of CPU this job got: 6%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:12:21
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 6910624
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 91009
Minor (reclaiming a frame) page faults: 1174101
Voluntary context switches: 98735
Involuntary context switches: 218044
Swaps: 0
File system inputs: 12429392
File system outputs: 65239264
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
violino:/home/software/bitmonero/build/release/bin094> ls -l /mnt/1/bitmo/lmdb
total 9349064
-rw-r--r-- 1 hyc hyc 9564078080 Apr 8 16:46 data.mdb
-rw-r--r-- 1 hyc hyc 8192 Apr 8 16:50 lock.mdb
Exporting back to .raw format, again storing the raw file on SSD
2016-Apr-08 16:55:21.023009 Using block height of source blockchain: 874829
block 874829/874829
2016-Apr-08 17:21:48.845258 Number of blocks exported: 874830
2016-Apr-08 17:21:48.850003 Largest chunk: 85111 bytes
2016-Apr-08 17:21:48.851219 Blockchain raw data exported OK
Command being timed: "./blockchain_export --data-dir /mnt/1/bitmo --output-file /home/hyc/Public/exp1.raw"
User time (seconds): 64.01
System time (seconds): 22.67
Percent of CPU this job got: 5%
Elapsed (wall clock) time (h:mm:ss or m:ss): 26:28.20
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 5821264
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 175817
Minor (reclaiming a frame) page faults: 368449
Voluntary context switches: 185151
Involuntary context switches: 29123
Swaps: 0
File system inputs: 25230016
File system outputs: 4813120
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
violino:/home/software/bitmonero/build/release/bin094> ls -l ~/Public/*.raw
-rw-r--r-- 1 hyc hyc 439221 Mar 28 00:27 /home/hyc/Public/block1.raw
-rw-r--r-- 1 hyc hyc 2463754366 Dec 19 07:04 /home/hyc/Public/blockchain.raw
-rw-r--r-- 1 hyc hyc 2463754366 Apr 8 17:21 /home/hyc/Public/exp1.raw
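The exported file is the same size as the original (2463754366 bytes). Equal sizes alone don't prove equal contents, so a checksum comparison is a natural follow-up. A minimal sketch in Python, not something that was run for these notes; the function names are made up for illustration:

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_contents(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return file_digest(path_a) == file_digest(path_b)
```

Calling same_contents() with the paths of blockchain.raw and exp1.raw would report whether the import/export round trip reproduced the file byte for byte.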
All of this is quite slow because the old format stores the tx indices in non-sequential order,
but the import and export procedures want to access them in sequential order. The result is far
too many random accesses, which is the worst case for an HDD.
Migrating the DB in-place using the current patch:
2016-Apr-08 17:26:55.977352 LMDB Mapsize increased. Old: 10061MiB, New: 11085MiB
2016-Apr-08 17:26:55.977689 Migrating blockchain from DB version 0 to 1 - this may take a while:
2016-Apr-08 17:26:55.977736 updating blocks, hf_versions, outputs, txs, and spent_keys tables...
2016-Apr-08 17:26:55.977776 Total number of blocks: 874830
2016-Apr-08 17:26:55.977813 block migration will update block_heights, block_info, and hf_versions...
2016-Apr-08 17:26:55.977841 migrating block_heights:
2016-Apr-08 17:27:05.433310 migrating block info:
2016-Apr-08 17:28:11.281047 migrating hf_versions:
2016-Apr-08 17:28:32.798102 Total number of outputs: 15664622
2016-Apr-08 17:28:32.798169 outputs migration will update output_amounts and output_txs...
2016-Apr-08 17:28:32.798210 migrating output_amounts:
2016-Apr-08 17:54:35.525729 migrating output_txs:
2016-Apr-08 18:09:21.103997 Total number of txs: 1393439
2016-Apr-08 18:09:21.104046 txs migration will update tx_indices, tx_outputs, and txs...
2016-Apr-08 18:09:21.104070 migrating tx_indices:
2016-Apr-08 18:15:44.070378 migrating txs and tx_outputs:
2016-Apr-08 19:41:59.214975 migrating spent_keys:
2016-Apr-08 20:19:38.443371 reorganizing from 864750
2016-Apr-08 20:19:44.402316 reorganization done
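The time spent in each phase can be read off the timestamps above. A small Python sketch that does the subtraction, treating each phase as running until the next logged line (the function name is made up for illustration, and the month abbreviation parsing assumes an English/C locale):

```python
from datetime import datetime

# Phase-start lines copied from the migration log above.
LOG = """\
2016-Apr-08 17:26:55.977841 migrating block_heights:
2016-Apr-08 17:27:05.433310 migrating block info:
2016-Apr-08 17:28:11.281047 migrating hf_versions:
2016-Apr-08 17:28:32.798210 migrating output_amounts:
2016-Apr-08 17:54:35.525729 migrating output_txs:
2016-Apr-08 18:09:21.104070 migrating tx_indices:
2016-Apr-08 18:15:44.070378 migrating txs and tx_outputs:
2016-Apr-08 19:41:59.214975 migrating spent_keys:
2016-Apr-08 20:19:38.443371 reorganizing from 864750
"""

def phase_durations(log_text):
    """Return (phase, seconds) pairs; a phase lasts until the next logged line."""
    entries = []
    for line in log_text.splitlines():
        date, time, message = line.split(" ", 2)
        stamp = datetime.strptime(f"{date} {time}", "%Y-%b-%d %H:%M:%S.%f")
        entries.append((stamp, message))
    return [
        (msg.replace("migrating ", "", 1).rstrip(":"), (nxt - start).total_seconds())
        for (start, msg), (nxt, _) in zip(entries, entries[1:])
    ]

for phase, seconds in phase_durations(LOG):
    print(f"{phase:22s} {seconds / 60:6.1f} min")
```

The final "reorganizing" line only serves as the end marker for the spent_keys phase.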
Migrating the block indices takes only about a minute and a half in total, because they're in
sequential order in both old and new formats. The only change is to packing efficiency, really.
The output indices are in sequential order too; migrating just takes a long time (about 41 minutes)
because there's such a large volume to read and write.
The tx indices take the most time because they were stored in hash order before. The tx_indices
table itself stays in hash order, so we can migrate that as a sequential operation. But the txs
and tx_outputs tables go from hash to sequential order, which again involves a lot of random
accesses. Worse, while txs and tx_outputs are both keyed by the hash, they didn't use
the same key comparator function, so they're in a different order from each other. So we read
the txs table sequentially in hash order, but we're still doing random reads from the tx_outputs
table. And both new tables are being written in random order.
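To illustrate why two tables keyed by the same hashes but sorted by different comparators defeat sequential access, here is a toy Python model. It is only a sketch: the SHA-256 stand-in for tx hashes and the two comparators (byte-wise vs. reversed-byte order) are assumptions chosen for illustration, not the comparators the DB actually used.

```python
import hashlib

def tx_hash(i):
    """Stand-in for a transaction hash: SHA-256 of a counter (an assumption)."""
    return hashlib.sha256(i.to_bytes(8, "little")).digest()

def mean_jump(read_order, table_order):
    """Average distance, in table slots, between consecutive lookups."""
    slot = {key: pos for pos, key in enumerate(table_order)}
    positions = [slot[key] for key in read_order]
    jumps = [abs(b - a) for a, b in zip(positions, positions[1:])]
    return sum(jumps) / len(jumps)

hashes = [tx_hash(i) for i in range(10_000)]

# Two tables keyed by the same hashes but sorted with different comparators.
txs_order = sorted(hashes)                                # byte-wise order
tx_outputs_order = sorted(hashes, key=lambda h: h[::-1])  # reversed-byte order

# Walking a table in its own order moves one slot at a time.
print("same comparator:     ", mean_jump(txs_order, txs_order))
# Walking txs in order while looking up tx_outputs jumps all over the table.
print("different comparator:", mean_jump(txs_order, tx_outputs_order))
```

On a spinning disk each large jump is potentially a seek, which is why the second access pattern is so much slower than the first.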
The spent_keys table is all sequential; it just takes time because there are 12.5 million of them.
In contrast, simply running blockchain_import from the performance branch took only about 4-1/2 minutes:
2016-Apr-08 20:32:23.495492 End of file reached
2016-Apr-08 20:32:23.960825 Number of blocks imported: 874829
2016-Apr-08 20:32:23.960894 Finished at block: 874829 total blocks: 874830
2016-Apr-08 20:32:23.962182 Closing IO Service.
Command being timed: "./blockchain_import --data-dir /mnt/1/bitmo --database lmdb#nosync --verify off --input-file /home/hyc/Public/blockchain.raw"
User time (seconds): 196.46
System time (seconds): 34.44
Percent of CPU this job got: 86%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:27.51
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 6803776
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 482
Minor (reclaiming a frame) page faults: 700730
Voluntary context switches: 3633
Involuntary context switches: 202247
Swaps: 0
File system inputs: 1882160
File system outputs: 15390360
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
violino:/home/software/bitmonero/build/release/bin> ls -l /mnt/1/bitmo/lmdb
total 7073532
-rw-r--r-- 1 hyc hyc 7236210688 Apr 8 20:32 data.mdb
-rw-r--r-- 1 hyc hyc 8192 Apr 8 20:32 lock.mdb
Of course, this is helped by reading the .raw file from a different drive than the one the DB is written to.
(But the v0.9.4 import used the identical setup.)
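For a direct comparison of the two import runs, the elapsed times and data.mdb sizes reported above can be turned into ratios. A quick Python sketch (the helper name is made up for illustration); it works out to roughly a 16x speedup and a DB about 24% smaller:

```python
def wall_seconds(text):
    """Convert an 'h:mm:ss' or 'm:ss' elapsed-time string to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

old_time = wall_seconds("1:12:21")   # v0.9.4 import
new_time = wall_seconds("4:27.51")   # performance-branch import
old_size = 9564078080                # data.mdb after v0.9.4 import, bytes
new_size = 7236210688                # data.mdb after performance-branch import, bytes

print(f"speedup: {old_time / new_time:.1f}x")
print(f"size reduction: {100 * (1 - new_size / old_size):.1f}%")
```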
This tells me that the current migrate function, which just attempts to read all the old indices and rewrite them
in sequential format, is the wrong approach. Instead it should just erase the old tables and then do the
equivalent of blockchain_import, regenerating all the indices from the original block and txs data.
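The idea of regenerating the indices from block data, rather than translating the old indices, can be sketched with a toy model in Python. Everything here is hypothetical (the Block type, the table names as plain dicts, the string hashes); it is not the real DB schema or the blockchain_import code, just the shape of a single pass in height order:

```python
from typing import NamedTuple

class Block(NamedTuple):
    block_hash: str
    tx_hashes: list

def rebuild_indices(blocks):
    """Regenerate index tables from block data in one pass, in height order.

    Returns (block_heights, tx_indices). Ids are handed out in increasing
    order as the blocks are read, so the new tables are filled sequentially.
    """
    block_heights = {}
    tx_indices = {}
    for height, block in enumerate(blocks):
        block_heights[block.block_hash] = height
        for tx_hash in block.tx_hashes:
            # Value is (sequential tx id, height of the containing block).
            tx_indices[tx_hash] = (len(tx_indices), height)
    return block_heights, tx_indices

chain = [
    Block("b0", ["t0"]),
    Block("b1", ["t1", "t2"]),
    Block("b2", ["t3"]),
]
block_heights, tx_indices = rebuild_indices(chain)
```

Because the blocks are read in order and ids are assigned as they go, nothing in this pass depends on the order of the old hash-keyed tables.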