@jpountz
Last active August 31, 2016 12:17
NYC taxi rides disk usage
total disk: 30,412,907,228
num docs: 165,346,692
stored fields: 11,049,749,404
term vectors: 0
norms: 0
docvalues: 8,722,043,518
postings: 918,100,584
prox: 0
points: 8,320,476,220
terms: 1,402,532,940
field                          total     terms dict     postings  proximity         points      docvalues  % with dv  features
=====================  =============  =============  ===========  =========  =============  =============  =========  ========================
dropoff_datetime       1,989,691,235              0            0          0  1,162,957,566    826,733,669     100.0%  8bytes/1D sorted_numeric
pickup_datetime        1,740,801,044              0            0          0  1,162,087,413    578,713,631     100.0%  8bytes/1D sorted_numeric
dropoff_location       1,575,027,760         54,831  252,199,190          0              0  1,322,773,739     100.0%  docs sorted_numeric
pickup_location        1,562,021,643         54,333  239,193,573          0              0  1,322,773,737     100.0%  docs sorted_numeric
_uid                   1,402,421,227  1,402,421,117          110          0              0              0       0.0%  docs
total_amount           1,330,557,744              0            0          0    669,170,775    661,386,969     100.0%  4bytes/1D sorted_numeric
trip_distance          1,330,237,045              0            0          0    668,850,074    661,386,971     100.0%  4bytes/1D sorted_numeric
tip_amount             1,328,640,151              0            0          0    667,253,180    661,386,971     100.0%  4bytes/1D sorted_numeric
tolls_amount           1,326,996,955              0            0          0    665,609,986    661,386,969     100.0%  4bytes/1D sorted_numeric
fare_amount            1,326,774,807              0            0          0    665,387,836    661,386,971     100.0%  4bytes/1D sorted_numeric
extra                  1,325,810,688              0            0          0    664,423,717    661,386,971     100.0%  4bytes/1D sorted_numeric
improvement_surcharge    850,423,053              0            0          0    664,407,582    186,015,471     100.0%  4bytes/1D sorted_numeric
mta_tax                  829,761,659              0            0          0    664,413,746    165,347,913     100.0%  4bytes/1D sorted_numeric
passenger_count          747,080,635              0            0          0    664,407,086     82,673,549     100.0%  4bytes/1D sorted_numeric
payment_type             162,672,441            326   79,998,537          0              0     82,673,578     100.0%  docs sorted_set
_field_names             161,851,682            819  161,850,863          0              0              0       0.0%  docs
rate_code_id             132,591,553            347   49,917,589          0              0     82,673,617     100.0%  docs sorted_set
vendor_id                108,551,176            294   87,882,316          0              0     20,668,566     100.0%  docs sorted_set
store_and_fwd_flag        54,417,968            292   33,749,110          0              0     20,668,566     100.0%  docs sorted_set
trip_type                 46,981,822            292    5,644,630          0              0     41,336,900      11.6%  docs sorted_set
_type                     28,333,521            289    7,664,666          0              0     20,668,566     100.0%  docs sorted_set
_version                         194              0            0          0              0            194     100.0%  numeric
_source                            0              0            0          0              0              0       0.0%
total disk: 25,357,324,672
num docs: 165,346,692
stored fields: 10,935,007,394
term vectors: 0
norms: 0
docvalues: 8,308,676,779
postings: 920,168,137
prox: 0
points: 3,791,449,288
terms: 1,402,018,511
field                          total     terms dict     postings  proximity         points      docvalues  % with dv  features
=====================  =============  =============  ===========  =========  =============  =============  =========  ========================
dropoff_datetime       1,826,128,520              0            0          0    999,394,851    826,733,669     100.0%  8bytes/1D sorted_numeric
pickup_datetime        1,578,095,769              0            0          0    999,382,138    578,713,631     100.0%  8bytes/1D sorted_numeric
dropoff_location       1,575,057,385         54,787  252,228,859          0              0  1,322,773,739     100.0%  docs sorted_numeric
pickup_location        1,562,105,797         54,286  239,277,774          0              0  1,322,773,737     100.0%  docs sorted_numeric
_uid                   1,401,906,896  1,401,906,786          110          0              0              0       0.0%  docs
trip_distance          1,102,963,605              0            0          0    276,229,942    826,733,663     100.0%  8bytes/1D sorted_numeric
total_amount             910,611,802              0            0          0    249,224,833    661,386,969     100.0%  8bytes/1D sorted_numeric
tip_amount               883,096,874              0            0          0    221,709,903    661,386,971     100.0%  8bytes/1D sorted_numeric
fare_amount              772,115,487              0            0          0    193,401,862    578,713,625     100.0%  8bytes/1D sorted_numeric
tolls_amount             584,116,414              0            0          0    170,749,483    413,366,931     100.0%  8bytes/1D sorted_numeric
extra                    583,347,849              0            0          0    169,980,916    413,366,933     100.0%  8bytes/1D sorted_numeric
improvement_surcharge    355,854,496              0            0          0    169,839,025    186,015,471     100.0%  8bytes/1D sorted_numeric
mta_tax                  335,509,244              0            0          0    170,161,340    165,347,904     100.0%  8bytes/1D sorted_numeric
passenger_count          252,272,158              0            0          0    169,598,609     82,673,549     100.0%  4bytes/1D sorted_numeric
payment_type             162,721,940            325   80,048,037          0              0     82,673,578     100.0%  docs sorted_set
_field_names             161,852,249            818  161,851,431          0              0              0       0.0%  docs
rate_code_id             132,535,488            346   49,861,525          0              0     82,673,617     100.0%  docs sorted_set
vendor_id                110,627,048            293   89,958,189          0              0     20,668,566     100.0%  docs sorted_set
store_and_fwd_flag        54,301,537            291   33,632,680          0              0     20,668,566     100.0%  docs sorted_set
trip_type                 46,982,057            291    5,644,866          0              0     41,336,900      11.6%  docs sorted_set
_type                     28,333,520            288    7,664,666          0              0     20,668,566     100.0%  docs sorted_set
_version                         194              0            0          0              0            194     100.0%  numeric
_source                            0              0            0          0              0              0       0.0%
total disk: 25,884,080,928
num docs: 165,346,692
stored fields: 11,049,749,404
term vectors: 0
norms: 0
docvalues: 8,722,043,518
postings: 918,100,584
prox: 0
points: 3,791,649,929
terms: 1,402,532,930
field                          total     terms dict     postings  proximity         points      docvalues  % with dv  features
=====================  =============  =============  ===========  =========  =============  =============  =========  ========================
dropoff_datetime       1,826,128,520              0            0          0    999,394,851    826,733,669     100.0%  8bytes/1D sorted_numeric
pickup_datetime        1,577,449,873              0            0          0    998,736,242    578,713,631     100.0%  8bytes/1D sorted_numeric
dropoff_location       1,575,027,759         54,830  252,199,190          0              0  1,322,773,739     100.0%  docs sorted_numeric
pickup_location        1,562,021,642         54,332  239,193,573          0              0  1,322,773,737     100.0%  docs sorted_numeric
_uid                   1,402,421,226  1,402,421,116          110          0              0              0       0.0%  docs
trip_distance            939,688,578              0            0          0    278,301,607    661,386,971     100.0%  4bytes/1D sorted_numeric
total_amount             914,081,704              0            0          0    252,694,735    661,386,969     100.0%  4bytes/1D sorted_numeric
tip_amount               883,924,567              0            0          0    222,537,596    661,386,971     100.0%  4bytes/1D sorted_numeric
fare_amount              854,853,512              0            0          0    193,466,541    661,386,971     100.0%  4bytes/1D sorted_numeric
tolls_amount             831,732,881              0            0          0    170,345,912    661,386,969     100.0%  4bytes/1D sorted_numeric
extra                    830,063,021              0            0          0    168,676,050    661,386,971     100.0%  4bytes/1D sorted_numeric
improvement_surcharge    354,562,043              0            0          0    168,546,572    186,015,471     100.0%  4bytes/1D sorted_numeric
mta_tax                  334,222,960              0            0          0    168,875,047    165,347,913     100.0%  4bytes/1D sorted_numeric
passenger_count          251,617,835              0            0          0    168,944,286     82,673,549     100.0%  4bytes/1D sorted_numeric
payment_type             162,672,440            325   79,998,537          0              0     82,673,578     100.0%  docs sorted_set
_field_names             161,851,681            818  161,850,863          0              0              0       0.0%  docs
rate_code_id             132,591,552            346   49,917,589          0              0     82,673,617     100.0%  docs sorted_set
vendor_id                108,551,175            293   87,882,316          0              0     20,668,566     100.0%  docs sorted_set
store_and_fwd_flag        54,417,967            291   33,749,110          0              0     20,668,566     100.0%  docs sorted_set
trip_type                 46,981,821            291    5,644,630          0              0     41,336,900      11.6%  docs sorted_set
_type                     28,333,520            288    7,664,666          0              0     20,668,566     100.0%  docs sorted_set
_version                         194              0            0          0              0            194     100.0%  numeric
_source                            0              0            0          0              0              0       0.0%
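
The gist does not name the three reports or say what changed between them, so any comparison has to work from the numbers alone. The Python sketch below is one possible way to do that with the summary blocks: it checks how much of each "total disk" figure the listed components account for, and expresses each report's total and points size relative to the first report. The labels "report 1" to "report 3", the helper names, and the choice of the first report as the baseline are all assumptions made for illustration; they are not part of the original output.

```python
# Summary figures copied from the three reports, in order of appearance.
# The keys "report 1" .. "report 3" are hypothetical labels for this sketch;
# the gist itself does not name the runs or say what differs between them.
REPORTS = {
    "report 1": {"total disk": 30_412_907_228, "stored fields": 11_049_749_404,
                 "docvalues": 8_722_043_518, "postings": 918_100_584,
                 "points": 8_320_476_220, "terms": 1_402_532_940},
    "report 2": {"total disk": 25_357_324_672, "stored fields": 10_935_007_394,
                 "docvalues": 8_308_676_779, "postings": 920_168_137,
                 "points": 3_791_449_288, "terms": 1_402_018_511},
    "report 3": {"total disk": 25_884_080_928, "stored fields": 11_049_749_404,
                 "docvalues": 8_722_043_518, "postings": 918_100_584,
                 "points": 3_791_649_929, "terms": 1_402_532_930},
}

# Term vectors, norms and prox are 0 in every report, so they are left out.
COMPONENTS = ("stored fields", "docvalues", "postings", "points", "terms")


def unaccounted(report):
    """Part of 'total disk' that is not covered by the listed components."""
    return report["total disk"] - sum(report[c] for c in COMPONENTS)


def ratio_to_baseline(name, key, baseline="report 1"):
    """Size of `key` in report `name` as a fraction of the baseline report.

    Using the first report as the baseline is an assumption of this sketch.
    """
    return REPORTS[name][key] / REPORTS[baseline][key]


if __name__ == "__main__":
    for name, report in REPORTS.items():
        print(f"{name}: total {report['total disk']:,} "
              f"({ratio_to_baseline(name, 'total disk'):.1%} of report 1), "
              f"points at {ratio_to_baseline(name, 'points'):.1%} of report 1, "
              f"unaccounted {unaccounted(report):,}")
```

In all three reports the listed components cover the total to within a few thousand units, and nearly all of the difference between the first report and the other two sits in the points figure.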