https://s3.amazonaws.com/amazon-reviews-pds/readme.html
scala> spark.sql("select count(*) from amazon_reviews").show
+---------+
| count(1)|
+---------+
|160796570|
+---------+
import com.uber.hoodie.common.model.HoodieLogFile;
import com.uber.hoodie.common.table.log.HoodieLogFileReader;
import com.uber.hoodie.common.table.log.block.HoodieAvroDataBlock;
import com.uber.hoodie.common.table.log.block.HoodieLogBlock;
import com.uber.hoodie.common.table.log.block.HoodieLogBlock.HoodieLogBlockType;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.common.util.ParquetUtils;
import com.uber.hoodie.exception.HoodieIOException;
import java.io.IOException;
import java.util.ArrayList;
07:26:38 [Cache]$ RC_NUM=rc6
# Checksums and Signatures OK
07:26:42 [Cache]$ shasum -a 512 hudi-0.5.0-incubating-${RC_NUM}.src.tgz > sha512
07:26:58 [Cache]$ diff sha512 hudi-0.5.0-incubating-${RC_NUM}.src.tgz.sha512.txt | wc -l
0
07:27:19 [Cache]$ gpg --verify hudi-0.5.0-incubating-${RC_NUM}.src.tgz.asc.txt hudi-0.5.0-incubating-${RC_NUM}.src.tgz
gpg: Signature made Wed Oct 16 03:34:37 2019 PDT
gpg: using RSA key AF9BAF79D311A3D3288E583F24A499037262AAA4
#user nobody;
worker_processes 1;
error_log /tmp/ngnix-error.log;
#error_log logs/error.log notice;
#error_log logs/error.log info;
pid /tmp/nginx.pid;
RC_NUM=rc1
RC_VERSION=0.9.0
# Checksums and Signatures OK
shasum -a 512 hudi-${RC_VERSION}-${RC_NUM}.src.tgz > sha512
diff sha512 hudi-${RC_VERSION}-${RC_NUM}.src.tgz.sha512 | wc -l
0
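The checksum step above relies on `diff` seeing two identical lines, file name included. A minimal sketch of a variant that compares only the hex digests is shown here. The helper names `sha512_of` and `verify_sha512` are hypothetical and not part of any Hudi release tooling. The sketch assumes the published checksum file starts with the hex digest as its first whitespace-separated field.

```shell
# Hypothetical helpers for illustration; not part of the Hudi release tooling.

# Print only the hex SHA-512 digest of a file, using whichever tool is installed.
sha512_of() {
  if command -v sha512sum >/dev/null 2>&1; then
    sha512sum "$1" | awk '{print $1}'
  else
    shasum -a 512 "$1" | awk '{print $1}'
  fi
}

# Exit 0 only if the artifact's digest equals the first field of the first
# line of the checksum file (assumed to be the hex digest).
verify_sha512() {
  actual=$(sha512_of "$1")
  expected=$(awk '{print $1; exit}' "$2")
  # An empty digest (for example, a missing artifact) is treated as a failure.
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

With the variables set as above, this could be called as `verify_sha512 hudi-${RC_VERSION}-${RC_NUM}.src.tgz hudi-${RC_VERSION}-${RC_NUM}.src.tgz.sha512 && echo OK`. Comparing digests only avoids false mismatches when the checksum file records a different path or spacing than the local `shasum` output.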
https://s3.amazonaws.com/amazon-reviews-pds/readme.html
vmacs:amazon-reviews vs$ find . -type f | cut -d/ -f2 | sort | uniq -c
10 product_category=Apparel
10 product_category=Automotive
10 product_category=Baby
10 product_category=Beauty
10 product_category=Books
10 product_category=Camera
TL;DR: BLOOM_INDEX still significantly outperforms. But we should really step on the gas for RFC-15-like efforts/RFC-08 to make this much faster.
https://microsoft.github.io/hyperspace/docs/ug-quick-start-guide/
>>>> TestBootstrap :
files:
[file:/tmp/junit4111271187227693299/data/datestr=2020%252F04%252F03/part-00000-6d89c75e-d3e2-49ee-9b89-db88bb4dbb36.c000.snappy.parquet,
file:/tmp/junit4111271187227693299/data/datestr=2020%252F04%252F03/part-00001-6d89c75e-d3e2-49ee-9b89-db88bb4dbb36.c000.snappy.parquet,
file:/tmp/junit4111271187227693299/data/datestr=2020%252F04%252F01/part-00000-6d89c75e-d3e2-49ee-9b89-db88bb4dbb36.c000.snappy.parquet,
file:/tmp/junit4111271187227693299/data/datestr=2020%252F04%252F01/part-00001-6d89c75e-d3e2-49ee-9b89-db88bb4dbb36.c000.snappy.parquet,
file:/tmp/junit4111271187227693299/data/datestr=2020%252F04%252F02/part-00000-6d89c75e-d3e2-49ee-9b89-db88bb4dbb36.c000.snappy.parquet,
file:/tmp/junit4111271187227693299/data/datestr=2020%252F04%252F02/part-00001-6d89c75e-d3e2-49ee-9b89-db88bb4dbb36.c000.snappy.parquet]
numVersions:2
numFiles:6
hudi:hoodie_benchmark->desc
╔═════════════════════════════════════════════════╤══════════════════════════════════════════════════════════════════════════════╗
║ Property │ Value ║
╠═════════════════════════════════════════════════╪══════════════════════════════════════════════════════════════════════════════╣
║ basePath │ file:/Users/vs/Cache/hudi-test-data/output-mor-smoke/org.apache.hudi ║
╟─────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────╢
║ metaPath │ file:/Users/vs/Cache/hudi-test-data/output-mor-smoke/org.apache.hudi/.hoodie ║
╟─────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────╢
║ fileSystem │ file |