Follow the doc https://hudi.apache.org/docs/docker_demo. It is sufficient to finish Step 3 (Sync with Hive) before continuing with the sections below.
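Before moving on, it can help to confirm the demo containers are actually up. A quick sanity check (not part of the demo doc; the container names below are the defaults from the demo's compose file):
docker ps --format '{{.Names}}' | grep -E 'adhoc-1|namenode|datanode1|hivemetastore'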
Follow the doc https://github.com/prestodb/presto/blob/master/README.md to build and launch your local Presto server.
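For reference, the build step typically looks like the following (a sketch; -DskipTests only shortens the build). The README's development setup runs com.facebook.presto.server.PrestoServer with presto-main as the working directory, which is why the catalog file below lives under presto-main/etc/catalog.
./mvnw clean install -DskipTests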
First, copy the Hadoop configuration files from the containers to your local machine; e.g.:
mkdir -p /tmp/hadoop
docker cp adhoc-1:/etc/hadoop/core-site.xml /tmp/hadoop/core-site.xml
docker cp adhoc-1:/etc/hadoop/hdfs-site.xml /tmp/hadoop/hdfs-site.xml
Second, update the Hudi connector configuration (presto-main/etc/catalog/hudi.properties); e.g.:
connector.name=hudi
hive.config.resources=/tmp/hadoop/core-site.xml,/tmp/hadoop/hdfs-site.xml
hive.metastore.uri=thrift://localhost:9083
Then relaunch your local Presto server. Now you can use presto-cli to access the Hudi tables:
$ presto-cli/target/presto-cli-*-executable.jar --catalog hudi --schema default
presto:default> show tables;
...
presto:default> select * from stock_ticks_cow;
...
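As an additional smoke test, you can run one of the demo's own queries against the copy-on-write table (assuming Step 3 of the demo populated stock_ticks_cow as described):
presto:default> select symbol, max(ts) from stock_ticks_cow group by symbol;
...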
If your local machine does not have enough memory, you can deploy the Hadoop cluster on a remote machine instead. In that case, update your local /etc/hosts by adding
127.0.0.1 namenode
127.0.0.1 datanode1
127.0.0.1 hivemetastore
and then create an SSH tunnel (suppose your remote machine's IP is 10.20.30.40):
ssh -N -L 9083:10.20.30.40:9083 -L 8020:10.20.30.40:8020 -L 50010:10.20.30.40:50010 10.20.30.40
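The tunnel forwards the Hive metastore thrift port (9083), the HDFS NameNode RPC port (8020), and the DataNode data-transfer port (50010). To verify the forwarding before starting Presto, you can probe a port (a quick sanity check, not from the original doc):
nc -z localhost 9083 && echo 'metastore reachable'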
If you hit an error like readDirect unsupported in RemoteBlockReader, add the configuration below to your local /tmp/hadoop/hdfs-site.xml. Setting this property to false switches the HDFS client from the legacy RemoteBlockReader, which does not support readDirect, to the newer block reader implementation, which does:
<property>
  <name>dfs.client.use.legacy.blockreader</name>
  <value>false</value>
</property>