h1. How to create a patched version of the Hive JDBC driver that works with Spark

h2. Goal

Make it possible to successfully use a statement like
spark.read.jdbc(jdbcUrl, query, props).show()
when the JDBC URL looks like jdbc:hive2://the-host:10000/the-namespace.
Unfortunately, out of the box this code does not work; it fails with various exceptions from the driver layer.
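For context, here is a minimal sketch of the intended usage; the host, namespace, table name, and app name are placeholders, not values taken from this guide:

import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hive-jdbc-test").getOrCreate()

// Placeholder connection details
val jdbcUrl = "jdbc:hive2://the-host:10000/the-namespace"
val props = new Properties()
props.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")

// The second argument is a table name (or a parenthesized subquery with an alias)
val query = "the_table"
spark.read.jdbc(jdbcUrl, query, props).show()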
h2. How to make the patch

h3. Clone Hive from the git repository
git clone https://github.com/apache/hive.git
h3. Check out the relevant revision
git checkout d81c41c4f54160376c2a1b5186d5ceb7ef29a770
h3. Apply the patch
git apply --check fix_hive_jdbc.patch
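The --check flag only verifies that the patch applies cleanly without changing any files. Run the command again without the flag to actually apply it:

git apply fix_hive_jdbc.patch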
h3. Compile

Unfortunately, the Maven build fails for this revision because some of the third-party dependencies are missing from the central Maven repository. So the easiest way I found is the following.

Create the directory ./jdbc/jars and put the following files there:
commons-logging-1.1.3.jar
hive-jdbc-1.2.1.jar
hive-service-1.2.1.jar
libthrift-0.9.3.jar
The files can be found in the central Maven repository.
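For example, they can be downloaded with curl; the URLs below follow the standard Maven Central layout for these artifacts:

# Fetch the four dependency jars from Maven Central into ./jdbc/jars
mkdir -p jdbc/jars
cd jdbc/jars
curl -O https://repo1.maven.org/maven2/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar
curl -O https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/1.2.1/hive-jdbc-1.2.1.jar
curl -O https://repo1.maven.org/maven2/org/apache/hive/hive-service/1.2.1/hive-service-1.2.1.jar
curl -O https://repo1.maven.org/maven2/org/apache/thrift/libthrift/0.9.3/libthrift-0.9.3.jar
cd ../..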
Put build_patch.sh under src/java/, grant it execute permissions using chmod +x src/java/build_patch.sh, and run it from the src/java/ directory. The script should create hive-jdbc-1.2.1_patch.jar in the directory ./jdbc/jars; the consolidated command sequence is shown below.
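Assuming the repository root as the working directory (the same directory that contains jdbc/jars), the steps above amount to:

chmod +x src/java/build_patch.sh
cd src/java
./build_patch.sh
cd ../..
# Sanity check: the patched jar should now exist
ls -l jdbc/jars/hive-jdbc-1.2.1_patch.jar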
h3. Put the patch on the Spark cluster
The patched jar should be copied to all nodes of the Spark cluster into ./spark-SPARK_VERSION-bin-hadoopHADOOP_VERSION/jars, replacing the stock hive-jdbc-1.2.1.spark2.jar.
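A minimal deployment sketch, assuming passwordless SSH, hypothetical node names (node1, node2, node3), and a Spark 2.4.x on Hadoop 2.7 directory layout under each node's home directory; adjust the node list and path to your cluster:

# Hypothetical Spark install path; substitute your SPARK_VERSION and HADOOP_VERSION
SPARK_JARS_DIR=spark-2.4.8-bin-hadoop2.7/jars
for node in node1 node2 node3; do
  # Remove the stock driver and copy in the patched one
  ssh "$node" "rm -f $SPARK_JARS_DIR/hive-jdbc-1.2.1.spark2.jar"
  scp jdbc/jars/hive-jdbc-1.2.1_patch.jar "$node:$SPARK_JARS_DIR/"
done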