Created
August 18, 2016 07:55
Using SQLTransformer to join DataFrames in ML Pipeline
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Welcome to | |
____ __ | |
/ __/__ ___ _____/ /__ | |
_\ \/ _ \/ _ `/ __/ '_/ | |
/___/ .__/\_,_/_/ /_/\_\ version 2.0.0 | |
/_/ | |
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77) | |
Type in expressions to have them evaluated. | |
Type :help for more information. | |
scala> import org.apache.spark.ml.feature.SQLTransformer | |
import org.apache.spark.ml.feature.SQLTransformer | |
scala> val df1 = spark.createDataFrame(Seq((1, "foo"), (2, "bar"), (3, "baz"))).toDF("id", "text") | |
df1: org.apache.spark.sql.DataFrame = [id: int, text: string] | |
scala> df1.createOrReplaceTempView("table1") | |
scala> val df2 = spark.createDataFrame(Seq((3, 4), (2, 2), (3, 5))).toDF("id", "num") | |
df2: org.apache.spark.sql.DataFrame = [id: int, num: int] | |
scala> val st = new SQLTransformer() | |
st: org.apache.spark.ml.feature.SQLTransformer = sql_35c801a50fba | |
scala> st.setStatement("SELECT t1.id, t1.text, th.num FROM __THIS__ th JOIN table1 t1 ON th.id == t1.id") | |
res1: st.type = sql_35c801a50fba | |
scala> st.transform(df2).show | |
+---+----+---+ | |
| id|text|num| | |
+---+----+---+ | |
| 3| baz| 4| | |
| 2| bar| 2| | |
| 3| baz| 5| | |
+---+----+---+ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment