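// Spark Streaming setup: 5-second batches, error-level logging only
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
val conf = new SparkConf().setAppName(appName) // run on cluster: no setMaster here, the master comes from spark-submit
val ssc = new StreamingContext(conf, Seconds(5)) // batch interval of 5 seconds
val sc = ssc.sparkContext
sc.setLogLevel("ERROR")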
#!/bin/bash
# Export a Hive table as comma-delimited files into a local directory,
# then concatenate the part files into a single CSV.
hive -e "insert overwrite local directory '/path/in/local/'
row format delimited fields terminated by ','
select * from my_database.my_table"
cat /path/in/local/* > /another/path/in/local/my_table.csv
#!/bin/bash
# Write the table out as comma-delimited text with a CTAS,
# then merge the HDFS part files into one local CSV.
hive -e "drop table if exists csv_dump;
create table csv_dump ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '/temp/storage/path' as
select * from my_data_table;"
hadoop fs -getmerge /temp/storage/path/ /local/path/my.csv
// Append a DataFrame to an HDFS directory as Snappy-compressed ORC
import org.apache.spark.sql.SaveMode
val mydataframe = ... //put some data in your dataframe, friend
mydataframe
  .write
  .option("orc.compress", "snappy")
  .mode(SaveMode.Append)
  .orc("/this/is/an/hdfs/directory/")
// Same ORC append, but partitioned by time columns
import org.apache.spark.sql.SaveMode
val mydataframe = ... //put some data in your dataframe, friend
mydataframe
  .write
  .partitionBy("year", "month", "day", "hour") // these columns must exist in the DataFrame
  .option("orc.compress", "snappy")
  .mode(SaveMode.Append)
  .orc("/this/is/another/hdfs/directory")
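For orientation, `partitionBy` lays the output out in Hive-style `column=value` subdirectories beneath the target path. Below is a minimal sketch of that naming scheme in plain Scala, with no Spark required. The helper `partitionPath` and the sample values are hypothetical and exist only for illustration. Real output also contains part files whose names Spark chooses.

```scala
// Hypothetical helper: mimics the Hive-style directory naming used for
// partitioned output. It only builds a string; it does not touch HDFS or Spark.
def partitionPath(base: String, partitions: Seq[(String, Any)]): String = {
  val root = base.stripSuffix("/")
  val dirs = partitions.map { case (column, value) => s"$column=$value" }
  (root +: dirs).mkString("/")
}

// Illustrative values only
val example = partitionPath(
  "/this/is/another/hdfs/directory",
  Seq("year" -> 2017, "month" -> 11, "day" -> 30, "hour" -> 5)
)
println(example)
// prints /this/is/another/hdfs/directory/year=2017/month=11/day=30/hour=5
```

A reader that filters on these columns can skip whole directories, which is the usual reason to partition by time fields.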
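// import this guy (Spark 1.x API; deprecated since Spark 2.0 in favour of SparkSession with enableHiveSupport())
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}
// this should look familiar
val conf = new SparkConf()
val sc = new SparkContext(conf)
// setup this fella
val hiveContext = new HiveContext(sc)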
import org.apache.spark.sql.SaveMode
val mydstream = ... // these usually come from Spark Streaming apps
// they basically contain a chain of RDDs that you can convert to DFs
mydstream.foreachRDD(rdd => {
  // the RDD elements need to be case classes (or tuples) so the schema can be inferred
  hiveContext.createDataFrame(rdd)
    .write
    .option("orc.compress", "snappy")
    .mode(SaveMode.Append)
    .orc("/this/is/an/hdfs/directory/too")
})
// create a case class to represent a Transaction (from streaming)
case class Transaction(
  ts: Int,
  customer_id: Int,
  transaction_id: String,
  amount: Double
)
// create a case class to represent a TransactionDetail (from static)
case class TransactionDetail(
-- Create an empty Snappy-compressed ORC table
CREATE TABLE my_database.my_table
(
  column_1 string,
  column_2 int,
  column_3 double
)
STORED AS ORC
-- the property key is written in lowercase, as spelled in the Hive ORC documentation
TBLPROPERTIES('orc.compress'='SNAPPY'); -- keep the value SNAPPY in uppercase: lowercase triggers a nasty bug in some Hive versions (fixed in later ones)
-- Create a Snappy-compressed ORC table from a query (CTAS)
CREATE TABLE my_database.my_table
STORED AS ORC TBLPROPERTIES('orc.compress'='SNAPPY') AS
SELECT * FROM my_database.my_other_table WHERE YEAR=2017 AND MONTH=11 AND DAY=30;