sqoop help import
sqoop help export
sqoop help merge
/*---------------------------------------------------------------------------------------------------------------------------------
Spark RDD API to sort an RDD by string ascending and string descending at the same time,
akin to the DF API "sort by col1 asc, col2 desc"
Created by: https://www.linkedin.com/in/harleenmann1/
---------------------------------------------------------------------------------------------------------------------------------*/
/*
// Problem: Given an RDD[(String, String, Int)], order the elements by ascending first element and descending second element
// Use RDD API
*/
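One way to do this with the RDD API (a sketch, assuming a live SparkContext `sc` and the sample triples used later in these notes) is to pass an explicit composite `Ordering` to `sortBy`, reversing only the ordering on the second field:

```scala
import scala.reflect.ClassTag

val rdd = sc.parallelize(Array(("a","c",1),("b","c",2),("a","a",5),("a","a",1)))

// sortBy takes an implicit Ordering on the key; supplying it explicitly lets us
// combine String ascending (field 1) with String descending (field 2)
val sorted = rdd.sortBy(t => (t._1, t._2))(
  Ordering.Tuple2(Ordering.String, Ordering.String.reverse),
  implicitly[ClassTag[(String, String)]])

//sorted.collect puts the "a" rows first, and within them "c" before "a"
```

The same trick generalizes: wrap any per-field `Ordering` in `.reverse` to flip just that field's direction.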
val rddkv = sc.parallelize(List(("k1",1),("k2",2),("k1",2),("k3",5),("k3",1)))
//rddkv.collect
//Array[(String, Int)] = Array((k1,1), (k2,2), (k1,2), (k3,5), (k3,1))

//creating a hash partition
rddkv.partitionBy(new org.apache.spark.HashPartitioner(3)).mapPartitionsWithIndex( (i,iter_p) => iter_p.map(x=>" index="+i+" value="+x)).collect
//Array[String] = Array(" index=0 value=(k1,1)", " index=0 value=(k1,2)", " index=1 value=(k2,2)", " index=2 value=(k3,5)", " index=2 value=(k3,1)")

//creating a range partition
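The snippet stops at the range-partition comment; a sketch of the analogous call (assuming the same `rddkv` as above) would be:

```scala
// RangePartitioner samples the RDD to choose key boundaries, so keys land in
// roughly sorted partitions (all "k1" before "k2" before "k3"), unlike the
// hash partitioner's arbitrary bucket assignment
val ranged = rddkv.partitionBy(new org.apache.spark.RangePartitioner(3, rddkv))
ranged.mapPartitionsWithIndex( (i, iter_p) => iter_p.map(x => " index=" + i + " value=" + x)).collect
```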
Fun with RDD transformations :-

val rdd = sc.parallelize(Array(("a","c",1),("b","c",2),("a","a",5),("a","a",1)))
//rdd.collect
//res44: Array[(String, String, Int)] = Array((a,c,1), (b,c,2), (a,a,5), (a,a,1))

//------------------------------//
//-- map
rdd.map(p=> (p._1,p._2,p._3,p._3+1)).collect
//res54: Array[(String, String, Int, Int)] = Array((a,c,1,2), (b,c,2,3), (a,a,5,6), (a,a,1,2))
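A couple more transformations on the same `rdd`, in the same spirit (a sketch; `reduceByKey` output order depends on partitioning):

```scala
//-- filter: keep only rows whose first element is "a"
rdd.filter(p => p._1 == "a").collect
//Array((a,c,1), (a,a,5), (a,a,1))

//-- reduceByKey: pair up (first, second) as the key, then sum the Int per key
rdd.map(p => ((p._1, p._2), p._3)).reduceByKey(_ + _).collect
// (a,a) collapses to 6; (a,c) and (b,c) pass through unchanged
```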
/*
Expected Result: Order_Date, Order_Status, total_orders, total_amount.
In plain English: find the total orders and total amount per status per day.
The result should be sorted by order date descending, order status ascending, total amount descending, and total orders ascending.
Aggregation should be done using the methods below.
*/
val df_orders = sqlContext.read.format("com.databricks.spark.avro").load("/user/cloudera/problem1/orders")
//[order_id: int, order_date: bigint, order_customer_id: int, order_status: string]
val df_order_items = sqlContext.read.format("com.databricks.spark.avro").load("/user/cloudera/problem1/order-items")
//[order_item_id: int, order_item_order_id: int, order_item_product_id: int, order_item_quantity: int, order_item_subtotal: float, order_item_product_price: float]
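One way to sketch the requested aggregation with the DataFrame API (column names as in the schemas above; `countDistinct`/`sum` is just one of the aggregation routes the exercise allows):

```scala
import org.apache.spark.sql.functions.{countDistinct, sum, col}

// join order items onto their orders, then aggregate per day per status
val result = df_orders
  .join(df_order_items, df_orders("order_id") === df_order_items("order_item_order_id"))
  .groupBy("order_date", "order_status")
  .agg(countDistinct("order_id").alias("total_orders"),
       sum("order_item_subtotal").alias("total_amount"))
  .orderBy(col("order_date").desc, col("order_status").asc,
           col("total_amount").desc, col("total_orders").asc)
```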
//--------------------------------//--------------------------------//--------------------------------//--------------------------------
//--- Textfile, Sequence file
set io.compression.codecs;
//displays all installed codecs
set mapred.output.compression.type=BLOCK;
//--------------------------------
//-- Intermediate file compression
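The companion settings this section presumably continues with (a sketch; the old-style `mapred.*` property names assume a classic CDH/Hadoop 1-era install, as the rest of these notes do):

```sql
-- output compression
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- intermediate (between-MR-stage) compression
set hive.exec.compress.intermediate=true;
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
```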
agent3.sources = source1
agent3.channels = channel1
agent3.sinks = sink1

agent3.sources.source1.type = avro
agent3.sources.source1.port = 11112
agent3.sources.source1.bind = localhost

agent3.sinks.sink1.type = logger

# channel definition and source/sink bindings (required for the agent to start;
# a memory channel is assumed here for the logger-sink demo)
agent3.channels.channel1.type = memory
agent3.sources.source1.channels = channel1
agent3.sinks.sink1.channel = channel1
#avro source to file channel to hdfs sink (with avro snappy codec)
agent4.sources = source1
agent4.channels = channel1
agent4.sinks = sink1

# Source configuration
agent4.sources.source1.type = avro
agent4.sources.source1.port = 11112
agent4.sources.source1.bind = localhost

# Channel and sink configuration, filled in to match the header comment
# (hdfs.path is a placeholder)
agent4.channels.channel1.type = file
agent4.sinks.sink1.type = hdfs
agent4.sinks.sink1.hdfs.path = /user/cloudera/flume/events
agent4.sinks.sink1.hdfs.fileType = DataStream
agent4.sinks.sink1.serializer = avro_event
agent4.sinks.sink1.serializer.compressionCodec = snappy
agent4.sources.source1.channels = channel1
agent4.sinks.sink1.channel = channel1
------------------------
http://arun-teaches-u-tech.blogspot.my/p/cca-175-prep-problem-scenario-1.html
------------------------
1.
sqoop import --connect 'jdbc:mysql://localhost:3306/retail_db' \
--username root --password cloudera \
--table orders \
--target-dir /user/cloudera/problem1/orders \
--as-avrodatafile --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec
Complex data types in hive can be tricky.
I struggled to find any decent reference on the internet that would help me write 'insert into ...' statements for ARRAY, MAP and STRUCT columns; all I found were examples using the 'load ...' statement.
Here are some working examples that should help anyone facing the same issue.
create table x( col1 Int, col2 ARRAY<Int>);
insert into x select 1,array(1,2,3) union all select 2,array(2,3,4);
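The note above promises MAP and STRUCT examples too; in the same style, sketches using Hive's standard `map()` and `named_struct()` constructors (table names `y` and `z` are made up):

```sql
-- MAP column: map(key1, value1, key2, value2, ...)
create table y( col1 Int, col2 MAP<STRING,INT>);
insert into y select 1, map('a',1,'b',2) union all select 2, map('c',3);

-- STRUCT column: named_struct pairs each field name with its value
create table z( col1 Int, col2 STRUCT<name:STRING, age:INT>);
insert into z select 1, named_struct('name','alice','age',30);
```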