#------------------------------------------------------------------------------
# echo the passed params and print them to a log file and the terminal
# with a timestamp, $host_name and the PID of $0
# usage:
# doLog "INFO some info message"
# doLog "DEBUG some debug message"
#------------------------------------------------------------------------------
doLog(){
   type_of_msg=$(echo "$*" | cut -d" " -f1)
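The listing above is cut off after its first statement. As a rough illustration of what the header comment describes (message type, timestamp, host name, PID, output to both terminal and log file), here is a minimal sketch. The function name doLogSketch, the log_file variable with its default path, and the exact line layout are assumptions made for this example and are not taken from the original.

```shell
#!/usr/bin/env bash
# ASSUMPTION: log_file and the fallback host name are hypothetical defaults.
log_file="${log_file:-/tmp/doLog-sketch.log}"
host_name="${host_name:-$(hostname -s 2>/dev/null || echo localhost)}"

doLogSketch(){
   # join all params into one string, as the original does with $*
   local all="$*"
   # the first word is the message type (INFO, DEBUG, ...)
   local type_of_msg="${all%% *}"
   # everything after the first space is the message text
   local msg="${all#* }"
   # a single-word call has no message text
   if [ "$msg" = "$all" ]; then
      msg=""
   fi
   # assumed layout: [TYPE] timestamp [script][@host] [pid] message
   local line
   line="[$type_of_msg] $(date '+%Y-%m-%d %H:%M:%S') [${0##*/}][@$host_name] [$$] $msg"
   # print to the terminal and append the same line to the log file
   printf '%s\n' "$line" | tee -a "$log_file"
}
```

Calling doLogSketch "INFO some info message" prints one line starting with [INFO] and appends the same line to $log_file.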
#------------------------------------------------------------------------------
# parse the single letter command line args
#------------------------------------------------------------------------------
doParseCmdArgs(){
   # traverse all the possible cmd args
   while getopts ":a:c:i:h:" opt; do
      case $opt in
         a)
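The listing above stops at the first case branch. The sketch below shows one way a getopts loop with the same option string ":a:c:i:h:" could be completed. The leading colon puts getopts into silent mode, so a missing argument arrives as ":" and an unknown option as "?". The function name doParseCmdArgsSketch and the variable names arg_a, arg_c, arg_i and arg_h are hypothetical, since the original does not show what each option means.

```shell
#!/usr/bin/env bash
doParseCmdArgsSketch(){
   local opt
   # reset the option index so the function can be called more than once
   OPTIND=1
   while getopts ":a:c:i:h:" opt; do
      case $opt in
         # ASSUMPTION: each option just stores its argument in a variable
         a) arg_a="$OPTARG" ;;
         c) arg_c="$OPTARG" ;;
         i) arg_i="$OPTARG" ;;
         h) arg_h="$OPTARG" ;;
         # silent mode: an option was given without its required argument
         :)  echo "option -$OPTARG requires an argument" >&2; return 1 ;;
         # silent mode: an option that is not in the option string
         \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
      esac
   done
}
```

Note that with this option string -h takes an argument, so it cannot double as a bare help flag.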
# -----------------------------------------------------------------------------
# the constructor
# -----------------------------------------------------------------------------
sub new {
   my $class = shift;         # Class name is in the first parameter
   $appConfig = ${ shift @_ } || { 'foo' => 'bar' ,} ;
   my $self = {};             # Anonymous hash reference holds instance attributes
   bless( $self, $class );    # Say: $self is a $class
@$rs = sort { $a->{ 'SeqId' } <=> $b->{ 'SeqId' } } @$rs;
foreach my $row ( @$rs ) {
   # do stuff
   my $var = $row->{'col1'};
}
private def runPipeLine(cnf: Configuration): DataFrame = {
  val dfOut: DataFrame =
    new Phase1(cnf).process()
      .transform(new Phase2(cnf).process)
  dfOut
}
class Phase1 extends DataFrameStage {
object SingleTon {
  def apply(): SingleTon = {
    new SingleTon()
  }
}
class SingleTon {
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
val df = spark
  .createDataFrame(
    spark.sparkContext.parallelize(
      Seq(
        Row(
          Map(("key1","val1") -> 1)
// note: if you can implement this with withColumn + udf instead, that has usually been over 10x faster ...
val rddRows: RDD[Row] =
  inDf.rdd.map(row => {
    val lstRow = row.toSeq.toList
    var lstRowNew = lstRow
    // do stuff on the new lstRow here
    Row.fromSeq(lstRowNew)
val lstKeyCols = List("col1", "col2", "col3")
dfLeft
  .join(
    dfRight,
    dfLeft("col1") <=> dfRight("col1_")
      && dfLeft("col2") <=> dfRight("col2_")
      && dfLeft("col3") <=> dfRight("col3_"),
    "fullouter"
  )
  .drop(lstKeyCols.map(_ + "_"): _*)
// START foldLeft usage
val outDf: DataFrame = lstColumnsToIterate
  .foldLeft(inDf)((tmpDf, iterableColToAdd) => {
    tmpDf.withColumn(iterableColToAdd, expr(funcToApply).as(iterableColToAdd))
  })
  .groupBy(lstGroupByCols.distinct.head, lstGroupByCols.distinct.tail: _*)
  .agg(lstAggregationCols.distinct.head, lstAggregationCols.distinct.tail: _*)
// STOP foldLeft usage