Yordan Georgiev (YordanGeorgiev)

YordanGeorgiev / log-output
Last active July 27, 2020 10:53
[log output in bash] how-to log output in bash #bash #functions #bash-funcs
#------------------------------------------------------------------------------
# echo the passed params and print them to a log file and to the terminal,
# prefixed with a timestamp, the $host_name, $0 and the PID
# usage:
# doLog "INFO some info message"
# doLog "DEBUG some debug message"
#------------------------------------------------------------------------------
doLog(){
  # the first word of the message is its type, e.g. INFO or DEBUG
  type_of_msg=$(echo "$*" | cut -d" " -f1)
YordanGeorgiev / parse-cmd-args.sh
Created February 17, 2018 12:14
[parse cmd args in bash] how-to parse cmd args in bash #bash #shell
#------------------------------------------------------------------------------
# parse the single letter command line args
#------------------------------------------------------------------------------
doParseCmdArgs(){
# traverse all the possible cmd args
while getopts ":a:c:i:h:" opt; do
case $opt in
a)
YordanGeorgiev / perl-constuctor
Created February 17, 2018 11:18
[constructor in Perl] how-to add an OO constructor in Perl #perl #perl-OO
# -----------------------------------------------------------------------------
# the constructor
# -----------------------------------------------------------------------------
sub new {
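  my $class   = shift;    # Class name is in the first parameter
  my $cnf_ref = shift;    # optional reference to the app config hash ref
  # dereferencing an undef ref dies under `use strict`, so check it before falling back to the default
  $appConfig = ( $cnf_ref ? $$cnf_ref : undef ) || { 'foo' => 'bar' };
  my $self = {};          # Anonymous hash reference holds instance attributes
  bless( $self, $class ); # Say: $self is a $class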
YordanGeorgiev / perl-iterate-over-hash-ref-of-hash-refs.pm
Created February 17, 2018 11:15
sort an array ref of hash refs by a key on the second level in perl #perl
@$rs = sort { $a->{ 'SeqId' } <=> $b->{ 'SeqId' } } @$rs;
foreach my $row ( @$rs ) {
  # do stuff with each row
  my $var = $row->{'col1'};
}
YordanGeorgiev / scala-spark-dataframe-pipeline.scala
Created February 17, 2018 10:40
[dataframe pipeline for spark] how-to build a dataframe processing pipeline in scala spark #scala #spark #dataframe #control-flow
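private def runPipeLine(cnf: Configuration): DataFrame = {
  // Phase1 produces the initial DataFrame; transform then applies Phase2's
  // DataFrame => DataFrame process step to it and returns the result
  new Phase1(cnf).process()
    .transform(new Phase2(cnf).process)
}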
class Phase1 extends DataFrameStage {
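The preview stops at `Phase1`, but the shape of `runPipeLine` is clear. A first phase produces a DataFrame, and every later phase is a `DataFrame => DataFrame` function chained with `transform`. Below is a minimal plain-Scala sketch of that pattern. It assumes Scala 2.13+ for `scala.util.chaining` and uses a `Seq` of maps as a stand-in for a DataFrame. `PipelineSketch`, `Stage`, `DropZero` and `AddDoubled` are hypothetical names and are not part of the gist or of Spark.

```scala
import scala.util.chaining._

object PipelineSketch {
  // hypothetical stand-in for a DataFrame: each row maps column names to values
  type Rows = Seq[Map[String, Int]]

  // hypothetical common shape of a phase: one dataset in, one dataset out
  trait Stage {
    def process(in: Rows): Rows
  }

  // hypothetical source step, in the role of Phase1(cnf).process(): builds the initial rows
  def load(): Rows =
    Seq(Map("id" -> 1, "amount" -> 10), Map("id" -> 2, "amount" -> 0))

  // hypothetical phase: keep only rows with a non-zero amount
  class DropZero extends Stage {
    def process(in: Rows): Rows = in.filter(r => r("amount") != 0)
  }

  // hypothetical phase: add a derived column
  class AddDoubled extends Stage {
    def process(in: Rows): Rows = in.map(r => r + ("doubled" -> r("amount") * 2))
  }

  // pipe(f) simply returns f(this), the plain-Scala counterpart of DataFrame.transform
  def runPipeline(): Rows =
    load()
      .pipe(new DropZero().process)
      .pipe(new AddDoubled().process)
}
```

Adding a phase means adding one more `.pipe(...)` line. This is the same reason `transform` chains read well in Spark code.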
YordanGeorgiev / scala-singleton.scala
Created February 17, 2018 10:29
[object singleton] how-to create an object singleton in scala #scala
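object SingleTon {
  // create the one instance lazily on first access and hand out that same instance on every call;
  // a plain `new SingleTon()` here would return a fresh object each time, which is not a singleton
  private lazy val instance: SingleTon = new SingleTon()
  def apply(): SingleTon = instance
}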
class SingleTon {
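In Scala the language itself provides the singleton: an `object` is instantiated once, lazily, on first access. Only the `object` is guaranteed to be single, though. A companion `apply()` that calls `new` on every call hands out a fresh instance each time. The sketch below shows two forms: a plain `object`, and a class with a private constructor whose companion caches one instance. `AppRegistry` and `Connection` are hypothetical names used only for illustration.

```scala
// the simplest singleton in Scala: an `object` is created once, lazily, on first access
object AppRegistry {
  private var counter = 0

  // every caller shares this one counter because there is only one AppRegistry
  def next(): Int = {
    counter += 1
    counter
  }
}

// the private constructor means only the companion below can create instances
class Connection private () {
  // hypothetical state, only here to show that all callers see the same object
  var requests: Int = 0
}

object Connection {
  // built once on first use, then returned by every apply() call
  private lazy val instance: Connection = new Connection()
  def apply(): Connection = instance
}
```

Making the constructor private is what stops callers from bypassing `apply()` with `new Connection()`.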
YordanGeorgiev / create-dataframe-with-schema
Last active February 17, 2018 10:29
[create dataframe with schema] how-to create a dataframe obj with schema in scala spark #scala #spark #dataframe
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
val df = spark
.createDataFrame(
spark.sparkContext.parallelize(
Seq(
Row(
Map(("key1","val1") -> 1)
YordanGeorgiev / iterate-over-rdd-rows.scala
Last active November 30, 2018 10:18
[iterate over rdd rows] how-to iterate over RDD rows and get DataFrame in scala spark #scala #spark
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// note: if the same logic can be written with withColumn + a udf instead, that has usually been over 10x faster ...
val rddRows: RDD[Row] =
inDf.rdd.map(row => {
val lstRow = row.toSeq.toList
var lstRowNew = lstRow
// do stuff on the new lstRow here
Row.fromSeq(lstRowNew)
YordanGeorgiev / spark-dataframe-fullouter-join-on-nullable-columns.scala
Last active February 17, 2018 10:42
[full outer join on nullable columns for spark dataframe] how-to apply a full outer join on a spark dataframe #scala #spark #dataframe #joins
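import org.apache.spark.sql.functions.{coalesce, col}

// dfRight's key columns are expected to be renamed with a "_" suffix beforehand (col1_, col2_, col3_)
val lstKeyCols = List("col1", "col2", "col3")
val dfJoined = dfLeft
  .join(
    dfRight,
    // <=> is null-safe equality: null <=> null is true, so rows whose keys are null still match
    dfLeft("col1") <=> dfRight("col1_")
      && dfLeft("col2") <=> dfRight("col2_")
      && dfLeft("col3") <=> dfRight("col3_"),
    "fullouter"
  )
// rows found only in dfRight have null left-side keys, so fill each key from its "_" copy before dropping the copies
val dfOut = lstKeyCols
  .foldLeft(dfJoined)((df, c) => df.withColumn(c, coalesce(col(c), col(c + "_"))))
  .drop(lstKeyCols.map(_ + "_"): _*)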
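Spark's `<=>` (null-safe equality) differs from `===` only when a key is null. With `===`, a null key never satisfies the join condition. With `<=>`, two null keys match. The plain-Scala sketch below shows that difference on a tiny full outer join, using `Option` with `None` standing in for SQL NULL. `NullSafeJoinSketch`, `sqlEq`, `nullSafeEq` and `fullOuterJoin` are hypothetical names. The join is a nested-loop illustration of the semantics, not how Spark executes it.

```scala
object NullSafeJoinSketch {
  // a record is (nullable join key, payload); None plays the role of SQL NULL
  type KeyedRow = (Option[String], Int)

  // SQL-style equality (like ===): NULL compared with anything is not true, so None never matches
  def sqlEq(a: Option[String], b: Option[String]): Boolean =
    a.isDefined && a == b

  // null-safe equality in the spirit of Spark's <=>: NULL <=> NULL is true
  def nullSafeEq(a: Option[String], b: Option[String]): Boolean =
    a == b

  // full outer join: matched pairs first, then unmatched rows from each side padded with None
  def fullOuterJoin(
      left: Seq[KeyedRow],
      right: Seq[KeyedRow],
      keyEq: (Option[String], Option[String]) => Boolean
  ): Seq[(Option[Int], Option[Int])] = {
    val matched: Seq[(Option[Int], Option[Int])] =
      for {
        (lk, lv) <- left
        (rk, rv) <- right
        if keyEq(lk, rk)
      } yield (Option(lv), Option(rv))
    val leftOnly = left
      .filterNot { case (lk, _) => right.exists { case (rk, _) => keyEq(lk, rk) } }
      .map { case (_, lv) => (Option(lv), Option.empty[Int]) }
    val rightOnly = right
      .filterNot { case (rk, _) => left.exists { case (lk, _) => keyEq(lk, rk) } }
      .map { case (_, rv) => (Option.empty[Int], Option(rv)) }
    matched ++ leftOnly ++ rightOnly
  }
}
```

With `sqlEq`, two rows whose keys are both `None` come out unmatched, one padded on each side. With `nullSafeEq`, they pair up, which is the behaviour the gist relies on for nullable key columns.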
YordanGeorgiev / scala-spark-dataframe-fold-left.scala #scala #spark #fold-left
Created February 17, 2018 10:08
[fold left usage in scala spark] how-to use fold left in scala on a dataframe obj
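import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

// START foldLeft usage
// start from inDf and add one column per name in lstColumnsToIterate; each step receives the
// DataFrame returned by the previous one, then the result is grouped and aggregated
val outDf: DataFrame = lstColumnsToIterate
  .foldLeft(inDf)((tmpDf, iterableColToAdd) => {
    tmpDf.withColumn(iterableColToAdd, expr(funcToApply))
  })
  .groupBy(lstGroupByCols.distinct.head, lstGroupByCols.distinct.tail: _*)
  .agg(lstAggregationCols.distinct.head, lstAggregationCols.distinct.tail: _*)
// STOP foldLeft usage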
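`foldLeft` threads an accumulator through a list. Here the accumulator is the DataFrame: it starts at `inDf`, and each step returns the next intermediate DataFrame with one more column. The same mechanics can be sketched in plain Scala without Spark, using `Map[String, Int]` records as a stand-in for rows. `FoldLeftSketch`, `addColumns` and `sumBy` are hypothetical helpers. `withColumn` here is also hypothetical and only mimics the Spark method in spirit.

```scala
object FoldLeftSketch {
  // hypothetical stand-in for a DataFrame row: column name -> value
  type Record = Map[String, Int]

  // analogue of withColumn: add (or overwrite) one column computed from each record
  def withColumn(rows: Seq[Record], name: String, f: Record => Int): Seq[Record] =
    rows.map(r => r + (name -> f(r)))

  // thread the records through one withColumn step per (name, factor) pair, starting from `in`,
  // the same shape as lstColumnsToIterate.foldLeft(inDf)((tmpDf, col) => tmpDf.withColumn(...))
  def addColumns(in: Seq[Record], colsToAdd: List[(String, Int)]): Seq[Record] =
    colsToAdd.foldLeft(in) { case (tmpRows, (name, factor)) =>
      withColumn(tmpRows, name, r => r("base") * factor)
    }

  // analogue of groupBy(groupCol).agg(sum(...)): per group, the sum of each listed column
  def sumBy(rows: Seq[Record], groupCol: String, sumCols: List[String]): Map[Int, Map[String, Int]] =
    rows.groupBy(r => r(groupCol)).map { case (g, grp) =>
      g -> sumCols.map(c => c -> grp.map(r => r(c)).sum).toMap
    }
}
```

Because each step only sees the result of the previous one, the order of `colsToAdd` matters whenever a new column is computed from an earlier added one.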