Yordan Georgiev (YordanGeorgiev) — gists

@YordanGeorgiev
YordanGeorgiev / global-build.sbt
Last active April 12, 2019 07:41
[global build.sbt] how-to add jar utils useful for development, but not wanted in projects' build.sbt files #scala #sbt #cnf #config #configuration
// file: ~/.sbt/0.13/plugins/build.sbt
// or ~/.sbt/1.0/plugins/build.sbt
// or <proj>/project/plugins.sbt for a single project
// purpose: add jar utils useful for development, but not wanted in projects' build.sbt
// coursier brings much better dependency resolution and fetching
// usage: sbt update
// https://github.com/coursier/coursier#why
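The preview stops at the comments; a minimal sketch of what the global plugins file can contain, here assuming the coursier sbt plugin (the version number is illustrative, pick a current release):

```scala
// file: ~/.sbt/1.0/plugins/build.sbt
// coursier plugs into sbt and replaces ivy for resolving and fetching jars
addSbtPlugin("io.get-coursier" % "sbt-coursier" % "1.0.3")
```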
YordanGeorgiev / scala-testing-get-resource-file-or-dir-path.scala
Last active March 24, 2018 10:29
[get resource file or dir path in scala] how-to get the dir or file resource path with scala #scala #testing #resources
val pathAsStr: String =
Thread
.currentThread()
.getContextClassLoader()
.getResource("string/path/to/resource/dir/or/file")
.toString
// or
val strFilePath: String = getClass.getClassLoader
  .getResource("string/path/to/resource/dir/or/file")
  .getPath
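A self-contained check of the same call, using java/lang/Object.class as a resource every JVM can resolve (object and method names hypothetical):

```scala
object ResourcePathDemo {
  def resourceUrlStr(path: String): String = {
    val url = Thread.currentThread()
      .getContextClassLoader
      .getResource(path)
    // getResource returns null when the resource is missing, so guard before toString
    if (url == null) "" else url.toString
  }

  def main(args: Array[String]): Unit =
    println(resourceUrlStr("java/lang/Object.class").nonEmpty)
}
```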
YordanGeorgiev / scala-spark-filter-nulls-nans-min-max-vals-from-df.scala
Created February 21, 2018 09:21
[filter nulls, NaNs, min/max vals with scala spark dataframe] how-to filter nulls, NaNs and min/max vals with a scala spark dataframe #scala #spark #dataframe #filter
import org.apache.spark.sql.functions.lit

df.filter(df.col(X).isNotNull)
  .filter(df.col(Y).isNotNull)
  .filter(df.col(X).isNaN =!= true)
  .filter(df.col(Y).isNaN =!= true)
  .filter(
    df.col(X) >= lit(minX)
      && df.col(X) <= lit(maxX)
      && df.col(Y) >= lit(minY)
      && df.col(Y) <= lit(maxY)
  )
YordanGeorgiev / scala-spark-create-df-with-schema.scala
Last active February 21, 2018 09:14
[create dataframe with schema with scala] how-to create a hardcoded dataframe with schema in scala spark #scala #spark #dataframe #hardcoded
val df = spark
  .createDataFrame(
    spark.sparkContext.parallelize(
      Seq(
        // obs: use null, not None, for a missing value in a schema-backed Row
        Row("row txt content", 1.0d, null, "2018-01-01", "2018-01-29T11:05:50")
      )),
    StructType(
      Seq(
        StructField("colOfStringType", StringType),
        // remaining fields reconstructed to match the Row above; names illustrative
        StructField("colOfDoubleType", DoubleType),
        StructField("colOfNullableStringType", StringType, nullable = true),
        StructField("colOfDateStr", StringType),
        StructField("colOfTimestampStr", StringType)
      )
    )
  )
YordanGeorgiev / to-from-xls.sh
Created February 20, 2018 17:08
[edit-csv-files-in-xls] how-to edit csv files in Excel with perl oneliners and bash #bash #perl #oneliners #excel #xls
# usage:
#   source zeppelin/sh/funcs/to-from-xls.sh
#   export dir=<path-to-the-root-dir-holding-the-csv-files>   # obs: recursive!
#   toXls
#   fromXls
# use BEFORE you want to open the files in xls
# (function bodies per the perl oneliners in the companion gist below)
toXls(){
  set -u -o pipefail
  find "$dir" -type f -exec perl -pi -e 'print "sep=,\n" if $. == 1' {} \;
}
fromXls(){
  set -u -o pipefail
  find "$dir" -type f -exec perl -pi -e '$_ = "" if $. == 1' {} \;
}
YordanGeorgiev / perl-oneliner-add-remove-file-first-line.sh
Created February 20, 2018 10:58
[add and remove the first line of a file oneliner] how-to add or remove the first line of a file with a perl oneliner #perl #oneliners #shell
# add a "sep=," first line so Excel opens csv files with the right separator
find "$dir" -type f -exec perl -pi -e 'print "sep=,\n" if $. == 1' {} \;
# remove it again
find "$dir" -type f -exec perl -pi -e '$_ = "" if $. == 1' {} \;
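The same first-line edit sketched in Scala for contexts without perl (object and helper names hypothetical):

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object SepLine {
  val Hint = "sep=,"

  // prepend the Excel separator hint as a new first line
  def addHint(p: Path): Unit = {
    val lines = Files.readAllLines(p).asScala.toList
    Files.write(p, (Hint :: lines).asJava)
  }

  // drop the first line again
  def removeHint(p: Path): Unit = {
    val lines = Files.readAllLines(p).asScala.toList
    Files.write(p, lines.drop(1).asJava)
  }

  def main(args: Array[String]): Unit = {
    val p = Files.createTempFile("demo", ".csv")
    Files.write(p, java.util.List.of("a,b", "1,2"))
    addHint(p)
    println(Files.readAllLines(p).get(0)) // sep=,
    removeHint(p)
    println(Files.readAllLines(p).get(0)) // a,b
  }
}
```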
YordanGeorgiev / scala-spark-create-nullable-cols-dataframe.scala
Created February 19, 2018 15:43
[scala spark create hardcoded dataframe with nullable cols] how-to create a dataframe with nullable columns in scala spark #scala #spark #dataframe #hardcoded
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
// format: off
// obs: the hardcoded vals in the first rows are there to "teach" the schema !!!
val ds = Seq(
  // foo comments
  (1, "foo", Some("bar"), Some(1850)),
  // bar comments
  (2, "foo", None, None)
).toDF("id", "name", "nick", "year") // column names illustrative
YordanGeorgiev / scala-spark-create-hardcoded-schema.scala
Last active February 21, 2018 09:14
[scala spark create hardcoded schema] how-to create a hardcoded schema in scala spark for a dataframe #scala #spark #dataframe #schema
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.DoubleType
import org.apache.spark.sql.types.StringType

val objSchema: StructType = StructType(
  Seq(
    // field names illustrative
    StructField("colOfStringType", StringType, nullable = false),
    StructField("colOfDoubleType", DoubleType, nullable = true)
  )
)
YordanGeorgiev / scala-spark-chain-transformations.scala
Created February 19, 2018 13:41
[scala spark chaining transformations] how-to chain transformations in scala spark on a DataFrame obj #scala #spark #dataframe #transformations
import org.apache.spark.sql.{DataFrame, SparkSession}

trait Phase {
  implicit val spark: SparkSession = SparkSession.builder().getOrCreate()
  def process(df: DataFrame): DataFrame
}

class Level1Phase(cnf: Configuration) extends Phase {
  override def process(df: DataFrame): DataFrame = {
    // level-1 transformations go here; return the transformed frame
    df
  }
}
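Stripped of Spark, the pattern is just folding a value through an ordered list of pure transformations; a minimal runnable sketch with List[Int] standing in for DataFrame (all names are illustrative, the Configuration dependency is dropped):

```scala
// each phase is a pure transformation; chaining = folding the phases in order
trait Phase[A] {
  def process(in: A): A
}

object ChainDemo {
  val dropNegatives: Phase[List[Int]] = (in: List[Int]) => in.filter(_ >= 0)
  val double: Phase[List[Int]]        = (in: List[Int]) => in.map(_ * 2)

  def runPhases[A](in: A, phases: Seq[Phase[A]]): A =
    phases.foldLeft(in)((acc, p) => p.process(acc))

  def main(args: Array[String]): Unit = {
    val out = runPhases(List(-1, 2, 3), Seq(dropNegatives, double))
    println(out) // List(4, 6)
  }
}
```

With DataFrame the same shape falls out of `Dataset.transform`, which takes a `DataFrame => DataFrame` and returns the result for further chaining.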
YordanGeorgiev / create-table-item.sql
Last active February 17, 2018 12:21
[create table ddl in postgres] how-to create table in postgres pgsql #sql #pgsql #postgres
-- DROP TABLE IF EXISTS item ;
SELECT 'create the "item" table'
;
-- obs: gen_random_uuid() needs "CREATE EXTENSION pgcrypto ;" before PostgreSQL 13
CREATE TABLE item (
  guid UUID NOT NULL DEFAULT gen_random_uuid()
  , level integer NULL
  , seq integer NULL
  , prio integer NULL
  , name varchar (200) NOT NULL
) ;