Yordan Georgiev (YordanGeorgiev) — gists

@YordanGeorgiev
YordanGeorgiev / global-build.sbt
Last active April 12, 2019 07:41
[global build.sbt] how-to add jar utils useful for development, but not wanted in projects' build.sbt files #scala #sbt #cnf #config #configuration
// file: ~/.sbt/0.13/plugins/build.sbt
// or ~/.sbt/1.0/plugins/build.sbt
// or <proj>/project/plugins.sbt for a single project
// purpose: add jar utils useful for development, but not wanted in projects' build.sbt
// coursier brings much better dependency resolution and fetching
// usage: sbt update
// https://github.com/coursier/coursier#why
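The preview stops at the comments; a minimal sketch of what the global plugins file can contain, here assuming the coursier sbt plugin (the version number is illustrative, pick a current release):

```scala
// file: ~/.sbt/1.0/plugins/build.sbt
// coursier plugs into sbt and replaces ivy for resolving and fetching jars
addSbtPlugin("io.get-coursier" % "sbt-coursier" % "1.0.3")
```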
YordanGeorgiev / scala-testing-get-resource-file-or-dir-path.scala
Last active March 24, 2018 10:29
[get resource file or dir path in scala] how-to get the dir or file resource path with scala #scala #testing #resources
val pathAsStr: String =
Thread
.currentThread()
.getContextClassLoader()
.getResource("string/path/to/resource/dir/or/file")
.toString
// or
val strFilePath: String = getClass.getClassLoader
  .getResource("string/path/to/resource/dir/or/file")
  .getPath
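A self-contained check of the same call, using java/lang/Object.class as a resource every JVM can resolve (object and method names hypothetical):

```scala
object ResourcePathDemo {
  def resourceUrlStr(path: String): String = {
    val url = Thread.currentThread()
      .getContextClassLoader
      .getResource(path)
    // getResource returns null when the resource is missing, so guard before toString
    if (url == null) "" else url.toString
  }

  def main(args: Array[String]): Unit =
    println(resourceUrlStr("java/lang/Object.class").nonEmpty)
}
```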
YordanGeorgiev / scala-spark-filter-nulls-nans-min-max-vals-from-df.scala
Created February 21, 2018 09:21
[filter nulls, NaNs, min/max vals with scala spark dataframe] how-to filter nulls, NaNs and min/max vals with a scala spark dataframe #scala #spark #dataframe #filter
import org.apache.spark.sql.functions.lit

df.filter(df.col(X).isNotNull)
  .filter(df.col(Y).isNotNull)
  .filter(df.col(X).isNaN =!= true)
  .filter(df.col(Y).isNaN =!= true)
  .filter(
    df.col(X) >= lit(minX)
      && df.col(X) <= lit(maxX)
      && df.col(Y) >= lit(minY)
      && df.col(Y) <= lit(maxY)
  )
YordanGeorgiev / scala-spark-create-df-with-schema.scala
Last active February 21, 2018 09:14
[create dataframe with schema with scala] how-to create a hardcoded dataframe with schema in scala spark #scala #spark #dataframe #hardcoded
val df = spark
  .createDataFrame(
    spark.sparkContext.parallelize(
      Seq(
        // obs: use null, not None, for a missing value in a schema-backed Row
        Row("row txt content", 1.0d, null, "2018-01-01", "2018-01-29T11:05:50")
      )),
    StructType(
      Seq(
        StructField("colOfStringType", StringType),
        // remaining fields reconstructed to match the Row above; names illustrative
        StructField("colOfDoubleType", DoubleType),
        StructField("colOfNullableStringType", StringType, nullable = true),
        StructField("colOfDateStr", StringType),
        StructField("colOfTimestampStr", StringType)
      )
    )
  )
YordanGeorgiev / to-from-xls.sh
Created February 20, 2018 17:08
[edit-csv-files-in-xls] how-to edit csv files in Excel with perl oneliners and bash #bash #perl #oneliners #excel #xls
# usage:
#   source zeppelin/sh/funcs/to-from-xls.sh
#   export dir=<path-to-the-root-dir-holding-the-csv-files>   # obs: recursive!
#   toXls
#   fromXls
# use BEFORE you want to open the files in xls
# (function bodies per the perl oneliners in the companion gist below)
toXls(){
  set -u -o pipefail
  find "$dir" -type f -exec perl -pi -e 'print "sep=,\n" if $. == 1' {} \;
}
fromXls(){
  set -u -o pipefail
  find "$dir" -type f -exec perl -pi -e '$_ = "" if $. == 1' {} \;
}
YordanGeorgiev / perl-oneliner-add-remove-file-first-line.sh
Created February 20, 2018 10:58
[add and remove the first line of a file oneliner] how-to add or remove the first line of a file with a perl oneliner #perl #oneliners #shell
# add a "sep=," first line so Excel opens csv files with the right separator
find "$dir" -type f -exec perl -pi -e 'print "sep=,\n" if $. == 1' {} \;
# remove it again
find "$dir" -type f -exec perl -pi -e '$_ = "" if $. == 1' {} \;
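The same first-line edit sketched in Scala for contexts without perl (object and helper names hypothetical):

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object SepLine {
  val Hint = "sep=,"

  // prepend the Excel separator hint as a new first line
  def addHint(p: Path): Unit = {
    val lines = Files.readAllLines(p).asScala.toList
    Files.write(p, (Hint :: lines).asJava)
  }

  // drop the first line again
  def removeHint(p: Path): Unit = {
    val lines = Files.readAllLines(p).asScala.toList
    Files.write(p, lines.drop(1).asJava)
  }

  def main(args: Array[String]): Unit = {
    val p = Files.createTempFile("demo", ".csv")
    Files.write(p, java.util.List.of("a,b", "1,2"))
    addHint(p)
    println(Files.readAllLines(p).get(0)) // sep=,
    removeHint(p)
    println(Files.readAllLines(p).get(0)) // a,b
  }
}
```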
YordanGeorgiev / scala-spark-create-nullable-cols-dataframe.scala
Created February 19, 2018 15:43
[scala spark create hardcoded dataframe with nullable cols] how-to create a dataframe with nullable columns in scala spark #scala #spark #dataframe #hardcoded
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
// format: off
// obs: the hardcoded vals in the first rows are there to "teach" the schema !!!
val ds = Seq(
  // foo comments
  (1, "foo", Some("bar"), Some(1850)),
  // bar comments
  (2, "foo", None, None)
).toDF("id", "name", "nick", "year") // column names illustrative
YordanGeorgiev / scala-spark-create-hardcoded-schema.scala
Last active February 21, 2018 09:14
[scala spark create hardcoded schema] how-to create a hardcoded schema in scala spark for a dataframe #scala #spark #dataframe #schema
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.DoubleType
import org.apache.spark.sql.types.StringType

val objSchema: StructType = StructType(
  Seq(
    // field names illustrative
    StructField("colOfStringType", StringType, nullable = false),
    StructField("colOfDoubleType", DoubleType, nullable = true)
  )
)
YordanGeorgiev / scala-spark-chain-transformations.scala
Created February 19, 2018 13:41
[scala spark chaining transformations] how-to chain transformations in scala spark on a DataFrame obj #scala #spark #dataframe #transformations
import org.apache.spark.sql.{DataFrame, SparkSession}

trait Phase {
  implicit val spark: SparkSession = SparkSession.builder().getOrCreate()
  def process(df: DataFrame): DataFrame
}

class Level1Phase(cnf: Configuration) extends Phase {
  override def process(df: DataFrame): DataFrame = {
    // level-1 transformations go here; return the transformed frame
    df
  }
}
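Stripped of Spark, the pattern is just folding a value through an ordered list of pure transformations; a minimal runnable sketch with List[Int] standing in for DataFrame (all names are illustrative, the Configuration dependency is dropped):

```scala
// each phase is a pure transformation; chaining = folding the phases in order
trait Phase[A] {
  def process(in: A): A
}

object ChainDemo {
  val dropNegatives: Phase[List[Int]] = (in: List[Int]) => in.filter(_ >= 0)
  val double: Phase[List[Int]]        = (in: List[Int]) => in.map(_ * 2)

  def runPhases[A](in: A, phases: Seq[Phase[A]]): A =
    phases.foldLeft(in)((acc, p) => p.process(acc))

  def main(args: Array[String]): Unit = {
    val out = runPhases(List(-1, 2, 3), Seq(dropNegatives, double))
    println(out) // List(4, 6)
  }
}
```

With DataFrame the same shape falls out of `Dataset.transform`, which takes a `DataFrame => DataFrame` and returns the result for further chaining.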
YordanGeorgiev / create-table-item.sql
Last active February 17, 2018 12:21
[create table ddl in postgres] how-to create table in postgres pgsql #sql #pgsql #postgres
-- DROP TABLE IF EXISTS item ;
SELECT 'create the "item" table'
;
-- obs: gen_random_uuid() needs "CREATE EXTENSION pgcrypto ;" before PostgreSQL 13
CREATE TABLE item (
  guid UUID NOT NULL DEFAULT gen_random_uuid()
  , level integer NULL
  , seq integer NULL
  , prio integer NULL
  , name varchar (200) NOT NULL
) ;