Yordan Georgiev (YordanGeorgiev)

YordanGeorgiev / idea-shortcuts.scala
Created January 23, 2019 09:11
[idea-shortcuts] idea shortcuts #idea #sbt #shortcuts
// Ctrl + Alt + O ::: optimize imports
// Ctrl + Shift + L ::: scalafmt format
// F4 ::: open project settings
YordanGeorgiev / how-to-iterate-over-df-rdd.scala
Created December 6, 2018 22:45
[iterate over df rdd] how-to iterate over df rdd #dataframe #scala #rdd #iterate
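import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

def process(df: DataFrame): DataFrame = {
  // Spark has no implicit Encoder[Row]; one built from df.schema tells Catalyst the output column types
  val encoder = RowEncoder(df.schema)
  df.map { row =>
    val rowOut: Array[Any] = row.toSeq.toArray // mutable copy of the cell values
    // ... do some kind of rowOut modifications here, keeping each value's type consistent with df.schema
    Row.fromSeq(rowOut.toSeq)
  }(encoder) // stays in the Dataset API instead of converting to an RDD and back, which is usually faster
}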
YordanGeorgiev / spark-df-double-interpolation-in-udf.scala
Created November 23, 2018 12:13
[double-interpolation for udf call in spark] how-to use double interpolation technique with spark dataframes udf call #spark #dataframe #udf #interpolation
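class DoubleInterpolationTechniqueForUDFcallInDataFrame {
  val colsNum = 100

  // this is the func wrapped by the udf ...
  def getHeight(index: Int, freqs: Seq[Integer]): Option[Double] = {
    (0 until colsNum).foreach { n =>
      // some logic
    }
    // Option(...) turns a null Integer into None; the value is widened to Double to match the return type
    Option(freqs(index)).map(_.doubleValue)
  }
  // ...
}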
YordanGeorgiev / how-to-create-df-schema.scala
Created November 16, 2018 13:12
[how-to-create-df-schema] how-to create dataframe schema #dataframe #schema #StructField #StructType
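import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val structType: StructType = {
  // StructField defaults to nullable = true
  val id = StructField("id", IntegerType)
  val name = StructField("name", StringType)
  val age = StructField("age", IntegerType)
  StructType(Seq(id, name, age))
}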
YordanGeorgiev / postgres-cheat-sheet.sql
Created November 16, 2018 06:25
[postgres cheat sheet] postgres cheat sheet #postgres #cheat-sheet #sql
-- how-to change the data type of a column
alter table table_name alter column column_name type varchar(30);
YordanGeorgiev / s3cmd-cheat-sheet.sh
Created November 15, 2018 11:37
[s3cmd-cheat-sheet] s3cmd-cheat-sheet #s3cmd cheat sheet
# how-to list the files in the bucket recursively
s3cmd ls -r -c ~/.aws/s3cmd/$aws_profile.s3cfg s3://$bucket | sort -nr | less
# how-to upload a file-or-dir to the bucket (--acl-public makes the uploaded objects publicly readable)
s3cmd -r -c ~/.aws/s3cmd/$aws_profile.s3cfg put --acl-public --guess-mime-type /path/to/local/file s3://$bucket/path/to/remote/obj-file-or-dir
# how-to download a file-or-dir object from the remote s3 bucket
s3cmd -r -c ~/.aws/s3cmd/$aws_profile.s3cfg get s3://$bucket/path/to/remote/obj-file /path/to/local/file-or-dir
YordanGeorgiev / how-to-parse-spark-cconf-settings-from-zpln-json-conf.sh
Last active November 8, 2018 15:13
[how-to parse spark configuration settings from zeppelin json conf] how-to parse spark configuration settings from zeppelin json conf #spark #conf #zeppelin #jq
cat interpreter-s3-project_665.json | jq -r '.interpreterSettings[] | objects | select(.name | contains("spark")) | .properties[] | recurse (.children) | "name: \"\(.name)\"; value: \"\(.value)\""' | grep shuffle
# see: https://programminghistorian.org/en/lessons/json-and-jq#filter-select
YordanGeorgiev / how-to-get-postgres-meta-data.sql
Created October 23, 2018 18:24
[how-to get postgres columns meta data] how-to get postgres columns meta data #postgres #meta #columns #metadata
SELECT DISTINCT
    ROW_NUMBER() OVER (ORDER BY pgc.relname, a.attnum) as rowid,
    pgc.relname as table_name,
    a.attnum as attr,
    a.attname as name,
    format_type(a.atttypid, a.atttypmod) as typ,
    a.attnotnull as notnull,
    com.description as comment,
    coalesce(i.indisprimary, false) as primary_key,
YordanGeorgiev / how-to-use-implicit-class
Created October 3, 2018 06:16
[how-to use implicit class] how-to use implicit classes #scala #implicits #implicit #Row #scope
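package com.corp.dept.spark.dfutils

import org.apache.spark.sql.Row

// bring the extension methods into scope with: import com.corp.dept.spark.dfutils.RowEnhancements._
object RowEnhancements {
  implicit class RowExtender(row: Row) {
    def isNullAtCol(cellName: String): Boolean = {
      val index: Int = row.fieldIndex(cellName)
      row.isNullAt(index)
    }
    // ...
  }
}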
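The same extension-method pattern can be tried without a Spark dependency. This is a hypothetical sketch: a plain `Map[String, Any]` stands in for `Row`, and `MapRowEnhancements`, `MapRowExtender` and `ImplicitClassDemo` are illustrative names, not part of the original gist. As with `RowEnhancements`, the new method is only visible where the enclosing object's members are imported.

```scala
object MapRowEnhancements {
  // Hypothetical stand-in for a Spark Row: column name -> cell value (null means a null cell)
  implicit class MapRowExtender(val row: Map[String, Any]) extends AnyVal {
    def isNullAtCol(cellName: String): Boolean =
      row.get(cellName) match {
        case Some(v) => v == null
        // roughly mirrors Row.fieldIndex, which rejects unknown column names
        case None => throw new IllegalArgumentException(s"no column named $cellName")
      }
  }
}

object ImplicitClassDemo {
  import MapRowEnhancements._ // without this import, isNullAtCol does not compile

  def main(args: Array[String]): Unit = {
    val row: Map[String, Any] = Map("id" -> 1, "name" -> null)
    println(row.isNullAtCol("name"))
    println(row.isNullAtCol("id"))
  }
}
```

Extending `AnyVal` makes the wrapper a value class, so in most call sites the compiler avoids allocating a `MapRowExtender` object.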
YordanGeorgiev / scala-iso-8601-datetime.scala
Created September 18, 2018 10:58
[get iso-8601 datetime] how-to get iso-8601 datetime #scala #iso-8601 #datetime #datetimeformat
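import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// ISO_LOCAL_DATE_TIME output with the 'T' date/time separator replaced by a space
val ts: String = LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME).replace('T', ' ')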
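When the timestamp needs a fixed width (log lines, sortable file names) or an explicit time zone, an explicit formatter may fit better than `ISO_LOCAL_DATE_TIME`, whose fractional-seconds part varies in length with the value. A minimal sketch, assuming millisecond precision is enough and UTC is the wanted zone for the strict variant; `IsoTimestamps`, `localTs` and `utcIso` are hypothetical names.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object IsoTimestamps {
  // Fixed-width pattern: always 23 characters, shaped like "2018-09-18 10:58:03.000"
  val fixedWidth: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")

  def localTs(t: LocalDateTime): String = t.format(fixedWidth)

  // Strict ISO-8601 with an explicit offset; the given local time is assumed to already be in UTC
  def utcIso(t: LocalDateTime): String =
    t.atOffset(ZoneOffset.UTC).format(DateTimeFormatter.ISO_OFFSET_DATE_TIME)

  def main(args: Array[String]): Unit = {
    println(localTs(LocalDateTime.now()))           // JVM default time zone
    println(utcIso(LocalDateTime.now(ZoneOffset.UTC))) // current time taken in UTC, so the "Z" is truthful
  }
}
```

Taking `now` in UTC before calling `utcIso` matters: passing a `LocalDateTime.now()` from a non-UTC JVM would label local time as UTC.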