Saswata Dutta saswata-dutta
scala> val values = Seq((1, "a",1), (1, "b",2), (2, "c", 2), (3, "d",1), (3, "e", 1), (3, "f",0))
values: Seq[(Int, String, Int)] = List((1,a,1), (1,b,2), (2,c,2), (3,d,1), (3,e,1), (3,f,0))
scala> val df = values.toDF
df: org.apache.spark.sql.DataFrame = [_1: int, _2: string ... 1 more field]
scala> val max_df = df.groupBy("_1").agg(max("_3").alias("_3"))
max_df: org.apache.spark.sql.DataFrame = [_1: int, _3: int]
scala> df.join(max_df, Seq("_1", "_3"), "leftsemi").dropDuplicates("_1", "_3").show
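// rows: List<List<String>>, an ordered list of values per row
con.setAutoCommit(false);
PreparedStatement prepStmt = con.prepareStatement(
    "insert statement for 1 row with exact ? for place holders");
for (List<String> row : rows) {
    for (int i = 0; i < row.size(); i++) {
        prepStmt.setString(i + 1, row.get(i)); // JDBC parameter indexes start at 1
    }
    prepStmt.addBatch(); // assumed continuation: queue the row for a batched insert
}
prepStmt.executeBatch(); // assumed continuation: send all queued rows
con.commit();            // required because auto-commit is off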
saswata-dutta / emrfs.md
Created July 16, 2020 19:23 — forked from snigdhasjg/emrfs.md
Getting started with EMRFS.

Getting started with EMRFS

The EMR File System (EMRFS) is an implementation of HDFS that all Amazon EMR clusters use for reading and writing regular files from Amazon EMR directly to Amazon S3.

How to access a file from S3 using EMRFS

Using Java

If you already use the HDFS API, switching to EMRFS is straightforward: pass a URI("s3://<bucket-name>") object when obtaining the FileSystem object.

package com.joe;
saswata-dutta / Create a cluster
Created July 16, 2020 19:23 — forked from BeatriceMoissinac/Create a cluster
[AWS EMR] How to create and manage clusters on AWS EMR #AWS
// vim: syntax=shell
JAR=/usr/lib/spark/lib/spark-examples.jar
KEY=MoissinB
# Create cluster with 1st step
aws emr create-cluster --profile $KEY \
--name "Moissinb Cluster" \
--release-label emr-5.10.0 \
--applications Name=Spark \
graph = TinkerGraph.open()
g = graph.traversal()
a1 = g.addV("acc").property(id, 1).next()
a2 = g.addV("acc").property(id, 2).next()
a3 = g.addV("acc").property(id, 3).next()
a4 = g.addV("acc").property(id, 4).next()
a5 = g.addV("acc").property(id, 5).next()
a6 = g.addV("acc").property(id, 6).next()
saswata-dutta / Selections-Sublime.md
Created July 10, 2020 06:55 — forked from dufferzafar/Selections-Sublime.md
Selections and Multiple Cursors in Sublime Text 3

A handy list of selection shortcuts.

Here are the official docs: Keyboard and Mouse

Mouse

Building blocks:

  • Add to selection: Ctrl
  • Subtract from selection: Alt
saswata-dutta / postgres_log_cloudwatch_filter.sql
Created June 16, 2020 20:23
Filter to parse parts of RDS PostgreSQL logs in CloudWatch Logs Insights
parse @message /^(?<time>.{23})\:(?<ip>[^:]+):(?<user>[^:]+):\[(?<pid>\d+)\]:(?<level>[A-Z]+):\s(?<mssg>.*)/
| filter pid = 20658
| sort time asc
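To see what the pattern in the `parse` command captures, the same regular expression can be tried outside CloudWatch. The sketch below applies it with `java.util.regex`. The class name `LogLineParser` and the sample log line are assumptions made for illustration; the sample only mimics a line of the form `time:ip:user:[pid]:level: message` and is not taken from a real log.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Same pattern as in the query above, with backslashes doubled for a Java string literal.
    private static final Pattern LINE = Pattern.compile(
        "^(?<time>.{23})\\:(?<ip>[^:]+):(?<user>[^:]+):\\[(?<pid>\\d+)\\]:(?<level>[A-Z]+):\\s(?<mssg>.*)");

    private static final String[] FIELDS = {"time", "ip", "user", "pid", "level", "mssg"};

    // Returns the named groups in order, or an empty map when the line does not match.
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line);
        if (m.find()) {
            for (String name : FIELDS) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample line; the first 23 characters form the timestamp.
        String sample =
            "2020-06-16 20:23:01 UTC:10.0.0.1(5432):app_user@appdb:[20658]:LOG:  connection authorized";
        parse(sample).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The fixed-width `.{23}` group is what allows the timestamp to contain colons, while the later `[^:]+` groups stop at the next colon. For the sample line, `pid` is captured as `20658`, which is the value the `filter` step compares against.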
saswata-dutta / postgres_queries_and_commands.sql
Created June 12, 2020 10:52 — forked from rgreenjr/postgres_queries_and_commands.sql
Useful PostgreSQL Queries and Commands
-- show running queries (pre 9.2)
SELECT procpid, age(clock_timestamp(), query_start), usename, current_query
FROM pg_stat_activity
WHERE current_query != '<IDLE>' AND current_query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
-- show running queries (9.2)
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state != 'idle' AND query NOT ILIKE '%pg_stat_activity%'
package org.codefx.lab.stream;
import java.util.Collection;
import java.util.Objects;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;
/**
* Finds a certain customer in a collection of customers.
#!/bin/bash
# bash generate random alphanumeric string
#
# bash generate random 32 character alphanumeric string (upper and lowercase)
NEW_UUID=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
# bash generate random 32 character alphanumeric string (lowercase only)
cat /dev/urandom | tr -dc 'a-z0-9' | fold -w 32 | head -n 1
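Where a shell is not available, the same idea (draw random characters from a fixed alphabet until the string is long enough) can be sketched in Java. This is an illustrative equivalent, not part of the original script; the class name `RandomAlnum` and the method `generate` are hypothetical, and `SecureRandom` stands in for `/dev/urandom`.

```java
import java.security.SecureRandom;

public class RandomAlnum {
    // Same character set as tr -dc 'a-zA-Z0-9'.
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RNG = new SecureRandom();

    // Builds a string of the requested length, one random alphabet character at a time.
    public static String generate(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RNG.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate(32)); // one 32-character string, like fold -w 32 | head -n 1
    }
}
```

For a lowercase-only variant, the alphabet would be reduced to `a-z` and `0-9`.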