Meysam P. Ganji meysampg
meysampg / simulate_sessionid.py
Created June 11, 2020 11:41
Simulate session ID on a Spark dataframe
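from pyspark.sql import Window, functions as F
# Max gap between consecutive events of one session; 1800000 is 30 minutes if time_column holds epoch milliseconds
session_window = 1800000
my_window = Window.partitionBy('unique_column').orderBy('time_column')
# Pass 1: a row that opens a session (first row of its key, or gap > session_window) gets a new id; the rest stay null
df = df.withColumn('sid', F.lit(None).cast('long'))
df = df.withColumn('sid', F.when((df.unique_column == F.lag('unique_column').over(my_window)) & (df.time_column - F.lag('time_column').over(my_window) <= session_window), F.lag('sid').over(my_window)).otherwise(F.monotonically_increasing_id()))
# Pass 2: each continuation row takes the most recent non-null id in its partition
df = df.withColumn('sid', F.when((df.unique_column == F.lag('unique_column').over(my_window)) & (df.time_column - F.lag('time_column').over(my_window) <= session_window), F.last('sid', ignorenulls=True).over(my_window)).otherwise(df.sid))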
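You can check the expected session ids without a Spark cluster by sketching the same gap rule in plain Python. This is an illustrative sketch, not part of the gist. `assign_session_ids` is a hypothetical helper, and events are assumed to be `(key, timestamp)` pairs. The ids it returns are consecutive integers, whereas `monotonically_increasing_id` only guarantees unique, increasing values.

```python
from itertools import groupby
from operator import itemgetter


def assign_session_ids(events, session_window):
    """Assign a session id to each (key, timestamp) event (hypothetical helper).

    Within each key, ordered by timestamp, a row continues the current session
    when its gap to the previous row is <= session_window; otherwise (or on the
    key's first row) it opens a new session. Returns (key, timestamp, sid)
    tuples sorted by key and timestamp.
    """
    result = []
    next_sid = 0
    ordered = sorted(events, key=itemgetter(0, 1))
    for key, rows in groupby(ordered, key=itemgetter(0)):
        prev_ts = None
        sid = None
        for _, ts in rows:
            if prev_ts is None or ts - prev_ts > session_window:
                sid = next_sid  # first row of the key, or gap too large
                next_sid += 1
            result.append((key, ts, sid))
            prev_ts = ts
    return result


print(assign_session_ids([("a", 0), ("a", 1000), ("a", 2_000_000), ("b", 500)], 1_800_000))
# [('a', 0, 0), ('a', 1000, 0), ('a', 2000000, 1), ('b', 500, 2)]
```

A gap exactly equal to `session_window` continues the session, matching the `<=` in the Spark condition.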
meysampg / common_kafka_commands_for_dev.sh
Last active December 6, 2020 12:35
Common Kafka commands for the dev environment
# Delete all topics
for i in $(kafka-topics --list --zookeeper=zookeeper:2181); do kafka-topics --zookeeper=zookeeper:2181 --delete --topic "$i"; done
# Consume a topic's messages from the beginning
kafka-console-consumer --bootstrap-server broker:9092 --topic processes --from-beginning
# Create a topic
kafka-topics --zookeeper zookeeper:2181 --create --topic <T> --partitions 8 --replication-factor 1
# Read offset information for a topic
FILTER="-3" # newer that 3 days
FILTER="3" created on 3 days ago
FILTER="+3" # older than 3 days
FOLDER="." # current directory
find $FOLDER -mtime $FITLER -type d -exec rm -rf {} \;
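`rm -rf` cannot be undone, so it can help to preview which directories a given `-mtime` filter selects before deleting anything. The sketch below uses `dirs_matching_mtime`, a hypothetical Python helper that is not part of the gist. It assumes GNU find's rule of counting age in whole 24-hour periods (rounding down) and, like `-mindepth 1 -prune`, skips `FOLDER` itself and does not descend into matched directories. It only lists paths.

```python
import operator
import os
import time

SECONDS_PER_DAY = 86400


def dirs_matching_mtime(folder, spec, now=None):
    """List subdirectories of `folder` selected by a find-style -mtime spec.

    `spec` is '+N' (age > N days), '-N' (age < N days) or 'N' (age == N days),
    with age counted in whole 24-hour periods, rounding down. `folder` itself
    is never returned and matched directories are not descended into.
    Nothing is deleted.
    """
    now = time.time() if now is None else now
    compare = {"+": operator.gt, "-": operator.lt}.get(spec[0], operator.eq)
    n = int(spec.lstrip("+-"))

    selected = []
    for root, dirnames, _files in os.walk(folder):
        descend = []
        for name in dirnames:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # find -type d (without -L) does not match symlinks
            age = int((now - os.stat(path).st_mtime) // SECONDS_PER_DAY)
            if compare(age, n):
                selected.append(path)  # would be removed; do not walk into it
            else:
                descend.append(name)
        dirnames[:] = descend  # os.walk only descends into what is left here
    return selected
```

For example, `dirs_matching_mtime(".", "+3")` lists what the `find` command would remove with `FILTER="+3"`.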
meysampg / jalaali_utils.go
Last active May 15, 2020 18:47
Calculate some useful information on the Jalali calendar
// Algorithms taken from the PHP version at https://jdf.scr.ir/download/
package cal

import (
	"log"
	"strconv"
	"strings"
	"time"

	"github.com/jalaali/go-jalaali"
meysampg / xkcd.py
Created May 11, 2020 08:30
Plot in xkcd style with matplotlib
from matplotlib import pyplot as plt
from numpy import sin, linspace
plt.xkcd() # Yes...
plt.plot(sin(linspace(0, 10)))
plt.title('There is always some chance... but!!! huuum')
plt.show()  # open the plot window (not needed with inline plotting in Jupyter)
meysampg / run_pgadmin4_docker.sh
Created May 3, 2020 09:09
Run PGAdmin4 with Docker
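docker volume create --driver local --name=pga4volume
# The "env" file must set PGADMIN_DEFAULT_EMAIL and PGADMIN_DEFAULT_PASSWORD; the UI is then served at http://localhost:5050
docker run -d --publish 5050:80 \
--volume=pga4volume:/var/lib/pgadmin \
--env-file=env \
--name=pgadmin4 dpage/pgadmin4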
# Convert a Windows-1256 (Arabic/Persian) encoded subtitle file to UTF-8
iconv -f WINDOWS-1256 -t UTF-8 in.srt > out.srt
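Where `iconv` is not available, you can sketch the same re-encoding with Python's built-in `cp1256` codec. `convert_to_utf8` is a hypothetical helper. It decodes strictly, so undecodable bytes raise an error instead of being dropped. It also writes with `newline=""` so that the `\r\n` line endings common in .srt files are kept as-is.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="cp1256"):
    """Re-encode a text file (e.g. an .srt subtitle) from src_encoding to UTF-8."""
    # Read raw bytes and decode explicitly so the source encoding is never guessed
    with open(src_path, "rb") as src:
        text = src.read().decode(src_encoding)
    # newline="" writes the decoded line endings unchanged
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Usage mirrors the shell command: `convert_to_utf8("in.srt", "out.srt")`.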
meysampg / .scalafix.conf
Created April 26, 2020 18:32
Just do it in the functional fashion!
rules = [
DisableSyntax
]
DisableSyntax.noVars = true
DisableSyntax.noNulls = true
DisableSyntax.noReturns = true
DisableSyntax.noAsInstanceOf = true
DisableSyntax.noIsInstanceOf = true
DisableSyntax.noXml = true
# Increase the partition count of an existing topic (Kafka only allows increasing it)
./bin/kafka-topics.sh --alter --zookeeper <ZOOKEEPER_HOST>:2181 --topic <TOPIC_NAME> --partitions <NEW_PARTITION_COUNT>
package mypackage_name_space_or_sth_like_that
// Some internal packages unrelated to Kafka or Spark are also imported here
import org.apache.log4j.Logger
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.kafka.common.serialization.StringDeserializer