Zoltan C. Toth zoltanctoth

@zoltanctoth
zoltanctoth / spark-kafka.scala
Last active February 6, 2017 20:23
How to use the Direct Kafka Source in Scala with offset specification
import org.apache.spark._
import org.apache.spark.sql.Column
import org.apache.spark.streaming._
import _root_.kafka.serializer.StringDecoder
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SQLContext
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
@zoltanctoth
zoltanctoth / ggplot2-examples.r
Last active January 17, 2021 18:10
ggplot2 examples and exercises
library(ggplot2)
# Take a look at our example dataset
View(diamonds)
# Make a chart from scratch
x = ggplot() +
layer(
data = diamonds, mapping = aes(x=carat,y=price),
stat='identity', position="identity", geom="point"
@zoltanctoth
zoltanctoth / ggplot2-solutions.r
Created May 9, 2017 04:43
Here you can find the solutions for the exercises we used at the class.
library(ggplot2)
# Take a look at our example dataset
View(diamonds)
# Make a chart from scratch
x = ggplot() +
layer(
data = diamonds, mapping = aes(x=carat,y=price),
stat='identity', position="identity", geom="point"
@zoltanctoth
zoltanctoth / newmonth.sh
Last active April 14, 2018 07:40
Create a new month in datapao admin
gfind 2018-02\ február -type d -print0 | sed 's/2018-02 február/2018-03 március/g' | xargs -0 -I {} mkdir -p {}
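The one-liner above recreates a month's directory skeleton under a new name. A minimal Python sketch of the same idea follows, assuming the goal is to copy directories only, not files. The function name `clone_dir_tree` and the sub-folder names in the demo are hypothetical.

```python
import os
import tempfile


def clone_dir_tree(src_root, dst_root):
    """Recreate every directory under src_root beneath dst_root (files are not copied)."""
    created = []
    for current, _dirs, _files in os.walk(src_root):
        # Path of the current directory relative to the source root.
        rel = os.path.relpath(current, src_root)
        target = dst_root if rel == "." else os.path.join(dst_root, rel)
        os.makedirs(target, exist_ok=True)
        created.append(target)
    return created


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as base:
    src = os.path.join(base, "2018-02 február")
    os.makedirs(os.path.join(src, "invoices", "incoming"))
    with open(os.path.join(src, "invoices", "note.txt"), "w") as f:
        f.write("not copied")
    dst = os.path.join(base, "2018-03 március")
    clone_dir_tree(src, dst)
    # Collect what ended up under the new month folder.
    demo_dirs = sorted(os.path.relpath(d, dst) for d, _, _ in os.walk(dst))
    demo_files = [name for _, _, names in os.walk(dst) for name in names]
```

Unlike the `sed` substitution, this version only renames the root folder, which is likely what is wanted when sub-folder names never contain the month.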
@zoltanctoth
zoltanctoth / Monitor Azure Costs.sh
Last active April 7, 2025 23:26
A simple script to approximate Azure spending. It is far from accurate; please comment if you know a better way.
#! /usr/bin/env bash
set -e
DAILY_SPENDING_LIMIT=3 # USD, per account
DATE_COMMAND="date"
if hash gdate 2>/dev/null
then
DATE_COMMAND="gdate"
fi
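The check above prefers `gdate` (GNU date, as commonly installed on macOS through coreutils) and falls back to the system `date`. The same fallback could be sketched in Python roughly as follows; `pick_date_command` is a hypothetical helper name, and the `which` parameter exists only to make the lookup easy to swap out.

```python
import shutil


def pick_date_command(which=shutil.which):
    """Return "gdate" when it is found on PATH, otherwise fall back to "date"."""
    return "gdate" if which("gdate") else "date"


# Resolved against the real PATH of the current machine.
date_command = pick_date_command()
```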
@zoltanctoth
zoltanctoth / save-and-load-native-lightgbm-model-mlflow
Created May 20, 2020 06:35
How to save and load a Native LightGBM Model in Spark MLlib
import lightgbm as lgb
import mlflow.lightgbm
# Imagine model is a fitted PipelineModel whose stages are [x, x, x, trainLightGBMModel]
model.stages[-1].saveNativeModel("/tmp/lightgbm")
nativeLGBModel = lgb.Booster(model_file="/dbfs/tmp/lightgbm/part-00000-tid-5517958219000636906-02c16955-a283-4198-a41a-cdbd78f5aae5-455-1-c000.txt")
mlflow.lightgbm.log_model(nativeLGBModel, artifact_path="lightgbm-model")
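The snippet above hard-codes the generated `part-00000-...txt` file name, which changes on every save. One possible way around that is to look the file up by pattern. This is a sketch that assumes the save directory contains exactly one `part-*.txt` file, as the path above suggests; `find_native_model_file` is a hypothetical helper, not part of LightGBM or MLflow.

```python
import glob
import os
import tempfile


def find_native_model_file(model_dir):
    """Return the single part-*.txt file in model_dir, or raise if it is ambiguous."""
    matches = sorted(glob.glob(os.path.join(model_dir, "part-*.txt")))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one part-*.txt in {model_dir}, found {len(matches)}"
        )
    return matches[0]


# Demo with a fake directory standing in for the real save location.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "part-00000-tid-123-c000.txt"), "w").close()
    open(os.path.join(d, "_SUCCESS"), "w").close()
    found_name = os.path.basename(find_native_model_file(d))
```

With that helper, the load step could become `lgb.Booster(model_file=find_native_model_file("/dbfs/tmp/lightgbm"))`.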
@zoltanctoth
zoltanctoth / list_run_adf_pipeline.py
Created September 7, 2020 12:00
List Azure Data Factory Pipelines and Run an ADF Pipeline using a Credential-based Service Principal Authentication
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import *
subscription_id = '8d1dc324-4f8a-4be5-ae74-310e2f5596a5'
credentials = ServicePrincipalCredentials(client_id='dcf2637e-8f81-4bbb-a72e-ac2f291e328b', secret='<< secret >>', tenant='874cd0d6-f21a-4c6e-8239-51287476f635')
adf_client = DataFactoryManagementClient(credentials, subscription_id)
pipelines = adf_client.pipelines.list_by_factory("schneider-test", "Schneider-Test-Data-Factory")
for p in pipelines:
@zoltanctoth
zoltanctoth / batch delete files s3
Created September 22, 2020 07:25
Delete thousands or millions of objects in S3
# Hint: If you are stuck with tens of millions of files under an S3 prefix, perhaps
# the easiest approach is to set the prefix's expiration to one day in the Lifecycle Management
# pane of the bucket in the web UI; Amazon will then take care of the object deletion for you.
# The scripts come from this Server Fault comment, which is a good resource:
# https://serverfault.com/questions/679989/most-efficient-way-to-batch-delete-s3-files#comment1200074_917740
# List all objects
aws s3api list-objects --output text --bucket <<BUCKET_NAME>> --query 'Contents[].[Key]' --prefix <<prefix, like tmp/sandbox>> | pv -l
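Once the keys are listed, deletion is mostly a matter of batching, because a single S3 DeleteObjects request accepts at most 1,000 keys. A minimal sketch of that chunking step is shown below; `delete_batches` and the example keys are hypothetical, and the actual boto3 call is left as a comment so the block runs without credentials.

```python
from itertools import islice


def delete_batches(keys, batch_size=1000):
    """Yield DeleteObjects payloads holding at most batch_size keys each."""
    it = iter(keys)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield {"Objects": [{"Key": k} for k in chunk], "Quiet": True}


# Hypothetical keys standing in for the output of the listing command.
example_keys = [f"tmp/sandbox/file-{i}.txt" for i in range(2500)]
batches = list(delete_batches(example_keys))
batch_sizes = [len(b["Objects"]) for b in batches]

# With boto3 installed and credentials configured, each payload could be sent as:
#   s3 = boto3.client("s3")
#   s3.delete_objects(Bucket="<<BUCKET_NAME>>", Delete=payload)
```

Because the generator consumes any iterable lazily, the key list never has to fit in memory at once.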
@zoltanctoth
zoltanctoth / print-without-newline.py
Last active October 21, 2020 10:28
Python print without newline. This script shows how you can use Python to print a string without adding a newline.
# Print a string without adding a newline
print("Hey, Python prints without a newline.", end="")
# Alternative solution
import sys
sys.stdout.write("Hey, Python prints without a newline.")
# You are part of an experiment on how well gists can be used as "StackOverflow".
# Please add a comment or a star if you found this useful. :) Thanks!
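A common use of `end=""` is a progress indicator. Output without a trailing newline may sit in the buffer until it is flushed, so this sketch also passes `flush=True`. The output is captured in a `StringIO` buffer here only so the result can be inspected; the variable names are illustrative.

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):
    for step in range(3):
        # end="" suppresses the newline; flush=True pushes the text out immediately.
        print(".", end="", flush=True)
    print(" done")

# Everything printed inside the with block, as one string.
captured = buffer.getvalue()
```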
@zoltanctoth
zoltanctoth / packages.yml
Created September 22, 2022 08:58
dbt-expectations package definition example
packages:
  - package: calogica/dbt_expectations
    version: [">=0.6.0", "<0.7.0"]