Gists by Alexander Goida (xtrmstep), Sofia, Bulgaria
@xtrmstep
xtrmstep / article-designing-data-platform-around-guarantees-groups.md
Last active March 22, 2026 11:11
Designing a Data Platform Around Guarantees

| Layer | Main guarantees | What is still not guaranteed | Mainly interesting for |
| --- | --- | --- | --- |
| Bronze | Data is captured; source fidelity is preserved as much as possible; lineage is possible; replay and recovery are possible; ingestion timing is visible | Clean semantics, stable meaning, deduplicated business entities, reporting-safe metrics | Dat… |

| Consumer group | Bronze | Silver | Gold |
| --- | --- | --- | --- |
| Data Engineers | Inspect source input, debug ingestion, replay, trace origin | Build stable transformations and integration logic | Use as trusted downstream source |
| Analytics Engineers | Can inspect, but not ideal for modeling | Main layer for modeling and standardization | Serve curated models and metrics |
| Data Scientists | Use selectively for deep exploration or raw feature extraction | Good for exploration and feature preparation | Useful when stable business meaning matters |
| BI Developers | Usually not suitable | … | … |
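The layer responsibilities above can be sketched as a minimal bronze/silver/gold pipeline. All field names and cleaning rules below are made-up illustrations, not taken from the article:

```python
# Minimal medallion-architecture sketch: each layer adds guarantees.
# Bronze: captured as-is, duplicates and bad values included.
raw_events = [
    {"id": 1, "amount": "10.5"},
    {"id": 1, "amount": "10.5"},   # duplicate from a replayed ingest
    {"id": 2, "amount": None},     # source sent a null
]

def to_silver(bronze):
    """Silver: deduplicate and normalize types so meaning becomes stable."""
    seen, silver = set(), []
    for row in bronze:
        if row["id"] in seen or row["amount"] is None:
            continue
        seen.add(row["id"])
        silver.append({"id": row["id"], "amount": float(row["amount"])})
    return silver

def to_gold(silver):
    """Gold: a curated, reporting-safe metric built on Silver."""
    return {"total_amount": sum(r["amount"] for r in silver)}

print(to_gold(to_silver(raw_events)))  # {'total_amount': 10.5}
```

The point of the sketch is that each consumer group from the table can pick the layer whose guarantees match its needs: engineers debug against `raw_events`, modelers work on the Silver output, BI reads only Gold.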
@xtrmstep
xtrmstep / folder_hash.sh
Last active August 22, 2025 16:29
Calculate hash of a folder #bash
#!/bin/bash
set -e
DIR=${1:-.}
# Find all files in DIR and its subfolders, skipping bin/obj directories
# and .zip files; hash each file, then hash the sorted list of digests.
HASH=$(
  find "$DIR" \( -type d \( -name bin -o -name obj \) -prune \) -o \
    -type f -not -name '*.zip' -print0 |
    sort -z |
    xargs -0 sha256sum |   # tail of the pipeline reconstructed; the preview is truncated here
    sha256sum | cut -d' ' -f1
)
echo "$HASH"
@xtrmstep
xtrmstep / stats-normal-distribution-checks.md
Created December 28, 2024 14:42
for article "Data Series Normalization Techniques" at Medium
| Test Name | Null Hypothesis | p-value Criteria | Limitations | Use Cases |
| --- | --- | --- | --- | --- |
| Shapiro-Wilk Test | The data is normally distributed. | If p > 0.05, … | … | … |
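The Shapiro-Wilk decision rule above can be tried with SciPy's `shapiro`; the seeded sample below is a made-up example:

```python
import numpy as np
from scipy import stats

# A reproducible sample drawn from a normal distribution.
rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)

# H0: the data is normally distributed. Reject H0 when p <= 0.05;
# here p should comfortably exceed 0.05 for a genuinely normal sample.
stat, p_value = stats.shapiro(normal_sample)
print(f"W = {stat:.4f}, p = {p_value:.4f}")
```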
@xtrmstep
xtrmstep / normalization-techniques.md
Last active December 28, 2024 17:21
for article "Data Series Normalization Techniques" at Medium
| Technique | Purpose |
| --- | --- |
| Z-Score | Centers data to mean = 0, std dev = 1; for Gaussian data or regression-based models. |
| Min-Max | Scales data to a specific range (e.g., [0, 1]); for bounded input in neural networks. |
| Log Transformation | Compresses large values and reduces skewness; for data with exponential growth patterns. |
| Robust Scaling | Rescales using median and IQR; for datasets with many outliers. |
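The four techniques can be sketched in plain NumPy; the sample array is a made-up example with one outlier:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100.0 is an outlier

z_score   = (x - x.mean()) / x.std()              # centered: mean 0, std 1
min_max   = (x - x.min()) / (x.max() - x.min())   # rescaled into [0, 1]
log_xform = np.log1p(x)                           # compresses large values, keeps order
q1, q3    = np.percentile(x, [25, 75])
robust    = (x - np.median(x)) / (q3 - q1)        # median/IQR: outlier-resistant
```

Note how the outlier dominates `z_score` and `min_max` (the four small values get squeezed together), while `robust` keeps them spread out, which is exactly why robust scaling is preferred for outlier-heavy data.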
@xtrmstep
xtrmstep / convert_json_to_avro.py
Created August 26, 2024 12:50
Convert JSON to Avro using a schema, write it with compression, then read the compressed Avro and write it back uncompressed
import json
import fastavro
from fastavro.schema import load_schema

def json_to_avro(json_file_path, avro_file_path, schema_file_path, compression='deflate'):
    # Load and validate the Avro schema before touching the JSON input.
    try:
        schema = load_schema(schema_file_path)
    except Exception as e:  # error handling reconstructed; the preview is truncated here
        raise ValueError(f"Failed to load Avro schema from {schema_file_path}: {e}")
@xtrmstep
xtrmstep / dynamic_postgresql_command.sql
Created November 27, 2023 10:58
Shows several PostgreSQL techniques: running a multi-statement query in the query window, emitting messages, and querying catalog metadata
DO
$do$
declare
    r record;
    query_cmd text;
begin
    for r in select table_name from information_schema.tables
             where table_schema = 'public' and table_name like 'prefix%'
    loop
        -- CONDITION is a placeholder; replace it with a real predicate.
        query_cmd := format('delete from %s where CONDITION', r.table_name);
        -- raise notice '%', query_cmd;
        execute query_cmd;  -- loop body reconstructed; the preview is truncated here
    end loop;
end
$do$;
@xtrmstep
xtrmstep / url-query-parameter.js
Created February 20, 2023 08:33
Add or update URL query parameter in JavaScript
// usage:
// 'http://www.website.com/'.urlQueryParameter('id', 2) => http://www.website.com/?id=2
// 'http://www.website.com/?type=1'.urlQueryParameter('id', 2) => http://www.website.com/?type=1&id=2
String.prototype.urlQueryParameter = function (key, value) {
  var uri = this;
  // "[?&]" instead of the original "[?|&]", which also matched a literal "|".
  var regEx = new RegExp("([?&])" + key + "=.*?(&|$)", "i");
  var separator = uri.indexOf('?') !== -1 ? "&" : "?";
  // Tail of the function reconstructed; the preview is truncated here.
  if (regEx.test(uri)) {
    return uri.replace(regEx, "$1" + key + "=" + value + "$2");
  }
  return uri + separator + key + "=" + value;
};
@xtrmstep
xtrmstep / object_dump.js
Created February 20, 2023 07:32
Dump an object's contents during execution of JavaScript code
function odump(object, depth, max) {
  depth = depth || 0;
  max = max || 2;
  if (depth > max) return false;
  var indent = "";
  for (var i = 0; i < depth; i++) indent += "  ";
  var output = "";
  for (var key in object) {
    output += "\n" + indent + key + ": "; // "\n" was garbled to "n" in the scrape
    switch (typeof object[key]) {
      // Recursive case reconstructed; the preview is truncated here.
      case "object":
        output += odump(object[key], depth + 1, max) || "[max depth]";
        break;
      default:
        output += object[key];
    }
  }
  return output;
}
@xtrmstep
xtrmstep / get_spark_dataframe_size.py
Created January 26, 2023 11:49
Calculating the size of a Spark data frame
# List of input files to read; "file://path" is a placeholder.
files = [
    "file://path"
]
df = spark.read.json(files)

# Estimate the DataFrame's size from the optimized logical plan's statistics.
# Note: _jdf and _jsparkSession are internal PySpark APIs and may change between versions.
catalyst_plan = df._jdf.queryExecution().logical()
df_size_read = spark._jsparkSession.sessionState().executePlan(catalyst_plan).optimizedPlan().stats().sizeInBytes()