Viacheslav Rodionov bepcyc

  • Qualcomm
  • Germany
bepcyc / kafka_troubleshoot_broken_message.py
Last active November 27, 2018 07:56
This script helps troubleshoot the "Unexpected error code 2 while fetching data" issue when consuming data from Kafka
#!/usr/bin/env python
# pip install kafka-python
from kafka import SimpleConsumer, KafkaClient
servers = ('broker-01:9092,'
           'broker-02:9092')
topic_name = "test.topic1"
offsets = {"17":553593369,"8":553142567,"11":562669633,"20":561215743,"2":2661087706,"5":2663616824,"14":561171342,"13":567403099,"4":2653875446,"16":554258518,"7":545144724,"1":2692486549,"10":557397175,"19":534819310,"18":548724039,"9":559537595,"3":2720217023,"12":548273786,"15":547916993,"6":2693124039,"0":2687886815}
group_id = "issue_finder"
bepcyc / backup_restore_cassandra.sh
Last active December 10, 2018 15:00
Backup and restore cassandra table for a fresh start (e.g. when you're RIPped by tombstones)
#!/usr/bin/env bash
# RUN THIS ON EACH CASSANDRA NODE!
DEBUG=${DEBUG:-true} # change to false or run as 'DEBUG=false backup_restore_cassandra.sh' in prod
CQLSH=${CQLSH:-cqlsh} # pass required parameters if needed
KEYSPACE_NAME=${KEYSPACE_NAME:-profile}
TABLE_NAME=${TABLE_NAME:-device}
SNAPSHOT_TAG=${SNAPSHOT_TAG:-${TABLE_NAME}_$(date +%Y%m%d_%H%M%S)}
KEYSPACE_DIRS="/dcos/volume*/${KEYSPACE_NAME}" # change appropriately!
bepcyc / spark_24_foreachbatch.txt
Last active January 25, 2019 13:14
Jacek Laskowski's code is not working
scala> println("This answer actually got some points on SO https://stackoverflow.com/a/53981675/918211")
This answer actually got some points on SO https://stackoverflow.com/a/53981675/918211
scala> println(spark.version)
2.4.0
scala> val sq = spark.readStream.format("rate").load
sq: org.apache.spark.sql.DataFrame = [timestamp: timestamp, value: bigint]
scala> :type sq
bepcyc / AvroSchemaFromDataFrame.scala
Created February 21, 2019 17:28
Generate Avro Schema out of Apache Spark DataFrame
import org.apache.spark.sql.avro.SchemaConverters
SchemaConverters.toAvroType(df.schema) // add .toString if you need JSON here
bepcyc / kafka_num_of_tags.sh
Created March 11, 2019 18:08
Calculate the average number of key-value pairs per JSON record in a Kafka topic
#!/bin/bash
# returns an average number of key-value pairs per json record
# requires jq and kafka-tools
BOOTSTRAP_SERVERS="some-server-1:9092,some-server-2:9092"
TOPIC_NAME=$1
NUM_MESSAGES=${2:-10000}
TOTAL=$(kafka-console-consumer --topic ${TOPIC_NAME} --bootstrap-server ${BOOTSTRAP_SERVERS} \
--from-beginning --max-messages ${NUM_MESSAGES} \
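The preview above is truncated before the final pipeline stages, but the counting it performs can be sketched in plain Python. The helper name `avg_keys_per_record` is ours, and we assume each Kafka message is a single top-level JSON object:

```python
import json

def avg_keys_per_record(json_lines):
    """Average number of top-level key-value pairs across JSON records."""
    records = [json.loads(line) for line in json_lines if line.strip()]
    if not records:
        return 0.0
    return sum(len(rec) for rec in records) / len(records)

# Two records with 2 and 4 keys respectively
sample = ['{"a": 1, "b": 2}', '{"a": 1, "b": 2, "c": 3, "d": 4}']
print(avg_keys_per_record(sample))  # -> 3.0
```

In the shell version, the same role is played by jq counting keys per message and dividing the total by NUM_MESSAGES.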
bepcyc / # cuetools - 2019-06-25_17-45-47.txt
Created June 25, 2019 15:49
cuetools on Debian GNU/Linux 9.9 (stretch) - Homebrew build logs
Build date: 2019-06-25 17:45:47
bepcyc / kafka_topics_sizes.sh
Last active October 27, 2022 12:16
Get kafka topic sizes in GB and sort them by size in ascending order
#!/usr/bin/env bash
topic-size() { kafka-log-dirs --command-config /opt/kafka/ssl/client.txt --bootstrap-server server:9093 --topic-list ${1} --describe | tail -n1 | jq '.brokers[0].logDirs[0].partitions | map(.size/1000000000) | add' | xargs echo ${1} =; }
list-topics() { kafka-topics --command-config /opt/kafka/ssl/client.txt --bootstrap-server server:9093 --list; }
export -f topic-size
TEMP_FILE=$(mktemp)
list-topics | xargs -I{} bash -c 'topic-size "{}"' > "$TEMP_FILE"
sort -g -k3 "$TEMP_FILE"
rm "$TEMP_FILE"
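The jq filter in topic-size sums per-partition sizes out of the kafka-log-dirs --describe JSON. A small Python sketch of the same aggregation, with the field names taken from that jq filter and an invented sample payload:

```python
import json

def topic_size_gb(log_dirs_json):
    """Sum partition sizes (bytes) from kafka-log-dirs --describe output, in GB."""
    data = json.loads(log_dirs_json)
    partitions = data["brokers"][0]["logDirs"][0]["partitions"]
    return sum(p["size"] / 1_000_000_000 for p in partitions)

# Sample payload: two partitions of 1.5 GB and 0.5 GB
sample = json.dumps({"brokers": [{"logDirs": [{"partitions": [
    {"partition": "topic-0", "size": 1_500_000_000},
    {"partition": "topic-1", "size": 500_000_000},
]}]}]})
print(topic_size_gb(sample))  # -> 2.0
```

Like the jq version, this only reads the first broker's first log dir, so on a multi-node cluster it covers one broker at a time.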
bepcyc / list_ppas_used.sh
Created October 14, 2019 10:22
Ubuntu find all used PPAs
# "%200O" is required to print a full name of the "origin", i.e. PPA name
aptitude search '?narrow(?installed, ~Oppa)' -F "%200O" | sort -u
bepcyc / ppas_new_release_check.sh
Created October 14, 2019 11:02
check PPAs availability for a new Ubuntu release
# DIST is a current release codename (e.g. bionic)
# NEXT_DIST is a next release codename (e.g. disco)
# HTTP codes 301 and 200 mean the page exists; 404 means it does not
DIST=$(. /etc/os-release; echo $VERSION_CODENAME); NEXT_DIST=disco; for url in $(grep -h -v "^#" /etc/apt/sources.list.d/*.list|grep "^deb" | sort -u | grep $DIST | awk -v n="$NEXT_DIST" '{print $2"/dists/"n}'); do echo $url $(curl -s -o /dev/null -w "%{http_code}" "$url"); done
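The awk step in the one-liner turns each sources.list `deb` line's base URL into the dists URL it probes for the next release. A pure-Python sketch of that URL construction (the helper name is ours):

```python
def next_dist_url(deb_line, next_dist):
    """Build the dists URL checked by the one-liner from a sources.list 'deb' entry."""
    fields = deb_line.split()
    # fields: ["deb", "<base-url>", "<dist>", "<components...>"]
    return fields[1] + "/dists/" + next_dist

line = "deb http://ppa.launchpad.net/someteam/ppa/ubuntu bionic main"
print(next_dist_url(line, "disco"))
# -> http://ppa.launchpad.net/someteam/ppa/ubuntu/dists/disco
```

The shell version then feeds each such URL to curl and reports the HTTP status code.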
bepcyc / S3_buckets_object_count.sh
Created June 19, 2020 12:40
S3 buckets file count
FILENAME=/tmp/buckets
for b in $(aws s3api list-buckets | jq -r ".Buckets[].Name"); do aws s3api list-objects --bucket $b --output json --query "[length(Contents[])]" | jq -c ".[0]" | xargs -I {} echo -e {}"\t$b" >>$FILENAME ; done
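A hedged sketch of the same per-bucket count in Python, with pagination handled explicitly; the boto3 calls in the comment are an assumption on our part, and note that the JMESPath query above may fail on empty buckets, where the Contents key is absent:

```python
def count_objects(pages):
    """Count objects across paginated S3 list-objects responses.

    Each page is a dict shaped like a ListObjects response;
    pages for empty buckets simply lack a "Contents" key.
    """
    return sum(len(page.get("Contents", [])) for page in pages)

# With boto3 (untested sketch):
#   paginator = client.get_paginator("list_objects_v2")
#   count_objects(paginator.paginate(Bucket=bucket_name))
pages = [{"Contents": [{"Key": "a"}, {"Key": "b"}]},
         {"Contents": [{"Key": "c"}]},
         {}]
print(count_objects(pages))  # -> 3
```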