Viacheslav Rodionov (bepcyc) · Qualcomm · Germany
bepcyc / find_by_num_files.sh
Created June 7, 2016 15:30
To list immediate subdirectories containing more than $NUM files.
# https://superuser.com/questions/617050/find-directories-containing-a-certain-number-of-files/946283#946283
# List immediate subdirectories containing more than $NUM files.
# If it prints nothing, make sure $NUM is set (e.g. NUM=100).
find . -type f -printf '%h\0' | awk -v num="$NUM" 'BEGIN{RS="\0"} {array[$0]++} END{for (line in array) if (array[line]>num) printf "%s\n", line}'
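A quick way to sanity-check the one-liner is to run the same pipeline against a throwaway directory tree (the directory and file names below are made up for the demo):

```shell
#!/bin/bash
# Build a scratch tree: "big" holds 5 files, "small" holds 1.
tmp=$(mktemp -d)
mkdir -p "$tmp/big" "$tmp/small"
touch "$tmp/big/f"{1..5} "$tmp/small/f1"
NUM=3
# Same pipeline as the gist: count files per directory, print dirs with > NUM files.
result=$(cd "$tmp" && find . -type f -printf '%h\0' \
  | awk -v num="$NUM" 'BEGIN{RS="\0"} {a[$0]++} END{for (d in a) if (a[d]>num) print d}')
echo "$result"
rm -rf "$tmp"
```

Only `./big` is printed, since it is the only directory holding more than 3 files.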
bepcyc / hive-partitions-generator.py
Last active January 19, 2017 13:32
When you need the partition values, e.g. for CONCATENATE in Hive or other operations
import datetime
today = datetime.datetime.today()
# parameters - adjust as needed
days_delta = 365
month_partition = 'month'
day_partition = 'day'
date_range = [today - datetime.timedelta(days=x) for x in range(0, days_delta)]
# pairs of ('YYYYMM', 'DD'), day zero-padded to two digits
partitions = [(str(d.year * 100 + d.month), str(100 + d.day)[1:]) for d in date_range]
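The (month, day) pairs are typically turned into one statement per partition. A minimal shell sketch of that step, assuming a hypothetical table `events` with `month`/`day` partition columns and two sample pairs (only the statement text is generated here; nothing is submitted to Hive):

```shell
#!/bin/bash
# Turn (YYYYMM, DD) pairs into Hive CONCATENATE statements, one per partition.
stmts=$(while read -r month day; do
  echo "ALTER TABLE events PARTITION (month='${month}', day='${day}') CONCATENATE;"
done <<'EOF'
201701 19
201701 18
EOF
)
echo "$stmts"
```

The resulting statements can be piped to `hive -e` or `beeline` in a real run.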
bepcyc / 2pass_x264_rotate_90.sh
Created January 30, 2017 00:18
Two-pass x264 conversion with 90° clockwise rotation
# found here: https://wiki.archlinux.org/index.php/MEncoder#Two-pass_x264_.28very_high-quality.29
# sudo apt install mencoder
INPUT_VIDEO="input.avi"
OUTPUT_VIDEO="output.avi"
# mencoder's default two-pass log file is divx2pass.log
rm -f divx2pass.log*
mencoder ${INPUT_VIDEO} -oac copy -vf rotate=1 -ovc x264 -x264encopts pass=1:preset=veryslow:fast_pskip=0:tune=film:frameref=15:bitrate=3000:threads=auto -o /dev/null && \
mencoder ${INPUT_VIDEO} -oac copy -vf rotate=1 -ovc x264 -x264encopts pass=2:preset=veryslow:fast_pskip=0:tune=film:frameref=15:bitrate=3000:threads=auto -o ${OUTPUT_VIDEO}
bepcyc / prepare_hdfs.sh
Last active August 31, 2017 07:41
Preparing disks for Hadoop HDFS
# I use these commands with pdsh assuming my worker nodes look the same
# I also assume that my hard disks are /dev/sdb - /dev/sdj
for d in {b..j};
do
# convert letters b..j to numbers 0..8
dnum=$(python -c "print(ord('${d}')-98)")
disk="/dev/sd${d}"
umount ${disk}1
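The letter-to-index step shells out to python for `ord()`; the same mapping can be done in pure bash via printf's `%d` on a leading-quote argument, which prints a character's code (a sketch, not from the gist):

```shell
#!/bin/bash
# Map drive letters b..j to volume indices 0..8 without python:
# printf '%d' "'c" prints the character code of c (ord('b') is 98).
mapping=$(for d in {b..j}; do
  dnum=$(( $(printf '%d' "'$d") - 98 ))
  echo "/dev/sd${d} -> volume${dnum}"
done)
echo "$mapping"
```

This avoids nine python interpreter startups inside the loop.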
bepcyc / spark_210_with_HBase.sh
Created June 6, 2017 16:58
Workaround for Spark 2.1.0 to work with HBase tables mapped to Hive
# this works on Cloudera CDH, but you can easily run it on any path
# HBASE_JARS ends with a trailing comma (from tr), so the htrace jar can be appended directly
HBASE_JARS=$(ls -1 /opt/cloudera/parcels/CDH/jars/*hbase*.jar | grep -v test | tr '\n' ',')
spark2-shell --jars ${HBASE_JARS}/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar
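The `ls | grep -v test | tr` pipeline builds a comma-separated jar list with a trailing comma, which is why the htrace jar can be appended to it directly. The same construction against a scratch directory with made-up jar names shows the shape of the result:

```shell
#!/bin/bash
tmp=$(mktemp -d)
touch "$tmp/hbase-client.jar" "$tmp/hbase-server-tests.jar"
# Same pipeline as the gist: one jar per line, drop test jars, join with commas.
jars=$(ls -1 "$tmp"/*hbase*.jar | grep -v test | tr '\n' ',')
echo "$jars"   # note the trailing comma
rm -rf "$tmp"
```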
bepcyc / spark2_hive_hbase.sh
Created October 19, 2017 16:50
Make Spark 2.x work with Hive mapped HBase tables on Cloudera CDH 5.12
HBASE_JARS=$(ls -1 /opt/cloudera/parcels/CDH/jars/*hbase*.jar|grep -v test|tr '\n' ',')/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar
# the --files part is crucial: without it, all queries freeze with no error.
spark2-shell --jars $HBASE_JARS --files /etc/hbase/conf/hbase-site.xml
bepcyc / prepare_grid.sh
Created April 20, 2018 14:26
Prepares HDDs for HDFS or Mesosphere DC/OS or any other clustered environment.
# WARNING: THIS SCRIPT DESTROYS DATA WITH NO QUESTIONS ASKED!
# disks are /dev/sdb - /dev/sdj - fix for your situation
for d in {b..j};
do
# convert letters b..j to numbers 0..8
dnum=$(python -c "print(ord('${d}')-98)")
disk="/dev/sd${d}"
umount ${disk}1
mount_point="/dcos/volume${dnum}"
disk_label="grid0${dnum}"
bepcyc / check_swap.sh
Last active September 18, 2018 13:34
Show all processes using swap sorted by amount used
#!/bin/bash
# based on https://www.cyberciti.biz/faq/linux-which-process-is-using-swap/
for file in /proc/*/status ; do awk '/VmSwap|Name|Tgid:/{printf $2 " " $3}END{ print ""}' "$file"; done | sort -k 3 -n -r | less
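To see which fields the awk program keeps, here it is applied to a fabricated /proc/PID/status excerpt (process name and values are made up):

```shell
#!/bin/bash
# Name contributes $2, Tgid contributes $2, VmSwap contributes $2 and $3 (amount + unit),
# so each process collapses to one line: "name pid amount kB".
# Field 3 is the swap amount, which is what sort -k 3 -n -r orders by.
line=$(printf 'Name:\tmyproc\nTgid:\t1234\nVmSwap:\t5678 kB\n' \
  | awk '/VmSwap|Name|Tgid:/{printf $2 " " $3}END{ print ""}')
echo "$line"
```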
bepcyc / kafka-cheat-sheet.md
Created September 22, 2018 15:40 — forked from ursuad/kafka-cheat-sheet.md
Quick command reference for Apache Kafka

Kafka Topics

List existing topics

bin/kafka-topics.sh --zookeeper localhost:2181 --list

Describe a topic

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic mytopic

Purge a topic

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic mytopic --config retention.ms=1000

... wait a minute ...

bepcyc / startingOffsets.scala
Created October 11, 2018 14:35
Form the startingOffsets JSON string for Spark Structured Streaming's Kafka source
// df must contain "partition" and "offset" columns read from the topic acp_prod.devices;
// the result looks like {"acp_prod.devices": {"0": 42, "1": 17}}
"""{"acp_prod.devices": {""" + df.select($"partition", $"offset").groupBy($"partition").agg(max($"offset")).as[(Int, Long)].collect.map{case (p, o) => s""""$p": $o"""}.mkString(",") + "}}"