@colmmacc
colmmacc / shardcalc.py
Last active January 17, 2025 15:35
Calculate the blast radius of a shuffle shard
import sys
# choose() is the same as computing the number of combinations. Normally this is
# equal to:
#
# factorial(N) / (factorial(m) * factorial(N - m))
#
# but this is very slow to run and requires a deep stack (without tail
# recursion).
#
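The comment above is truncated, but it motivates computing combinations iteratively rather than with three factorials. A minimal sketch of that idea (the gist's actual implementation may differ; this is just the standard multiplicative form):

```python
def choose(n, m):
    """Number of ways to pick m items from n.

    Iterative multiplicative form: avoids the large intermediate
    factorials and deep call stacks the comment above warns about.
    """
    if m < 0 or m > n:
        return 0
    m = min(m, n - m)  # exploit symmetry: C(n, m) == C(n, n - m)
    result = 1
    for i in range(m):
        # multiply before dividing so the division is always exact
        result = result * (n - i) // (i + 1)
    return result
```

For example, `choose(52, 5)` gives the number of 5-card poker hands. (On Python 3.8+, `math.comb` does the same thing natively.)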
@mjuric
mjuric / kafka-useful-commands.md
Last active May 24, 2024 14:06
Useful Kafka wrangling commands

Utilities you'll care about

All these are already installed on epyc.

  • kafkacat (conda install -c conda-forge kafkacat)

  • kt (grab it from https://github.com/fgeller/kt/releases)

  • kafka-* (come with Kafka, if you yum install it from Confluent's repo, or via Docker if you're so inclined). Warning -- JVM based and dreadfully slow.

  • jq (conda install -c conda-forge jq or use your favorite package manager)

@DaveyDevOps
DaveyDevOps / heap_dump_analysis.md
Last active July 22, 2022 09:58
Large Java Heap Dump Analysis

Just wanted to document the steps taken in a specific scenario. There are plenty of references covering the different parts (feel free to Google if you don't believe me).

Scenario:
Problematic Java application running on a Linux server with a large heap (~16 GB).
My devices: a Windows laptop/desktop with a limited amount of RAM.

Goal:
Take a heap dump and analyze with Eclipse Memory Analyzer.

Heap Dump

@devinsba
devinsba / Cluster.java
Last active April 8, 2019 11:10
Zipkin Sparkstreaming - Vizceral
import java.util.HashMap;
import java.util.Map;
public class Cluster {
    String app;
    String region;
    String env;
    Map<String, Metrics> connectionsFrom = new HashMap<>();
}
@claudinei-daitx
claudinei-daitx / SparkSessionS3.scala
Created December 15, 2017 13:02
Create a Spark session optimized to work with Amazon S3.
import org.apache.spark.sql.SparkSession
object SparkSessionS3 {
// create a Spark session with optimizations to work with Amazon S3.
def getSparkSession: SparkSession = {
val spark = SparkSession
.builder
.appName("my spark application name")
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.config("spark.hadoop.fs.s3a.access.key", "my access key")
@alexandrnikitin
alexandrnikitin / jvm-large-heap.sh
Last active July 4, 2019 09:54
JVM Large heap analysis
# install Eclipse Memory Analyzer (MAT)
wget http://www.mirrorservice.org/sites/download.eclipse.org/eclipseMirror/mat/1.6.1/rcp/MemoryAnalyzer-1.6.1.20161125-linux.gtk.x86_64.zip
unzip MemoryAnalyzer-1.6.1.20161125-linux.gtk.x86_64.zip
cd mat/
# add more heap to MAT
nano MemoryAnalyzer.ini
# create a dump
su - username  # if needed
ps aux | grep java
@reisjr
reisjr / install_parquet_tools.sh
Created July 7, 2017 17:39
Script to install parquet-tools in Amazon EC2
#!/bin/bash
wget http://mirror.nbtelecom.com.br/apache/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz
tar zxvf apache-maven-3.5.0-bin.tar.gz
sudo yum install -y git java-devel
git clone https://github.com/Parquet/parquet-mr.git
cd parquet-mr/parquet-tools/
sed -i 's/1.6.0rc3-SNAPSHOT/1.6.0/g' pom.xml
~/apache-maven-3.5.0/bin/mvn clean package -Plocal
java -jar target/parquet-tools-1.6.0.jar schema ~/000000_0
@haskaalo
haskaalo / tarcheatsheet.md
Last active March 15, 2025 23:47
Tar usage / Tar Cheat Sheet

Tar Usage / Cheat Sheet

Compress a file or directory

e.g.: tar -czvf name-of-archive.tar.gz /path/to/directory-or-file

  • -c: Create an archive.
  • -z: Compress the archive with gzip.
  • -v: Verbose output. Lists each file as it is archived.
  • -f: Allows you to specify the filename of the archive.
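The flags above can be exercised end to end. A small self-contained sketch (file and directory names are made up for illustration):

```shell
# work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# something to archive
mkdir -p demo-dir
echo "hello" > demo-dir/file.txt

# -c create, -z gzip, -v verbose, -f archive filename
tar -czvf demo.tar.gz demo-dir

# -t lists the archive's contents without extracting
tar -tzf demo.tar.gz

# -x extract, -C extract into the given directory
mkdir -p out
tar -xzf demo.tar.gz -C out
```

Note the extraction reuses -z and -f; only -c is swapped for -x.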

Scaling your API with rate limiters

The following are examples of the four types of rate limiters discussed in the accompanying blog post. In the examples below I've used pseudocode-like Ruby, so even if you're unfamiliar with Ruby you should be able to easily translate this approach to other languages. Complete examples in Ruby are also provided later in this gist.

In most cases you'll want all these examples to be classes, but I've used simple functions here to keep the code samples brief.

Request rate limiter

This uses a basic token bucket algorithm and relies on the fact that Redis scripts execute atomically. No other operations can run between fetching the count and writing the new count.
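The gist's own examples are Ruby backed by atomic Redis scripts; to illustrate just the token-bucket arithmetic, here is a minimal single-process Python sketch (class and parameter names are hypothetical, and the Redis atomicity concern is assumed away because everything runs in one thread):

```python
import time

class TokenBucket:
    """In-memory token bucket: refill continuously, spend on each request."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full
        self.last = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In the Redis version, the refill-then-spend step is what must run atomically in a script, so that no other client reads the count between the fetch and the write.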

@TareqElMasriDev
TareqElMasriDev / bundles.txt
Created March 9, 2017 14:48
ZSH Antibody bundles
# this block is in alphabetic order
caarlos0/git-add-remote kind:path
caarlos0/jvm
caarlos0/ports kind:path
caarlos0/zsh-git-fetch-merge kind:path
caarlos0/zsh-git-sync kind:path
caarlos0/zsh-mkc
caarlos0/zsh-open-pr kind:path
mafredri/zsh-async
Tarrasch/zsh-bd