Skip to content

Instantly share code, notes, and snippets.

View yaravind's full-sized avatar
💭
Constraints Liberate. Liberties Constrain.

Aravind Yarram yaravind

💭
Constraints Liberate. Liberties Constrain.
View GitHub Profile
@cb372
cb372 / jargon.md
Last active August 30, 2025 02:11
Category theory jargon cheat sheet

Category theory jargon cheat sheet

A primer/refresher on the category theory concepts that most commonly crop up in conversations about Scala or FP. (Because it's embarassing when I forget this stuff!)

I'll be assuming Scalaz imports in code samples, and some of the code may be pseudo-Scala.

Functor

A functor is something that supports map.

@chinshr
chinshr / Jenkinsfile
Last active May 20, 2025 13:10
Best of Jenkinsfile, a collection of useful workflow scripts ready to be copied into your Jenkinsfile on a per use basis.
#!groovy
# Best of Jenkinsfile
# `Jenkinsfile` is a groovy script DSL for defining CI/CD workflows for Jenkins
node {
}
@arturmkrtchyan
arturmkrtchyan / get_job_status.sh
Last active October 22, 2024 05:45
Apache Spark Hidden REST API
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000
@ashwanthkumar
ashwanthkumar / build.sh
Created February 4, 2016 12:35
Build commands for GoLang project on SnapCI
GITHUB_USERNAME="ashwanthkumar"
GITHUB_REPO="marathonctl"
# Install Golang as part of the build - takes about 30 secs
sudo yum install --assumeyes golang
# Setup GOPATH conventions
mkdir -p /var/snap-ci/src/github.com/${GITHUB_USERNAME}/
# Create symlinks according to go's directory structure
ln -s /var/snap-ci/repo /var/snap-ci/src/github.com/${GITHUB_USERNAME}/${GITHUB_REPO}
# Run your make commands to test and build your project
@jaceklaskowski
jaceklaskowski / spark-jobserver-docker-macos.md
Last active August 1, 2018 11:28
How to run spark-jobserver on Docker and Mac OS (using docker-machine)

Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool to determine where CPU time is being spent. Using the Java Flight recorder, you can do this for Java processes without adding significant runtime overhead.

When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee

@wsargent
wsargent / win10-dev.md
Last active August 7, 2024 18:20
Windows Development Environment for Scala

Windows 10 Development Environment for Scala

This is a guide for Scala and Java development on Windows, using Windows Subsystem for Linux, although a bunch of it is applicable to a VirtualBox / Vagrant / Docker subsystem environment. This is not complete, but is intended to be as step by step as possible.

Harden Windows 10

Read the entire Decent Security guide, and follow the instructions, especially:

@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active January 10, 2025 07:36
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress to use 1 byte unsigned integers, thus decreasing the size of saved DataFrame by a factor of 8.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (i.e. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
@ParthaSSatpathy
ParthaSSatpathy / Introduction to NLP with Python.ipynb
Last active April 11, 2022 23:36
Introduction to Natural Language Processing Using Python
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@jdegoes
jdegoes / fpmax.scala
Created July 13, 2018 03:18
FP to the Max — Code Examples
package fpmax
import scala.util.Try
import scala.io.StdIn.readLine
object App0 {
def main: Unit = {
println("What is your name?")
val name = readLine()