Skip to content

Instantly share code, notes, and snippets.

@guenter
guenter / velocity.md
Created September 15, 2014 19:24
velocity
@ceteri
ceteri / 0.setup.sh
Last active April 24, 2019 11:04
Spark Streaming demo
# using four part files to construct "minitweet"
cat rawtweets/part-0000[1-3] > minitweets
# change log4j properties to WARN to reduce noise during demo
mv conf/log4j.properties.template conf/log4j.properties
vim conf/log4j.properties # Change to WARN
# launch Spark shell REPL
./bin/spark-shell
@levinotik
levinotik / GrasswireUrlShortener.scala
Created August 7, 2014 05:45
prototype of a URL shortener
package com.grasswire.grasswireurlshortener
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory
import org.apache.commons.validator.routines.UrlValidator
import spray.http.StatusCodes
import spray.routing._
import scala.util.Random
import scalaz.concurrent._
@jkreps
jkreps / benchmark-commands.txt
Last active July 28, 2025 07:14
Kafka Benchmark Commands
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
@tsiege
tsiege / The Technical Interview Cheat Sheet.md
Last active November 12, 2025 11:31
This is my technical interview cheat sheet. Feel free to fork it or do whatever you want with it. PLEASE let me know if there are any errors or if anything crucial is missing. I will add more links soon.

ANNOUNCEMENT

I have moved this over to the Tech Interview Cheat Sheet Repo and has been expanded and even has code challenges you can run and practice against!






\

@stuart11n
stuart11n / gist:9628955
Created March 18, 2014 20:34
rename git branch locally and remotely
git branch -m old_branch new_branch # Rename branch locally
git push origin :old_branch # Delete the old branch
git push --set-upstream origin new_branch # Push the new branch, set local branch to track the new remote
package org.apache.spark.examples
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import java.util.Random
import scala.collection.mutable
import org.apache.spark.serializer.KryoRegistrator
import com.esotericsoftware.kryo.Kryo
@stephenll
stephenll / .bash_profile
Created February 2, 2014 02:45 — forked from jernejcic/.bash_profile
.bash_profile file on Mac OS X
# ---------------------------------------------------------------------------
#
# Description: This file holds all my BASH configurations and aliases.
# Much of this was originally copied from:
# http://natelandau.com/my-mac-osx-bash_profile/
#
# Sections:
# 1. Environment Configuration
# 2. Make Terminal Better (remapping defaults and adding functionality)
# 3. File and Folder Management
@debasishg
debasishg / gist:8172796
Last active November 16, 2025 01:54
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
@azymnis
azymnis / ItemSimilarity.scala
Created December 13, 2013 05:17
Approximate item similarity using LSH in Scalding.
import com.twitter.scalding._
import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }
/**
* Computes similar items (with a string itemId), based on approximate
* Jaccard similarity, using LSH.
*
* Assumes an input data TSV file of the following format:
*
* itemId userId