@andershammar
andershammar / install-apache-zeppelin-on-amazon-emr.sh
Last active October 9, 2018 03:31
Bootstrap script for installing Apache Zeppelin on an Amazon EMR Cluster. Verified on Amazon EMR release 4.x.
#!/bin/bash -ex
if [ "$(jq -r .isMaster /mnt/var/lib/info/instance.json)" == "true" ]; then
# Install Git
sudo yum -y install git
# Install Maven
wget -P /tmp http://apache.mirrors.spacedump.net/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
sudo mkdir /opt/apache-maven
sudo tar -xvzf /tmp/apache-maven-3.3.3-bin.tar.gz -C /opt/apache-maven
@paf31
paf31 / 24days.md
Last active August 8, 2023 05:53
24 Days of PureScript

This blog post series has moved here.

You might also be interested in the 2016 version.

@tokestermw
tokestermw / visualizing_topic_models.py
Last active September 7, 2021 16:57
Visualizing topic models in four different ways
import json
import urlparse
from itertools import chain
flatten = chain.from_iterable
from nltk import word_tokenize
from gensim.corpora import Dictionary
from gensim.models.ldamodel import LdaModel
from gensim.models.tfidfmodel import TfidfModel
@acolyer
acolyer / service-checklist.md
Last active February 16, 2026 02:23
Internet Scale Services Checklist

A checklist for designing and developing internet scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."

Basic tenets

  • Does the design expect failures to happen regularly and handle them gracefully?
  • Have we kept things as simple as possible?
@avernet
avernet / Client.java
Last active January 3, 2016 07:29
Connecting to a service and providing a client-side certificate, client using HttpClient 4.2, and server running on Node.js
/*
* ====================================================================
*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
@debasishg
debasishg / gist:8172796
Last active April 12, 2026 23:53
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
@ayosec
ayosec / Unpack.scala
Created October 6, 2013 12:41
Extract files with Apache Commons Compress, from any archive
package test
import org.apache.commons.compress.archivers.ArchiveStreamFactory
import org.apache.commons.compress.archivers.ArchiveInputStream
import org.apache.commons.compress.archivers.ArchiveEntry
import org.apache.commons.compress.compressors.CompressorStreamFactory
import scala.util.Try
import scala.util.Success
import scala.util.Failure
import java.io.InputStream
@millermedeiros
millermedeiros / osx_setup.md
Last active April 4, 2026 00:28
Mac OS X setup

Setup Mac OS X

I've gone through the same process every couple of years since 2013 (Mountain Lion, Mavericks, High Sierra, Catalina) and have updated the Gist each time.

I kinda regret not using something like Boxen (or anything similar) to automate the process, but TBH I only actually needed to do these steps once every couple of years...

@mrflip
mrflip / tuning_storm_trident.asciidoc
Last active April 10, 2026 03:50
Notes on Storm+Trident tuning

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination
@jbilcke
jbilcke / create_video.py
Last active July 25, 2023 07:14
How to create a video using Gephi + Scripting plugin + ffmpeg
execfile("/your/path/to/videomaker.py")
videomaker(
ts_min=1352261778000, # "from" timestamp..
ts_max=1352262378000, # .."to" timestamp
frames=20, # number of images in the video, e.g. 200 frames at 20 frames per second = 10 seconds of video
output_prefix="/path/to/output/dir/frame_", # path where to write the png. images will be prefixed with "frame_"
output_format=".png" # you probably want to leave png here
)
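The arithmetic behind the `ts_min`, `ts_max` and `frames` arguments can be sketched in plain Python. This is only an illustration of how a time window might be divided into evenly spaced frames and how frame count relates to video length; it does not use Gephi, and the helper names `frame_timestamps` and `video_seconds` are hypothetical, not part of `videomaker.py`, whose actual sampling may differ.

```python
def frame_timestamps(ts_min, ts_max, frames):
    """Return `frames` evenly spaced integer timestamps covering
    [ts_min, ts_max] inclusive. Hypothetical helper, not from videomaker.py."""
    if frames < 1:
        raise ValueError("frames must be at least 1")
    if frames == 1:
        return [ts_min]
    span = ts_max - ts_min
    # Integer arithmetic keeps millisecond timestamps exact.
    return [ts_min + (i * span) // (frames - 1) for i in range(frames)]


def video_seconds(frames, fps):
    """Length of the resulting video, in seconds, at a given frame rate."""
    return float(frames) / fps


# Values taken from the call above: a 10-minute window split into 20 frames.
stamps = frame_timestamps(1352261778000, 1352262378000, 20)
duration = video_seconds(200, 20)  # the 200-frame example from the comment
```

With these inputs the first and last timestamps coincide with `ts_min` and `ts_max`, and 200 frames at 20 frames per second give 10 seconds of video, matching the comment on the `frames` argument.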