Skip to content

Instantly share code, notes, and snippets.

@paf31
paf31 / 24days.md
Last active August 8, 2023 05:53
24 Days of PureScript

This blog post series has moved here.

You might also be interested in the 2016 version.

@tokestermw
tokestermw / visualizing_topic_models.py
Last active September 7, 2021 16:57
visualization topic models in four different ways
import json
import urlparse
from itertools import chain
flatten = chain.from_iterable
from nltk import word_tokenize
from gensim.corpora import Dictionary
from gensim.models.ldamodel import LdaModel
from gensim.models.tfidfmodel import TfidfModel
@acolyer
acolyer / service-checklist.md
Last active February 20, 2025 12:04
Internet Scale Services Checklist

Internet Scale Services Checklist

A checklist for designing and developing internet scale services, inspired by James Hamilton's 2007 paper "On Desgining and Deploying Internet-Scale Services."

Basic tenets

  • Does the design expect failures to happen regularly and handle them gracefully?
  • Have we kept things as simple as possible?
@avernet
avernet / Client.java
Last active January 3, 2016 07:29
Connecting to a service and providing a client-side certificate, client using HttpClient 4.2, and server running on Node.js
/*
* ====================================================================
*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
@debasishg
debasishg / gist:8172796
Last active April 20, 2025 12:45
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
@ayosec
ayosec / Unpack.scala
Created October 6, 2013 12:41
Extract files with Apache Commons Compress, from any archive
package test
import org.apache.commons.compress.archivers.ArchiveStreamFactory
import org.apache.commons.compress.archivers.ArchiveInputStream
import org.apache.commons.compress.archivers.ArchiveEntry
import org.apache.commons.compress.compressors.CompressorStreamFactory
import scala.util.Try
import scala.util.Success
import scala.util.Failure
import java.io.InputStream
@millermedeiros
millermedeiros / osx_setup.md
Last active March 23, 2025 01:23
Mac OS X setup

Setup Mac OS X

I've done the same process every couple years since 2013 (Mountain Lion, Mavericks, High Sierra, Catalina) and I updated the Gist each time I've done it.

I kinda regret for not using something like Boxen (or anything similar) to automate the process, but TBH I only actually needed to these steps once every couple years...

@mrflip
mrflip / tuning_storm_trident.asciidoc
Last active October 8, 2024 15:18
Notes on Storm+Trident tuning

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination
@jbilcke
jbilcke / create_video.py
Last active July 25, 2023 07:14
How to create a video using Gephi + Scripting plugin + ffmpeg
execfile("/your/path/to/videomaker.py")
videomaker(
ts_min=1352261778000, # "from" timestamp..
ts_max=1352262378000, # .."to" timestamp
frames=20, # number of images in the video. eg 200 frames for a video at 20 frames per seconds = 10 seconds of video
output_prefix="/path/to/output/dir/frame_", # path where to write the png. images will be prefixed with "frame_"
output_format=".png" # you probably want to leave png here
)
@why-not
why-not / gist:4582705
Last active November 6, 2024 04:35
Pandas recipe. I find pandas indexing counter intuitive, perhaps my intuitions were shaped by many years in the imperative world. I am collecting some recipes to do things quickly in pandas & to jog my memory.
"""making a dataframe"""
df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
"""quick way to create an interesting data frame to try things out"""
df = pd.DataFrame(np.random.randn(5, 4), columns=['a', 'b', 'c', 'd'])
"""convert a dictionary into a DataFrame"""
"""make the keys into columns"""
df = pd.DataFrame(dic, index=[0])