# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Build it
make clean
LLAMA_METAL=1 make

# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
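As a hedged sketch of the remaining steps (the Hugging Face source and the exact flags are assumptions and may differ between llama.cpp versions): fetch the weights, then run the model with the freshly built main binary.

# Fetch the weights (assumption: TheBloke's GGML conversion on Hugging Face)
wget "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/${MODEL}"

# Run a prompt; -ngl offloads layers to the Metal GPU (flag names vary by version)
./main -m "./${MODEL}" -n 256 -ngl 1 -p "Write a short poem about build systems."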
Flame graphs are a nifty debugging tool for determining where CPU time is being spent. Using the Java Flight Recorder, you can generate them for Java processes without adding significant runtime overhead.
Shivaram Venkataraman and I have found these flame recordings useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right-hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to speed up the scheduler.
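A minimal sketch of how such a recording can be captured on an Oracle JDK 8 JVM (this is not from the original workflow; the process id, the recording name, and the converter mentioned at the end are assumptions):

# unlock Flight Recorder and start a sampling recording on the running JVM
jcmd <pid> VM.unlock_commercial_features
jcmd <pid> JFR.start name=cpu settings=profile

# let the workload run for a while, then dump the recording to disk
jcmd <pid> JFR.dump name=cpu filename=recording.jfr

# recording.jfr can then be converted to folded stacks (for example with a
# converter such as jfr-flame-graph) and rendered with flamegraph.pl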
#!/bin/bash
# Run this on this AMI on AWS:
# https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#LaunchInstanceWizard:ami=ami-b36981d8
# You should end up with a fully working GPU-enabled TensorFlow installation.
cd ~
# grab cuda 7.0
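Once the full script has run, a quick sanity check that the GPU build actually sees the card, as a hedged sketch (it assumes the pre-1.0 TensorFlow session API of that era):

# the driver should list the GPU
nvidia-smi
# creating a session with device-placement logging should show ops mapped to /gpu:0
python -c "import tensorflow as tf; tf.Session(config=tf.ConfigProto(log_device_placement=True))"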
By @dmvaldman
Functional Reactive Programming (FRP) is generating buzz as an alternative to Object-Oriented Programming (OOP) for certain use cases. However, an internet search quickly leads a curious and optimistic reader into the rabbit hole of monads, functors, and other technical jargon. I've since emerged from this dark and lonely place with the realization that these words are mere implementation details, and that the core concepts are far more universal. In fact, the groundwork was laid down many centuries before the first computer, and has more to do with interpretations of reality than with structuring programs. Allow me to explain.
There’s an old thought experiment that goes like this:
/**
 * This file contains the core idea of wrapping an underlying OutputFormat with an OutputFormat
 * with an augmented key that writes to partitions using MultipleOutputs (or something similar).
 */
package model.hadoop

import model.hadoop.HadoopIO.MultipleOutputer
import model.hadoop.HadoopIO.MultipleOutputer._
import org.apache.hadoop.io.{DataInputBuffer, NullWritable}
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The unreasonable effectiveness of Character-level Language Models\n",
    "## (and why RNNs are still cool)\n",
    "\n",
    "### [Yoav Goldberg](http://www.cs.biu.ac.il/~yogo)\n",
""" | |
This is a batched LSTM forward and backward pass | |
""" | |
import numpy as np | |
import code | |
class LSTM: | |
@staticmethod | |
def init(input_size, hidden_size, fancy_forget_bias_init = 3): |
- Probabilistic Data Structures for Web Analytics and Data Mining: A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementations.
- Models and Issues in Data Stream Systems
- Philippe Flajolet's contribution to streaming algorithms: A presentation by Jérémie Lumbroso that revisits some of the historical perspective and how it all began with Flajolet.
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani: One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t