Here is an essay version of my class notes from Class 1 of CS183: Startup. Errors and omissions are my own. Credit for good stuff is Peter’s entirely.
CS183: Startup—Notes Essay—The Challenge of the Future
Purpose and Preamble
#!/usr/bin/python
# coding=utf-8
# Python version of Zach Holman's "spark"
# https://github.com/holman/spark
# by Stefan van der Walt <[email protected]>
"""
USAGE:
#!/bin/bash
# Herein we back up our indexes! This script should run in the evening, after logstash
# rotates to a new ES index and there's no new data coming in to the old one. We grab the
# metadata, compress the data files, create a restore script, and push it all up to S3.
TODAY=$(date +"%Y.%m.%d")
INDEXNAME="logstash-$TODAY" # this had better match the index name in ES
INDEXDIR="/usr/local/elasticsearch/data/logstash/nodes/0/indices/"
BACKUPCMD="/usr/local/backupTools/s3cmd --config=/usr/local/backupTools/s3cfg put"
BACKUPDIR="/mnt/es-backups/"
YEARMONTH=$(date +"%Y-%m")
// convenient Spring JDBC RowMapper for when you want the flexibility of Jackson's TreeModel API
// Note: Jackson can also serialize standard Java Collections (Maps and Lists) to JSON: if you don't need JsonNode,
// it's simpler and more portable to have Spring JDBC simply return a Map or List<Map>.
package org.springframework.jdbc.core;

import java.math.BigDecimal;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
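The snippet above breaks off before the RowMapper implementation itself, but the alternative its header recommends is easy to show. A minimal sketch, assuming a DataSource configured elsewhere and a purely illustrative accounts table:

import java.util.List;
import java.util.Map;
import javax.sql.DataSource;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.jdbc.core.JdbcTemplate;

public class RowsToJson {
    // queryForList returns List<Map<String, Object>>, which Jackson can
    // serialize directly -- no custom RowMapper or JsonNode needed.
    // The DataSource and the "accounts" table are assumptions for illustration.
    public static String accountsAsJson(DataSource dataSource) throws Exception {
        JdbcTemplate jdbc = new JdbcTemplate(dataSource);
        List<Map<String, Object>> rows = jdbc.queryForList("SELECT id, balance FROM accounts");
        return new ObjectMapper().writeValueAsString(rows);
    }
}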
# Build and install Python 2.7.3 from source on a Debian/Ubuntu system
sudo apt-get install build-essential libsqlite3-dev zlib1g-dev libncurses5-dev libgdbm-dev libbz2-dev libreadline5-dev libssl-dev libdb-dev
wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz
tar -xzf Python-2.7.3.tgz
cd Python-2.7.3
./configure --prefix=/usr --enable-shared   # shared libpython, installed over the system prefix
make
sudo make install
cd ..
language: java
env:
  global:
    - SONATYPE_USERNAME=yourusername
    # the secure value is the output of `travis encrypt SONATYPE_PASSWORD=pass`
    - secure: "your encrypted SONATYPE_PASSWORD=pass"
after_success:
  - python addServer.py
  - mvn clean deploy --settings ~/.m2/mySettings.xml
import spark.streaming.StreamingContext._
import spark.streaming.{Seconds, StreamingContext}
import spark.SparkContext._
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird.HyperLogLog._
import com.twitter.algebird._

/**
 * Example of using HyperLogLog monoid from Twitter's Algebird together with Spark Streaming's
 * TwitterInputDStream.
 */
Kafka acts as a kind of write-ahead log (WAL): it records messages to a persistent store (disk) and lets subscribers read and apply those changes to their own stores in a time frame appropriate to their own systems.
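As a minimal sketch of that subscriber pattern, using the standard Java consumer client; the "changelog" topic, the group id, and the in-memory Map standing in for the subscriber's own store are all hypothetical:

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ChangelogFollower {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption: local broker
        props.put("group.id", "follower-1");                // hypothetical group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Map<String, String> localStore = new HashMap<>();   // the subscriber's own store
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("changelog"));  // hypothetical topic
            while (true) {
                // the log is durable, so each subscriber polls at its own pace
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    localStore.put(r.key(), r.value());     // apply the change to the local store
                }
            }
        }
    }
}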
Terminology:
public static Random random = new Random(DateTime.Now.Millisecond);

public int chooseWithChance(params int[] args)
{
    /*
     * Randomly chooses an index, weighted by each value's share of the total.
     * e.g.
     * chooseWithChance(1,99) will most probably (99%) return 1, since the index of 99 is 1
     * chooseWithChance(99,1) will most probably (99%) return 0, since the index of 99 is 0
     * chooseWithChance(0,100) will always return 1.
     */
    int total = 0;
    foreach (int chance in args) total += chance;
    int pick = random.Next(total);              // uniform draw in [0, total)
    for (int i = 0, cumulative = 0; i < args.Length; i++)
    {
        cumulative += args[i];
        if (pick < cumulative) return i;        // the draw landed in this slot
    }
    throw new ArgumentException("total chance must be positive");
}
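The cumulative walk above is the standard way to sample a weighted discrete distribution: draw a uniform integer below the total weight, then return the first index whose running total exceeds the draw. One caveat on the seed choice: DateTime.Now.Millisecond yields only 1,000 distinct seeds, so the parameterless new Random() is usually the safer default.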