Gerald Rich newsroomdev

Demonstrating Jenks natural breaks as implemented in simple-statistics. "Ckmeans clustering is an improvement on heuristic-based clustering approaches like Jenks." It is "a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations."
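
To make the "dynamic programming" part of that quote concrete, here is a minimal sketch of optimal 1-D clustering in Python. It is not the simple-statistics code: it uses the plain quadratic-time recurrence rather than the faster variants that ckmeans implementations typically use, and the function name `optimal_1d_clusters` is made up for this illustration.

```python
from typing import List


def optimal_1d_clusters(values: List[float], k: int) -> List[List[float]]:
    """Split values into k contiguous groups (after sorting) that minimize
    the total within-group sum of squared deviations (SSD).

    Hypothetical helper for illustration; not the simple-statistics API.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the SSD of any slice xs[i:j] in constant time.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for idx, x in enumerate(xs):
        prefix[idx + 1] = prefix[idx] + x
        prefix_sq[idx + 1] = prefix_sq[idx] + x * x

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations from the mean of xs[i:j], with j > i."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best total SSD when the first j values form m groups.
    # split[m][j]: start index of the last group in that best arrangement.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = cost[m - 1][i] + ssd(i, j)
                if candidate < cost[m][j]:
                    cost[m][j] = candidate
                    split[m][j] = i

    # Walk the split table backwards to recover the groups.
    groups = []
    j = n
    for m in range(k, 0, -1):
        i = split[m][j]
        groups.append(xs[i:j])
        j = i
    groups.reverse()
    return groups
```

For example, `optimal_1d_clusters([1, 2, 3, 10, 11, 12, 50, 51], 3)` groups the values as `[1, 2, 3]`, `[10, 11, 12]` and `[50, 51]`. Unlike a heuristic that iterates from a starting guess, the table covers every possible placement of the breaks, so the result does not depend on initialization.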

Rendered by d3js, based on an example by Mike Bostock and Tom MacWright's original comparison with quantize.

More on ckmeans is in the Simple-Statistics documentation. Also see the PR removing Jenks here and the original narrative on how the Jenks algorithm was reimplemented through Tom's literature review.

Using IBM Watson Speech to Text API to transcribe a ProPublica podcast

An example of using the Watson Speech to Text API to transcribe a podcast from ProPublica: How a Reporter Pierced the Hype Behind Theranos

This is just a simpler demo of the technique I use to make automated video supercuts in this repo: https://github.com/dannguyen/watson-word-watcher

The transcription takes just a few minutes (less if you parallelize the requests to IBM) and is free...but it isn't perfect by any means. It doesn't fare super well on proper nouns:

Charles Ornstein's last name is transcribed as Orenstein
John Carreyrou's last name becomes John Kerry Roo
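
On the point about parallelizing the requests: a rough sketch of the idea, assuming the audio has already been split into segment files and that some function sends one segment to the API and returns its text. Both `transcribe_in_parallel` and `fake_transcribe` are hypothetical names for this illustration; the stub stands in for a real API call and does not contact IBM.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def transcribe_in_parallel(
    segments: List[str],
    transcribe_segment: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Run transcribe_segment on every segment concurrently.

    Threads suit this job because each call mostly waits on the network.
    Executor.map returns results in input order, so the transcript pieces
    line up with the original audio order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe_segment, segments))


def fake_transcribe(path: str) -> str:
    """Stand-in for a real API call (assumption: one request per file)."""
    return f"[transcript of {path}]"


if __name__ == "__main__":
    parts = transcribe_in_parallel(["part-000.wav", "part-001.wav"], fake_transcribe)
    print("\n".join(parts))
```

Swapping `fake_transcribe` for a function that actually posts a file to the service is the only change needed; `max_workers` should stay within whatever concurrency the service allows.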

	<html>
	<head>
	<!-- Load the d3 library. -->
	<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
	<link href='http://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
	<style>
	body { font-family: "Open Sans"; }
	text.stateID { dominant-baseline: middle; text-anchor: middle; }
	</style>
	</head>

	import asyncio
	import aiohttp
	import os
	import random
	import re
	import sys
	import traceback
	from io import StringIO
	from lxml.html import parse, make_links_absolute
	from lxml.cssselect import CSSSelector

	library(tidyverse)
	library(sf)
	library(tigris)

	# start by picking a state from https://github.com/Microsoft/USBuildingFootprints
	# WARNING: these files can be pretty big. using arizona for its copious subdivisions and reasonable 83MB.
	url_footprint <- "https://usbuildingdata.blob.core.windows.net/usbuildings-v1-1/Arizona.zip"
	download.file(url_footprint, "Arizona.zip")
	unzip("Arizona.zip")

	#!/bin/sh
	# Fetch the meowbified copy of the site named by $1, then upload it to the matching S3 bucket.
	curl "http://cat.www.$1.com.meowbify.com/" > index.html
	s3cmd put index.html "s3://$1.cat/"