Chris McKinlay cmk

Keybase proof

I hereby claim:

I am cem3394 on github.
I am cem3394 (https://keybase.io/cem3394) on keybase.
I have a public key whose fingerprint is 2BBC 09A6 A8CE FF00 6BC5 13B8 AFB7 1840 8273 3C0D

To claim this, I am signing this object:

Practical PageRank

Introduction

This tutorial introduces GraphX and walks you through building the graph representation of a dataset in Spark. You will also learn about relevance scoring in Elasticsearch by tuning the scoring algorithm to boost attribute weights.

Part 0

You will need the source code accompanying this tutorial to complete Part 3. The source code contains helper functions in DataFrameUtils, as well as the pipeline used to create the Wikipedia datasets. Read through WikipediaPageRank to understand how a GraphX pipeline is put together.

Part 1: Introducing GraphX

Cats, Algebird, Summingbird Lab

Part 1: Simple Fold Map

code/labExercises/src/main/scala/labExercises/lecture10a/SimpleFoldMap.scala

Read the Typelevel documentation of the Monoid typeclass.
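
As a warm-up for the reading, here is a minimal sketch of what a fold-map over a Monoid looks like. It is written in plain Scala so it runs without Cats on the classpath. The `Monoid` trait below is a hand-rolled stand-in that only mirrors the shape of the Cats typeclass (an `empty` value and an associative `combine`), and `SimpleFoldMapSketch` is a hypothetical name, not the contents of the lab file.

```scala
// Hand-rolled stand-in for the Monoid typeclass (assumption: same shape as
// the Cats one, with an identity element and an associative combine).
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object Monoid {
  // Integers under addition, with 0 as the identity.
  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    val empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  // Strings under concatenation, with "" as the identity.
  implicit val stringConcat: Monoid[String] = new Monoid[String] {
    val empty: String = ""
    def combine(x: String, y: String): String = x + y
  }
}

object SimpleFoldMapSketch {
  // Map every element into a monoid B, then combine the results left to right,
  // starting from the monoid's identity element.
  def foldMap[A, B](as: List[A])(f: A => B)(implicit m: Monoid[B]): B =
    as.foldLeft(m.empty)((acc, a) => m.combine(acc, f(a)))
}
```

For example, `SimpleFoldMapSketch.foldMap(List("a", "bb", "ccc"))(_.length)` sums the string lengths using the `Int` instance, and swapping the function for `_.toString` on a list of integers picks up the `String` instance instead.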

Free Monad + Interpreter Pattern

It's like creating the front end and back end of a compiler inside Haskell, without needing Template Haskell!

Write your DSL AST as a Free Monad, and then interpret the monad any way you like.

The advantage is that you get to swap out your interpreter, and your main code
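
The pattern can be sketched in a few lines. The example below is in Scala rather than Haskell, to match the lab code elsewhere on this page, and it avoids library dependencies by defining a tiny free monad by hand. Everything in it is an assumption made for illustration: the key-value instruction set (`Put`, `Get`), the `Interpreter` trait, and the two interpreters are hypothetical, and the `run` method is not stack-safe the way a library implementation would be.

```scala
object FreeSketch {
  // The DSL's instruction set (the "front end"): a hypothetical key-value store.
  sealed trait KVOp[A]
  final case class Put(key: String, value: Int) extends KVOp[Unit]
  final case class Get(key: String) extends KVOp[Option[Int]]

  // An interpreter (the "back end") gives meaning to one instruction at a time.
  trait Interpreter[F[_]] { def apply[I](op: F[I]): I }

  // A minimal free monad: a program is either a finished value (Pure) or one
  // instruction followed by a continuation (Bind).
  sealed trait Free[F[_], A] {
    def flatMap[B](f: A => Free[F, B]): Free[F, B]
    def map[B](f: A => B): Free[F, B] = flatMap(a => Pure[F, B](f(a)))
    def run(interp: Interpreter[F]): A
  }
  final case class Pure[F[_], A](a: A) extends Free[F, A] {
    def flatMap[B](f: A => Free[F, B]): Free[F, B] = f(a)
    def run(interp: Interpreter[F]): A = a
  }
  final case class Bind[F[_], I, A](op: F[I], k: I => Free[F, A]) extends Free[F, A] {
    def flatMap[B](f: A => Free[F, B]): Free[F, B] =
      Bind[F, I, B](op, (i: I) => k(i).flatMap(f))
    def run(interp: Interpreter[F]): A = k(interp(op)).run(interp)
  }

  // Lift a single instruction into a one-step program.
  def liftF[F[_], A](op: F[A]): Free[F, A] =
    Bind[F, A, A](op, (a: A) => Pure[F, A](a))

  def put(key: String, value: Int): Free[KVOp, Unit] = liftF[KVOp, Unit](Put(key, value))
  def get(key: String): Free[KVOp, Option[Int]] = liftF[KVOp, Option[Int]](Get(key))

  // The "main code": a description of what to do, with no interpreter chosen yet.
  val program: Free[KVOp, Int] = for {
    _ <- put("cats", 2)
    n <- get("cats")
    _ <- put("cats", n.getOrElse(0) + 5)
    m <- get("cats")
  } yield m.getOrElse(0)

  // Interpreter 1: actually stores values in a mutable map.
  final class InMemory extends Interpreter[KVOp] {
    private val store = scala.collection.mutable.Map.empty[String, Int]
    def apply[I](op: KVOp[I]): I = op match {
      case Put(k, v) =>
        store.update(k, v)
      case Get(k) =>
        store.get(k)
    }
  }

  // Interpreter 2: a dry run that stores nothing and only logs each instruction.
  final class DryRun extends Interpreter[KVOp] {
    val log = scala.collection.mutable.ListBuffer.empty[String]
    def apply[I](op: KVOp[I]): I = op match {
      case Put(k, v) =>
        log += s"put $k=$v"
        ()
      case Get(k) =>
        log += s"get $k"
        Option.empty[Int]
    }
  }
}
```

Running `FreeSketch.program.run(new FreeSketch.InMemory)` executes the program against a real store, while passing a `FreeSketch.DryRun` instead leaves `program` untouched and only records the instructions it would have issued.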

	def pitDepth(p: Int, q: Int, r: Int) = (p - q) min (r - q)

	def updateDepths(pqrd: (Int,Int,Int,Int), x: Int): (Int,Int,Int,Int) = pqrd match {
	case (p,q,r,d) if /conds/ => /upslope reset/
	case (p,q,r,d) if /conds/ => /update q/
	case (p,q,r,d) if /conds/ => /update r and d/
	case (p,q,r,d) if /conds/ => /update d and reset/
	}

	def maxDepth(list: List[Int]): Int = list match {

	package labAnswers.lecture10a

	import com.twitter.summingbird._
	import com.twitter.summingbird.memory._

	import collection.mutable.{ Map => MutableMap }

	import org.joda.time.DateTime
	import org.joda.time.Months

	-- Number of commits for libraries of interest from 2015-09-30 to 2016-09-30:

	SELECT date, idname.repo_name AS repo_name, num_commits
	FROM
	(SELECT
	DATE(created_at) AS date,
	repo.id,
	SUM(INTEGER(JSON_EXTRACT(payload, '$.size'))) AS num_commits
	FROM
	(TABLE_DATE_RANGE([githubarchive:day.],

	{-# LANGUAGE RankNTypes, FlexibleInstances, FlexibleContexts, UndecidableInstances #-}
	module Free where

	import Control.Monad.IO.Class
	import Data.IORef
	import Control.Applicative
	import Control.Monad
	import Data.Maybe (fromMaybe)
	import qualified Control.Monad.Trans.State as ST
	import qualified Data.Map as M