Chris McKinlay cmk

package labAnswers.lecture10a
import com.twitter.summingbird._
import com.twitter.summingbird.memory._
import collection.mutable.{ Map => MutableMap }
import org.joda.time.DateTime
import org.joda.time.Months

Practical PageRank

Introduction

This tutorial introduces GraphX and guides you through building the graph representation of a dataset in Spark. You will also learn about relevance scoring in Elasticsearch by tuning the scoring algorithm to boost attribute weights.
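To make the idea of a graph representation concrete, here is a minimal sketch of PageRank over a plain edge list. It uses only Scala collections, not the GraphX API, and every name in it (`PageRankSketch`, `pageRank`, `iterations`, `damping`) is an assumption made for illustration rather than something taken from the tutorial's source code.

```scala
object PageRankSketch {
  // A directed edge as (source id, destination id). Hypothetical alias.
  type Edge = (Long, Long)

  // Power-iteration PageRank on an in-memory edge list.
  // Assumption: rank held by vertices without out-links is spread uniformly.
  def pageRank(edges: Seq[Edge],
               iterations: Int = 20,
               damping: Double = 0.85): Map[Long, Double] = {
    val vertices: Set[Long] = edges.flatMap { case (s, d) => Seq(s, d) }.toSet
    val n = vertices.size
    if (n == 0) Map.empty[Long, Double]
    else {
      // Adjacency list: source -> destinations.
      val outLinks: Map[Long, Seq[Long]] =
        edges.groupBy(_._1).map { case (s, es) => s -> es.map(_._2) }
      var ranks: Map[Long, Double] = vertices.map(v => v -> 1.0 / n).toMap

      for (_ <- 1 to iterations) {
        // Total rank sitting on dangling vertices (toSeq avoids Set de-duplication).
        val dangling = vertices.toSeq.filterNot(outLinks.contains).map(ranks).sum
        // Each vertex splits its rank evenly across its out-links.
        val contribs: Map[Long, Double] = outLinks.toSeq
          .flatMap { case (src, dsts) => dsts.map(dst => dst -> (ranks(src) / dsts.size)) }
          .groupBy(_._1)
          .map { case (dst, cs) => dst -> cs.map(_._2).sum }
        ranks = vertices.map { v =>
          v -> ((1 - damping) / n + damping * (contribs.getOrElse(v, 0.0) + dangling / n))
        }.toMap
      }
      ranks
    }
  }
}
```

For example, `PageRankSketch.pageRank(Seq((1L, 2L), (2L, 3L), (3L, 1L)))` assigns every vertex of the three-node cycle the same rank, and the ranks always sum to 1.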

Part 0

You will need the source code accompanying this tutorial in order to complete Part 3. The source code contains a number of helper functions, found in DataFrameUtils, as well as the pipeline used to create the Wikipedia datasets. Read through WikipediaPageRank to gain an understanding of GraphX pipelines.

Part 1: Introducing GraphX

def pitDepth(p: Int, q: Int, r: Int) = (p - q) min (r - q)
def updateDepths(pqrd: (Int,Int,Int,Int), x: Int): (Int,Int,Int,Int) = pqrd match {
  case (p,q,r,d) if /*conds*/ => /*upslope reset*/
  case (p,q,r,d) if /*conds*/ => /*update q*/
  case (p,q,r,d) if /*conds*/ => /*update r and d*/
  case (p,q,r,d) if /*conds*/ => /*update d and reset*/
}
def maxDepth(list: List[Int]): Int = list match {
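
The preview above is cut off and its guard conditions are left as placeholders. As one possible reading, the sketch below assumes the exercise is the "deepest pit" problem: a pit is a strictly decreasing run from p down to q followed by a strictly increasing run up to r, and its depth is `pitDepth(p, q, r)`. The object name `PitDepthSketch`, the function `step`, the choice to return 0 when no pit exists, and the case ordering are all assumptions, not the author's solution.

```scala
object PitDepthSketch {
  // Depth of a pit with left peak p, bottom q, and right peak r.
  def pitDepth(p: Int, q: Int, r: Int): Int = (p - q) min (r - q)

  // State is (p, q, r, d): start of the current descent, lowest point so far,
  // most recent height, and best depth seen. r == q means "still descending".
  def step(state: (Int, Int, Int, Int), x: Int): (Int, Int, Int, Int) = state match {
    // A plateau breaks strict monotonicity: restart from x.
    case (_, _, r, d) if x == r => (x, x, x, d)
    // Still descending: x becomes the new bottom.
    case (p, q, r, d) if r == q && x < r => (p, x, x, d)
    // Rising before any descent happened: x is a better starting peak.
    case (p, q, r, d) if r == q && q == p => (x, x, x, d)
    // Starting or continuing the ascent: extend r and update the best depth.
    case (p, q, r, d) if x > r => (p, q, x, d max pitDepth(p, q, x))
    // Falling after an ascent: the old r becomes the peak of a new descent.
    case (_, _, r, d) => (r, x, x, d)
  }

  // Fold the state machine over the heights; 0 means no pit was found.
  def maxDepth(heights: List[Int]): Int = heights match {
    case Nil       => 0
    case h :: rest => rest.foldLeft((h, h, h, 0))(step)._4
  }
}
```

Under these assumptions, `PitDepthSketch.maxDepth(List(0, 1, 3, -2, 0, 1, 0, -3, 2, 3))` evaluates to 4, from the pit that descends from 1 to -3 and climbs back to 3.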

Keybase proof

I hereby claim:

  • I am cem3394 on github.
  • I am cem3394 (https://keybase.io/cem3394) on keybase.
  • I have a public key whose fingerprint is 2BBC 09A6 A8CE FF00 6BC5 13B8 AFB7 1840 8273 3C0D

To claim this, I am signing this object:

cmk / gist:b9dd37cce724d43b7892
Created August 24, 2014 19:30
Verifying that +cem3394 is my Bitcoin username. You can send me #bitcoin here: https://onename.io/cem3394