Skip to content

Instantly share code, notes, and snippets.

@BrianLitwin
Created March 3, 2019 16:16
Show Gist options
  • Save BrianLitwin/47b113505e25697019e36bde1dcc02e0 to your computer and use it in GitHub Desktop.
Save BrianLitwin/47b113505e25697019e36bde1dcc02e0 to your computer and use it in GitHub Desktop.

Pagerank computes a distribution representing the likelihood of travelling along a given edge from a given node

Lets say you have a node with 5 evenly weighted edges. You have a 20% change of travelling along each node. Lets say you have a node with 2 edges, node A weighted 1 and node B weighted 2. You have a 33% change of travelling out at A and a 66% change of travelling out at node B.

http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm

How is PageRank Calculated?

This is where it gets tricky. The PR of each page depends on the PR of the pages pointing to it. But we won’t know what PR those pages have until the pages pointing to them have their PR calculated and so on… And when you consider that page links can form circles it seems impossible to do this calculation!

But actually it’s not that bad. Remember this bit of the Google paper:

PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.

What that means to us is that we can just go ahead and calculate a page’s PR without knowing the final value of the PR of the other pages. That seems strange but, basically, each time we run the calculation we’re getting a closer estimate of the final value. So all we need to do is remember the each value we calculate and repeat the calculations lots of times until the numbers stop changing much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment