@ShinNoNoir
Created February 10, 2013 13:13
Simple implementation of the Fleiss' kappa measure in Python
def fleiss_kappa(ratings, n, k):
    '''
    Computes the Fleiss' kappa measure for assessing the reliability of
    agreement between a fixed number n of raters when assigning categorical
    ratings to a number of items.

    Args:
        ratings: a list of (item, category)-ratings
        n: number of raters
        k: number of categories
    Returns:
        the Fleiss' kappa score

    See also:
        http://en.wikipedia.org/wiki/Fleiss'_kappa
    '''
    items = set()
    categories = set()
    n_ij = {}  # n_ij[(i, c)]: number of raters who assigned item i to category c

    for i, c in ratings:
        items.add(i)
        categories.add(c)
        n_ij[(i,c)] = n_ij.get((i,c), 0) + 1

    N = len(items)

    # p_j[c]: proportion of all assignments that went to category c
    p_j = {}
    for c in categories:
        p_j[c] = sum(n_ij.get((i,c), 0) for i in items) / (1.0*n*N)

    # P_i[i]: extent to which the raters agree on item i
    P_i = {}
    for i in items:
        P_i[i] = (sum(n_ij.get((i,c), 0)**2 for c in categories)-n) / (n*(n-1.0))

    # values() instead of itervalues() so this runs on Python 2 and 3
    P_bar = sum(P_i.values()) / (1.0*N)
    P_e_bar = sum(p_j[c]**2 for c in categories)

    kappa = (P_bar - P_e_bar) / (1 - P_e_bar)
    return kappa
example = ( [( 1,5)] * 14 +
            [( 2,2)] * 2 + [( 2,3)] * 6 + [( 2,4)] * 4 + [( 2,5)] * 2 +
            [( 3,3)] * 3 + [( 3,4)] * 5 + [( 3,5)] * 6 +
            [( 4,2)] * 3 + [( 4,3)] * 9 + [( 4,4)] * 2 +
            [( 5,1)] * 2 + [( 5,2)] * 2 + [( 5,3)] * 8 + [( 5,4)] * 1 + [( 5,5)] * 1 +
            [( 6,1)] * 7 + [( 6,2)] * 7 +
            [( 7,1)] * 3 + [( 7,2)] * 2 + [( 7,3)] * 6 + [( 7,4)] * 3 +
            [( 8,1)] * 2 + [( 8,2)] * 5 + [( 8,3)] * 3 + [( 8,4)] * 2 + [( 8,5)] * 2 +
            [( 9,1)] * 6 + [( 9,2)] * 5 + [( 9,3)] * 2 + [( 9,4)] * 1 +
            [(10,2)] * 2 + [(10,3)] * 2 + [(10,4)] * 3 + [(10,5)] * 7 )

# print() with a single argument works on both Python 2 and Python 3
print('%.03f' % fleiss_kappa(example, 14, 5)) # 0.210
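
The function above divides by n for every item, so it quietly assumes that each item received exactly n ratings. A minimal sketch of a pre-check for that assumption follows; `check_rater_counts` is a hypothetical helper name and is not part of the gist.

```python
from collections import Counter

def check_rater_counts(ratings, n):
    """Return {item: count} for every item that does not have exactly n ratings.

    Hypothetical helper (not part of the original gist). An empty dict
    means the (item, category) list is consistent with n raters per item.
    """
    per_item = Counter(item for item, _ in ratings)
    return {item: count for item, count in per_item.items() if count != n}

# Items 1 and 2 have three ratings each, item 3 has only two.
ratings = [(1, 'a')] * 3 + [(2, 'a')] * 2 + [(2, 'b')] * 1 + [(3, 'b')] * 2
print(check_rater_counts(ratings, 3))  # {3: 2}
```

Running such a check before calling `fleiss_kappa` would surface missing or duplicated ratings instead of letting them skew the score.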

nucflash commented Oct 3, 2015

A trivial comment: The signature of the function (and the documentation) needs updating as k is not used anywhere in the function.


Gnork commented Jul 14, 2018

Another trivial comment: I would suggest using "k" as a parameter and letting the user decide how many categories there are. Just because nobody voted for a category doesn't mean it wasn't available.
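
One way to act on this suggestion is sketched below, with the caveat that it is an assumption about what a revised signature might look like: `fleiss_kappa_with_categories` and its `categories` parameter are hypothetical and not part of the gist. Note that a category nobody chose has a count of zero for every item, so it adds nothing to either sum in the formula; declaring it should leave the score unchanged, and the practical gain is mainly that unexpected category labels get rejected.

```python
def fleiss_kappa_with_categories(ratings, n, categories):
    """Fleiss' kappa computed over an explicitly declared set of categories.

    Hypothetical variant of the gist's function: `categories` replaces `k`,
    and a rating with an undeclared category raises ValueError.
    """
    categories = list(categories)
    items = set()
    counts = {}  # counts[(item, cat)]: raters who assigned item to cat
    for item, cat in ratings:
        if cat not in categories:
            raise ValueError('unknown category: %r' % (cat,))
        items.add(item)
        counts[(item, cat)] = counts.get((item, cat), 0) + 1

    N = len(items)
    p_j = [sum(counts.get((i, c), 0) for i in items) / float(n * N)
           for c in categories]
    P_i = [(sum(counts.get((i, c), 0) ** 2 for c in categories) - n)
           / (n * (n - 1.0))
           for i in items]
    P_bar = sum(P_i) / N
    P_e_bar = sum(p ** 2 for p in p_j)
    return (P_bar - P_e_bar) / (1 - P_e_bar)

# Two items, three raters; 'maybe' is available but never chosen.
ratings = [(1, 'yes')] * 3 + [(2, 'yes')] * 2 + [(2, 'no')] * 1
used = fleiss_kappa_with_categories(ratings, 3, ['yes', 'no'])
wider = fleiss_kappa_with_categories(ratings, 3, ['yes', 'no', 'maybe'])
print(abs(used - wider) < 1e-12)  # True
```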

@danyaljj

A trivial +1 to the 2nd "trivial comment".
