Skip to content

Instantly share code, notes, and snippets.

@gcr
Created December 13, 2013 20:38
Show Gist options
  • Select an option

  • Save gcr/7950956 to your computer and use it in GitHub Desktop.

Select an option

Save gcr/7950956 to your computer and use it in GitHub Desktop.
{
"metadata": {
"name": "How many CONSISTENT triplets are there?"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"source": [
"How many CONSISTENT triplets are there?"
]
},
{
"cell_type": "markdown",
"source": [
"For all triplets:\n",
" If consistent:\n",
" Count it\n",
"See if this number changes wrt. the data.\n",
"Theorem: If (a,b,c) is consistent, then (a,c,b) is not.\n",
"Corrolary: (a,b,c) does NOT imply anything about (c,a,b)\n",
"Emperical observation: The above implies NOTHING about ANY other triplet.\n",
"Triangle inequality does NOT hold for perceptual data, but what does it imply about triplets? \n",
"D(a,b)+D(b,c)>D(a,c)\n"
]
},
{
"cell_type": "markdown",
"source": [
"Definition: A triplet (a,b,c) is consistent iff. D(a,b) < D(a,c).\n",
"Theorem: Given k distinct points living in a metric space D, assuming\n",
"that no two pairs of points are equidistant, there are\n",
"\n",
"$$\\frac{k * (k-1) * (k-2)}{2}$$\n",
"\n",
"possible unique, consistent triplets.\n",
"\n",
"Proof: Consider every distinct tuple, (a,b,c). There are `k*(k-1)*(k-2)`\n",
"such tuples. Now note that because the distance matrix is symmetric,\n",
"EITHER (a,b,c) OR (a,c,b) is a CONSISTENT triplet.\n",
"\n",
"At LEAST one is consistent because D(a,b) != D(a,c); we assume that no\n",
"two pairs of points are equidistant and we assume all points are\n",
"distinct.\n",
"\n",
"At MOST one is consistent because if both (a,b,c) and (a,c,b) were\n",
"consistent, then Def. 1 implies that both the following statements\n",
"would be true:\n",
" - D(a,b) < D(a,c), and\n",
" - D(a,c) < D(a,b)\n",
"This cannot happen; any number cannot simultaneously be greater than\n",
"and less than another number.\n",
"\n",
"TODO: What about nonmetric similarity functions? There must be some\n",
"other geometric constraints that influence how many consistent\n",
"triplets we can generate.\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import sklearn.metrics\n",
"def count_violators(K, triplets):\n",
" violators = 0\n",
" for (a,b,c) in triplets:\n",
" if K[a,b] > K[a,c]:\n",
" violators += 1\n",
" return float(violators)\n",
"def all_triplets(K):\n",
" triplets = []\n",
" for a in xrange(len(K)):\n",
" for b in xrange(len(K)):\n",
" for c in xrange(len(K)):\n",
" if a!=b and b!=c and c!=a:\n",
" if K[a,b] < K[a,c]:\n",
" triplets.append( (a,b,c) )\n",
" else:\n",
" triplets.append( (a,c,b) )\n",
" uniq_triplets = sorted(list(set(triplets)))\n",
" print \"We have\",len(uniq_triplets),\"unique, consistent constraints.\"\n",
" return np.array(uniq_triplets)\n"
],
"language": "python",
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"points = np.random.randn(16,2)\n",
"len(all_triplets(sklearn.metrics.euclidean_distances(points,points)))"
],
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"We have 1680 unique, consistent constraints.\n"
]
},
{
"output_type": "pyout",
"prompt_number": 4,
"text": [
"1680"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"16*15*14 /2"
],
"language": "python",
"outputs": [
{
"output_type": "pyout",
"prompt_number": 5,
"text": [
"1680"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"points = np.random.randn(17,2)\n",
"len(all_triplets(sklearn.metrics.euclidean_distances(points,points))) "
],
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"We have 2040 unique, consistent constraints.\n"
]
},
{
"output_type": "pyout",
"prompt_number": 6,
"text": [
"2040"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"17*16*15/2"
],
"language": "python",
"outputs": [
{
"output_type": "pyout",
"prompt_number": 7,
"text": [
"2040"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"points = np.random.randn(20,2)\n",
"(20*19*18/2.)==len(all_triplets(sklearn.metrics.euclidean_distances(points,points)))"
],
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"We have 3420 unique, consistent constraints.\n"
]
},
{
"output_type": "pyout",
"prompt_number": 9,
"text": [
"True"
]
}
],
"prompt_number": 9
}
]
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment