@dch
Forked from amedeo/gist:820412
Created March 28, 2012 06:42
CouchDB and multiple tags

The problem

Given a set of documents, each associated with multiple tags, how can I retrieve the documents tagged with an arbitrary set of tags?

My solution

To simplify the problem, I assume that the array of tags in each document is ordered (sorted consistently) and that the key used to search for documents is an array of tags in the same order.

Create the db:

$ DB="http://localhost:5984/db"
$ curl -XPUT $DB

Create some documents with tags:

$ curl -XPOST $DB -d '{"_id":"One", "tags":["A", "B", "C"]}' -H "Content-Type: application/json"
$ curl -XPOST $DB -d '{"_id":"Two", "tags":["B", "C", "D"]}' -H "Content-Type: application/json"
$ curl -XPOST $DB -d '{"_id":"Six", "tags":["C", "D"]}' -H "Content-Type: application/json"

Now create the view:

$ echo '
{
  "_id":"_design/all",
  "language":"javascript",
  "views":{
    "by_tags":{
      "map":"function(doc) {\n
          if (!doc.tags || doc.tags.length == 0) return;\n\n

          var emit_sequence = function(base, disp) {\n
            if (disp.length > 1) {\n
              emit(base.concat(disp[0]), 1);\n
              emit_sequence(base.concat(disp[0]), disp.slice(1, disp.length));\n
              emit_sequence(base, disp.slice(1, disp.length));\n
            } else if (disp.length == 1) {\n
              emit(base.concat(disp[0]), 1);\n
            }\n
          }\n\n

          emit_sequence([], doc.tags);\n
        }"
    }
  }
}
' > view.json

$ curl -XPOST $DB -d @view.json -H "Content-Type: application/json"

This view uses a recursive function to emit every ordered subset of a document's tags, so that the document can be found by querying with any ordered subset of its tags.

So, if only the document "One" had been inserted, the list of keys would be:

$ curl -XGET $DB/_design/all/_view/by_tags

{"total_rows":7,"offset":0,"rows":[
{"id":"One","key":["A"],"value":1},
{"id":"One","key":["A","B"],"value":1},
{"id":"One","key":["A","B","C"],"value":1},
{"id":"One","key":["A","C"],"value":1},
{"id":"One","key":["B"],"value":1},
{"id":"One","key":["B","C"],"value":1},
{"id":"One","key":["C"],"value":1}
]}
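
The recursion can also be tried outside CouchDB. The following is a minimal sketch, not part of the view itself: `collectKeys` is a hypothetical helper that pushes each key into an array where the real map function would call `emit`. For the tags of document "One" it produces the same seven keys listed above.

```javascript
// Stand-alone sketch of the map function's recursion, runnable in Node.js.
// collectKeys is a hypothetical helper; inside CouchDB, emit() is provided by the view server.
function collectKeys(tags) {
  var keys = [];
  function emitSequence(base, disp) {
    if (disp.length === 0) return;
    var next = base.concat(disp[0]);
    keys.push(next);                    // stands in for emit(next, 1)
    emitSequence(next, disp.slice(1));  // subsets that include disp[0]
    emitSequence(base, disp.slice(1));  // subsets that skip disp[0]
  }
  emitSequence([], tags);
  return keys;
}

var keysForOne = collectKeys(["A", "B", "C"]);
console.log(JSON.stringify(keysForOne));
```

Each call emits one key and then branches twice, once keeping the first remaining tag and once dropping it, which is why every non-empty ordered subset appears exactly once.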

With the three sample documents loaded, to retrieve the documents tagged with both "B" and "C", the request is:

$ curl -XGET $DB/_design/all/_view/by_tags?key='\["B","C"\]'

{"total_rows":17,"offset":6,"rows":[
{"id":"One","key":["B","C"],"value":1},
{"id":"Two","key":["B","C"],"value":1}
]}

Or, if we are looking for all the documents with tag "C":

$ curl -XGET $DB/_design/all/_view/by_tags?key='\["C"\]'

{"total_rows":17,"offset":10,"rows":[
{"id":"One","key":["C"],"value":1},
{"id":"Six","key":["C"],"value":1},
{"id":"Two","key":["C"],"value":1}
]}
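
Because the lookup only matches when the key is in the same order as the emitted keys, a client would need to sort the requested tags before building the URL. A minimal sketch of that step follows; `tagQueryUrl` is a hypothetical helper, and it assumes the tags stored in each document were sorted with the same default JavaScript string ordering.

```javascript
// Hypothetical helper: build the view URL for an arbitrary set of tags.
// Assumes document tags were stored sorted with the same default string order.
function tagQueryUrl(db, tags) {
  var key = tags.slice().sort();  // copy first so the caller's array is not reordered
  return db + "/_design/all/_view/by_tags?key=" +
    encodeURIComponent(JSON.stringify(key));
}

var url = tagQueryUrl("http://localhost:5984/db", ["C", "B"]);
console.log(url);
```

Percent-encoding the key also avoids having to escape the square brackets for curl by hand.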

Note that a view computed this way can become quite large: the number of rows emitted for each document grows exponentially with its number of tags. If t is the number of distinct tags in a document, the number of rows it contributes is:

rows = 2^t - 1
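
The formula follows from the shape of the recursion: each step emits one key and then recurses twice on the remaining tags. As a quick sanity check, here is a sketch where `rowsFor` is a hypothetical function that mirrors that structure, assuming all tags in a document are distinct.

```javascript
// rowsFor(t) mirrors the recursion: one emit, then two calls on the remaining t - 1 tags.
// Hypothetical helper for illustration; assumes the t tags are distinct.
function rowsFor(t) {
  if (t === 0) return 0;
  return 1 + 2 * rowsFor(t - 1);
}

for (var t = 1; t <= 10; t++) {
  console.log(t + " tags -> " + rowsFor(t) + " rows");
}
```

A document with 3 tags contributes 7 rows, while one with 10 tags already contributes 1023, so this approach is best suited to documents with only a handful of tags.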

I am sure this is not the most efficient way to solve the original problem, but it works for me. Thanks to Claudio for his help.
