Given a set of documents, each associated with multiple tags, how can I retrieve the documents tagged with an arbitrary set of tags?
In order to simplify the problem I have assumed that the array of tags in each document is ordered and that the key to search for documents will be an ordered array of tags as well.
Create the db:
$ DB="http://localhost:5984/db"
$ curl -XPUT $DB
Create some documents with tags:
$ curl -XPOST $DB -d '{"_id":"One", "tags":["A", "B", "C"]}' -H "Content-Type: application/json"
$ curl -XPOST $DB -d '{"_id":"Two", "tags":["B", "C", "D"]}' -H "Content-Type: application/json"
$ curl -XPOST $DB -d '{"_id":"Six", "tags":["C", "D"]}' -H "Content-Type: application/json"
Now create the view:
$ echo '
{
"_id":"_design/all",
"language":"javascript",
"views":{
"by_tags":{
"map":"function(doc) {\n
if (doc.tags == []) return;\n\n
var emit_sequence = function(base, disp) {\n
if (disp.length > 1) {\n
emit(base.concat(disp[0]), 1);\n
emit_sequence(base.concat(disp[0]), disp.slice(1, disp.length));\n
emit_sequence(base, disp.slice(1, disp.length));\n
} else if (disp.length == 1) {\n
emit(base.concat(disp[0]), 1);\n
}\n
}\n\n
emit_sequence([], doc.tags);\n
}"
}
}
}
' > view.json
$ curl -XPOST $DB -d @view.json -H "Content-Type: application/json"
This view use a recursive function so that will emit all the possible combination of tags required to identify the document with any of the possible ordered subset of tags.
So, if the only document "One" was inserted, the list of keys will be:
$ curl -XGET $DB/_design/all/_view/by_tags
{"total_rows":7,"offset":0,"rows":[
{"id":"One","key":["A"],"value":1},
{"id":"One","key":["A","B"],"value":1},
{"id":"One","key":["A","B","C"],"value":1},
{"id":"One","key":["A","C"],"value":1},
{"id":"One","key":["B"],"value":1},
{"id":"One","key":["B","C"],"value":1},
{"id":"One","key":["C"],"value":1}
]}
With the three given sample documents, if we want now retrieve the list of document tagged with "B" and "C", the request will be:
$ curl -XGET $DB/_design/all/_view/by_tags?key='\["B","C"\]'
{"total_rows":17,"offset":6,"rows":[
{"id":"One","key":["B","C"],"value":1},
{"id":"Two","key":["B","C"],"value":1}
]}
Or, if we are looking for all the documents with tag "C":
$ curl -XGET $DB/_design/all/_view/by_tags?key='\["C"\]'
{"total_rows":17,"offset":10,"rows":[
{"id":"One","key":["C"],"value":1},
{"id":"Six","key":["C"],"value":1},
{"id":"Two","key":["C"],"value":1}
]}
It is worth to note that the view computed in this way will be quite big, the number of rows for each document is related to the number of tags. Given t as the tags size for a document, the rows count can be computed with the following formula:
rows = 2^t - 1
I am sure this is not the most efficient way to solve the original problem, but it works for me. Thanks to Claudio for his help.
Hi,
Nice work. I already got it running but the thing is, How do I query the keys in url format? thanks.. I'm using couchdb-odm by the way.