Cloudant Query API - first impressions

The new Cloudant query API is pretty awesome. It basically looks like Mongo, which makes for much more readable and user-friendly queries than what you can get with standard map/reduce.

Index the field foo:

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_index -d '{
>     "index": {
>         "fields": ["foo"]
>     },
>     "name" : "foo-index",
>     "type" : "json"
> }'
{"result":"created"}

It creates a design document (sigh), but notice that the structure is different from usual:

cinnabar:~ nolan$ acurl -X GET https://pouch.cloudant.com/mydb/_all_docs?include_docs=true | json
{
  "total_rows": 1,
  "offset": 0,
  "rows": [
    {
      "id": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
      "key": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
      "value": {
        "rev": "1-17ec75b65cb1ca7d012cd3fcad323848"
      },
      "doc": {
        "_id": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
        "_rev": "1-17ec75b65cb1ca7d012cd3fcad323848",
        "language": "query",
        "views": {
          "foo-index": {
            "map": {
              "fields": {
                "foo": "asc"
              }
            },
            "reduce": "_count",
            "options": {
              "def": {
                "fields": [
                  "foo"
                ]
              },
              "w": 2
            }
          }
        }
      }
    }
  ]
}

Instead of building up a map function, "map" instead maps to an object describing the index structure. Which is great, because as far as I can tell, it'd be impossible to build a map function that covers all the possible selectors ($in, $nin, $gt, $lt, etc.). What options does, I'm not sure.

Now, let's put a doc with a foo field:

cinnabar:~ nolan$ acurl -X PUT https://pouch.cloudant.com/mydb/doc -d '{"foo": "bar"}'
{"ok":true,"id":"doc","rev":"1-4c6114c65e295552ab1019e2b046b10e"}

Check the list of indexes to make sure it's still there:

cinnabar:~ nolan$ acurl -X GET https://pouch.cloudant.com/mydb/_index | json
{
  "indexes": [
    {
      "ddoc": null,
      "name": "_all_docs",
      "type": "special",
      "def": {
        "fields": [
          {
            "_id": "asc"
          }
        ]
      }
    },
    {
      "ddoc": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
      "name": "foo-index",
      "type": "json",
      "def": {
        "fields": [
          {
            "foo": "asc"
          }
        ]
      }
    }
  ]
}

Find all docs where doc.foo is'bar':

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"foo": "bar"}}'| json
{
  "docs": [
    {
      "_id": "doc",
      "_rev": "1-4c6114c65e295552ab1019e2b046b10e",
      "foo": "bar"
    }
  ]
}

Edge case time! What happens if I ask for a field that's not indexed?

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"baz": "bar"}}'| json
{
  "error": "no_usable_index",
  "reason": "No index exists for this selector, try indexing one of: baz"
}

Good choice, Cloudant. By explicitly returning an error when a field isn't indexed, you protect the developer from accidentally writing unperformant queries. No need to do an EXPLAIN or EXPLAIN QUERY PLAN; the database will tell you if you're being a dumbass.

How about the _id field, though? It is technically indexed:

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"_id": "bar"}}'| json
{
  "docs": []
}
cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"_id": "doc"}}'| json
{
  "docs": [
    {
      "_id": "doc",
      "_rev": "1-4c6114c65e295552ab1019e2b046b10e",
      "foo": "bar"
    }
  ]
}

Awesome. Developers don't want to have to learn a separate API to query by _id, and it's great that they don't have to. Well done.

Other thoughts:

Kinda bizarre that we don't get total_rows or offset when using _find. offset isn't a big loss IMO (we don't even fully implement it for Pouch), but total_rows can be really convenient when doing pagination (e.g. "2 pages of results remaining").
Adam Kocoloski said on IRC that they're open to porting this feature to CouchDB, but it won't be in until post-2.0.
I'm interested in writing a Pouch plugin for this even before it lands in Couch, but it'll require some modifications to core map/reduce. I've already had to fork map/reduce for the quick-search plugin; not too keen to do it again for this.
That being said, this is going to be a beast to implement in PouchDB. Queries like { "selector": {"Actor_name": "Robert De Niro"}, "sort": [{"Movie_year": "asc", "Movie_runtime": "desc"}]} have a lot going on at once. Our toIndexableString idea was pretty clever, but it's not enough to cover every possible combination. In-memory sorting would be unperformant on larger databases, but it may be the only quick and dirty adapter-agnostic solution.
I am not a big fan of design documents, but the fact that this allows you to query by _id obviates my biggest complaint about design docs, which is that they muddy the _all_docs results.
Cloudant released this in the nick of time. I was worried that Lucene would become the de-facto query language, since it was the only one that allowed you to write complex queries that human beings could actually understand.
One view per design doc is a good default.

nolanlawson/query_api.md

nolanlawson commented Jun 26, 2014

Uh oh!