Skip to content

Instantly share code, notes, and snippets.

@nolanlawson
Last active August 29, 2015 14:03
Show Gist options
  • Save nolanlawson/10452539f07b12a73d37 to your computer and use it in GitHub Desktop.
Save nolanlawson/10452539f07b12a73d37 to your computer and use it in GitHub Desktop.
Cloudant Query API - first impressions

The new Cloudant query API is pretty awesome. It basically looks like Mongo, which makes for much more readable and user-friendly queries than what you can get with standard map/reduce.

Index the field foo:

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_index -d '{
>     "index": {
>         "fields": ["foo"]
>     },
>     "name" : "foo-index",
>     "type" : "json"
> }'
{"result":"created"}

It creates a design document (sigh), but notice that the structure is different from usual:

cinnabar:~ nolan$ acurl -X GET https://pouch.cloudant.com/mydb/_all_docs?include_docs=true | json
{
  "total_rows": 1,
  "offset": 0,
  "rows": [
    {
      "id": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
      "key": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
      "value": {
        "rev": "1-17ec75b65cb1ca7d012cd3fcad323848"
      },
      "doc": {
        "_id": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
        "_rev": "1-17ec75b65cb1ca7d012cd3fcad323848",
        "language": "query",
        "views": {
          "foo-index": {
            "map": {
              "fields": {
                "foo": "asc"
              }
            },
            "reduce": "_count",
            "options": {
              "def": {
                "fields": [
                  "foo"
                ]
              },
              "w": 2
            }
          }
        }
      }
    }
  ]
}

Instead of building up a map function, "map" instead maps to an object describing the index structure. Which is great, because as far as I can tell, it'd be impossible to build a map function that covers all the possible selectors ($in, $nin, $gt, $lt, etc.). What options does, I'm not sure.

Now, let's put a doc with a foo field:

cinnabar:~ nolan$ acurl -X PUT https://pouch.cloudant.com/mydb/doc -d '{"foo": "bar"}'
{"ok":true,"id":"doc","rev":"1-4c6114c65e295552ab1019e2b046b10e"}

Check the list of indexes to make sure it's still there:

cinnabar:~ nolan$ acurl -X GET https://pouch.cloudant.com/mydb/_index | json
{
  "indexes": [
    {
      "ddoc": null,
      "name": "_all_docs",
      "type": "special",
      "def": {
        "fields": [
          {
            "_id": "asc"
          }
        ]
      }
    },
    {
      "ddoc": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
      "name": "foo-index",
      "type": "json",
      "def": {
        "fields": [
          {
            "foo": "asc"
          }
        ]
      }
    }
  ]
}

Find all docs where doc.foo is'bar':

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"foo": "bar"}}'| json
{
  "docs": [
    {
      "_id": "doc",
      "_rev": "1-4c6114c65e295552ab1019e2b046b10e",
      "foo": "bar"
    }
  ]
}

Edge case time! What happens if I ask for a field that's not indexed?

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"baz": "bar"}}'| json
{
  "error": "no_usable_index",
  "reason": "No index exists for this selector, try indexing one of: baz"
}

Good choice, Cloudant. By explicitly returning an error when a field isn't indexed, you protect the developer from accidentally writing unperformant queries. No need to do an EXPLAIN or EXPLAIN QUERY PLAN; the database will tell you if you're being a dumbass.

How about the _id field, though? It is technically indexed:

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"_id": "bar"}}'| json
{
  "docs": []
}
cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"_id": "doc"}}'| json
{
  "docs": [
    {
      "_id": "doc",
      "_rev": "1-4c6114c65e295552ab1019e2b046b10e",
      "foo": "bar"
    }
  ]
}

Awesome. Developers don't want to have to learn a separate API to query by _id, and it's great that they don't have to. Well done.

Other thoughts:

  • Kinda bizarre that we don't get total_rows or offset when using _find. offset isn't a big loss IMO (we don't even fully implement it for Pouch), but total_rows can be really convenient when doing pagination (e.g. "2 pages of results remaining").
  • Adam Kocoloski said on IRC that they're open to porting this feature to CouchDB, but it won't be in until post-2.0.
  • I'm interested in writing a Pouch plugin for this even before it lands in Couch, but it'll require some modifications to core map/reduce. I've already had to fork map/reduce for the quick-search plugin; not too keen to do it again for this.
  • That being said, this is going to be a beast to implement in PouchDB. Queries like { "selector": {"Actor_name": "Robert De Niro"}, "sort": [{"Movie_year": "asc", "Movie_runtime": "desc"}]} have a lot going on at once. Our toIndexableString idea was pretty clever, but it's not enough to cover every possible combination. In-memory sorting would be unperformant on larger databases, but it may be the only quick and dirty adapter-agnostic solution.
  • I am not a big fan of design documents, but the fact that this allows you to query by _id obviates my biggest complaint about design docs, which is that they muddy the _all_docs results.
  • Cloudant released this in the nick of time. I was worried that Lucene would become the de-facto query language, since it was the only one that allowed you to write complex queries that human beings could actually understand.
  • One view per design doc is a good default.
@rnewson
Copy link

rnewson commented Jun 26, 2014

total_rows for views has always been the total number of rows in the entire view, not those that match your query parameters, so it's never been very useful for page indication (in that context, offset is not very useful either). (Cloudant Search on the other hand gives the number of actual matches).

@nolanlawson
Copy link
Author

Ah, good point. In that case, it might be useful to invent something new for the case where you do a query with limit and want to know how many results are left. It'd be a shame to have to depend on Lucene for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment