at SWIB13 in Hamburg (2013-11-25)
Notes for the hands-on steps done by the participants during the workshop.
- Part I: Getting to know CouchDB (13:00-14:00)
- Hello
- me = ssp
- https://github.com/ssp/
- earthlingsoft / SUB
- who are you?
- what are you doing?
- how do you think CouchDB could be interesting for you?
- What is CouchDB?
- History
- 2005 started by Damien Katz (Lotus Notes background)
- »Cluster of unreliable commodity hardware Data Base«
- ~2011 development stalled: CouchDB vs Couchbase
- development is active again: v1.3.0 was released in April 2013
- outlook: integrate features from BigCouch and GeoCouch, and improve extensibility, which is tricky because of Erlang
- other *ouchDBs exist
- NoSQL
- No SQL
- schema-less
- Difference to SQL Databases
- pros and cons?
- what do we need schemas and structured queries for?
- Document Database
- JavaScript & JSON
- all data stored as JSON objects
- answers in JSON objects
- URL parameters in JSON format
- everybody familiar with JSON?
- there are a few quirks about it
- JavaScript is used for algorithms provided by the user
- although the database is implemented in Erlang
- note on other implementations?
- for/on/of the Web
- not just JavaScript
- http as the API
- REST-like interface
- Map/Reduce
- for index building and analysis
- Google origin
- Replication
- supported by CouchDB
- many interesting applications (e.g. mobile sync)
- A look at Futon
- built into CouchDB at /_utils
- simple browser GUI to look at the database
- DO IT
- who brought their own CouchDB?
- database list
- look into database
- create a record
- look at fields:
- _id
- _rev
- update a record:
- observe _rev
- note history
- http: the »proper« interface to CouchDB
- REST-like
- uses http verbs GET/PUT/POST/DELETE
- can use it with curl
- -X POST [set the http verb]
- -D - [include headers in output]
- -H "Content-Type: application/json" [set content type!]
- -d '{"key": "value"}' [data to send to the server; use -d @/file/path to upload file content]
- http://localhost:5984/swib-demo/document-id
- authentication using --netrc (with a ~/.netrc file) or user:password@ in the URL
- when querying use -H "Accept: application/json" to get the correct Content-Type
- »Dev HTTP Client« for Chrome
- pretty and powerful graphical client
- haven’t found an equally powerful Firefox extension yet
- Getting Data in and out of CouchDB
- create a database
- curl -D - --netrc -X PUT http://localhost:5984/swib-demo
- 201 Created
- {"ok":true}
- get info on the new database
- curl -D - --netrc -X GET http://localhost:5984/swib-demo
- {"db_name":"swib-demo","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"data_size":0,"instance_start_time":"1385179213594752","disk_format_version":6,"committed_update_seq":0}
- create a JSON object:
- {"type":"event", "name":"SWIB 2013", "location":{"de":"Hamburg"}, "url":"http://swib.org/swib13"}
- PUT JSON into DB as document »swib2013«
- curl -D- --netrc -X PUT -H "Content-Type: application/json" -d '{"type":"event", "name":"SWIB 2013", "location":{"de":"Hamburg"}, "url":"http://swib.org/swib13"}' http://localhost:5984/swib-demo/swib2013
- 201 Created
- {"ok":true,"id":"swib2013","rev":"1-4eacd9c3b527eeb4ae2219bf553d477c"}
- retrieve record from DB
- curl -D - --netrc -X GET http://localhost:5984/swib-demo/swib2013
- 200 OK
- {"_id":"swib2013","_rev":"1-4eacd9c3b527eeb4ae2219bf553d477c","type":"event","name":"SWIB 2013","location":{"de":"Hamburg"},"url":"http://swib.org/swib13"}
- delete record from DB:
- DELETE giving the current document’s rev:
- curl -D - --netrc -X DELETE 'http://localhost:5984/swib-demo/swib2013?rev=1-4eacd9c3b527eeb4ae2219bf553d477c'
- alternatively:
- curl -D - --netrc -X DELETE -H "If-Match: 1-4eacd9c3b527eeb4ae2219bf553d477c" http://localhost:5984/swib-demo/swib2013
- 200 OK
- {"ok":true,"id":"swib2013","rev":"2-658addb8314908834baa6247ccfe1f61"}
- GET
- curl -D - --netrc -X GET http://localhost:5984/swib-demo/swib2013
- 404 Object Not Found
- {"error":"not_found","reason":"deleted"}
- Re-add the deleted record using PUT
- do not need the revision information because it’s been deleted
- but revision ID increases
- curl -D- --netrc -X PUT -H "Content-Type: application/json" -d '{"type":"event", "name":"SWIB 2013", "location":{"de":"Hamburg"}, "url":"http://swib.org/swib13"}' http://localhost:5984/swib-demo/swib2013
- 201 Created
- {"ok":true,"id":"swib2013","rev":"3-73932ffd3c46a6a1a16236df249689f2"}
- Change record in DB by sending a modified version:
- need to add the _rev of the existing record to overwrite it
- without rev:
- curl -D- --netrc -X PUT -H "Content-Type: application/json" -d '{"type":"event", "name":"SWIB 2013", "location":{"de":"Hamburg Wilhelmsburg"}, "url":"http://swib.org/swib13"}' "http://localhost:5984/swib-demo/swib2013"
- 409 Conflict
- {"error":"conflict","reason":"Document update conflict."}
- curl -D- --netrc -X PUT -H "Content-Type: application/json" -d '{"type":"event", "name":"SWIB 2013", "location":{"de":"Hamburg Wilhelmsburg"}, "url":"http://swib.org/swib13"}' "http://localhost:5984/swib-demo/swib2013?rev=3-73932ffd3c46a6a1a16236df249689f2"
- 201 Created
- {"ok":true,"id":"swib2013","rev":"4-ae91e484504d0df3da5ab0e12b498dd9"}
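The update rule above (a write to an existing, non-deleted document must name its current _rev, otherwise 409 Conflict) can be modelled with a toy in-memory store. This is a hypothetical sketch for illustration only, not CouchDB's implementation: real revision IDs are "generation-hash" strings whose hash CouchDB computes, while the class name ToyStore and the "placeholder" suffix here are made up.

```javascript
// Toy model of CouchDB's optimistic concurrency check (hypothetical, not CouchDB code).
class ToyStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body, deleted }
  }

  // Next revision string "N-placeholder"; only the generation number N is meaningful here.
  static nextRev(previousRev) {
    const generation = previousRev ? parseInt(previousRev.split("-")[0], 10) + 1 : 1;
    return `${generation}-placeholder`;
  }

  put(id, body, rev) {
    const current = this.docs.get(id);
    // Live documents can only be overwritten with their current rev;
    // missing or deleted documents can be (re)created without one.
    if (current && !current.deleted && current.rev !== rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const newRev = ToyStore.nextRev(current && current.rev);
    this.docs.set(id, { rev: newRev, body, deleted: false });
    return { status: 201, ok: true, id, rev: newRev };
  }

  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current || current.deleted) return { status: 404, error: "not_found" };
    if (current.rev !== rev) return { status: 409, error: "conflict" };
    // A deletion is itself a new revision, so the generation keeps increasing.
    const newRev = ToyStore.nextRev(current.rev);
    this.docs.set(id, { rev: newRev, body: {}, deleted: true });
    return { status: 200, ok: true, id, rev: newRev };
  }
}

const store = new ToyStore();
const created = store.put("swib2013", { type: "event" });                // 201, rev 1-placeholder
const conflict = store.put("swib2013", { type: "event" });               // 409, no rev given
const updated = store.put("swib2013", { type: "event" }, created.rev);   // 201, rev 2-placeholder
```

Like the curl session, re-creating a deleted document needs no rev, but its generation continues from the deletion.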
- POST a bunch of documents to the database: _bulk_docs
- need to send a JSON object with an array of documents: {"docs": […]}
- curl -D - --netrc -X POST -H "Content-Type: application/json" -d '{"docs":[{"_id":"swib2012","type":"event","name":"SWIB 2012","location":{"de":"Köln", "en":"Cologne"}}, {"type":"event","name":"SWIB 2014"}, {"_id":"swib2013","type":"event","name":"SWIB 2013","location":{"de":"Hamburg"},"url":"http://swib.org/swib13","hashtag":"swib2013"} ]}' "http://localhost:5984/swib-demo/_bulk_docs"
- 201 Created
- [{"ok":true,"id":"swib2012","rev":"1-698cd57203f826306f6b66a835f4bab1"},{"ok":true,"id":"9bd2a1b1e59a18aa615f2e4f23000853","rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"},{"id":"swib2013","error":"conflict","reason":"Document update conflict."}]
- → granular update feedback, very powerful/flexible
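Because _bulk_docs answers with one result object per submitted document, a client has to inspect every entry rather than rely on the HTTP status alone. A minimal sketch (the helper names bulkDocsBody and splitBulkResponse are hypothetical) that builds the request body and splits the response into saved documents and errors, using the response shown above as input:

```javascript
// Wrap an array of documents in the {"docs": [...]} envelope expected by _bulk_docs.
function bulkDocsBody(docs) {
  return JSON.stringify({ docs });
}

// Split a _bulk_docs response array into saved documents and per-document errors.
function splitBulkResponse(results) {
  const saved = [];
  const failed = [];
  for (const entry of results) {
    if (entry.error) {
      failed.push({ id: entry.id, error: entry.error, reason: entry.reason });
    } else {
      saved.push({ id: entry.id, rev: entry.rev });
    }
  }
  return { saved, failed };
}

// Response from the _bulk_docs example above.
const response = [
  {"ok":true,"id":"swib2012","rev":"1-698cd57203f826306f6b66a835f4bab1"},
  {"ok":true,"id":"9bd2a1b1e59a18aa615f2e4f23000853","rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"},
  {"id":"swib2013","error":"conflict","reason":"Document update conflict."}
];
const { saved, failed } = splitBulkResponse(response);
console.log(saved.length, failed.map((f) => f.id)); // 2 [ 'swib2013' ]
```

The failed swib2013 entry would need its current _rev added before being sent again.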
- GET all documents from a database: _all_docs
- just metadata:
- curl -D- --netrc -X GET http://localhost:5984/swib-demo/_all_docs
- 200 OK
- {"total_rows":3,"offset":0,"rows":[ {"id":"9bd2a1b1e59a18aa615f2e4f23000853","key":"9bd2a1b1e59a18aa615f2e4f23000853","value":{"rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"}}, {"id":"swib2012","key":"swib2012","value":{"rev":"1-698cd57203f826306f6b66a835f4bab1"}}, {"id":"swib2013","key":"swib2013","value":{"rev":"4-ae91e484504d0df3da5ab0e12b498dd9"}} ]}
- also document content: include_docs=true
- curl -D - --netrc -X GET "http://localhost:5984/swib-demo/_all_docs?include_docs=true"
- 200 OK
- {"total_rows":3,"offset":0,"rows":[ {"id":"9bd2a1b1e59a18aa615f2e4f23000853","key":"9bd2a1b1e59a18aa615f2e4f23000853","value":{"rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"},"doc":{"_id":"9bd2a1b1e59a18aa615f2e4f23000853","_rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2","type":"event","name":"SWIB 2014"}}, {"id":"swib2012","key":"swib2012","value":{"rev":"1-698cd57203f826306f6b66a835f4bab1"},"doc":{"_id":"swib2012","_rev":"1-698cd57203f826306f6b66a835f4bab1","type":"event","name":"SWIB 2012","location":{"de":"Köln","en":"Cologne"}}}, {"id":"swib2013","key":"swib2013","value":{"rev":"4-ae91e484504d0df3da5ab0e12b498dd9"},"doc":{"_id":"swib2013","_rev":"4-ae91e484504d0df3da5ab0e12b498dd9","type":"event","name":"SWIB 2013","location":{"de":"Hamburg Wilhelmsburg"},"url":"http://swib.org/swib13"}} ]}
- attachments:
- attach a file to a document
- need to pass:
- file name appended to path
- document revision
- attachment MIME Type
- attachment
- curl -D - --netrc -X PUT -H "Content-Type: image/png" -H "If-Match: 1-698cd57203f826306f6b66a835f4bab1" --data-binary @IIsaBurito.png http://localhost:5984/swib-demo/swib2012/kitty.png
- 100 Continue
- 201 Created
- {"ok":true,"id":"swib2012","rev":"2-e38f040510caae7aaaf3ec45e7816f63"}
- attachment is not delivered back in JSON, just a »stub« in the _attachments object
- curl -D - -X GET "http://localhost:5984/swib-demo/swib2012/"
- 200 OK
- {"_id":"swib2012","_rev":"2-e38f040510caae7aaaf3ec45e7816f63","type":"event","name":"SWIB 2012","location":{"de":"Köln","en":"Cologne"},"_attachments":{"kitty.png":{"content_type":"image/png","revpos":2,"digest":"md5-1TF4Rlk5CuCasqC1TfvHCg==","length":231873,"stub":true}}}
- for the attachment use: curl -D - http://localhost:5984/swib-demo/swib2012/kitty.png
- include the stub when updating the document the next time to ensure attachments are preserved without having to re-upload them
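The stub rule amounts to: start from the document as fetched (it carries _rev and the _attachments stubs), change only the fields you need and send the whole object back. A hypothetical helper sketch; the name updatedDoc and the added hashtag field are illustrative, and refusing to touch underscore fields is a design choice of this sketch (top-level fields starting with "_" are reserved by CouchDB).

```javascript
// Produce an updated version of a fetched document, keeping _rev and the
// _attachments stubs so CouchDB retains the attachments without a re-upload.
function updatedDoc(fetched, changes) {
  const copy = JSON.parse(JSON.stringify(fetched)); // deep copy of the fetched JSON
  for (const [field, value] of Object.entries(changes)) {
    if (field.startsWith("_")) {
      throw new Error(`refusing to change reserved field ${field}`);
    }
    copy[field] = value;
  }
  return copy;
}

// Document as returned by GET /swib-demo/swib2012 above.
const fetched = {
  "_id": "swib2012",
  "_rev": "2-e38f040510caae7aaaf3ec45e7816f63",
  "type": "event",
  "name": "SWIB 2012",
  "location": {"de": "Köln", "en": "Cologne"},
  "_attachments": {"kitty.png": {"content_type": "image/png", "revpos": 2,
    "digest": "md5-1TF4Rlk5CuCasqC1TfvHCg==", "length": 231873, "stub": true}}
};
// The result still contains _rev and the kitty.png stub and can be PUT back to /swib-demo/swib2012.
const body = updatedDoc(fetched, { hashtag: "swib2012" });
```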
- play around a little / everybody comfortable? / can always do this in Futon
- Example datasets
- bothmer.json
- couch-marc.json
- ct_sample.json
- gnd-smith.json
- Part II: (14:00-15:15)
- design documents
- stored as documents _design/NAME (use the NAME swib), e.g. _design/swib
- views
- stored in the design document’s views object
- called with subpath _view/NAME (use the NAME type) of design document, e.g. /swib-demo/_design/swib/_view/type
- Futon has a simple interface for creating and testing them
- map
- map function is called for every document and creates index entries for it
- e.g. extract a field
- defined by a JavaScript function: function (doc) {}
- use the emit() command to add something to the index
- use map function
- function(doc) { emit(doc.type); }
- can also pass a second parameter to emit
- it becomes the value of the index row for that key
- the value is passed to the reduce function and is also returned when querying the view
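To see what a map function produces without a server, it can be called with a stand-in emit() that collects rows. This is a simplified, hypothetical harness (runMap and the sample documents are made up): CouchDB sorts rows with its own collation and by document ID for equal keys, which plain comparison only approximates for simple keys like these.

```javascript
// Run a CouchDB-style map function (given as source text) over documents and
// return the rows sorted by key, then by document ID, roughly like a view query.
function runMap(mapSource, docs) {
  const rows = [];
  let currentId = null;
  // Stand-in for CouchDB's emit(); a missing key or value becomes null.
  const emit = (key, value) => {
    rows.push({
      id: currentId,
      key: key === undefined ? null : key,
      value: value === undefined ? null : value
    });
  };
  // Compile the function source with emit in scope.
  const mapFn = new Function("emit", `return (${mapSource});`)(emit);
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc);
  }
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  return rows.sort((a, b) => cmp(a.key, b.key) || cmp(a.id, b.id));
}

// Hypothetical sample documents.
const docs = [
  { _id: "swib2013", type: "event", name: "SWIB 2013" },
  { _id: "swib2012", type: "event", name: "SWIB 2012" },
  { _id: "kitty", type: "image" }
];
console.log(runMap("function(doc) { emit(doc.type); }", docs));
```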
- Query views: Map (examples use the documents from bothmer.json)
- all IDs and values:
- curl -D - -X GET "http://localhost:5984/ct/_design/swib/_view/fields"
- 200 OK
- {"total_rows":28843,"offset":0,"rows":[ {"id":"9bd2a1b1e59a18aa615f2e4f230012f0","key":false,"value":null}, {"id":"9bd2a1b1e59a18aa615f2e4f23001424","key":false,"value":null}, …]}
- paging:
- skip=X [skip the first X rows]
- limit=Y [return at most Y rows]
- curl -D - -X GET "http://localhost:5984/ct/_design/swib/_view/fields?skip=1000&limit=3"
- 200 OK
- {"total_rows":28843,"offset":1000,"rows":[ {"id":"9bd2a1b1e59a18aa615f2e4f231df8c9","key":false,"value":null}, {"id":"9bd2a1b1e59a18aa615f2e4f231e0589","key":false,"value":null}, {"id":"9bd2a1b1e59a18aa615f2e4f231e1436","key":false,"value":null} ]}
- only a specific key:
- key=KEY (as JSON; include strings in double quotes)
- curl -D - -X GET 'http://localhost:5984/ct/_design/swib/_view/fields?key=%5B"title"%5D'
- 200 OK
- {"total_rows":28843,"offset":23439,"rows":[ {"id":"9bd2a1b1e59a18aa615f2e4f2308b69e","key":["title"],"value":null}, {"id":"9bd2a1b1e59a18aa615f2e4f2317c25c","key":["title"],"value":null}, {"id":"9bd2a1b1e59a18aa615f2e4f2324aff0","key":["title"],"value":null}, {"id":"9bd2a1b1e59a18aa615f2e4f23267bef","key":["title"],"value":null}, {"id":"9bd2a1b1e59a18aa615f2e4f2328739c","key":["title"],"value":null}, {"id":"9bd2a1b1e59a18aa615f2e4f2328b01f","key":["title"],"value":null}, {"id":"9bd2a1b1e59a18aa615f2e4f23301de6","key":["title"],"value":null} ]}
- a specific range of keys:
- startkey=
- endkey=
- inclusive_end=true
- ordering
- descending=true
- also return the full documents
- include_docs=true
- adds a doc key to each result row object
- curl -D - -X GET 'http://localhost:5984/swib-demo/_design/swib/_view/type?skip=2000&limit=1&include_docs=true'
- 200 OK
- {"total_rows":2077,"offset":2000,"rows":[ {"id":"a142db5e52dec15e28589f522d3f28e9","key":"wappen","value":null,"doc":{"_id":"a142db5e52dec15e28589f522d3f28e9","_rev":"1-f10b16c3c7ad916a49f1039f0dc67aa9","infourl":"http://de.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=url|size|mime&format=json&titles=Datei:Treskow-Wappen.png\n","url":"http://upload.wikimedia.org/wikipedia/de/d/dd/Treskow-Wappen.png","filename":"Treskow-Wappen.png","height":787,"width":600,"mime":"image/png","descriptionurl":"http://de.wikipedia.org/wiki/Datei:Treskow-Wappen.png","type":"wappen","size":58566}} ]}
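key, startkey and endkey must be given as JSON, so building query strings by hand is error-prone (quotes around strings, brackets for array keys). A small hypothetical helper that JSON-encodes and URL-encodes every value; for the numeric and boolean parameters above (skip, limit, descending, inclusive_end, include_docs) JSON encoding yields the plain form anyway.

```javascript
// Build a view query string from a plain object.
// JSON.stringify quotes strings and serialises arrays; numbers and booleans stay as-is.
function viewQuery(params) {
  return Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
}

const base = "http://localhost:5984/ct/_design/swib/_view/fields";
console.log(`${base}?${viewQuery({ key: ["title"] })}`);
console.log(`${base}?${viewQuery({ skip: 1000, limit: 3 })}`);
```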
- reduce
- reduce function condenses the values of many index rows into a single value (one per key when grouping)
- e.g. function(keys, values, rereduce) { return sum(values); }
- already built in:
- _count
- _sum
- _stats
- demo in Futon: use reduce to find out how many documents of each type are in the DB
- set reduce function of _design/swib/_view/type to: _count
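A custom reduce function also receives a rereduce flag: when it is true, keys is null and values are results of earlier reduce calls rather than emitted values. A hand-written equivalent of _count therefore has to sum partial counts on rereduce. The sketch below calls such a function the way CouchDB may, on two chunks and then a rereduce; the chunking is an assumption for illustration, and sum() is a local stand-in for the helper of the same name that CouchDB's JavaScript view server provides.

```javascript
// Local stand-in for CouchDB's built-in sum() helper.
const sum = (values) => values.reduce((total, v) => total + v, 0);

// Hand-written equivalent of the built-in _count reduce.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls
    return sum(values);
  }
  // values are the emitted values; count them
  return values.length;
}

// keys are [key, docid] pairs; values are null as with emit(doc.type).
const firstChunk = countReduce([["event", "swib2012"], ["event", "swib2013"]], [null, null], false); // 2
const secondChunk = countReduce([["event", "9bd2a1b1e59a18aa615f2e4f23000853"]], [null], false);      // 1
const total = countReduce(null, [firstChunk, secondChunk], true);                                    // 3
```

Returning values.length on rereduce would count chunks instead of documents, which is the usual mistake this flag guards against.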
- Query views: Reduce
- group
- curl -D - -X GET 'http://localhost:5984/swib-demo/_design/swib/_view/type?group=true'
- 200 OK
- {"rows":[ {"key":"allianzpartner","value":503}, {"key":"bothmerbothmer","value":23}, {"key":"event","value":3}, {"key":"namensträger","value":889}, {"key":"wappen","value":659} ]}
- non-obvious but very powerful:
- we can map to an array and then pick the number of components that should be equal
- function(doc) { var century = null; if (doc.von) { century = doc.von.substr(0,2); } emit([doc.type, century]); }
- curl -D - -X GET 'http://localhost:5984/swib-demo/_design/swib/_view/type?group=true'
- 200 OK
- {"rows":[ {"key":["allianzpartner","11"],"value":2}, {"key":["allianzpartner","12"],"value":1}, {"key":["allianzpartner","13"],"value":13}, {"key":["allianzpartner","14"],"value":30}, {"key":["allianzpartner","15"],"value":82}, {"key":["allianzpartner","16"],"value":93}, {"key":["allianzpartner","17"],"value":108}, {"key":["allianzpartner","18"],"value":171}, {"key":["allianzpartner","19"],"value":3}, {"key":["bothmerbothmer","15"],"value":23}, {"key":["namensträger","11"],"value":4}, {"key":["namensträger","12"],"value":10}, {"key":["namensträger","13"],"value":39}, {"key":["namensträger","14"],"value":52}, {"key":["namensträger","15"],"value":116}, {"key":["namensträger","16"],"value":177}, {"key":["namensträger","17"],"value":246}, {"key":["namensträger","18"],"value":245}, {"key":["wappen","18"],"value":659} ]}
- can get the same result as before using group_level=1
- curl -D - -X GET 'http://localhost:5984/swib-demo/_design/swib/_view/type?group=true&group_level=1'
- 200 OK
- {"rows":[ {"key":["allianzpartner"],"value":503}, {"key":["bothmerbothmer"],"value":23}, {"key":["namensträger"],"value":889}, {"key":["wappen"],"value":659} ]}
- example:
- for dates you could emit the timestamp
- if you need grouping by year / month / day it may be helpful to emit [2013, 5, 28] and query with group_level=1 (year), 2 (month) or 3 (day)
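The date idea combines an array-key map function with group_level. The sketch below emulates what group_level does to _count results for a few dates; the documents and their date field are hypothetical, and groupLevelCount (truncate each key, then count equal neighbours) is a simplification of CouchDB's grouping that assumes the keys arrive sorted, as view rows do.

```javascript
// Collect emitted keys locally instead of building a real index.
const keys = [];
const emit = (key) => keys.push(key);

// CouchDB-style map function for documents with a hypothetical ISO 8601 "date" field.
const mapByDay = function (doc) {
  if (doc.date) {
    var d = new Date(doc.date);
    emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate()]);
  }
};
[{ date: "2013-05-28" }, { date: "2013-05-29" }, { date: "2013-11-25" }].forEach((doc) => mapByDay(doc));

// Emulate ?group_level=N with a _count reduce: truncate each key to its
// first N components and count rows per truncated key.
function groupLevelCount(sortedKeys, level) {
  const groups = [];
  for (const key of sortedKeys) {
    const prefix = key.slice(0, level);
    const last = groups[groups.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(prefix)) {
      last.value += 1;
    } else {
      groups.push({ key: prefix, value: 1 });
    }
  }
  return groups;
}

console.log(JSON.stringify(groupLevelCount(keys, 2)));
// [{"key":[2013,5],"value":2},{"key":[2013,11],"value":1}]
```

Using numbers rather than strings for month and day keeps the sort order numeric, so month 11 sorts after month 5.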
- Map/Reduce Summary
- quite simple
- very powerful
- requires different thinking than SQL, possibly simpler if you are used to things like Solr
- no JOIN
- Intermission: couchapp
- http://couchapp.org/page/index
- using Futon for editing any of the following bits of design documents can be a bit of a pain thanks to escaping issues
- the command line tool couchapp can help with that
- you can clone the design document into a file/folder structure on your hard drive
- use your favourite editor
- then push the design document back into CouchDB
- a bit like git, really
- try to install it
- mkdir myCouchDesign; cd myCouchDesign
- couchapp init
- couchapp clone http://localhost:5984/swib-demo/_design/swib
- set .couchapprc to
- { "env" : { "default" : { "db" : "http://localhost:5984/swib-demo" } } }
- also check out the newer couchapp alternatives listed on the website to see which one suits you the best
- show interface
- a way to implement different output formats
- in particular for different MIME types
- function (doc, req) {}
- return object with
- headers:
- Content-Type:
- body:
- Text
- json:
- JavaScript object
- base64:
- binary data
- example:
- function (doc, req) { return { "headers": {"Content-Type": "text/plain"}, "body": "Hello World, this is Document ID: " + doc._id }; }
- curl -D - -X GET 'http://localhost:5984/swib-demo/_design/swib/_show/test/swib2013'
- 200 OK
- Content-Type: text/plain
- Hello World, this is Document ID: swib2013
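Show functions are plain JavaScript, so they can be checked outside CouchDB by calling them with a document and a request object. A hypothetical harness for the example above; the request object here is a minimal stub (CouchDB passes a much richer one), and the null branch is an addition of this sketch for calls to _show/test without a document ID, where CouchDB passes doc = null.

```javascript
// The show function from above, plus a (hypothetical) branch for doc === null.
const helloShow = function (doc, req) {
  if (!doc) {
    return { "code": 404, "headers": {"Content-Type": "text/plain"}, "body": "No document given" };
  }
  return { "headers": {"Content-Type": "text/plain"}, "body": "Hello World, this is Document ID: " + doc._id };
};

// Call it as CouchDB would for /swib-demo/_design/swib/_show/test/swib2013.
const response = helloShow({ _id: "swib2013", type: "event" }, { headers: {} });
console.log(response.headers["Content-Type"]); // text/plain
console.log(response.body); // Hello World, this is Document ID: swib2013
```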
- includes automatic MIME type handling
- register MIME types with an internal name using registerType(INTERNALNAME, MIMETYPES)
- common MIME types text/json/xml/html/atom/… are predefined
- provide output for them using provides(INTERNALNAME, function() { return {}; });
- example:
function(doc, req) {
  // JSON: return the document itself
  provides('json', function() { return {'json': doc}; });
  // HTML: render the document as a nested definition list
  provides('html', function() {
    var listItem = function (key, value) {
      var result = '';
      if (key) { result += '<dt>' + key + '</dt><dd>'; }
      if (typeof value === 'object') {
        result += '<dl>';
        for (var i in value) {
          // skip fields starting with an underscore (_id, _rev, _attachments)
          if (i[0] !== '_') { result += listItem(i, value[i]); }
        }
        result += '</dl>';
      } else {
        result += value;
      }
      if (key) { result += '</dd>'; }
      return result;
    };
    return '<html><head><title>'.concat(doc._id, '</title>', '<style type="text/css">dt { font-weight:bold; }</style></head>', '<body><h1>Document »' + doc._id + '«</h1>', listItem(undefined, doc), '</body></html>');
  });
  // register an additional MIME type under the internal name text-json
  registerType('text-json', 'text/json');
  provides('text-json', function() { return toJSON(doc); });
}
- curl -D - -H "Accept: text/html" -X GET 'http://localhost:5984/swib-demo/_design/swib/_show/content-types/swib2013'
- 200 OK
- Vary: Accept
- Content-Type: text/html; charset=utf-8
- <title>swib2013</title><style type="text/css">dt { font-weight:bold; }</style>
- type: event
- name: ELAG 2013
- location:
- nl: Gent
- en: Ghent
- url: http://swib2013.org/
- hashtag: swib2013
- Idea: can we use this for a LOD server?
- configure the redirection headers as needed
- create some triples for a MARC record?
- e.g. the ones in couch-marc
- list interface
- show / document :: list/view
- can also use provides() / registerType()
- call as /DB/_design/DESIGN/_list/LIST/VIEW, e.g.
- useful for a quick CSV export
- can send output in bits
- start({'headers': {}}) to set the response headers
- send() can be called repeatedly to output the result in chunks
- Applications:
- simple CSV export
- dot file output for GraphViz
- KML output
- TeX output?
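A CSV export list function might look like the sketch below. start(), getRow() and send() are the functions CouchDB provides to list functions; to run the example outside CouchDB they are replaced by hypothetical stubs, and the column choice (id, key) and the quoting rule are assumptions of this sketch.

```javascript
// Stubs standing in for CouchDB's list-function API (hypothetical, for local testing).
const rows = [
  { id: "swib2012", key: "event", value: null },
  { id: "a142db5e52dec15e28589f522d3f28e9", key: "wappen", value: null }
];
let output = "";
let startArgs = null;
const start = (response) => { startArgs = response; };
const send = (chunk) => { output += chunk; };
const getRow = () => rows.shift() || null;

// List function: one CSV line per view row.
const csvList = function (head, req) {
  start({ "headers": { "Content-Type": "text/csv; charset=utf-8" } });
  // Quote every field and double embedded quotes (RFC 4180 style).
  var quote = function (value) {
    return '"' + String(value).replace(/"/g, '""') + '"';
  };
  send("id,key\r\n");
  var row;
  while ((row = getRow())) {
    send(quote(row.id) + "," + quote(row.key) + "\r\n");
  }
};

csvList({ total_rows: 2, offset: 0 }, {});
console.log(output);
```

In CouchDB the same function would be stored under lists/NAME in the design document and called via _list/NAME/VIEW.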
- validation
- validate_doc_update
- each design document can have one
- all of them are applied
- modularity
- e.g.: enforce that each document has a type field:
- function(newDoc, oldDoc, userCtx, secObj) { if (!newDoc._deleted) { if (!newDoc.type) { throw({"forbidden": "documents need to have a type"}); } } }
- simply return to accept the document
- throw({'forbidden': 'Explanation'}) when not accepting
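Because validate_doc_update is ordinary JavaScript, it can be exercised locally before being stored in a design document. A minimal harness, assuming the function from above; the helper check, the userCtx stub and the test documents are hypothetical.

```javascript
// validate_doc_update function from above.
const requireType = function (newDoc, oldDoc, userCtx, secObj) {
  if (!newDoc._deleted) {
    if (!newDoc.type) {
      throw({"forbidden": "documents need to have a type"});
    }
  }
};

// Returns null if the document would be accepted, otherwise the thrown object.
function check(validate, newDoc, oldDoc) {
  try {
    validate(newDoc, oldDoc || null, { name: null, roles: [] }, {});
    return null;
  } catch (rejection) {
    return rejection;
  }
}

console.log(check(requireType, { _id: "swib2013", type: "event" }));  // null
console.log(check(requireType, { _id: "untyped" }));                   // { forbidden: 'documents need to have a type' }
console.log(check(requireType, { _id: "swib2013", _deleted: true })); // null
```

Deletions are let through deliberately: a deletion stub has no type field, so checking it would make documents impossible to delete.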
- use cases
- in web pages:
- DS
- edfu Analyse
- other formats
- bothmer / GraphViz
- crcg with python / KML / GraphViz
- Part III: (16:00-19:00)
- own little project with CouchDB
- we have three data sets available
- did anybody bring their own? / wants to create some?
- Ideas?
- LOD?
- Replication
- CouchDB includes a replicator
- »eventual consistency«
- conceptually a good match for today’s frequently disconnected smartphones
- clone data to the phone, to be fully accessible at any time
- re-sync when network is available
- replication setup stored in _replicator database
- slightly different in older CouchDB versions
- {"_id": "my_rep", "source": "http://myserver.com:5984/foo", "target": "bar", "create_target": true }
- show in Futon
- can replicate continuously
- replication can be filtered
- filter functions at filters/NAME
- return true/false to indicate whether the document should be replicated
- function(doc, req) { if(doc._deleted) { return true; } if (doc.von) { if (parseInt(doc.von, 10) > 1900) { return false; } } return true; }
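The filter can likewise be checked locally on a few documents before a replication is set up. The documents below are hypothetical; von is assumed to be a year string as in the bothmer data, and the req stub only mimics the shape of CouchDB's request object.

```javascript
// Replication filter from above: skip documents starting after 1900.
const beforeNineteenHundred = function (doc, req) {
  if (doc._deleted) { return true; }
  if (doc.von) {
    if (parseInt(doc.von, 10) > 1900) { return false; }
  }
  return true;
};

const docs = [
  { _id: "a", von: "1850" },
  { _id: "b", von: "1920" },
  { _id: "c", _deleted: true },
  { _id: "d" }
];
const replicated = docs.filter((doc) => beforeNineteenHundred(doc, { query: {} })).map((doc) => doc._id);
console.log(replicated); // [ 'a', 'c', 'd' ]
```

Passing deleted documents through ensures that deletions still reach the target database.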
- Changes feed
- (replication is based on it)
- continuous query also possible
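With feed=continuous, the _changes response stays open and delivers one JSON object per line; with the heartbeat parameter set, empty lines are sent to keep the connection alive. A sketch of a line-buffering parser (createChangesParser is a hypothetical helper; the HTTP streaming itself is left out) that remembers the latest seq so a client could resume with since=:

```javascript
// Incrementally parse a continuous _changes feed: text arrives in arbitrary
// chunks, each complete line is one JSON object, empty lines are heartbeats.
function createChangesParser(onChange) {
  let buffer = "";
  let latestSeq = null;
  return {
    push(chunk) {
      buffer += chunk;
      let newline;
      while ((newline = buffer.indexOf("\n")) !== -1) {
        const line = buffer.slice(0, newline).trim();
        buffer = buffer.slice(newline + 1);
        if (line === "") continue; // heartbeat
        const message = JSON.parse(line);
        if ("last_seq" in message) {
          latestSeq = message.last_seq; // feed closed, e.g. after a timeout
        } else {
          latestSeq = message.seq;
          onChange(message);
        }
      }
    },
    lastSeq() {
      return latestSeq;
    }
  };
}

const seen = [];
const parser = createChangesParser((change) => seen.push(change.id));
// A change split across two chunks, with a heartbeat line in between.
parser.push('{"seq":1,"id":"swib2013","changes":[{"rev":"1-4eacd9c3b527eeb4ae2219bf553d477c"}]}\n\n{"seq":2,"id":"swib');
parser.push('2013","changes":[{"rev":"2-658addb8314908834baa6247ccfe1f61"}],"deleted":true}\n');
console.log(seen, parser.lastSeq()); // [ 'swib2013', 'swib2013' ] 2
```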
- Bonus: CouchDB River for ElasticSearch
- https://github.com/elasticsearch/elasticsearch-river-couchdb
- sudo ./plugin -install elasticsearch/elasticsearch-river-couchdb/1.2.0
- to use JS in the River, also need to install lang-javascript:
- sudo ./plugin -install elasticsearch/elasticsearch-lang-javascript/1.4.0
- curl -X PUT http://localhost:9200/ct
- curl -X PUT 'http://localhost:9200/_river/ct/_meta' -d '{ "type" : "couchdb", "couchdb" : { "host" : "localhost", "port" : 5984, "db" : "ct", "filter" : null, "script" : "ctx.doc.ssp = 13" }, "index" : { "index" : "ct", "type" : "ct", "bulk_size" : "100", "bulk_timeout" : "10ms" } }'
- Bonus: CouchDB Lucene
- separate Java application that integrates with CouchDB
- https://github.com/rnewson/couchdb-lucene
- install / config
- brew install couchdb-lucene
- launch it
- configure CouchDB
- external fti /usr/bin/python /usr/local/Cellar/couchdb-lucene/0.9.0/tools/couchdb-external-hook.py
- httpd_db_handlers _fti {couch_httpd_external, handle_external_req, <<"fti">>}
- httpd_global_handlers _fti {couch_httpd_proxy, handle_proxy_req, <<"http://127.0.0.1:5985">>}
- add to design document at path fulltext/INDEXNAME/index
- index function:
function(doc) {
  var result = new Document();
  // do not index design documents
  if (!doc._id.match(/^(_design)/)) {
    // recursively index all fields; nested fields get dotted names such as location.de
    var index = function (obj, keyPrefix) {
      for (var key in obj) {
        var value = obj[key];
        switch (typeof(value)) {
          case 'object':
            index(value, (keyPrefix ? keyPrefix + "." : "") + key);
            break;
          case 'function':
            break;
          default:
            // add the value to the default field and to a stored field named after its path
            result.add(value);
            var fieldName;
            if (obj.constructor === Array) { fieldName = keyPrefix; }
            else { fieldName = (keyPrefix ? keyPrefix + "." : "") + key; }
            result.add(value, {"field": fieldName, "store": "yes"});
            break;
        }
      }
    };
    index(doc, "");
    // also index the attachments
    if (doc._attachments) {
      for (var i in doc._attachments) { result.attachment("default", i); }
    }
  }
  return result;
}
- query at DB/_fti/_design/DESIGN/INDEXNAME
- with q=LUCENEQUERY
- http://localhost:5984/ct/_fti/_design/swib/everything?q=Hamburg
- Bonus: TouchDB
- CouchDB compatible library for iOS, Android, Mac, …
- includes replication
- Bonus: hoodie
Thanks for your attention.