| A list of features that we want to see in CouchDB. Needs to be voted on so that it can become a priority queue. | |
| User Facing Features | |
| ==================== | |
| 1. Conflicts are the rule, not the exception | |
| All previous versions of CouchDB hide conflicts by default (selecting | |
| an arbitrary but consistent winning revision). Expert users can find | |
| and resolve conflicts. | |
| Instead, expose the true picture by default. This includes: | |
| * Reading a document with conflicts returns all conflicting versions, | |
| not just the winner. This might manifest as the ?conflicts=true | |
| response or could be a 300 (Multiple Choices) response. | |
| * Always accept a write (as long as it passes all validate_doc_update | |
| functions). This means that no response will give a 409 (Conflict) by | |
| default. You can still insist on a matching revision by using the | |
| If-Match header. | |
| * _rev is frequently assumed to be a user-facing revision/versioning | |
| system, and our efforts to convince users otherwise have failed. Embrace | |
| this too and rename the field to _mvcc. | |
| 2. Replace Futon | |
| A modern interface that has first class support for all features | |
| (proper editing for validate_doc_update, show and list functions, | |
| etc). | |
| 3. Improve the user and security model | |
| * Support distributed identity systems such as OpenID | |
| * Allow for easier external authentication | |
| * Finer grained authorization (instead of the binary _admin or not) | |
| * Instead of exposing /_users as a database, design an API to cover | |
| all expected operations. | |
| Fine-grained authorization would make it possible to grant read and write | |
| access independently, among other things. Specifically, it should be possible | |
| to grant the ability to write but not read. | |
| 4. Remove reserved metadata from documents | |
| CouchDB treats some fields specially (_id, _rev, _attachments), | |
| requiring a transformation process when reading and writing documents. | |
| Removing the fields would allow higher performance and alternative | |
| data types. A question remains as to where they would go, as not all | |
| map to a standard HTTP header (_rev maps neatly to ETag, | |
| though). Custom HTTP headers are an obvious solution; are there others? | |
| 5. Built-in map functions | |
| To complement the built-in reduce functions (_sum, _count, _stats) for | |
| common use cases. | |
| 6. DSL for index creation, validation functions, etc. | |
| The DSL would be very simple, deliberately not capable of expressing | |
| all possible algorithms, but always efficient to evaluate | |
| within the native VM. This will be faster in general and also avoids | |
| managing a pool of couchjs processes. | |
| 7. Support CORS | |
| 8. Support WebSockets | |
| 9. Support EventSource | |
| 10. Support SPDY | |
| 11. Support richer reduce functions | |
| Support any reduce function whose output grows no more than logarithmically with the input, even if that output is substantially larger than the current 200-byte threshold. | |
| 12. Richer querying model | |
| While CouchDB views are powerful, they are not as capable as | |
| relational queries. This is largely deliberate as the fully relational | |
| model is hard to scale. | |
| One method to improve things (and provide the ability to sort by | |
| value) is to introduce chained MapReduce (currently only available on Cloudant). | |
| This item also includes any other enhancement to the kinds of querying that | |
| CouchDB can perform. | |
| 13. Partial updates of documents | |
| It should be possible to change just a subset of a document's | |
| properties without needing to write an update function. | |
| 14. Partial reads of documents | |
| It should be possible to read just a subset of a document's properties | |
| without needing to write a show function. | |
| 15. Create an exclusive namespace for databases | |
| Databases have a fixed position in the URL (/<dbname>/) but share it | |
| with many other items (/_log, /_replicate, etc). Fix this by | |
| introducing an exclusive namespace for dbs (e.g., /_db/<dbname>). | |
| 16. Improve replication interoperability between implementations and | |
| versions | |
| * Introduce a tiered replication model, starting with a very simple | |
| 'dump' and 'load' tier, all the way up to a highly optimized, but | |
| complex, protocol that reduces redundant data transmission. | |
| * This work would also form the basis of an export/import feature for | |
| incremental and complete backups. | |
| 17. Enhance background task management | |
| Currently, replication tasks can be cancelled, though awkwardly. Compaction | |
| tasks and view building tasks cannot, short of restarting the server | |
| or using remote shell access to the Erlang VM. | |
| Provide a consistent and simple API for cancelling any running | |
| task. This API will also provide status/progress information where | |
| appropriate and pause/resume where possible too. | |
| 18. Documentation | |
| Documentation is scattered, dated, and incomplete. Couchbase have | |
| donated their improved docs. We will incorporate these into the new | |
| house style, fill in any gaps, and commit to updating documentation in | |
| line with new releases. | |
| 19. Global changes feed | |
| One or both of: | |
| * A feed of server events like db creation, update and deletion. | |
| * A federated changes feed for a selected set of databases and changes feeds. | |
| 20. Allow database renaming | |
| Self-explanatory, but the feature is complicated by the BigCouch | |
| merger, as renaming a sharded database is not atomic without effort. | |
| 21. Database "aliases" (symlinks) | |
| Visiting a database symlink will seamlessly redirect to the target database. | |
| 22. _changes feed for views | |
| It should be possible to subscribe to view changes the same way we can | |
| subscribe to database changes. This will enable many useful things, | |
| chained map-reduce being a notable one. | |
| 23. Per-db overrides of server-wide settings | |
| Allow db-specific overrides for otherwise server-wide configuration | |
| settings, where sensible. | |
| Developer Facing Features | |
| ========================= | |
| 1. OTP compliance/refactoring | |
| 2. Different HTTP engine (webmachine -> cowboy/yaws/etc) | |
| 3. Have hard dependencies on SpiderMonkey versions. Also simplifies the build. | |
| 4. Test suites for different versions of replication, file formats, etc. | |
| 5. Move attachments out of database files (which removes make_blocks) | |
| 6. Plugin/addon/module interface | |
| 7. View server protocol enhancements/refactoring | |
| 8. Make .ini config files optional: (1) move defaults into the code, (2) instead of local/default, ship a fully complete config with all of its lines commented out | |
| 9. Database corruption detection and repair | |
| While CouchDB's append-only model is very safe, underlying issues with | |
| filesystems and hardware can still corrupt databases. CouchDB can: | |
| * add checksums on everything (btree nodes, documents, etc) | |
| * ship a tool to verify all checksums. | |
| * include a repair tool (that extracts everything extractable) | |
| * perhaps include ECC information to allow recovery from corruption. |
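As a rough illustration of the checksum idea in item 9 of the developer-facing list, the sketch below frames each record in an append-only buffer with its length and a CRC32, then scans the buffer to verify every record and to salvage whatever precedes the first damaged one. The framing layout and the names `append_record` and `verify_all` are hypothetical; this is not CouchDB's on-disk format.

```python
import struct
import zlib

# Hypothetical framing: 4-byte payload length + 4-byte CRC32, both big-endian.
HEADER = struct.Struct(">II")


def append_record(buf: bytearray, payload: bytes) -> None:
    """Append one checksummed record to the end of the buffer."""
    buf += HEADER.pack(len(payload), zlib.crc32(payload))
    buf += payload


def verify_all(buf):
    """Scan from the start; return (intact payloads, offset of first bad record or None)."""
    good = []
    offset = 0
    while offset < len(buf):
        if offset + HEADER.size > len(buf):
            return good, offset  # truncated header
        length, crc = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        payload = bytes(buf[start:start + length])
        if len(payload) != length or zlib.crc32(payload) != crc:
            return good, offset  # truncated or corrupted payload
        good.append(payload)
        offset = start + length
    return good, None
```

A verify tool would report the returned offset, while a repair tool would keep the intact payloads and continue searching past the damaged region.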
4. Remove reserved metadata from documents
Removing the fields would allow higher performance and alternative
data types. A question remains as to where they would go, as not all
map to a standard HTTP header (_rev maps neatly to ETag,
though). Custom HTTP headers is an obvious solution, are there others?
I think it would be very handy to have an object with the metadata and the data. For example the following document:
{ _id: "ID", _rev: "1-abc", foo: "bar", ... }
would become:
{ _id: "ID", _rev: "1-abc", _data: { foo: "bar", ... } }
or, better:
{ id: "ID", rev: "1-abc", data: { foo: "bar", ... } }
This would allow for arrays, or even primitive types, as data:
{ id: "ID", rev: "1-abc", data: [ "bar", ... ] }
and it would not require parsing of HTTP headers. This is also very similar to the response of a view, where the returned object has metadata clearly separated from the document, when include_docs=true.
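A minimal sketch of that envelope idea in Python, assuming the `id`/`rev`/`data` field names from the example above. The helper names `to_envelope` and `from_envelope` are made up for illustration and are not part of any CouchDB API.

```python
RESERVED = ("_id", "_rev")  # metadata fields currently stored inline in the document


def to_envelope(doc: dict) -> dict:
    """Split a current-style document into metadata plus a separate data member."""
    data = {k: v for k, v in doc.items() if k not in RESERVED}
    return {"id": doc["_id"], "rev": doc.get("_rev"), "data": data}


def from_envelope(env: dict) -> dict:
    """Rebuild a current-style document; only possible when data is an object."""
    if not isinstance(env["data"], dict):
        raise TypeError("array or primitive data has no inline representation")
    doc = {"_id": env["id"], **env["data"]}
    if env.get("rev") is not None:
        doc["_rev"] = env["rev"]
    return doc
```

The `TypeError` branch shows the asymmetry: the envelope can hold arrays or primitives, but those cannot be converted back to today's inline form.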
TTL: auto-expiring records that are completely removed after expiration, similar to Couchbase.
I'd add a definitions section for the following:
CORS - Cross-origin resource sharing (CORS) is a web browser technology specification, which defines ways for a web server to allow its resources to be accessed by a web page from a different domain. http://en.wikipedia.org/wiki/Cross-origin_resource_sharing
EventSource - Server-sent events is a technology for providing push notifications from a server to a browser client in the form of DOM events. The Server-Sent Events EventSource API is now being standardized as part of HTML5 by the W3C. http://en.wikipedia.org/wiki/Server-sent_events
SPDY - SPDY (pronounced speedy) is an experimental networking protocol developed primarily at Google for transporting web content. Although not currently a standard protocol, the group developing SPDY has stated publicly that it is working toward standardization (available now as an Internet Draft), and has reference implementations available in both Google Chrome and Mozilla Firefox. SPDY is similar to HTTP, with particular goals to reduce web page load latency and improve web security. SPDY achieves reduced latency through compression, multiplexing, and prioritization. The name is not an acronym, but is a shortened version of the word "speedy". SPDY is a trademark of Google. http://en.wikipedia.org/wiki/SPDY
I would include the two lists in any upcoming vote, perhaps as separate votes.
A few other comments:
On number 4 in the first list: this seems to be only a partial description. Remove the metadata from the record and do what with it? How would it be accessed and used, especially _id? This is likely to have a major impact on any project using CouchDB.
On number 3: hard dependencies on SpiderMonkey. Should this not be discussed more? With performance being a big issue, V8 may be a better choice, or a set of native Erlang routines, or some other derivative.