
@naneau
Created April 16, 2010 10:26

# Design choices for Springbok

## Problem

We have been struggling for a while with how to deal with objects and child objects in MongoDB. The problem boils down to the following:

When an object in our domain (say, a project) has a number of child objects (say, milestones), how do we store them?

There are basically two ways of doing it.

  • We can embed child objects within their parent and use in-place inserts/updates to work with them. To get an embedded object, you have to fetch the parent and then retrieve the embedded object from it. You do this either by index or by an arbitrary, self-assigned identifier that has no intrinsic meaning in the backend. This is more in line with traditional "normalized" relational databases, since no data is duplicated, although sub-objects are of course foreign to SQL-based RDBMSes.
  • We can keep child objects in separate collections and use references to couple them. We then store information about the parent object with each child, so that when fetched it holds all the information relevant for display. In practice this usually means saving the parent's name or title string, combined with an identifier or reference. This approach gives us single-fetch access to the domain, instead of requiring the parent id combined with a separate id/offset for the child object. The downside is that whenever an object's properties are copied into another collection, updating that object also requires (perhaps asynchronous) updates to those copies.
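The following is a minimal sketch of the two storage shapes. Plain Python dicts serve as hypothetical in-memory stand-ins for MongoDB collections, and no driver is involved. The names `get_embedded_milestone`, `get_referenced_milestone` and `rename_project` are illustrative only. The sketch shows the lookup each option needs and the extra write the second option pays for.

```python
# Hypothetical in-memory stand-ins for MongoDB collections: dicts keyed by _id.

# Option 1: milestones embedded inside the project document.
embedded_projects = {
    "p1": {
        "_id": "p1",
        "name": "Springbok",
        "milestones": [{"title": "Alpha"}, {"title": "Beta"}],
    }
}


def get_embedded_milestone(project_id, index):
    """Needs both the parent id and the child's position inside the parent."""
    return embedded_projects[project_id]["milestones"][index]


# Option 2: milestones in their own collection, each carrying a copy of the
# parent's display name next to the reference.
projects = {"p1": {"_id": "p1", "name": "Springbok"}}
milestones = {
    "m1": {"_id": "m1", "title": "Alpha", "project": {"id": "p1", "name": "Springbok"}},
    "m2": {"_id": "m2", "title": "Beta", "project": {"id": "p1", "name": "Springbok"}},
}


def get_referenced_milestone(milestone_id):
    """A single fetch by the milestone's own id is enough for display."""
    return milestones[milestone_id]


def rename_project(project_id, new_name):
    """The cost of option 2: every duplicated copy of the name must be updated too."""
    projects[project_id]["name"] = new_name
    for milestone in milestones.values():
        if milestone["project"]["id"] == project_id:
            milestone["project"]["name"] = new_name
```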

## Service Layer

As a rule, we should end up with a system where we can swap out one set of mappers for another. This means our service layer should be designed so that it makes sense as an entry point to our business logic / domain. It also means it should be generic enough to work with both relational and non-relational persistence layers. At another level, our DomainObjects should be entirely unaware of their mappers and the persistence layer. Any logic coupling one type of object to another should preferably be expressed so that the mappers for each type of persistence layer can couple them in the way best suited to that layer.
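Below is one way to read "swap out one set of mappers for another", sketched in Python under some assumptions. `MilestoneMapper`, `InMemoryMilestoneMapper` and `MilestoneService` are hypothetical names, and an in-memory mapper stands in for a real Mongo or SQL one. The service and the `Milestone` domain object only ever see the abstract mapper interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Milestone:
    """A plain domain object: it knows nothing about mappers or storage."""
    title: str
    id: Optional[str] = None


class MilestoneMapper(ABC):
    """Hypothetical mapper interface; Mongo and SQL mappers would both implement it."""

    @abstractmethod
    def find(self, milestone_id: str) -> Optional[Milestone]: ...

    @abstractmethod
    def save(self, milestone: Milestone) -> Milestone: ...


class InMemoryMilestoneMapper(MilestoneMapper):
    """Stand-in persistence layer used here instead of a real database."""

    def __init__(self) -> None:
        self._rows: dict[str, Milestone] = {}

    def find(self, milestone_id: str) -> Optional[Milestone]:
        return self._rows.get(milestone_id)

    def save(self, milestone: Milestone) -> Milestone:
        # Assign a simple sequential id on first save.
        if milestone.id is None:
            milestone.id = str(len(self._rows) + 1)
        self._rows[milestone.id] = milestone
        return milestone


class MilestoneService:
    """Entry point to the domain; depends only on the abstract mapper."""

    def __init__(self, mapper: MilestoneMapper) -> None:
        self._mapper = mapper

    def create(self, title: str) -> Milestone:
        return self._mapper.save(Milestone(title=title))

    def get(self, milestone_id: str) -> Optional[Milestone]:
        return self._mapper.find(milestone_id)
```

A Mongo-backed mapper and an SQL-backed mapper would each subclass `MilestoneMapper`. Neither the service nor the domain object would need to change.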

The more generic approach to domain access may seem to make things more difficult at first. After all, it is certainly easier to make the service layer apply only to Mongo-based data access. However, for somebody charged with extending our application, that would make things less clear. Although it is highly unlikely we will ever switch persistence layers, a persistence-agnostic service layer will also force us to keep our application simpler and more logical. We should aim to have services for all the access points of our application. At the time of writing these are User, Project, Milestone and Ticket.

## Challenges

The challenge lies in designing our DomainObjects so that they make sense as entities, coupled with services that provide the easiest possible access to them. They should be "complete" when they leave their mappers. At the same time, it should be intuitive to couple them. Adding a milestone to a project, or modifying a milestone within a project, should take a single method call on either the DomainObject or the relevant Service.

There is a balance between things that need to be accessed and modified on their own and things that are embedded by default. Tags are a prime example of items that make little sense on their own: they exist only within the objects they are attached to. Yet at some point we may want to retrieve a list of all tags to auto-complete tag input.
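Here is a sketch of the tag auto-completion case. It assumes tags are stored only as an embedded list on each document; the `tickets` data and the `tag_suggestions` helper are hypothetical. Against MongoDB, a `distinct` query on the tags field should return roughly the same set of values, since `distinct` treats each element of an array field as a separate value. Here the same thing is done in plain Python.

```python
# Hypothetical documents with tags embedded directly; there is no tag collection.
tickets = [
    {"_id": 1, "title": "Login fails", "tags": ["auth", "bug"]},
    {"_id": 2, "title": "Add OAuth", "tags": ["auth", "feature"]},
    {"_id": 3, "title": "Slow search", "tags": ["performance", "bug"]},
]


def tag_suggestions(documents, prefix):
    """Collect the distinct embedded tags that start with `prefix`, sorted."""
    seen = set()
    for document in documents:
        for tag in document.get("tags", []):
            if tag.startswith(prefix):
                seen.add(tag)
    return sorted(seen)
```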

## Solutions

Taking the Milestone and Project types as an example, we would like to retrieve a milestone with a single identifier. We would also want a service for dealing with milestones separately from the project service. In that service, the milestone and project serve as parameters for functions that modify the milestone inside a project. Even if we decide to make milestones live inside a project on the backend, we want to be able to retrieve and modify them from the application as if they were separate entities.
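This sketch shows the service shape described above, assuming milestones are embedded in their project on the backend. `Project`, `Milestone` and `EmbeddedMilestoneService` are hypothetical. Callers pass the project and milestone as parameters and fetch a milestone by one id, without knowing where it is stored. The id here is a manually assigned, non-intrinsic one (a random UUID).

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Milestone:
    title: str
    id: Optional[str] = None


@dataclass
class Project:
    name: str
    id: str
    milestones: list = field(default_factory=list)  # embedded Milestone objects


class EmbeddedMilestoneService:
    """Hypothetical milestone service backed by embedded storage.

    Callers treat milestones as separate entities, even though this backend
    keeps them inside their project.
    """

    def __init__(self):
        self._projects = {}  # project id -> Project holding embedded milestones

    def register_project(self, project):
        self._projects[project.id] = project

    def add_milestone(self, project, milestone):
        # Assign a non-intrinsic identifier so the milestone can be fetched alone.
        milestone.id = uuid.uuid4().hex
        self._projects[project.id].milestones.append(milestone)
        return milestone

    def get_milestone(self, milestone_id):
        # The id says nothing about the parent, so this sketch scans every project.
        for project in self._projects.values():
            for milestone in project.milestones:
                if milestone.id == milestone_id:
                    return milestone
        return None

    def rename_milestone(self, project, milestone_id, title):
        for milestone in self._projects[project.id].milestones:
            if milestone.id == milestone_id:
                milestone.title = title
                return milestone
        raise KeyError(milestone_id)
```

Because the id carries no information about its parent, `get_milestone` has to search every project in this in-memory sketch. That is the cost the composite-identifier option avoids.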

When working with sub-objects, this means one of two things. We either manually assign a non-intrinsic identifier to a milestone, or give it an identifier based on the identifier of the project it belongs to, combined with an index. In the latter case the service layer does not have to be aware that it is reading or writing milestones that are part of a project. This solution would not, however, work for tickets: the size of any single document (be it milestone or project) is limited, and large numbers of child objects would make the parent too large.
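The second option, an identifier derived from the project's id plus an index, could look roughly like this. The `:` separator and the function names are assumptions for illustration. Any separator that cannot occur in a project id would do.

```python
def make_milestone_id(project_id: str, index: int) -> str:
    """Derive a milestone id from its project's id plus its position in the project."""
    if ":" in project_id:
        raise ValueError("project ids must not contain the separator ':'")
    return f"{project_id}:{index}"


def parse_milestone_id(milestone_id: str) -> tuple[str, int]:
    """Split a composite id back into (project_id, index)."""
    project_id, _, index = milestone_id.rpartition(":")
    if not project_id or not index.isdigit():
        raise ValueError(f"not a composite milestone id: {milestone_id!r}")
    return project_id, int(index)


def get_milestone(projects: dict, milestone_id: str) -> dict:
    """One fetch of the parent project, then index into its embedded milestones."""
    project_id, index = parse_milestone_id(milestone_id)
    return projects[project_id]["milestones"][index]
```

One caveat of this sketch: an index-based id stays valid only as long as milestones are never removed or reordered within the project.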

When it comes to the DomainObjects themselves, there are two techniques for making them carry the right information. Taking Tickets as an example, we can do one of the following:

  • Each ticket object comes with a full reference to the project it's in, and to its milestone. This means it takes more than one fetch to get a ticket. We could use lazy loading for these references, at the cost of added complexity. This will not work well for large sets of things that reference another large set (think comments), as it results in large numbers of fetches.
  • The ticket has properties like projectName and milestoneName or, alternatively, a project property holding both a name and an id. This approach makes the most sense in the context of MongoDB; it's essentially a precomputed join. It does make updates to projects more complicated, however. Some component must listen for events signaling project/milestone updates and update the references in the Ticket collection accordingly. Ultimately, this approach will scale better.
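The fetch-count trade-off between the two bullets can be made concrete with a hypothetical `CountingStore`, a dict-backed stand-in for MongoDB that counts lookups. It holds one ticket stored each way. The collection layouts and the `project_id`/`milestone_id` field names are assumptions.

```python
class CountingStore:
    """Hypothetical in-memory store that counts fetches, standing in for MongoDB."""

    def __init__(self, collections):
        self.collections = collections  # collection name -> {id: document}
        self.fetches = 0

    def find_one(self, collection, doc_id):
        self.fetches += 1
        return self.collections[collection][doc_id]


def load_ticket_with_references(store, ticket_id):
    """Option 1: the ticket holds only ids, so project and milestone need extra fetches."""
    ticket = dict(store.find_one("tickets", ticket_id))  # copy; don't mutate the stored doc
    ticket["project"] = store.find_one("projects", ticket["project_id"])
    ticket["milestone"] = store.find_one("milestones", ticket["milestone_id"])
    return ticket


def load_ticket_denormalized(store, ticket_id):
    """Option 2: parent names are copied into the ticket, a precomputed join."""
    return store.find_one("tickets", ticket_id)


normalized = CountingStore({
    "projects": {"p1": {"_id": "p1", "name": "Springbok"}},
    "milestones": {"m1": {"_id": "m1", "name": "Alpha"}},
    "tickets": {"t1": {"_id": "t1", "title": "Login fails",
                       "project_id": "p1", "milestone_id": "m1"}},
})

denormalized = CountingStore({
    "tickets": {"t1": {"_id": "t1", "title": "Login fails",
                       "project": {"id": "p1", "name": "Springbok"},
                       "milestone": {"id": "m1", "name": "Alpha"}}},
})
```

Loading a single ticket costs three fetches with full references and one with the precomputed join. For a list of N tickets, resolving references one by one would take roughly 1 + 2N fetches, against a single query for the denormalized form.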

Right now we have a branch that partially implements the latter, with listeners for save events in the service layer. This is not ideal, since these listeners only apply to the MongoDB backend.

I strongly feel that when working with a non-relational database, we should take advantage of its read speed. Modeling our database the way we have been doing up until now is bound to fail. We should instead focus on modeling the database after the way the application accesses data. Let's define the access points and what we expect back from each. Then we can model our database so that it gives us exactly that, and worry about consistency/updates later.

Objects should be in a usable state right after a single fetch; we should aim to make them "complete". Objects that belong to a parent will be saved with a "reference" object inside, holding both the parent's name and a MongoReference to it. We would set up a separate component (a MongoManager, but better named) that propagates updates to parent objects into those references. Ideally a simple configuration can describe which references need updating. Since a parent is typically identified by a single textual property, this shouldn't be too hard.
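Here is a minimal sketch of the configuration-driven update component described above. `REFERENCE_CONFIG` and `ReferenceManager` are hypothetical names. The reference is modeled as a plain `{"id": ..., "name": ...}` dict instead of a real MongoReference, and collections are in-memory dicts. Against a real MongoDB, each configuration rule would map to roughly one multi-document update, for example pymongo's `update_many` with a `$set` on the embedded name field.

```python
# Hypothetical configuration: for each parent type, which child collections embed
# a reference to it, under which field, and which textual property is copied.
REFERENCE_CONFIG = {
    "project": [
        {"collection": "milestones", "field": "project", "property": "name"},
        {"collection": "tickets", "field": "project", "property": "name"},
    ],
    "milestone": [
        {"collection": "tickets", "field": "milestone", "property": "name"},
    ],
}


class ReferenceManager:
    """Hypothetical stand-in for the 'MongoManager (but better named)' component.

    Keeps the denormalized parent names in child documents in sync.
    """

    def __init__(self, collections, config):
        self.collections = collections  # collection name -> {id: document}
        self.config = config

    def parent_saved(self, parent_type, parent):
        """Copy the parent's identifying property into every configured reference.

        Returns the number of child documents updated.
        """
        updated = 0
        for rule in self.config.get(parent_type, []):
            for child in self.collections.get(rule["collection"], {}).values():
                reference = child.get(rule["field"])
                if reference and reference["id"] == parent["_id"]:
                    reference[rule["property"]] = parent[rule["property"]]
                    updated += 1
        return updated
```

Keeping this logic in a separate, configuration-driven component means the Mongo-specific propagation could live alongside the Mongo mappers rather than in listeners on the service layer.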

dik-diks rule.
