@danielballan
Last active September 22, 2022 20:46
Databroker Migration Guide

What?

This is a draft of a guide for migrating from Databroker v1.x to Databroker v2.x (currently in prerelease). The data storage does not change; only the way it is accessed changes. It is possible to run Databroker v1.x and v2.x against the same MongoDB concurrently. Databroker 1.x was effectively a plugin to Intake. Databroker 2.x refactors Databroker as a plugin to Tiled and drops any dependency on Intake.

Databroker 2.x supports backward-compatible* usage:

from databroker import Broker
db = Broker.named("xyz")

as well as access via the Tiled Python API, illustrated below.

*Some methods in Databroker 1.x cannot be supported, but we find that the vast majority of user code runs unchanged.
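
As a rough sketch of what "runs unchanged" can look like, the function below relies only on `db[-1]` and the header's `start` document, both long-standing Databroker 1.x idioms. The function name `latest_run_summary` is hypothetical, and `plan_name` is only assumed to be present in your start documents, so it is looked up defensively.

```python
def latest_run_summary(db):
    """Return (uid, scan_id, plan_name) for the most recent run.

    `db` is expected to be a Broker, e.g. Broker.named("xyz").
    """
    header = db[-1]        # most recent run, as in Databroker 1.x
    start = header.start   # the RunStart document (dict-like)
    # uid is always present; the other keys are looked up defensively.
    return start["uid"], start.get("scan_id"), start.get("plan_name")
```

With a configured profile, this would be called as `latest_run_summary(Broker.named("xyz"))`.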

Why?

By reimagining Databroker as a service we get the following advantages:

  • It is possible to access Databroker data from any language, not just Python, via HTTP.
  • It is possible to enforce granular access controls.
  • Databroker data can be transcoded into many formats.
  • Databroker data can be served alongside data, such as analysis results, that does not lend itself to Bluesky's event-based data model.
  • For many workloads, it is much faster.

How?

Install

pip install --upgrade --pre databroker[all]

To test/debug: use direct access

In this mode, the "server" and the client run in the same process. Data is passed between them via Python function calls. There is no actual networking. This is useful for debugging.

# ~/.config/tiled/profiles/test.yml
xyz:
  direct:
    authentication:
      allow_anonymous_access: true
    trees:
    - tree: databroker.mongo_normalized:Tree.from_uri
      path: /
      args:
        uri: mongodb://{hostname}:{port}/{database}
        asset_registry_uri: mongodb://{hostname}:{port}/{database}  # may be omitted if it's the same as uri above
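
If you manage several profiles, the file above can be generated rather than typed by hand. The following is a minimal sketch using only the standard library; the helper names `mongo_uri` and `render_direct_profile` are hypothetical, and the host, port, and database values are placeholders to replace with your own.

```python
def mongo_uri(hostname, port, database):
    """Fill in the mongodb://{hostname}:{port}/{database} template."""
    return f"mongodb://{hostname}:{port}/{database}"


def render_direct_profile(name, uri, asset_registry_uri=None):
    """Return YAML text for a 'direct' Tiled profile like the one above."""
    lines = [
        f"{name}:",
        "  direct:",
        "    authentication:",
        "      allow_anonymous_access: true",
        "    trees:",
        "    - tree: databroker.mongo_normalized:Tree.from_uri",
        "      path: /",
        "      args:",
        f"        uri: {uri}",
    ]
    # asset_registry_uri may be omitted when it is the same as uri.
    if asset_registry_uri is not None and asset_registry_uri != uri:
        lines.append(f"        asset_registry_uri: {asset_registry_uri}")
    return "\n".join(lines) + "\n"


# Placeholder values; substitute your own host, port, and database name.
profile_text = render_direct_profile(
    "xyz", mongo_uri("localhost", 27017, "example_db")
)
print(profile_text)
```

The printed text can then be saved as `~/.config/tiled/profiles/test.yml`.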

New API (Tiled):

from tiled.client import from_profile
c = from_profile("xyz")

Backward-compatible API:

from databroker import Broker
db = Broker.named("xyz")

At this stage, you should see the speed benefits.
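
One way to check this for your own workload is to time the same call under both versions. The helper below is a small sketch using only the standard library; `best_time` is a hypothetical name, and the stand-in workload should be replaced with a real access pattern. Keep in mind that repeated calls may be served from a cache, so the fastest time is not necessarily representative of a first access.

```python
import time


def best_time(fn, repeats=3):
    """Call fn() `repeats` times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return min(timings)


# Stand-in workload; with a real catalog you might pass something like
# `lambda: db[-1].table()` instead.
elapsed = best_time(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.6f} s")
```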

Common issue: shape metadata problems

All versions of Bluesky/Ophyd have captured the shape and dtype of external (e.g. Area Detector) data. However, until now, nothing actually relied on that information being correct. As a result, we have only recently discovered and addressed bugs where the wrong shape or dtype was being recorded. If you encounter errors like BadShapeMetadata, this is why. Fortunately, there is an automated way to fix this: we have a script that opens each data set, looks at the actual shape, and updates the relevant document(s) in MongoDB to match. It's not quite ready for sharing, but it can be made ready soon.
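
To illustrate the kind of check involved, here is a read-only sketch that compares the shapes recorded in an event descriptor's `data_keys` against shapes you have measured yourself. This is not the script mentioned above and it does not touch MongoDB; the function name `find_shape_mismatches` and the sample values are hypothetical.

```python
def find_shape_mismatches(data_keys, actual_shapes):
    """Compare recorded shapes against observed ones.

    data_keys: the `data_keys` mapping from an event descriptor.
    actual_shapes: {key: tuple_of_ints}, measured by opening the data
        (how you measure it is left to you; that step is assumed here).
    Returns {key: (recorded_shape, actual_shape)} for each disagreement.
    """
    mismatches = {}
    for key, actual in actual_shapes.items():
        recorded = tuple(data_keys[key].get("shape", ()))
        if recorded != tuple(actual):
            mismatches[key] = (recorded, tuple(actual))
    return mismatches


# Toy descriptor fragment in which the image shape was recorded transposed.
data_keys = {
    "det_image": {"dtype": "array", "shape": [1024, 2048]},
    "motor": {"dtype": "number", "shape": []},
}
actual = {"det_image": (2048, 1024), "motor": ()}
mismatches = find_shape_mismatches(data_keys, actual)
print(mismatches)
```

Only `det_image` is reported here, because the scalar `motor` reading matches its recorded empty shape.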

Next steps: Start a server

To get proper security and access control, we need to run a real server. The configuration is similar: everything that was under direct: above now goes at the top level.

# config.yml
authentication:
  allow_anonymous_access: true
trees:
- tree: databroker.mongo_normalized:Tree.from_uri
  path: /
  args:
    uri: mongodb://{hostname}:{port}/{database}
    asset_registry_uri: mongodb://{hostname}:{port}/{database}  # may be omitted if it's the same as uri above
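
The relationship between the two files can be sketched with plain dictionaries: the server configuration is simply the contents of the profile's `direct` section. The function name `server_config_from_direct_profile` is hypothetical, and the MongoDB URI is a placeholder.

```python
import copy


def server_config_from_direct_profile(profiles, name):
    """Return the server config: the contents of profiles[name]['direct']."""
    # Deep-copy so later edits to the config do not alter the profile.
    return copy.deepcopy(profiles[name]["direct"])


# The 'direct' profile from earlier, written as a Python dict.
profiles = {
    "xyz": {
        "direct": {
            "authentication": {"allow_anonymous_access": True},
            "trees": [
                {
                    "tree": "databroker.mongo_normalized:Tree.from_uri",
                    "path": "/",
                    "args": {"uri": "mongodb://localhost:27017/example_db"},
                }
            ],
        }
    }
}

config = server_config_from_direct_profile(profiles, "xyz")
print(sorted(config))
```

The resulting dictionary has `authentication` and `trees` as its top-level keys, matching the layout of config.yml.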

The config.yml can be placed anywhere. We pass its location to the command below to start the server:

tiled serve config config.yml

And we can connect to it like:

from tiled.client import from_uri
c = from_uri("http://localhost:8000/api")

We can update our profile:

# ~/.config/tiled/profiles/test.yml
xyz:
  uri: http://localhost:8000/api

and now the same client-side usage as before will connect to the actual server instead of running one directly in-process. Exactly as before:

from tiled.client import from_profile
c = from_profile("xyz")

Backward-compatible API:

from databroker import Broker
db = Broker.named("xyz")