Hannah's Learn-A-New-Language Exercises

Tag Server

Write a REST backend exposing a user tagging API. Exactly how to store the data backing the API is left as an exercise for the user, and depends on whether there are storage technologies you're interested in exploring. I frequently just use in-memory data structures.

The API

The API must expose one POST endpoint that allows callers to add tags to a user, remove tags from a user, and retrieve a user's tags. Request format:

{
  "user": "bob",
  "timestamp": 1000,
  "add": ["tag1, tag2"],
  "remove": ["tag3"]
}

The user field is always required. The add and remove fields are both optional, but if either is present, timestamp is required as well. Assume that timestamps are 64-bit integers holding milliseconds since the epoch. The response is the list of tags on the user following the operation.

Response format:

{
  "tags": ["tag1", "tag2"]
}

where tags may be empty if the system has never seen the user or if all tags have been removed. The order of the tag list is unspecified.
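For concreteness, here's one way those shapes might look in Java, assuming records and a JSON binding layer of your choice; the type names and the validate() helper are hypothetical, not part of the spec.

import java.util.List;

// Request body as described above.
record TagRequest(String user, Long timestamp, List<String> add, List<String> remove) {
    // Enforces the rules from the spec: user is always required, and
    // timestamp is required whenever add or remove is present.
    void validate() {
        if (user == null || user.isBlank()) {
            throw new IllegalArgumentException("'user' is required");
        }
        if ((add != null || remove != null) && timestamp == null) {
            throw new IllegalArgumentException("'timestamp' is required when 'add' or 'remove' is present");
        }
    }
}

// Response body: the user's tags after the operation, possibly empty, in no particular order.
record TagResponse(List<String> tags) {}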

The Interesting Bit

Assume that requests may arrive out of order relative to their timestamps; that is, the API may receive a removal of tag t at timestamp 1000 followed, an arbitrary amount of time later, by an add of t at timestamp 999. The API must handle this: once both operations have landed, the user should not be tagged t.

Assume that requests may be duplicated, possibly with different timestamps, and that adding a tag t multiple times must result in only a single t in responses.
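One way to satisfy both requirements, sketched here as an in-memory structure in Java (the class and method names are hypothetical): keep, per user and per tag, the latest add timestamp and the latest remove timestamp ever observed, and consider a tag present only when its newest add beats its newest remove. Duplicated adds just re-apply a max(), and late-arriving operations can never be clobbered by older ones. The spec doesn't say what happens when an add and a remove carry the same timestamp; this sketch lets the remove win.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TagStore {
    // user -> tag -> {latest add timestamp, latest remove timestamp}
    private final Map<String, Map<String, long[]>> users = new ConcurrentHashMap<>();

    // Applies an add/remove request and returns the user's current tags.
    // For a pure read (no add or remove), the timestamp is ignored.
    List<String> apply(String user, long timestamp, List<String> add, List<String> remove) {
        Map<String, long[]> tags = users.computeIfAbsent(user, u -> new ConcurrentHashMap<>());
        if (add != null) {
            for (String tag : add) {
                bump(tags, tag, 0, timestamp);
            }
        }
        if (remove != null) {
            for (String tag : remove) {
                bump(tags, tag, 1, timestamp);
            }
        }
        // A tag is visible only when its newest add strictly beats its newest remove.
        return tags.entrySet().stream()
                .filter(e -> e.getValue()[0] > e.getValue()[1])
                .map(Map.Entry::getKey)
                .toList();
    }

    // Records the newest timestamp seen on either the add (slot 0) or remove (slot 1) side.
    private void bump(Map<String, long[]> tags, String tag, int slot, long timestamp) {
        tags.compute(tag, (k, v) -> {
            long[] ts = v == null ? new long[]{Long.MIN_VALUE, Long.MIN_VALUE} : v;
            ts[slot] = Math.max(ts[slot], timestamp);
            return ts;
        });
    }
}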

Full Dataset Cache Library

Implement a simplified version of Sarlacc Pit, a library for caching datasets that are large but still small enough to fit comfortably in memory. This is an exercise in understanding a language's behavior around locking, as well as generics if the language has them.

The cache can expose the backing dataset as any of several collection types, all of which should be immutable from a client perspective. If possible, I like to use the language's built-in Map/Set/List interfaces. Updates to the dataset should happen atomically, ideally without blocking readers for longer than a pointer swap.

Each instance incorporates two user-provided objects/functions:

  • A data source, which should support fetch and fetch-if-newer operations
  • A processor, which ingests the data source's output and transforms it into the collection to be exposed

Each instance should poll the data source on a configurable interval for new versions of the dataset, ingesting them and swapping them in to be read by client code. Ideally this swap is atomic and takes no locks; the Java implementation does this with an AtomicReference.
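A minimal sketch of that structure in Java: the DataSource and Processor interfaces are hypothetical stand-ins for the two user-provided pieces, and the polling loop publishes each new snapshot with a single AtomicReference.set() so readers never block.

import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical shape of the user-provided data source: R is the raw payload type.
interface DataSource<R> {
    R fetch();                                       // unconditional fetch, used for the initial load
    Optional<R> fetchIfNewer(long lastFetchMillis);  // empty when nothing newer exists
}

// Hypothetical shape of the user-provided processor: C is the collection type exposed to clients.
interface Processor<R, C> {
    C process(R raw);
}

class FullDatasetCache<R, C> implements AutoCloseable {
    private final AtomicReference<C> current = new AtomicReference<>();
    private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
    private volatile long lastFetch;

    FullDatasetCache(DataSource<R> source, Processor<R, C> processor, Duration interval) {
        // Load eagerly so get() never returns null.
        current.set(processor.process(source.fetch()));
        lastFetch = System.currentTimeMillis();

        poller.scheduleAtFixedRate(() -> {
            source.fetchIfNewer(lastFetch).ifPresent(raw -> {
                C next = processor.process(raw);  // build the new collection off to the side
                lastFetch = System.currentTimeMillis();
                current.set(next);                // single pointer swap; readers never block
            });
        }, interval.toMillis(), interval.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Readers always see whichever snapshot was most recently swapped in.
    public C get() {
        return current.get();
    }

    @Override
    public void close() {
        poller.shutdown();
    }
}

Client code should call get() on every read rather than holding onto a returned collection, so each read observes the most recent snapshot; the collection handed back should be treated as (or wrapped to be) immutable.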

Extra Credit:

Implement the capabilities listed under Optional Service Parameters as well.
