Tag Server
Write a REST backend exposing a user tagging API. Exactly how to store the data backing the API is left as an exercise for the user, and depends on whether there are storage technologies you're interested in exploring. I frequently just use in-memory data structures.
The API
The API must expose one POST endpoint that allows callers to add tags to a user, delete tags from a user, and retrieve tags for a user. Request format:
{
"user": "bob",
"timestamp": 1000,
"add": ["tag1, tag2"],
"remove": ["tag3"]
}
The user
field is always required. The add
and delete
fields are both optional, but if either is present timestamp
is required as well. Assume that timestamps will be a 64bit int holding millis since the epoch. Responses are the
list of tags on the user following the operation:
Response format:
{
"tags": ["tag1", "tag2"]
}
where tags
may be empty if the system has never seen the user or if all tags have been removed. The order of the tag list
is unspecified.
The Interesting Bit
Assume that requests may arrive out of order relative to their timestamps, that is the API may receive a removal of tag t
at timestamp 1000
followed an arbitrary amount of time later by an add of t
at timestamp 999
. The API must handle this,
and following both operations landing the user should not be tagged t
.
Assume that requests may be duplicated, possibly with different timestamps, and adding a tag t
multiple times must result
in only a single t
in responses.
Full Dataset Cache Library
Implement a simplified version of Sarlacc Pit, a library that allows cacheing large, but small enough to comfortably fit in memory, datasets. This is an exercise in understanding a language's behavior around locking, as well as generics if using a language with them.
The service can expose the backing data set as several collection types, all of which should be immutable from a client
perspective. If possible, I like to use the builtin Map
/Set
/List
interfaces of the language. Updates to the dataset
should happen atomically without blocking readers for any longer than a pointer swap if possible.
Each instance incorporates two user-provided objects/functions:
- A data source, which should support fetch and fetch-if-newer operations
- A processor, which ingests the data source's output and transforms it into the collection to be exposed
Each instance should poll the data source on a configurable interval for new versions of the dataset, ingesting them and
swapping them in to be read by client code. Ideally this swap will be atomic without claiming any locks, this is implemented
in the Java implementation using an AtomicReference
.
Extra Credit:
Implement the capabilities listed under Optional Service Parameters
as well.