A specification for designing unified search APIs and UIs.
For data-intensive applications with heavy searching/filtering workflows, devs will increasingly find a need to unify both the search API and UI. This document outlines a recommended solution that scales with future requirements while preserving flexibility of implementation in both the API and the UI.
Support the following requirements:
- Provide a specification (not an implementation) on how this would work.
- Define the spec of search `filters` and how the system would use these objects.
- Design a unified API interface around `filters`.
- Design a UI component responsible for rendering and setting `filters`.
A search query/request can be expressed as an array of search `filters`. These `filters` are simple serializable objects and are human-readable. E.g.
// Filter documents that contain concepts ['cid1', 'cid2'] and match exactly on text 'drug'
const filters = [
{field: 'CONCEPT', operator: 'CONTAINS', values: ['cid1', 'cid2']},
{field: 'TEXT', operator: 'EQUALS', values: ['drug']},
];
// Filter documents that do NOT contain concepts ['cid1', 'cid2'] and match inexactly on text 'drug'
const filters = [
{field: 'CONCEPT', operator: 'NOT_CONTAINS', values: ['cid1', 'cid2']},
{field: 'TEXT', operator: 'LIKE', values: ['drug']},
];
// Filter cats that are female and special (without using OPERATORS)
const filters = [
{field: 'IS_SPECIAL', values: ['true']},
{field: 'GENDER', values: ['female']},
];
The examples above use the following spec:
enum Field {
CONCEPT = 'concept',
TEXT = 'text',
METADATA = 'metadata',
ANYTHING_CAN_BE_USED_HERE = 'anything',
}
enum Operator {
EQUALS = 'equals',
NOT_EQUALS = 'not_equals',
ANYTHING_CAN_BE_USED_HERE = 'anything',
}
enum DataType {
STRING = 'string',
NUMBER = 'number',
DATE = 'datetime',
ANYTHING_CAN_BE_USED_HERE = 'anything',
}
interface Filter {
field: Field;
values: string[];
operator?: Operator; // optional
dataType?: DataType; // optional
}
interface SearchRequest {
filters: Filter[];
}
`values` is a required field typed as an array of strings to capture the filter values. An array of strings supports use cases where multiple values are applied to a given field (e.g. checkbox selections in the UI). You can choose an alternative design for typing this field, or work with nullable values.
`field` is a required field informing the API how to apply filter logic in its implementation. It can take on any user-defined value, and the spec is unopinionated about how it maps to the structure of the data model.
`operator` is an optional enum that provides additional context on how to apply the filter values of a provided filter. It is optional because you can choose to model your search APIs with just fields (i.e. `fields: ['LIKE_TEXT', 'EQUALS_TEXT']` vs `{field: 'TEXT', operators: ['LIKE', 'EQUALS']}`), but it is generally a good idea to introduce this if your search complexity grows and you need to better separate the notions of `fields` and `operators`.
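To make the trade-off concrete, here is a minimal sketch of moving from the fields-only style to a separate operator. The operator list and the `OPERATOR_FIELD` naming convention are assumptions made for illustration; `split_fused_field` and `upgrade_filter` are hypothetical helpers, not part of the spec.

```python
from typing import Dict, List, Tuple

# Hypothetical operator names; use whatever your own spec defines.
KNOWN_OPERATORS: List[str] = ["NOT_EQUALS", "EQUALS", "LIKE"]


def split_fused_field(fused: str) -> Tuple[str, str]:
    """Split a fused name such as 'LIKE_TEXT' into ('TEXT', 'LIKE')."""
    for operator in KNOWN_OPERATORS:
        prefix = operator + "_"
        if fused.startswith(prefix) and len(fused) > len(prefix):
            return fused[len(prefix):], operator
    raise ValueError(f"no known operator prefix in {fused!r}")


def upgrade_filter(old: Dict) -> Dict:
    """Convert a fields-only filter into one with a separate operator."""
    field, operator = split_fused_field(old["field"])
    return {"field": field, "operator": operator, "values": list(old["values"])}
```

For example, `upgrade_filter({"field": "LIKE_TEXT", "values": ["drug"]})` yields a filter on `TEXT` with operator `LIKE`. Once operators are separate, adding a new operator no longer requires a new field per existing field.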
`dataType` is an optional enum that describes the data type of the filter `field`. The filter value parser can depend on this field for parsing its value, but it is not required. UIs can leverage this field to decide what kind of filter input to render.
NOTE: The spec is unopinionated about how `fields`, `operators`, `dataTypes` are defined and used. These are defined in the spec to propose a way to scale search API interfaces based on common search requirements and features. At the end of the day, the goal is to pass enough structured data down to the API layer to perform various filter operations.
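On the API side, the spec can be mirrored with a small typed structure plus validation of the incoming request body. The following is one possible sketch using only the standard library. The names `parse_filter` and `parse_search_request`, and the `data_type` attribute standing in for `dataType`, are assumptions for illustration rather than part of the spec.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Filter:
    """Python mirror of the Filter interface from the spec."""
    field: str
    values: List[str]
    operator: Optional[str] = None   # optional, as in the spec
    data_type: Optional[str] = None  # corresponds to `dataType` in the spec


def parse_filter(raw: Any) -> Filter:
    """Validate one raw (JSON-decoded) filter and convert it to a Filter."""
    if not isinstance(raw, dict):
        raise ValueError("a filter must be an object")
    field = raw.get("field")
    values = raw.get("values")
    if not isinstance(field, str) or not field:
        raise ValueError("'field' is required and must be a non-empty string")
    if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
        raise ValueError("'values' is required and must be an array of strings")
    operator = raw.get("operator")
    data_type = raw.get("dataType")
    for name, value in (("operator", operator), ("dataType", data_type)):
        if value is not None and not isinstance(value, str):
            raise ValueError(f"'{name}' must be a string when provided")
    return Filter(field, values, operator, data_type)


def parse_search_request(body: Any) -> List[Filter]:
    """Validate a SearchRequest body and return its filters."""
    if not isinstance(body, dict) or not isinstance(body.get("filters"), list):
        raise ValueError("'filters' is required and must be an array")
    return [parse_filter(raw) for raw in body["filters"]]
```

Rejecting malformed filters at the boundary means the query-building code can assume that `field` and `values` are always present and well-typed.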
With a defined search spec and types, we can reason about search requests in simple ways in the API. Here is an example of implementing a search endpoint in Python using the example search request in the earlier section (forgive the terse pseudocode since I am not actually familiar with Python libraries):
"""
example_request = {
    filters: [
        {field: 'CONCEPT', operator: 'CONTAINS', values: ['cid1', 'cid2']},
        {field: 'TEXT', operator: 'EQUALS', values: ['drug']},
    ]
}
"""
def document_search():
    dataset_id = request.body.pop('id')
    filters = request.body.pop('filters')
    validate(filters)  # custom filter validation logic you can implement

    # build ORM query once, then narrow it with each filter.
    # This can be abstracted into a helper function to better organize code if you prefer
    document_query = Document.query(dataset_id)
    for search_filter in filters:
        if search_filter.field == 'CONCEPT':
            concept_ids = parse_concept_ids_filter(search_filter)  # custom parser for concept_ids
            document_query.where(concept_ids)
        elif search_filter.field == 'TEXT':
            is_exact = search_filter.operator == 'EQUALS'
            text = parse_text_filter(search_filter, is_exact)  # custom parser; is_exact selects exact vs. inexact matching
            document_query.where(text)

    # run query
    try:
        return document_query.run()
    except Exception as error:
        raise Exception('Query failed') from error
The above code implements custom filter value parsers (i.e. `parse_concept_ids_filter`, `parse_text_filter`), and makes use of the data in the `Filter` object (i.e. `field` and `operator`). The spec remains unopinionated about how these should be organized and written, and it is not a bad practice to model value parsers based on `Filter.dataType` instead of `Filter.field` as the above example has done.
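As a sketch of that alternative, value parsers can be keyed by data type rather than by field. The data type names and the `VALUE_PARSERS` table below are illustrative assumptions; any type without a registered parser falls back to plain strings.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional


def parse_number(raw: str) -> float:
    return float(raw)


def parse_boolean(raw: str) -> bool:
    lowered = raw.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a boolean: {raw!r}")
    return lowered == "true"


# One parser per data type. The keys are illustrative; use whatever
# DataType values your own spec defines.
VALUE_PARSERS: Dict[str, Callable[[str], Any]] = {
    "STRING": str,
    "NUMBER": parse_number,
    "BOOLEAN": parse_boolean,
    "DATE": datetime.fromisoformat,
}


def parse_values(values: List[str], data_type: Optional[str] = None) -> List[Any]:
    """Parse raw string values by data type; unknown or missing types stay strings."""
    parser = VALUE_PARSERS.get(data_type or "STRING", str)
    return [parser(value) for value in values]
```

With this shape, adding a new field of an existing data type needs no new parser, which is the main argument for keying on `dataType`.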
The following is an example of refactoring an old `cats_search` API endpoint using a shared `query_builder` method if DB models and search filters have shareable functionalities:
"""
old_request = {
    is_special: true,
    gender: 'female',
}
new_request = {
    filters: [
        {field: 'IS_SPECIAL', values: ['true']},
        {field: 'GENDER', values: ['female']},
    ]
}
"""
def old_cats_search():
    is_special = request.body.pop('is_special')
    gender = request.body.pop('gender')
    cats_query = Cats.query() if not is_special else SpecialCats.query()
    return cats_query.filter(gender).run()

def new_cats_search():
    filters = request.body.pop('filters')
    return query_builder(Cats, filters)

def dogs_search():
    filters = request.body.pop('filters')
    return query_builder(Dogs, filters)  # reusable

def monkeys_search():
    filters = request.body.pop('filters')
    return query_builder(Monkeys, filters)  # reusable

def not_db_monkeys_search():
    filters = request.body.pop('filters')
    es_query = es.query('monkeys')
    for search_filter in filters:  # cannot reuse query_builder, so write it explicitly.
        es_query.filter(get_es_filter(search_filter))  # define this accordingly
    return es_query.run()

def query_builder(Model, filters, value_parser=default_value_parser):
    """
    A generalized query_builder if DB/Models share similar filtering logic.
    """
    query = Model.query()
    for search_filter in filters:
        # parse the raw string values; the parser may use dataType to decide how
        values = value_parser(search_filter.values, search_filter.dataType)
        field = search_filter.field
        if values:  # if valid parsed value exists
            query.where(field, values)
    return query.run()
This example shows that although the old endpoint was simple and readable, it does not provide a way for reuse in other common search endpoints. Since creating a search API/UI is expensive, a spec that guides best practices helps identify where this logic can be reused. As search requirements become more complex (e.g. more fields and operators), this spec allows code to be abstracted and shared across endpoints.
NOTE: The general rules of abstracting code apply here, so strike a balance between implementing shared methods and writing explicit code.
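The pseudocode above depends on an ORM that is left undefined. To show the same reuse idea in a form that actually runs, here is a sketch of a shared builder over in-memory records. The operator table, the sample data, and the convention that a field name maps to a lowercase record key are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def to_text(value: Any) -> str:
    """Render a stored value the way filter values spell it ('true', not 'True')."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


# Maps operator names to predicates over (record value as text, filter values).
OPERATORS: Dict[str, Callable[[str, List[str]], bool]] = {
    "EQUALS": lambda actual, values: actual in values,
    "NOT_EQUALS": lambda actual, values: actual not in values,
    "LIKE": lambda actual, values: any(v.lower() in actual.lower() for v in values),
}


def apply_filters(records: Iterable[Record], filters: List[Dict[str, Any]]) -> List[Record]:
    """Keep the records that satisfy every filter (filters are ANDed)."""
    result = list(records)
    for search_filter in filters:
        key = search_filter["field"].lower()  # assumes field names map to record keys
        # Filters without an operator default to EQUALS.
        matches = OPERATORS[search_filter.get("operator", "EQUALS")]
        values = search_filter["values"]
        result = [r for r in result if key in r and matches(to_text(r[key]), values)]
    return result


CATS = [
    {"name": "Mia", "gender": "female", "is_special": True},
    {"name": "Tom", "gender": "male", "is_special": True},
    {"name": "Luna", "gender": "female", "is_special": False},
]
DOGS = [
    {"name": "Rex", "gender": "male"},
    {"name": "Bella", "gender": "female"},
]


def cats_search(filters: List[Dict[str, Any]]) -> List[Record]:
    return apply_filters(CATS, filters)


def dogs_search(filters: List[Dict[str, Any]]) -> List[Record]:
    return apply_filters(DOGS, filters)  # same builder, different data
```

Supporting a new operator here means adding one entry to `OPERATORS`, and every endpoint that uses `apply_filters` picks it up.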
A unified search UI based on this search spec can be built with the design choices in the following section. Note that this search component was heavily utilized at Facebook across multiple data/filter-intensive applications.
NOTE: You can implement the `UnifiedSearch` component in various ways. In the end, the component simply needs a way to render and set the `filters` data. Airtable filters or Slack search are all flavors of rendering a "unified" search component.
Just as the backend has a simple interface to understand serializable and human-readable `filters`, the UI component can easily render `filters` in the following proposed way:
const filters = [
{field: 'CONCEPT', dataType: 'ENUM', operator: 'CONTAINS', values: ['cid1', 'cid2']},
{field: 'TEXT', dataType: 'STRING', operator: 'EQUALS', values: ['drug']},
];
Note that this UI:
- Summarizes a complete description of what `filters` have been applied in a concise and human-readable UI.
- Allows users to visually know what `dataType` and `operators` are applied to specific `fields`.
- All enums for `fields`, `operators`, `dataTypes` are statically defined on the server, so these can be provided to the UI to decide how to render the data types specifically, as shown above.
The UI component supports the following features to update `filters`:
- A `Clear All` button to clear all filters.
- Each `filter` "token" can be removed/cleared.
- Typing in the component should suggest possible values that can be applied based on the schema of the filters specified by the server.
- Clicking on the `Add Filter` button allows creation of a new filter. Note that clients can choose to implement the `Add Filter` button as a focus action on the component input. When adding a filter, the user is prompted to:
  - Select a `field`
  - Select `operators` if they exist
  - Provide the input `values`. Inputs are rendered based on the `dataType` of the filter object.
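Independent of the rendering library, the update features listed above reduce to a few pure functions over the `filters` array. The sketch below is written in Python to match the API examples; a React implementation would express the same logic in a reducer or in state setters. The `id` key on each filter and the `schema` shape (a `fields` mapping from id to an object with a `label`) are assumptions, and the suggestion function only covers field labels.

```python
from typing import Any, Dict, List

FilterDict = Dict[str, Any]


def clear_all(filters: List[FilterDict]) -> List[FilterDict]:
    """`Clear All`: drop every applied filter."""
    return []


def remove_filter(filters: List[FilterDict], filter_id: str) -> List[FilterDict]:
    """Remove a single filter "token" by its id."""
    return [f for f in filters if f["id"] != filter_id]


def add_filter(filters: List[FilterDict], new_filter: FilterDict) -> List[FilterDict]:
    """`Add Filter`: append a filter once field, operator and values are chosen."""
    return [*filters, new_filter]


def suggest_fields(schema: Dict[str, Any], typed: str) -> List[str]:
    """Suggest field labels from the server-provided schema that match the typed text."""
    needle = typed.strip().lower()
    return [
        field["label"]
        for field in schema["fields"].values()
        if needle in field["label"].lower()
    ]
```

Each function returns a new list instead of mutating its input, so the result can be handed straight to whatever change callback the component exposes.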
This is pseudocode for the React component that highlights important implementation details and prop API:
enum DataType { // match with server's definition
  BOOLEAN = 'BOOLEAN',
  ENUM = 'ENUM',
  NUMBER = 'NUMBER',
  STRING = 'STRING',
}
interface Field {
  id: string;
  dataType: DataType;
  label: string;
}
interface Operator {
  id: string;
  label: string;
}
// Schema defines the static fields and operators that the component can render
// for setting fields/operators when creating filters. It is stored in normalized
// form for easy retrieval
interface Schema {
  fields: {
    [fieldId: string]: Field;
  };
  operators: {
    [operatorId: string]: Operator;
  };
}
interface Filter {
  id: string;
  field: Field;
  operator: Operator;
  values: string[];
}
// assumed signature: called with the next list of filters after every change
type ChangeHandler = (filters: Filter[]) => void;
interface UnifiedSearchProps {
  schema: Schema;
  filters: Filter[];
  onUpdate: ChangeHandler;
}
const UnifiedSearch = ({ schema, filters, onUpdate }: UnifiedSearchProps): JSX.Element => (
  <div>
    <SearchInput />
    {filters.map(filter => <Filter key={filter.id} filter={filter} schema={schema} />)}
    <NewFilter schema={schema} onCreate={created => onUpdate([...filters, created])} />
  </div>
);
The component above uses the associated child components:
const SearchInput = () => {
// renders the basic search input and search suggestions.
// Search suggestions can be statically computed and determined based on `schema` and currently applied `filters`
// Contains a `Clear All` functionality to remove all applied `filters`.
}
const Filter = ({ filter, schema }) => {
  // look up the display data in the normalized schema
  const field = schema.fields[filter.field.id];
  const operator = schema.operators[filter.operator.id];
  return (
    <div>
      {/* field name and dataType icon */}
      <Field field={field} />
      {/* operator name */}
      <Operator operator={operator} />
      {/* conditionally render the value based on its data type */}
      <Values dataType={field.dataType} values={filter.values} />
    </div>
  );
}
const Field = ({ field }) => {
return (
<div>
<Icon icon={field.dataType} />
<b>{field.label}</b>
</div>
);
}
const Operator = ({ operator }) => <div>{operator.label}</div>;
const Values = ({ dataType, values }) => {
  switch (dataType) {
    case 'ENUM':
      // one token per selected value
      return <>{values.map(value => <Token key={value} label={value} />)}</>;
    case 'NUMBER':
      return <div style={{ color: 'blue' }}>{parseInt(values[0], 10)}</div>;
    case 'STRING':
    default:
      return <div>{values[0]}</div>;
  }
}
const NewFilter = ({ schema, onCreate }) => {
  // a component to create new filters.
  // Uses statically defined fields and operators in `schema` to decide how they can be created.
  const [newFilter, setNewFilter] = useState({});
  // code to handle selecting fields, operators, and rendering the right input based on the data type of the field
  // `newFilter.field` is unset until the user selects a field, hence the optional chaining
  switch (newFilter.field?.dataType) {
    case 'BOOLEAN':
      return <Toggle />;
    case 'ENUM':
      return <Selector />;
    case 'STRING':
    default:
      return <Input />;
  }
}
While this document proposes a spec to design search interfaces through strongly typed interfaces, you can make your own decisions and choices that make more sense depending on the complexity of your search APIs.
Here is an example alternative that uses well-formatted query strings (inspired by GitHub's "human-readable" search queries) to encode the same information, which APIs can eventually parse into the relevant filter data:
// Filter documents that contain concepts ['cid1', 'cid2'] and match exactly on text 'drug'
const queryString = 'concept:cid1,cid2+exact_match:drug';
const filters = parseQueryStringToFilters(queryString);
// [
// {field: 'CONCEPT', operator: 'CONTAINS', values: ['cid1', 'cid2']},
// {field: 'TEXT', operator: 'EQUALS', values: ['drug']},
// ];
While the specific implementation and interface choice here are largely different, the general idea of the design still holds: build search APIs on top of a common query request layer that both the API and the UI use to exchange data.
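The document leaves `parseQueryStringToFilters` undefined. One possible shape for it, sketched in Python to match the API examples, is shown below. The key-to-filter mapping is an assumption chosen to reproduce the example, with `match` added as a hypothetical inexact counterpart, and the sketch does no escaping, so values cannot contain `+` or `,`.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from query-string keys to (field, operator) pairs.
KEY_TO_FIELD_AND_OPERATOR: Dict[str, Tuple[str, str]] = {
    "concept": ("CONCEPT", "CONTAINS"),
    "exact_match": ("TEXT", "EQUALS"),
    "match": ("TEXT", "LIKE"),
}


def parse_query_string_to_filters(query_string: str) -> List[dict]:
    """Turn 'key:v1,v2+key:v3' into a list of filter dicts."""
    filters = []
    for term in query_string.split("+"):
        if not term:
            continue  # tolerate leading, trailing or doubled '+'
        key, separator, raw_values = term.partition(":")
        if not separator or key not in KEY_TO_FIELD_AND_OPERATOR:
            raise ValueError(f"unsupported search term: {term!r}")
        field, operator = KEY_TO_FIELD_AND_OPERATOR[key]
        values = [value for value in raw_values.split(",") if value]
        filters.append({"field": field, "operator": operator, "values": values})
    return filters
```

Because the parser emits the same filter objects as the typed interface, the rest of the API does not need to know which request format the client used.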
For various alternative designs of the UI component, it is not difficult to see how this spec allows building the Airtable filter view:
The following is highly abstracted pseudocode on how to cast various popular search interfaces into the spec defined in Filters Spec, demonstrating the flexibility of the spec.
"""
original_query_string='GitHub+Octocat+in:readme+user:defunkt'
"""
def github_query_to_search_filters(query):
    # code to transform to relevant filter object
    return [
        {'field': 'TEXT', 'operator': 'CONTAINS', 'values': ['GitHub', 'Octocat']},
        {'field': 'FILE', 'operator': 'IN', 'values': ['readme']},
        {'field': 'USER', 'operator': 'CONTAINS', 'values': ['defunkt']},
    ]

"""
original_query_string='Explorer is awesome in:#team_dora to:@jordan before:February'
"""
def slack_query_to_search_filters(query):
    # code to transform to relevant filter object
    return [
        {'field': 'TEXT', 'operator': 'FUZZY', 'values': ['Explorer is awesome']},
        {'field': 'CHANNEL', 'operator': 'EQUALS', 'values': ['team_dora']},
        {'field': 'TO', 'operator': 'EQUALS', 'values': ['jordan']},
        {'field': 'SENT_TIME', 'operator': 'BEFORE', 'values': ['February']},
    ]

def airtable_filters_to_search_filters(filters):
    # code to transform to relevant filter object
    return [
        {'field': 'PRIORITY', 'dataType': 'ENUM_SET', 'operator': 'EQUALS', 'values': ['High Priority']},
        {'field': 'COMPLETED', 'dataType': 'BOOLEAN', 'operator': 'EQUALS', 'values': ['true']},
    ]
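The elided transformation step can be filled in for the GitHub-style example with a small amount of string handling. This is a sketch only: the `QUALIFIERS` table is an assumption that mirrors the field and operator choices in the pseudocode above, and it does not attempt to cover GitHub's real search syntax.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from query qualifiers to (field, operator) pairs.
QUALIFIERS: Dict[str, Tuple[str, str]] = {
    "in": ("FILE", "IN"),
    "user": ("USER", "CONTAINS"),
}


def github_query_to_search_filters(query: str) -> List[dict]:
    """Split a '+'-separated query into free-text and qualifier filters."""
    text_terms: List[str] = []
    qualifier_filters: List[dict] = []
    for term in query.split("+"):
        if not term:
            continue
        key, separator, value = term.partition(":")
        if separator and key in QUALIFIERS:
            field, operator = QUALIFIERS[key]
            qualifier_filters.append({"field": field, "operator": operator, "values": [value]})
        else:
            text_terms.append(term)  # anything else is treated as free text
    filters: List[dict] = []
    if text_terms:
        # All free-text terms are collected into a single TEXT filter.
        filters.append({"field": "TEXT", "operator": "CONTAINS", "values": text_terms})
    return filters + qualifier_filters
```

Called with `'GitHub+Octocat+in:readme+user:defunkt'`, this produces the three filters listed in the GitHub pseudocode above, with the free-text terms first.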