Unified Search Spec

A specification for designing unified search APIs and UIs.

Context
Goals
Filters Overview
Filters Spec
API
UI
Alternative Designs
Implementations

Context

For data-intensive applications with heavy searching/filtering workflows, developers will increasingly find a need to unify the search API and UI. This document outlines a recommended solution that scales with future requirements while preserving flexibility of implementation in both the API and the UI.

Goals

Support the following requirements:

Provide a specification (not an implementation) on how this would work.
Define the spec of search filters and how the system would use these objects.
Design a unified API interface around filters.
Design a UI component responsible for rendering and setting filters.

Filters Overview

A search query/request can be expressed as an array of search filters. These filters are simple, serializable, human-readable objects. For example:

// Filter documents that contain concepts ['cid1', 'cid2'] and match exactly on text 'drug'
const filters = [
  {field: 'CONCEPT', operator: 'CONTAINS', values: ['cid1', 'cid2']},
  {field: 'TEXT', operator: 'EQUALS', values: ['drug']},
];

// Filter documents that do NOT contain concepts ['cid1', 'cid2'] and match inexactly on text 'drug'
const filters = [
  {field: 'CONCEPT', operator: 'NOT_CONTAINS', values: ['cid1', 'cid2']},
  {field: 'TEXT', operator: 'LIKE', values: ['drug']},
];

// Filter cats that are female and special (without using OPERATORS)
const filters = [
  {field: 'IS_SPECIAL', values: ['true']},
  {field: 'GENDER', values: ['female']},
];
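
Because filters are plain data, they should survive a JSON round trip unchanged, which is what lets the API and UI exchange them freely. A minimal Python sketch of that property (the variable names are illustrative and not part of the spec):

```python
import json

# The first example above, written as plain Python data.
filters = [
    {"field": "CONCEPT", "operator": "CONTAINS", "values": ["cid1", "cid2"]},
    {"field": "TEXT", "operator": "EQUALS", "values": ["drug"]},
]

# Serialize for transport (e.g. as an HTTP request body), then parse it back.
payload = json.dumps({"filters": filters})
restored = json.loads(payload)["filters"]
```

Here `restored` compares equal to `filters`, so nothing is lost between client and server.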

Filters Spec

The examples above use the following spec:

// Enum values match the string literals used in the filter examples above.
enum Field {
  CONCEPT = 'CONCEPT',
  TEXT = 'TEXT',
  METADATA = 'METADATA',
  ANYTHING_CAN_BE_USED_HERE = 'ANYTHING',
}

enum Operator {
  EQUALS = 'EQUALS',
  NOT_EQUALS = 'NOT_EQUALS',
  CONTAINS = 'CONTAINS',
  NOT_CONTAINS = 'NOT_CONTAINS',
  LIKE = 'LIKE',
  ANYTHING_CAN_BE_USED_HERE = 'ANYTHING',
}

enum DataType {
  STRING = 'STRING',
  NUMBER = 'NUMBER',
  DATE = 'DATE',
  ANYTHING_CAN_BE_USED_HERE = 'ANYTHING',
}

interface Filter {
  field: Field;
  values: string[];
  operator?: Operator;  // optional
  dataType?: DataType;  // optional
}

interface SearchRequest {
  filters: Filter[];
}
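
On the server side, the same shape can be checked before any query is built. The following is a minimal Python sketch of such a check, assuming filters arrive as parsed JSON objects; the `Filter` dataclass and `filter_from_dict` helper are hypothetical names and not part of the spec:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Filter:
    """Python counterpart of the Filter interface (hypothetical)."""
    field: str
    values: tuple
    operator: Optional[str] = None
    data_type: Optional[str] = None


def filter_from_dict(raw: dict) -> Filter:
    """Validate one raw filter object and convert it to a Filter."""
    field: Any = raw.get("field")
    if not isinstance(field, str) or not field:
        raise ValueError("filter.field must be a non-empty string")
    values: Any = raw.get("values")
    if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
        raise ValueError("filter.values must be a list of strings")
    # operator and dataType are optional, so missing keys simply become None.
    return Filter(
        field=field,
        values=tuple(values),
        operator=raw.get("operator"),
        data_type=raw.get("dataType"),
    )
```

A request handler could map `filter_from_dict` over `request["filters"]` and reject the request on the first `ValueError`.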

`Filter.values` [required]

This is a required field, typed as an array of strings, that captures the filter values. An array supports applying multiple values to a single field (e.g. checkbox selections in the UI). You can choose an alternative type for this field, or allow nullable values.

`Filter.field` [required]

This is a required field informing the API how to apply filter logic in its implementation. It can take on any user-defined values and the spec is unopinionated about mapping this to the structure of the data model.

`Filter.operator` [optional]

This is an optional enum that provides additional context on how to apply the values of a given filter. It is optional because you can choose to model your search APIs with just fields (i.e. fields: ['LIKE_TEXT', 'EQUALS_TEXT'] vs {field: 'TEXT', operators: ['LIKE', 'EQUALS']}), but it is generally a good idea to introduce operators once your search complexity grows and you need to better separate the notions of fields and operators.
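
If an API starts with combined field names and later introduces operators, the old names can be translated mechanically. This is a rough Python sketch assuming the `OPERATOR_FIELD` naming convention from the inline example above; `KNOWN_OPERATORS` and `split_combined_field` are hypothetical names:

```python
# Operators that may appear as a prefix of a combined field name (assumed set).
KNOWN_OPERATORS = ("NOT_EQUALS", "EQUALS", "LIKE")


def split_combined_field(combined):
    """Turn e.g. 'LIKE_TEXT' into {'field': 'TEXT', 'operator': 'LIKE'}."""
    for operator in KNOWN_OPERATORS:
        prefix = operator + "_"
        if combined.startswith(prefix):
            return {"field": combined[len(prefix):], "operator": operator}
    # No known operator prefix: treat the whole name as an operator-less field.
    return {"field": combined}
```

Names without a recognized prefix, such as `IS_SPECIAL`, pass through as operator-less fields, which matches the third example in Filters Overview.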

`Filter.dataType` [optional]

This is an optional enum that describes the data type of the filter field. The filter value parser can depend on this field for parsing its value, but it is not required. UIs can leverage this field to decide what kind of filter input to render.
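
One way a value parser might depend on `dataType` is a lookup table from data type to a converter. The sketch below is only an illustration: the data type names (`NUMBER`, `DATE`, `BOOLEAN`) and the helper names are assumptions, and filters without a `dataType` keep their string values:

```python
from datetime import datetime


def _parse_boolean(value):
    """Accept only 'true' or 'false' (case-insensitive)."""
    if value.lower() not in ("true", "false"):
        raise ValueError(f"not a boolean: {value!r}")
    return value.lower() == "true"


# Maps a dataType to the function that converts one string value (assumed names).
VALUE_PARSERS = {
    "NUMBER": float,
    "DATE": datetime.fromisoformat,
    "BOOLEAN": _parse_boolean,
}


def parse_values(f):
    """Convert a filter's string values according to its optional dataType."""
    parser = VALUE_PARSERS.get(f.get("dataType"), str)
    return [parser(value) for value in f["values"]]
```

Because the table is keyed on data type rather than on field, adding a new field of an existing type needs no new parsing code.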

NOTE: The spec is unopinionated about how fields, operators, and dataTypes are defined and used. They are included in the spec to propose a way to scale search API interfaces based on common search requirements and features. Ultimately, the goal is to pass enough structured data down to the API layer to perform various filter operations.

API

With a defined search spec and types, we can reason about search requests in simple ways in the API. Here is an example of implementing a search endpoint in Python using the example search request in the earlier section (forgive the terse pseudocode since I am not actually familiar with Python libraries):

"""
example_request = {
  'id': 'dataset_id',  # the dataset to search (read by document_search below)
  'filters': [
    {'field': 'CONCEPT', 'operator': 'CONTAINS', 'values': ['cid1', 'cid2']},
    {'field': 'TEXT', 'operator': 'EQUALS', 'values': ['drug']},
  ]
}
"""

# NOTE: `request`, `Document`, `validate` and the parse_* helpers are placeholders
# for your web framework, ORM model and custom logic.
def document_search():
  dataset_id = request.body.pop('id')
  filters = request.body.pop('filters')
  validate(filters)  # custom filter validation logic you can implement

  # build ORM query once, then narrow it with each filter.  This can be abstracted into a helper function to better organize code if you prefer
  document_query = Document.query(dataset_id)
  for f in filters:  # `f` avoids shadowing the built-in `filter`
    if f['field'] == 'CONCEPT':
      concept_ids = parse_concept_ids_filter(f)  # custom parser for concept_ids
      document_query.where(concept_ids)
    elif f['field'] == 'TEXT':
      is_exact = f.get('operator') == 'EQUALS'  # 'EQUALS' requests an exact match
      text = parse_text_filter(f)
      document_query.where(text, is_exact)

  # run query
  try:
    results = document_query.run()
  except Exception as error:
    raise Exception('Query failed') from error

  return results

The above code implements custom filter value parsers (e.g. parse_concept_ids_filter, parse_text_filter) and makes use of the data in the Filter object (i.e. field and operator). The spec remains unopinionated about how these should be organized and written. The example above keys its value parsers on Filter.field; keying them on Filter.dataType instead is an equally reasonable practice.

The following is an example of refactoring an old cats_search API endpoint to use a shared query_builder method, which is possible when DB models and search filters share functionality:

"""
old_request = {
  is_special: true,
  gender: 'female',
}

new_request = {
  filters: [
    {field: 'IS_SPECIAL', values: ['true']},
    {field: 'GENDER', values: ['female']},
  ]
}
"""
def old_cats_search():
  is_special = request.body.pop('is_special')
  gender = request.body.pop('gender')
  cats_query = Cats.query() if not is_special else SpecialCats.query()
  return cats_query.filter(gender).run()
  
def new_cats_search():
  filters = request.body.pop('filters')
  return query_builder(Cats, filters)

def dogs_search():
  filters = request.body.pop('filters')
  return query_builder(Dogs, filters)  # reusable

def monkeys_search():
  filters = request.body.pop('filters')
  return query_builder(Monkeys, filters)  # reusable

def not_db_monkeys_search():
  filters = request.body.pop('filters')
  es_query = es.query('monkeys')
  for filter in filters:  # cannot reuse query_builder, so write it explicitly.
    es_query.filter(get_es_filter(filter))  # define this accordingly
  return es_query.run()

# NOTE: default_value_parser must be defined before this function.
def query_builder(Model, filters, value_parser=default_value_parser):
  """
  A generalized query_builder if DB/Models share similar filtering logic.
  """
  query = Model.query()
  for f in filters:  # `f` avoids shadowing the built-in `filter`
    values = value_parser(f)  # parse the filter's values (the parser may consult f['dataType'])
    field = f['field']
    if values:  # if valid parsed value exists
      query.where(field, values)
  return query.run()

This example shows that although the old endpoint was simple and readable, it offers no path to reuse in other common search endpoints. Because building search APIs and UIs is expensive, a spec that guides best practices helps identify where this logic can be reused. As search requirements become more complex (e.g. more fields and operators), the spec allows more code to be abstracted and shared across endpoints.
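
To make the reuse concrete, here is a small runnable Python sketch of a shared query builder. It swaps the ORM for an in-memory list of rows, so `InMemoryQuery`, the lowercase field-to-key mapping, and the sample rows are all assumptions made for illustration:

```python
def default_value_parser(f):
    """Keep the non-empty string values of a filter."""
    return [value for value in f.get("values", []) if value != ""]


class InMemoryQuery:
    """Stand-in for an ORM query over a list of dict rows (hypothetical)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def where(self, field, values):
        key = field.lower()  # assumes 'GENDER' maps to the row key 'gender'
        # Filter values are strings, so compare against the stringified cell.
        self._rows = [r for r in self._rows if str(r.get(key)).lower() in values]
        return self

    def run(self):
        return self._rows


def query_builder(rows, filters, value_parser=default_value_parser):
    query = InMemoryQuery(rows)
    for f in filters:
        values = value_parser(f)
        if values:  # skip filters with no usable values
            query.where(f["field"], values)
    return query.run()


# Made-up sample rows.
cats = [
    {"name": "Tom", "gender": "male", "is_special": False},
    {"name": "Luna", "gender": "female", "is_special": True},
    {"name": "Misty", "gender": "female", "is_special": False},
]
dogs = [{"name": "Rex", "gender": "male"}, {"name": "Bella", "gender": "female"}]

special_female_cats = query_builder(cats, [
    {"field": "IS_SPECIAL", "values": ["true"]},
    {"field": "GENDER", "values": ["female"]},
])
female_dogs = query_builder(dogs, [{"field": "GENDER", "values": ["female"]}])
```

The same `query_builder` serves both datasets; only the rows and the filters change.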

NOTE: The general rules of abstracting code apply here, so strike a balance between implementing shared methods and writing explicit code.

UI

A unified search UI based on this search spec can be built with the design choices in the following section. Note that this search component was heavily utilized at Facebook across multiple data/filter-intensive applications.

Design Mocks

NOTE: You can implement the UnifiedSearch component in various ways. In the end, the component simply needs a way to render and set the filters data. Airtable filters and Slack search are both flavors of rendering a "unified" search component.

Viewing `filters`

Just as the backend has a simple interface to understand serializable and human-readable filters, the UI component can easily render filters in the following proposed way:

const filters = [
  {field: 'CONCEPT', dataType: 'ENUM', operator: 'CONTAINS', values: ['cid1', 'cid2']},
  {field: 'TEXT', dataType: 'STRING', operator: 'EQUALS', values: ['drug']},
];

Note that this UI:

Summarizes what filters have been applied in a concise and human-readable way.
Shows users at a glance which dataType and operator are applied to each field.
Relies on enums for fields, operators, and dataTypes that are statically defined on the server, so they can be provided to the UI to decide how to render each data type, as shown above.
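
The human-readable summary does not depend on any particular UI framework. As a rough illustration, this Python sketch turns each filter into a one-line label using display names from a server-provided schema; the `SCHEMA` contents and the `describe_filter` helper are assumptions:

```python
# Display metadata keyed by field/operator id (hypothetical values).
SCHEMA = {
    "fields": {
        "CONCEPT": {"label": "Concept", "dataType": "ENUM"},
        "TEXT": {"label": "Text", "dataType": "STRING"},
    },
    "operators": {
        "CONTAINS": {"label": "contains"},
        "EQUALS": {"label": "equals"},
    },
}


def describe_filter(f, schema):
    """Return a one-line, human-readable summary of a single filter."""
    field = schema["fields"].get(f["field"], {})
    parts = [field.get("label", f["field"])]  # fall back to the raw field id
    operator_id = f.get("operator")
    if operator_id:
        operator = schema["operators"].get(operator_id, {})
        parts.append(operator.get("label", operator_id))
    parts.append(", ".join(f["values"]))
    return " ".join(parts)


filters = [
    {"field": "CONCEPT", "dataType": "ENUM", "operator": "CONTAINS", "values": ["cid1", "cid2"]},
    {"field": "TEXT", "dataType": "STRING", "operator": "EQUALS", "values": ["drug"]},
]
summary = [describe_filter(f, SCHEMA) for f in filters]
```

A real component would render each label as a removable token rather than a string, but the lookup logic is the same.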

Setting `filters`

The UI component supports the following features to update filters:

A Clear All button to clear all filters.
Each filter "token" can be removed/cleared.
Typing in the component should suggest possible values that can be applied based on the schema of the filters specified by the server.
Clicking on the Add Filter button allows creation of a new filter. Note that clients can choose to implement the Add Filter button as a focus action on the component input. When adding a filter, the user is prompted to:
- Select a field
- Select operators if they exist
- Provide the input values. Inputs are rendered based on the dataType of the filter object.
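
The update features listed above reduce to a few pure functions over the filters array, which keeps the component's state handling easy to test. A minimal Python sketch under that assumption follows; all function names and the schema shape are hypothetical:

```python
def add_filter(filters, new_filter):
    """Return a new list with new_filter appended (Add Filter)."""
    return [*filters, new_filter]


def remove_filter(filters, index):
    """Return a new list without the filter token at index."""
    return [f for i, f in enumerate(filters) if i != index]


def clear_all(filters):
    """Return an empty list (Clear All)."""
    return []


def suggest_fields(typed_text, schema):
    """Suggest field ids whose label contains the typed text."""
    needle = typed_text.strip().lower()
    if not needle:
        return []
    return [
        field_id
        for field_id, field in schema["fields"].items()
        if needle in field["label"].lower()
    ]
```

Each function returns a new list instead of mutating its input, which suits an `onUpdate`-style callback that receives the next filters value.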

Pseudo React Component

This is pseudocode for the React component that highlights important implementation details and prop API:

enum DataType {  // match with server's definition
  BOOLEAN = 'BOOLEAN',
  ENUM = 'ENUM',
  NUMBER = 'NUMBER',
  STRING = 'STRING',
}

interface Field {
  id: string;
  dataType: DataType;
  label: string;
}

interface Operator {
  id: string;
  label: string;
}

// Schema defines the static fields and operators that the component can render
// for setting fields/operators when creating filters.  It is stored in normalized
// form for easy retrieval
interface Schema {
  fields: {
    [fieldId: string]: Field;
  };
  operators: {
    [operatorId: string]: Operator;
  };
}

interface Filter {
  id: string;
  field: Field;
  operator: Operator;
  values: string[];
}

interface UnifiedSearchProps {
  schema: Schema;
  filters: Filter[];
  onUpdate: ChangeHandler;
}

const UnifiedSearch = ({ schema, filters, onUpdate }: UnifiedSearchProps): JSX.Element => (
  <div>
    <SearchInput />
    {filters.map(filter => <Filter key={filter.id} filter={filter} schema={schema} />)}
    <NewFilter schema={schema} />
  </div>
);

The component above uses the associated child components:

const SearchInput = () => {
  // renders the basic search input and search suggestions.
  // Search suggestions can be statically computed and determined based on `schema` and currently applied `filters`
  // Contains a `Clear All` functionality to remove all applied `filters`.
}

const Filter = ({ filter, schema }) => {
  // look up the field/operator definitions in the normalized schema
  const field = schema.fields[filter.field.id];
  const operator = schema.operators[filter.operator.id];
  return (
    <div>
      {/* field name and dataType icon */}
      <Field field={field} />
      {/* operator name */}
      <Operator operator={operator} />
      {/* conditionally render the values based on the field's data type */}
      <Values dataType={field.dataType} values={filter.values} />
    </div>
  );
}

const Field = ({ field }) => {
  return (
    <div>
      <Icon icon={field.dataType} />
      <b>{field.label}</b>
    </div>
  );
}

const Operator = ({ operator }) => <div>{operator.label}</div>;

const Values = ({ dataType, values }) => {
  switch (dataType) {
    case 'ENUM':
      // one token per selected value
      return <>{values.map(value => <Token key={value} label={value} />)}</>;
    case 'NUMBER':
      return <div style={{ color: 'blue' }}>{parseInt(values[0], 10)}</div>;
    case 'STRING':
    default:
      return <div>{values[0]}</div>;
  }
}

const NewFilter = ({ schema, onCreate }) => {
  // a component to create new filters.
  // Uses statically defined fields and operators in `schema` to decide how they can be created.
  const [newFilter, setNewFilter] = useState({});

  // code to handle selecting fields, operators, and rendering the right input based on the data type of the selected field
  switch (newFilter.field?.dataType) {
    case 'BOOLEAN':
      return <Toggle />;
    case 'ENUM':
      return <Selector />;
    case 'STRING':
    default:
      return <Input />;
  }
}

Alternative Designs

While this document proposes a spec to design search interfaces through strongly typed interfaces, you can make your own decisions and choices that make more sense depending on the complexity of your search APIs.

Here is an example alternative that uses well-formatted query strings (inspired by GitHub's "human-readable" search queries) to encode the same information, which APIs can then parse into the relevant filter data:

// Filter documents that contain concepts ['cid1', 'cid2'] and match exactly on text 'drug'
const queryString = 'concept:cid1,cid2+exact_match:drug';

const filters = parseQueryStringToFilters(queryString);
// [
//   {field: 'CONCEPT', operator: 'CONTAINS', values: ['cid1', 'cid2']},
//   {field: 'TEXT', operator: 'EQUALS', values: ['drug']},
// ];

While the specific implementation and interface here are quite different, the general idea of the design still holds: build search APIs on top of a common query request layer that both the API and the UI use to exchange data.
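
One possible shape for the query string parser is sketched below in Python. It assumes `+` separates terms, `:` separates a key from comma-separated values, and each key maps to a fixed field and operator; the `KEY_TO_FIELD_AND_OPERATOR` table is a hypothetical mapping chosen to reproduce the example above:

```python
# Maps a query string key to (field, operator). Assumed for illustration.
KEY_TO_FIELD_AND_OPERATOR = {
    "concept": ("CONCEPT", "CONTAINS"),
    "exact_match": ("TEXT", "EQUALS"),
}


def parse_query_string_to_filters(query_string):
    """Parse 'key:v1,v2+key:v' into a list of filter objects."""
    filters = []
    for term in query_string.split("+"):
        key, separator, raw_values = term.partition(":")
        if not separator or key not in KEY_TO_FIELD_AND_OPERATOR:
            raise ValueError(f"Unsupported search term: {term!r}")
        field, operator = KEY_TO_FIELD_AND_OPERATOR[key]
        filters.append(
            {"field": field, "operator": operator, "values": raw_values.split(",")}
        )
    return filters


filters = parse_query_string_to_filters("concept:cid1,cid2+exact_match:drug")
```

Once the string is parsed, everything downstream (validation, query building, rendering) can work on the same filter objects as before.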

For alternative designs of the UI component, it is not difficult to see how this spec allows building the Airtable filter view.

Implementations

The following is highly abstracted pseudocode showing how to cast various popular search interfaces into the spec defined in Filters Spec, demonstrating the flexibility of the spec.

GitHub "human-readable" search query

"""
original_query_string='GitHub+Octocat+in:readme+user:defunkt'
"""
def github_query_to_search_filters(query):
  # code to transform to relevant filter object
  return [
    {'field': 'TEXT', 'operator': 'CONTAINS', 'values': ['GitHub', 'Octocat']},
    {'field': 'FILE', 'operator': 'IN', 'values': ['readme']},
    {'field': 'USER', 'operator': 'CONTAINS', 'values': ['defunkt']},
  ]

Slack search

"""
original_query_string='Explorer is awesome in:#team_dora to:@jordan before:February'
"""
def slack_query_to_search_filters(query):
  # code to transform to relevant filter object
  return [
    {'field': 'TEXT', 'operator': 'FUZZY', 'values': ['Explorer is awesome']},
    {'field': 'CHANNEL', 'operator': 'EQUALS', 'values': ['team_dora']},
    {'field': 'TO', 'operator': 'EQUALS', 'values': ['jordan']},
    {'field': 'SENT_TIME', 'operator': 'BEFORE', 'values': ['February']},
  ]
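
For readers who want to see the transformation step filled in, here is one possible Python sketch for the Slack-style query above. It assumes whitespace-separated tokens, treats `key:value` tokens with a known key as qualifiers, and joins the remaining words into one fuzzy text filter; the `QUALIFIERS` table is an assumption, not Slack's actual grammar:

```python
# Maps a qualifier key to (field, operator). Assumed for illustration.
QUALIFIERS = {
    "in": ("CHANNEL", "EQUALS"),
    "to": ("TO", "EQUALS"),
    "before": ("SENT_TIME", "BEFORE"),
}


def slack_query_to_search_filters(query):
    text_words = []
    qualifier_filters = []
    for token in query.split():
        key, separator, value = token.partition(":")
        if separator and key in QUALIFIERS:
            field, operator = QUALIFIERS[key]
            # Drop the leading '#' (channel) or '@' (user) marker.
            qualifier_filters.append(
                {"field": field, "operator": operator, "values": [value.lstrip("#@")]}
            )
        else:
            text_words.append(token)  # plain word: part of the free text
    filters = []
    if text_words:
        filters.append(
            {"field": "TEXT", "operator": "FUZZY", "values": [" ".join(text_words)]}
        )
    return filters + qualifier_filters


filters = slack_query_to_search_filters(
    "Explorer is awesome in:#team_dora to:@jordan before:February"
)
```

The result has the same four filters as the hand-written list above, in the same order.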

Airtable filters

def airtable_filters_to_search_filters(filters):
  # code to transform to relevant filter object
  return [
    {'field': 'PRIORITY', 'dataType': 'ENUM_SET', 'operator': 'EQUALS', 'values': ['High Priority']},
    {'field': 'COMPLETED', 'dataType': 'BOOLEAN', 'operator': 'EQUALS', 'values': ['true']},
  ]

chrisrzhou/unified-search-spec.md
