Azure Cognitive Search Index Management

Azure Cognitive Search is a fully managed cloud search service that provides a rich search experience to custom applications. The service provides a REST API with operations that create and manage indexes, load data, implement search features, execute queries, and handle results.

This document describes the first part of the JS SDK for Search service operations, which cover everything that is not directly a document operation.

Design

`SearchServiceClient` API

The search service client is the interface developers use to configure and customize an Azure Cognitive Search instance. SearchServiceClient will initially support creating and managing search indexes, and will later expand to cover other service entities such as indexers, synonym maps, cognitive skillsets, and data sources.

export class SearchServiceClient {
    constructor(endpoint: string, credential: KeyCredential, options?: SearchServiceClientOptions);
    analyzeText(indexName: string, options: AnalyzeTextOptions): Promise<AnalyzeResult>;
    createIndex(index: Index, options?: CreateIndexOptions): Promise<Index>;
    createOrUpdateIndex(index: Index, options?: CreateOrUpdateIndexOptions): Promise<Index>;
    deleteIndex(indexName: string, options?: DeleteIndexOptions): Promise<void>;
    getIndex(indexName: string, options?: GetIndexOptions): Promise<Index>;
    getIndexStatistics(indexName: string, options?: GetIndexStatisticsOptions): Promise<GetIndexStatisticsResult>;
    listIndexes(options?: ListIndexesOptions): Promise<Index[]>;
}

The primary model used by these APIs is Index, which describes the shape of an index in the search service. An index is analogous to a user-defined table in a database. It holds all metadata about the documents that are uploaded for search.

export interface Index {
    analyzers?: Analyzer[];
    charFilters?: CharFilter[];
    corsOptions?: CorsOptions;
    defaultScoringProfile?: string;
    encryptionKey?: EncryptionKey;
    etag?: string;
    fields: Field[];
    name: string;
    scoringProfiles?: ScoringProfile[];
    suggesters?: Suggester[];
    tokenFilters?: TokenFilter[];
    tokenizers?: Tokenizer[];
}

Index has many configuration options, most of which tune how content is indexed and scored. The most important of them is the set of field definitions.

Each Field is either "simple" or "complex". Simple fields can hold either a single primitive value or a collection (array) of simple values. Complex fields allow for nested subdocuments and have their own child Field definitions.

export type Field = SimpleField | ComplexField;

export type SimpleDataType = "Edm.String" | "Edm.Int32" | "Edm.Int64" | "Edm.Double" | "Edm.Boolean" | "Edm.DateTimeOffset" | "Edm.GeographyPoint" | "Collection(Edm.String)" | "Collection(Edm.Int32)" | "Collection(Edm.Int64)" | "Collection(Edm.Double)" | "Collection(Edm.Boolean)" | "Collection(Edm.DateTimeOffset)" | "Collection(Edm.GeographyPoint)";

export interface SimpleField {
    analyzer?: string;
    facetable?: boolean;
    filterable?: boolean;
    hidden?: boolean;
    indexAnalyzer?: string;
    key?: boolean;
    name: string;
    searchable?: boolean;
    searchAnalyzer?: string;
    sortable?: boolean;
    synonymMaps?: string[];
    type: SimpleDataType;
}

export type ComplexDataType = "Edm.ComplexType" | "Collection(Edm.ComplexType)";

export interface ComplexField {
    fields: Field[];
    name: string;
    type: ComplexDataType;
}
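
Because complex fields nest, it can help to see which leaf fields a definition actually produces. The following is a minimal sketch and not part of the proposed SDK surface: `listFieldPaths` is a hypothetical helper that walks a `Field[]` tree, and the `/` separator between parent and child names is an assumption based on how sub-fields are usually addressed in OData expressions.

```javascript
// Hypothetical helper (not part of the SDK): list every leaf field path in a Field[] tree.
function listFieldPaths(fields, prefix = "") {
  const paths = [];
  for (const field of fields) {
    // Assumption: sub-fields are addressed as "parent/child".
    const path = prefix ? `${prefix}/${field.name}` : field.name;
    // Complex fields carry their own child definitions in `fields`.
    const isComplex =
      field.type === "Edm.ComplexType" || field.type === "Collection(Edm.ComplexType)";
    if (isComplex) {
      paths.push(...listFieldPaths(field.fields, path));
    } else {
      paths.push(path);
    }
  }
  return paths;
}

const exampleFields = [
  { type: "Edm.String", name: "id", key: true },
  {
    type: "Edm.ComplexType",
    name: "details",
    fields: [{ type: "Collection(Edm.String)", name: "tags", searchable: true }]
  }
];

console.log(listFieldPaths(exampleFields)); // [ 'id', 'details/tags' ]
```

The complex field itself does not appear in the result; only fields that hold values do.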

Scenarios

1. Define a search index to query against

Developers must define an index in order to upload documents and perform queries. They may do so manually in the portal, but will often wish to achieve this with code.

const { SearchServiceClient, AzureKeyCredential } = require("@azure/search");

const client = new SearchServiceClient("<endpoint>", new AzureKeyCredential("<Admin Key>"));

const result = await client.createIndex({
  name: "example-index",
  fields: [
    {
      type: "Edm.String",
      name: "id",
      key: true
    },
    {
      type: "Edm.Double",
      name: "awesomenessLevel",
      sortable: true,
      filterable: true,
      facetable: true
    },
    {
      type: "Edm.String",
      name: "description",
      searchable: true
    },
    {
      type: "Edm.ComplexType",
      name: "details",
      fields: [
        {
          type: "Collection(Edm.String)",
          name: "tags",
          searchable: true
        }
      ]
    },
    {
      type: "Edm.Int32",
      name: "hiddenWeight",
      hidden: true
    }
  ]
});

console.log(result);
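
Before sending a definition to the service, it may be worth checking the key field locally. The sketch below is illustrative only: `findKeyFieldProblems` is a hypothetical helper, and it assumes the service expects exactly one top-level key field of type `Edm.String`, which this document does not itself state.

```javascript
// Hypothetical pre-flight check (not part of the SDK).
// Assumption: an index needs exactly one top-level key field, and it must be Edm.String.
function findKeyFieldProblems(index) {
  const problems = [];
  const keyFields = index.fields.filter((field) => field.key === true);
  if (keyFields.length !== 1) {
    problems.push(`expected exactly one key field, found ${keyFields.length}`);
  }
  for (const field of keyFields) {
    if (field.type !== "Edm.String") {
      problems.push(`key field "${field.name}" has type ${field.type}, expected Edm.String`);
    }
  }
  return problems;
}

const goodIndex = {
  name: "example-index",
  fields: [{ type: "Edm.String", name: "id", key: true }]
};
const badIndex = {
  name: "broken-index",
  fields: [{ type: "Edm.Int32", name: "id", key: true }]
};

console.log(findKeyFieldProblems(goodIndex)); // []
console.log(findKeyFieldProblems(badIndex)); // one problem: the key field is not Edm.String
```

Returning a list of problems rather than throwing lets a caller report every issue at once before deciding whether to call `createIndex`.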

2. Retrieve an existing index and add a new field to it

A common scenario is extending an existing index definition with an additional field. This can be done without repopulating the index, as all fields that are not key fields are nullable: documents that are already indexed simply hold null for the new field.

const { SearchServiceClient, AzureKeyCredential } = require("@azure/search");

const client = new SearchServiceClient("<endpoint>", new AzureKeyCredential("<Admin Key>"));

const index = await client.getIndex("example-index");

index.fields.push({
  type: "Edm.DateTimeOffset",
  name: "lastUpdatedOn",
  filterable: true
});

const updatedIndex = await client.createOrUpdateIndex(index);

console.log("Fields after updating:");

for (const field of updatedIndex.fields) {
  console.log(`\t ${field.name}`);
}
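
If a script like this may run more than once, pushing the field unconditionally would append a second definition with the same name to the local object. One way to guard against that is sketched below. `addFieldIfMissing` is a hypothetical helper, not part of the SDK, and the plain object stands in for the value returned by `getIndex`.

```javascript
// Hypothetical helper (not part of the SDK): add a field only if no top-level
// field already uses that name. Returns true when the definition was changed
// and therefore needs to be sent back to the service.
function addFieldIfMissing(index, field) {
  const exists = index.fields.some((existing) => existing.name === field.name);
  if (exists) {
    return false;
  }
  index.fields.push(field);
  return true;
}

// Stand-in for the object returned by client.getIndex("example-index").
const localIndex = {
  name: "example-index",
  fields: [{ type: "Edm.String", name: "id", key: true }]
};
const newField = { type: "Edm.DateTimeOffset", name: "lastUpdatedOn", filterable: true };

console.log(addFieldIfMissing(localIndex, newField)); // true
console.log(addFieldIfMissing(localIndex, newField)); // false
console.log(localIndex.fields.length); // 2
```

In the scenario above, `createOrUpdateIndex` would then only need to be called when the helper returns true.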

3. Define a custom analyzer and test its output

Custom analyzers can be defined per-index and then referenced by name when defining a field in order to influence how searching is performed against that field.

In order to ensure that analysis is configured correctly, developers can directly ask the service to analyze a given input string to check the result.

const { SearchServiceClient, AzureKeyCredential, KnownTokenFilterNames } = require("@azure/search");

const client = new SearchServiceClient("<endpoint>", new AzureKeyCredential("<Admin Key>"));

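const index = await client.getIndex("example-index");

// These collections are optional on Index, so make sure they exist before pushing to them.
index.tokenizers = index.tokenizers || [];
index.charFilters = index.charFilters || [];
index.tokenFilters = index.tokenFilters || [];
index.analyzers = index.analyzers || [];
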
index.tokenizers.push({
  name: "example-tokenizer",
  odatatype: "#Microsoft.Azure.Search.StandardTokenizerV2",
  maxTokenLength: 125
});
index.charFilters.push({
  name: "example-char-filter",
  odatatype: "#Microsoft.Azure.Search.MappingCharFilter",
  mappings: ["MSFT=>Microsoft"]
});
index.tokenFilters.push({
  name: "example-token-filter",
  odatatype: "#Microsoft.Azure.Search.StopwordsTokenFilter",
  stopwords: ["xyzzy"]
});
index.analyzers.push({
  name: "example-analyzer",
  odatatype: "#Microsoft.Azure.Search.CustomAnalyzer",
  tokenizer: "example-tokenizer",
  charFilters: ["example-char-filter"],
  tokenFilters: [KnownTokenFilterNames.Lowercase, "example-token-filter"]
});

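// Adding analysis components to an existing index takes it offline briefly,
// so downtime has to be allowed explicitly.
await client.createOrUpdateIndex(index, { allowIndexDowntime: true });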

const result = await client.analyzeText("example-index", {
  text: "MSFT is xyzzy Great!",
  analyzer: "example-analyzer"
});

console.log(result.tokens);
// Output looks like
// [
//   { token: 'microsoft', startOffset: 0, endOffset: 4, position: 0 },
//   { token: 'is', startOffset: 5, endOffset: 7, position: 1 },
//   { token: 'great', startOffset: 14, endOffset: 19, position: 3 }
// ]
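
The sample defines and tests `example-analyzer` but does not attach it to a field. As a rough sketch of that last step, the `analyzer` property of `SimpleField` holds the analyzer name. `buildAnalyzedField` and the `summary` field name are hypothetical, and the sketch assumes analyzers are meant for searchable `Edm.String` fields.

```javascript
// Hypothetical helper (not part of the SDK): build a searchable string field
// that references an analyzer by name through the `analyzer` property of SimpleField.
function buildAnalyzedField(name, analyzerName) {
  return {
    type: "Edm.String",
    name,
    searchable: true,
    analyzer: analyzerName
  };
}

const summaryField = buildAnalyzedField("summary", "example-analyzer");
console.log(summaryField.analyzer); // example-analyzer
```

The resulting object could be pushed onto `index.fields` before calling `createOrUpdateIndex`, in the same way as the new field in scenario 2.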
