Semantic Search with OpenSearch and Cohere

Cluster Settings:

PUT /_cluster/settings
{
    "persistent": {
        "plugins.ml_commons.allow_registering_model_via_url": true,
        "plugins.ml_commons.only_run_on_ml_node": false,
        "plugins.ml_commons.connector_access_control_enabled": true,
        "plugins.ml_commons.model_access_control_enabled": true,
        "plugins.ml_commons.trusted_connector_endpoints_regex": [
          "^https://runtime\\.sagemaker\\..*[a-z0-9-]\\.amazonaws\\.com/.*$",
          "^https://api\\.openai\\.com/.*$",
          "^https://api\\.cohere\\.ai/.*$"
        ]
    }
}
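Before creating the connector, it can help to confirm that the Cohere endpoint actually matches one of the trusted patterns. The sketch below is an offline check in Python, with the patterns copied from the settings above and the JSON escaping removed. The helper name `is_trusted` is hypothetical, and OpenSearch evaluates these patterns with Java's regex engine, so this only approximates the server-side check (the two engines agree for patterns this simple).

```python
import re

# Patterns copied from plugins.ml_commons.trusted_connector_endpoints_regex
# (raw strings, so the doubled JSON backslashes become single ones).
TRUSTED_ENDPOINT_PATTERNS = [
    r"^https://runtime\.sagemaker\..*[a-z0-9-]\.amazonaws\.com/.*$",
    r"^https://api\.openai\.com/.*$",
    r"^https://api\.cohere\.ai/.*$",
]


def is_trusted(url: str) -> bool:
    """Return True if the URL matches at least one trusted pattern."""
    return any(re.match(pattern, url) for pattern in TRUSTED_ENDPOINT_PATTERNS)


# The URL used later in the connector definition.
print(is_trusted("https://api.cohere.ai/v1/embed"))
```

A URL on a different host or scheme is rejected, which is the usual reason connector creation fails with an endpoint error.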

Create Model Group:

POST /_plugins/_ml/model_groups/_register
{
    "name": "Cohere_Group",
    "description": "Public Cohere Model Group",
    "access_mode": "public"
}
# MODEL_GROUP_ID: copy `model_group_id` from the response

Create Connector:

POST /_plugins/_ml/connectors/_create
{
    "name": "Cohere Connector",
    "description": "External connector for connections into Cohere",
    "version": "1.0",
    "protocol": "http",
    "credential": {
        "cohere_key": "<COHERE KEY HERE>"
    },
    "parameters": {
        "model": "embed-english-v2.0",
        "truncate": "END"
    },
    "actions": [{
        "action_type": "predict",
        "method": "POST",
        "url": "https://api.cohere.ai/v1/embed",
        "headers": {
            "Authorization": "Bearer ${credential.cohere_key}"
        },
        "request_body": "{ \"texts\": ${parameters.prompt}, \"truncate\": \"${parameters.truncate}\", \"model\": \"${parameters.model}\" }",
        "pre_process_function": "connector.pre_process.cohere.embedding",
        "post_process_function": "connector.post_process.cohere.embedding"
    }]
}
# CONNECTOR_ID: copy `connector_id` from the response
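The `request_body` in the connector is a template: each `${parameters.NAME}` placeholder is replaced with a value before the request is sent to Cohere. The Python sketch below illustrates that substitution. It is a simplified stand-in and not ML Commons' actual implementation; `render_template` is a hypothetical helper, and inserting non-string values as JSON is an assumption made for this illustration.

```python
import json
import re

# The connector's request_body, with the JSON string escaping removed.
REQUEST_BODY = (
    '{ "texts": ${parameters.prompt}, '
    '"truncate": "${parameters.truncate}", '
    '"model": "${parameters.model}" }'
)


def render_template(template: str, parameters: dict) -> str:
    """Replace each ${parameters.NAME} placeholder with its value.

    Strings are inserted as-is (the template supplies the quotes);
    lists and other values are inserted as JSON.
    """
    def substitute(match):
        value = parameters[match.group(1)]
        return value if isinstance(value, str) else json.dumps(value)

    return re.sub(r"\$\{parameters\.(\w+)\}", substitute, template)


body = render_template(REQUEST_BODY, {
    "prompt": ["This should get embedded."],
    "truncate": "END",
    "model": "embed-english-v2.0",
})
payload = json.loads(body)  # fails loudly if a placeholder was left unfilled
print(payload)
```

Note that `prompt` is unquoted in the template because it carries a JSON array, while `truncate` and `model` are wrapped in quotes because they carry plain strings.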

Register and deploy a model to the cluster:

POST /_plugins/_ml/models/_register?deploy=true
{
    "name": "embed-english-v2.0",
    "function_name": "remote",
    "description": "test model",
    "model_group_id": "<MODEL_GROUP_ID>",
    "connector_id": "<CONNECTOR_ID>"
}
# TASK_ID: copy `task_id` from the response

Check the registration task and get the Model ID:

GET /_plugins/_ml/tasks/<TASK_ID>
# MODEL_ID: copy `model_id` from the response once the task state is COMPLETED
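If you script this step, the task may need to be polled a few times before the model ID appears. The Python sketch below shows one way to do that. `wait_for_model_id` and `fetch_task` are hypothetical names; `fetch_task` stands for whatever function performs the `GET /_plugins/_ml/tasks/<TASK_ID>` call and returns the parsed JSON, and the sketch assumes the response carries `state`, `model_id`, and (on failure) `error` fields.

```python
import time
from typing import Callable


def wait_for_model_id(
    fetch_task: Callable[[], dict],
    attempts: int = 10,
    delay_seconds: float = 1.0,
) -> str:
    """Poll the task until it completes, then return its model_id."""
    for _ in range(attempts):
        task = fetch_task()
        state = task.get("state")
        if state == "COMPLETED":
            return task["model_id"]
        if state == "FAILED":
            raise RuntimeError(f"model registration failed: {task.get('error')}")
        time.sleep(delay_seconds)  # still CREATED or RUNNING, try again
    raise TimeoutError("task did not complete within the allotted attempts")
```

Passing the fetch function in keeps the polling logic independent of the HTTP client you use.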

Test the model embedding

POST /_plugins/_ml/_predict/text_embedding/<MODEL_ID>
{
  "text_docs": ["This should get embedded."],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}
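A useful sanity check on the test response is that the embedding length equals the `dimension` you will put in the index mapping (4096 in this walkthrough). The Python sketch below assumes the predict response has the shape `inference_results[0].output[0].data`; treat that shape and the helper names as assumptions and adjust them to what your cluster actually returns.

```python
EXPECTED_DIMENSION = 4096  # must equal "dimension" in the knn_vector mapping


def extract_embedding(response: dict) -> list:
    """Return the first embedding vector from a predict response.

    Assumed shape: {"inference_results": [{"output": [{"data": [...]}]}]}
    """
    return response["inference_results"][0]["output"][0]["data"]


def has_expected_dimension(response: dict, expected: int = EXPECTED_DIMENSION) -> bool:
    """Check that the embedding length matches the index mapping."""
    return len(extract_embedding(response)) == expected
```

A mismatch here means documents will be rejected at ingest time, so it is cheaper to catch it before creating the index.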

Create Ingestion Pipeline

PUT _ingest/pipeline/cohere-ingest-pipeline
{
  "description": "Cohere Neural Search Pipeline",
  "processors" : [
    {
      "text_embedding": {
        "model_id": "<MODEL_ID>",
        "field_map": {
          "content": "content_embedding"
        }
      }
    }
  ]
}

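Create KNN index. Note: the index `space_type` must match the similarity measure recommended for the model, and `dimension` must match the length of the vectors it returns. For example, embed-english-v2.0 recommends cosine similarity and returns 4096-dimensional embeddings: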

PUT /cohere-index
{
	"settings": {
		"index.knn": true,
		"default_pipeline": "cohere-ingest-pipeline"
	},
	"mappings": {
		"properties": {
			"content_embedding": {
				"type": "knn_vector",
				"dimension": 4096,
				"method": {
					"name": "hnsw",
					"space_type": "cosinesimil",
					"engine": "nmslib"
				}
			},
			"content": {
				"type": "text"
			}
		}
	}
}
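To make the `cosinesimil` choice concrete, the Python sketch below computes cosine similarity and converts it to a relevance score. The conversion `1 / (1 + d)` with `d = 1 - cos` follows my understanding of how the k-NN plugin documents scoring for the nmslib engine; treat it as an assumption and check the k-NN documentation for your version. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_score(a, b):
    """Score assumed for cosinesimil on nmslib: 1 / (1 + (1 - cos))."""
    distance = 1.0 - cosine_similarity(a, b)
    return 1.0 / (1.0 + distance)


# Vectors pointing the same way score highest, regardless of length.
print(knn_score([1.0, 0.0], [2.0, 0.0]))
# Orthogonal vectors score lower.
print(knn_score([1.0, 0.0], [0.0, 1.0]))
```

Because cosine similarity ignores vector length, only the direction of the embedding matters for ranking.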

Hydrate index with `_bulk`

POST _bulk
{ "create" : { "_index" : "cohere-index", "_id" : "1" }}
{ "content":"Testing neural search"}
{ "create" : { "_index" : "cohere-index", "_id" : "2" }}
{ "content": "What are we doing"}
{ "create" : { "_index" : "cohere-index", "_id" : "3" } }
{ "content": "This should exist"}
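For more than a handful of documents, the `_bulk` body is easier to generate than to type. The Python sketch below builds the same newline-delimited JSON shown above: one action line followed by one document line, with a trailing newline at the end of the body. `build_bulk_body` is a hypothetical helper name.

```python
import json


def build_bulk_body(index: str, documents: dict) -> str:
    """Build an NDJSON _bulk body of create actions.

    `documents` maps document IDs to the text for the "content" field.
    """
    lines = []
    for doc_id, content in documents.items():
        lines.append(json.dumps({"create": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"content": content}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("cohere-index", {
    "1": "Testing neural search",
    "2": "What are we doing",
    "3": "This should exist",
})
print(body)
```

Only `content` is sent; the index's default pipeline adds `content_embedding` during ingestion.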

Search

GET /cohere-index/_search
{
  "query": {
    "bool" : {
      "should" : [
        {
          "script_score": {
            "query": {
              "neural": {
                "content_embedding": {
                  "query_text": "How do I ingest to opensearch",
                  "model_id": "<MODEL_ID>",
                  "k": 10
                }
              }
            },
            "script": {
              "source": "_score * 1.5"
            }
          }
        },
        {
          "script_score": {
            "query": {
              "match": { "content": "I want information about the new compression algorithms in OpenSearch" }
            },
            "script": {
              "source": "_score * 1.7"
            }
          }
        }
      ]
    }
  }
}
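The two `should` clauses are scored independently, and a `bool` query adds up the scores of the clauses that match. The Python sketch below shows the resulting arithmetic with the 1.5 and 1.7 multipliers from the scripts above. `combined_score` is a hypothetical helper, and the input scores are made-up values for illustration only.

```python
def combined_score(neural_score=None, match_score=None,
                   neural_boost=1.5, match_boost=1.7):
    """Sum the boosted scores of whichever clauses matched.

    Pass None for a clause that did not match the document.
    """
    total = 0.0
    if neural_score is not None:
        total += neural_boost * neural_score
    if match_score is not None:
        total += match_boost * match_score
    return total


# Document matched by both clauses (illustrative scores).
print(combined_score(neural_score=0.8, match_score=2.0))
# Document matched only by the neural clause.
print(combined_score(neural_score=0.8))
```

Keep in mind that lexical `match` scores are not bounded to the same range as k-NN scores, so the multipliers may need tuning to stop one clause from dominating.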

Cleanup

POST /_plugins/_ml/models/<MODEL_ID>/_undeploy
DELETE /_plugins/_ml/models/<MODEL_ID>
DELETE /_plugins/_ml/connectors/<CONNECTOR_ID>
DELETE _ingest/pipeline/cohere-ingest-pipeline
DELETE cohere-index

Troubleshoot:

POST /_plugins/_ml/models/<MODEL_ID>/_predict
{
  "parameters": {
    "prompt": ["This should exist"]
  }
}

GET /cohere-index/_search
{
  "query": {
    "match_all": {}
  }
}

dtaivpp/cohere_semantic_search.md
