You are an expert-level data analyst and a master specialist in search systems. Your mission is to generate a highly realistic and diverse TSV dataset of approximately 20 natural language queries based on the provided Elasticsearch indices.
This dataset's primary purpose is to benchmark an AI model's ability to precisely identify the correct target index and its respective fields, and to generate accurate search and filter clauses. Your generated queries must therefore include a range of difficulties designed to test the model's understanding of nuance, context, and a user's true intent.
## 1. CONTEXT: INPUT ELASTICSEARCH INDICES
You will receive the index schemas as a JSON object. Note that indices may be from the same domain and share similar field names to create challenging evaluation scenarios.
## 2. PRE-GENERATION REASONING (MANDATORY INTERNAL STEPS)
Before you generate the TSV, you MUST perform the following internal reasoning steps. This analysis is critical for creating relevant and challenging queries

You are an expert in Elasticsearch data modeling and a seasoned data architect. Your primary task is to generate two or three new Elasticsearch index mappings that are **similar but not identical** to a provided input index mapping. These newly generated indices are intended to create a more challenging evaluation scenario for a Natural Language to Elasticsearch Query Language (NL2ESQL) agent that needs to identify the correct index and fields from a user's query. Therefore, the similarity should be high, but with clear, plausible distinctions.

**Input for this Task:** Here is the existing Elasticsearch index mapping as a JSON object:
```
{{input_index_mapping_json}}
```
INSTRUCTIONS: Your Goal for Each Generated Index:
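
The `{{input_index_mapping_json}}` placeholder in the prompt above has to be filled before the prompt is sent to a model. A minimal sketch of one way to do that follows. The `render_prompt` helper, the shortened template string, and the example mapping are all hypothetical and not part of the original gist; only the placeholder name comes from the prompt itself.

```python
import json

# Placeholder name taken from the prompt text; everything else here is illustrative.
PLACEHOLDER = "{{input_index_mapping_json}}"


def render_prompt(template: str, mapping: dict) -> str:
    """Substitute the mapping, serialized as pretty-printed JSON, into the template."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template does not contain {PLACEHOLDER}")
    return template.replace(PLACEHOLDER, json.dumps(mapping, indent=2))


# Hypothetical mapping, used only to exercise the helper.
example_mapping = {
    "mappings": {
        "properties": {
            "business_name": {"type": "text"},
            "city": {"type": "keyword"},
        }
    }
}

# Shortened stand-in for the full prompt text.
template = (
    "Here is the existing Elasticsearch index mapping as a JSON object:\n"
    + PLACEHOLDER
    + "\n"
)

prompt = render_prompt(template, example_mapping)
print(prompt)
```

Plain string replacement is used instead of `str.format` because the JSON mapping itself contains braces, which `str.format` would try to interpret as fields.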

You are an expert in Elasticsearch data modeling and an experienced data architect. Your task is to generate a dataset of 5 diverse and realistic Elasticsearch index mappings. These mappings will be used to develop and evaluate a Natural Language to Elasticsearch Query Language (NL2ESQL) agent. The agent focuses on translating natural language queries (NLQ) with search and exploratory intent into ESQL queries for single Elasticsearch indices. A key follow-up step will be generating ~10 diverse query examples per index.
**Pre-Generation Reasoning Steps (Your Internal Thought Process):**
Before you generate the JSONL output, please follow these internal reasoning steps to ensure the indices are realistic, diverse, and well-considered:
1. **Domain and Use Case Brainstorming & Selection:**
   * First, think broadly about common real-world domains and primary use cases where Elasticsearch is heavily utilized. Examples include, but are not limited to: website/intranet search, e-commerce product catalogs and re

{
  "retriever": {
    "linear": {
      "retrievers": [
        {
          "retriever": {
            "rrf": {
              "retrievers": [
                {

import asyncio
from contextlib import asynccontextmanager
from datetime import datetime, timedelta

import aiohttp
from azure.identity.aio import CertificateCredential

tenant_id = "<id here>"
tenant_name = "<name here>"
site_name = "<site name here>"

#!/bin/bash
echo "What is your Access Token? (Example: cfcbcbae003c1ff47cd338edb32da3749b6652f3739ed2c55384542a1dc9cccd)"
read -r ACCESS_TOKEN
echo "What is your Key? (Example: 5e7bce38f74c32c93d685d4f)"
read -r KEY
echo "What is your Host? (Example: workplace-search-1.ea-eden-3-staging.elastic.dev)"
read -r HOST
# Strip spaces and tabs that may have been pasted along with the token.
ACCESS_TOKEN=$(echo "$ACCESS_TOKEN" | tr -d '[:blank:]')

[
  {
    "business_id": "b72ed9bb-d74d-44f6-bb23-83ed7e51121d",
    "business_name": "Jane",
    "city": "New York",
    "street_address": "100 W Houston St",
    "phone_number": "(212) 254-7000",
    "tags": [
      "Breakfast & Brunch",
      "American (New)",