Sean Story seanstory

  • Elastic
  • Nashville, TN USA
@seanstory
seanstory / query-generation-prompt.jinja2
Created September 22, 2025 13:57
Building on https://gist.github.com/seanstory/a08db2e149897da656db3a1ca72e17ac and https://gist.github.com/seanstory/a280a85d067e61bfeb5911bf2654e6e2, this prompt considers the generated indices, then generates a set of test questions and expected results. The result can then be used to evaluate a tool that tries to pick the best index t…
You are an expert-level data analyst and a master specialist in search systems. Your mission is to generate a highly realistic and diverse TSV dataset of approximately 20 natural language queries based on the provided Elasticsearch indices.
This dataset's primary purpose is to benchmark an AI model's ability to precisely identify the correct target index and its relevant fields, and to generate accurate search and filter clauses. Your generated queries must therefore include a range of difficulties designed to test the model's understanding of nuance, context, and a user's true intent.
## 1. CONTEXT: INPUT ELASTICSEARCH INDICES
You will receive the index schemas as a JSON object. Note that indices may be from the same domain and share similar field names to create challenging evaluation scenarios.
## 2. PRE-GENERATION REASONING (MANDATORY INTERNAL STEPS)
Before you generate the TSV, you MUST perform the following internal reasoning steps. This analysis is critical for creating relevant and challenging queries
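As a rough illustration of how such a TSV could be used to score an index-picking tool, here is a minimal Python sketch. The column names (`query`, `expected_index`), the sample rows, and the `pick_index` function are all hypothetical; the prompt preview above is truncated and does not show the actual TSV layout.

```python
import csv
import io

# Hypothetical TSV layout: the real columns are defined in the full prompt,
# which is truncated in this listing.
SAMPLE_TSV = (
    "query\texpected_index\n"
    "open support tickets from last week\tsupport_tickets\n"
    "laptops under 1000 dollars\tproduct_catalog\n"
    "refund policy for damaged items\tsupport_tickets\n"
)


def pick_index(query: str) -> str:
    """Stand-in for the tool under evaluation (assumption: naive keyword match)."""
    if "ticket" in query:
        return "support_tickets"
    return "product_catalog"


def index_accuracy(tsv_text: str, picker) -> float:
    """Fraction of queries for which the picker returns the expected index."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if picker(row["query"]) == row["expected_index"])
    return hits / len(rows)


accuracy = index_accuracy(SAMPLE_TSV, pick_index)
print(f"index selection accuracy: {accuracy:.2f}")
```

The third sample row is deliberately one the naive picker gets wrong, which is the kind of case the "range of difficulties" in the prompt is meant to surface.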
@seanstory
seanstory / similar-index-mapping-generation-prompt.jinja2
Created September 22, 2025 13:55
Building off of https://gist.github.com/seanstory/a08db2e149897da656db3a1ca72e17ac, this prompt introduces “hard negatives” into the sample dataset.
You are an expert in Elasticsearch data modeling and a seasoned data architect. Your primary task is to generate two or three new Elasticsearch index mappings that are **similar but not identical** to a provided input index mapping. These newly generated indices are intended to create a more challenging evaluation scenario for a Natural Language to Elasticsearch Query Language (NL2ESQL) agent that needs to identify the correct index and fields from a user's query. Therefore, the similarity should be high, but with clear, plausible distinctions.
**Input for this Task:**
Here is the existing Elasticsearch index mapping as a JSON object:
```
{{input_index_mapping_json}}
```
INSTRUCTIONS: Your Goal for Each Generated Index:
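For readers unfamiliar with the `{{input_index_mapping_json}}` placeholder, the following Python sketch shows one way the mapping might be substituted into the template before it is sent to a model. The template text is a shortened stand-in, the example mapping is invented, and plain string replacement is used here in place of a real Jinja2 render to keep the example dependency-free.

```python
import json

# Hypothetical, shortened stand-in for the prompt template text.
TEMPLATE = (
    "Here is the existing Elasticsearch index mapping as a JSON object:\n"
    "{{input_index_mapping_json}}\n"
)

# Hypothetical example mapping, not taken from the gist.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "created_at": {"type": "date"},
        }
    }
}


def render_prompt(template: str, index_mapping: dict) -> str:
    """Substitute the mapping placeholder with pretty-printed JSON.

    Plain string replacement stands in for a real Jinja2 render here.
    """
    return template.replace(
        "{{input_index_mapping_json}}", json.dumps(index_mapping, indent=2)
    )


prompt = render_prompt(TEMPLATE, mapping)
print(prompt)
```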
@seanstory
seanstory / index-generation-prompt.jinja2
Created September 22, 2025 13:52
Initial prompt to generate an evaluation dataset. This prompt attempts to cover a broad set of realistic domains and data modeling to realistically mimic enterprise data sprawl in Elasticsearch.
You are an expert in Elasticsearch data modeling and an experienced data architect. Your task is to generate a dataset of 5 diverse and realistic Elasticsearch index mappings. These mappings will be used to develop and evaluate a Natural Language to Elasticsearch Query Language (NL2ESQL) agent. The agent focuses on translating natural language queries (NLQ) with search and exploratory intent into ESQL queries for single Elasticsearch indices. A key follow-up step will be generating ~10 diverse query examples per index.
**Pre-Generation Reasoning Steps (Your Internal Thought Process):**
Before you generate the JSONL output, please follow these internal reasoning steps to ensure the indices are realistic, diverse, and well-considered:
1. **Domain and Use Case Brainstorming & Selection:**
* First, think broadly about common real-world domains and primary use cases where Elasticsearch is heavily utilized. Examples include, but are not limited to: website/intranet search, e-commerce product catalogs and re
@seanstory
seanstory / gist:d704443120e20f6c844db10e30066860
Created September 12, 2025 20:41
Linear and RRF retrievers for hybrid search across several semantic fields
{
"retriever": {
"linear": {
"retrievers": [
{
"retriever": {
"rrf": {
"retrievers": [
{
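The preview above nests an `rrf` retriever inside a `linear` retriever. To make the two fusion ideas concrete, here is a small pure-Python sketch: reciprocal rank fusion sums `1 / (rank_constant + rank)` over each ranked list, and a linear combination adds weighted, min-max normalized scores. This illustrates the arithmetic only and is not Elasticsearch's implementation; the document IDs, weights, and scores are made up.

```python
def rrf_scores(rankings, rank_constant=60):
    """Reciprocal rank fusion: sum 1 / (rank_constant + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return scores


def minmax(scores):
    """Rescale scores to [0, 1]; identical scores all map to 0.0 (a simplification)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combine(weighted_score_sets):
    """Weighted sum of min-max normalized score sets."""
    combined = {}
    for weight, scores in weighted_score_sets:
        for doc, s in minmax(scores).items():
            combined[doc] = combined.get(doc, 0.0) + weight * s
    return combined


# Hypothetical ranked hits from two semantic fields, fused with RRF.
title_hits = ["doc_a", "doc_b", "doc_c"]
body_hits = ["doc_b", "doc_a", "doc_d"]
fused = rrf_scores([title_hits, body_hits])

# Hypothetical raw scores from a separate lexical retriever.
lexical = {"doc_c": 7.0, "doc_b": 3.0, "doc_d": 1.0}

# Outer linear step: the RRF result is weighted twice as heavily.
final = linear_combine([(2.0, fused), (1.0, lexical)])
ranked = sorted(final, key=final.get, reverse=True)
print(ranked)
```

RRF needs only rank positions, so it sidesteps score-scale differences between fields, while the outer linear step lets one source be weighted more heavily than another.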
@seanstory
seanstory / ms_auth_comparison.py
Created March 18, 2025 20:19
Python script that compares Client Secret vs Certificate credentials for the SharePoint roleassignments API
import asyncio
from contextlib import asynccontextmanager
from datetime import datetime, timedelta
import aiohttp
from azure.identity.aio import CertificateCredential
tenant_id = "<id here>"
tenant_name = "<name here>"
site_name = "<site name here>"
#!/bin/bash
# Prompt for the credentials and host used by the rest of the script.
echo "What is your Access Token? (Example: cfcbcbae003c1ff47cd338edb32da3749b6652f3739ed2c55384542a1dc9cccd)"
read -r ACCESS_TOKEN
echo "What is your Key? (Example: 5e7bce38f74c32c93d685d4f)"
read -r KEY
echo "What is your Host? (Example: workplace-search-1.ea-eden-3-staging.elastic.dev)"
read -r HOST
# Strip spaces and tabs that may have been pasted along with the token.
ACCESS_TOKEN=$(echo "$ACCESS_TOKEN" | tr -d '[:blank:]')
@seanstory
seanstory / demo_business_records.json
Last active June 9, 2022 21:58
A JSON listing of fake businesses
[
{
"business_id": "b72ed9bb-d74d-44f6-bb23-83ed7e51121d",
"business_name": "Jane",
"city": "New York",
"street_address": "100 W Houston St",
"phone_number": "(212) 254-7000",
"tags": [
"Breakfast & Brunch",
"American (New)",