seanstory’s gists

seanstory / query-generation-prompt.jinja2

Created September 22, 2025 13:57

Building off of https://gist.github.com/seanstory/a08db2e149897da656db3a1ca72e17ac and https://gist.github.com/seanstory/a280a85d067e61bfeb5911bf2654e6e2, this prompt considers the generated indices, then generates a set of test questions and expected results. The result can then be used to evaluate a tool that tries to pick the best index t…

	You are an expert-level data analyst and a master specialist in search systems. Your mission is to generate a highly realistic and diverse TSV dataset of approximately 20 natural language queries based on the provided Elasticsearch indices.

	This dataset's primary purpose is to benchmark an AI model's ability to precisely identify the correct target index and its relevant fields, and to generate accurate search and filter clauses. Your generated queries must therefore include a range of difficulties designed to test the model's understanding of nuance, context, and a user's true intent.

	## 1. CONTEXT: INPUT ELASTICSEARCH INDICES
	You will receive the index schemas as a JSON object. Note that indices may be from the same domain and share similar field names to create challenging evaluation scenarios.

	## 2. PRE-GENERATION REASONING (MANDATORY INTERNAL STEPS)
	Before you generate the TSV, you MUST perform the following internal reasoning steps. This analysis is critical for creating relevant and challenging queries
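A minimal sketch of how the resulting TSV might be used to score an index-picking tool. The column names `query` and `expected_index`, the sample rows, and the `naive_picker` function are all assumptions made for illustration; the preview above does not show the actual TSV layout.

```python
import csv
import io


def index_selection_accuracy(tsv_text, pick_index):
    """Return the fraction of rows where pick_index(query) equals expected_index."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    if not rows:
        return 0.0
    correct = sum(
        1 for row in rows if pick_index(row["query"]) == row["expected_index"]
    )
    return correct / len(rows)


# Hypothetical sample data; real rows would come from the prompt's output.
SAMPLE_TSV = (
    "query\texpected_index\n"
    "open support tickets from last week\tsupport-tickets\n"
    "laptops under 1000 dollars\tproduct-catalog\n"
)


def naive_picker(query):
    # Hypothetical stand-in for the tool under evaluation.
    return "support-tickets" if "ticket" in query else "product-catalog"


accuracy = index_selection_accuracy(SAMPLE_TSV, naive_picker)
print(f"index selection accuracy: {accuracy:.2f}")
```

Exact-match accuracy is the simplest possible metric; a fuller evaluation would presumably also compare the predicted fields and filter clauses.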

seanstory / similar-index-mapping-generation-prompt.jinja2

Created September 22, 2025 13:55

Building off of https://gist.github.com/seanstory/a08db2e149897da656db3a1ca72e17ac, this prompt introduces “hard negatives” into the sample dataset.

	You are an expert in Elasticsearch data modeling and a seasoned data architect. Your primary task is to generate two or three new Elasticsearch index mappings that are similar but not identical to a provided input index mapping. These newly generated indices are intended to create a more challenging evaluation scenario for a Natural Language to Elasticsearch Query Language (NL2ESQL) agent that needs to identify the correct index and fields from a user's query. Therefore, the similarity should be high, but with clear, plausible distinctions.

	Input for this Task: Here is the existing Elasticsearch index mapping as a JSON object:

	```

	{{input_index_mapping_json}}

	```

	INSTRUCTIONS: Your Goal for Each Generated Index:
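A rough sketch of filling the `{{input_index_mapping_json}}` placeholder before sending the prompt to a model. To stay dependency-free it uses a small regex substitution as a stand-in for the Jinja2 engine, so it handles only plain `{{ name }}` variables; the `render_prompt` helper and the sample mapping are hypothetical.

```python
import json
import re

# Matches simple {{ name }} placeholders only; no filters, loops, or blocks.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template_text, **values):
    """Replace each {{ name }} with str(values[name]); raises KeyError if missing."""
    def substitute(match):
        return str(values[match.group(1)])

    return PLACEHOLDER.sub(substitute, template_text)


# Hypothetical input mapping, used only to exercise the helper.
mapping = {"mappings": {"properties": {"title": {"type": "text"}}}}

prompt = render_prompt(
    "Here is the existing Elasticsearch index mapping as a JSON object:\n"
    "{{input_index_mapping_json}}\n",
    input_index_mapping_json=json.dumps(mapping, indent=2),
)
print(prompt)
```

In practice the template would be read from the `.jinja2` file and rendered with Jinja2 itself; the stand-in only shows where the mapping JSON ends up.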

seanstory / index-generation-prompt.jinja2

Created September 22, 2025 13:52

Initial prompt to generate an evaluation dataset. This prompt attempts to cover a broad set of realistic domains and data-modeling patterns to mimic enterprise data sprawl in Elasticsearch.

	You are an expert in Elasticsearch data modeling and an experienced data architect. Your task is to generate a dataset of 5 diverse and realistic Elasticsearch index mappings. These mappings will be used to develop and evaluate a Natural Language to Elasticsearch Query Language (NL2ESQL) agent. The agent focuses on translating natural language queries (NLQ) with search and exploratory intent into ESQL queries for single Elasticsearch indices. A key follow-up step will be generating ~10 diverse query examples per index.

	Pre-Generation Reasoning Steps (Your Internal Thought Process):

	Before you generate the JSONL output, please follow these internal reasoning steps to ensure the indices are realistic, diverse, and well-considered:

	1. Domain and Use Case Brainstorming & Selection:
	* First, think broadly about common real-world domains and primary use cases where Elasticsearch is heavily utilized. Examples include, but are not limited to: website/intranet search, e-commerce product catalogs and re

seanstory / gist:d704443120e20f6c844db10e30066860

Created September 12, 2025 20:41

Linear and RRF retrievers for hybrid search across several semantic fields


	{
	  "retriever": {
	    "linear": {
	      "retrievers": [
	        {
	          "retriever": {
	            "rrf": {
	              "retrievers": [
	                {
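To illustrate the two fusion strategies named in the description, here is a small standalone sketch of reciprocal rank fusion (RRF) and a weighted linear combination. It is not the Elasticsearch implementation: the function names and sample data are hypothetical, the rank constant of 60 is an assumed default, and score normalization is left out.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60):
    """Reciprocal rank fusion: each list adds 1 / (rank_constant + rank) per doc."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def linear_fuse(scored_lists, weights):
    """Weighted sum of per-retriever scores; a missing doc contributes 0."""
    scores = defaultdict(float)
    for weight, scored in zip(weights, scored_lists):
        for doc_id, score in scored.items():
            scores[doc_id] += weight * score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ranked results from two semantic fields.
title_ranking = ["a", "b", "c"]
body_ranking = ["b", "c", "a"]
rrf_result = rrf_fuse([title_ranking, body_ranking])

# Hypothetical raw scores from two child retrievers, weighted 1.0 and 2.0.
linear_result = linear_fuse(
    [{"a": 0.9, "b": 0.5}, {"b": 1.0, "c": 0.2}],
    weights=[1.0, 2.0],
)

print([doc for doc, _ in rrf_result])
print([doc for doc, _ in linear_result])
```

RRF relies only on rank positions, so it needs no score calibration across fields, while the linear combination depends on the child scores being on comparable scales.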

seanstory / ms_auth_comparison.py

Created March 18, 2025 20:19

Python script that compares Client Secret vs Certificate credentials for the SharePoint roleassignments API

	import asyncio
	from contextlib import asynccontextmanager
	from datetime import datetime, timedelta

	import aiohttp
	from azure.identity.aio import CertificateCredential

	tenant_id = "<id here>"
	tenant_name = "<name here>"
	site_name = "<site name here>"

seanstory / gist:44f6b482c9c6bf21e82792adcb865918

Created March 18, 2025 20:18

	import asyncio
	from contextlib import asynccontextmanager
	from datetime import datetime, timedelta

	import aiohttp
	from azure.identity.aio import CertificateCredential

	tenant_id = "<id here>"
	tenant_name = "<name here>"
	site_name = "<site name here>"

seanstory / populate_business_source.sh

Last active May 14, 2020 16:24

	#!/bin/bash


	echo "What is your Access Token? (Example: cfcbcbae003c1ff47cd338edb32da3749b6652f3739ed2c55384542a1dc9cccd)"
	read ACCESS_TOKEN
	echo "What is your Key? (Example: 5e7bce38f74c32c93d685d4f)"
	read KEY
	echo "What is your Host? (Example: workplace-search-1.ea-eden-3-staging.elastic.dev)"
	read HOST
	ACCESS_TOKEN=$(echo "$ACCESS_TOKEN" | tr -d '[:blank:]')

seanstory / demo_business_records.json

Last active June 9, 2022 21:58

A JSON listing of fake businesses

	[
	  {
	    "business_id": "b72ed9bb-d74d-44f6-bb23-83ed7e51121d",
	    "business_name": "Jane",
	    "city": "New York",
	    "street_address": "100 W Houston St",
	    "phone_number": "(212) 254-7000",
	    "tags": [
	      "Breakfast & Brunch",
	      "American (New)",

Sean Story seanstory