Elasticsearch supports vector search, but it essentially expects all vector data to be resident in RAM (in off-heap memory). Until now, there was no way to know how much memory an index storing vector data required; starting from v9.1, however, metrics related to vector data can be obtained.
This article introduces how to obtain these metrics and their meanings. Additionally, we compare the metrics when storing vectors with four types of index options: Flat, HNSW, Int8 HNSW, and BBQ HNSW, and verify the impact of each index option on RAM.
Vector data in Elasticsearch is stored in off-heap memory. Off-heap refers to native memory areas outside of the JVM's heap memory. By using off-heap memory, Elasticsearch/Lucene can efficiently handle large amounts of vector data. However, since it is managed separately from the JVM's heap memory, it is not included in regular JVM memory usage metrics. Therefore, it is necessary to obtain off-heap memory usage through alternative methods.
Elasticsearch provides several index options when storing vector data. These options affect how vector data is stored and search performance. By referring to the following, you can check the theoretical memory usage required for vector data for each index option.
Summarized in a table, it looks like this:
| element_type | Quantization | Theoretical Memory Usage |
|---|---|---|
| float | None | num_vectors * num_dimensions * 4 |
| float | int8 | num_vectors * (num_dimensions + 4) |
| float | int4 | num_vectors * (num_dimensions / 2 + 4) |
| float | bbq | num_vectors * (num_dimensions / 8 + 14) |
| byte | None | num_vectors * num_dimensions |
| bit | None | num_vectors * (num_dimensions / 8) |
Additionally, when using HNSW, extra memory is required for the HNSW graph. The theoretical memory usage for the HNSW graph is as follows:
num_vectors * 4 * HNSW.m
Here, HNSW.m is a parameter of the HNSW algorithm, with a default value of 16.
When creating a new index to store vector data, you can use these theoretical values as a reference to estimate how much off-heap memory the vector data stored in the index will actually require.
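The formulas above can be wrapped in a small estimator to sanity-check capacity plans before creating an index. This is a minimal sketch; the function names are our own illustration, not an Elasticsearch API, and only the `float` element_type is covered.

```python
def estimate_vector_bytes(num_vectors, num_dimensions, quantization=None):
    """Theoretical raw vector storage (element_type: float), per the table above."""
    if quantization is None:
        return num_vectors * num_dimensions * 4          # 4 bytes per float dimension
    if quantization == "int8":
        return num_vectors * (num_dimensions + 4)        # 1 byte per dim + 4-byte correction
    if quantization == "int4":
        return num_vectors * (num_dimensions // 2 + 4)   # half a byte per dim + 4-byte correction
    if quantization == "bbq":
        return num_vectors * (num_dimensions // 8 + 14)  # 1 bit per dim + 14-byte overhead
    raise ValueError(f"unknown quantization: {quantization}")


def estimate_hnsw_graph_bytes(num_vectors, m=16):
    """Theoretical HNSW graph size; m defaults to 16, as in Elasticsearch."""
    return num_vectors * 4 * m


# Example: 1,000,000 vectors with 768 dimensions
print(estimate_vector_bytes(1_000_000, 768))           # 3072000000 bytes (~3 GB, float)
print(estimate_vector_bytes(1_000_000, 768, "bbq"))    # 110000000 bytes (~110 MB, bbq)
print(estimate_hnsw_graph_bytes(1_000_000))            # 64000000 bytes (~64 MB, m=16)
```

Comparing the first two numbers makes the appeal of quantization obvious: BBQ shrinks the RAM-resident vector data by roughly a factor of 32.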
So, how can we obtain the off-heap memory usage of vector data used in an Elasticsearch index that is actually in operation?
Starting from Elasticsearch v9.1, metrics related to vector memory can be obtained using the Get index statistics API. As shown below, you can use the filter_path parameter to extract only vector-related metrics.
```
GET my_vector_index/_stats?filter_path=*.primaries.dense_vector
```
Using this API, you can obtain vector-related metrics like the following:
```json
{
  "_all": {
    "primaries": {
      "dense_vector": {
        "value_count": 764,
        "off_heap": {
          "total_size_bytes": 1229092,
          "total_vec_size_bytes": 1173504,
          "total_veq_size_bytes": 0,
          "total_veb_size_bytes": 47368,
          "total_vex_size_bytes": 8220
        }
      }
    }
  }
}
```
The meaning of each element is shown in the table below.
| Metric Name | Description |
|---|---|
| value_count | Total number of vectors in the index |
| total_size_bytes | Total size of vector data used in off-heap memory |
| total_vec_size_bytes | Size of non-quantized vector data |
| total_veq_size_bytes | Size of quantized vector data (int4 or int8). The 'q' in veq stands for quantization. |
| total_veb_size_bytes | Size of binary quantized vector data (bbq). The 'b' in veb stands for binary. |
| total_vex_size_bytes | Size of HNSW graph |
In the above example, bbq quantized vectors are used, so total_veq_size_bytes is 0. If int4 or int8 is used, total_veb_size_bytes will be 0, and the size will be displayed in total_veq_size_bytes.
Among these, the items that should fit in RAM can be summarized as follows:
| Index type | Items that should fit in RAM |
|---|---|
| flat | vec |
| hnsw | vec, vex |
| int8_hnsw | veq, vex |
| bbq_hnsw | veb, vex |
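As a rough illustration, the mapping above can be turned into a small helper that sums the RAM-resident components out of the `off_heap` section of the `_stats` response. The helper and its lookup table are our own sketch, not part of any Elasticsearch API.

```python
# Which off-heap components should stay in RAM, per index type (from the table above)
RAM_RESIDENT = {
    "flat": ["total_vec_size_bytes"],
    "hnsw": ["total_vec_size_bytes", "total_vex_size_bytes"],
    "int8_hnsw": ["total_veq_size_bytes", "total_vex_size_bytes"],
    "bbq_hnsw": ["total_veb_size_bytes", "total_vex_size_bytes"],
}


def ram_resident_bytes(off_heap, index_type):
    """Sum the off-heap components that should fit in RAM for this index type."""
    return sum(off_heap.get(key, 0) for key in RAM_RESIDENT[index_type])


# Example with the sample off_heap response shown earlier (bbq quantized)
off_heap = {
    "total_size_bytes": 1229092,
    "total_vec_size_bytes": 1173504,
    "total_veq_size_bytes": 0,
    "total_veb_size_bytes": 47368,
    "total_vex_size_bytes": 8220,
}
print(ram_resident_bytes(off_heap, "bbq_hnsw"))  # 55588 (veb + vex)
```

Note how much smaller this is than `total_size_bytes`: with BBQ, the raw float vectors (`vec`) stay on disk and only the quantized vectors and the graph need to be RAM-resident.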
However, note that these metrics are theoretical values derived from the actual vector count and index settings: they state how much memory would be required if all stored vectors were fully loaded. There is no accurate way to determine, at the OS level, exactly how much memory the vector data is occupying at any given moment, not least because RAM is shared with other processes.
Nevertheless, Elasticsearch assumes that this vector data (total_size_bytes) is fully resident in RAM when performing searches. By referring to these metrics, you can understand the resources that Elasticsearch requires.
We actually loaded data into Elasticsearch and verified how closely the above metrics match the theoretical values.
We compared the metrics when registering 100 64-dimensional vectors with four types of index options: Flat, HNSW, Int8 HNSW, and BBQ HNSW. The results are as follows:
| Index type | value_count | total_size_bytes | vec | veq | veb | vex |
|---|---|---|---|---|---|---|
| flat | 100 | 25600 | **25600** | 0 | 0 | 0 |
| hnsw | 100 | 26780 | **25600** | 0 | 0 | **1180** |
| int8_hnsw | 100 | 33601 | 25600 | **6800** | 0 | **1201** |
| bbq_hnsw | 100 | 28982 | 25600 | 0 | **2200** | **1182** |
Here, the items that should fit in memory are shown in bold. The units are in bytes.
Comparing the theoretical values with the actual metrics for each index option yields the following results:
- The vector data (vec) values are all 25,600, which perfectly matches the theoretical value (num_vectors * num_dimensions * 4 = 100 * 64 * 4).
- The HNSW graph (vex) values are considerably smaller than the theoretical value (num_vectors * 4 * HNSW.m = 100 * 4 * 32 = 12,800, since the test script sets m to 32). This is likely because the number of graph connections stays low when only a few vectors are indexed, so for actual operations we recommend verifying with a realistically sized dataset. Also, since this value depends on the structure of the graph that is built, it varies with the actual vectors registered.
- The Int8 quantization (veq) value perfectly matches the theoretical value (num_vectors * (num_dimensions + 4) = 100 * (64 + 4) = 6800).
- The BBQ quantization (veb) value also perfectly matches the theoretical value (num_vectors * (num_dimensions / 8 + 14) = 100 * (64 / 8 + 14) = 2200).
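The arithmetic behind these comparisons can be double-checked in a few lines (a trivial sanity sketch; the constants come straight from the formulas in the table earlier):

```python
# Sanity-check the theoretical values cited above for 100 vectors of 64 dimensions
num_vectors, dims = 100, 64

vec_bytes = num_vectors * dims * 4           # float: 4 bytes per dimension
veq_bytes = num_vectors * (dims + 4)         # int8: 1 byte per dim + 4-byte correction
veb_bytes = num_vectors * (dims // 8 + 14)   # bbq: 1 bit per dim + 14-byte overhead

print(vec_bytes, veq_bytes, veb_bytes)  # 25600 6800 2200
```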
The above results were obtained with the following code. Set ES_URL and ES_API_KEY as environment variables or in a .env file and run it to print the table above.
```python
# Test the vector quantizations
import os

import numpy as np
from dotenv import load_dotenv
from elasticsearch import Elasticsearch
from tqdm import tqdm

# Load environment variables from .env file (ES_URL and ES_API_KEY)
load_dotenv()

TEST_SPECS = [
    # Format: (index_name, index_options.type)
    ("vec_float_flat", "flat"),
    ("vec_float_hnsw", "hnsw"),
    ("vec_int8_hnsw", "int8_hnsw"),
    ("vec_bbq_hnsw", "bbq_hnsw"),
]

NUM_VECTORS = 100
DIM = 64
M = 32
EF_CONSTRUCTION = 100
BULK_BATCH_SIZE = 500  # Number of documents per bulk request


def create_index(es, index_name, index_type):
    """Create index with given name and type."""
    # Delete index if it exists
    if es.indices.exists(index=index_name):
        es.indices.delete(index=index_name)
    body = {
        "mappings": {
            "properties": {
                "vector": {
                    "type": "dense_vector",
                    "dims": DIM,
                    "similarity": "cosine",
                    "element_type": "float",
                    "index_options": {
                        "type": index_type
                    }
                }
            }
        }
    }
    if index_type in ("hnsw", "int8_hnsw", "bbq_hnsw"):
        body["mappings"]["properties"]["vector"]["index_options"].update({
            "m": M,
            "ef_construction": EF_CONSTRUCTION
        })
    es.indices.create(index=index_name, body=body)
    print(f"Created index: {index_name} with type: {index_type}")


def ingest_vector(es, index_name, vectors):
    """Ingest sample vectors into the given index."""
    total_vectors = len(vectors)
    num_batches = (total_vectors + BULK_BATCH_SIZE - 1) // BULK_BATCH_SIZE
    print(f"Ingesting {total_vectors} vectors into {index_name}")
    # Process vectors in batches with progress bar
    with tqdm(total=total_vectors, desc=f"Bulk indexing to {index_name}", unit="docs") as pbar:
        for batch_num in range(num_batches):
            start_idx = batch_num * BULK_BATCH_SIZE
            end_idx = min(start_idx + BULK_BATCH_SIZE, total_vectors)
            bulk_body = []
            for i in range(start_idx, end_idx):
                bulk_body.append({"index": {"_index": index_name, "_id": str(i)}})
                bulk_body.append({"vector": vectors[i]})
            es.bulk(body=bulk_body)
            pbar.update(end_idx - start_idx)
    # Refresh index to make documents searchable
    es.indices.refresh(index=index_name)
    # Flush to disk
    es.indices.flush(index=index_name)
    # Force merge to combine all segments into 1
    es.indices.forcemerge(index=index_name, max_num_segments=1)
    print(f"Completed ingestion of {total_vectors} vectors into {index_name}")


def test_vector_quantizations(es):
    """Check the off-heap metrics for each index and print the results as a markdown table."""
    print("## Parameters\n")
    print(f"- Number of vectors: {NUM_VECTORS}")
    print(f"- Dimensions: {DIM}")
    print(f"- HNSW M: {M}")
    print(f"- HNSW ef_construction: {EF_CONSTRUCTION}")
    print("\n## Off-heap Memory Usage\n")
    print("| Index type | value_count | total_size_bytes | vec | veq | veb | vex |")
    print("|------------|-------------|------------------|-------------|-------------|-------------|-------------|")

    def get_formatted_off_heap_size(off_heap, key):
        size = off_heap.get(key, 0)
        return f"{size:,}"

    for index_name, index_type in TEST_SPECS:
        # Get index stats with dense_vector metrics
        stats = es.indices.stats(index=index_name)
        # Extract dense_vector information
        dense_vector = stats['indices'][index_name]['primaries'].get('dense_vector', {})
        off_heap = dense_vector.get('off_heap', {})
        # Get off-heap memory breakdown
        vec_fmt = get_formatted_off_heap_size(off_heap, 'total_vec_size_bytes')
        veq_fmt = get_formatted_off_heap_size(off_heap, 'total_veq_size_bytes')
        veb_fmt = get_formatted_off_heap_size(off_heap, 'total_veb_size_bytes')
        vex_fmt = get_formatted_off_heap_size(off_heap, 'total_vex_size_bytes')
        total_fmt = get_formatted_off_heap_size(off_heap, 'total_size_bytes')
        # Get document count
        count = dense_vector.get('value_count', 0)
        count_fmt = f"{count:,}"
        print(f"| {index_type:10} | {count_fmt:>11} | {total_fmt:>16} | {vec_fmt:>11} | {veq_fmt:>11} | {veb_fmt:>11} | {vex_fmt:>11} |")


if __name__ == "__main__":
    es_url = os.getenv("ES_URL", "http://localhost:9200")
    es_api_key = os.getenv("ES_API_KEY")
    # Connect to Elasticsearch
    if es_api_key:
        es = Elasticsearch(es_url, api_key=es_api_key)
    else:
        es = Elasticsearch(es_url)
    # Generate vectors once for all tests
    print(f"Generating {NUM_VECTORS} random vectors with {DIM} dimensions...")
    vectors = [np.random.rand(DIM).tolist() for _ in range(NUM_VECTORS)]
    for index_name, index_type in TEST_SPECS:
        create_index(es, index_name, index_type)
        ingest_vector(es, index_name, vectors)
    test_vector_quantizations(es)
```

Metrics for off-heap memory usage related to Elasticsearch vector data became available via the Get index statistics API starting from v9.1. By using these metrics, you can understand how much off-heap memory the vector data stored in an index actually requires. Comparing the theoretical values with the actual metrics for each index option confirmed that in most cases they match the theoretical values. Please leverage this information to operate Elasticsearch's vector search functionality effectively.