@Kondasamy
Created September 13, 2024 06:46
Export Elasticsearch index to CSV

To export an Elasticsearch (ES) index as a CSV file, you can follow these steps:

  1. Use the Elasticsearch Scroll API to retrieve large amounts of data.
  2. Process the retrieved data and format it as CSV.
  3. Write the formatted data to a file.
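
Steps 2 and 3 can be tried without a running cluster. The following is a minimal sketch, assuming each retrieved hit is a dict whose document body sits under the `_source` key (as the Python client returns it). The function name `hits_to_csv` and the sample data are hypothetical and used only for illustration.

```python
import csv
import io


def hits_to_csv(hits, fields, out):
    """Write the _source of each hit to the file-like object `out` as CSV."""
    # extrasaction="ignore" drops keys that are not listed in `fields`
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for hit in hits:
        # Fields absent from a document are written as empty cells
        writer.writerow(hit["_source"])
        count += 1
    return count


# Hypothetical stand-in for the hits a scroll over an index would yield
sample_hits = [
    {"_id": "1", "_source": {"title": "first", "views": 10}},
    {"_id": "2", "_source": {"title": "second"}},  # "views" is missing
]

buffer = io.StringIO()
written = hits_to_csv(sample_hits, ["title", "views"], buffer)
print(written)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example independent of the file system; passing an open file handle instead would produce the CSV file on disk.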

The Python script below implements these steps.

To use this script:

  1. Install the required libraries:

    pip install elasticsearch
    
  2. Replace the example values in the script:

    • es_host: Your Elasticsearch host URL
    • index_name: The name of the index you want to export
    • csv_file_path: The desired output file path
    • fields: List of field names you want to export
  3. Run the script.

This script uses the scan helper from elasticsearch.helpers, which wraps the Scroll API and yields documents in batches, so large indices can be exported without loading everything into memory at once. Each document is written to the CSV file as it arrives, using Python's built-in csv module.

import csv

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan


def es_to_csv(es_host, index_name, csv_file_path, fields):
    # Initialize the Elasticsearch client
    es = Elasticsearch([es_host])

    # Match every document, but return only the requested fields
    query = {
        "_source": fields,
        "query": {
            "match_all": {}
        }
    }

    # The scan helper lazily pages through all matching documents
    results = scan(es, query=query, index=index_name)

    # Write to CSV; newline='' lets the csv module control line endings
    with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile:
        # extrasaction='ignore' skips _source keys that are not in fields
        # (e.g. the parent object of a dotted field path) instead of raising
        writer = csv.DictWriter(csvfile, fieldnames=fields, extrasaction='ignore')
        writer.writeheader()
        for result in results:
            # Fields missing from a document are written as empty cells
            writer.writerow(result['_source'])

    print(f"Export completed. Data saved to {csv_file_path}")


if __name__ == "__main__":
    # Example usage
    es_host = "http://localhost:9200"
    index_name = "your_index_name"
    csv_file_path = "output.csv"
    fields = ["field1", "field2", "field3"]  # Replace with your actual field names

    es_to_csv(es_host, index_name, csv_file_path, fields)
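
If some entries in fields are dotted paths into nested objects (for example "user.name"), the document's `_source` holds them as nested dicts, so a key lookup on the dotted name finds nothing. One way to handle this is sketched below; the helper `pick_fields` is hypothetical and not part of the original script, and it assumes nested values are plain dicts (arrays of objects are not handled).

```python
def pick_fields(source, fields):
    """Return a flat {field: value} row, resolving dotted paths in nested dicts."""
    row = {}
    for field in fields:
        # A key stored literally under the dotted name takes precedence
        if field in source:
            row[field] = source[field]
            continue
        value = source
        for key in field.split("."):
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                value = ""  # Missing path: leave the cell empty
                break
        row[field] = value
    return row


# Hypothetical document body, shaped like a hit's _source
doc = {"user": {"name": "ana", "geo": {"city": "Lima"}}, "views": 3}
row = pick_fields(doc, ["user.name", "user.geo.city", "views", "missing.key"])
print(row)
```

In the script, this would amount to calling `writer.writerow(pick_fields(result['_source'], fields))` in place of writing `result['_source']` directly.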