Gist jeffvestal/8b1fe3f092d041c575404e88bdcd66e1, created March 6, 2025 22:34
Semantic Text pre and post 8.18
DELETE old_semantic_text
## Semantic Text Field Type 8.17 and Earlier
############################################################
############ Create Mapping - cannot create a multi-field
PUT old_semantic_text
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "semantic_text": {
        "type": "semantic_text"
      }
    }
  }
}
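## Optional check, not part of the original gist: a minimal sketch that reads
## back the mapping just created, to confirm both fields exist and to see which
## inference endpoint, if any, was assigned to the `semantic_text` field.
GET old_semantic_text/_mapping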
############################################################
## Index a big document to both `text` and `semantic_text`
POST old_semantic_text/_doc/123
{
  "page_number": 0,
  "text": """Summary of Risks Affecting our Business
The following is a summary of the key risks and uncertainties associated with our business, industry, and ownership of our ordinary shares. The | |
summary below does not contain all of the information that may be important to you, and you should read this summary together with the more complete | |
discussion of the risks and uncertainties we face, which are set forth in Part I, | |
“Item 1A— Risk Factors” in this Annual Report on Form 10-K. | |
• If we do not appropriately manage our future growth or are unable to improve our systems and processes, our business and results of operations may | |
be adversely affected. | |
• We have a history of losses and may not be able to achieve profitability or positive operating cash flow on a consistent basis. | |
• Information technology (“IT”) spending, sales cycles, and other factors affecting the demand for our offerings and our results of operations have been, | |
and may continue to be, negatively impacted by current macroeconomic conditions, including declining rates of economic growth, inflationary | |
pressures, increased interest rates, and other conditions discussed in this report, and by the evolving conflict in Israel and Gaza and Russia’s war with | |
Ukraine. | |
• We may not be successful in our artificial intelligence initiatives, and social, ethical, and regulatory issues relating to the use of AI in our offerings | |
may result in new or enhanced governmental or regulatory scrutiny, reputational harm, damage to our competitive position, and liability. | |
• We and our third-party vendors are vulnerable to cybersecurity risks, including phishing attacks, viruses, malware, ransomware, hacking, or similar | |
incidents that may disrupt or damage our business. | |
• Our ability to grow our business will suffer if we do not expand and increase adoption of our Elastic Cloud offerings. | |
• Our operating results may fluctuate from quarter to quarter. | |
• Our future growth, business and results of operations will be harmed if we are not able to keep pace with technological and competitive developments, | |
increase sales of our subscriptions to new and existing customers, renew existing customers’ subscriptions, increase adoption of our cloud-based | |
offerings, respond effectively to evolving markets or offer high-quality support services. | |
• Our limited history with consumption-based arrangements for our Elastic Cloud offerings is not adequate to enable us to accurately predict the long- | |
term rate of customer adoption or renewal, or the impact those arrangements will have on our near-term or long-term revenue and operating results. | |
• Because we recognize the vast majority of our revenue from subscriptions, downturns or upturns in sales are not immediately reflected in full in our | |
results of operations. | |
• We may not be able to effectively develop and expand our sales, marketing and customer support capabilities. | |
• If our partners, including cloud providers, systems integrators, channel partners, referral partners, original equipment manufacturing and managed | |
service provider partners, and technology partners, fail to perform or we are unable to maintain successful relationships with them, our ability to | |
market, sell, and distribute our solution may be limited. | |
• We may not be able to realize the benefits of our marketing strategies where we offer some of our product features free of charge and provide free | |
trials to some of our paid features. | |
• Our international business exposes us to a variety of risks, and if we are not successful in sustaining and expanding our international business, we may | |
incur additional losses and our revenue growth could be harmed. | |
• We are subject to risks associated with our receipt of revenue from sales to government entities. | |
• Our decision to no longer offer Elasticsearch and Kibana under an open source license may harm the adoption of those products. | |
• We could be negatively impacted if the Elastic License or the Server Side Public License Version 1.0 (“SSPL”) under which some of our software is | |
licensed is not enforceable. | |
• Our reputation could be harmed if third parties offer inadequate or defective implementations of software that we have previously made available | |
under an open source license. | |
• Limited technological barriers to entry into the markets in which we compete may facilitate entry by other enterprises into our markets to compete | |
with us. | |
• A real or perceived defect, security vulnerability, error, or performance failure in our software could cause us to lose revenue, damage our reputation, | |
and expose us to financial liability. | |
3 | |
Table of Contents | |
• Interruptions or performance problems, and our reliance on technologies from third parties, may adversely affect our business operations and financial | |
results. | |
• Incorrect implementation or use of our software could negatively affect our business, operations, financial results, and growth prospects. | |
• Failure to protect our proprietary technology and intellectual property rights could substantially harm our business and results of operations. | |
• We could incur substantial costs as a result of any claim of infringement, misappropriation or violation of another party’s intellectual property rights, | |
including as a result of the indemnity provisions in various agreements. | |
• Our use of third-party open source software within our products could negatively affect our ability to sell our products and subject us to litigation. | |
• An investment in our company is subject to tax risks based on our status as a non-U.S. corporation. | |
• Any actual or perceived failure by us to comply with regulations or any other obligations relating to privacy, data protection or information security | |
could adversely affect our business. | |
• Our business is subject to a variety of government and industry regulations, as well as other obligations, including compliance with export control, | |
trade sanctions, anti-bribery, anti-corruption, and anti-money laundering laws. | |
• The market price for our ordinary shares has been and is likely to continue to be volatile. | |
• The concentration of our share ownership with insiders will likely limit your ability to influence corporate matters. | |
• Dutch law and our articles of association include anti-takeover provisions, which may impact the value of our ordinary shares. | |
• Claims of U.S. civil liabilities may not be enforceable against us. | |
• If industry or financial analysts do not publish research or reports about our business, or if they issue inaccurate or unfavorable research regarding our | |
ordinary shares, our share price and trading volume could decline. | |
• We have a substantial amount of indebtedness and may not be able to generate sufficient cash to service all of our indebtedness. | |
• Our reputation or business could be negatively impacted by environmental, social, and governance (“ESG”) matters and our reporting of such matters. | |
• We may fail to maintain an effective system of disclosure controls and internal control over financial reporting. | |
4 | |
Table of Contents | |
PART I | |
Item 1. Business | |
Elastic, the Search AI Company, enables our customers to find the answers they need in real time, using all of their data, at scale. The Elastic Search | |
AI Platform (“our platform”), combines the power of search with AI to help companies solve real-time business problems, unlock potential value, and achieve | |
better outcomes. Our platform, available as both a hosted, managed service across public clouds as well as self-managed software, allows our customers to find | |
insights and drive AI and machine learning use cases from large amounts of data. | |
We offer three search-powered solutions – Search, Observability, and Security – that are built on the platform. We help organizations, their employees, | |
and their customers find what they need faster, while keeping mission-critical applications running smoothly, and protecting against cyber threats. | |
As digital transformation drives mission-critical business functions to the cloud, we believe that every company must incorporate search AI | |
capabilities across IT and line-of-business organizations to find the answers that matter from all of its data in real-time and at scale. | |
Our platform is built on the Elastic Stack, a powerful set of software products that ingest data from any source, in any format, and perform search, | |
analysis, and visualization of that data. At the core of the Elastic Stack is Elasticsearch - a highly scalable document store and search engine, and the unified | |
data store for all of our solutions and use cases. Another component of the Elastic Stack is Kibana, which delivers a common user interface across all of our | |
solutions, with powerful drag-and-drop visual analytics, and centralized management of the platform. Our platform also includes the Elasticsearch Relevance | |
Engine™ (“ESRE”), which combines advanced AI with Elastic’s text search to give developers a full suite of sophisticated retrieval algorithms and the ability | |
to integrate with large language models. Our out-of-the-box solutions deliver fast time to value for common use cases and, paired with our developer-centric | |
platform which is extensible and customizable, allow us to innovate fast and differentiate our offerings at every level. | |
We make our platform available as a hosted, managed service across major cloud providers (Amazon Web Services (“AWS”), Google Cloud Platform | |
(“GCP”), and Microsoft Azure) in more than 55 public cloud regions globally. Customers can also deploy our platform across hybrid clouds, public or private | |
clouds, and multi-cloud environments. | |
Our business model is based primarily on a combination of a paid Elastic-managed hosted service offering and paid and free proprietary self-managed | |
software. Our paid offerings for our platform are sold via subscription through resource-based pricing, and all customers and users have access to varying | |
levels of features across all solutions. In Elastic Cloud, our family of cloud-based offerings, we offer various subscription tiers tied to different features. For | |
users who download our software, we make some of the features of our software available free of charge, allowing us to engage with a broad community of | |
developers and practitioners and introduce them to the value of the Elastic Stack. We believe in the importance of an open software development model, and | |
we develop the majority of our software in public repositories as open code under a proprietary license. Unlike some companies, we do not build an enterprise | |
version that is separate from our free distribution. We maintain a single code base across both our self-managed software and Elastic-hosted services. All of | |
these actions help us build a powerful commercial business model that we believe is optimized for product-driven growth. | |
Our customers often significantly expand their usage of our products and services over time. Expansion includes increasing the number of developers | |
and practitioners using our products, increasing the utilization of our products for a particular use case, and utilizing our products to address new use cases. We | |
focus some of our direct sales efforts on encouraging this type of expansion within our customer base, both within as well as across solutions. Because our | |
business model provides access to all solutions with resource-based pricing, we make it easy for customers to expand across use cases. | |
Our business has experienced rapid growth around the world. As of April 30, 2024, we had approximately 21,000 customers compared to | |
approximately 20,200 customers and over 18,600 customers as of April 30, 2023 and 2022, respectively. Our total revenue was $1.267 billion, $1.069 billion, | |
and $862.4 million for the years ended April 30, 2024, 2023 and 2022, respectively, representing year-over-year growth of 19% for the year ended April 30, | |
2024 and 24% for the year ended April 30, 2023. Subscriptions accounted for 93%, 92% and 93% of our total revenue for the years ended April 30, 2024, 2023 | |
and 2022, respectively. Revenue from outside the United States accounted for 42%, 41% and 44% of our total revenue for the years ended April 30, 2024, 2023 | |
and 2022, respectively. | |
While we recorded net income of $61.7 million for the year ended April 30, 2024, we incurred net losses of $236.2 million and $203.8 million for the | |
years ended April 30, 2023 and 2022, respectively. Although we recorded net income for the year ended April 30, 2024, we expect to incur net losses for the | |
foreseeable future. Our net cash provided by operating activities was $148.8 million, $35.7 million, and $5.7 million for the years ended April 30, 2024, 2023, | |
and 2022 respectively. | |
5 | |
Table of Contents | |
Our Products | |
Our products enable our customers and users to find relevant information and insights nearly instantly in large amounts of data across a broad range of | |
business and consumer use cases. | |
We offer the Elastic Stack, a powerful set of software products that ingest and store data from any source, in any format, and perform search, analysis, | |
and visualization, usually in milliseconds. The Elastic Stack can be used by developers and IT decision makers to power a variety of use cases. We also offer | |
software solutions built in the Elastic Stack that address a wide variety of use cases. The Elastic Stack and our solutions are designed to run in public or private | |
clouds, in hybrid environments, or in multi-cloud environments. | |
The Elastic Stack | |
The Elastic Stack is primarily composed of the following products: | |
• Elasticsearch. Elasticsearch is the heart of the Elastic Stack. It is a distributed, real-time vector search and analytics engine and data store for all | |
types of data, including textual, numerical, geospatial, structured, and unstructured. | |
• Kibana. Kibana is the user interface for the Elastic Stack. It is the visualization layer for data stored in Elasticsearch. It is also the management | |
and configuration interface for all parts of the Elastic Stack. | |
Elastic has spent years infusing both Elasticsearch and Kibana with a foundation of AI and machine learning built on ESRE, from support for external | |
machine learning models to native vector search capabilities, supervised and unsupervised machine learning, and solution capabilities that improve search | |
relevance and identify anomalies. Elastic enables organizations to integrate generative AI and large language models by building key capabilities into its | |
products. | |
The Elastic Stack also supports data ingest with a number of products: | |
• Elastic Agent. Elastic Agent is a single, unified way to add monitoring for logs, metrics, and other types of data to each host. Elastic Agent | |
includes integrated host protection and central management. | |
• Beats. Beats is the family of lightweight, single-purpose data shippers for sending data from edge machines to Elasticsearch or Logstash. | |
• Logstash. Logstash is the dynamic data processing pipeline for ingesting data into Elasticsearch or other storage systems from a multitude of | |
sources simultaneously. | |
Paid proprietary features in the Elastic Stack enable capabilities such as automating anomaly detection on time series data at scale through machine | |
learning, facilitating compliance with data security and privacy regulations, supporting search across low cost cold and frozen data tiers, and allowing real-time | |
notifications and alerts. The source code of features in the Elastic Stack is generally visible to the public in the form of “open code.”
Our Solutions | |
We have built a number of solutions into the Elastic Stack to make it easier for organizations to use our software for common use cases. Our solutions | |
include the following: | |
• Search. Our Search solution provides a powerful search platform for building search AI applications. Key use cases for Search include: | |
generative AI and retrieval-augmented generation, search applications, and foundational capabilities for building search experiences to support | |
websites and portals, e-commerce, mobile app search, customer support, and workplace search. | |
• Observability. Our Observability solution enables unified analysis across the IT ecosystem of applications, networks, and infrastructure. | |
Observability includes: Logs, to search and analyze petabytes of structured and unstructured logs; Metrics, to search and analyze numeric and | |
time series data; Application Performance Monitoring (“APM”), to deliver insight into application performance and health metrics and provide | |
developers with confidence in their code; and Synthetic Monitoring, to proactively monitor the availability and functionality of user journeys. | |
• Security. Our Security solution provides unified protection to prevent, detect, and respond to threats. Our AI-driven security analytics solution | |
includes: Security Information and Event Management (“SIEM”), with integrations to network, host, user, and cloud data sources, as well as | |
workflow and operations, shareable analytics, incident management, and investigations; extended protection with both third party integrations as | |
well as first party protections for both Endpoint Security (prevention, detection, and response) and Cloud Security (cloud posture assessment, | |
vulnerability management, and cloud workload protection). | |
6
""",
  "semantic_text": """Summary of Risks Affecting our Business
The following is a summary of the key risks and uncertainties associated with our business, industry, and ownership of our ordinary shares. The | |
summary below does not contain all of the information that may be important to you, and you should read this summary together with the more complete | |
discussion of the risks and uncertainties we face, which are set forth in Part I, | |
“Item 1A— Risk Factors” in this Annual Report on Form 10-K. | |
• If we do not appropriately manage our future growth or are unable to improve our systems and processes, our business and results of operations may | |
be adversely affected. | |
• We have a history of losses and may not be able to achieve profitability or positive operating cash flow on a consistent basis. | |
• Information technology (“IT”) spending, sales cycles, and other factors affecting the demand for our offerings and our results of operations have been, | |
and may continue to be, negatively impacted by current macroeconomic conditions, including declining rates of economic growth, inflationary | |
pressures, increased interest rates, and other conditions discussed in this report, and by the evolving conflict in Israel and Gaza and Russia’s war with | |
Ukraine. | |
• We may not be successful in our artificial intelligence initiatives, and social, ethical, and regulatory issues relating to the use of AI in our offerings | |
may result in new or enhanced governmental or regulatory scrutiny, reputational harm, damage to our competitive position, and liability. | |
• We and our third-party vendors are vulnerable to cybersecurity risks, including phishing attacks, viruses, malware, ransomware, hacking, or similar | |
incidents that may disrupt or damage our business. | |
• Our ability to grow our business will suffer if we do not expand and increase adoption of our Elastic Cloud offerings. | |
• Our operating results may fluctuate from quarter to quarter. | |
• Our future growth, business and results of operations will be harmed if we are not able to keep pace with technological and competitive developments, | |
increase sales of our subscriptions to new and existing customers, renew existing customers’ subscriptions, increase adoption of our cloud-based | |
offerings, respond effectively to evolving markets or offer high-quality support services. | |
• Our limited history with consumption-based arrangements for our Elastic Cloud offerings is not adequate to enable us to accurately predict the long- | |
term rate of customer adoption or renewal, or the impact those arrangements will have on our near-term or long-term revenue and operating results. | |
• Because we recognize the vast majority of our revenue from subscriptions, downturns or upturns in sales are not immediately reflected in full in our | |
results of operations. | |
• We may not be able to effectively develop and expand our sales, marketing and customer support capabilities. | |
• If our partners, including cloud providers, systems integrators, channel partners, referral partners, original equipment manufacturing and managed | |
service provider partners, and technology partners, fail to perform or we are unable to maintain successful relationships with them, our ability to | |
market, sell, and distribute our solution may be limited. | |
• We may not be able to realize the benefits of our marketing strategies where we offer some of our product features free of charge and provide free | |
trials to some of our paid features. | |
• Our international business exposes us to a variety of risks, and if we are not successful in sustaining and expanding our international business, we may | |
incur additional losses and our revenue growth could be harmed. | |
• We are subject to risks associated with our receipt of revenue from sales to government entities. | |
• Our decision to no longer offer Elasticsearch and Kibana under an open source license may harm the adoption of those products. | |
• We could be negatively impacted if the Elastic License or the Server Side Public License Version 1.0 (“SSPL”) under which some of our software is | |
licensed is not enforceable. | |
• Our reputation could be harmed if third parties offer inadequate or defective implementations of software that we have previously made available | |
under an open source license. | |
• Limited technological barriers to entry into the markets in which we compete may facilitate entry by other enterprises into our markets to compete | |
with us. | |
• A real or perceived defect, security vulnerability, error, or performance failure in our software could cause us to lose revenue, damage our reputation, | |
and expose us to financial liability. | |
3 | |
Table of Contents | |
• Interruptions or performance problems, and our reliance on technologies from third parties, may adversely affect our business operations and financial | |
results. | |
• Incorrect implementation or use of our software could negatively affect our business, operations, financial results, and growth prospects. | |
• Failure to protect our proprietary technology and intellectual property rights could substantially harm our business and results of operations. | |
• We could incur substantial costs as a result of any claim of infringement, misappropriation or violation of another party’s intellectual property rights, | |
including as a result of the indemnity provisions in various agreements. | |
• Our use of third-party open source software within our products could negatively affect our ability to sell our products and subject us to litigation. | |
• An investment in our company is subject to tax risks based on our status as a non-U.S. corporation. | |
• Any actual or perceived failure by us to comply with regulations or any other obligations relating to privacy, data protection or information security | |
could adversely affect our business. | |
• Our business is subject to a variety of government and industry regulations, as well as other obligations, including compliance with export control, | |
trade sanctions, anti-bribery, anti-corruption, and anti-money laundering laws. | |
• The market price for our ordinary shares has been and is likely to continue to be volatile. | |
• The concentration of our share ownership with insiders will likely limit your ability to influence corporate matters. | |
• Dutch law and our articles of association include anti-takeover provisions, which may impact the value of our ordinary shares. | |
• Claims of U.S. civil liabilities may not be enforceable against us. | |
• If industry or financial analysts do not publish research or reports about our business, or if they issue inaccurate or unfavorable research regarding our | |
ordinary shares, our share price and trading volume could decline. | |
• We have a substantial amount of indebtedness and may not be able to generate sufficient cash to service all of our indebtedness. | |
• Our reputation or business could be negatively impacted by environmental, social, and governance (“ESG”) matters and our reporting of such matters. | |
• We may fail to maintain an effective system of disclosure controls and internal control over financial reporting. | |
4 | |
Table of Contents | |
PART I | |
Item 1. Business | |
Elastic, the Search AI Company, enables our customers to find the answers they need in real time, using all of their data, at scale. The Elastic Search | |
AI Platform (“our platform”), combines the power of search with AI to help companies solve real-time business problems, unlock potential value, and achieve | |
better outcomes. Our platform, available as both a hosted, managed service across public clouds as well as self-managed software, allows our customers to find | |
insights and drive AI and machine learning use cases from large amounts of data. | |
We offer three search-powered solutions – Search, Observability, and Security – that are built on the platform. We help organizations, their employees, | |
and their customers find what they need faster, while keeping mission-critical applications running smoothly, and protecting against cyber threats. | |
As digital transformation drives mission-critical business functions to the cloud, we believe that every company must incorporate search AI | |
capabilities across IT and line-of-business organizations to find the answers that matter from all of its data in real-time and at scale. | |
Our platform is built on the Elastic Stack, a powerful set of software products that ingest data from any source, in any format, and perform search, | |
analysis, and visualization of that data. At the core of the Elastic Stack is Elasticsearch - a highly scalable document store and search engine, and the unified | |
data store for all of our solutions and use cases. Another component of the Elastic Stack is Kibana, which delivers a common user interface across all of our | |
solutions, with powerful drag-and-drop visual analytics, and centralized management of the platform. Our platform also includes the Elasticsearch Relevance | |
Engine™ (“ESRE”), which combines advanced AI with Elastic’s text search to give developers a full suite of sophisticated retrieval algorithms and the ability | |
to integrate with large language models. Our out-of-the-box solutions deliver fast time to value for common use cases and, paired with our developer-centric | |
platform which is extensible and customizable, allow us to innovate fast and differentiate our offerings at every level. | |
We make our platform available as a hosted, managed service across major cloud providers (Amazon Web Services (“AWS”), Google Cloud Platform | |
(“GCP”), and Microsoft Azure) in more than 55 public cloud regions globally. Customers can also deploy our platform across hybrid clouds, public or private | |
clouds, and multi-cloud environments. | |
Our business model is based primarily on a combination of a paid Elastic-managed hosted service offering and paid and free proprietary self-managed | |
software. Our paid offerings for our platform are sold via subscription through resource-based pricing, and all customers and users have access to varying | |
levels of features across all solutions. In Elastic Cloud, our family of cloud-based offerings, we offer various subscription tiers tied to different features. For | |
users who download our software, we make some of the features of our software available free of charge, allowing us to engage with a broad community of | |
developers and practitioners and introduce them to the value of the Elastic Stack. We believe in the importance of an open software development model, and | |
we develop the majority of our software in public repositories as open code under a proprietary license. Unlike some companies, we do not build an enterprise | |
version that is separate from our free distribution. We maintain a single code base across both our self-managed software and Elastic-hosted services. All of | |
these actions help us build a powerful commercial business model that we believe is optimized for product-driven growth. | |
Our customers often significantly expand their usage of our products and services over time. Expansion includes increasing the number of developers | |
and practitioners using our products, increasing the utilization of our products for a particular use case, and utilizing our products to address new use cases. We | |
focus some of our direct sales efforts on encouraging this type of expansion within our customer base, both within as well as across solutions. Because our | |
business model provides access to all solutions with resource-based pricing, we make it easy for customers to expand across use cases. | |
Our business has experienced rapid growth around the world. As of April 30, 2024, we had approximately 21,000 customers compared to | |
approximately 20,200 customers and over 18,600 customers as of April 30, 2023 and 2022, respectively. Our total revenue was $1.267 billion, $1.069 billion, | |
and $862.4 million for the years ended April 30, 2024, 2023 and 2022, respectively, representing year-over-year growth of 19% for the year ended April 30, | |
2024 and 24% for the year ended April 30, 2023. Subscriptions accounted for 93%, 92% and 93% of our total revenue for the years ended April 30, 2024, 2023 | |
and 2022, respectively. Revenue from outside the United States accounted for 42%, 41% and 44% of our total revenue for the years ended April 30, 2024, 2023 | |
and 2022, respectively. | |
While we recorded net income of $61.7 million for the year ended April 30, 2024, we incurred net losses of $236.2 million and $203.8 million for the | |
years ended April 30, 2023 and 2022, respectively. Although we recorded net income for the year ended April 30, 2024, we expect to incur net losses for the | |
foreseeable future. Our net cash provided by operating activities was $148.8 million, $35.7 million, and $5.7 million for the years ended April 30, 2024, 2023, | |
and 2022 respectively. | |
5 | |
Table of Contents | |
Our Products | |
Our products enable our customers and users to find relevant information and insights nearly instantly in large amounts of data across a broad range of | |
business and consumer use cases. | |
We offer the Elastic Stack, a powerful set of software products that ingest and store data from any source, in any format, and perform search, analysis, | |
and visualization, usually in milliseconds. The Elastic Stack can be used by developers and IT decision makers to power a variety of use cases. We also offer | |
software solutions built in the Elastic Stack that address a wide variety of use cases. The Elastic Stack and our solutions are designed to run in public or private | |
clouds, in hybrid environments, or in multi-cloud environments. | |
The Elastic Stack | |
The Elastic Stack is primarily composed of the following products: | |
• Elasticsearch. Elasticsearch is the heart of the Elastic Stack. It is a distributed, real-time vector search and analytics engine and data store for all | |
types of data, including textual, numerical, geospatial, structured, and unstructured. | |
• Kibana. Kibana is the user interface for the Elastic Stack. It is the visualization layer for data stored in Elasticsearch. It is also the management | |
and configuration interface for all parts of the Elastic Stack. | |
Elastic has spent years infusing both Elasticsearch and Kibana with a foundation of AI and machine learning built on ESRE, from support for external | |
machine learning models to native vector search capabilities, supervised and unsupervised machine learning, and solution capabilities that improve search | |
relevance and identify anomalies. Elastic enables organizations to integrate generative AI and large language models by building key capabilities into its | |
products. | |
The Elastic Stack also supports data ingest with a number of products: | |
• Elastic Agent. Elastic Agent is a single, unified way to add monitoring for logs, metrics, and other types of data to each host. Elastic Agent | |
includes integrated host protection and central management. | |
• Beats. Beats is the family of lightweight, single-purpose data shippers for sending data from edge machines to Elasticsearch or Logstash. | |
• Logstash. Logstash is the dynamic data processing pipeline for ingesting data into Elasticsearch or other storage systems from a multitude of | |
sources simultaneously. | |
Paid proprietary features in the Elastic Stack enable capabilities such as automating anomaly detection on time series data at scale through machine | |
learning, facilitating compliance with data security and privacy regulations, supporting search across low cost cold and frozen data tiers, and allowing real-time | |
notifications and alerts. The source code of features in the Elastic Stack is generally visible to the public in the form of “open code. | |
” | |
Our Solutions | |
We have built a number of solutions into the Elastic Stack to make it easier for organizations to use our software for common use cases. Our solutions | |
include the following: | |
• Search. Our Search solution provides a powerful search platform for building search AI applications. Key use cases for Search include: | |
generative AI and retrieval-augmented generation, search applications, and foundational capabilities for building search experiences to support | |
websites and portals, e-commerce, mobile app search, customer support, and workplace search. | |
• Observability. Our Observability solution enables unified analysis across the IT ecosystem of applications, networks, and infrastructure. | |
Observability includes: Logs, to search and analyze petabytes of structured and unstructured logs; Metrics, to search and analyze numeric and | |
time series data; Application Performance Monitoring (“APM”), to deliver insight into application performance and health metrics and provide | |
developers with confidence in their code; and Synthetic Monitoring, to proactively monitor the availability and functionality of user journeys. | |
• Security. Our Security solution provides unified protection to prevent, detect, and respond to threats. Our AI-driven security analytics solution | |
includes: Security Information and Event Management (“SIEM”), with integrations to network, host, user, and cloud data sources, as well as | |
workflow and operations, shareable analytics, incident management, and investigations; extended protection with both third party integrations as | |
well as first party protections for both Endpoint Security (prevention, detection, and response) and Cloud Security (cloud posture assessment, | |
vulnerability management, and cloud workload protection). | |
""", | |
"filename": "elastic-10q-nov-2024-pt1.pdf", | |
"doc_type": "summary" | |
} | |
GET old_semantic_text/_search | |
############################################################ | |
## Perform a query and return top 1 chunk | |
GET old_semantic_text/_search | |
{ | |
"query": { | |
"nested": { | |
"path": "semantic_text.inference.chunks", | |
"query": { | |
"sparse_vector": { | |
"inference_id": ".elser-2-elasticsearch", | |
"field": "semantic_text.inference.chunks.embeddings", | |
"query": "how do users easily use elastic" | |
} | |
}, | |
"inner_hits": { | |
"size": 1, | |
"_source": [ | |
"semantic_text.inference.chunks.text" | |
] | |
} | |
} | |
}, | |
"_source": false | |
} | |
############################################################ | |
## Perform BM25 query | |
# Can't run a match query against the semantic_text field in 8.17 and earlier
GET old_semantic_text/_search | |
{ | |
"query": { | |
"match": { | |
"semantic_text.text": "how do users easily use elastic" | |
} | |
} | |
} | |
# Have to query the separate `text` field instead
GET old_semantic_text/_search | |
{ | |
"query": { | |
"match": { | |
"text": "how do users easily use elastic" | |
} | |
} | |
} | |
############################################################ |
DELETE new_semantic_text | |
GET / | |
## Semantic Text Field Type 8.18+ and Serverless NOW | |
############################################################ | |
############ Create Mapping | |
### When you need BOTH lexical and semantic search, the standard approach is a multi-field
# `text` - full text | |
# `text.semantic` - semantic_text | |
PUT new_semantic_text | |
{ | |
"mappings": { | |
"properties": { | |
"text": { | |
"type": "text", | |
"fields": { | |
"semantic": { | |
"type": "semantic_text" | |
} | |
} | |
} | |
} | |
} | |
} | |
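############################################################
## Sketch (not in the original walkthrough): pin the inference endpoint explicitly
# `inference_id` is a documented `semantic_text` mapping parameter. The index name
# `new_semantic_text_explicit` is hypothetical, and `.elser-2-elasticsearch` is assumed
# to be available because the 8.17 query above references it.
PUT new_semantic_text_explicit
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "semantic": {
            "type": "semantic_text",
            "inference_id": ".elser-2-elasticsearch"
          }
        }
      }
    }
  }
}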
############################################################ | |
## Index a big document to just the `text` field
POST new_semantic_text/_doc/123 | |
{ | |
"page_number": 0, | |
"text": """Summary of Risks Affecting our Business | |
The following is a summary of the key risks and uncertainties associated with our business, industry, and ownership of our ordinary shares. The | |
summary below does not contain all of the information that may be important to you, and you should read this summary together with the more complete | |
discussion of the risks and uncertainties we face, which are set forth in Part I, | |
“Item 1A— Risk Factors” in this Annual Report on Form 10-K. | |
• If we do not appropriately manage our future growth or are unable to improve our systems and processes, our business and results of operations may | |
be adversely affected. | |
• We have a history of losses and may not be able to achieve profitability or positive operating cash flow on a consistent basis. | |
• Information technology (“IT”) spending, sales cycles, and other factors affecting the demand for our offerings and our results of operations have been, | |
and may continue to be, negatively impacted by current macroeconomic conditions, including declining rates of economic growth, inflationary | |
pressures, increased interest rates, and other conditions discussed in this report, and by the evolving conflict in Israel and Gaza and Russia’s war with | |
Ukraine. | |
• We may not be successful in our artificial intelligence initiatives, and social, ethical, and regulatory issues relating to the use of AI in our offerings | |
may result in new or enhanced governmental or regulatory scrutiny, reputational harm, damage to our competitive position, and liability. | |
• We and our third-party vendors are vulnerable to cybersecurity risks, including phishing attacks, viruses, malware, ransomware, hacking, or similar | |
incidents that may disrupt or damage our business. | |
• Our ability to grow our business will suffer if we do not expand and increase adoption of our Elastic Cloud offerings. | |
• Our operating results may fluctuate from quarter to quarter. | |
• Our future growth, business and results of operations will be harmed if we are not able to keep pace with technological and competitive developments, | |
increase sales of our subscriptions to new and existing customers, renew existing customers’ subscriptions, increase adoption of our cloud-based | |
offerings, respond effectively to evolving markets or offer high-quality support services. | |
• Our limited history with consumption-based arrangements for our Elastic Cloud offerings is not adequate to enable us to accurately predict the long- | |
term rate of customer adoption or renewal, or the impact those arrangements will have on our near-term or long-term revenue and operating results. | |
• Because we recognize the vast majority of our revenue from subscriptions, downturns or upturns in sales are not immediately reflected in full in our | |
results of operations. | |
• We may not be able to effectively develop and expand our sales, marketing and customer support capabilities. | |
• If our partners, including cloud providers, systems integrators, channel partners, referral partners, original equipment manufacturing and managed | |
service provider partners, and technology partners, fail to perform or we are unable to maintain successful relationships with them, our ability to | |
market, sell, and distribute our solution may be limited. | |
• We may not be able to realize the benefits of our marketing strategies where we offer some of our product features free of charge and provide free | |
trials to some of our paid features. | |
• Our international business exposes us to a variety of risks, and if we are not successful in sustaining and expanding our international business, we may | |
incur additional losses and our revenue growth could be harmed. | |
• We are subject to risks associated with our receipt of revenue from sales to government entities. | |
• Our decision to no longer offer Elasticsearch and Kibana under an open source license may harm the adoption of those products. | |
• We could be negatively impacted if the Elastic License or the Server Side Public License Version 1.0 (“SSPL”) under which some of our software is | |
licensed is not enforceable. | |
• Our reputation could be harmed if third parties offer inadequate or defective implementations of software that we have previously made available | |
under an open source license. | |
• Limited technological barriers to entry into the markets in which we compete may facilitate entry by other enterprises into our markets to compete | |
with us. | |
• A real or perceived defect, security vulnerability, error, or performance failure in our software could cause us to lose revenue, damage our reputation, | |
and expose us to financial liability. | |
3 | |
Table of Contents | |
• Interruptions or performance problems, and our reliance on technologies from third parties, may adversely affect our business operations and financial | |
results. | |
• Incorrect implementation or use of our software could negatively affect our business, operations, financial results, and growth prospects. | |
• Failure to protect our proprietary technology and intellectual property rights could substantially harm our business and results of operations. | |
• We could incur substantial costs as a result of any claim of infringement, misappropriation or violation of another party’s intellectual property rights, | |
including as a result of the indemnity provisions in various agreements. | |
• Our use of third-party open source software within our products could negatively affect our ability to sell our products and subject us to litigation. | |
• An investment in our company is subject to tax risks based on our status as a non-U.S. corporation. | |
• Any actual or perceived failure by us to comply with regulations or any other obligations relating to privacy, data protection or information security | |
could adversely affect our business. | |
• Our business is subject to a variety of government and industry regulations, as well as other obligations, including compliance with export control, | |
trade sanctions, anti-bribery, anti-corruption, and anti-money laundering laws. | |
• The market price for our ordinary shares has been and is likely to continue to be volatile. | |
• The concentration of our share ownership with insiders will likely limit your ability to influence corporate matters. | |
• Dutch law and our articles of association include anti-takeover provisions, which may impact the value of our ordinary shares. | |
• Claims of U.S. civil liabilities may not be enforceable against us. | |
• If industry or financial analysts do not publish research or reports about our business, or if they issue inaccurate or unfavorable research regarding our | |
ordinary shares, our share price and trading volume could decline. | |
• We have a substantial amount of indebtedness and may not be able to generate sufficient cash to service all of our indebtedness. | |
• Our reputation or business could be negatively impacted by environmental, social, and governance (“ESG”) matters and our reporting of such matters. | |
• We may fail to maintain an effective system of disclosure controls and internal control over financial reporting. | |
4 | |
Table of Contents | |
PART I | |
Item 1. Business | |
Elastic, the Search AI Company, enables our customers to find the answers they need in real time, using all of their data, at scale. The Elastic Search | |
AI Platform (“our platform”), combines the power of search with AI to help companies solve real-time business problems, unlock potential value, and achieve | |
better outcomes. Our platform, available as both a hosted, managed service across public clouds as well as self-managed software, allows our customers to find | |
insights and drive AI and machine learning use cases from large amounts of data. | |
We offer three search-powered solutions – Search, Observability, and Security – that are built on the platform. We help organizations, their employees, | |
and their customers find what they need faster, while keeping mission-critical applications running smoothly, and protecting against cyber threats. | |
As digital transformation drives mission-critical business functions to the cloud, we believe that every company must incorporate search AI | |
capabilities across IT and line-of-business organizations to find the answers that matter from all of its data in real-time and at scale. | |
Our platform is built on the Elastic Stack, a powerful set of software products that ingest data from any source, in any format, and perform search, | |
analysis, and visualization of that data. At the core of the Elastic Stack is Elasticsearch - a highly scalable document store and search engine, and the unified | |
data store for all of our solutions and use cases. Another component of the Elastic Stack is Kibana, which delivers a common user interface across all of our | |
solutions, with powerful drag-and-drop visual analytics, and centralized management of the platform. Our platform also includes the Elasticsearch Relevance | |
Engine™ (“ESRE”), which combines advanced AI with Elastic’s text search to give developers a full suite of sophisticated retrieval algorithms and the ability | |
to integrate with large language models. Our out-of-the-box solutions deliver fast time to value for common use cases and, paired with our developer-centric | |
platform which is extensible and customizable, allow us to innovate fast and differentiate our offerings at every level. | |
We make our platform available as a hosted, managed service across major cloud providers (Amazon Web Services (“AWS”), Google Cloud Platform | |
(“GCP”), and Microsoft Azure) in more than 55 public cloud regions globally. Customers can also deploy our platform across hybrid clouds, public or private | |
clouds, and multi-cloud environments. | |
Our business model is based primarily on a combination of a paid Elastic-managed hosted service offering and paid and free proprietary self-managed | |
software. Our paid offerings for our platform are sold via subscription through resource-based pricing, and all customers and users have access to varying | |
levels of features across all solutions. In Elastic Cloud, our family of cloud-based offerings, we offer various subscription tiers tied to different features. For | |
users who download our software, we make some of the features of our software available free of charge, allowing us to engage with a broad community of | |
developers and practitioners and introduce them to the value of the Elastic Stack. We believe in the importance of an open software development model, and | |
we develop the majority of our software in public repositories as open code under a proprietary license. Unlike some companies, we do not build an enterprise | |
version that is separate from our free distribution. We maintain a single code base across both our self-managed software and Elastic-hosted services. All of | |
these actions help us build a powerful commercial business model that we believe is optimized for product-driven growth. | |
Our customers often significantly expand their usage of our products and services over time. Expansion includes increasing the number of developers | |
and practitioners using our products, increasing the utilization of our products for a particular use case, and utilizing our products to address new use cases. We | |
focus some of our direct sales efforts on encouraging this type of expansion within our customer base, both within as well as across solutions. Because our | |
business model provides access to all solutions with resource-based pricing, we make it easy for customers to expand across use cases. | |
Our business has experienced rapid growth around the world. As of April 30, 2024, we had approximately 21,000 customers compared to | |
approximately 20,200 customers and over 18,600 customers as of April 30, 2023 and 2022, respectively. Our total revenue was $1.267 billion, $1.069 billion, | |
and $862.4 million for the years ended April 30, 2024, 2023 and 2022, respectively, representing year-over-year growth of 19% for the year ended April 30, | |
2024 and 24% for the year ended April 30, 2023. Subscriptions accounted for 93%, 92% and 93% of our total revenue for the years ended April 30, 2024, 2023 | |
and 2022, respectively. Revenue from outside the United States accounted for 42%, 41% and 44% of our total revenue for the years ended April 30, 2024, 2023 | |
and 2022, respectively. | |
While we recorded net income of $61.7 million for the year ended April 30, 2024, we incurred net losses of $236.2 million and $203.8 million for the | |
years ended April 30, 2023 and 2022, respectively. Although we recorded net income for the year ended April 30, 2024, we expect to incur net losses for the | |
foreseeable future. Our net cash provided by operating activities was $148.8 million, $35.7 million, and $5.7 million for the years ended April 30, 2024, 2023, | |
and 2022 respectively. | |
5 | |
Table of Contents | |
Our Products | |
Our products enable our customers and users to find relevant information and insights nearly instantly in large amounts of data across a broad range of | |
business and consumer use cases. | |
We offer the Elastic Stack, a powerful set of software products that ingest and store data from any source, in any format, and perform search, analysis, | |
and visualization, usually in milliseconds. The Elastic Stack can be used by developers and IT decision makers to power a variety of use cases. We also offer | |
software solutions built in the Elastic Stack that address a wide variety of use cases. The Elastic Stack and our solutions are designed to run in public or private | |
clouds, in hybrid environments, or in multi-cloud environments. | |
The Elastic Stack | |
The Elastic Stack is primarily composed of the following products: | |
• Elasticsearch. Elasticsearch is the heart of the Elastic Stack. It is a distributed, real-time vector search and analytics engine and data store for all | |
types of data, including textual, numerical, geospatial, structured, and unstructured. | |
• Kibana. Kibana is the user interface for the Elastic Stack. It is the visualization layer for data stored in Elasticsearch. It is also the management | |
and configuration interface for all parts of the Elastic Stack. | |
Elastic has spent years infusing both Elasticsearch and Kibana with a foundation of AI and machine learning built on ESRE, from support for external | |
machine learning models to native vector search capabilities, supervised and unsupervised machine learning, and solution capabilities that improve search | |
relevance and identify anomalies. Elastic enables organizations to integrate generative AI and large language models by building key capabilities into its | |
products. | |
The Elastic Stack also supports data ingest with a number of products: | |
• Elastic Agent. Elastic Agent is a single, unified way to add monitoring for logs, metrics, and other types of data to each host. Elastic Agent | |
includes integrated host protection and central management. | |
• Beats. Beats is the family of lightweight, single-purpose data shippers for sending data from edge machines to Elasticsearch or Logstash. | |
• Logstash. Logstash is the dynamic data processing pipeline for ingesting data into Elasticsearch or other storage systems from a multitude of | |
sources simultaneously. | |
Paid proprietary features in the Elastic Stack enable capabilities such as automating anomaly detection on time series data at scale through machine | |
learning, facilitating compliance with data security and privacy regulations, supporting search across low cost cold and frozen data tiers, and allowing real-time | |
notifications and alerts. The source code of features in the Elastic Stack is generally visible to the public in the form of “open code. | |
” | |
Our Solutions | |
We have built a number of solutions into the Elastic Stack to make it easier for organizations to use our software for common use cases. Our solutions | |
include the following: | |
• Search. Our Search solution provides a powerful search platform for building search AI applications. Key use cases for Search include: | |
generative AI and retrieval-augmented generation, search applications, and foundational capabilities for building search experiences to support | |
websites and portals, e-commerce, mobile app search, customer support, and workplace search. | |
• Observability. Our Observability solution enables unified analysis across the IT ecosystem of applications, networks, and infrastructure. | |
Observability includes: Logs, to search and analyze petabytes of structured and unstructured logs; Metrics, to search and analyze numeric and | |
time series data; Application Performance Monitoring (“APM”), to deliver insight into application performance and health metrics and provide | |
developers with confidence in their code; and Synthetic Monitoring, to proactively monitor the availability and functionality of user journeys. | |
• Security. Our Security solution provides unified protection to prevent, detect, and respond to threats. Our AI-driven security analytics solution | |
includes: Security Information and Event Management (“SIEM”), with integrations to network, host, user, and cloud data sources, as well as | |
workflow and operations, shareable analytics, incident management, and investigations; extended protection with both third party integrations as | |
well as first party protections for both Endpoint Security (prevention, detection, and response) and Cloud Security (cloud posture assessment, | |
vulnerability management, and cloud workload protection). | |
6 | |
""", | |
"filename": "elastic-10q-nov-2024-pt1.pdf", | |
"doc_type": "summary" | |
} | |
############################################################ | |
## FYI - You will NOT see chunks or vectors | |
GET new_semantic_text/_doc/123 | |
############################################################ | |
## Semantic Query | |
GET new_semantic_text/_search | |
{ | |
"query": { | |
"semantic": { | |
"field": "text.semantic", | |
"query": "how do users easily use elastic" | |
} | |
} | |
} | |
## Return "chunks" - matching section of text | |
GET new_semantic_text/_search | |
{ | |
"query": { | |
"semantic": { | |
"field": "text.semantic", | |
"query": "how do users easily use elastic" | |
} | |
}, | |
"highlight": { | |
"fields": { | |
"text.semantic": { | |
"order": "score", | |
"number_of_fragments": 1 | |
} | |
} | |
},
"_source": false
} | |
############################################################ | |
## Perform BM25 query - using a match query as example | |
GET new_semantic_text/_search | |
{ | |
"query": { | |
"match": { | |
"text": "how do users easily use elastic" | |
} | |
} | |
} | |
## Can highlight with BM25 (Not New) | |
GET new_semantic_text/_search | |
{ | |
"query": { | |
"match": { | |
"text": "how do users easily use elastic" | |
} | |
}, | |
"highlight": { | |
"fields": { | |
"text": { | |
"order": "score", | |
"number_of_fragments": 1 | |
} | |
} | |
},
"_source": false
} | |
############################################################ | |
## `match` query has a new trick! | |
# https://www.elastic.co/search-labs/blog/semantic-search-match-knn-sparse-vector | |
GET new_semantic_text/_search | |
{ | |
"query": { | |
"match": { | |
"text.semantic": "how do users easily use elastic" | |
} | |
}, | |
"highlight": { | |
"fields": { | |
"text.semantic": { | |
"order": "score", | |
"number_of_fragments": 1 | |
} | |
} | |
},
"_source": false
} | |
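############################################################
## Sketch (not in the original walkthrough): lexical + semantic in one request
# Because `text` and `text.semantic` live on the same multi-field, a `bool` query can
# combine the BM25 `match` clause and the `semantic` clause shown above. The clause
# scores are simply added together here with no normalization, so treat this as a
# starting point rather than a tuned hybrid ranking.
GET new_semantic_text/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "text": "how do users easily use elastic" } },
        { "semantic": { "field": "text.semantic", "query": "how do users easily use elastic" } }
      ]
    }
  },
  "_source": false
}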
############################################################ | |
GET bonk/_search | |
{ | |
"query": { | |
"semantic": { | |
"field": "text", | |
"query": "how do users easily use elastic" | |
} | |
} | |
} | |
GET bonk/_search | |
{ | |
"query": { | |
"match": { | |
"text": "how do users easily use elastic" | |
} | |
} | |
} | |
GET _inference | |
# new semantic_text with multi-field | |
# GET elastic_lm_docs-jeff03/_mapping | |
# full text - text | |
# semantic - semantic_text | |
DELETE new_semantic_text | |
PUT new_semantic_text | |
{ | |
"mappings": { | |
"properties": { | |
"text": { | |
"type": "text", | |
"fields": { | |
"semantic": { | |
"type": "semantic_text" | |
} | |
} | |
} | |
} | |
} | |
} | |
PUT bonk | |
{ | |
"mappings": { | |
"properties": { | |
"text": { | |
"type": "semantic_text" | |
} | |
} | |
} | |
} | |
POST _reindex?wait_for_completion=false | |
{ | |
"source": { | |
"index": "elastic_lm_docs-jeff03", | |
"_source": ["text"] | |
}, | |
"dest": { | |
"index": "new_semantic_text" | |
} | |
} | |
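# Sketch (not in the original walkthrough): with wait_for_completion=false the reindex
# runs as a background task, so one way to watch progress is the task management API
# filtered to reindex actions.
GET _tasks?detailed=true&actions=*reindex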
GET new_semantic_text/_count | |
GET new_semantic_text/_search | |
GET new_semantic_text/_search | |
{ | |
"query": { | |
"exists": { | |
"field": "text" | |
} | |
}, | |
"fields": [ | |
"_inference_fields" | |
] | |
} | |
############################################################################################################ | |
############################################################################################################ | |
############################################################################################################ | |
############################################################################################################ | |
GET _inference | |
DELETE elastic-labs-main | |
#PUT elastic_blogs-small_sample | |
PUT elastic-labs-main | |
{ | |
"settings": { | |
"index": { | |
"default_pipeline": "parse_author_and_publish_date" | |
} | |
}, | |
"mappings": { | |
"properties": { | |
"body": { | |
"type": "text" | |
}, | |
"headings": { | |
"type": "text" | |
}, | |
"id": { | |
"type": "keyword" | |
}, | |
"meta_description": { | |
"type": "text" | |
}, | |
"title": { | |
"type": "text" | |
}, | |
"publish_date": { | |
"type": "date" | |
}, | |
"last_crawled_at": { | |
"type": "date" | |
}, | |
"url_host": { | |
"type": "keyword" | |
}, | |
"url_path_dir1": { | |
"type": "keyword" | |
}, | |
"url_path_dir2": { | |
"type": "keyword" | |
}, | |
"url_path": { | |
"type": "keyword" | |
} | |
} | |
} | |
} | |
GET elastic-labs-main/_search | |
{ | |
"query": { | |
"term": { | |
"publish_date": "1970-01-01T00:00:00.000Z" | |
} | |
} | |
} | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM elastic-labs-main | |
| WHERE publish_date == "1970-01-01T00:00:00.000Z" | |
| KEEP title | |
""" | |
} | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM elastic-labs-main | |
| WHERE publish_date == "1970-01-01T00:00:00.000"
| KEEP title | |
""" | |
} | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM elastic-labs-main | |
| EVAL combined = concat(url_path_dir1, "/", url_path_dir2) | |
| STATS count = count() BY combined | |
| KEEP combined, count | |
| SORT count DESC | |
""" | |
} | |
GET elastic-labs-main/_search | |
{ | |
"size": 0, | |
"aggs": { | |
"NAME": { | |
"terms": { | |
"field": "url_path_dir1" | |
} | |
} | |
} | |
} | |
GET elastic-labs-main/_search | |
GET elastic-labs-main/_search | |
{ | |
"query": { | |
"bool": { | |
"must": [ | |
{ | |
"match": { | |
"url_path_dir1": "security-labs" | |
} | |
} | |
] | |
} | |
}, | |
"_source": false, | |
"fields": [ | |
"first_author", | |
"publish_date", | |
"title", | |
"url.keyword" | |
], | |
"aggs": { | |
"NAME": { | |
"terms": { | |
"field": "url_path_dir1" | |
} | |
} | |
} | |
} | |
GET elastic-labs-main/_search | |
{ | |
"query": { | |
"bool": { | |
"must": [ | |
{ | |
"match": { | |
"publish_date": "1970-01-01T00:00:00.000Z" | |
} | |
} | |
], | |
"must_not": [ | |
{ | |
"match": { | |
"url_path_dir2": "category" | |
} | |
} | |
] | |
} | |
}, | |
"_source": true, | |
"fields": [ | |
"first_author", | |
"publish_date", | |
"title", | |
"url.keyword", | |
"ingest_error" | |
], | |
"aggs": { | |
"NAME": { | |
"terms": { | |
"field": "url_path_dir1" | |
} | |
} | |
} | |
} | |
GET elastic-labs-main/_search | |
{ | |
"size": 0, | |
"aggs": { | |
"dir1": { | |
"terms": { | |
"field": "url_path_dir1" | |
} | |
}, | |
"dir2": { | |
"terms": { | |
"field": "url_path_dir2" | |
} | |
} | |
} | |
} | |
GET elastic-labs-main/_search | |
{ | |
"size": 0, | |
"query": { | |
"bool": { | |
"must": [ | |
{ | |
"match": { | |
"url_path_dir1": "observability-labs" | |
} | |
} | |
] | |
} | |
}, | |
"aggs": { | |
"NAME": { | |
"terms": { | |
"field": "url_path_dir2" | |
} | |
} | |
} | |
} | |
GET elastic-labs-main/_search | |
{ | |
"query": { | |
"bool": { | |
"must": [ | |
{ | |
"match": { | |
"url_path_dir2": "topics" | |
} | |
} | |
] | |
} | |
} | |
} | |
GET elastic-labs-main/_mapping | |
#new | |
GET _ingest/pipeline/parse_author_and_publish_date | |
PUT _ingest/pipeline/parse_author_and_publish_date | |
{ | |
"processors": [ | |
{ | |
"script": { | |
"source": """ | |
// If raw_author is null or empty array, set default unknown | |
if (ctx.raw_author == null || ctx.raw_author.isEmpty()) { | |
ctx.list_of_authors = []; | |
ctx.first_author = 'Unknown'; | |
} else { | |
// raw_author is already an array from crawler | |
ctx.list_of_authors = ctx.raw_author; | |
ctx.first_author = ctx.raw_author[0]; | |
} | |
""", | |
"if": "ctx.containsKey('raw_author')" | |
} | |
}, | |
{ | |
"date": { | |
"field": "raw_publish_date", | |
"target_field": "publish_date", | |
"formats": [ | |
"MMMM dd, yyyy", | |
"d MMMM yyyy", | |
"MMMM d, yyyy", | |
"yyyy-MM-dd'T'HH:mm:ssXXX", | |
"yyyy-MM-dd'T'HH:mm:ssZ", | |
"yyyy-MM-dd" | |
], | |
"timezone": "UTC", | |
"if": "ctx.containsKey('raw_publish_date') && ctx.raw_publish_date != ''", | |
"on_failure": [ | |
{ | |
"set": { | |
"field": "ingest_error", | |
"value": "Date parsing failed: {{{_ingest.on_failure_message}}} for field raw_publish_date with value {{{raw_publish_date}}}", | |
"override": false | |
} | |
} | |
] | |
} | |
}, | |
{ | |
"set": { | |
"field": "publish_date", | |
"value": "1970-01-01T00:00:00Z", | |
"if": "ctx.publish_date == null" | |
} | |
}, | |
{ | |
"remove": { | |
"field": "raw_publish_date", | |
"if": "ctx.containsKey('raw_publish_date')", | |
"ignore_missing": true | |
} | |
}, | |
{ | |
"remove": { | |
"field": "raw_author", | |
"if": "ctx.containsKey('raw_author')", | |
"ignore_missing": true | |
} | |
} | |
] | |
} | |
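# The pipeline above can be hard to reason about without running it, so here is a
# rough Python sketch of the same fallback logic: first author or 'Unknown', try
# each date format in order, record an error and fall back to the epoch if none match.
# The function name, the strptime patterns, and the error text are assumptions made
# for illustration; they approximate the Java-style patterns and are not what
# Elasticsearch executes.

```python
from datetime import datetime, timezone

# Approximate strptime equivalents of the pipeline's date patterns (assumed mapping).
FORMATS = [
    "%B %d, %Y",            # MMMM d, yyyy / MMMM dd, yyyy
    "%d %B %Y",             # d MMMM yyyy
    "%Y-%m-%dT%H:%M:%S%z",  # yyyy-MM-dd'T'HH:mm:ssXXX and ...ssZ
    "%Y-%m-%d",             # yyyy-MM-dd
]
EPOCH = "1970-01-01T00:00:00Z"


def parse_author_and_publish_date(doc):
    """Return a copy of doc with the pipeline's fields filled in (hypothetical helper)."""
    doc = dict(doc)

    # Script processor: only runs when raw_author is present.
    if "raw_author" in doc:
        authors = doc.get("raw_author") or []
        doc["list_of_authors"] = list(authors)
        doc["first_author"] = authors[0] if authors else "Unknown"

    # Date processor: try each format, keep the first that parses.
    raw = doc.get("raw_publish_date")
    if raw:
        for fmt in FORMATS:
            try:
                parsed = datetime.strptime(raw, fmt)
            except ValueError:
                continue
            if parsed.tzinfo is None:
                parsed = parsed.replace(tzinfo=timezone.utc)
            doc["publish_date"] = parsed.astimezone(timezone.utc).strftime(
                "%Y-%m-%dT%H:%M:%SZ"
            )
            break
        else:
            # on_failure: record the problem without overriding an earlier error.
            doc.setdefault(
                "ingest_error",
                f"Date parsing failed for field raw_publish_date with value {raw}",
            )

    # Set processor: default to the epoch so failures are easy to query for.
    if doc.get("publish_date") is None:
        doc["publish_date"] = EPOCH

    # Remove processors: drop the raw crawler fields.
    doc.pop("raw_publish_date", None)
    doc.pop("raw_author", None)
    return doc
```

# Documents that fall back to the epoch date are the ones the earlier
# publish_date "1970-01-01T00:00:00.000Z" search is looking for.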
# _reindex elastic_blogs-small_sample - 20/site | |
DELETE elastic_blogs-small_sample | |
// Reindex Search Labs | |
POST _reindex | |
{ | |
"source": { | |
"index": "elastic-labs-main", | |
"query": { | |
"match": { | |
"url_path_dir1": "search-labs" | |
} | |
} | |
}, | |
"max_docs": 20, | |
"dest": { | |
"index": "elastic_blogs-small_sample" | |
} | |
} | |
// Reindex Observability Labs | |
POST _reindex | |
{ | |
"source": { | |
"index": "elastic-labs-main", | |
"query": { | |
"match": { | |
"url_path_dir1": "observability-labs" | |
} | |
} | |
}, | |
"max_docs": 20, | |
"dest": { | |
"index": "elastic_blogs-small_sample" | |
} | |
} | |
// Reindex Security Labs | |
POST _reindex | |
{ | |
"source": { | |
"index": "elastic-labs-main", | |
"query": { | |
"match": { | |
"url_path_dir1": "security-labs" | |
} | |
} | |
}, | |
"max_docs": 20, | |
"dest": { | |
"index": "elastic_blogs-small_sample" | |
} | |
} | |
GET elastic_blogs-small_sample/_search | |
GET _license | |
GET _inference/ | |
GET airbnb-20241218-chicago-reviews/_mapping | |
GET airbnb-20241218-chicago-listings/_mapping | |
#DELETE airbnb-20241218-chicago-listings | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM airbnb-20241218-chicago-reviews METADATA _score | |
| WHERE comments_semantic: "a place with easy street parking" | |
| SORT _score DESC | |
| LIMIT 10 | |
| KEEP listing_id, comments | |
| LOOKUP JOIN airbnb-20241218-chicago-listings ON listing_id | |
| KEEP listing_id, name, neighbourhood, comments | |
""" | |
} | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM airbnb-20241218-chicago-reviews | |
| WHERE comments_semantic: "a place with easy street parking" | |
| LIMIT 10 | |
| KEEP listing_id, comments | |
| LOOKUP JOIN airbnb-20241218-chicago-listings ON listing_id | |
""" | |
} | |
PUT airbnb-20241218-chicago-reviews/_settings | |
{ | |
"index": { | |
"mode": "lookup" | |
} | |
} | |
PUT airbnb-20241218-chicago-reviews/_settings?reopen=true | |
{ | |
"index": { | |
"mode": "lookup" | |
} | |
} | |
POST /airbnb-20241218-chicago-reviews/_close | |
GET _cat/indices | |
#DELETE airbnb-20241218-chicago-reviews | |
POST _inference/chat_completion/openai-completion/_stream | |
{ | |
"model": "gpt-4o", | |
"messages": [ | |
{ | |
"role": "user", | |
"content": "What is Elastic?" | |
} | |
] | |
} | |
PUT _inference/chat_completion/azure_openai_gpt-4o-chat | |
{ | |
"service": "azureopenai", | |
"service_settings": { | |
"api_key": "<azure_openai_api_key>", | |
"resource_name": "sa-openai", | |
"deployment_id": "gpt-4o", | |
"api_version": "2024-08-01-preview" | |
} | |
} | |
#curl -X POST "https://litellm-proxy-service-1059491012611.us-central1.run.app/v1/chat/completions" \ | |
#-H "Authorization: Bearer <api_key>" \ | |
#-H "Content-Type: application/json" \ | |
#-d '{ | |
# "model": "gpt-4o", | |
# "messages": [ | |
# { | |
# "role": "user", | |
# "content": "this is a test request, write a short poem" | |
# } | |
# ] | |
#}' | |
DELETE _inference/openai_test | |
PUT _inference/chat_completion/openai_chat_completion | |
{ | |
"service": "openai", | |
"service_settings": { | |
"api_key": "<api_key>", | |
"model_id": "gpt-4o", | |
"url": "https://litellm-proxy-service-1059491012611.us-central1.run.app/v1/chat/completions" | |
} | |
} | |
POST _inference/chat_completion/openai_chat_completion/_stream | |
{ | |
"model": "gpt-4o", | |
"messages": [ | |
{ | |
"role": "user", | |
"content": "What is Elastic?" | |
} | |
] | |
} | |
POST _inference/completion/azure_openai_gpt-4o/_stream | |
{ | |
"input": "What is Elastic?" | |
} | |
POST _inference/chat_completion/azure_openai_gpt-4o/_stream | |
{ | |
"model": "gpt-4o", | |
"messages": [ | |
{ | |
"role": "user", | |
"content": "What is Elastic?" | |
} | |
] | |
} | |
GET elastic_lm_docs-jeff03/_doc/e5661f706f317af209b9cacfc728581a | |
GET elastic_lm_docs-jeff03/_search | |
{ | |
"query": { | |
"semantic": { | |
"field": "semantic_text", | |
"query": "total revenue" | |
} | |
} | |
} | |
GET elastic_lm_docs-jeff03/_search | |
{ | |
"fields": [ | |
"semantic_text", | |
"text", | |
"text_as_html", | |
"_inference_fields" | |
], | |
"_source": false, | |
"highlight": { | |
"fields": { | |
"semantic_text": { | |
"number_of_fragments": 2, | |
"order": "score" | |
} | |
} | |
} | |
} | |
GET elastic_lm_docs-jeff03/_mapping | |
# full text - text | |
# semantic - semantic_text | |
#"analyzer": "english" | |
PUT _inference/sparse_embedding/.my-elser-endpoint | |
{ | |
"service": "elser", | |
"service_settings": { | |
"adaptive_allocations": { | |
"enabled": true, | |
"min_number_of_allocations": 3, | |
"max_number_of_allocations": 10 | |
}, | |
"num_threads": 1 | |
} | |
} | |
PUT _ingest/pipeline/elser-ingest-pipeline | |
{ | |
"description": "Ingest pipeline for ELSER", | |
"processors": [ | |
{ | |
"inference": { | |
"model_id": ".elser_model_2", | |
"input_output": [ | |
{ | |
"input_field": "plot", | |
"output_field": "plot_embedding" | |
} | |
] | |
} | |
} | |
] | |
} | |
GET _inference/my-elser-endpoint | |
PUT _inference/sparse_embedding/my-elser-endpoint | |
{ | |
"service": "elasticsearch", | |
"service_settings": { | |
"adaptive_allocations": { | |
"enabled": true, | |
"min_number_of_allocations": 1, | |
"max_number_of_allocations": 4 | |
}, | |
"num_threads": 1, | |
"model_id": ".elser_model_2" | |
} | |
} | |
DELETE _inference/<inference_id> | |
PUT _in | |
GET _cat/indices | |
GET elastic_lm_users/_search | |
GET elastic_lm_docs-jeff10/_mapping | |
GET elastic_lm_docs-jeff12/_search | |
{ | |
"retriever": { | |
"standard": { | |
"query": { | |
"bool": { | |
"should": [ | |
{ | |
"semantic": { | |
"field": "semantic_text", | |
"query": "Total current liabilities in April 2024" | |
} | |
} | |
] | |
} | |
} | |
} | |
} | |
} | |
GET elastic_lm_docs-jeff07/_search | |
{ | |
"retriever": { | |
"standard": { | |
"query": { | |
"bool": { | |
"should": [ | |
{ | |
"semantic": { | |
"field": "semantic_text", | |
"query": "long-term debt in October" | |
} | |
} | |
], | |
"filter": [] | |
} | |
} | |
} | |
} | |
} | |
GET elastic_lm_docs-jeff04/_search | |
{ | |
"query": { | |
"bool": { | |
"should": [ | |
{ | |
"semantic": { | |
"field": "semantic_text", | |
"query": "what are flops?" | |
} | |
} | |
], | |
"filter": [ | |
{ | |
"terms": { | |
"file_name": [ | |
"ELSER paper - SIGIR_2024.pdf" | |
] | |
} | |
} | |
] | |
} | |
} | |
} | |
PUT _inference/completion/openai_chat_completions_test | |
{ | |
"service": "openai", | |
"service_settings": { | |
"api_key": "<api_key>", | |
"model_id": "gpt-4o", | |
"url": "https://litellm-proxy-service-1059491012611.us-central1.run.app/v1/chat/completions" | |
} | |
} | |
PUT _inference/completion/openai_chat_completions_test | |
{ | |
"service": "openai", | |
"service_settings": { | |
"api_key": "<api_key>", | |
"model_id": "gpt-4o-westus", | |
"url": "https://litellm-proxy-service-1059491012611.us-central1.run.app/v1/chat/completions" | |
} | |
} | |
POST _inference/completion/openai_chat_completions_test | |
{ | |
"input": "How many male mallard ducks fit in an American football field?" | |
} | |
get nothing/ | |
{ | |
"steps": [ | |
{ | |
"action": "query_elasticsearch", | |
"args": { | |
"query": { | |
"retriever": { | |
"standard": { | |
"query": { | |
"bool": { | |
"should": [ | |
{ | |
"semantic": { | |
"field": "semantic_text", | |
"query": "long-term debt in April" | |
} | |
} | |
] | |
} | |
} | |
} | |
} | |
} | |
}, | |
"description": "Query Elasticsearch for long-term debt information relevant to April." | |
}, | |
{ | |
"action": "query_elasticsearch", | |
"args": { | |
"query": { | |
"retriever": { | |
"standard": { | |
"query": { | |
"bool": { | |
"should": [ | |
{ | |
"semantic": { | |
"field": "semantic_text", | |
"query": "long-term debt in October" | |
} | |
} | |
] | |
} | |
} | |
} | |
} | |
} | |
}, | |
"description": "Query Elasticsearch for long-term debt information relevant to October." | |
}, | |
{ | |
"action": "do_math", | |
"args": { | |
"operation": "subtract", | |
"operands": [ | |
"debt_value_in_october", | |
"debt_value_in_april" | |
] | |
}, | |
"description": "Calculate the difference in long-term debt between April and October." | |
}, | |
{ | |
"action": "done", | |
"args": {}, | |
"description": "Finalize the plan." | |
} | |
] | |
} | |
################## ES|QL Search | |
// ingest pipeline to copy the field | |
PUT _ingest/pipeline/demo | |
{ | |
"processors": [ | |
{ | |
"set": { | |
"field": "semantic_title", | |
"value": "{{{title}}}" | |
} | |
} | |
] | |
} | |
#DELETE search-movies | |
POST search-movies/_doc/2?pipeline=demo | |
{ | |
"title":"tacos are tasty" | |
} | |
POST search-movies/_doc/1?pipeline=demo | |
{"title":"The Godfather","Rated":"R","Plot":"The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son.","Awards":"Won 3 Oscars. Another 29 wins & 19 nominations.","Poster":"http://ia.media-imdb.com/images/M/MV5BMjEyMjcyNDI4MF5BMl5BanBnXkFtZTcwMDA5Mzg3OA@@._V1_SX300.jpg","imdbID":"tt0068646","Type":"movie","Response":"True","PosterS3":"https://s3-eu-west-1.amazonaws.com/imdbimages/images/MV5BMjEyMjcyNDI4MF5BMl5BanBnXkFtZTcwMDA5Mzg3OA@@._V1_SX300.jpg","id":"tt0068646","language":["English","Italian","Latin"],"genres":["Crime","Drama"],"actors":["Marlon Brando","Al Pacino","James Caan","Richard S. Castellano"],"country":["USA"],"directors":["Francis Ford Coppola"],"writers":["Mario Puzo ","Francis Ford Coppola ","Mario Puzo"],"year":1972,"metascore":100,"runtime":175,"imdbRating":9.2,"imdbVotes":807926,"released":"1972-03-23T23:00:00.000Z"} | |
GET _inference/.elser-2-elasticsearch | |
PUT search-movies | |
{ | |
"mappings": { | |
"properties": { | |
"type": { | |
"type": "keyword" | |
}, | |
"title": { | |
"type": "text" | |
}, | |
"semantic_title": { | |
"type": "semantic_text", | |
"inference_id": ".elser-2-elasticsearch" | |
}, | |
"year": { | |
"type": "integer" | |
}, | |
"rated": { | |
"type": "keyword" | |
}, | |
"released": { | |
"type": "date" | |
}, | |
"genres": { | |
"type": "text", | |
"fields": { | |
"keyword": { | |
"type": "keyword" | |
} | |
} | |
}, | |
"directors": { | |
"type": "text", | |
"fields": { | |
"keyword": { | |
"type": "keyword" | |
} | |
} | |
}, | |
"writers": { | |
"type": "text", | |
"fields": { | |
"keyword": { | |
"type": "keyword" | |
} | |
} | |
}, | |
"actors": { | |
"type": "text", | |
"fields": { | |
"keyword": { | |
"type": "keyword" | |
} | |
} | |
}, | |
"countries": { | |
"type": "text", | |
"fields": { | |
"keyword": { | |
"type": "keyword" | |
} | |
} | |
}, | |
"plot": { | |
"type": "text" | |
}, | |
"poster": { | |
"type": "keyword" | |
}, | |
"id": { | |
"type": "keyword" | |
}, | |
"metascore": { | |
"type": "integer" | |
}, | |
"imdbrating": { | |
"type": "float" | |
} | |
} | |
} | |
} | |
POST _inference/sparse_embedding/.elser-2-elasticsearch | |
{ | |
"input": "The sky above the port was the color of television tuned to a dead channel." | |
} | |
GET search-movies/_mapping | |
GET search-movies/_search | |
GET search-movies/_search | |
{ | |
"query": { | |
"semantic": { | |
"field": "semantic_title", | |
"query": "tacos" | |
} | |
} | |
} | |
GET search-movies/_search | |
{ | |
"query": { | |
"match_all": {} | |
}, | |
"fields": [ "_inference_fields" ] | |
} | |
##################### | |
###### es|ql | |
// First let's see how we get back semantic_text fields | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM search-movies | |
| KEEP title, semantic_title | |
| LIMIT 10 | |
""" | |
} | |
// semantic text fields are string fields for ES|QL | |
// can be used with ES|QL functions/commands that already accept keyword/text | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM search-movies | |
| EVAL len = length(semantic_title) | |
| EVAL semantic_title = to_upper(semantic_title) | |
| KEEP semantic_title, len | |
| SORT len DESC | |
""" | |
} | |
// apply wildcard patterns (LIKE) on semantic text fields | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM search-movies | |
| WHERE semantic_title LIKE "Harry Potter*" | |
| KEEP semantic_title | |
""" | |
} | |
// Works with match! | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM search-movies METADATA _score | |
| WHERE match(semantic_title, "Shakespeare") | |
| SORT _score DESC | |
| KEEP title, semantic_title, _score | |
| LIMIT 10 | |
""" | |
} | |
// Works with : as well | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM search-movies METADATA _score | |
| WHERE semantic_title:"Shakespeare" | |
| SORT _score DESC | |
| KEEP title, semantic_title, _score | |
| LIMIT 10 | |
""" | |
} | |
// compare to BM25 results | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM search-movies METADATA _score | |
| WHERE title:"Shakespeare" | |
| SORT _score DESC | |
| KEEP title, _score | |
| LIMIT 10 | |
""" | |
} | |
// can be combined with BM25 queries | |
// the : operator also gives a convenient way to filter on multi-valued fields | |
POST _query?format=txt | |
{ | |
"query": """ | |
FROM search-movies METADATA _score | |
| WHERE semantic_title:"Shakespeare" AND genres:"romance" | |
| SORT _score DESC | |
| KEEP title, _score, genres | |
| LIMIT 10 | |
""" | |
} | |
###### | |
PUT my-semantic-index | |
{ | |
"mappings": { | |
"properties": { | |
"inference_field": { | |
"type": "semantic_text" | |
} | |
} | |
} | |
} | |
PUT my-semantic-index/_doc/1 | |
{ | |
"inference_field": "pluto is a planet" | |
} | |
PUT my-semantic-index/_doc/2 | |
{ | |
"inference_field": "the moon is made of green cheese" | |
} | |
PUT my-semantic-index/_doc/3 | |
{ | |
"inference_field": "the earth is not flat" | |
} | |
PUT my-semantic-index/_doc/4 | |
{ | |
"inference_field": "mark watney grew potatoes on mars" | |
} | |
PUT my-semantic-index/_doc/5 | |
{ | |
"inference_field": "an asteroid killed the dinosaurs" | |
} | |
GET my-semantic-index/_search | |
{ | |
"query": { | |
"sparse_vector": { | |
"field": "inference_field", | |
"query": "is pluto a planet", | |
"prune": false | |
} | |
} | |
} | |
GET my-semantic-index/_search | |
{ | |
"query": { | |
"sparse_vector": { | |
"field": "inference_field", | |
"query": "is pluto a planet", | |
"prune": true, | |
"pruning_config": { | |
"tokens_weight_threshold": 0.9, | |
"tokens_freq_ratio_threshold": 1 | |
} | |
} | |
} | |
} | |
###### | |
POST _inference/chat_completion/.rainbow-sprinkles-elastic/_stream | |
{ | |
"messages": [ | |
{ | |
"role": "user", | |
"content": "Say yes if it works" | |
} | |
], | |
"temperature": 0.7, | |
"max_completion_tokens": 300 | |
} |