Last active
January 31, 2025 17:27
Relik: Hello, World!
Notice that it links many related topics, such as Apache Phoenix, that are not actually mentioned in the text...
from relik import Relik
from relik.inference.data.objects import RelikOutput

relik = Relik.from_pretrained("sapienzanlp/relik-entity-linking-large")
relik_out: RelikOutput = relik("Russell Jurney is researching the possibility of an FPGA accelerated GraphFrames. GraphFrames is powered by Apache Spark SQL.")

# Take a look!
relik_out.to_dict()
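The dictionary returned by `to_dict()` nests span candidates three lists deep under `candidates['span']`. A minimal sketch of walking that structure in plain Python; the helper name `candidate_titles` and the (heavily truncated) mock dict are my own, not part of the relik API:

```python
# Sketch: pull candidate entity titles out of a RelikOutput-style dict.
# The dict shape follows relik_out.to_dict(); the mock below is illustrative.

def candidate_titles(out: dict) -> list[str]:
    """Return every candidate entity title under candidates['span']."""
    titles = []
    for window in out.get("candidates", {}).get("span", []):
        for batch in window:          # candidates['span'] is triple-nested
            for cand in batch:
                titles.append(cand["text"])
    return titles

# Mock shaped like the output below, truncated to two candidates.
mock = {
    "text": "Russell Jurney is researching an FPGA accelerated GraphFrames.",
    "candidates": {
        "span": [[[
            {"text": "Field-programmable gate array", "id": 4074039},
            {"text": "Apache Spark", "id": 5253974},
        ]]]
    },
}

print(candidate_titles(mock))
# ['Field-programmable gate array', 'Apache Spark']
```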
{'text': 'Russell Jurney is researching the possibility of an FPGA accelerated GraphFrames. GraphFrames is powered by Apache Spark SQL.',
'tokens': ['Russell',
'Jurney',
'is',
'researching',
'the',
'possibility',
'of',
'an',
'FPGA',
'accelerated',
'GraphFrames',
'.',
'GraphFrames',
'is',
'powered',
'by',
'Apache',
'Spark',
'SQL',
'.'],
'spans': [],
'triplets': [],
'candidates': {'span': [[[{'text': 'Field-programmable gate array',
'id': 4074039,
'metadata': {'definition': 'A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence the term "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an Application-Specific Integrated Circuit (ASIC). Circuit diagrams were previously used to specify the configuration, but this is increasingly rare due to the advent of electronic design automation tools. FPGAs contain an array of programmable logic blocks, and a hierarchy of "reconfigurable interconnects" that allow the blocks to be "wired together", like many logic gates that can be inter-wired in different configurations. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory. Many FPGAs can be reprogrammed to implement different logic functions, allowing flexible reconfigurable computing as performed in computer software.'}},
{'text': 'Apache Spark',
'id': 5253974,
'metadata': {'definition': "Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since."}},
{'text': 'SPARK (programming language)',
'id': 1627746,
'metadata': {'definition': 'SPARK is a formally defined computer programming language based on the Ada programming language, intended for the development of high integrity software used in systems where predictable and highly reliable operation is essential. It facilitates the development of applications that demand safety, security, or business integrity. Originally, there were three versions of the SPARK language (SPARK83, SPARK95, SPARK2005) based on Ada 83, Ada 95 and Ada 2005 respectively. A fourth version of the SPARK language, SPARK 2014, based on Ada 2012, was released on April 30, 2014. SPARK 2014 is a complete re-design of the language and supporting verification tools. The SPARK language consists of a well-defined subset of the Ada language that uses contracts to describe the specification of components in a form that is suitable for both static and dynamic verification. In SPARK83/95/2005, the contracts are encoded in Ada comments (and so are ignored by any standard Ada compiler), but are processed by the SPARK "Examiner" and its associated tools. SPARK 2014, in contrast, uses Ada 2012\'s built-in "aspect" syntax to express contracts, bringing them into the core of the language. The main tool for SPARK 2014 (GNATprove) is based on the GNAT/GCC infrastructure, and re-uses almost the entirety of the GNAT Ada 2012 front-end.'}},
{'text': 'Spark (software)',
'id': 4859272,
'metadata': {'definition': 'Spark is a free and open-source software web application framework and domain-specific language written in Java. It is an alternative to other Java web application frameworks such as JAX-RS, Play framework and Spring MVC. It runs on an embedded Jetty web server by default, but can be configured to run on other webservers. Inspired by Sinatra, it does not follow the model–view–controller pattern used in other frameworks, such as Spring MVC. Instead, Spark is intended for "quickly creating web-applications in Java with minimal effort." Spark was created and open-sourced in 2011 by Per Wendel, and was completely rewritten for version 2 in 2014. The rewrite was hugely centered on the Java 8 lambda philosophy, so Java 7 is officially not supported in version 2 and above.'}},
{'text': 'PostgreSQL',
'id': 2163464,
'metadata': {'definition': 'PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and technical standards compliance. It is designed to handle a range of workloads, from single machines to data warehouses or Web services with many concurrent users. It is the default database for macOS Server, and is also available for Linux, FreeBSD, OpenBSD, and Windows. PostgreSQL features transactions with Atomicity, Consistency, Isolation, Durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. PostgreSQL is developed by the PostgreSQL Global Development Group, a diverse group of many companies and individual contributors.'}},
{'text': 'Sparksee (graph database)',
'id': 5184411,
'metadata': {'definition': 'Sparksee (formerly known as DEX) is a high-performance and scalable graph database management system written in C++. Its development started in 2006 and its first version was available on Q3 - 2008. The fourth version is available since Q3-2010. There is a free community version, for academic or evaluation purposes, available to download, limited to 1 million nodes, no limit on edges. Sparksee is a product originated by the research carried out at DAMA-UPC (Data Management group at the Polytechnic University of Catalonia). On March 2010 a spin-off called Sparsity-Technologies has been created at the UPC to commercialize and give services to the technologies developed at DAMA-UPC. DEX changed name to Sparksee on its 5th release on February 2014.'}},
{'text': 'SQL',
'id': 4137050,
'metadata': {'definition': 'SQL ( "S-Q-L", "sequel"; Structured Query Language) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data where there are relations between different entities/variables of the data. SQL offers two main advantages over older read/write APIs like ISAM or VSAM. First, it introduced the concept of accessing many records with one single command; and second, it eliminates the need to specify "how" to reach a record, e.g. with or without an index. Originally based upon relational algebra and tuple relational calculus, SQL consists of many types of statements, which may be informally classed as sublanguages, commonly: a data query language (DQL), a data definition language (DDL), a data control language (DCL), and a data manipulation language (DML). The scope of SQL includes data query, data manipulation (insert, update and delete), data definition (schema creation and modification), and data access control. Although SQL is often described as, and to a great extent is, a declarative language (4GL), it also includes procedural elements. SQL was one of the first commercial languages for Edgar F. Codd\'s relational model. The model was described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks". Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language. SQL became a standard of the American National'}},
{'text': 'Apache Phoenix',
'id': 5390832,
'metadata': {'definition': 'Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver that hides the intricacies of the noSQL store enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL. Phoenix compiles queries and other statements into native noSQL store APIs rather than using MapReduce enabling the building of low latency applications on top of noSQL stores.'}},
{'text': 'Russ Nelson',
'id': 625297,
'metadata': {'definition': 'Russell "Russ" Nelson (born March 21, 1958) is an American computer programmer. He was a founding board member of the Open Source Initiative and briefly served as its president in 2005.'}},
{'text': 'StreamSQL',
'id': 2661538,
'metadata': {'definition': 'StreamSQL is a query language that extends SQL with the ability to process real-time data streams. SQL is primarily intended for manipulating relations (also known as tables), which are finite bags of tuples (rows). StreamSQL adds the ability to manipulate streams, which are infinite sequences of tuples that are not all available at the same time. Because streams are infinite, operations over streams must be monotonic. Queries over streams are generally "continuous", executing for long periods of time and returning incremental results. The StreamSQL language is typically used in the context of a Data Stream Management System (DSMS), for applications including algorithmic trading, market data analytics, network monitoring, surveillance, e-fraud detection and prevention, clickstream analytics and real-time compliance (anti-money laundering, RegNMS, MiFID). New Generation of Stream Processing Engines has added support for Stream SQL ( a.k.a. Streaming SQL). Among the examples are Kafka KSQL, SQLStreamBuilder, WSO2 Stream Processor, SQLStreams, [SamzaSQL http://ieeexplore.ieee.org/document/7530060/], and Storm SQL.'}},
{'text': 'Ada (programming language)',
'id': 207188,
'metadata': {'definition': 'Ada is a structured, statically typed, imperative, and object-oriented high-level computer programming language, extended from Pascal and other languages. It has built-in language support for design-by-contract, extremely strong typing, explicit concurrency, tasks, synchronous message passing, protected objects, and non-determinism. Ada improves code safety and maintainability by using the compiler to find errors in favor of runtime errors. Ada is an international standard; the current version (known as Ada 2012) is defined by ISO/IEC 8652:2012. Ada was originally designed by a team led by French computer scientist Jean Ichbiah of CII Honeywell Bull under contract to the United States Department of Defense (DoD) from 1977 to 1983 to supersede over 450 programming languages used by the DoD at that time. Ada was named after Ada Lovelace (1815–1852), who has been credited as the first computer programmer.'}},
{'text': 'Spark (XMPP client)',
'id': 4083973,
'metadata': {'definition': 'Spark is an open-source instant messaging program (based on XMPP protocol) that allows users to communicate via text in real time. It can be integrated with the Openfire server to provide additional features, such as controlling various parts of Spark functionality from a central management console, or integrating with a customer support service Fastpath, allowing Spark users to log into queues, accept and forward support requests, use canned responses. Being a cross-platform application, it can be run on various systems. Installers for Windows, macOS and Linux are available on the official website.'}},
{'text': 'Graph database',
'id': 4840657,
'metadata': {'definition': 'In computing, a graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the "graph" (or "edge" or "relationship"). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships within a graph database is fast because they are perpetually stored within the database itself. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data. Graph databases are part of the NoSQL databases created to address the limitations of the existing relational databases. While the graph model explicitly lays out the dependencies between nodes of data, the relational model and other NoSQL database models link the data by implicit connections. Graph databases, by design, allow simple and fast retrieval of complex hierarchical structures that are difficult to model in relational systems. Graph databases are similar to 1970s network model databases in that both represent general graphs, but network-model databases operate at a lower level of abstraction and lack easy traversal over a chain of edges. The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and “store” the graph data in a table (although a table is a'}},
{'text': 'GraphQL',
'id': 2027814,
'metadata': {'definition': "GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data. GraphQL was developed internally by Facebook in 2012 before being publicly released in 2015. On 7 November 2018, the GraphQL project was moved from Facebook to the newly-established GraphQL Foundation, hosted by the non-profit Linux Foundation. Since 2012, GraphQL's rise has followed the adoption timeline as set out by Lee Byron, GraphQL's creator, with surprising accuracy. Byron's goal is to make GraphQL omnipresent across web platforms. It provides an efficient, powerful and flexible approach to developing web APIs, and has been compared and contrasted with REST and other web service architectures. It allows clients to define the structure of the data required, and exactly the same structure of the data is returned from the server, therefore preventing excessively large amounts of data from being returned, but this has implications for how effective web caching of query results can be. The flexibility and richness of the query language also adds complexity that may not be worthwhile for simple APIs. It consists of a type system, query language and execution semantics, static validation, and type introspection. GraphQL supports reading, writing (mutating) and subscribing to changes to data (realtime updates). Major GraphQL clients include Apollo Client and Relay. GraphQL servers are available for multiple languages, including Haskell, JavaScript, Perl, Python, Ruby, Java, C#, Scala, Go, Elixir, Erlang, PHP, R, and Clojure. On 9 February 2018, the GraphQL Schema Definition Language (SDL) was made part"}},
{'text': 'FPGA prototyping',
'id': 5015139,
'metadata': {'definition': 'Field-programmable gate array prototyping (FPGA prototyping), also referred to as FPGA-based prototyping, ASIC prototyping or system-on-chip (SoC) prototyping, is the method to prototype system-on-chip and application-specific integrated circuit designs on FPGAs for hardware verification and early software development. Verification methods for hardware design as well as early software and firmware co-design have become mainstream. Prototyping SoC and ASIC designs with one or more FPGAs and electronic design automation (EDA) software has become a good method to do this.'}},
{'text': 'Particle.io',
'id': 2529400,
'metadata': {'definition': 'Particle.io (formerly Spark.io) is an early-stage venture company that produces low cost Internet of things hardware, software, and connectivity. Its website claims 40,000 developers, 170 countries, 8,500 companies, and 500,000 devices. Their product line includes: In the United States, their products are sold on their online store, Adafruit, Arrow, Digi-key, Mouser, Sparkfun, Premier Farnell, and others.'}},
{'text': 'Apache Cassandra',
'id': 3294173,
'metadata': {'definition': 'Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.'}},
{'text': 'Arena (software)',
'id': 3303575,
'metadata': {'definition': 'Arena is a discrete event simulation and automation software developed by Systems Modeling and acquired by Rockwell Automation in 2000. It uses the SIMAN processor and simulation language. As of Dec 2016, it is in version 15, providing significant enhancements in optimization, animation and inclusion of 64bit operation for modelling processes with \'Big Data\'. It has been suggested that Arena may join other Rockwell software packages under the "FactoryTalk" brand. In Arena, the user builds an experiment "model" by placing "modules" (boxes of different shapes) that represent processes or logic. Connector lines are used to join these modules together and to specify the flow of "entities". While modules have specific actions relative to entities, flow, and timing, the precise representation of each module and entity relative to real-life objects is subject to the modeler. Statistical data, such as cycle time and WIP (work in process) levels, can be recorded and made output as reports. Arena can be integrated with Microsoft technologies. It includes Visual Basic for Applications so models can be further automated if specific algorithms are needed. It also supports importing Microsoft Visio flowcharts, as well as reading from or sending output to Excel spreadsheets and Access databases. Hosting ActiveX controls is also supported.'}},
{'text': 'Adobe Spark',
'id': 5247773,
'metadata': {'definition': 'Adobe Spark is an integrated suite of media creation applications for the mobile and web developed by Adobe Systems. It comprises three separate design apps: Spark Page, Spark Post, and Spark Video.'}},
{'text': 'Apache Kafka',
'id': 2926897,
'metadata': {'definition': 'Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue designed as a distributed transaction log," making it highly valuable for enterprise infrastructures to process streaming data. Kafka can also connect to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library. The design is heavily influenced by transaction logs.'}},
{'text': 'Library of Efficient Data types and Algorithms',
'id': 3233248,
'metadata': {'definition': 'The Library of Efficient Data types and Algorithms (LEDA) is a proprietarily-licensed software library providing C++ implementations of a broad variety of algorithms for graph theory and computational geometry. It was originally developed by the Max Planck Institute for Informatics Saarbrücken. Since 2001, LEDA is further developed and distributed by the Algorithmic Solutions Software GmbH. LEDA is available as Free, Research, and Professional edition. The Free edition is freeware, with source code access available for purchase. The Research and Professional editions require payment of licensing fees for any use. Since October 2017, LEDA graph algorithms are also available for Java development environment.'}},
{'text': 'Scala (programming language)',
'id': 479339,
'metadata': {'definition': 'Scala ( ) is a general-purpose programming language providing support for functional programming and a strong static type system. Designed to be concise, many of Scala\'s design decisions aimed to address criticisms of Java. Scala source code is intended to be compiled to Java bytecode, so that the resulting executable code runs on a Java virtual machine. Scala provides language interoperability with Java, so that libraries written in either language may be referenced directly in Scala or Java code. Like Java, Scala is object-oriented, and uses a curly-brace syntax reminiscent of the C programming language. Unlike Java, Scala has many features of functional programming languages like Scheme, Standard ML and Haskell, including currying, type inference, immutability, lazy evaluation, and pattern matching. It also has an advanced type system supporting algebraic data types, covariance and contravariance, higher-order types (but not higher-rank types), and anonymous types. Other features of Scala not present in Java include operator overloading, optional parameters, named parameters, and raw strings. Conversely, a feature of Java not in Scala is checked exceptions, which have proved controversial. The name Scala is a portmanteau of "scalable" and "language", signifying that it is designed to grow with the demands of its users.'}},
{'text': 'Scala (company)',
'id': 2599425,
'metadata': {'definition': 'Scala is a producer of multimedia software. It was founded in 1987 as a Norwegian company called Digital Visjon. It is headquartered near Philadelphia, Pennsylvania, USA, and has subsidiaries in Europe and Asia.'}},
{'text': 'Swift (parallel scripting language)',
'id': 3311516,
'metadata': {'definition': 'Swift is an implicitly parallel programming language that allows writing scripts that distribute program execution across distributed computing resources, including clusters, clouds, grids, and supercomputers. Swift implementations are open-source software under the Apache License, version 2.0.'}},
{'text': 'Accelerated Mobile Pages',
'id': 3624719,
'metadata': {'definition': 'AMP (originally an acronym for Accelerated Mobile Pages) is a web component framework and a website publishing technology developed by Google which has the mission to "provide a user-first format for web content".'}},
{'text': 'Relational data stream management system',
'id': 4570575,
'metadata': {'definition': 'A relational data stream management system (RDSMS) is a distributed, in-memory data stream management system (DSMS) that is designed to use standards-compliant SQL queries to process unstructured and structured data streams in real-time. Unlike SQL queries executed in a traditional RDBMS, which return a result and exit, SQL queries executed in a RDSMS do not exit, generating results continuously as new data become available. Continuous SQL queries in a RDSMS use the SQL Window function to analyze, join and aggregate data streams over fixed or sliding windows. Windows can be specified as time-based or row-based.'}},
{'text': 'Apache Hadoop',
'id': 4887603,
'metadata': {'definition': 'Apache Hadoop () is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware (still the common use), it has also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework. The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing part which is a MapReduce programming model. Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then transfers packaged code into nodes to process the data in parallel. This approach takes advantage of data locality, where nodes manipulate the data they have access to. This allows the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking. The base Apache Hadoop framework is composed of the following modules: The term "Hadoop" is often used for both base modules and sub-modules and also the "ecosystem", or collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache'}},
{'text': 'IBM BLU Acceleration',
'id': 2728469,
'metadata': {'definition': 'IBM BLU Acceleration is a collection of technologies from the IBM Research and Development Labs for analytical database workloads. BLU Acceleration integrates a number of different technologies including in-memory processing of columnar data, Actionable Compression (which uses "approximate Huffman encoding" to compress and pack data tightly), CPU Acceleration (which exploits SIMD technology and provides parallel vector processing), and Data Skipping (which allows data that\'s of no use to the current active workload to be ignored). The term ‘BLU’ does not stand for anything in particular; however it has an indirect play on IBM\'s traditional corporate nickname Big Blue. (Ten IBM Research and Development facilities around the world filed more than 25 patents while working on the Blink Ultra project, which has resulted in BLU Acceleration.) BLU Acceleration does not require indexes, aggregates or tuning. BLU Acceleration is integrated in Version 10.5 of IBM DB2 for Linux, Unix and Windows,(DB2 for LUW) and uses the same storage and memory constructs (i.e., storage groups, table spaces, and buffer pools), SQL language interfaces, and administration tools as traditional DB2 for LUW databases. BLU Acceleration is available on both IBM POWER and x86 processor architectures.'}},
{'text': 'Russell Churney',
'id': 81903,
'metadata': {'definition': 'Russell Churney (10 September 1964 – 27 February 2007) was an English composer, pianist, arranger and musical director. He was also a member of the comedy/cabaret group, "Fascinating Aida". His sister is Ooberman keyboardist and vocalist Sophia Churney.'}},
{'text': 'Foundry Discovery Protocol', | |
'id': 5220003, | |
'metadata': {'definition': 'The Foundry Discovery Protocol (FDP) is a proprietary data link layer protocol. It was developed by Foundry Networks. Although Foundry Networks was acquired by Brocade Communications Systems, the protocol is still supported.'}}, | |
{'text': 'Russell C. Eberhart', | |
'id': 4941693, | |
'metadata': {'definition': 'Russell C. Eberhart, an American electrical engineer, best known as the co-developer of particle swarm optimization concept (with James Kennedy (social psychologist)). He is professor of Electrical and Computer Engineering, and adjunct professor of Biomedical Engineering at the Purdue School of Engineering and Technology, Indiana University Purdue University Indianapolis (IUPUI). Fellow of the IEEE. Fellow of the American Institute for Medical and Biological Engineering. He earned a Ph.D. in electrical engineering from Kansas State University in 1972. And he was Associate Editor of IEEE Transactions on Evolutionary Computation and Past president of IEEE Neural Networks Council.'}}, | |
{'text': 'Apache Giraph', | |
'id': 2120100, | |
'metadata': {'definition': "Apache Giraph is an Apache project to perform graph processing on big data. Giraph utilizes Apache Hadoop's MapReduce implementation to process graphs. Facebook used Giraph with some performance improvements to analyze one trillion edges using 200 machines in 4 minutes. Giraph is based on a paper published by Google about its own graph processing system called Pregel. It can be compared to other Big Graph processing libraries such as Cassovary."}}, | |
{'text': 'Apache Flink', | |
'id': 1677184, | |
'metadata': {'definition': "Apache Flink is an open-source stream-processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner. Flink's pipelined runtime system enables the execution of bulk/batch and stream processing programs. Furthermore, Flink's runtime supports the execution of iterative algorithms natively. Flink provides a high-throughput, low-latency streaming engine as well as support for event-time processing and state management. Flink applications are fault-tolerant in the event of machine failure and support exactly-once semantics. Programs can be written in Java, Scala, Python, and SQL and are automatically compiled and optimized into dataflow programs that are executed in a cluster or cloud environment. Flink does not provide its own data-storage system, but provides data-source and sink connectors to systems such as Amazon Kinesis, Apache Kafka, Alluxio, HDFS, Apache Cassandra, and ElasticSearch."}},
{'text': 'Precision Graphics Markup Language',
'id': 305750,
'metadata': {'definition': 'Precision Graphics Markup Language (PGML) is an XML-based language for representing vector graphics. It was a World Wide Web Consortium (W3C) submission by Adobe Systems, IBM, Netscape, and Sun Microsystems, that was not adopted as a recommendation. PGML is a 2D graphical format, offering precision for graphic artists, guaranteeing that the design created will appear in end user systems with the correct formatting, layout and the precision of color. PGML and Vector Markup Language, another XML-based vector graphics language W3C submission supported by Autodesk, Hewlett-Packard, Macromedia, Microsoft, and Visio Corporation, were later joined and improved upon to create Scalable Vector Graphics (SVG).'}},
{'text': 'SPARQL',
'id': 303988,
'metadata': {'definition': 'SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the "RDF Data Access Working Group" (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 became an official W3C Recommendation, and SPARQL 1.1 in March, 2013. SPARQL allows for a query to consist of triple patterns, conjunctions, disjunctions, and optional patterns. Implementations for multiple programming languages exist. There exist tools that allow one to connect and semi-automatically construct a SPARQL query for a SPARQL endpoint, for example ViziQuer. In addition, there exist tools that translate SPARQL queries to other query languages, for example to SQL and to XQuery.'}},
{'text': 'Rusty Russell',
'id': 1948954,
'metadata': {'definition': "Rusty Russell is an Australian free software programmer and advocate, known for his work on the Linux kernel's networking subsystem and the Filesystem Hierarchy Standard."}},
{'text': 'Sparkpr',
'id': 1362364,
'metadata': {'definition': 'Spark Public Relations (also referred to as Spark) is a public relations and integrated marketing communications agency headquartered in San Francisco, California, with offices in New York City. The firm focuses on the technology and consumer markets.'}},
{'text': 'Akka (toolkit)',
'id': 4831850,
'metadata': {'definition': 'Akka is a free and open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM. Akka supports multiple programming models for concurrency, but it emphasizes actor-based concurrency, with inspiration drawn from Erlang. Language bindings exist for both Java and Scala. Akka is written in Scala and, as of Scala 2.10, the actors in the Scala standard library are deprecated in favor of Akka.'}},
{'text': 'Spark (mathematics)',
'id': 2626076,
'metadata': {'definition': ''}},
{'text': 'Mimer SQL',
'id': 524764,
'metadata': {'definition': 'Mimer SQL is an SQL-based relational database management system produced by the Swedish company "Mimer Information Technology AB" (Mimer AB), formerly known as "Upright Database Technology AB". It was originally developed as a research project at the Uppsala University, Uppsala, Sweden in the 1970s before being developed into a commercial product. The database has been deployed in a wide range of application situations, including the NHS "Pulse" blood transfusion service in the UK, Volvo Cars production line in Sweden and automotive dealers in Australia. It has sometimes been one of the limited options available in realtime critical applications and resource restricted situations such as mobile devices.'}},
{'text': 'Spark Infrastructure',
'id': 359389,
'metadata': {'definition': 'Spark Infrastructure () is an entity listed on the Australian Securities Exchange which invests in infrastructure assets. Its principal investment is a 49% holding in SA Power Networks, CitiPower and Powercor, with the remaining 51% owned by the Cheung Kong group. The company debuted on the stock exchange on 16 December 2005 as SKICA representing an instalment receipt of $1.26 with a final instalment of $0.54 due in March 2007 when the instalment receipts were transferred for SKI shares.'}},
{'text': 'Apache Apex',
'id': 2885430,
'metadata': {'definition': 'Apache Apex is a YARN-native platform that unifies stream and batch processing. It processes big data-in-motion in a way that is scalable, performant, fault-tolerant, stateful, secure, distributed, and easily operable. Apache Apex was named a top-level project by The Apache Software Foundation on April 25, 2016.'}},
{'text': 'Relational database',
'id': 852160,
'metadata': {'definition': 'A relational database is a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. A software system used to maintain relational databases is a relational database management system (RDBMS). Virtually all relational database systems use SQL (Structured Query Language) for querying and maintaining the database.'}},
{'text': 'Sapphire Rapids',
'id': 5137457,
'metadata': {'definition': 'Sapphire Rapids is the Intel CPU microarchitecture based on either the second refinement of the 10\xa0nanometer process or a new 7\xa0nanometer process. It will be used as part of the Tinsley workstation and server platform after 2020. Only very limited information on the desktop/mobile version of Sapphire Rapids exists . Difficulties with the 10\xa0nm and 7\xa0nm fabrication processes may result in the release being pushed back to 2021 at the earliest.'}},
{'text': 'Lola (computing)',
'id': 1514700,
'metadata': {'definition': 'Lola is designed to be a simple hardware description language for describing synchronous, digital circuits. Niklaus Wirth developed the language to teach digital design on field-programmable gate arrays (FPGAs) to computer science students while a professor at ETH Zurich. The purpose of Lola is to statically describe the structure and functionality of hardware components and of the connections between them. A Lola text is composed of declarations and statements. It describes the hardware on the gate level in the form of signal assignments. Signals are combined using operators and assigned to other signals. Signals and the respective assignments can be grouped together into types. An instance of a type is a hardware component. Types can be composed of instances of other types, thereby supporting a hierarchical design style and they can be generic (e.g. parametrizable with the word-width of a circuit). All of the concepts mentioned above are demonstrated in the following example of a circuit for adding binary data. First, a fundamental building block (TYPE Cell) is defined, then this Cell is used to declare a cascade of word-width 8, and finally the Cells are connected to each other. The MODULE Adder defined in this example can serve as a building block on a higher level of the design hierarchy. Wirth describes Lola from a user\'s perspective in his book "Digital Circuit Design". A complementary view on the details of the Lola compiler\'s implementation can be found in Wirth\'s technical report "Lola System Notes". An overview of the whole system of'}},
{'text': 'Spark', 'id': 1458099, 'metadata': {'definition': ''}},
{'text': 'PostGIS',
'id': 129716,
'metadata': {'definition': 'PostGIS ( ) is an open source software program that adds support for geographic objects to the PostgreSQL object-relational database. PostGIS follows the Simple Features for SQL specification from the Open Geospatial Consortium (OGC). Technically PostGIS was implemented as a "PostgreSQL external extension".'}},
{'text': 'Kinetica (software)',
'id': 2362723,
'metadata': {'definition': 'Kinetica DB, Inc. is a company that develops a distributed, in-memory database management system using graphics processing units (GPUs). The software it markets is also called Kinetica. The company has headquarters in Arlington, Virginia and San Francisco.'}},
{'text': 'SQream DB',
'id': 1058641,
'metadata': {'definition': 'SQream DB is a relational database management system (RDBMS) that uses graphics processing units (GPUs) from Nvidia. SQream DB is designed for big data analytics using the Structured Query Language (SQL).'}},
{'text': 'SciDB',
'id': 1354996,
'metadata': {'definition': 'SciDB is a column-oriented database management system (DBMS) designed for multidimensional data management and analytics common to scientific, geospatial, financial, and industrial applications. It is developed by Paradigm4 and co-created by Turing Award winner Michael Stonebraker.'}},
{'text': 'X-Video Bitstream Acceleration',
'id': 648057,
'metadata': {'definition': "X-Video Bitstream Acceleration (XvBA), designed by AMD Graphics for its Radeon GPU and Fusion APU, is an arbitrary extension of the X video extension (Xv) for the X Window System on Linux operating-systems. XvBA API allows video programs to offload portions of the video decoding process to the GPU video-hardware. Currently, the portions designed to be offloaded by XvBA onto the GPU are currently motion compensation (MC) and inverse discrete cosine transform (IDCT), and variable-length decoding (VLD) for MPEG-2, MPEG-4 ASP (MPEG-4 Part 2, including Xvid, and older DivX and Nero Digital), MPEG-4 AVC (H.264), WMV3, and VC-1 encoded video. XvBA is a direct competitor to NVIDIA's Video Decode and Presentation API for Unix (VDPAU) and Intel's Video Acceleration API (VA API). In November 2009 a XvBA backend for Video Acceleration API (VA API) was released, which means any software that supports VA API will also support XvBA. On 24 February 2011, an official XvBA SDK (Software Development Kit) was publicly released alongside a suite of open source tools by AMD."}},
{'text': 'Fast Virtual Disk',
'id': 3385236,
'metadata': {'definition': 'Fast Virtual Disk (better known as FVD) is a virtualization-oriented disk image file format developed by IBM for the QEMU virtualization platform. It differs from existing paravirtualization-centric virtual disk image formats through a design that emphasizes lack of contention and separation of concerns between the host and guest kernels through duduplication of filesystem and block layer storage management. FVD can be written either directly to a physical or logical blockstore (avoiding host filesystem overheads), or to a regular host file system file. It strives to maintain similarity to raw disk layouts, eliminate host filesystem and disk image compression overheads, and minimize metadata-related overheads.'}},
{'text': 'Apache Hive',
'id': 5608138,
'metadata': {'definition': 'Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives a SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries (HiveQL) into the underlying Java without the need to implement queries in the low-level Java API. Since most data warehousing applications work with SQL-based querying languages, Hive aids portability of SQL-based applications to Hadoop. While initially developed by Facebook, Apache Hive is used and developed by other companies such as Netflix and the Financial Industry Regulatory Authority (FINRA). Amazon maintains a software fork of Apache Hive included in Amazon Elastic MapReduce on Amazon Web Services.'}},
{'text': 'Swagger (software)',
'id': 5112674,
'metadata': {'definition': 'Swagger is an open-source software framework backed by a large ecosystem of tools that helps developers design, build, document, and consume RESTful Web services. While most users identify Swagger by the Swagger UI tool, the Swagger toolset includes support for automated documentation, code generation, and test-case generation. Sponsored by SmartBear Software, Swagger has been a strong supporter of open-source software, and has widespread adoption.'}},
{'text': 'Accelerated-X',
'id': 5577064,
'metadata': {'definition': 'Accelerated-X is a proprietary port of the X Window System to Intel x86 machines.'}},
{'text': 'Sones GraphDB',
'id': 2813396,
'metadata': {'definition': 'Sones GraphDB was a graph database developed by the German company sones GmbH, available from 2010 to 2012. Its last version was released in May 2011. sones GmbH, which was based in Erfurt and Leipzig, was declared bankrupt on January 1, 2012. GraphDB was unique in that its design based on weighted graphs. The open source edition was released in July 2010. The commercially available enterprise version offered a wider variety of functions. GraphDB was developed in the programming language C# and ran on Microsoft\'s .NET Framework and on the open source reimplementation Mono. GraphDB was available as software as a service (SaaS) on the Microsoft cloud Azure Services Platform. GraphDB was also a component of an open source solution stack. In 2014 the trademark "GraphDB" was acquired by Ontotext. OWLIM, Ontotext\'s graph database and RDF triplestore, was renamed GraphDB.'}},
{'text': 'Data stream management system',
'id': 5750284,
'metadata': {'definition': 'A data stream management system (DSMS) is a computer software system to manage continuous data streams. It is similar to a database management system (DBMS), which is, however, designed for static data in conventional databases. A DSMS also offers a flexible query processing so that the information need can be expressed using queries. However, in contrast to a DBMS, a DSMS executes a "continuous query" that is not only performed once, but is permanently installed. Therefore, the query is continuously executed until it is explicitly uninstalled. Since most DSMS are data-driven, a continuous query produces new results as long as new data arrive at the system. This basic concept is similar to Complex event processing so that both technologies are partially coalescing.'}},
{'text': 'SPARK (rocket)',
'id': 130472,
'metadata': {'definition': 'SPARK, or Spaceborne Payload Assist Rocket - Kauai, also known as Super Strypi, is an American expendable launch system developed by the University of Hawaii, Sandia and Aerojet Rocketdyne. Designed to place miniaturized satellites into low Earth and sun-synchronous orbits, it is a derivative of the Strypi rocket which was developed in the 1960s in support of nuclear weapons testing. SPARK is being developed under the Low Earth Orbiting Nanosatellite Integrated Defense Autonomous System (LEONIDAS) program, funded by the Operationally Responsive Space Office of the United States Department of Defense.'}},
{'text': 'Spark (application)',
'id': 4874478,
'metadata': {'definition': 'Spark is an email application for iOS, macOS, and Android devices by Readdle. "Lifehacker" wrote that Spark was the best alternative for Mailbox users when that service went offline.'}},
{'text': 'Array Based Queuing Locks',
'id': 3940107,
'metadata': {'definition': 'Array-Based Queuing Lock (ABQL) is an advanced lock algorithm that ensures that threads spin on unique memory locations thus ensuring fairness of lock acquisition coupled with improved scalability.'}},
{'text': 'Apache HTTP Server',
'id': 2313373,
'metadata': {'definition': 'The Apache HTTP Server, colloquially called Apache ( ), is free and open-source cross-platform web server software, released under the terms of Apache License 2.0. Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation. The vast majority of Apache HTTP Server instances run on a Linux distribution, but current versions also run on Microsoft Windows and a wide variety of Unix-like systems. Past versions also ran on OpenVMS, NetWare, OS/2 and other operating systems. Originally based on the NCSA HTTPd server, development of Apache began in early 1995 after work on the NCSA code stalled. Apache played a key role in the initial growth of the World Wide Web, quickly overtaking NCSA HTTPd as the dominant HTTP server, and has remained most popular since April 1996. In 2009, it became the first web server software to serve more than 100 million websites. , it was estimated to serve 39% of all active websites and 35% of the top million websites.'}},
{'text': 'Apache Parquet',
'id': 401868,
'metadata': {'definition': 'Apache Parquet is a free and open-source column-oriented data storage format of the Apache Hadoop ecosystem. It is similar to the other columnar-storage file formats available in Hadoop namely RCFile and ORC. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.'}},
{'text': 'QuickDB ORM',
'id': 3355343,
'metadata': {'definition': 'QuickDB is an object-relational mapping framework for the Java software platform. It was developed by Diego Sarmentero along with others and is licensed under the LGPL License. Versions for .NET, Python and PHP are also being developed.'}},
{'text': 'FORTRAS',
'id': 2028219,
'metadata': {'definition': 'Fortras ("Forschungs und Entwicklungsgesellschaft fur Transportwesen" or: "Research and Development Corporation for the Transportation Sector") is an EDI standard for data exchange between carriers. Fortras was the designation of an independent company until 2001, when it merged with System Alliance. The designation \'Fortras\' is now solely used for the EDI standard that also is used outside of System Alliance. Important Fortras message types are "BORD" for consignment information (comparable to EDIFACT\'s IFCSUM), "ENTL" for unloading reports from the receiving carrier, and "STAT" for status information (comparable to EDIFACT\'s IFTSTA). Fortras messages consist of a sequence of lines with a fixed number of characters, e.g. 80, 128, or 512 and data elements are placed on fixed positions on each line. The document type is specified by a letter and a record number at the beginning of the line.'}},
{'text': 'Tabular Data Stream',
'id': 362406,
'metadata': {'definition': 'Tabular Data Stream (TDS) is an application layer protocol, used to transfer data between a database server and a client. It was initially designed and developed by Sybase Inc. for their Sybase SQL Server relational database engine in 1984, and later by Microsoft in Microsoft SQL Server.'}},
{'text': 'Advanced eXtensible Interface',
'id': 3079622,
'metadata': {'definition': 'The Advanced eXtensible Interface (AXI), part of the ARM Advanced Microcontroller Bus Architecture 3 (AXI3) and 4 (AXI4) specifications , is a parallel high-performance, synchronous, high-frequency, multi-master, multi-slave communication interface, mainly designed for on-chip communication. AXI has been introduced in 2003 with the AMBA3 specification. In 2010, a new revision of AMBA, AMBA4, defined the AXI4, AXI4-Lite and AXI4-Stream protocol. AXI is royalty-free and its specification is freely available from ARM. AXI offers a wide spectrum of features, including: AMBA AXI specifies many optional signals, which can be optionally included depending on the specific requirements of the design , making AXI a versatile bus for a large number of applications. While the communication over an AXI bus is between a single master and a single slave, the specification includes detailed description and signals to include N:M interconnects, able to extend the bus to topologies with more masters and slaves . AMBA AXI4, AXI4-Lite and AXI4-Stream have been adopted by Xilinx and many of its partners as main communication buses in their products .'}},
{'text': 'Advanced Boolean Expression Language',
'id': 336579,
'metadata': {'definition': "The Advanced Boolean Expression Language (ABEL) is an obsolete hardware description language and an associated set of design tools for programming PLDs. It was created in 1983 by Data I/O Corporation, in Redmond, Washington. ABEL includes both concurrent equation and truth table logic formats as well as a sequential state machine description format. A preprocessor with syntax loosely based on DEC's Macro-11 is also included. In addition to being used for logic descriptions, ABEL may also be used to describe test vectors (patterns of inputs and expected outputs) that may be downloaded to a hardware device programmer along with the compiled and fuse-mapped PLD programming data. Other PLD design languages originating in the same era include CUPL and PALASM. Since the advent of larger Field Programmable Gate Arrays (FPGAs), PLD languages have fallen out of favor as standard Hardware Description Languages (HDLs) such as VHDL and Verilog have gained in popularity. The ABEL concept and original compiler were created by Russell de Pina of Data I/O's Applied Research Group in 1981. The work was continued by ABEL product development team (led by Dr. Kyu Y. Lee) and included Mary Bailey, Bjorn Benson, Walter Bright, Michael Holley, Charles Olivier and David Pellerin. After a series of acquisitions, the ABEL toolchain and IP were bought by Xilinx Inc. Xilinx discontinued support for ABEL in ISE design suite 11, released in 2010."}},
{'text': 'Explicit data graph execution',
'id': 1528796,
'metadata': {'definition': 'Explicit data graph execution, or EDGE, is a type of instruction set architecture (ISA) which intends to improve computing performance compared to common processors like the Intel x86 line. EDGE combines many individual instructions into a larger group known as a "hyperblock". Hyperblocks are designed to be able to easily run in parallel. Parallelism of modern CPU designs generally starts to plateau at about eight internal units and from one to four "cores", EDGE designs intend to support hundreds of internal units and offer processing speeds hundreds of times greater than existing designs. Major development of the EDGE concept had been led by the University of Texas at Austin under DARPA\'s Polymorphous Computing Architectures program, with the stated goal of producing a single-chip CPU design with 1 TFLOPS performance by 2012, which has yet to be realized as of 2018.'}},
{'text': 'NoSQL',
'id': 4468440,
'metadata': {'definition': 'A NoSQL (originally referring to "non SQL" or "non relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the "NoSQL" moniker until a surge of popularity in the early 21st century, triggered by the needs of Web 2.0 companies. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages, or sit alongside SQL database in a polyglot persistence architecture. Motivations for this approach include: simplicity of design, simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases), finer control over availability and limiting the object-relational impedance mismatch. The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve. Sometimes the data structures used by NoSQL databases are also viewed as "more flexible" than relational database tables. Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages (instead of SQL, for instance the lack of ability to perform ad-hoc joins across tables), lack'}},
{'text': 'Spark New Zealand',
'id': 160844,
'metadata': {'definition': 'Spark New Zealand Limited, more commonly known Spark, is a New Zealand telecommunications company providing fixed line telephone services, a mobile network, an internet service provider, and a major ICT provider to NZ businesses (through its Spark Digital division). Its name in te reo Maori is Kora Aotearoa, and it was formerly known as Telecom New Zealand until it was rebranded with its current name in 2014. It has operated as a publicly traded company since 1990. Spark is one of the largest companies by value on the New Zealand Exchange (NZX). As of 2007, it was the 39th largest telecommunications company in the OECD. The company is part of New Zealand Telecommunications Forum. Telecom New Zealand was formed in 1987 from a division of the New Zealand Post Office, and privatised in 1990. In 2008, Telecom was operationally separated into three divisions under local loop unbundling initiatives by central government – Telecom Retail; Telecom Wholesale; and Chorus, the network infrastructure division. This separation effectively ended any remnants of monopoly that Telecom Retail once had in the market. In 2011 the demerger process was complete, with Telecom and Chorus becoming separate listed companies. On 8 August 2014, the company changed its name to Spark New Zealand.'}},
{'text': 'NewSQL',
'id': 5065946,
'metadata': {'definition': 'NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system. Many enterprise systems that handle high-profile data (e.g., financial and order processing systems) are too large for conventional relational databases, but have transactional and consistency requirements that are not practical for NoSQL systems. The only options previously available for these organizations were to either purchase more powerful computers or to develop custom middleware that distributes requests over conventional DBMS. Both approaches feature high costs and/or development costs. NewSQL systems attempt to reconcile the conflicts.'}},
{'text': 'Scylla (database)',
'id': 2832970,
'metadata': {'definition': "Scylla is an open-source distributed NoSQL data store. It was designed to be compatible with Apache Cassandra while achieving significantly higher throughputs and lower latencies. It supports the same protocols as Cassandra (CQL and Thrift) and the same file formats (SSTable), but is a completely rewritten implementation, using the C++17 language replacing Cassandra's Java, and the Seastar asynchronous programming library replacing threads, shared memory, mapped files, and other classic Linux programming techniques. Scylla uses a sharded design on each node, meaning that each CPU core handles a different subset of data. Cores do not share data, but rather communicate explicitly when they need to. The Scylla authors claim that this design allows Scylla to achieve much better performance on modern NUMA SMP machines, and to scale very well with the number of cores. They have measured as much as 2 million requests per second on a single machine, and also claim that a Scylla cluster can serve as many requests as a Cassandra cluster 10 times its size - and do so with lower latencies. Independent testing has not always been able to confirm such 10-fold throughput improvements, and sometimes measured smaller speedups, such as 2x. A 2017 benchmark from Samsung observed the 10x speedup on high-end machines - the Samsung benchmark reported that Scylla outperformed Cassandra on a cluster of 24-core machines by a margin of 10–37x depending on the YCSB workload."}},
{'text': 'Pillar Data Systems',
'id': 2961274,
'metadata': {'definition': 'Pillar Data Systems, a computer data storage company headquartered in San Jose, California, developed midrange and enterprise network storage systems. Pillar Data employed 325 people and sold its products to organizations in the financial services, healthcare, government and legal industries. Its primary product-offering was the Axiom platform.'}},
{'text': 'Adobe Spark Video',
'id': 1129903,
'metadata': {'definition': "Adobe Spark Video is a video storytelling application for the iPad and iPhone developed by Adobe Systems. It combines motion graphics, audio recording, music, text, and photos and is used to produce short animated, narrated explainer videos. It is part of the Adobe Spark suite of design and storytelling apps. It became the company's first application to be named by Apple as an App Store Best App of the Year and has been downloaded over 3.5 million times."}},
{'text': 'Node stream',
'id': 1498555,
'metadata': {'definition': 'A node stream is a method of transferring large amounts of data on mobile devices or websites (such as uploading detailed photographs) by breaking the file or data down into manageable chunks. The chunks of data do not use as much computer memory, so they are less likely to slow down the device, allowing the user to do other things on it whilst waiting for the file transfer to complete. In technical terms, in Node.js a node stream is a readable or writable continuous flow of data that can be manipulated asynchronously as data comes in (or out). This API can be used in data intensive web applications where scalability is an issue. A node stream can be many different things: a file stream, a parser, an HTTP request, a child process, etc.'}},
{'text': 'Proof of authority',
'id': 1784918,
'metadata': {'definition': 'Proof of authority (PoA) is an algorithm used with blockchains that delivers comparatively fast transactions through a consensus mechanism based on identity as a stake.'}},
{'text': 'Presto (SQL query engine)',
'id': 2188458,
'metadata': {'definition': 'Presto is a high performance, distributed SQL query engine for big data. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB. One can even query data from multiple data sources within a single query. Presto is community driven open-source software released under the Apache License.'}},
{'text': 'Apache Flume',
'id': 755849,
'metadata': {'definition': 'Apache Flume is a distributed, reliable, and available software for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.'}},
{'text': 'Matroska',
'id': 1585285,
'metadata': {'definition': 'The Matroska Multimedia Container is a free, open-standard container format, a file format that can hold an unlimited number of video, audio, picture, or subtitle tracks in one file. It is a universal format for storing common multimedia content, like movies or TV shows. Matroska is similar in concept to other containers like AVI, MP4, or Advanced Systems Format (ASF), but is entirely open in specification, with implementations consisting mostly of open source software. Matroska file extensions are .MKV for video (which may or may not include subtitles and audio), .MK3D for stereoscopic video, .MKA for audio-only files, and .MKS for subtitles only. "Matroska" is derived from "matryoshka" ( ), which refers to the hollow wooden Russian matryoshka doll which opens to expose another doll that in turn opens to expose another doll, and so on. That may be confusing for Russian speakers, as the Russian word "matroska" () actually refers to a sailor suit. The logo uses "Matroska", with the caron over the "s", as the letter s represents the "sh" sound () in various languages.'}},
{'text': 'Data Format Description Language',
'id': 5384008,
'metadata': {'definition': 'Data Format Description Language (DFDL, often pronounced "daff-o-dil"), published as an Open Grid Forum Proposed Recommendation in January 2011, is a modeling language for describing general text and binary data in a standard way. A DFDL model or schema allows any text or binary data to be read (or "parsed") from its native format and to be presented as an instance of an "information set". (An information set is a logical representation of the data contents, independent of the physical format. For example, two records could be in different formats, because one has fixed-length fields and the other uses delimiters, but they could contain exactly the same data, and would both be represented by the same information set). The same DFDL schema also allows data to be taken from an instance of an information set and written out (or "serialized") to its native format. DFDL is "descriptive" and not "prescriptive". DFDL is not a data format, nor does it impose the use of any particular data format. Instead it provides a standard way of describing many different kinds of data format. This approach has several advantages. It allows an application author to design an appropriate data representation according to their requirements while describing it in a standard way which can be shared, enabling multiple programs to directly interchange the data. DFDL achieves this by building upon the facilities of W3C XML Schema 1.0. A subset of XML Schema is used, enough to enable the modeling of non-XML data. The motivations for this approach'}},
{'text': 'GraphML',
'id': 1908132,
'metadata': {'definition': 'GraphML is an XML-based file format for graphs. The GraphML file format results from the joint effort of the graph drawing community to define a common format for exchanging graph structure data. It uses an XML-based syntax and supports the entire range of possible graph structure constellations including directed, undirected, mixed graphs, hypergraphs, and application-specific attributes.'}},
{'text': 'Stream processing',
'id': 4570880,
'metadata': {'definition': 'Stream processing is a computer programming paradigm, equivalent to dataflow programming, event stream processing, and reactive programming, that allows some applications to more easily exploit a limited form of parallel processing. Such applications can use multiple computational units, such as the floating point unit on a graphics processing unit or field-programmable gate arrays (FPGAs), without explicitly managing allocation, synchronization, or communication among those units. The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation that can be performed. Given a sequence of data (a "stream"), a series of operations ("kernel functions") is applied to each element in the stream. Kernel functions are usually pipelined, and optimal local on-chip memory reuse is attempted, in order to minimize the loss in bandwidth, accredited to external memory interaction. "Uniform streaming", where one kernel function is applied to all elements in the stream, is typical. Since the kernel and stream abstractions expose data dependencies, compiler tools can fully automate and optimize on-chip management tasks. Stream processing hardware can use scoreboarding, for example, to initiate a direct memory access (DMA) when dependencies become known. The elimination of manual DMA management reduces software complexity, and an associated elimination for hardware cached I/O, reduces the data area expanse that has to be involved with service by specialized computational units such as arithmetic logic units. During the 1980s stream processing was explored within dataflow programming. An example is the language SISAL (Streams and Iteration in a Single Assignment Language).'}},
{'text': 'FriCAS', | |
'id': 5361314, | |
'metadata': {'definition': 'FriCAS is a general purpose computer algebra system with a strong focus on mathematical research and development of new algorithms. It comprises an interpreter, a compiler and a still-growing library of more than 1,000 domains and categories. FriCAS provides a strongly typed high-level programming language called SPAD and a similar interactive language that uses type-inferencing for convenience. Aldor was intentionally developed being the next generation compiler for Axiom and forks. FriCAS (optionally) allows running Aldor programs. Both languages share a similar syntax and a sophisticated (dependent) type system. FriCAS is comprehensively documented and available as source code and as a binary distribution for the most common platforms. Compiling the sources requires besides other prerequisites a Common Lisp environment (whereby many of the major implementations are supported and freely available as open source). FriCAS runs on many POSIX platforms such as Linux, macOS, Unix, BSD as well as under Cygwin and Microsoft Windows (restricted).'}}, | |
{'text': 'Apache Impala', | |
'id': 3270712, | |
'metadata': {'definition': 'Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala has been described as the open-source equivalent of Google F1, which inspired its development in 2012.'}}, | |
{'text': 'Microsoft Azure SQL Database', | |
'id': 5648026, | |
'metadata': {'definition': 'Microsoft Azure SQL Database (formerly SQL Azure, SQL Server Data Services, SQL Services, and Windows Azure SQL Database) is a managed cloud database (SaaS) provided as part of Microsoft Azure. A cloud database is a database that runs on a cloud computing platform, and access to it is provided as a service. Managed database services take care of scalability, backup, and high availability of the database. Azure SQL Database is a managed database service which is different from AWS RDS which is a container service. Microsoft Azure SQL Database includes built-in intelligence that learns app patterns and adapts to maximize performance, reliability, and data protection. It was originally announced in 2009 and released in 2010. Key capabilities include:'}}, | |
{'text': 'Tuple-versioning', | |
'id': 2281550, | |
'metadata': {'definition': "Tuple-versioning (also called point-in-time) is a mechanism used in a relational database management system to store past states of a relation. Normally, only the current state is captured. Using tuple-versioning techniques, typically two values for time are stored along with each tuple: a start time and an end time. These two values indicate the validity of the rest of the values in the tuple. Typically when tuple-versioning techniques are used, the current tuple has a valid start time, but a null value for end time. Therefore, it is easy and efficient to obtain the current values for all tuples by querying for the null end time. A single query that searches for tuples with start time less than, and end time greater than, a given time (where null end time is treated as a value greater than the given time) will give as a result the valid tuples at the given time. For example, if a person's job changes from Engineer to Manager, there would be two tuples in an Employee table, one with the value Engineer for job and the other with the value Manager for job. The end time for the Engineer tuple would be equal to the start time for the Manager tuple. The pattern known as log trigger uses this technique to automatically store historical information of a table in a database."}}, | |
{'text': 'Linter SQL RDBMS', | |
'id': 1032423, | |
'metadata': {'definition': 'Linter SQL RDBMS is the main product of RELEX Group. Linter is a Russian DBMS compliant with the standard and supporting the majority of operating systems, among them Windows, various versions of Unix, QNX, and others. The system enables transparent interaction between the client applications and the database server functioning in different hardware and software environments. DBMS Linter includes program interfaces for the majority of popular development tools. The system provides a high data security level allowing the user to work with secret information. Linter is the only DBMS certified by FSTEC of Russia as compliant with Class 2 data security requirements and Level 2 of undeclared feature absence control. For many years Linter has been used by Russian Ministry of Defense, Ministry of Foreign Affairs and other government bodies.'}}, | |
{'text': 'Apache Avro', | |
'id': 4115736, | |
'metadata': {'definition': "Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services. Avro uses a schema to structure the data that is being encoded. It has two different types of schema languages; one for human editing (Avro IDL) and another which is more machine-readable based on (JSON). It is similar to Thrift and Protocol Buffers, but does not require running a code-generation program when a schema changes (unless desired for statically-typed languages). Apache Spark SQL can access Avro as a data source."}}, | |
{'text': 'Hardware acceleration', | |
'id': 2893315, | |
'metadata': {'definition': 'In computing, hardware acceleration is the use of computer hardware specially made to perform some functions more efficiently than is possible in software running on a general-purpose . Any transformation of data or routine that can be computed, can be calculated purely in software running on a generic CPU, purely in custom-made hardware, or in some mix of both. An operation can be computed faster in application-specific hardware designed or programmed to compute the operation than specified in software and performed on a general-purpose computer processor. Each approach has advantages and disadvantages. The implementation of computing tasks in hardware to decrease latency and increase throughput is known as hardware acceleration. Typical advantages of software include more rapid development (leading to faster times to market), lower non-recurring engineering costs, heightened portability, and ease of updating features or patching bugs, at the cost of overhead to compute general operations. Advantages of hardware include speedup, reduced power consumption, lower latency, increased parallelism and bandwidth, and better utilization of area and functional components available on an integrated circuit; at the cost of lower ability to update designs once etched onto silicon and higher costs of functional verification and times to market. In the hierarchy of digital computing systems ranging from general-purpose processors to fully customized hardware, there is a tradeoff between flexibility and efficiency, with efficiency increasing by orders of magnitude when any given application is implemented higher up that hierarchy. This hierarchy includes general-purpose processors such as CPUs, more specialized processors such as GPUs, fixed-function implemented on'}}, | |
{'text': 'Column (data store)', | |
'id': 2023554, | |
'metadata': {'definition': 'A column of a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple (a key–value pair) consisting of three elements:'}}, | |
{'text': 'Oracle Streams', | |
'id': 4080868, | |
'metadata': {'definition': "In computing, the Oracle Streams product from Oracle Corporation encourages users of Oracle databases to propagate information within and between databases. It provides tools to capture, process ('stage') and manage database events via Advanced Queuing queues. Oracle Streams is the flow of information either within a single database or from one database to another. Oracle Streams can be set up in homogeneous (all Oracle databases) or heterogeneous (non-Oracle and Oracle databases) environments. The Streams setup uses a set of processes and database objects to share data and messages. The database changes (DDL and DML) are captured at the source; those are then staged and propagated to one or more destination databases to be applied there. Message propagation uses Advanced Queuing mechanism within the Oracle databases. Applications for the Oracle Streams tool-set include data distribution, data warehousing and data replication."}}, | |
{'text': 'Scalable Coherent Interface', | |
'id': 540882, | |
'metadata': {'definition': 'The Scalable Coherent Interface or Scalable Coherent Interconnect (SCI), is a high-speed interconnect standard for shared memory multiprocessing and message passing. The goal was to scale well, provide system-wide memory coherence and a simple interface; i.e. a standard to replace existing buses in multiprocessor systems with one with no inherent scalability and performance limitations. The IEEE Std 1596-1992, IEEE Standard for Scalable Coherent Interface (SCI) was approved by the IEEE standards board on March 19, 1992. It saw some use during the 1990s, but never became widely used and has been replaced by other systems from the early 2000s.'}}, | |
{'text': 'Simple and Fast Multimedia Library', | |
'id': 5365880, | |
'metadata': {'definition': 'Simple and Fast Multimedia Library (SFML) is a cross-platform software development library designed to provide a simple application programming interface (API) to various multimedia components in computers. It is written in C++ with bindings available for C, Crystal, D, Euphoria, Go, Java, Julia, .NET, Nim, OCaml, Python, Ruby, and Rust. Experimental mobile ports were made available for Android and iOS with the release of SFML 2.2. SFML handles creating and input to windows, and creating and managing OpenGL contexts. It also provides a graphics module for simple hardware acceleration of 2D computer graphics which includes text rendering using FreeType, an audio module that uses OpenAL and a networking module for basic Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) communication. SFML is free and open-source software provided under the terms of the zlib/png license. It is available on Linux, macOS, Windows and FreeBSD.<ref name="SFML/cmake/Config.cmake"></ref> The first version v1.0 was released on 9 August 2007, the latest version v2.5.1 was released on 15 Oct 2018.'}}, | |
{'text': 'AGESA', | |
'id': 2910093, | |
'metadata': {'definition': "AMD Generic Encapsulated Software Architecture (AGESA), is a procedure library developed by Advanced Micro Devices (AMD), used to perform the Platform Initialization (PI) on mainboards using their AMD64 architecture. As part of the BIOS of such mainboards, AGESA is responsible for the initialization of the processor cores, memory, and the HyperTransport controller. AGESA was open sourced in early 2011, aiming to aid in the development of coreboot, a project attempting to replace PC's BIOS. However, such releases never became the basis for the development of coreboot, and were subsequently halted."}}, | |
{'text': 'JUMP GIS', | |
'id': 5253310, | |
'metadata': {'definition': 'JUMP is a Java based vector and raster GIS and programming framework. Current development continues under the OpenJUMP name.'}}, | |
{'text': 'Databricks', | |
'id': 4370934, | |
'metadata': {'definition': 'Databricks is a company founded by the original creators of Apache Spark. Databricks grew out of the AMPLab project at University of California, Berkeley that was involved in making Apache Spark, an open-source distributed computing framework built atop Scala. Databricks develops a web-based platform for working with Spark, that provides automated cluster management and IPython-style notebooks. In addition to building the Databricks platform, the company is co-organizing massive open online courses about Spark and runs the largest conference about Spark - Spark Summit.'}}, | |
{'text': 'Oracle Flashback', | |
'id': 155363, | |
'metadata': {'definition': "In Oracle databases, Flashback tools allow administrators and users to view and manipulate past states of an instance's data without (destructively) recovering to a fixed point in time. Compare the functionality of Oracle LogMiner, which identifies how and when data changed rather than its state at a given time."}}, | |
{'text': 'RapidQ', | |
'id': 3212197, | |
'metadata': {'definition': 'RapidQ (also known as "Rapid-Q") is a free, cross-platform, semi-object-oriented dialect of the BASIC programming language. It can create console, graphical user interface, and Common Gateway Interface applications. The integrated development environment includes a drag-and-drop form designer, syntax highlighting, and single-button compilation. Versions are available for Microsoft Windows, Linux, Solaris, and HP-UX. Additional functionality not normally seen in BASIC languages are function callbacks and primitive object-orientation. The language is called semi-object-oriented by its author because there are only two levels of the class hierarchy: built-in classes, and user-defined classes derived from those; the latter cannot be extended further. The ability to call external shared libraries is available, thus giving full access to the underlying operating system\'s application program interface. Other capabilities include built-in interfaces to DirectX and MySQL. RapidQ features a bytecode compiler that produces standalone executables by binding the generated bytecode with the interpreter. No external run time libraries are needed; the bytecode interpreter is self-contained. The file sizes of executable files created by RapidQ are about 150 kilobytes or larger for console applications. RapidQ\'s author, William Yu, sold the source code to REAL Software, the makers of REALbasic, in 2000.The freely distributed program has been improved and many additional components have been created by an active user group.'}}, | |
{'text': 'Epidata', | |
'id': 3423786, | |
'metadata': {'definition': 'EpiData is a group of applications used in combination for creating documented data structures and analysis of quantitative data. The EpiData Association, which created the software, was created in 1999 and is based in Denmark. EpiData was developed in Pascal and uses open standards such as HTML where possible. EpiData is widely used by organizations and individuals to create and analyze large amounts of data. The World Health Organization (WHO) uses EpiData in its STEPS method of collecting epidemiological, medical, and public health data, for biostatistics, and for other quantitative-based projects. Epicentre, the research wing of Medecins Sans Frontieres, uses EpiData to manage data from its international research studies and field epidemiology studies. E.g.: Piola P, Fogg C et al.: Supervised versus unsupervised intake of six-dose artemether-lumefantrine for treatment of acute, uncomplicated Plasmodium falciparum malaria in Mbarara, Uganda: a randomised trial. Lancet. 2005 Apr 23-29;365(9469):1467-73 ". Other examples: ", " or ". EpiData has two parts: The software is free; development is funded by governmental and non-governmental organizations like WHO.'}}, | |
{'text': 'DirectShow', | |
'id': 5327319, | |
'metadata': {'definition': 'DirectShow (sometimes abbreviated as DS or DShow), codename Quartz, is a multimedia framework and API produced by Microsoft for software developers to perform various operations with media files or streams. It is the replacement for Microsoft\'s earlier Video for Windows technology. Based on the Microsoft Windows Component Object Model (COM) framework, DirectShow provides a common interface for media across various programming languages, and is an extensible, filter-based framework that can render or record media files on demand at the request of the user or developer. The DirectShow development tools and documentation were originally distributed as part of the DirectX SDK. Currently, they are distributed as part of the Windows SDK (formerly known as the Platform SDK). Microsoft plans to completely replace DirectShow gradually with Media Foundation in future Windows versions. One reason cited by Microsoft is to provide "much more robust support for content protection systems" (see digital rights management). Microsoft\'s MSFT Becky Weiss also confirms that "you\'ll notice that working with the Media Foundation requires you to work at a slightly lower level than working with DirectShow would have. And there are still DirectShow features that aren\'t (yet) in Media Foundation". As described in the Media Foundation article, Windows Vista and Windows 7 applications use Media Foundation instead of DirectShow for several media related tasks.'}}]]], | |
'triplet': []}} |
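The `candidates` structure above is deeply nested: `candidates['span']` is a list of batches, each a list of windows, each a list of candidate dicts with `text`, `id`, and `metadata` keys. A minimal sketch of flattening it into a plain list of entity names (using a hand-built sample dict that mirrors the real output, so no Relik model is needed — the helper name `candidate_names` is my own, not part of the Relik API):

```python
# Sample mirroring the nesting of RelikOutput.to_dict()['candidates']['span']:
# batches -> windows -> candidate dicts. Definitions abbreviated here.
sample = {
    "candidates": {
        "span": [[[
            {"text": "Field-programmable gate array", "id": 4074039,
             "metadata": {"definition": "An FPGA is an integrated circuit..."}},
            {"text": "GraphML", "id": 1908132,
             "metadata": {"definition": "GraphML is an XML-based file format..."}},
        ]]]
    }
}

def candidate_names(output_dict):
    """Flatten the nested span-candidate lists into a flat list of entity names."""
    names = []
    for batch in output_dict["candidates"]["span"]:
        for window in batch:
            for candidate in window:
                names.append(candidate["text"])
    return names

print(candidate_names(sample))
# → ['Field-programmable gate array', 'GraphML']
```

Running this over the real output makes it easy to scan the retriever's candidate pool, which is how you can spot related-but-unmentioned entries like Apache Phoenix sitting alongside the entities actually in the text.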