RAG testing with different indexes via llama-index


Different Indexes result in different answers

(Generated by ChatGPT, using GPT-4)

Retrieval-Augmented Generation (RAG) systems, which combine a neural network-based generative model with a retrieval system, can use various types of indexes to retrieve relevant documents or passages. The type of index used can significantly impact the performance and output of the RAG system. Let's explore why using different indexes like Keyword Table Index, Vector Store Index, Summary Index, Tree Index, and Knowledge Graph Index for the same document results in different answers:

  1. Keyword Table Index:

    • Nature: It's based on keyword matching.
    • Impact on RAG: This index is effective for queries that include specific keywords present in the documents. However, it might not perform well with queries that require understanding the context or semantics beyond mere keyword matching.
    • Result Difference: Answers might be more literal and directly related to the presence of specific keywords in the text.
  2. Vector Store Index:

    • Nature: Utilizes embeddings (dense vector representations) of documents and queries.
    • Impact on RAG: This approach is good for understanding semantic similarities and can retrieve documents that are conceptually related to the query, even if they don't share exact keywords.
    • Result Difference: Answers tend to be more contextually relevant and can capture the semantic essence of the query better.
  3. Summary Index:

    • Nature: Based on summaries or condensed versions of documents.
    • Impact on RAG: Useful for quickly grasping the gist of documents. However, it might miss out on detailed or specific information present in the full text.
    • Result Difference: Answers are likely to be more generalized and focused on the main ideas rather than specifics.
  4. Tree Index:

    • Nature: Organizes data in a tree-like structure, often hierarchical.
    • Impact on RAG: Can be effective for structured data and when the hierarchy or relationships within the data are important for understanding the context.
    • Result Difference: Answers may include more structured and hierarchically relevant information.
  5. Knowledge Graph Index:

    • Nature: Based on a graph structure that represents entities and their interrelationships.
    • Impact on RAG: Excellent for queries that benefit from understanding relationships between entities. It provides a rich, interconnected context.
    • Result Difference: Answers are likely to be more insightful in terms of relationships and connections between different concepts or entities.

Each type of index has its strengths and weaknesses, and the choice depends on the specific requirements of the RAG application. For example, a Vector Store Index might be preferred for nuanced, semantic searches, while a Keyword Table might be better for straightforward, keyword-specific queries. The variety in indexing approaches allows RAG systems to be versatile and adaptable to different types of information retrieval needs.
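
To make the divergence concrete, here is a minimal sketch that builds two of these index types over the same documents and asks them the same question. It assumes the same legacy llama_index (0.9-style) API used in the code snippet at the end of this gist, a data/ directory containing the source document, and an OPENAI_API_KEY in the environment:

# Minimal sketch: same documents, two index types, same question.
# Assumes the legacy llama_index (0.9-style) API used in the snippet below,
# a data/ directory with the source document, and OPENAI_API_KEY set.
from llama_index import (
    SimpleDirectoryReader,
    ServiceContext,
    VectorStoreIndex,
    KeywordTableIndex,
)
from llama_index.llms import OpenAI

documents = SimpleDirectoryReader("data").load_data()
service_context = ServiceContext.from_defaults(
    llm=OpenAI(temperature=0, model="gpt-4")
)

question = "What grad schools did the author apply to?"

# Build both indexes over the identical document set and compare answers.
for index_cls in (VectorStoreIndex, KeywordTableIndex):
    index = index_cls.from_documents(documents, service_context=service_context)
    response = index.as_query_engine().query(question)
    print("%s: %s" % (index_cls.__name__, response))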

Additional info via Bing Chat query: https://sl.bing.net/bjvw8O7Z7ee

Results summary

All runs query with the GPT-4 model; the labels below indicate which index was used and whether it was built with default settings or with a GPT-4 service context. Ratings: [done] = correct, [semi-done] = partially correct, [open] = incorrect or no answer.


What schools did the author apply to?

  • VectorStoreIndex (default) [semi-done]: The author applied to two schools: RISD (Rhode Island School of Design) in the US and the Accademia di Belli Arti in Florence.
  • VectorStoreIndex (GPT-4) [semi-done]: The author applied to two schools: RISD in the US, and the Accademia di Belli Arti in Florence.
  • KeywordTableIndex (GPT-4) [done]: The author applied to three graduate schools: MIT, Yale, and Harvard. Later, he applied to two art schools: RISD in the US, and the Accademia di Belli Arti in Florence.
  • SummaryIndex (GPT-4) [done]: The author submitted applications to three graduate institutions: MIT, Yale, and Harvard. Additionally, he applied to two art institutions: RISD in the United States, and the Accademia di Belli Arti in Florence.
  • TreeIndex (GPT-4) [semi-done]: The author applied to three graduate schools: MIT, Yale, and Harvard.

What did the author do growing up?

  • VectorStoreIndex (default) [done]: The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer. They also got a microcomputer and started programming on it, writing simple games and a word processor.
  • VectorStoreIndex (GPT-4) [done]: The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer. They also got a microcomputer and started programming on it, writing simple games and a word processor.
  • KeywordTableIndex (GPT-4) [open]: No Answer
  • SummaryIndex (GPT-4) [done]: The author had a diverse range of interests and experiences during their formative years. They explored various fields including art, programming, and writing. Their background in art played a significant role when they were involved in developing the online store builder, as they emphasized the importance of creating an aesthetically pleasing and polished platform for users. Furthermore, they cultivated their love for writing by crafting short stories and also experimented with programming on different computer systems.
  • TreeIndex (GPT-4) [semi-done]: The author wrote short stories and also worked on programming, specifically on an IBM 1401 computer in their school district.

What grad schools did the author apply to?

  • VectorStoreIndex (default) [open]: The author applied to two art schools: RISD in the US and the Accademia di Belli Arti in Florence.
  • VectorStoreIndex (GPT-4) [open]: The author applied to two grad schools: RISD in the US and the Accademia di Belli Arti in Florence.
  • KeywordTableIndex (GPT-4) [done]: The author applied to MIT, Yale, and Harvard for grad school.
  • SummaryIndex (GPT-4) [open]: The author did not mention applying to any graduate schools.
  • TreeIndex (GPT-4) [done]: The author applied to MIT, Yale, and Harvard for grad school.

What are some of the programming languages the author learned?

  • VectorStoreIndex (default) [semi-done]: The author learned Lisp and an early version of Fortran.
  • VectorStoreIndex (GPT-4) [done]: The author learned Lisp, which was regarded as the language of AI at the time. They also mentioned that the commonly used programming languages then were pretty primitive, and the default language at Cornell was a Pascal-like language called PL/I.
  • KeywordTableIndex (GPT-4) [semi-done]: The author learned Lisp and an early version of Fortran.
  • SummaryIndex (GPT-4) [semi-done]: The author learned Arc, an early version of Fortran, and Lisp.
  • TreeIndex (GPT-4) [semi-done]: The author learned an early version of Fortran and later worked with microcomputers, specifically the TRS-80.

Different Types of Indexes

Within LlamaIndex, you can use the following indexes (a small selection sketch follows this list):

  • Keyword Table Index: KeywordTableIndex is a keyword-based table data structure (inspired by "hash tables"). During index construction, KeywordTableIndex first takes in a dataset of text documents as input and chunks them up into smaller document chunks. For each text chunk, KeywordTableIndex uses GPT to extract a set of relevant keywords with a keyword extraction prompt (keywords can include short phrases, like "new york city"). These keywords are then stored in a table, with each keyword referencing its source text chunk.
  • Summary Index: SummaryIndex is a simple list-based data structure. During index construction, SummaryIndex takes in a dataset of text documents as input, chunks them up into smaller document chunks, and concatenates them into a list. GPT is not called at all during index construction.
  • Vector Store Index: VectorStoreIndex is an embedding-based data structure. During index construction, VectorStoreIndex takes in a dataset of text documents as input, chunks them up into smaller document chunks, and computes an embedding (a dense vector) for each chunk with an embedding model. At query time, the query is embedded as well, and the most semantically similar chunks are retrieved by vector similarity.
  • Tree Index: TreeIndex first takes in a set of text documents as input. It then builds up a tree index in a bottom-up fashion; each parent node summarizes its child nodes using a general summarization prompt, so each intermediate node contains text summarizing the components below it. Once the index is built, it can be saved to disk as JSON and loaded for future use.
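
The sketch below mirrors the commented-out branches in the code snippet at the end of this gist: it lets you pick any of the four index types above by name and build it over the same document set. Same assumptions as earlier (legacy llama_index 0.9-style API); the INDEX_TYPES table and build_index helper are illustrative names, not part of LlamaIndex:

# Sketch: selecting one of the four index types described above by name.
# INDEX_TYPES and build_index are hypothetical helpers, not LlamaIndex APIs.
from llama_index import (
    KeywordTableIndex,
    SummaryIndex,
    TreeIndex,
    VectorStoreIndex,
)

INDEX_TYPES = {
    "keyword_table": KeywordTableIndex,  # GPT extracts keywords per chunk
    "summary": SummaryIndex,             # simple list; no GPT calls at build time
    "tree": TreeIndex,                   # bottom-up summaries of child nodes
    "vector": VectorStoreIndex,          # embeds each chunk for similarity search
}

def build_index(kind, documents, service_context):
    """Build the requested index type over the same document set."""
    return INDEX_TYPES[kind].from_documents(
        documents, service_context=service_context
    )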

Using Default VectorStoreIndex

Default llama-index VectorStoreIndex

Model: gpt-4, Temperature: 0
Question: What schools did the author apply to?
Response: The author applied to two schools: RISD (Rhode Island School of Design) in the US and the Accademia di Belli Arti in Florence.
Index Time: 0.1009681224822998 
Query Time: 0.027238845825195312  
Response Time: 1.420546054840088

This is correct if we're only talking about undergraduate schools

Model: gpt-4, Temperature: 0
Question: What did the author do growing up?
Response: The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer. They also got a microcomputer and started programming on it, writing simple games and a word processor.
Index Time: 0.09856581687927246 
Query Time: 0.027364253997802734 
Response Time: 2.090113878250122

This is correct

Model: gpt-4, Temperature: 0
Question: What grad schools did the author apply to?
Response: The author applied to two art schools: RISD in the US and the Accademia di Belli Arti in Florence.
Index Time: 0.10110998153686523 
Query Time: 0.0450899600982666 
Response Time: 1.8664560317993164 

This is not correct

Model: gpt-4, Temperature: 0
Question: What are some of the programming languages the author learned?
Response: The author learned Lisp and an early version of Fortran.
Index Time: 1.113976001739502 
Query Time: 0.048342227935791016 
Response Time: 1.6827337741851807

This is more or less correct

Using KeywordTableIndex (GPT-4)

Index built by GPT-4

Model: gpt-4, Temperature: 0
Question: What schools did the author apply to?
Response: The author applied to three graduate schools: MIT, Yale, and Harvard. Later, he applied to two art schools: RISD in the US, and the Accademia di Belli Arti in Florence.
Index Time: 80.58215999603271 
Query Time: 0.055422067642211914 
Response Time: 5.6499269008636475 

This is a more complete answer. Note the time required to build the index (over 80 seconds): KeywordTableIndex calls GPT to extract keywords from every chunk, so construction is slow.

Model: gpt-4, Temperature: 0
Question: What did the author do growing up?
Response: Empty Response
Index Time: 0.025336265563964844 
Query Time: 0.027658939361572266 
Response Time: 0.5969069004058838

Could not answer it

Model: gpt-4, Temperature: 0
Question: What grad schools did the author apply to?
Response: The author applied to MIT, Yale, and Harvard for grad school.
Index Time: 0.02649712562561035 
Query Time: 0.028027057647705078 
Response Time: 1.975188970565796 

Correct answer

Model: gpt-4, Temperature: None
Question: What are some of the programming languages the author learned?
Response: The author learned Lisp, Fortran, and an early version of Fortran.
Index Time: 0.08376598358154297 
Query Time: 0.04093599319458008 
Response Time: 3.3881871700286865 

More complete answer

Model: gpt-4, Temperature: 0
Question: What are some of the programming languages the author learned?
Response: The author learned Lisp and an early version of Fortran.
Index Time: 0.025270938873291016 
Query Time: 0.02950906753540039 
Response Time: 3.2701120376586914

Not as complete an answer

Using VectorStoreIndex (GPT-4)

Model: gpt-4, Temperature: 0
Question: What schools did the author apply to?
Response: The author applied to two schools: RISD in the US, and the Accademia di Belli Arti in Florence.
Index Time: 1.0310370922088623 
Query Time: 0.04251694679260254 
Response Time: 3.0567898750305176 

More or less correct

Model: gpt-4, Temperature: None
Question: What did the author do growing up?
Response: The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer.
Index Time: 0.16034674644470215 
Query Time: 0.03920698165893555 
Response Time: 1.7148032188415527 

Not as verbose as VectorStoreIndex (default) but is correct

Model: gpt-4, Temperature: 0
Question: What did the author do growing up?
Response: The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer. They also got a microcomputer and started programming on it, writing simple games and a word processor.
Index Time: 0.1014101505279541 
Query Time: 0.05443906784057617 
Response Time: 1.5671579837799072

This is correct

Model: gpt-4, Temperature: 0
Question: What grad schools did the author apply to?
Response: The author applied to two grad schools: RISD in the US and the Accademia di Belli Arti in Florence.
Index Time: 0.10022878646850586 
Query Time: 0.043331146240234375 
Response Time: 7.19597601890564 

Incorrect

Model: gpt-4, Temperature: 0
Question: What are some of the programming languages the author learned?
Response: The author learned Lisp, which was regarded as the language of AI at the time. They also mentioned that the commonly used programming languages then were pretty primitive, and the default language at Cornell was a Pascal-like language called PL/I.
Index Time: 0.09830904006958008 
Query Time: 0.027807950973510742 
Response Time: 1.4504060745239258

Correct, more verbose

Using SummaryIndex (GPT-4)

Model: gpt-4, Temperature: 0
Question: What schools did the author apply to?
Response: The author submitted applications to three graduate institutions: MIT, Yale, and Harvard. Additionally, he applied to two art institutions: RISD in the United States, and the Accademia di Belli Arti in Florence.
Index Time: 0.06429004669189453 
Query Time: 0.040140628814697266 
Response Time: 78.27172613143921 

Correct

Model: gpt-4, Temperature: None
Question: What did the author do growing up?
Response: The author had a diverse range of interests and pursuits during their formative years. They explored their passion for both writing and programming, engaging in activities such as crafting short stories and experimenting with programming on a computer. They even acquired a microcomputer and delved into creating simple games and a word processor. Although they initially intended to study philosophy in college, their path eventually led them to switch to AI. Notably, for their undergraduate thesis, they undertook the challenging task of reverse-engineering a program. Additionally, they pursued computer science in graduate school and even took art classes at Harvard. Ultimately, their journey led them to apply to and attend art schools.
Index Time: 0.01833033561706543 
Query Time: 0.02890491485595703 
Response Time: 28.693405866622925 

Correct

Model: gpt-4, Temperature: 0
Question: What did the author do growing up?
Response: The author had a diverse range of interests and experiences during their formative years. They explored various fields including art, programming, and writing. Their background in art played a significant role when they were involved in developing the online store builder, as they emphasized the importance of creating an aesthetically pleasing and polished platform for users. Furthermore, they cultivated their love for writing by crafting short stories and also experimented with programming on different computer systems.
Index Time: 0.018696069717407227 
Query Time: 0.03977489471435547 
Response Time: 16.75838303565979 

Correct

Model: gpt-4, Temperature: 0
Question: What grad schools did the author apply to?
Response: The author did not mention applying to any graduate schools.
Index Time: 0.018157958984375 
Query Time: 0.02908611297607422 
Response Time: 132.8014588356018 

Incorrect (note that in the first question, it correctly noted the graduate schools)

Model: gpt-4, Temperature: None
Question: What are some of the programming languages the author learned?
Response: The author learned various programming languages including Fortran and Lisp.
Index Time: 0.01894211769104004 
Query Time: 0.03011798858642578 
Response Time: 10.65634799003601 

Partially correct

Model: gpt-4, Temperature: 0
Question: What are some of the programming languages the author learned?
Response: The author learned Arc, an early version of Fortran, and Lisp.
Index Time: 0.018624544143676758 
Query Time: 0.039716243743896484 
Response Time: 9.900952100753784 

Partially correct

Using TreeIndex (GPT-4)

Model: gpt-4, Temperature: None
Question: What schools did the author apply to?
Response: Empty Response
Index Time: 38.16082310676575 
Query Time: 0.06797003746032715 
Response Time: 1.9578197002410889 

No response

Model: gpt-4, Temperature: 0
Question: What schools did the author apply to?
Response: The author applied to three graduate schools: MIT, Yale, and Harvard.
Index Time: 26.344624996185303 
Query Time: 0.08454298973083496 
Response Time: 15.011389970779419 

Partially correct

Model: gpt-4, Temperature: None
Question: What did the author do growing up?
Response: The author, growing up, worked on writing and programming. They wrote short stories and also tried writing programs on an IBM 1401 computer. They used an early version of Fortran and had to type programs on punch cards. They later got a microcomputer and started programming on it, writing simple games and a word processor.
Index Time: 0.018891096115112305 
Query Time: 0.038529157638549805 
Response Time: 6.51462459564209 

Correct

Model: gpt-4, Temperature: 0
Question: What did the author do growing up?
Response: The author wrote short stories and also worked on programming, specifically on an IBM 1401 computer in their school district.
Index Time: 0.01995706558227539 
Query Time: 0.054466962814331055 
Response Time: 4.16385293006897 

Partially correct

Model: gpt-4, Temperature: None
Question: What grad schools did the author apply to?
Response: Empty Response
Index Time: 0.019597768783569336 
Query Time: 0.027894020080566406 
Response Time: 0.9427392482757568 

No response

Model: gpt-4, Temperature: 0
Question: What grad schools did the author apply to?
Response: The author applied to MIT, Yale, and Harvard for grad school.
Index Time: 0.018321752548217773 
Query Time: 0.028130054473876953 
Response Time: 3.647555112838745 

Correct

Model: gpt-4, Temperature: 0
Question: What are some of the programming languages the author learned?
Response: The author learned an early version of Fortran and later worked with microcomputers, specifically the TRS-80.
Index Time: 0.018574237823486328 
Query Time: 0.02780890464782715 
Response Time: 3.8308730125427246 

Partially correct

Code Snippet

import logging
import sys
import time
import os.path, shutil

#
# May need to start this in a new virtual environment
# https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html
# https://docs.llamaindex.ai/en/stable/getting_started/customization.html
#
# python -m venv venv
# source venv/bin/activate
# pip install llama-index

#
# Configurations
#

# Questions
#myQuestion = "What schools did the author apply to?"
#myQuestion = "What did the author do growing up?"
#myQuestion = "What grad schools did the author apply to?"
myQuestion = "What are some of the programming languages the author learned?"


# Specify model temperature: None, 0, 0.1, ..., 2.0
myTemp = 0

# Model: gpt-4, gpt-3.5-turbo, etc. 
#   Options: https://gpt-index.readthedocs.io/en/stable/module_guides/models/llms.html
myModel = "gpt-4"

# Rebuild index if the index exists (default is False)
rebuildIndex = False

# When building the index, use the default VectorStoreIndex
#    (vs. using myModel, e.g. GPT-4, to build the index); default is False
defaultVectorStoreIndex = False



# View queries and logging 
#    Reminder: Level options: logging.DEBUG, logging.ERROR
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
    ServiceContext,
    KeywordTableIndex,
    SummaryIndex,
    TreeIndex,
)

# Configure to use different LLMs
from llama_index.llms import OpenAI

# define LLM
#llm = OpenAI(temperature=0.1, model="gpt-4")
if myTemp is None:
    llm = OpenAI(model=myModel)
else:
    llm = OpenAI(temperature=myTemp, model=myModel)
service_context = ServiceContext.from_defaults(llm=llm)

# Rebuild Index (if specified)
if rebuildIndex:
    if os.path.exists("./storage"):
        shutil.rmtree("./storage") 


index_time = time.time()
# check if storage already exists
if not os.path.exists("./storage"):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data").load_data()

    # Which type of index to build
    if defaultVectorStoreIndex:
        index = VectorStoreIndex.from_documents(documents)
    else:
        # Uncomment exactly one of the following index types:
        # index = KeywordTableIndex.from_documents(
        #     documents, service_context=service_context
        # )
        # index = VectorStoreIndex.from_documents(
        #     documents, service_context=service_context
        # )
        # index = SummaryIndex.from_documents(
        #     documents, service_context=service_context
        # )
        index = TreeIndex.from_documents(
            documents, service_context=service_context
        )


    # store it for later
    index.storage_context.persist()
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context)


# Either way, we can now query the index.
# Timing notes: "Index Time" below measures building (or loading) the index,
# "Query Time" measures constructing the query engine, and "Response Time"
# measures answering the question.
query_time = time.time()
query_engine = index.as_query_engine()
response_time = time.time()
response = query_engine.query(myQuestion)
complete_time = time.time()

# To screen
print ("Model: %s, Temperature: %s" % (myModel, myTemp) )
print("Question: %s" % myQuestion)
print("Response: %s" % response)
print("Index Time: %s " % (query_time - index_time))
print("Query Time: %s " % (response_time - query_time))
print("Response Time: %s " % (complete_time - response_time))
