Last active: August 16, 2024 12:00
from urllib.parse import urlparse, parse_qs

from youtube_transcript_api import YouTubeTranscriptApi
from duckduckgo_search import DDGS
from openai import AzureOpenAI

# Replace with your actual values
AZURE_OPENAI_ENDPOINT = "https://DEMO.openai.azure.com"
AZURE_OPENAI_API_KEY = ""
deployment_name = "gpt-4-32k"  # The name of your model deployment
client = AzureOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_version="2023-07-01-preview",
    api_key=AZURE_OPENAI_API_KEY,
)

# Replace with your actual values - if desired
VIDEO_IDS = []
CONCEPT = ''


def extract_youtube_id_from_href(href_url):
    """Returns the video ID from a YouTube watch URL, or None for other pages."""
    parsed = urlparse(href_url)
    # Watch URLs carry the ID in the 'v' query parameter, e.g.
    # https://www.youtube.com/watch?v=VIDEO_ID&t=42s. Parsing the query string
    # (instead of splitting on '=') ignores extra parameters such as 't'.
    if parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    # Channels, playlists, and other search hits are not videos
    return None


def get_transcript(video_id):
    """Fetches the transcript for a given YouTube video ID.

    Args:
        video_id: The ID of the YouTube video.

    Returns:
        The transcript as a single string, or None if no transcript is found.
    """
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        # Each segment is a dict with 'text', 'start', and 'duration' keys
        return ' '.join(item['text'] for item in transcript)
    except Exception as e:
        print(f"Error fetching transcript for {video_id}: {e}")
        return None


## START THE SHOW
if CONCEPT == '':
    CONCEPT = input("What would you like to learn about? ")

if len(VIDEO_IDS) == 0:
    query = CONCEPT + " site:youtube.com"
    results = DDGS().text(query, region="us-en", max_results=5)
    for result in results:
        video_id = extract_youtube_id_from_href(result["href"])
        if video_id:  # skip search hits that are not videos
            VIDEO_IDS.append(video_id)

# Wrap each transcript in [video_id:...] markers so the model can tell them apart
all_videos_str = ""
for video_id in VIDEO_IDS:
    vidtxt = get_transcript(video_id)
    if vidtxt:
        all_videos_str += f"[video_id:{video_id}]\n{vidtxt}\n[end video_id:{video_id}]\n"

# Without transcripts the model would only produce notes from its own memory
if not all_videos_str:
    raise SystemExit("No transcripts found; try a different topic or set VIDEO_IDS.")

messages = [
    {"role": "system", "content": "You are a helpful assistant that summarizes multiple video transcripts into a comprehensive set of detailed notes."},
    {"role": "user", "content": "I'm trying to learn about " + CONCEPT},
    {"role": "user", "content": "This is all the video transcript text I found online: " + all_videos_str},
    {"role": "user", "content": "Give me a comprehensive set of notes. Minimum of 15000 characters. Think critically and step by step."},
]
print(messages)  # Debug: shows the full prompt, including all transcripts

ai_msg = client.chat.completions.create(
    model=deployment_name,
    messages=messages,
)
print("--------------------------")
print(ai_msg.choices[0].message.content)
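The script concatenates every transcript into a single user message, so five long videos can exceed the context window of the `gpt-4-32k` deployment (32,768 tokens, shared by the prompt and the reply). One way to guard against that is to cap the transcript text before building `messages`. The following is a minimal sketch, assuming the rough rule of thumb of about four characters per English token; `build_transcript_block` and `TRANSCRIPT_CHAR_BUDGET` are hypothetical names that are not part of the original script, and a tokenizer such as `tiktoken` would give exact counts instead of an estimate.

```python
def build_transcript_block(transcripts, max_chars):
    """Joins (video_id, text) pairs in the script's [video_id:...] format,
    stopping before the result would exceed max_chars characters.

    Earlier transcripts are kept whole where possible; the first one that
    does not fit is truncated, and any after it are dropped.
    """
    parts, used = [], 0
    for video_id, text in transcripts:
        header = f"[video_id:{video_id}]\n"
        footer = f"\n[end video_id:{video_id}]\n"
        room = max_chars - used - len(header) - len(footer)
        if room <= 0:
            break  # not even the markers fit any more
        body = text[:room]  # truncate this transcript if it is too long
        parts.append(header + body + footer)
        used += len(header) + len(body) + len(footer)
    return "".join(parts)


# Assumption: ~20,000 tokens for transcripts at ~4 characters per token,
# leaving room in the 32k window for the instructions and a long reply.
TRANSCRIPT_CHAR_BUDGET = 20_000 * 4
```

In the script, you could collect `(video_id, vidtxt)` pairs in the transcript loop and then set `all_videos_str = build_transcript_block(pairs, TRANSCRIPT_CHAR_BUDGET)`. Truncating from the end keeps the earlier videos whole, which favors the top search results.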
ranfysvalle02 (Author) commented on Aug 10, 2024:
- Ask "What would you like to learn about?" and wait for user input
- Use the input to search the web for YouTube videos on the topic
- Extract each video's transcript
- Summarize all the video transcripts with the LLM
- Build a comprehensive set of notes on the topic
Example output for the topic "GraphRAG":
1. Introduction to GraphRAG: GraphRAG is an approach to Retrieval-Augmented Generation (RAG) developed by Microsoft Research. Baseline RAG retrieves text chunks from a vector database by similarity search; GraphRAG instead builds a knowledge graph from the data and retrieves through it. This gives a richer semantic understanding of the data and can retrieve relevant information more accurately, especially for questions that span many documents.
2. Two-Step Process: GraphRAG operates in two stages. First, it indexes private data to build a knowledge graph derived by a large language model (LLM); this graph serves as the LLM's memory representation of the data. Second, at query time, it uses the indexes from the first stage to perform improved RAG operations.
3. Differentiators: GraphRAG's key differentiators are better search relevance and support for new scenarios that require analyzing a larger context, such as questions about a dataset as a whole. It provides a holistic view of the semantics across the entire dataset.
4. Operation: Indexing starts by splitting the source text into chunks. An LLM reasons over each chunk to identify not only named entities but also the relationships between them. These relationships become the edges of a weighted graph, which captures more about the data than a list of entities alone.
5. Knowledge Graph Construction: The resulting knowledge graph consists of nodes (entities) connected by edges (relationships). Once the graph is built, graph machine learning (in Microsoft's implementation, hierarchical Leiden community detection) groups related nodes into communities and produces semantic aggregations over them. This allows granular filtering and lets you ask questions at any level of granularity across the dataset.
6. Use Cases: The knowledge graph supports dataset question generation, summarization, Q&A, and other tasks. Because extracted facts can be traced back to the source text they came from, it also provides the evidence and grounding that analysts need.
7. Global vs Local Search: GraphRAG offers two search methods. Global search reasons over the entire dataset using summaries of the graph's communities, which makes it best suited to broad questions about the corpus as a whole. Local search starts from specific entities and their neighbors in the graph, giving highly specific answers about those entities.
8. Implementation: GraphRAG can run on OpenAI models or, locally, on open models such as Llama served through tools like Ollama or LM Studio. For a local setup, the process involves pulling the required model with the local tool, running it over the data to build the knowledge graph, and finally configuring the retrieval process.
9. Use of GraphRAG: GraphRAG can analyze large text collections such as scientific literature, articles, and books. The pipeline covers entity and relationship extraction, knowledge graph construction, community summary generation, and finally retrieval to answer queries.
10. Future of GraphRAG: GraphRAG gives developers a way to build applications that turn unstructured data into meaningful insights. Its use is expected to expand beyond chatbots into areas such as dynamic content generation.
11. Conclusion: GraphRAG improves retrieval by organizing data into a knowledge graph rather than relying only on vector similarity search. By capturing the relationships between entities as well as the entities themselves, it gives a more complete picture of the data and can improve the quality of search results. Its potential applications range from chatbots to dynamic content generation.
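Items 4 through 7 describe a pipeline whose data flow can be sketched end to end on toy data. The sketch assumes the LLM extraction step has already produced `(source, relation, target)` triples per chunk; every function name and all sample data are hypothetical. It also swaps in simpler techniques where Microsoft's GraphRAG uses heavier ones: edges are weighted by mention count, communities are connected components rather than hierarchical Leiden clusters, local search starts from a named entity (GraphRAG matches the question to entities using embeddings), and a keyword filter stands in for the LLM calls that answer from community summaries in global search. Treat it as an illustration, not as GraphRAG's implementation.

```python
from collections import defaultdict, deque


def build_graph(extractions):
    """Builds a weighted, undirected graph from LLM-extracted triples.

    extractions maps a chunk ID to (source, relation, target) triples.
    Edge weight counts mentions; each edge also records its source chunks,
    which is the provenance an answer can cite as evidence.
    """
    edges = defaultdict(lambda: {"weight": 0, "chunks": set()})
    for chunk_id, triples in extractions.items():
        for source, _relation, target in triples:
            edge = edges[tuple(sorted((source, target)))]
            edge["weight"] += 1
            edge["chunks"].add(chunk_id)
    return dict(edges)


def neighbors(graph, min_weight=1):
    """Adjacency sets, keeping only edges with weight >= min_weight."""
    adjacency = {}
    for (a, b), edge in graph.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if edge["weight"] >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def communities(adjacency):
    """Stand-in for community detection: connected components via DFS.
    Microsoft's GraphRAG uses hierarchical Leiden clustering instead."""
    seen, groups = set(), []
    for start in sorted(adjacency):  # sorted so the output is deterministic
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                group.add(node)
                stack.extend(adjacency[node] - seen)
        groups.append(sorted(group))
    return groups


def local_search(adjacency, entity, hops=1):
    """Local search: entities within `hops` edges of the query entity."""
    found, frontier = {entity}, deque([(entity, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth < hops:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in found:
                    found.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return found


def global_search(summaries, question, map_fn, reduce_fn):
    """Global search: map the question over every community summary,
    then reduce the non-empty partial answers into one."""
    partials = [map_fn(summary, question) for summary in summaries]
    return reduce_fn([p for p in partials if p])


def keyword_map(summary, question):
    """Placeholder for the LLM call that answers from one summary."""
    return summary if question.lower() in summary.lower() else None


# Toy extraction output (hypothetical data, not real GraphRAG output)
extractions = {
    "chunk-1": [("GraphRAG", "developed by", "Microsoft Research"),
                ("GraphRAG", "builds", "Knowledge graph")],
    "chunk-2": [("GraphRAG", "developed by", "Microsoft Research"),
                ("GraphRAG", "builds", "Knowledge graph"),
                ("Ollama", "runs", "Llama")],
    "chunk-3": [("Ollama", "runs", "Llama"),
                ("GraphRAG", "can use", "Ollama")],
}
graph = build_graph(extractions)
groups = communities(neighbors(graph, min_weight=2))
print(groups)
# [['GraphRAG', 'Knowledge graph', 'Microsoft Research'], ['Llama', 'Ollama']]
print(sorted(local_search(neighbors(graph), "Llama", hops=2)))
# ['GraphRAG', 'Llama', 'Ollama']
summaries = ["Community: " + ", ".join(group) for group in groups]
print(global_search(summaries, "llama", keyword_map, " ".join))
# Community: Llama, Ollama
```

Raising `min_weight` splits the graph into tighter groups: with `min_weight=2` the single weak `GraphRAG`–`Ollama` edge is dropped and two communities appear, while with the default of 1 everything forms one community. The `chunks` set on each edge is the evidence item 6 refers to: an answer built from an edge can cite the chunks it came from.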
Note: To understand and use GraphRAG fully, a solid grounding in graph analytics, entity extraction, and relationship extraction is recommended. Familiarity with running models such as Llama locally through tools like Ollama or LM Studio also helps with implementation.