Oscar (m0o0scar)
💻 Oscar + coffee + AI => code
  • Sea
  • Singapore
m0o0scar / 📖 VikParuchuri_marker.md
Last active July 31, 2024 10:01
VikParuchuri/marker. Continue this conversation at http://localhost:3000?gist=eebeb4250a5f0774f7717d9d234bd7d4
m0o0scar / 📖 From RAG to RICHES! Retrieval Interlaced with Sequence Generation.md
Last active July 31, 2024 00:13
From RAG to RICHES: Retrieval Interlaced with Sequence Generation. Continue this conversation at https://readfm.vercel.app?gist=96b25d25a0b9c3530fdb833b5df86b1e

[arxiv] From RAG to RICHES: Retrieval Interlaced with Sequence Generation

Source

Palak Jain, Livio Baldini Soares, Tom Kwiatkowski

We present RICHES, a novel approach that interleaves retrieval with sequence generation tasks. RICHES offers an alternative to conventional RAG systems by eliminating the need for separate retriever and generator. It retrieves documents by directly decoding their contents, constrained on the corpus. Unifying retrieval with generation allows us to adapt to diverse new tasks via prompting alone. RICHES can work with any Instruction-tuned model, without additional training. It provides attributed evidence, supports multi-hop retrievals and interleaves thoughts to plan on what to retrieve next, all within a single decoding pass of the LLM. We demonstrate the strong performance of RICHES across ODQA tasks including attributed and multi-hop QA.

URL: https://arxiv.org/abs/2407.00361
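
The core move here is decoding that is constrained to token sequences which actually occur in the corpus, so retrieval becomes generation along an index. Below is a minimal sketch of that idea, assuming a toy whitespace tokenizer and a hypothetical `score_next_tokens` stand-in for the LLM; it is an illustration, not the authors' implementation.

```python
# Minimal sketch of corpus-constrained decoding in the spirit of RICHES.
# Hypothetical pieces: the whitespace "tokenizer" and score_next_tokens();
# a real system would use the LLM's tokenizer, a compressed index, and beam search.
from collections import defaultdict

corpus = [
    "paris is the capital of france",
    "berlin is the capital of germany",
]

def tokenize(text):
    return text.split()

# Prefix -> allowed-next-token table over all corpus suffixes, so decoding
# may start at any position inside a passage.
allowed_next = defaultdict(set)
for passage in corpus:
    toks = tokenize(passage)
    for start in range(len(toks)):
        for end in range(start, len(toks)):
            allowed_next[tuple(toks[start:end])].add(toks[end])

def score_next_tokens(prefix):
    """Stand-in for the LLM's next-token preferences (hypothetical)."""
    preferences = ["paris", "is", "the", "capital", "of", "france"]
    return {tok: -i for i, tok in enumerate(preferences)}

def constrained_decode(max_len=6):
    prefix = []
    for _ in range(max_len):
        legal = allowed_next.get(tuple(prefix), set())
        if not legal:
            break
        scores = score_next_tokens(prefix)
        # Only continuations that keep the prefix inside the corpus are allowed.
        prefix.append(max(legal, key=lambda t: scores.get(t, float("-inf"))))
    return " ".join(prefix)

print(constrained_decode())  # -> "paris is the capital of france"
```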

m0o0scar / 📖 FaaF! Facts as a Function for the evaluation of generated text.md
Created July 30, 2024 00:55
FaaF: Facts as a Function for the evaluation of generated text. Continue this conversation at https://readfm.vercel.app?gist=6a7d993500e8e75f595f204d2405e51e

[arxiv] FaaF: Facts as a Function for the evaluation of generated text

Source

Vasileios Katranidis, Gabor Barany

The demand for accurate and efficient verification of information in texts generated by large language models (LMs) is at an all-time high, but remains unresolved. Recent efforts have focused on extracting and verifying atomic facts from these texts via prompting LM evaluators. However, we demonstrate that this method of prompting is unreliable when faced with incomplete or inaccurate reference information. We introduce Facts as a Function (FaaF), a new approach to the fact verification task that leverages the function-calling capabilities of LMs. FaaF significantly enhances the ability of LMs to identify unsupported facts in texts, while also improving efficiency and significantly lowering costs compared to prompt-based methods. Additionally, we propose a framework for evaluating factual recall in Retrieval Augmented Generation (RAG) systems, which we employ to compare prompt-based and […]
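
The "facts as a function" idea is to pack all atomic facts into a single function/tool schema whose boolean fields the LM fills in one constrained call, rather than prompting once per fact. A hedged sketch follows, where `call_llm_with_tool` is a hypothetical stand-in for whatever function-calling API you use.

```python
# Sketch of the "facts as a function" idea: atomic facts become boolean
# fields of one tool/function schema, so the LM verifies them in a single
# constrained call instead of one prompt per fact.
# call_llm_with_tool() is a hypothetical stand-in, not a specific provider API.

def build_fact_schema(facts):
    """Turn a list of atomic facts into a JSON-schema function definition."""
    properties = {
        f"fact_{i}": {
            "type": "boolean",
            "description": f"True if supported by the reference text: {fact}",
        }
        for i, fact in enumerate(facts)
    }
    return {
        "name": "verify_facts",
        "description": "Mark each fact as supported (true) or not (false).",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }

def call_llm_with_tool(prompt, tool):  # hypothetical client
    raise NotImplementedError("wire up your provider's function-calling API")

def verify(facts, reference_text):
    tool = build_fact_schema(facts)
    prompt = f"Reference text:\n{reference_text}\n\nFill in verify_facts."
    arguments = call_llm_with_tool(prompt, tool)  # dict of fact_i -> bool
    return [arguments[f"fact_{i}"] for i in range(len(facts))]
```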

m0o0scar / 📖 RAGAS! Automated Evaluation of Retrieval Augmented Generation.md
Created July 30, 2024 00:36
RAGAS: Automated Evaluation of Retrieval Augmented Generation. Continue this conversation at https://readfm.vercel.app?gist=3d2cd79b57833f2aa497aa930d754dd0

[arxiv] RAGAS: Automated Evaluation of Retrieval Augmented Generation

Source

Shahul Es, Jithin James, Luis Espinosa-Anke, Steven Schockaert

We introduce RAGAs (Retrieval Augmented Generation Assessment), a framework for reference-free evaluation of Retrieval Augmented Generation (RAG) pipelines. RAG systems are composed of a retrieval and an LLM based generation module, and provide LLMs with knowledge from a reference textual database, which enables them to act as a natural language layer between a user and textual databases, reducing the risk of hallucinations. Evaluating RAG architectures is, however, challenging because there are several dimensions to consider: the ability of the retrieval system to identify relevant and focused context passages, the ability of the LLM to exploit such passages in a faithful way, or the quality of the generation itself. With RAGAs, we put forward a suite of metrics which can be used to evaluate these different dimensions without having to rely on ground […]
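
One of those dimensions, faithfulness, is typically estimated by decomposing the answer into statements and asking an LLM judge whether each statement is supported by the retrieved context. A rough reference-free sketch of that metric is below; `judge_supported` is a hypothetical LLM call, not the ragas library API.

```python
# Rough sketch of a reference-free faithfulness score in the spirit of RAGAs:
# split the generated answer into statements and let an LLM judge decide,
# per statement, whether the retrieved context supports it.
# judge_supported() is a hypothetical LLM call, not the ragas library API.

def split_into_statements(answer: str) -> list[str]:
    # Naive sentence split; RAGAs uses an LLM for this decomposition.
    return [s.strip() for s in answer.split(".") if s.strip()]

def judge_supported(statement: str, context: str) -> bool:
    raise NotImplementedError("ask an LLM: is `statement` entailed by `context`?")

def faithfulness(answer: str, context: str) -> float:
    statements = split_into_statements(answer)
    if not statements:
        return 0.0
    supported = sum(judge_supported(s, context) for s in statements)
    return supported / len(statements)  # fraction of supported statements
```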

m0o0scar / 📖 LongRoPE! Extending LLM Context Window Beyond 2 Million Tokens.md
Created July 30, 2024 00:16
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens. Continue this conversation at https://readfm.vercel.app?gist=e2887998ba3cd89102495694ac70331f

[arxiv] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Source

Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang

Large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE that, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with up to only 1k fine-tuning steps at within 256k training lengths, while maintaining performance at the original short context window. This is achieved by three key innovations: (i) we identify and exploit two forms of non-uniformities in positional interpolation through an efficient search, providing a better initialization for fine-tuning and enabling an 8x extension in non-fine-tuning scenarios; (ii) we introduce a pro[…]
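
The non-uniformity being exploited is that RoPE frequency dimensions tolerate different amounts of interpolation, so each dimension can get its own rescale factor instead of one global divisor. A small numpy sketch with made-up factors follows; the real factors come from the paper's search, not from this code.

```python
# Sketch of non-uniform RoPE interpolation in the spirit of LongRoPE:
# instead of dividing every position by one global factor, each frequency
# dimension gets its own rescale factor (found by search in the paper);
# the factors below are made up for illustration.
import numpy as np

def rope_angles(positions, head_dim, rescale=None, base=10000.0):
    """Rotary angles theta[p, i] = p / (rescale[i] * base**(2i / head_dim))."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    if rescale is not None:
        inv_freq = inv_freq / rescale        # per-dimension interpolation
    return np.outer(positions, inv_freq)     # (num_positions, head_dim // 2)

head_dim = 8
positions = np.arange(16384)

# Uniform interpolation would use a single factor for every dimension:
uniform = rope_angles(positions, head_dim, rescale=np.full(head_dim // 2, 4.0))

# Non-uniform: high-frequency dimensions (early ones) are interpolated less
# than low-frequency ones.
per_dim_factors = np.array([1.0, 2.0, 4.0, 8.0])   # illustrative only
non_uniform = rope_angles(positions, head_dim, rescale=per_dim_factors)

print(uniform.shape, non_uniform.shape)  # angles per (position, dimension pair)
```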

m0o0scar / 📖 LazyLLM! Dynamic Token Pruning for Efficient Long Context LLM Inference.md
Created July 29, 2024 13:38
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference. Continue this conversation at https://readfm.vercel.app?gist=a02c603d24e089e03e2813148e480043

[arxiv] LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Source

Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi

The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens. For long prompts, the KV cache must be computed for all tokens during the prefilling stage, which can significantly increase the time needed to generate the first token. Consequently, the prefilling stage may become a bottleneck in the generation process. An open question remains whether all prompt tokens are essential for generating the first token. To answer this, we introduce a novel method, LazyLLM, that selectively computes the KV for tokens important for the next token prediction in both the prefilling and decoding stages. Contrary to static pruning approaches that prune the prompt at once, LazyLLM […]
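
The mechanism can be pictured as progressive, attention-guided pruning: each layer keeps only the prompt tokens that the token being generated attends to most, and the next layer computes KV only for the survivors. A toy sketch is below, with `layer_forward` as a hypothetical stand-in for a transformer layer.

```python
# Toy sketch of progressive, attention-guided prompt pruning in the spirit
# of LazyLLM: at each layer, keep only the prompt tokens that receive the
# highest attention from the token being generated, and carry only those
# forward into the next layer's KV computation. layer_forward() is hypothetical.
import numpy as np

def layer_forward(hidden, layer_idx):
    """Stand-in for a transformer layer; returns new hidden states and the
    attention the last (generating) token pays to each prompt token."""
    rng = np.random.default_rng(layer_idx)
    attn_to_prompt = rng.random(hidden.shape[0])          # (num_tokens,)
    return hidden + 0.1, attn_to_prompt / attn_to_prompt.sum()

def lazy_prefill(hidden, keep_ratios=(1.0, 0.7, 0.4)):
    """Run the prefill while progressively dropping unimportant prompt tokens."""
    kept = np.arange(hidden.shape[0])                      # indices into the prompt
    for layer_idx, ratio in enumerate(keep_ratios):
        hidden, attn = layer_forward(hidden, layer_idx)
        k = max(1, int(round(ratio * len(kept))))
        top = np.sort(np.argsort(attn)[-k:])               # most-attended tokens
        kept, hidden = kept[top], hidden[top]
    return kept, hidden                                     # surviving tokens + states

prompt_hidden = np.zeros((12, 16))                           # 12 prompt tokens, dim 16
survivors, states = lazy_prefill(prompt_hidden)
print(survivors)   # indices of prompt tokens still carried through the stack
```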

m0o0scar / 📖 Wolf! Captioning Everything with a World Summarization Framework.md
Last active July 29, 2024 13:19
Wolf: Captioning Everything with a World Summarization Framework. Continue this conversation at https://readfm.vercel.app?gist=f9066a087b472e5285f722c8fc6a46f7

[arxiv] Wolf: Captioning Everything with a World Summarization Framework

Source

Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, Xinshuo Weng, Fuzhao Xue, Andrew Tao, Ming-Yu Liu, Sanja Fidler, Boris Ivanovic, Trevor Darrell, Jitendra Malik, Song Han, Marco Pavone

We propose Wolf, a WOrLd summarization Framework for accurate video captioning. Wolf is an automated captioning framework that adopts a mixture-of-experts approach, leveraging complementary strengths of Vision Language Models (VLMs). By utilizing both image and video models, our framework captures different levels of information and summarizes them efficiently. Our approach can be applied to enhance video understanding, auto-labeling, and captioning. To evaluate caption quality, we introduce CapScore, an LLM-based metric to assess the similarity and quality of generated captions compared to the ground truth captions. We further build four human-annotated datasets in three […]
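
The pipeline shape (several VLM experts caption the same video, an LLM fuses their outputs into one caption, and an LLM-based CapScore compares it with the ground truth) can be sketched with hypothetical stubs; this is not the released Wolf code.

```python
# Sketch of the Wolf-style pipeline shape: several VLM "experts" caption the
# same video, an LLM fuses their outputs into one caption, and an LLM-based
# score compares it to a ground-truth caption (CapScore in the paper).
# All three call_* functions are hypothetical stubs, not the released code.

def call_image_vlm(frames): ...
def call_video_vlm(video): ...
def call_llm(prompt): ...

def wolf_caption(video, frames):
    expert_captions = [
        call_image_vlm(frames),   # frame-level detail
        call_video_vlm(video),    # temporal / motion information
    ]
    prompt = (
        "Summarize these candidate captions into one accurate caption:\n"
        + "\n".join(f"- {c}" for c in expert_captions)
    )
    return call_llm(prompt)

def cap_score(generated, ground_truth):
    prompt = (
        "Rate from 0 to 1 how similar and how accurate the generated caption "
        f"is versus the ground truth.\nGenerated: {generated}\nGround truth: {ground_truth}"
    )
    return float(call_llm(prompt))
```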