Raphael Costa (raphaelcosta)

Question: Should I avoid using RAG for my AI application after reading that "RAG is dead" for coding agents?

Many developers are confused about when and how to use RAG after reading articles claiming "RAG is dead." Understanding what RAG actually means versus the narrow marketing definitions will help you make better architectural decisions for your AI applications.

Answer: The viral article claiming RAG is dead specifically argues against using naive vector database retrieval for autonomous coding agents, not RAG as a whole. This is a crucial distinction that many developers miss due to misleading marketing.

RAG simply means Retrieval-Augmented Generation - using retrieval to provide relevant context that improves your model's output. The core principle remains essential: your LLM needs the right context to generate accurate answers. The question isn't whether to use retrieval, but how to retrieve effectively.

For coding
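To make the distinction concrete, here is a minimal sketch of the core RAG loop. `embed`, `vector_search`, and `llm_complete` are hypothetical helpers standing in for your embedding model, vector store, and LLM client:

def answer_with_rag(question: str, k: int = 5) -> str:
    # Embed the question and retrieve the k most similar chunks
    query_vector = embed(question)
    docs = vector_search(query_vector, top_k=k)
    # Assemble the retrieved text into the prompt so the LLM answers in context
    context = "\n\n".join(doc.text for doc in docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm_complete(prompt)

Whether `vector_search` is a naive embedding lookup, grep over a repo, or an agentic tool call is exactly the design question the article debates; the retrieval-then-generate loop itself is unchanged.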

@Olshansk
Olshansk / llm.sh
Last active September 28, 2024 00:47
A bash wrapper around Python's mlx_whisper to leverage the GPU on a Mac for transcription
# A one-liner to leverage the GPU on a Mac to transcribe audio files
# Inspired by https://simonwillison.net/2024/Aug/13/mlx-whisper/
llm_transcribe_recording () {
  local file_path="$1"
  python3 -c "
import mlx_whisper
result = mlx_whisper.transcribe('$file_path', path_or_hf_repo='mlx-community/distil-whisper-large-v3')
print(result['text'])
"
}
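Example usage (the file path is an assumption; requires `pip install mlx-whisper` on an Apple Silicon Mac):

# Transcribe a recording and save the text
llm_transcribe_recording ~/Downloads/meeting.m4a > transcript.txt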
@veekaybee
veekaybee / normcore-llm.md
Last active June 27, 2025 19:34
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts


Pre-Transformer Models

@adrienbrault
adrienbrault / llama2-mac-gpu.sh
Last active April 8, 2025 13:49
Run Llama-2-13B-chat locally on your M1/M2 Mac with GPU inference. Uses 10GB RAM. UPDATE: see https://twitter.com/simonw/status/1691495807319674880?s=20
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Build it
make clean
LLAMA_METAL=1 make
# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
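# The gist preview cuts off here. A hedged completion: fetch the quantized
# weights (the TheBloke Hugging Face repo below is an assumption about the
# source) and run with Metal GPU offload enabled.
wget "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/${MODEL}"
# Run interactively; -ngl 1 offloads work to the GPU via Metal
./main -m "${MODEL}" -ngl 1 --color -p "Hello, world"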
@oneryalcin
oneryalcin / sse_fast_api.py
Last active August 2, 2024 16:40
Server-Sent Events (SSE) with FastAPI and (partially) Langchain
# I couldn't get generators returned from chains, so I had to do a bit of low-level SSE. Hope this is useful.
# You'll probably use another vector store instead of OpenSearch, but if you want to mimic what I did here,
# please use the fork of `OpenSearchVectorSearch` at https://github.com/oneryalcin/langchain
import json
import os
import logging
from typing import List, Generator
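The preview ends at the imports. A minimal sketch of the low-level SSE pattern the comments describe, with a plain token source standing in for the LangChain chain (the route and helper names are made up):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def sse_format(payload: dict) -> str:
    # Each SSE frame is "data: <payload>\n\n"
    return f"data: {json.dumps(payload)}\n\n"

@app.get("/stream")
def stream() -> StreamingResponse:
    def event_source() -> Generator[str, None, None]:
        for token in ["Hello", ",", " world"]:  # stand-in for chain output
            yield sse_format({"token": token})
    return StreamingResponse(event_source(), media_type="text/event-stream")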
@python273
python273 / app.py
Last active December 29, 2024 23:37
Flask Streaming Langchain Example
import os
os.environ["OPENAI_API_KEY"] = ""
from flask import Flask, Response, request
import threading
import queue
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import AIMessage, HumanMessage, SystemMessage
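The preview stops after the imports. A minimal sketch of the queue-based streaming pattern these imports suggest (the handler class, route, and sentinel value are assumptions, not the gist's exact code):

from langchain.callbacks.base import BaseCallbackHandler

class QueueCallback(BaseCallbackHandler):
    # Push each freshly generated token onto a queue
    def __init__(self, q):
        self.q = q
    def on_llm_new_token(self, token, **kwargs):
        self.q.put(token)

app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat():
    q = queue.Queue()
    llm = ChatOpenAI(streaming=True, callbacks=[QueueCallback(q)])

    def worker():
        # Run generation in a background thread so the response can stream
        llm([HumanMessage(content=request.json["message"])])
        q.put(None)  # sentinel: generation done

    threading.Thread(target=worker).start()

    def generate():
        while (token := q.get()) is not None:
            yield token

    return Response(generate(), mimetype="text/plain")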

Hydra Performance Microbenchmark

Important: This microbenchmark is not intended to represent any real workload. Compression ratios, and therefore performance, will depend heavily on the specific workload. It exists only to illustrate a contrived, "columnar-friendly" workload that showcases the benefits of columnar storage.

Schema

@Mr0grog
Mr0grog / constant-pooled-data.mjs
Created April 26, 2022 03:23
Parse Airtable’s ConstantPooledData format.
/**
 * Parse Airtable's "ConstantPooledData" format. They recently started using
 * this format to compress some API responses, and it appears to be a
 * home-grown format.
 *
 * Call `parseData()` if you have an object with data (e.g. a JSON-parsed API
 * response body).
 *
 * Call `parseString()` if you have a raw string of data (e.g. an API response
 * body).
 */
@tabishiqbal
tabishiqbal / _form.html.erb
Last active January 15, 2025 21:39
Ruby on Rails Tom-Select Example with Stimulus controller
<%= form_with(model: team) do |form| %>
  <div>
    <%= form.label :name %>
    <%= form.text_field :name, class: "input" %>
  </div>
  <div>
    <%= form.select :user_id, {}, {placeholder: "Select user"}, {class: "w-full", data: { controller: "select", select_url_value: users_path }} %>
  </div>
<% end %>
@coco98
coco98 / track_all_tables.py
Last active September 16, 2021 19:24
Track tables in python (Hasura)
import requests

# Fetch existing tables from the information schema
tables = requests.post('http://localhost:8080/v1/query', json={
    "type": "select",
    "args": {
        "table": {"schema": "information_schema", "name": "tables"},
        "columns": ["table_name"],
        "where": {"table_schema": {"$eq": "public"}}
    }
})
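The preview cuts off before the tracking step. A hedged sketch of the likely follow-up, assuming Hasura's v1/query `track_table` metadata call and that the select above returns a list of row dicts:

# Track each public table so it becomes queryable through Hasura
for row in tables.json():
    requests.post('http://localhost:8080/v1/query', json={
        "type": "track_table",
        "args": {"schema": "public", "name": row["table_name"]}
    })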