
Abdullah Mohammed abodacs

@corbt
corbt / 1_results.txt
Last active June 13, 2025 12:42
Benchmark script for reward model performance
Strategy                | Relative Throughput | Time (s) | Cost ($/M tokens)
----------------------------------------------------------------------------
Unsloth                 | 2.17                | 3.83     | $0.0188
Unsloth+PEFT            | 1.58                | 5.27     | $0.0259
Transformers+Liger      | 1.14                | 7.28     | $0.0358
vLLM                    | 1.00                | 8.31     | $0.0409
Transformers            | 0.97                | 8.54     | $0.0420
Transformers+Liger+PEFT | 0.84                | 9.85     | $0.0484
Transformers+PEFT       | 0.74                | 11.26    | $0.0554
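
The Relative Throughput column looks consistent with dividing the vLLM time by each strategy's time. That relationship is inferred from the numbers and is not stated in the gist, so the sketch below treats vLLM as the baseline by assumption and simply recomputes the column from the Time (s) values.

```python
# Times in seconds, copied from the results table.
TIMES = {
    "Unsloth": 3.83,
    "Unsloth+PEFT": 5.27,
    "Transformers+Liger": 7.28,
    "vLLM": 8.31,
    "Transformers": 8.54,
    "Transformers+Liger+PEFT": 9.85,
    "Transformers+PEFT": 11.26,
}

# Assumption: vLLM is the baseline, since its relative throughput is 1.00.
BASELINE = "vLLM"


def relative_throughput(times: dict[str, float], baseline: str) -> dict[str, float]:
    """Return baseline_time / strategy_time for every strategy, rounded to 2 places."""
    base = times[baseline]
    return {name: round(base / seconds, 2) for name, seconds in times.items()}


if __name__ == "__main__":
    for name, value in relative_throughput(TIMES, BASELINE).items():
        print(f"{name:<24} {value:.2f}")
```

Lower time means higher relative throughput, so a value above 1.00 is faster than the assumed baseline and a value below 1.00 is slower.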

Question: Should I avoid using RAG for my AI application after reading that "RAG is dead" for coding agents?

Many developers are confused about when and how to use RAG after reading articles claiming "RAG is dead." Understanding what RAG actually means versus the narrow marketing definitions will help you make better architectural decisions for your AI applications.

Answer: The viral article claiming RAG is dead specifically argues against using naive vector database retrieval for autonomous coding agents, not RAG as a whole. This is a crucial distinction that many developers miss due to misleading marketing.

RAG simply means Retrieval-Augmented Generation - using retrieval to provide relevant context that improves your model's output. The core principle remains essential: your LLM needs the right context to generate accurate answers. The question isn't whether to use retrieval, but how to retrieve effectively.
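
To make the definition concrete, here is a minimal sketch of retrieval feeding a prompt without any vector database. Everything in it is hypothetical: the `DOCS` snippets, the keyword-overlap scoring, and the prompt layout are illustrative choices, and a real system would likely use a stronger retriever.

```python
import re

# Hypothetical knowledge snippets, for illustration only.
DOCS = [
    "The parse_config function reads settings from config.toml.",
    "Vector databases store embeddings for similarity search.",
    "The deploy script uploads build artifacts to the staging server.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many tokens they share with the query; keep the top k."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Place the retrieved snippets ahead of the question as context for the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("Where does parse_config read settings from?", DOCS, k=1))
```

Swapping `retrieve` for grep, a file-tree walk, or an embedding search changes how context is found, not the fact that retrieval is augmenting generation.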

For coding

@stevebauman
stevebauman / vite.config.js
Last active May 8, 2025 01:08
Vite Server Cors Allow Any Subdomain
import { defineConfig, loadEnv } from 'vite';
// ...
export default defineConfig(({ mode }) => {
    const env = loadEnv(mode, process.cwd());
    const { protocol, hostname } = new URL(env.VITE_URL);
    const root = hostname.split('.').slice(-2).join('\\.');
@perfectbase
perfectbase / await.tsx
Last active June 12, 2025 01:34
Await component for tRPC with prefetch
/* eslint-disable @typescript-eslint/no-explicit-any */
import { type TRPCQueryOptions } from '@trpc/tanstack-react-query';
import { unstable_noStore } from 'next/cache';
import { Fragment, Suspense, type ReactNode } from 'react';
import { ErrorBoundary } from 'react-error-boundary';
import { HydrateClient, prefetch as prefetchTRPC } from '@/trpc/server';
type AwaitProps<T> =
  | {
      promise: Promise<T>;
@joseph-crowley
joseph-crowley / shadcn.md
Created March 17, 2025 19:58
initial solution for shadcn llms.txt

Below is a compressed yet complete reference for quickly integrating each shadcn component. Assumption: you already have the files from your question in @/components/ui/*.tsx and can import them directly. All components accept typical React props plus any Radix/3rd-party props. Adjust styling and props as needed. Do not rewrite any of the code for the shadcn components.


1. Accordion

Import

import {
  Accordion,
  AccordionItem,
@kalomaze
kalomaze / gist:37c70e022cb1e9428ebb1ee7a4b52275
Last active April 5, 2025 10:57
GRPO Reinforcement Learning - 7b GSM8k on 8xH100 / 8xA100
# the "verifiers" repository is a clean implementation of templated GRPO reinforcement learning training environments
# this is a generic set of "install from scratch" commands complete with a deepspeed z3 config that i have been using when i spin up nodes
# it will run on the gsm8k example w/ default batch size & generation size (8), and the 8th GPU is used for vllm generations
# qwen 14b full finetuning will run on this configuration too without LoRA or CUDA OOM, at least for the gsm8k task's context sizes + generation lengths
# hyperparameters are controlled by `verifiers/utils/config_utils.py`; i have been preferring extreme grad clipping (between 0.001 and 0.01) and low beta (under 0.01)
# NOTE FEB 27: examples have moved into `verifiers/examples` not `/examples`
cd /root
mkdir boom
@sayakpaul
sayakpaul / grade_images_with_gemini.py
Last active June 8, 2025 15:46
Shows how to use Gemini Flash 2.0 to grade images on multiple aspects like accuracy to prompt, emotional and thematic response, etc.
from google import genai
from google.genai import types
import typing_extensions as typing
from PIL import Image
import requests
import io
import json
import os
@qunash
qunash / grpo_qwen-0-5b_single_t4.ipynb
Last active June 2, 2025 01:10
grpo_qwen-0-5b_single_t4.ipynb
@Chillee
Chillee / merge_attention.py
Last active April 11, 2025 20:58
Merge Attention
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention
torch.set_default_device('cuda')
q, k, v = [torch.randn(8, 8, 1024, 64, requires_grad=True) for _ in range(3)]
causal_mask = create_block_mask(lambda b, h, q_idx, kv_idx: q_idx >= kv_idx, None, None, 1024, 1024)
uncausal_mask = create_block_mask(lambda b, h, q_idx, kv_idx: q_idx < kv_idx, None, None, 1024, 1024)
ref_out = flex_attention(q, k, v)
causal_out, causal_lse = flex_attention(q, k, v, block_mask=causal_mask, return_lse=True)
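
The preview stops before any merge step, so the following is not the gist's code. It is a plain-Python sketch, under the assumption that the goal is the standard log-sum-exp merge: two attention results computed over disjoint sets of keys can be combined into the result over all keys by weighting each output with `exp(lse_part - lse_total)`. The helper names `attention` and `merge` are hypothetical, and score scaling is omitted for brevity.

```python
import math


def attention(q, keys, values):
    """Softmax attention for one query vector.

    Returns the output vector and the log-sum-exp (LSE) of the raw scores.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    lse = top + math.log(total)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, lse


def merge(out_a, lse_a, out_b, lse_b):
    """Combine two partial attention results computed over disjoint key sets."""
    top = max(lse_a, lse_b)
    lse = top + math.log(math.exp(lse_a - top) + math.exp(lse_b - top))
    weight_a = math.exp(lse_a - lse)  # share of softmax mass in part A
    weight_b = math.exp(lse_b - lse)  # share of softmax mass in part B
    merged = [weight_a * a + weight_b * b for a, b in zip(out_a, out_b)]
    return merged, lse


if __name__ == "__main__":
    q = [0.5, -1.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

    full_out, full_lse = attention(q, keys, values)
    part_a = attention(q, keys[:2], values[:2])
    part_b = attention(q, keys[2:], values[2:])
    merged_out, merged_lse = merge(*part_a, *part_b)

    print(full_out, merged_out)  # the two should agree up to rounding
```

In the snippet above, the causal and uncausal masks split the keys into two disjoint sets for each query, which is the situation this identity covers.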
@willccbb
willccbb / grpo_demo.py
Last active June 16, 2025 05:05
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
"""
citation:
@misc{brown2025grpodemo,
title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
author={Brown, William},