Andi Ashari aashari

aashari / Advanced Language Model Benchmarks: Intelligence, Reasoning & Coding Evaluation
Created May 25, 2025 10:22
This dataset compiles benchmark results for the leading large language models (LLMs) released in late 2024 and 2025 by Anthropic, OpenAI, xAI, and Google. It includes intelligence scores, performance on key academic and coding tasks (MMLU-Pro/MMMLU, GPQA Diamond, SWE-Bench, MATH-500, AIME), context window sizes, and expert notes for each model. …
"Model","Provider","Release Date","Intelligence Index","MMLU-Pro/MMMLU (%)","GPQA Diamond (%)","SWE-Bench Verified/LCB (%)","MATH-500 (%)","AIME (%)","Context Window","Notes"
"Claude Opus 4","Anthropic","May 2025","72","87.4","74.9","72.5 (SWE)","-","-","200K","Most intelligent; excels in coding, long-running tasks, memory capabilities."
"Grok 3 (Think)","xAI","Feb 2025","70","79.9","84.6","79.4 (LCB)","-","93.3 (2025)","1M","Tops Chatbot Arena (Elo 1402), strong reasoning, AIME cons@64 controversy."
"o4-mini (high)","OpenAI","Apr 2025","70","83","78","-","99","94 (2024)","128K","Top in math (MATH-500: 99%), visual reasoning (MMMU: 82.9%), cost-efficient."
"Gemini 2.5 Pro","Google","Mar 2025","69","84.1","83.0","63.8 (SWE)","-","83.0 (2025)","1M","Advanced reasoning, leads WebDev Arena and LMArena, multimodal support."
"Claude Sonnet 4","Anthropic","May 2025","68","80.2","79.6","72.7 (SWE)","92","33.1 (2024)","200K","Cost-efficient, strong coding (SWE-Bench: 72.7%), high-volume tasks."
"o3","OpenAI","Apr 2025","
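
The score columns in the rows above mix plain numbers, qualified values such as `72.5 (SWE)` or `93.3 (2025)`, and `-` for missing results, so they cannot be cast to floats directly. The following is a minimal Python sketch of one way to read them with the standard-library `csv` module. The names `SAMPLE`, `load_rows`, and `parse_score` are hypothetical helpers introduced here for illustration; they are not part of the gist, and the sample holds only the header and the first two rows copied from the listing.

```python
import csv
import io
import re

# Header plus the first two complete rows, copied from the listing above.
SAMPLE = '''
"Model","Provider","Release Date","Intelligence Index","MMLU-Pro/MMMLU (%)","GPQA Diamond (%)","SWE-Bench Verified/LCB (%)","MATH-500 (%)","AIME (%)","Context Window","Notes"
"Claude Opus 4","Anthropic","May 2025","72","87.4","74.9","72.5 (SWE)","-","-","200K","Most intelligent; excels in coding, long-running tasks, memory capabilities."
"Grok 3 (Think)","xAI","Feb 2025","70","79.9","84.6","79.4 (LCB)","-","93.3 (2025)","1M","Tops Chatbot Arena (Elo 1402), strong reasoning, AIME cons@64 controversy."
'''.strip()

# A number, optionally followed by a parenthesised qualifier, e.g. "72.5 (SWE)".
SCORE_PATTERN = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*(?:\(([^)]*)\))?")


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_score(cell):
    """Return (value, qualifier) for a score cell; '-' or '' gives (None, None)."""
    cell = cell.strip()
    if cell in ("", "-"):
        return None, None
    match = SCORE_PATTERN.fullmatch(cell)
    if match is None:
        raise ValueError(f"unrecognised score cell: {cell!r}")
    return float(match.group(1)), match.group(2)


if __name__ == "__main__":
    for row in load_rows(SAMPLE):
        value, qualifier = parse_score(row["SWE-Bench Verified/LCB (%)"])
        print(row["Model"], value, qualifier)
```

Keeping the qualifier separate from the number matters here because the column combines two different benchmarks (SWE-Bench Verified and LCB), so the raw values are not directly comparable across rows.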
aashari / 00 - Cursor AI Prompting Rules.md
Last active June 5, 2025 11:44
Cursor AI Prompting Rules - This gist provides structured prompting rules for optimizing Cursor AI interactions. It includes three key files to streamline AI behavior for different tasks.

Cursor AI Prompting Framework — Usage Guide

This guide shows you how to apply the three structured prompt templates—core.md, refresh.md, and request.md—to get consistently reliable, autonomous, and high-quality assistance from Cursor AI.


1. Core Rules (core.md)

Purpose:
Defines the AI’s always-on operating principles: when to proceed autonomously, how to research with tools, when to ask for confirmation, and how to self-validate.

aashari / 00-system-instruction.md
Last active May 21, 2025 18:41
Cursor System Instruction

You are a powerful agentic AI coding assistant, powered by GPT-4o. You operate exclusively in Cursor, the world's best IDE.

You are pair programming with a USER to solve their coding task. The task may require creating a new codebase, modifying or debugging an existing codebase, or simply answering a question. Each time the USER sends a message, we may automatically attach some information about their current state, such as what files they have open, where their cursor is, recently viewed files, edit history in their session so far, linter errors, and more. This information may or may not be relevant to the coding task, it is up for you to decide. Your main goal is to follow the USER's instructions at each message.

1. Be concise and do not repeat yourself.