AI / ML Toolkit

Some notes on AI / ML tools that seem interesting/useful (largely aiming to focus on open source tools)

See Also
- My Other Related Deepdive Gist's and Projects
OpenRouter
ollama
llama.cpp
- node-llama-cpp
vLLM
LiteLLM
Protocols / Standards / etc
- Agent2Agent Protocol (A2A)
- Model Context Protocol (MCP)
SDKs / Toolkits / etc
AI Agents / Assistants / etc
Code Generation / Execution
- Code Leaderboards / Benchmarks
Vision / Multimodal
Image Generation
Song / Audio Generation
Infrastructure and Hosting (Cloud GPUs, etc)
- Runpod
Vector Databases/Search, Similarity Search, Clustering, etc
- Faiss
Benchmarks / Leaderboards
Prompts / Prompt Engineering / etc
Other Useful Tools / Libraries / etc
- Unsorted
- Node-based UI's, Graph Execution, Flow Based Programming, etc
Unsorted

OpenRouter

https://openrouter.ai/
- The Unified Interface For LLMs
- Better prices, better uptime, no subscription.
- https://openrouter.ai/rankings
  - Rankings
- https://openrouter.ai/models
  - Models
- https://openrouter.ai/chat
  - Chat
- https://openrouter.ai/docs/
  - Docs - Quickstart
  - Get started with OpenRouter
  - OpenRouter provides a unified API that gives you access to hundreds of AI models through a single endpoint, while automatically handling fallbacks and selecting the most cost-effective options. Get started with just a few lines of code using your preferred SDK or framework.
  - https://openrouter.ai/docs/community/frameworks
    - Frameworks
    - Using OpenRouter with Frameworks

ollama

https://github.com/ollama/ollama
- Get up and running with Llama 2 and other large language models locally
- https://github.com/ollama/ollama#model-library
  - Ollama supports a list of open-source models available on https://ollama.ai/library
- https://github.com/ollama/ollama#customize-your-own-model
  - Ollama supports importing GGUF models in the Modelfile
  - https://medium.com/@phillipgimmi/what-is-gguf-and-ggml-e364834d241c
    - GGUF and GGML are file formats used for storing models for inference, particularly in the context of language models like GPT (Generative Pre-trained Transformer).
    - In summary, GGUF is positioned as an upgrade to GGML, offering more flexibility, extensibility, and compatibility. It aims to simplify the user experience and accommodate various models beyond llama.cpp. GGML, while a valuable early effort, had limitations that GGUF seeks to overcome.
- https://github.com/ollama/ollama#cli-reference
  - CLI Reference
- https://github.com/ollama/ollama#rest-api
  - Ollama has a REST API for running and managing models
- https://github.com/ollama/ollama#community-integrations
  - Community Integrations
  - https://github.com/hinterdupfinger/obsidian-ollama
    - This is a plugin for Obsidian that allows you to use Ollama within your notes.
- https://github.com/ollama/ollama-python
  - Ollama Python Library The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.
- https://github.com/ollama/ollama-js
  - Ollama JavaScript Library The Ollama JavaScript library provides the easiest way to integrate your JavaScript project with Ollama.
- https://ollama.ai/
  - https://ollama.ai/library
    - https://ollama.ai/library/mistral
      - Mistral 7B model is an Apache licensed 7.3B parameter model. It is available in both instruct (instruction following) and text completion.
    - https://ollama.ai/library/llama2
      - Llama 2 is released by Meta Platforms, Inc. This model is trained on 2 trillion tokens, and by default supports a context length of 4096. Llama 2 Chat models are fine-tuned on over 1 million human annotations, and are made for chat.
    - https://ollama.ai/library/codellama
      - Code Llama is a model for generating and discussing code, built on top of Llama 2. It’s designed to make workflows faster and efficient for developers and make it easier for people to learn how to code. It can generate both code and natural language about code. Code Llama supports many of the most popular programming languages used today, including Python, C++, Java, PHP, Typescript (Javascript), C#, Bash and more.
    - https://ollama.ai/library/zephyr
      - Zephyr is a 7 billion parameter model, fine-tuned on Mistral to achieve results similar to Llama 2 70B Chat in several benchmarks (ARC, HellaSwag, MMLU, TruthfulQA). It was fine-tuned using Direct Performance Optimization This model has the built-in alignment of its source datasets removed.
    - etc
  - https://ollama.ai/blog
    - https://ollama.ai/blog/python-javascript-libraries
      - Python & JavaScript Libraries

llama.cpp

https://github.com/ggml-org/llama.cpp
- llama.cpp
- Inference of Meta's LLaMA model (and others) in pure C/C++
- LLM inference in C/C++
- The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.
  - Plain C/C++ implementation without any dependencies
  - Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
  - AVX, AVX2, AVX512 and AMX support for x86 architectures
  - 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
  - Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  - Vulkan and SYCL backend support
  - CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity
- https://github.com/ggml-org/llama.cpp#supported-backends
  - Supported Backends
- https://github.com/ggml-org/llama.cpp#llama-cli
  - llama-cli
  - A CLI tool for accessing and experimenting with most of llama.cpp's functionality.
- https://github.com/ggml-org/llama.cpp#llama-server
  - llama-server
  - A lightweight, OpenAI API compatible, HTTP server for serving LLMs.
- https://github.com/ggml-org/llama.cpp#llama-perplexity
  - llama-perplexity
  - A tool for measuring the perplexity (and other quality metrics) of a model over a given text.
- https://github.com/ggml-org/llama.cpp#llama-bench
  - llama-bench
  - Benchmark the performance of the inference for various parameters.
- https://github.com/ggml-org/llama.cpp#llama-run
  - llama-run
  - A comprehensive example for running llama.cpp models. Useful for inferencing. Used with RamaLama
- https://github.com/ggml-org/llama.cpp#llama-simple
  - llama-simple
  - A minimal example for implementing apps with llama.cpp. Useful for developers.

node-llama-cpp

https://github.com/withcatai/node-llama-cpp
- node-llama-cpp Run AI models locally on your machine
- Pre-built bindings are provided with a fallback to building from source with cmake
- Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema on the model output on the generation level
https://node-llama-cpp.withcat.ai/
- node-llama-cpp
- Run AI models locally on your machine
- node.js bindings for llama.cpp, and much more

vLLM

https://github.com/vllm-project/vllm
- A high-throughput and memory-efficient inference and serving engine for LLMs
- vLLM is a fast and easy-to-use library for LLM inference and serving.
- https://blog.vllm.ai/
  - https://blog.vllm.ai/2023/06/20/vllm.html
    - vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
  - https://blog.vllm.ai/2023/11/14/notes-vllm-vs-deepspeed.html
    - Notes on vLLM v.s. DeepSpeed-FastGen

LiteLLM

https://www.litellm.ai/
- LiteLLM
- LLM Gateway to provide model access, fallbacks and spend tracking across 100+ LLMs. All in the OpenAI format.
- https://docs.litellm.ai/
  - LiteLLM - Getting Started
  - You can use litellm through either:
    - LiteLLM Proxy Server - Server (LLM Gateway) to call 100+ LLMs, load balance, cost tracking across projects
    - LiteLLM python SDK - Python Client to call 100+ LLMs, load balance, cost tracking
  - https://docs.litellm.ai/#litellm-python-sdk
    - LiteLLM Python SDK
  - https://docs.litellm.ai/#litellm-proxy-server-llm-gateway
    - LiteLLM Proxy Server (LLM Gateway)
- https://docs.litellm.ai/docs/hosted
  - Hosted LiteLLM Proxy
- https://models.litellm.ai/
  - LLM Model Cost Map
https://github.com/BerriAI/litellm
- LiteLLM
- Call all LLM APIs using the OpenAI format (Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.)

Protocols / Standards / etc

Agent2Agent Protocol (A2A)

https://google.github.io/A2A/#/
- https://github.com/google/A2A
  - A2A protocol
  - An open protocol enabling communication and interoperability between opaque agentic applications.
- https://github.com/google/A2A/tree/main/samples
  - https://github.com/google/A2A/blob/main/samples/python/agents/google_adk/README.md
    - Google Agent Development Kit (ADK)
  - https://github.com/google/A2A/blob/main/samples/python/agents/langgraph/README.md
    - LangGraph
  - https://github.com/google/A2A/blob/main/samples/js/src/agents/README.md
    - Firebase GenKit
https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
- Announcing the Agent2Agent Protocol (A2A)
- A2A is an open protocol that complements Anthropic's Model Context Protocol (MCP), which provides helpful tools and context to agents. Drawing on Google's internal expertise in scaling agentic systems, we designed the A2A protocol to address the challenges we identified in deploying large-scale, multi-agent systems for our customers. A2A empowers developers to build agents capable of connecting with any other agent built using the protocol and offers users the flexibility to combine agents from various providers. Critically, businesses benefit from a standardized method for managing their agents across diverse platforms and cloud environments. We believe this universal interoperability is essential for fully realizing the potential of collaborative AI agents.
https://developers.googleblog.com/en/google-cloud-donates-a2a-to-linux-foundation/
- Google Cloud donates A2A to Linux Foundation

Model Context Protocol (MCP)

See Also:
- Model Context Protocol (MCP) Tools (0xdevalias' gist)
https://github.com/modelcontextprotocol
- Model Context Protocol
- A protocol for seamless integration between LLM applications and external data sources
https://modelcontextprotocol.io/introduction
- Introduction
- Get started with the Model Context Protocol (MCP)
- MCP is an open protocol that standardizes how applications provide context to LLMs.
https://docs.anthropic.com/en/docs/agents-and-tools/mcp
- Agents and tools Model Context Protocol (MCP)
- MCP is an open protocol that standardizes how applications provide context to LLMs.
https://www.anthropic.com/news/model-context-protocol
- Introducing the Model Context Protocol
- Today, we're open-sourcing the Model Context Protocol (MCP), a new standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments. Its aim is to help frontier models produce better, more relevant responses.
- It provides a universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol. The result is a simpler, more reliable way to give AI systems access to the data they need.

SDKs / Toolkits / etc

Google Agent Development Kit (ADK)

https://developers.googleblog.com/en/agent-development-kit-easy-to-build-multi-agent-applications/
- Agent Development Kit: Making it easy to build multi-agent applications
https://google.github.io/adk-docs/
- Agent Development Kit An open-source AI agent framework integrated with Gemini and Google
- What is Agent Development Kit? Agent Development Kit (ADK) is a flexible and modular framework for developing and deploying AI agents. ADK can be used with popular LLMs and open-source generative AI tools and is designed with a focus on tight integration with the Google ecosystem and Gemini models. ADK makes it easy to get started with simple agents powered by Gemini models and Google AI tools while providing the control and structure needed for more complex agent architectures and orchestration.
- https://github.com/google/adk-docs
  - adk-docs
- https://cloud.google.com/vertex-ai/generative-ai/docs/agent-development-kit/quickstart
  - Vertex AI: Quickstart: Build an agent with the Agent Development Kit
  - This quickstart guides you through setting up your Google Cloud project, installing the Agent Development Kit (ADK), setting up a basic agent, and running its developer user interface.
https://github.com/google/adk-python
- Agent Development Kit (ADK)
- An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
https://github.com/google/adk-samples
- Agent Development Kit (ADK) Samples
- A collection of sample agents built with Agent Development (ADK)
- Welcome to the Sample Agents repository! This collection provides ready-to-use agents built on top of the Agent Development Kit, designed to accelerate your development process. These agents cover a range of common use cases and complexities, from simple conversational bots to complex multi-agent workflows.

Vercel AI SDK / Toolkit

https://github.com/vercel/ai
- AI SDK The AI SDK is a TypeScript toolkit designed to help you build AI-powered applications using popular frameworks like Next.js, React, Svelte, Vue and runtimes like Node.js.
- https://github.com/vercel/ai#ai-sdk-core
  - AI SDK Core The AI SDK Core module provides a unified API to interact with model providers like OpenAI, Anthropic, Google, and more.
    
    You will then install the model provider of your choice.
- https://github.com/vercel/ai#ai-sdk-ui
  - AI SDK UI The AI SDK UI module provides a set of hooks that help you build chatbots and generative user interfaces. These hooks are framework agnostic, so they can be used in Next.js, React, Svelte, and Vue.
    
    You need to install the package for your framework
https://sdk.vercel.ai/
- The AI Toolkit for TypeScript From the creators of Next.js, the AI SDK is a free open-source library that gives you the tools you need to build AI-powered products.
- https://sdk.vercel.ai/docs/introduction
  - AI SDK The AI SDK is the TypeScript toolkit designed to help developers build AI-powered applications and agents with React, Next.js, Vue, Svelte, Node.js, and more.
  - https://sdk.vercel.ai/docs/reference
    - API Reference
https://vercel.com/templates?type=ai
- Find your Template Jumpstart your app development process with pre-built solutions from Vercel and our community.

GenKit

https://github.com/firebase/genkit
- Genkit is a framework for building AI-powered applications. It provides open source libraries for Node.js and Go, along with tools to help you debug and iterate quickly.
- Genkit is built for developers seeking to add generative AI to their apps with Node.js or Go, and can run anywhere these runtimes are supported. It's designed around a plugin architecture that can work with any generative model API or vector database, with many integrations already available.
- While developed by the Firebase team, Genkit can be used independently of Firebase or Google Cloud services.
https://firebase.google.com/docs/genkit
- Genkit - JS
- Genkit is an open-source TypeScript toolkit designed to help you build AI-powered features in web and mobile apps.
  
  It offers a unified interface for integrating AI models from Google, OpenAI, Anthropic, Ollama, and more, so you can explore and choose the best models for your needs. Genkit simplifies AI development with streamlined APIs for multimodal content generation, structured data generation, tool calling, human-in-the-loop, and other advanced capabilities.
  
  Whether you're building chatbots, intelligent agents, workflow automations, or recommendation systems, Genkit handles the complexity of AI integration so you can focus on creating incredible user experiences.
https://firebase.google.com/docs/genkit-go/get-started-go
- GenKit - Go
- Get started with Genkit using Go

LangChain, LangServe, LangSmith, LangFlow, LangGraph, etc

https://github.com/langchain-ai/langchain
- Building applications with LLMs through composability
- LangChain is a framework for developing applications powered by language models.
- Looking for the JS/TS library? Check out LangChain.js
- https://www.langchain.com/
- https://python.langchain.com/docs/get_started/introduction
https://github.com/langchain-ai/langchainjs
- LangChain.js
- Building applications with LLMs through composability
- This is built to integrate as seamlessly as possible with the LangChain Python package. Specifically, this means all objects (prompts, LLMs, chains, etc) are designed in a way where they can be serialized and shared between languages.
  
  The LangChainHub is a central place for the serialized versions of these prompts, chains, and agents.
- https://js.langchain.com/docs/get_started/introduction
- https://blog.cloudflare.com/langchain-and-cloudflare/
  - Using LangChainJS and Cloudflare Workers together
https://github.com/hwchase17/langchain-hub
- Taking inspiration from Hugging Face Hub, LangChainHub is collection of all artifacts useful for working with LangChain primitives such as prompts, chains and agents. The goal of this repository is to be a central resource for sharing and discovering high quality prompts, chains and agents that combine together to form complex LLM applications.
  
  We are starting off the hub with a collection of prompts, and we look forward to the LangChain community adding to this collection. We hope to expand to chains and agents shortly.
- This repo is getting replaced by our hosted LangChain Hub Product! Visit it at https://smith.langchain.com/hub
https://github.com/langchain-ai/langserve
- LangServe helps developers deploy LangChain runnables and chains as a REST API.
  
  This library is integrated with FastAPI and uses pydantic for data validation.
  
  In addition, it provides a client that can be used to call into runnables deployed on a server. A javascript client is available in LangChainJS.
https://www.langchain.com/langsmith
- Build and deploy LLM apps with confidence An all-in-one developer platform for every step of the application lifecycle.
- A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
https://github.com/logspace-ai/langflow
- Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows.
- https://www.langflow.org/
- https://github.com/logspace-ai/langflow_examples
  - Examples for LangFlow
- https://github.com/logspace-ai/langflow-embedded-chat
  - The Langflow Embedded Chat is a powerful web component that enables seamless communication with the Langflow. This widget provides a chat interface, allowing you to integrate Langflow into your web applications effortlessly.
https://www.langchain.com/langgraph
- LangGraph
- Balance agent control with agency
- Gain control with LangGraph to design agents that reliably handle complex tasks. Build and scale agentic applications with LangGraph Platform.
- https://github.com/langchain-ai/langgraph
  - LangGraph — used by Replit, Uber, LinkedIn, GitLab and more — is a low-level orchestration framework for building controllable agents. While langchain provides integrations and composable components to streamline LLM application development, the LangGraph library enables agent orchestration — offering customizable architectures, long-term memory, and human-in-the-loop to reliably handle complex tasks.
- https://langchain-ai.github.io/langgraph/
  - https://langchain-ai.github.io/langgraphjs/tutorials/quickstart/
    - LangGraph.js - Quickstart
https://github.com/langfuse/langfuse
- Langfuse is the open source LLM engineering platform
- https://langfuse.com/
  - https://langfuse.com/docs
    - Langfuse is an open source LLM engineering platform to help teams collaboratively debug, analyze and iterate on their LLM Applications.
- https://docs.langflow.org/guides/langfuse_integration
  - Integrating Langfuse with Langflow

AI Agents / Assistants / etc

Agent Benchmarks / Leaderboards

See also:
- Benchmarks / Leaderboards
https://github.com/zhangxjohn/LLM-Agent-Benchmark-List
- LLM-Agent-Benchmark-List A benchmark list for evaluation of large language models.
https://github.com/THUDM/AgentBench
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
- https://llmbench.ai/agent
https://aider.chat/docs/leaderboards/
- Aider LLM Leaderboards
https://github.com/princeton-nlp/SWE-bench
- [ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
- SWE-bench is a benchmark for evaluating large language models on real world software issues collected from GitHub. Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.
- https://www.swebench.com/
  - https://openai.com/index/introducing-swe-bench-verified/
    - Introducing SWE-bench Verified We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.
  - https://www.swebench.com/lite.html
    - SWE-bench Lite A Canonical Subset for Efficient Evaluation of Language Models as Software Engineers
    - SWE-bench was designed to provide a diverse set of codebase problems that were verifiable using in-repo unit tests. The full SWE-bench test split comprises 2,294 issue-commit pairs across 12 python repositories.
      
      Since its release, we've found that for most systems evaluating on SWE-bench, running each instance can take a lot of time and compute. We've also found that SWE-bench can be a particularly difficult benchmark, which is useful for evaluating LMs in the long term, but discouraging for systems trying to make progress in the short term.
      
      To remedy these issues, we've released a canonical subset of SWE-bench called SWE-bench Lite. SWE-bench Lite comprises 300 instances from SWE-bench that have been sampled to be more self-contained, with a focus on evaluating functional bug fixes. SWE-bench Lite covers 11 of the original 12 repositories in SWE-bench, with a similar diversity and distribution of repositories as the original. We perform similar filtering on the SWE-bench dev set to provide 23 development instances that can be useful for active development on the SWE-bench task. We recommend future systems evaluating on SWE-bench to report numbers on SWE-bench Lite in lieu of the full SWE-bench set if necessary.
https://github.com/aorwall/SWE-bench-docker
- A Docker based solution of the SWE-bench evaluation framework
- This is a Dockerfile based solution of the SWE-Bench evaluation framework.
  
  The solution is designed so that each "testbed" for testing a version of a repository is built in a separate Docker image. Each test is then run in its own Docker container. This approach ensures more stable test results because the environment is completely isolated and is reset for each test. Since the Docker container can be recreated each time, there's no need for reinstallation, speeding up the benchmark process.
https://multi-swe-bench.github.io/
- Multi-SWE-bench
- A Multilingual Benchmark for Issue Resolving
- https://github.com/multi-swe-bench/multi-swe-bench
  - Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
  - We are extremely delighted to release Multi-SWE-bench! Multi-SWE-bench addresses the lack of multilingual benchmarks for evaluating LLMs in real-world code issue resolution. Unlike existing Python-centric benchmarks (e.g., SWE-bench), our framework spans 7 languages (i.e., Java, TypeScript, JavaScript, Go, Rust, C, and C++) with 1,632 high-quality instances, curated from 2,456 candidates by 68 expert annotators for reliability.
- https://arxiv.org/abs/2504.02605
  - Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving (April, 2025)
https://liveswebench.ai/
- LiveSWEBench
- A Challenging, Contamination-Free Benchmark for AI Software Engineers
- https://github.com/livebench/liveswebench
  - LiveSWEBench
  - LiveSWEBench is a benchmark for evaluating the utility of AI coding assistants in real-world software engineering tasks, at varying levels of developer involvement. Given a real-world codebase and issue, we investigate the following questions:
    - How useful are AI coding assistants at completing tasks with no developer involvement?
    - How useful are AI coding assistants at completing tasks with some developer involvement (i.e. writing prompts)?
    - How useful are AI coding assistants at aiding in the completion of tasks with high developer involvement (i.e. writing code)?
- https://livebench.ai/
  - LiveBench
  - A Challenging, Contamination-Free LLM Benchmark

OpenAI Assistants / ChatGPT custom GPTs

https://openai.com/blog/introducing-gpts
- Introducing GPTs You can now create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills.
- We’re rolling out custom versions of ChatGPT that you can create for a specific purpose—called GPTs. GPTs are a new way for anyone to create a tailored version of ChatGPT to be more helpful in their daily life, at specific tasks, at work, or at home—and then share that creation with others.
https://platform.openai.com/docs/assistants/overview
- The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries.

OpenGPTs

https://github.com/langchain-ai/opengpts
- This is an open source effort to create a similar experience to OpenAI's GPTs. It builds upon LangChain, LangServe and LangSmith. OpenGPTs gives you more control, allowing you to configure:
  
  The LLM you use (choose between the 60+ that LangChain offers)
  - The prompts you use (use LangSmith to debug those)
  - The tools you give it (choose from LangChain's 100+ tools, or easily write your own)
  - The vector database you use (choose from LangChain's 60+ vector database integrations)
  - The retrieval algorithm you use
  - The chat history database you use

GitHub Copilot Agent / Workspace / CLI

https://github.com/features/copilot
- GitHub Copilot
- https://code.visualstudio.com/docs/copilot/chat/chat-agent-mode
  - Use agent mode in VS Code With chat agent mode in Visual Studio Code, you can use natural language to define a high-level task and to start an agentic code editing session to accomplish that task. In agent mode, Copilot autonomously plans the work needed and determines the relevant files and context. It then makes edits to your codebase and invokes tools to accomplish the request you made. Agent mode monitors the outcome of edits and tools and iterates to resolve any issues that arise.
https://githubnext.com/projects/copilot-workspace
- Copilot Workspace A Copilot-native dev environment, designed for everyday tasks.
- https://github.com/githubnext/copilot-workspace-user-manual
  - The user manual for GitHub Copilot Workspace
- https://github.blog/2024-04-29-github-copilot-workspace/
  - GitHub Copilot Workspace: Welcome to the Copilot-native developer environment We’re redefining the developer environment with GitHub Copilot Workspace - where any developer can go from idea, to code, to software all in natural language.
https://githubnext.com/projects/copilot-cli/
- Copilot for CLI Ever having trouble remembering shell commands and flags for this or that? Ever wish you could just say what you want the shell to do? Don't worry: we're building GitHub Copilot assistance right into your terminal.
- https://docs.github.com/en/copilot/using-github-copilot/using-github-copilot-in-the-command-line
  - Using GitHub Copilot in the command line You can use Copilot with the GitHub CLI to get suggestions and explanations for the command line.
- https://docs.github.com/en/copilot/responsible-use-of-github-copilot-features/responsible-use-of-github-copilot-in-the-cli
  - Responsible use of GitHub Copilot in the CLI Learn how to use GitHub Copilot in the CLI responsibly by understanding its purposes, capabilities, and limitations.

Claude Code

https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview
- Claude Code overview
- Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster through natural language commands. By integrating directly with your development environment, Claude Code streamlines your workflow without requiring additional servers or complex setup.
https://github.com/anthropics/claude-code
- Claude Code (Research Preview)
- Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.
  
  Some of its key capabilities include:
  - Edit files and fix bugs across your codebase
  - Answer questions about your code's architecture and logic
  - Execute and fix tests, lint, and other commands
  - Search through git history, resolve merge conflicts, and create commits and PRs
https://github.com/anthropics/claude-cookbooks/tree/main
- Claude Cookbooks
- A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
- The Claude Cookbooks provide code and guides designed to help developers build with Claude, offering copy-able code snippets that you can easily integrate into your own projects.
https://github.com/anthropics/skills
- Skills
- Public repository for Skills
- Skills are folders of instructions, scripts, and resources that Claude loads dynamically to improve performance on specialized tasks. Skills teach Claude how to complete specific tasks in a repeatable way, whether that's creating documents with your company's brand guidelines, analyzing data using your organization's specific workflows, or automating personal tasks.

OpenAI Codex CLI

https://github.com/openai/codex
- OpenAI Codex CLI
- Lightweight coding agent that runs in your terminal
https://help.openai.com/en/articles/11096431-openai-codex-cli-getting-started
- OpenAI Codex CLI – Getting Started
- OpenAI Codex CLI is an open‑source command‑line tool that brings the power of our latest reasoning models directly to your terminal. It acts as a lightweight coding agent that can read, modify, and run code on your local machine to help you build features faster, squash bugs, and understand unfamiliar code. Because the CLI runs locally, your source code never leaves your environment unless you choose to share it.
https://openai.com/index/introducing-o3-and-o4-mini/#codex-cli-frontier-reasoning-in-the-terminal
- Codex CLI: frontier reasoning in the terminal
  
  We’re also sharing a new experiment: Codex CLI, a lightweight coding agent you can run from your terminal. It works directly on your computer and is designed to maximize the reasoning capabilities of models like o3 and o4-mini, with upcoming support for additional API models like GPT‑4.1⁠.
  
  You can get the benefits of multimodal reasoning from the command line by passing screenshots or low fidelity sketches to the model, combined with access to your code locally. We think of it as a minimal interface to connect our models to users and their computers. Codex CLI is fully open-source at github.com/openai/codex⁠(opens in a new window) today.
  
  Alongside, we are launching a $1 million initiative to support projects using Codex CLI and OpenAI models. We will evaluate and accept applications for grants in increments of $25,000 USD in the form of API credits. Proposals can be submitted here.
- https://openai.com/form/codex-open-source-fund/
  - Codex open source fund We’re excited to launch a $1 million initiative supporting open source projects to use Codex CLI and OpenAI models. Applications will be reviewed on an ongoing basis, with projects receiving grants in increments of $25,000 in API credits.
    
    If you’re interested in participating, please fill out the form below.
https://docs.litellm.ai/docs/tutorials/openai_codex
- Using LiteLLM with OpenAI Codex This guide walks you through connecting OpenAI Codex to LiteLLM.
- BerriAI/litellm#10156
  - LiteLLM / OpenAI Codex Discussion and Support
https://github.com/ymichael/open-codex
- Open Codex CLI
- Important Note: This is a fork of the original OpenAI Codex CLI with expanded model support and changed installation instructions. The main differences in this fork are:
  - Support for multiple AI providers (OpenAI, Gemini, OpenRouter, Ollama)
  - Uses the Chat Completion API instead of the Responses API which allows us to support any openai compatible provider and model.
  - All other functionality remains similar to the original project
  - You can install this fork globally with npm i -g open-codex
https://github.com/lolrazh/codex
- OpenAI Codex CLI (Open Responses Fork)
- A fork of openai/codex integrated with Julep's Open Responses API

aider

https://github.com/paul-gauthier/aider
- aider is AI pair programming in your terminal Aider is a command line tool that lets you pair program with GPT-3.5/GPT-4, to edit code stored in your local git repository. Aider will directly edit the code in your local source files, and git commit the changes with sensible commit messages. You can start a new project or work with an existing git repo. Aider is unique in that it lets you ask for changes to pre-existing, larger codebases.
- https://aider.chat/
  - https://aider.chat/blog/
    - https://aider.chat/2023/10/22/repomap.html
      - Building a better repository map with tree sitter
    - https://aider.chat/2023/12/21/unified-diffs.html
      - Unified diffs make GPT-4 Turbo 3X less lazy

llm

https://github.com/simonw/llm
- Access large language models from the command-line
- https://llm.datasette.io/
  - LLM A CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine.
    
    Run prompts from the command-line, store the results in SQLite, generate embeddings and more.
  - https://llm.datasette.io/en/stable/openai-models.html
    - OpenAI models LLM ships with a default plugin for talking to OpenAI’s API. OpenAI offer both language models and embedding models, and LLM can access both types.
  - https://llm.datasette.io/en/stable/other-models.html
    - Other models LLM supports OpenAI models by default. You can install plugins to add support for other models. You can also add additional OpenAI-API-compatible models using a configuration file.
    - Installing and using a local model LLM plugins can provide local models that run on your machine.
      
      To install llm-gpt4all, providing 17 models from the GPT4All project, run this:
      
      llm install llm-gpt4all
      
      Run llm models to see the expanded list of available models.
  - https://llm.datasette.io/en/stable/embeddings/cli.html
    - Embedding with the CLI LLM provides command-line utilities for calculating and storing embeddings for pieces of content.
    - llm embed The llm embed command can be used to calculate embedding vectors for a string of content. These can be returned directly to the terminal, stored in a SQLite database, or both.
    - Storing embeddings in SQLite Embeddings are much more useful if you store them somewhere, so you can calculate similarity scores between different embeddings later on.
      
      LLM includes the concept of a collection of embeddings. A collection groups together a set of stored embeddings created using the same model, each with a unique ID within that collection.
      
      Embeddings also store a hash of the content that was embedded. This hash is later used to avoid calculating duplicate embeddings for the same content.
    - Storing content and metadata By default, only the entry ID and the embedding vector are stored in the database table.
      
      You can store a copy of the original text in the content column by passing the --store option
  - You can also store a JSON object containing arbitrary metadata in the metadata column by passing the --metadata option.
  - llm embed-multi The llm embed command embeds a single string at a time.
    
    llm embed-multi can be used to embed multiple strings at once, taking advantage of any efficiencies that the embedding model may provide when processing multiple strings.
    
    This command can be called in one of three ways:
    - With a CSV, TSV, JSON or newline-delimited JSON file
    - With a SQLite database and a SQL query
    - With one or more paths to directories, each accompanied by a glob pattern
  - Embedding data from a SQLite database You can embed data from a SQLite database using --sql, optionally combined with --attach to attach an additional database.
  - Embedding data from files in directories LLM can embed the content of every text file in a specified directory, using the file’s path and name as the ID.
  - llm similar The llm similar command searches a collection of embeddings for the items that are most similar to a given or item ID.
    
    This currently uses a slow brute-force approach which does not scale well to large collections. See issue 216 for plans to add a more scalable approach via vector indexes provided by plugins.
    - simonw/llm#216
      - Support for plugins that implement vector indexes
  - You can compare against text stored in a file using -i filename
  - When using a model like CLIP, you can find images similar to an input image using -i filename with --binary
  - llm embed-models To list all available embedding models, including those provided by plugins, run this command:
```
llm embed-models
```
  - llm collections list To list all of the collections in the embeddings database, run this command:
```
llm collections list
```
  - https://llm.datasette.io/en/stable/embeddings/writing-plugins.html
    - Writing plugins to add new embedding models Read the plugin tutorial for details on how to develop and package a plugin.
      
      This page shows an example plugin that implements and registers a new embedding model.
      
      There are two components to an embedding model plugin:
      
      An implementation of the register_embedding_models() hook, which takes a register callback function and calls it to register the new model with the LLM plugin system.
      
      A class that extends the llm.EmbeddingModel abstract base class. The only required method on this class is embed_batch(texts), which takes an iterable of strings and returns an iterator over lists of floating point numbers.
    - Embedding binary content If your model can embed binary content, use the supports_binary property to indicate that
    - If your model accepts binary, your .embed_batch() model may be called with a list of Python bytestrings. These may be mixed with regular strings if the model accepts both types of input.
  - https://llm.datasette.io/en/stable/plugins/installing-plugins.html
    - Installing plugins Plugins must be installed in the same virtual environment as LLM itself.
      
      You can find names of plugins to install in the plugin directory
      
      Use the llm install command (a thin wrapper around pip install) to install plugins in the correct environment
  - https://llm.datasette.io/en/stable/plugins/directory.html#plugin-directory
    - Plugin directory The following plugins are available for LLM.
  - https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html
    - Writing a plugin to support a new model This tutorial will walk you through developing a new plugin for LLM that adds support for a new Large Language Model.
  - https://llm.datasette.io/en/stable/aliases.html
    - Model aliases LLM supports model aliases, which allow you to refer to a model by a short name instead of its full ID.
    - Listing aliases To list current aliases, run this:
      
      llm aliases
    - Adding a new alias The llm aliases set <alias> <model-id> command can be used to add a new alias
    - Removing an alias The llm aliases remove <alias> command will remove the specified alias
  - Viewing the aliases file Aliases are stored in an aliases.json file in the LLM configuration directory.
    
    To see the path to that file, run this:
```
llm aliases path
```
    To view the content of that file, run this:
```
cat "$(llm aliases path)"
```
  - https://llm.datasette.io/en/stable/python-api.html
    - Python API LLM provides a Python API for executing prompts, in addition to the command-line interface.
      
      Understanding this API is also important for writing Plugins.
  - https://llm.datasette.io/en/stable/templates.html
    - Prompt templates Prompt templates can be created to reuse useful prompts with different input data.
  - https://llm.datasette.io/en/stable/logging.html
    - Logging to SQLite llm defaults to logging all prompts and responses to a SQLite database.
      
      You can find the location of that database using the llm logs path command
    - To avoid logging an individual prompt, pass --no-log or -n to the command
    - To turn logging by default off: llm logs off
  - https://llm.datasette.io/en/stable/related-tools.html
    - Related tools The following tools are designed to be used with LLM:
      - https://llm.datasette.io/en/stable/related-tools.html#strip-tags
        
        strip-tags strip-tags is a command for stripping tags from HTML. This is useful when working with LLMs because HTML tags can use up a lot of your token budget.
        
        Here’s how to summarize the front page of the New York Times, by both stripping tags and filtering to just the elements with class="story-wrapper":
        
        curl -s https://www.nytimes.com/ \ | strip-tags .story-wrapper \ | llm -s 'summarize the news'
      - https://llm.datasette.io/en/stable/related-tools.html#ttok
        
        ttok ttok is a command-line tool for counting OpenAI tokens. You can use it to check if input is likely to fit in the token limit for GPT 3.5 or GPT4
        
        It can also truncate input down to a desired number of tokens
      - https://llm.datasette.io/en/stable/related-tools.html#symbex
        
        Symbex Symbex is a tool for searching for symbols in Python codebases. It’s useful for extracting just the code for a specific problem and then piping that into LLM for explanation, refactoring or other tasks.
        
        It can also be used to export symbols in a format that can be piped to llm embed-multi in order to create embeddings
        
        Based on how Symbex is described, I think grep-ast might be able to do a similar job, but across any language supported by tree-sitter, and not just python:
        
        https://github.com/paul-gauthier/grep-ast
        
        grep-ast Grep soure code files and see matching lines with useful context that show how they fit into the code. See the loops, functions, methods, classes, etc that contain all the matching lines. Get a sense of what's inside a matched class or function definition. You see relevant code from every layer of the abstract syntax tree, above and below the matches.
- https://simonwillison.net/tags/llm/
  - https://simonwillison.net/2023/Apr/4/llm/
    - Weeknotes: A new llm CLI tool, plus automating my weeknotes and newsletter
    - The llm CLI tool This is one new piece of software I’ve released in the past few weeks that I haven’t written about yet.
      
      I built the first version of llm, a command-line tool for running prompts against large language model (currently just ChatGPT and GPT-4), getting the results back on the command-line and also storing the prompt and response in a SQLite database.
  - https://simonwillison.net/2023/May/18/cli-tools-for-llms/
    - llm, ttok and strip-tags—CLI tools for working with ChatGPT and other LLMs I’ve been building out a small suite of command-line tools for working with ChatGPT, GPT-4 and potentially other language models in the future.
      
      The three tools I’ve built so far are:
      
      llm — a command-line tool for sending prompts to the OpenAI APIs, outputting the response and logging the results to a SQLite database. I introduced that a few weeks ago.
      
      ttok — a tool for counting and truncating text based on tokens
      
      strip-tags — a tool for stripping HTML tags from text, and optionally outputting a subset of the page based on CSS selectors
      
      The idea with these tools is to support working with language model prompts using Unix pipes.
  - https://simonwillison.net/2023/Jun/18/symbex/
    - Symbex: search Python code for functions and classes, then pipe them into a LLM I just released a new Python CLI tool called Symbex. It’s a search tool, loosely inspired by ripgrep, which lets you search Python code for functions and classes by name or wildcard, then see just the source code of those matching entities.
  - https://simonwillison.net/2023/Jul/12/llm/
    - My LLM CLI tool now supports self-hosted language models via plugins LLM is my command-line utility and Python library for working with large language models such as GPT-4. I just released version 0.5 with a huge new feature: you can now install plugins that add support for additional models to the tool, including models that can run on your own hardware.
  - https://simonwillison.net/2023/Sep/4/llm-embeddings/
    - LLM is my Python library and command-line tool for working with language models. I just released LLM 0.9 with a new set of features that extend LLM to provide tools for working with embeddings.
    - An embedding model lets you take a string of text—a word, sentence, paragraph or even a whole document—and turn that into an array of floating point numbers called an embedding vector.
    - A model will always produce the same length of array—1,536 numbers for the OpenAI embedding model, 384 for all-MiniLM-L6-v2—but the array itself is inscrutable. What are you meant to do with it? The answer is that you can compare them. I like to think of an embedding vector as a location in 1,536-dimensional space. The distance between two vectors is a measure of how semantically similar they are in meaning, at least according to the model that produced them.
    - Things you can do with embeddings include:
      
      Find related items. I use this on my TIL site to display related articles, as described in Storing and serving related documents with openai-to-sqlite and embeddings.
      
      Build semantic search. As shown above, an embeddings-based search engine can find content relevant to the user’s search term even if none of the keywords match.
      
      Implement retrieval augmented generation—the trick where you take a user’s question, find relevant documentation in your own corpus and use that to get an LLM to spit out an answer. More on that here.
      
      Clustering: you can find clusters of nearby items and identify patterns in a corpus of documents.
      
      Classification: calculate the embedding of a piece of text and compare it to pre-calculated “average” embeddings for different categories.
    - My goal with LLM is to provide a plugin-driven abstraction around a growing collection of language models. I want to make installing, using and comparing these models as easy as possible. The new release adds several command-line tools for working with embeddings, plus a new Python API for working with embeddings in your own code. It also adds support for installing additional embedding models via plugins.
  - https://simonwillison.net/2024/Mar/26/llm-cmd/
    - I just released a neat new plugin for my LLM command-line tool: llm-cmd. It lets you run a command to to generate a further terminal command, review and edit that command, then hit <enter> to execute it or <ctrl-c> to cancel.

Continue - Custom AI Code Assistant

https://github.com/continuedev/continue
- Continue enables developers to create, share, and use custom AI code assistants with our open-source VS Code and JetBrains extensions and hub of models, rules, prompts, docs, and other building blocks
https://www.continue.dev/
- Amplified developers, AI-native development
- Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
- Make your own custom AI code assistants
- https://www.continue.dev/amplified
  - Amplified developers, AI-enhanced development
  - We've worked together with dozens of engineers, platform teams, and others to sketch a path toward a future where developers are amplified, not automated. You can read, suggest edits, and add support for the open document at amplified.dev.
  - https://amplified.dev/
  - We believe in a future where developers are amplified, not automated
- https://docs.continue.dev/
  - Continue enables developers to create, share, and use custom AI code assistants with our open-source VS Code and JetBrains extensions and hub of models, rules, prompts, docs, and other building blocks
  - https://docs.continue.dev/reference
    - config.yaml Reference
    - Continue hub assistants are defined using the config.yaml specification. Assistants can be loaded from the Hub or locally
- https://hub.continue.dev/explore/assistants
  - Assistants Custom AI code assistants are configurations of building blocks that enable you to receive assistance tailored to your specific use cases
  - Explore custom AI code assistants
  - https://docs.continue.dev/hub/assistants/intro
    - Introduction to Assistants
    - Custom AI code assistants are configurations of building blocks that enable a coding experience tailored to your specific use cases.
      
      config.yaml is a format for defining custom AI code assistants. An assistant has some top-level properties (e.g. name, version), but otherwise consists of composable lists of blocks such as models and rules, which are the atomic building blocks of an assistant.
      
      The config.yaml is parsed by the open-source Continue IDE extensions to create custom assistant experiences. When you log in to hub.continue.dev, your assistants will automatically be synced with the IDE extensions.
https://blog.continue.dev/
- The custom AI code assistant blog
- https://blog.continue.dev/continue-1-0/
  - Continue 1.0
  - With Continue 1.0, our community has helped us define seven building blocks so far: models, rules, context, docs, prompts, data, and MCP. These blocks will evolve over time, and new ones will emerge, as developers determine the customizations they want and need.
  - With Continue 1.0, we are standardizing on config.yaml as our packaging format. Our approach to packaging configurations is through an open format that we plan to evolve over time.
- https://blog.continue.dev/transforming-code-search-with-voyage-ai-why-your-continue-assistant-needs-better-embeddings-and-reranking/
  - Transforming Code Search with Voyage AI: Why Your Continue Assistant Needs Better Embedding Models and Rerankers

Cline

https://github.com/cline/cline
- Cline
- Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.

Roo-Code (formerly Roo Cline)

https://roocode.com/
- Your AI-Powered Dev Team, Right in Your Editor.
- Supercharge your editor with AI that understands your codebase, streamlines development, and helps you write, refactor, and debug with ease.
- https://roocode.com/evals
  - Evals
  - Roo Code tests each frontier model against a suite of hundreds of exercises across 5 programming languages with varying difficulty. These results can help you find the right price-to-intelligence ratio for your use case.
https://github.com/RooVetGit/Roo-Code
- Roo Code (prev. Roo Cline) gives you a whole dev team of AI agents in your code editor.
- Roo Code is an AI-powered autonomous coding agent that lives in your editor. It can:
  - Communicate in natural language
  - Read and write files directly in your workspace
  - Run terminal commands
  - Automate browser actions
  - Integrate with any OpenAI-compatible or custom API/model
  - Adapt its “personality” and capabilities through Custom Modes
  Whether you’re seeking a flexible coding partner, a system architect, or specialized roles like a QA engineer or product manager, Roo Code can help you build software more efficiently.

Autogen / FLAML / etc

https://github.com/microsoft/autogen
- Enable Next-Gen Large Language Model Applications.
- AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.
- - AutoGen enables building next-gen LLM applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses.
  - It supports diverse conversation patterns for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, the number of agents, and agent conversation topology.
  - It provides a collection of working systems with different complexities. These systems span a wide range of applications from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns.
  - AutoGen provides enhanced LLM inference. It offers utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc.
- Roadmap: https://github.com/orgs/microsoft/projects/989/views/3
- https://github.com/microsoft/autogen#multi-agent-conversation-framework
  - Autogen enables the next-gen LLM applications with a generic multi-agent conversation framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans. By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.
- https://microsoft.github.io/autogen/blog/
- https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenAssistant/
  - AutoGen Assistant: Interactively Explore Multi-Agent Workflows
  - To help you rapidly prototype multi-agent solutions for your tasks, we are introducing AutoGen Assistant, an interface powered by AutoGen. It allows you to:
    - Declaratively define and modify agents and multi-agent workflows through a point and click, drag and drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task).
    - Use our UI to create chat sessions with the specified agents and view results (e.g., view chat history, generated files, and time taken).
    - Explicitly add skills to your agents and accomplish more tasks.
    - Publish your sessions to a local gallery.
    - AutoGen Assistant is open source, give it a try!
  - we are thrilled to introduce a new user-friendly interface: the AutoGen Assistant. Built upon the leading foundation of AutoGen and robust, modern web technologies like React.
  - With the AutoGen Assistant, users can rapidly create, manage, and interact with agents that can learn, adapt, and collaborate. As we release this interface into the open-source community, our ambition is not only to enhance productivity but to inspire a level of personalized interaction between humans and agents.
  - We recommend using a virtual environment (e.g., conda) to avoid conflicts with existing Python packages. With Python 3.10 or newer active in your virtual environment, use pip to install AutoGen Assistant: pip install autogenra
  - Once installed, run the web UI by entering the following in your terminal: autogenra ui --port 8081. This will start the application on the specified port. Open your web browser and go to http://localhost:8081/ to begin using AutoGen Assistant.
  - The AutoGen Assistant proposes some high-level concepts that help compose agents to solve tasks.
    - Agent Workflow: An agent workflow is a specification of a set of agents that can work together to accomplish a task. The simplest version of this is a setup with two agents – a user proxy agent (that represents a user i.e. it compiles code and prints result) and an assistant that can address task requests (e.g., generating plans, writing code, evaluating responses, proposing error recovery steps, etc.). A more complex flow could be a group chat where even more agents work towards a solution.
    - Session: A session refers to a period of continuous interaction or engagement with an agent workflow, typically characterized by a sequence of activities or operations aimed at achieving specific objectives. It includes the agent workflow configuration, the interactions between the user and the agents. A session can be “published” to a “gallery”.
    - Skills: Skills are functions (e.g., Python functions) that describe how to solve a task. In general, a good skill has a descriptive name (e.g. generate_images), extensive docstrings and good defaults (e.g., writing out files to disk for persistence and reuse). You can add new skills to the AutoGen Assistant via the provided UI. At inference time, these skills are made available to the assistant agent as they address your tasks.
    AutoGen Assistant comes with 3 example skills: fetch_profile, find_papers, generate_images. Please feel free to review the repo to learn more about how they work.
  - While the AutoGen Assistant is a web interface, it is powered by an underlying python API that is reusable and modular. Importantly, we have implemented an API where agent workflows can be declaratively specified (in JSON), loaded and run.
- https://microsoft.github.io/autogen/blog/2023/11/26/Agent-AutoBuild/
  - Agent AutoBuild - Automatically Building Multi-agent Systems
  - Introducing AutoBuild, building multi-agent system automatically, fast, and easily for complex tasks with minimal user prompt required, powered by a new designed class AgentBuilder. AgentBuilder also supports open-source LLMs by leveraging vLLM and FastChat.
  - In this blog, we introduce AutoBuild, a pipeline that can automatically build multi-agent systems for complex tasks. Specifically, we design a new class called AgentBuilder, which will complete the generation of participant expert agents and the construction of group chat automatically after the user provides descriptions of a building task and an execution task.
  - AutoBuild supports open-source LLM by vLLM and FastChat.
  - OpenAI Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. AutoBuild also supports the assistant API by adding use_oai_assistant=True to build().
- https://microsoft.github.io/autogen/blog/2023/11/20/AgentEval/
  - How to Assess Utility of LLM-powered Applications?
  - As a developer of an LLM-powered application, how can you assess the utility it brings to end users while helping them with their tasks?
  - We introduce AgentEval — the first version of the framework to assess the utility of any LLM-powered application crafted to assist users in specific tasks. AgentEval aims to simplify the evaluation process by automatically proposing a set of criteria tailored to the unique purpose of your application. This allows for a comprehensive assessment, quantifying the utility of your application against the suggested criteria.
- https://microsoft.github.io/autogen/blog/2023/11/13/OAI-assistants/
  - AutoGen Meets GPTs
  - OpenAI assistants are now integrated into AutoGen via GPTAssistantAgent. This enables multiple OpenAI assistants, which form the backend of the now popular GPTs, to collaborate and tackle complex tasks.
- https://microsoft.github.io/autogen/blog/2023/11/09/EcoAssistant/
  - EcoAssistant - Using LLM Assistants More Accurately and Affordably
  - TL;DR:
    - Introducing the EcoAssistant, which is designed to solve user queries more accurately and affordably.
    - We show how to let the LLM assistant agent leverage external API to solve user query.
    - We show how to reduce the cost of using GPT models via Assistant Hierachy.
    - We show how to leverage the idea of Retrieval-augmented Generation (RAG) to improve the success rate via Solution Demonstration.
- https://microsoft.github.io/autogen/blog/2023/11/06/LMM-Agent/
  - Multimodal with GPT-4V and LLaVA
  - This blog post and the latest AutoGen update concentrate on visual comprehension. Users can input images, pose questions about them, and receive text-based responses from these LMMs. We support the gpt-4-vision-preview model from OpenAI and LLaVA model from Microsoft now.
- https://microsoft.github.io/autogen/blog/2023/10/26/TeachableAgent/
  - AutoGen's TeachableAgent
  - We introduce TeachableAgent (which uses TextAnalyzerAgent) so that users can teach their LLM-based assistants new facts, preferences, and skills.
- https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat/
  - Retrieval-Augmented Generation (RAG) Applications with AutoGen
  - TL;DR:
    - We introduce RetrieveUserProxyAgent and RetrieveAssistantAgent, RAG agents of AutoGen that allows retrieval-augmented generation, and its basic usage.
    - We showcase customizations of RAG agents, such as customizing the embedding function, the text split function and vector database.
    - We also showcase two advanced usage of RAG agents, integrating with group chat and building a Chat application with Gradio.
https://github.com/microsoft/FLAML
- A Fast Library for Automated Machine Learning & Tuning
- FLAML is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflow based on large language models, machine learning models, etc. and optimizes their performance.
  - FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and augments their weakness.
  - For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range.
  - It supports fast and economical automatic tuning (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations), capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.
- Heads-up: We have migrated AutoGen into a dedicated github repository. Alongside this move, we have also launched a dedicated Discord server and a website for comprehensive documentation.

OpenHands (formerly OpenDevin)

https://www.all-hands.dev/
- Open Source Agents for Developers
https://github.com/All-Hands-AI
- All Hands AI We build AI software development agents for everyone, in the open.
https://github.com/All-Hands-AI/OpenHands
- OpenHands: Code Less, Make More
- https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/
  - Introducing OpenDevin CodeAct 1.0, a new State-of-the-art in Coding Agents
  - today we introduce a new state-of-the-art coding agent, OpenDevin CodeAct 1.0, which achieves 21% solve rate on SWE-Bench Lite unassisted, a 17% relative improvement above the previous state-of-the-art posted by SWE-Agent. OpenDevin CodeAct 1.0 is now the default in OpenDevin v0.5
  - We also are working on a new simplified evaluation harness for testing coding agents, which we hope will be easy to use for agent developers and researchers, facilitating comprehensive evaluation and comparison. The current version of the harness is available here (tutorial, harness).
    - Tutorial: https://github.com/OpenDevin/OpenDevin/tree/bl-xw/swe-bench/evaluation/swe_bench
    - Harness: https://github.com/OpenDevin/OD-SWE-bench/tree/eval
  - SWE-Bench is a great benchmark that tests the ability of coding agents to solve real-world github issues on a number of popular repositories. However, due in part to its realism the process of evaluating on SWE-Bench can initially seem daunting.
  - To help make it easy to perform this process in an efficient, stable, and reproducible manner, the OpenDevin team containerized the evaluation environment. This preparation involves setting up all necessary testbeds (codebases at various versions) and their respective conda environments in advance. For each task instance, we initiate a sandbox container where the testbed is pre-configured, ensuring a ready-to-use setup for the agent
  - This supports both SWE-Bench-Lite (a smaller benchmark of 300 issues that is more conducive to quick benchmarking) and SWE-Bench (the full dataset of 2,294 issues, work-in-progress). With our evaluation pipeline, we obtained a replicated SWE-agent resolve score of 17.3% (52 out of 300 test instances) on SWE-Bench-Lite using the released SWE-agent patch predictions, which differs by 2 from the originally reported 18.0% (54 out of 300).
- OpenHands/OpenHands#742
  - Explore using stack graphs for better code search / navigation / context / repo map / etc
https://github.com/All-Hands-AI/openhands-aci
- Agent-Computer Interface (ACI) for OpenHands
- An Agent-Computer Interface (ACI) designed for software development agents OpenHands. This package provides essential tools and interfaces for AI agents to interact with computer systems for software development tasks.
https://github.com/All-Hands-AI/open-operator
- Open Operator
- Open-source resources on agents for computer use.
- What will it take to make a versatile computer use agent that can safely and effectively handle any task?
  
  This is a collection of resources and ideas towards this goal.

SWE-agent

https://github.com/SWE-agent/SWE-agent
- SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models
- SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.
  
  On SWE-bench, SWE-agent resolves 12.29% of issues, achieving the state-of-the-art performance on the full test set.
- Agent-Computer Interface (ACI) We accomplish these results by designing simple LM-centric commands and feedback formats to make it easier for the LM to browse the repository, view, edit and execute code files. We call this an Agent-Computer Interface (ACI) and build the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.
  
  Just like how typical language models requires good prompt engineering, good ACI design leads to much better results when using agents. As we show in our paper, a baseline agent without a well-tuned ACI does much worse than SWE-agent.

ChatDev

https://github.com/OpenBMB/ChatDev
- Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
- Communicative Agents for Software Development
- ChatDev stands as a virtual software company that operates through various intelligent agents holding different roles, including Chief Executive Officer , Chief Product Officer , Chief Technology Officer , programmer , reviewer , tester , art designer . These agents form a multi-agent organizational structure and are united by a mission to "revolutionize the digital world through programming." The agents within ChatDev collaborate by participating in specialized functional seminars, including tasks such as designing, coding, testing, and documenting. The primary objective of ChatDev is to offer an easy-to-use, highly customizable and extendable framework, which is based on large language models (LLMs) and serves as an ideal scenario for studying collective intelligence.
- https://github.com/OpenBMB/ChatDev#-news
  - November 15th, 2023: We launched ChatDev as a SaaS platform that enables software developers and innovative entrepreneurs to build software efficiently at a very low cost and barrier to entry. Try it out at https://chatdev.modelbest.cn/
  - November 2nd, 2023: ChatDev is now supported with a new feature: incremental development, which allows agents to develop upon existing codes. Try --config "incremental" --path "[source_code_directory_path]" to start it.
  - October 26th, 2023: ChatDev is now supported with Docker for safe execution (thanks to contribution from ManindraDeMel). Please see Docker Start Guide.
  - September 25th, 2023: The Git mode is now available, enabling the programmer to utilize Git for version control. To enable this feature, simply set "git_management" to "True" in ChatChainConfig.json. See guide.
  - September 20th, 2023: The Human-Agent-Interaction mode is now available! You can get involved with the ChatDev team by playing the role of reviewer and making suggestions to the programmer ; try python3 run.py --task [description_of_your_idea] --config "Human". See guide and example.
  - September 1st, 2023: The Art mode is available now! You can activate the designer agent to generate images used in the software; try python3 run.py --task [description_of_your_idea] --config "Art". See guide and example.
- https://chatdev.modelbest.cn/

AutoCoder

https://github.com/bin123apple/AutoCoder
- AutoCoder
- We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024). (90.9% vs 90.2%).
  
  Additionally, compared to previous open-source models, AutoCoder offers a new feature: it can automatically install the required packages and attempt to run the code until it deems there are no issues, whenever the user wishes to execute the code.
- https://arxiv.org/abs/2405.14906
  - AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}
  - We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the Human Eval benchmark test (90.9% vs. 90.2%). In addition, AutoCoder offers a more versatile code interpreter compared to GPT-4 Turbo and GPT-4o. It's code interpreter can install external packages instead of limiting to built-in packages. AutoCoder's training data is a multi-turn dialogue dataset created by a system combining agent interaction and external code execution verification, a method we term \textbf{\textsc{AIEV-Instruct}} (Instruction Tuning with Agent-Interaction and Execution-Verified). Compared to previous large-scale code dataset generation methods, \textsc{AIEV-Instruct} reduces dependence on proprietary large models and provides execution-validated code dataset.

OpenCodeInterpreter

https://github.com/OpenCodeInterpreter/OpenCodeInterpreter
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
- OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophisticated proprietary systems like the GPT-4 Code Interpreter. It significantly enhances code generation capabilities by integrating execution and iterative refinement functionalities.
- https://opencodeinterpreter.github.io/
- https://arxiv.org/abs/2402.14658
  - OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  - The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4's 84.2 (76.2) and further elevates to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter brings the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.

OpenInterpreter

https://github.com/KillianLucas/open-interpreter
- OpenInterpreter A natural language interface for computers
- Open Interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing.
  
  This provides a natural-language interface to your computer's general-purpose capabilities
- https://openinterpreter.com/
  - https://docs.openinterpreter.com/introduction
    - https://github.com/KillianLucas/open-interpreter-docs
      - Documentation site for the Open Interpreter project
  - https://changes.openinterpreter.com/
- https://github.com/KillianLucas/open-procedures
  - Tiny, structured coding tutorials that can be searched semantically
  - Open Procedures is an open-source project offering tiny, structured coding tutorials that can be searched semantically. It was created to help code-interpreting language models complete tasks by fetching relevant and up-to-date code snippets.
  - https://open-procedures.replit.app/

Unsorted

https://github.com/holmeswww/agentkit
- AgentKit: Flow Engineering with Graphs, not Coding
- An intuitive LLM prompting framework for multifunctional agents, by explicitly constructing a complex "thought process" from simple natural language prompts.
- AgentKit offers a unified framework for explicitly constructing a complex human "thought process" from simple natural language prompts. The user puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".
  
  Different arrangements of nodes could represent different functionalities, allowing the user to integrate various functionalities to build multifunctional agents.
  
  A basic agent could be implemented as simple as a list of prompts for the subtasks and therefore could be designed and tuned by someone without any programming experience.
https://github.com/CopilotKit/CopilotKit
- CopiloptKit
- A framework for building custom AI Copilots 🤖 in-app AI chatbots, in-app AI Agents, & AI-powered Textareas.
- The Open-Source Copilot Framework Build, deploy, and operate fully custom AI Copilots. in-app AI chatbots, AI agents, and AI Textareas
- https://www.copilotkit.ai/
- https://github.com/CopilotKit/demo-todo
  - This is a demo that showcases using CopilotKit to build a simple Todo app.
  - https://todo-demo-phi.vercel.app/
https://github.com/OpenBMB/AgentVerse
- 🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation
- Task-solving: This framework assembles multiple agents as an automatic multi-agent system (AgentVerse-Tasksolving, Multi-agent as system) to collaboratively accomplish the corresponding tasks. Applications: software development system, consulting system, etc.
- Simulation: This framework allows users to set up custom environments to observe behaviors among, or interact with, multiple agents. Applications: game, social behavior research of LLM-based agents, etc.
- https://arxiv.org/abs/2308.10848
  - AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
  - Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework that can collaboratively and dynamically adjust its composition as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that the framework can effectively deploy multi-agent groups that outperform a single agent. Furthermore, we delve into the emergence of social behaviors among individual agents within a group during collaborative task accomplishment. In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
- https://developer.nvidia.com/blog/building-your-first-llm-agent-application/
  - Building Your First LLM Agent Application
https://gpt.chatcody.com/
- ChatGPT GitHub Empowered assistant Designed for comprehensive repository interaction - from code contributions to read/write operations, reviews and advanced task automation.
- https://chat.openai.com/g/g-jSqTyHBbh-chatcody-github-gitlab-assistant
https://dosu.dev/
- Dosu is an AI teammate that lives in your GitHub repo, helping you respond to issues, triage bugs, and build better documentation.
- How much does Dosu cost? Auto-labeling and backlog grooming are completely free! For Q&A and debugging, Dosu is free for 25 tickets per month. After that, paid plans start at $20 per month. A detailed pricing page is coming soon.
  
  At Dosu, we are strong advocates of OSS. If you maintain a project that is FOSS, part of the Cloud Native Computing Foundation (CNCF), or the Apache Software Foundation (ASF), please reach out to hi@dosu.dev about special free-tier plans
https://github.com/NL2Code/CodeR
- CodeR
- GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28% of issues, in the case of submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.
https://github.com/stitionai/devika
- Devika - Agentic AI Software Engineer
- Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.
https://github.com/geekan/MetaGPT
- MetaGPT: The Multi-Agent Framework Assign different roles to GPTs to form a collaborative entity for complex tasks.
- The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
- https://www.deepwisdom.ai/
  - https://docs.deepwisdom.ai/
https://github.com/Pythagora-io/gpt-pilot
- The first real AI developer
- GPT Pilot is the core technology for the Pythagora VS Code extension that aims to provide the first real AI developer companion. Not just an autocomplete or a helper for PR messages but rather a real AI developer that can write full features, debug them, talk to you about issues, ask for review, etc.
  - https://marketplace.visualstudio.com/items?itemName=PythagoraTechnologies.gpt-pilot-vs-code
    - Pythagora (GPT Pilot) Beta
https://github.com/blarApp/code-base-agent
- Code agents for LLMs This repo introduces a method to represent a local code repository as a graph structure. The objective is to allow an LLM to traverse this graph to understand the code logic and flow. Providing the LLM with the power to debug, refactor, and optimize queries. However, several tasks are yet unexplored.
- https://blar.io/
  - https://blar.io/blog
    - https://blar.io/blog/how-can-you-improve-the-accuracy-of-your-vector-database-and-rag-systems
      - How can you improve the accuracy of your vector database and RAG systems?
    - https://blar.io/blog/vector-database-alternative-graphs
      - Vector Database Alternative: Graphs
https://github.com/cpacker/MemGPT
- MemGPT allows you to build LLM agents with self-editing memory
- Building persistent LLM agents with long-term memory
https://github.com/daveshap/OpenAI_Agent_Swarm
- Hierarchical Autonomous Agent Swarm (HAAS)
- The Hierarchical Autonomous Agent Swarm (HAAS) is a groundbreaking initiative that leverages OpenAI's latest advancements in agent-based APIs to create a self-organizing and ethically governed ecosystem of AI agents. Drawing inspiration from the ACE Framework, HAAS introduces a novel approach to AI governance and operation, where a hierarchy of specialized agents, each with distinct roles and capabilities, collaborate to solve complex problems and perform a wide array of tasks.
  
  The HAAS is designed to be a self-expanding system where a core set of agents, governed by a Supreme Oversight Board (SOB), can design, provision, and manage an arbitrary number of sub-agents tailored to specific needs. This document serves as a comprehensive guide to the theoretical underpinnings, architectural design, and operational principles of the HAAS.
- https://github.com/daveshap/OpenAI_Agent_Swarm/discussions
https://github.com/daveshap/ACE_Framework
- ACE (Autonomous Cognitive Entities) - 100% local and open source autonomous agents
- We will be committed to using 100% open source software (OSS) for this project. This is to ensure maximimum accessibility and democratic access.
https://github.com/ShishirPatil/gorilla
- Gorilla: An API store for LLMs
- Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically- and syntactically- correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to be trained on! Join us, as we try to expand the largest API store and teach LLMs how to write them! Hop on our Discord, or open a PR, or email us if you would like to have your API incorporated as well.
- https://gorilla.cs.berkeley.edu/
  - Gorilla: Large Language Model Connected with Massive APIs
- https://github.com/ShishirPatil/gorilla/tree/main/openfunctions
  - Gorilla Openfunctions
  - Gorilla OpenFunctions extends Large Language Model(LLM) Chat Completion feature to formulate executable APIs call given natural language instructions and API context.
  - Comes with Parallel Function Calling!
  - OpenFunctions is compatible with OpenAI Functions
  - https://gorilla.cs.berkeley.edu/blogs/4_open_functions.html
  - OpenFunctions is designed to extend Large Language Model (LLM) Chat Completion feature to formulate executable APIs call given natural language instructions and API context. Imagine if the LLM could fill in parameters for a variety of services, ranging from Instagram and DoorDash to tools like Google Calendar and Stripe. Even users who are less familiar with API calling procedures and programming can use the model to generate API calls to the desired function. Gorilla OpenFunctions is an LLM that we train using a curated set of API documentation, and Question-Answer pairs generated from the API documentations. We have continued to expand on the Gorilla Paradigm and sought to improve the quality and accuracy of valid function calling generation. This blog is about developing an open-source alternative for function calling similar to features seen in proprietary models, in particular, function calling in OpenAI's GPT-4. Our solution is based on the Gorilla recipe, and with a model with just 7B parameters, its accuracy is, surprisingly, comparable to GPT-4.
- https://github.com/gorilla-llm/gorilla-cli
  - LLMs for your CLI
  - Gorilla CLI Gorilla CLI powers your command-line interactions with a user-centric tool. Simply state your objective, and Gorilla CLI will generate potential commands for execution. Gorilla today supports ~1500 APIs, including Kubernetes, AWS, GCP, Azure, GitHub, Conda, Curl, Sed, and many more. No more recalling intricate CLI arguments! 🦍

Code Generation / Execution

See also:
- AI Agents / etc

Code Leaderboards / Benchmarks

https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
- Big Code Models Leaderboard
- Inspired from the 🤗 Open LLM Leaderboard and 🤗 Open LLM-Perf Leaderboard 🏋️, we compare performance of base multilingual code generation models on HumanEval benchmark and MultiPL-E. We also measure throughput and provide information about the models. We only compare open pre-trained multilingual code models, that people can start from as base models for their trainings.
https://evalplus.github.io/leaderboard.html
- EvalPlus Leaderboard
- EvalPlus evaluates AI Coders with rigorous tests.
- https://github.com/evalplus/evalplus
  - EvalPlus
  - EvalPlus is a rigorous evaluation framework for LLM4Code, with:
    - ✨ HumanEval+: 80x more tests than the original HumanEval!
    - ✨ MBPP+: 35x more tests than the original MBPP!
    - ✨ Evaluation framework: our packages/images/tools can easily and safely evaluate LLMs on above benchmarks.
  - https://evalplus.github.io/
    - Benchmarks @ EvalPlus EvalPlus team aims to build high-quality benchmarks for evaluating LLMs for code. Below are the benchmarks we have beening building so far
    - HumanEval+ & MBPP+ HumanEval and MBPP initially came with limited tests. EvalPlus made HumanEval+ & MBPP+ by extending the tests by 80x/35x for rigorous eval.
    - RepoQA: Long-Context Code Understanding Repository understanding is crucial for intelligent code agents. At RepoQA, we are designing evaluators of long-context code understanding.
      - https://evalplus.github.io/repoqa.html
        
        RepoQA The First Benchmark for Long-Context Code Understanding
        
        The goal of RepoQA: is to create a series of long-context code understanding tasks to challenge chat/instruction models for code:
        
        Multi-Lingual: RepoQA covers 50 high-quality respositories from 5 programming langauges.
        
        Application-Driven: While "Needle in the Code" by CodeQwen uses a synthetic task to examine the vulnerable parts over the LLM's long context, RepoQA focuses on tasks that can reflect real-world uses.
        
        🔍 Searching Needle Function (🔗): Search a function given its description.
        
        🚧 RepoQA is still under development... More types of QA tasks are coming soon... Stay tuned!

Vision / Multimodal

OpenAI

https://platform.openai.com/docs/guides/vision
- Vision
- Learn how to use GPT-4 to understand images
- GPT-4 with Vision, sometimes referred to as GPT-4V or gpt-4-vision-preview in the API, allows the model to take in images and answer questions about them.

LLaVA / etc

https://llava-vl.github.io/
- LLaVA: Large Language and Vision Assistant
- LLaVA represents a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA.
- LLaVA-1.5 achieves SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods that use billion-scale data.
- Demo: https://llava.hliu.cc/
- https://github.com/haotian-liu/LLaVA
  - LLaVA: Large Language and Vision Assistant
  - Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.
  - https://github.com/haotian-liu/LLaVA#release
    - The following are just a couple of notes that jumped out at me:
    - 11/10 LLaVA-Plus is released: Learning to Use Tools for Creating Multimodal Agents, with LLaVA-Plus (LLaVA that Plug and Learn to Use Skills). Project Page Demo Code Paper
    - 11/2 LLaVA-Interactive is released: Experience the future of human-AI multimodal interaction with an all-in-one demo for Image Chat, Segmentation, Generation and Editing. Project Page Demo Code Paper
    - 10/26 LLaVA-1.5 with LoRA achieves comparable performance as full-model finetuning, with a reduced GPU RAM requirement (ckpts, script). We also provide a doc on how to finetune LLaVA-1.5 on your own dataset with LoRA.
    - 10/12 LLaVA is now supported in llama.cpp with 4-bit / 5-bit quantization support!
    - 10/5 LLaVA-1.5 is out! Achieving SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods like Qwen-VL-Chat that use billion-scale data. Check out the technical report, and explore the demo! Models are available in Model Zoo.
    - 6/11 We released the preview for the most requested feature: DeepSpeed and LoRA support! Please see documentations here.
    - 6/1 We released LLaVA-Med: Large Language and Vision Assistant for Biomedicine, a step towards building biomedical domain large language and vision models with GPT-4 level capabilities. Checkout the paper and page.
- https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md
https://github.com/LLaVA-VL/LLaVA-Plus-Codebase
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
- Learning to Use Tools For Creating Multimodal Agents.
- Demo: https://llavaplus.ngrok.io/
- https://github.com/LLaVA-VL/LLaVA-Plus-Codebase/blob/main/docs/llava-plus/modelzoo.md
- https://llava-vl.github.io/llava-plus/
https://github.com/LLaVA-VL/LLaVA-NeXT
- LLaVA-NeXT: Open Large Multimodal Models
- https://llava-vl.github.io/blog/2024-01-30-llava-next/
  - LLaVA-NeXT: Improved reasoning, OCR, and world knowledge
  - Today, we are thrilled to present LLaVA-NeXT, with improved reasoning, OCR, and world knowledge. LLaVA-NeXT even exceeds Gemini Pro on several benchmarks.
  - Compared with LLaVA-1.5, LLaVA-NeXT has several improvements:
    - Increasing the input image resolution to 4x more pixels. This allows it to grasp more visual details. It supports three aspect ratios, up to 672x672, 336x1344, 1344x336 resolution.
    - Better visual reasoning and OCR capability with an improved visual instruction tuning data mixture.
    - Better visual conversation for more scenarios, covering different applications. Better world knowledge and logical reasoning.
    - Efficient deployment and inference with SGLang.
- https://llava-vl.github.io/blog/2024-04-30-llava-next-video/
  - LLaVA-NeXT: A Strong Zero-shot Video Understanding Model
  - In today’s exploration, we delve into the performance of LLaVA-NeXT within the realm of video understanding tasks. We reveal that LLaVA-NeXT surprisingly has strong performance in understanding video content.
  - SoTA Performance! Without seeing any video data, LLaVA-Next demonstrates strong zero-shot modality transfer ability, outperforming all the existing open-source LMMs (e.g., LLaMA-VID) that have been specifically trained for videos. Compared with proprietary ones, it achieves comparable performance with Gemini Pro on NextQA and ActivityNet-QA.
- https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/
  - LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities in the Wild
https://github.com/microsoft/LLaVA-Med
- LLaVA-Med: Large Language and Vision Assistant for BioMedicine
- Visual instruction tuning towards building large language and vision models with GPT-4 level capabilities in the biomedicine space.

Unsorted

https://github.com/tldraw/draw-a-ui
- draw-a-ui This is an app that uses tldraw and the gpt-4-vision api to generate html based on a wireframe you draw.
- Draw a mockup and generate html for it
- https://makereal.tldraw.com/
- https://github.com/SawyerHood/draw-a-ui
  - Original repo that was forked for the above
  - https://www.draw-a-ui.com/
https://github.com/jordansinger/build-it-figma-ai
- Draw and sketch UI in Figma and FigJam with this widget. Inspired by SawyerHood/draw-a-ui and tldraw/draw-a-ui
https://github.com/jordansinger/UIDraw
- Draw and build a website on your phone.
- Uses GPT-4 Vision and PencilKit/PKCanvasView to draw a UI and convert it into HTML.
- https://twitter.com/jsngr/status/1728848624048853442
https://github.com/microsoft/SoM
- Set-of-Mark Prompting for LMMs
- Set-of-Mark Visual Prompting for GPT-4V
- We present Set-of-Mark (SoM) prompting, simply overlaying a number of spatial and speakable marks on the images, to unleash the visual grounding abilities in the strongest LMM -- GPT-4V. Let's using visual prompting for vision!
- https://arxiv.org/abs/2310.11441
  - Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
  - We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V. As illustrated in Fig. 1 (right), we employ off-the-shelf interactive segmentation models, such as SEEM/SAM, to partition an image into regions at different levels of granularity, and overlay these regions with a set of marks e.g., alphanumerics, masks, boxes. Using the marked image as input, GPT-4V can answer the questions that require visual grounding. We perform a comprehensive empirical study to validate the effectiveness of SoM on a wide range of fine-grained vision and multimodal tasks. For example, our experiments show that GPT-4V with SoM in zero-shot setting outperforms the state-of-the-art fully-finetuned referring expression comprehension and segmentation model on RefCOCOg. Code for SoM prompting is made public at: this https URL.
- https://github.com/facebookresearch/segment-anything
  - Segment Anything
  - The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
  - The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.
- https://github.com/UX-Decoder/Semantic-SAM
  - Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
  - In this work, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity. We have trained on the whole SA-1B dataset and our model can reproduce SAM and beyond it.
  - Segment everything for one image. We output controllable granularity masks from semantic, instance to part level when using different granularity prompts.
- https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once
  - SEEM: Segment Everything Everywhere All at Once
  - [NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
  - We introduce SEEM that can Segment Everything Everywhere with Multi-modal prompts all at once. SEEM allows users to easily segment an image using prompts of different types including visual prompts (points, marks, boxes, scribbles and image segments) and language prompts (text and audio), etc. It can also work with any combination of prompts or generalize to custom prompts!
- https://github.com/IDEA-Research/GroundingDINO
  - Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
- https://github.com/IDEA-Research/OpenSeeD
  - [ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
- https://github.com/IDEA-Research/MaskDINO
  - [CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"
- https://github.com/facebookresearch/VLPart
  - [ICCV2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation
  - Object detection has been expanded from a limited number of categories to open vocabulary. Moving forward, a complete intelligent vision system requires understanding more fine-grained object descriptions, object parts. In this work, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.
https://github.com/OthersideAI/self-operating-computer
- Self-Operating Computer Framework A framework to enable multimodal models to operate a computer.
  
  Using the same inputs and outputs of a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.
https://github.com/ddupont808/GPT-4V-Act
- GPT-4V-Act: Chromium Copilot
- AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
- GPT-4V-Act serves as an eloquent multimodal AI assistant that harmoniously combines GPT-4V(ision) with a web browser. It's designed to mirror the input and output of a human operator—primarily screen feedback and low-level mouse/keyboard interaction. The objective is to foster a smooth transition between human-computer operations, facilitating the creation of tools that considerably boost the accessibility of any user interface (UI), aid workflow automation, and enable automated UI testing.
- GPT-4V-Act leverages both GPT-4V(ision) and Set-of-Mark Prompting, together with a tailored auto-labeler. This auto-labeler assigns a unique numerical ID to each interactable UI element.
  
  By incorporating a task and a screenshot as input, GPT-4V-Act can deduce the subsequent action required to accomplish a task. For mouse/keyboard output, it can refer to the numerical labels for exact pixel coordinates.
  - https://openai.com/research/gpt-4v-system-card
    - GPT-4V(ision)
https://github.com/Jiayi-Pan/GPT-V-on-Web
- 👀🧠 GPT-4 Vision x 💪⌨️ Vimium = Autonomous Web Agent
- This project leverages GPT4V to create an autonomous / interactive web agent. The action space are discretized by Vimium.
https://github.com/bdekraker/WebcamGPT-Vision
- Lightweight GPT-4 Vision processing over the Webcam
- WebcamGPT-Vision is a lightweight web application that enables users to process images from their webcam using OpenAI's GPT-4 Vision API. The application captures images from the user's webcam, sends them to the GPT-4 Vision API, and displays the descriptive results.

Image Generation

Automatic1111 (Stable Diffusion WebUI)

https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Stable Diffusion web UI
- A browser interface based on Gradio library for Stable Diffusion.
- https://github.com/AUTOMATIC1111/stable-diffusion-webui#installation-on-apple-silicon
  - https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon

ComfyUI

https://github.com/comfyanonymous/ComfyUI
- The most powerful and modular stable diffusion GUI with a graph/nodes interface.
- https://github.com/comfyanonymous/ComfyUI#apple-mac-silicon
  - You can install ComfyUI in Apple Mac silicon (M1 or M2) with any recent macOS version.
- https://comfyanonymous.github.io/ComfyUI_examples/
  - ComfyUI Examples This repo contains examples of what is achievable with ComfyUI. All the images in this repo contain metadata which means they can be loaded into ComfyUI with the Load button (or dragged onto the window) to get the full workflow that was used to create the image.
- Comfy-Org/ComfyUI#42
  - Alternative UI
    - https://github.com/rvion/CushyStudio
      - The AI and Generative Art platform for everyone
    - https://github.com/space-nuko/ComfyBox
      - Customizable Stable Diffusion frontend for ComfyUI
      - ComfyBox is a frontend to Stable Diffusion that lets you create custom image generation interfaces without any code. It uses ComfyUI under the hood for maximum power and extensibility.
- Comfy-Org/ComfyUI#389
  - Separation of UI presentation and graph
- Comfy-Org/ComfyUI#497
  - [Feature Request] Chip based groups / general group update
- Comfy-Org/ComfyUI#669
  - [FEATURE REQUEST] sub workflows with customizable inputs / output pins
- Comfy-Org/ComfyUI#724
  - Subgraph support
- Comfy-Org/ComfyUI#931
  - Node Expansion, While Loops, Components, and Lazy Evaluation
- Comfy-Org/ComfyUI#1132
  - Simple changes to massively simplify ComfyUI in basic use-cases
- Comfy-Org/ComfyUI#1310
  - Switch the version of litegraph used to litegraph.ts
- Comfy-Org/ComfyUI#1776
  - Group nodes
- https://github.com/ltdrdata/ComfyUI-Workflow-Component
  - This is a side project to experiment with using workflows as components.
https://github.com/ltdrdata/ComfyUI-Manager
- ComfyUI Manager ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, this extension provides a hub feature and convenience functions to access a wide range of information within ComfyUI.
https://github.com/Chaoses-Ib/ComfyScript
- A Python front end for ComfyUI
- Can apparently use this to generate more complex workflows without having to mess with making the graph manually
- https://github.com/Chaoses-Ib/ComfyScript#workflow-generation
  - Workflow generation
- https://github.com/Chaoses-Ib/ComfyScript#transpiler
  - Transpiler The transpiler can translate ComfyUI's workflows to ComfyScript.
https://github.com/0xbitches/ComfyUI-LCM
- Latent Consistency Model for ComfyUI
- Archival Notice: ComfyUI has officially implemented LCM scheduler, see this commit. Please update your install and use the official implementation.
https://github.com/Fannovel16/ComfyUI-MotionDiff
- ComfyUI MotionDiff Implementation of MDM, MotionDiffuse and ReMoDiffuse into ComfyUI

Unsorted

https://playgroundai.com/
https://github.com/invoke-ai/InvokeAI
- Invoke AI - Generative AI for Professional Creatives
- InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. InvokeAI offers an industry leading Web Interface, interactive Command Line Interface, and also serves as the foundation for multiple commercial products.
- https://invoke.ai/
https://github.com/Sygil-Dev/sygil-webui
- Stable Diffusion web UI
https://github.com/easydiffusion/easydiffusion
- Easy Diffusion 3.0
- Easiest 1-click way to create beautiful artwork on your PC using AI, with no tech knowledge. Provides a browser UI for generating images from text prompts and images. Just enter your text prompt, and see the generated image.
- https://easydiffusion.github.io/
https://github.com/varunshenoy/opendream
- Opendream: A Web UI For the Rest of Us 💭 🎨
- An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨
- Opendream brings much needed and familiar features, such as layering, non-destructive editing, portability, and easy-to-write extensions, to your Stable Diffusion workflows.
- https://www.reddit.com/r/StableDiffusion/comments/15rzu8h/opendream_a_layer_based_stable_diffusion_web_ui/
  - Opendream: A Layer Based Stable Diffusion Web UI

Song / Audio Generation

Udio

https://www.udio.com/
- Udio | Make your music

Suno

https://www.suno.ai/
- Make any song you can imagine
- https://app.suno.ai/
https://github.com/suno-ai/bark
- Text-Prompted Generative Audio Model

Stable Audio

https://arxiv.org/abs/2404.10301
- Long-form music generation with latent diffusion (2024)
- Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
- https://stability-ai.github.io/stable-audio-2-demo/
  - stable-audio-2-demo
  - Additional creative capabilities Audio-to-audio With diffusion models is possible to perform some degree of style-transfer by initializing the noise with audio during sampling. This capability can be used to modify the aesthetics of an existing recording based on a given text prompt, whilst maintaining the reference audio’s structure (e.g., a beatbox recording could be style-transfered to produce realistic-sounding drums). As a result, our model can be influenced by not only text prompts but also audio inputs, enhancing its controllability and expressiveness. We noted that when initialized with voice recordings (such as beatbox or onomatopoeias), there is a sensation of control akin to an instrument.
  - Memorization analysis Recent works examined the potential of generative models to memorize training data, especially for repeated elements in the training set. Further, musicLM conducted a memorization analysis to address concerns on the potential misappropriation of creative content. Adhering to principles of responsible model development, we also run a comprehensive study on memorization.
    
    Considering the increased probability of memorizing repeated music within the dataset, we start by studying if our training set contains repeated data. We embed all our training data using the LAION-CLAP audio encoder to select audios that are close in this space based on a manually set threshold. The threshold is set such that the selected audios correspond to exact replicas. With this process, we identify 5566 repeated audios in our training set.
    
    We compare our model’s generations against the training set in LAION-CLAP space. Generations are from 5566 prompts within the repeated training data (in-distribution), and 586 prompts from the Song Describer Dataset (no-singing, out-of-distribution). We then identify the top-50 generated music that is closest to the training data and listen.
    
    We extensively listened to potential memorization candidates, and could not find memorization.
https://www.stableaudio.com/
Stable Audio Create music with AI
- https://www.stableaudio.com/user-guide/text-to-audio
  - Text-to-audio
- https://www.stableaudio.com/user-guide/audio-to-audio
  - Audio-to-audio
- https://www.stableaudio.com/user-guide/model-2
  - Stable Audio 2.0 Model
  - Our groundbreaking Stable Audio AudioSparx 2.0 model has been designed to generate full tracks with coherent structure at 3 minutes and 10 seconds. Our new model is available for everyone to generate full tracks on our Stable Audio product.
  - Key features:
    - Stable Audio 2.0 sets a new standard in AI generated audio, producing high-quality, full tracks with coherent musical structure up to three minutes in length at 44.1KHz stereo.
    - The new model introduces audio-to-audio generation by allowing users to upload and transform samples using natural language prompts.
    - Stable Audio 2.0 was exclusively trained on a licensed dataset from the AudioSparx music library, honoring opt-out requests and ensuring fair compensation for creators.
https://stability.ai/news?tags=Audio
- https://stability.ai/news/stable-audio-using-ai-to-generate-music
  - Announcing Stable Audio, a product for music & sound generation
- https://stability.ai/news/stable-audio-2-0
  - Introducing Stable Audio 2.0
- https://stability.ai/news/introducing-stable-audio-open
  - Introducing Stable Audio Open - An Open Source Model for Audio Samples and Sound Design
  - Key Takeaways:
    - Stable Audio Open is an open source text-to-audio model for generating up to 47 seconds of samples and sound effects.
    - Users can create drum beats, instrument riffs, ambient sounds, foley and production elements.
    - The model enables audio variations and style transfer of audio samples.
  - https://huggingface.co/stabilityai/stable-audio-open-1.0
    - Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. It comprises three components: an autoencoder that compresses waveforms into a manageable sequence length, a T5-based text embedding for text conditioning, and a transformer-based diffusion (DiT) model that operates in the latent space of the autoencoder.
    - This model is made to be used with the stable-audio-tools library for inference
      - https://github.com/Stability-AI/stable-audio-tools
        
        stable-audio-tools Training and inference code for audio generation models
        
        https://github.com/Stability-AI/stable-audio-tools#fine-tuning
        
        Fine-tuning Fine-tuning a model involves continuning a training run from a pre-trained checkpoint.
        
        https://github.com/diontimmer/audio-diffusion-gradio
        
        audio-diffusion-gradio
        
        Decked-out gradio client for audio diffusion, mainly stable-audio-tools.
        
        The Audio Diffusion Gradio Interface is a user-friendly graphical user interface (GUI) made in Gradio that simplifies the process of working with audio diffusion models, autoencoders, diffusion autoencoders, and various models trainable using the stable-audio-tools package. This interface not only streamlines your audio diffusion tasks but also provides a modular extension system, enabling users to easily integrate additional functionalities.
        
        https://github.com/lks-ai/ComfyUI-StableAudioSampler
        
        ComfyUI-StableAudioSampler The New Stable Audio Open 1.0 Sampler In a ComfyUI Node. Make some beats!
        
        https://huggingface.co/spaces/ameerazam08/stableaudio-open-1.0
        
        Stable Audio Multiplayer Live Generate audio with text, share and learn from others how to best prompt this new model

AudioCraft: MusicGen, AudioGen, etc

https://github.com/facebookresearch/audiocraft
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
- https://github.com/facebookresearch/audiocraft#models
  - At the moment, AudioCraft contains the training code and inference code for:
    - MusicGen: A state-of-the-art controllable text-to-music model.
      - https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md
        
        MusicGen: Simple and Controllable Music Generation AudioCraft provides the code and models for MusicGen, a simple and controllable model for music generation. MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio.
    - AudioGen: A state-of-the-art text-to-sound model.
      - https://github.com/facebookresearch/audiocraft/blob/main/docs/AUDIOGEN.md
        
        AudioGen: Textually-guided audio generation AudioCraft provides the code and a model re-implementing AudioGen, a textually-guided audio generation model that performs text-to-sound generation.
        
        The provided AudioGen reimplementation follows the LM model architecture introduced in MusicGen and is a single stage auto-regressive Transformer model trained over a 16kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. This model variant reaches similar audio quality than the original implementation introduced in the AudioGen publication while providing faster generation speed given the smaller frame rate.
    - EnCodec: A state-of-the-art high fidelity neural audio codec.
      - https://github.com/facebookresearch/audiocraft/blob/main/docs/ENCODEC.md
        
        EnCodec: High Fidelity Neural Audio Compression AudioCraft provides the training code for EnCodec, a state-of-the-art deep learning based audio codec supporting both mono and stereo audio, presented in the High Fidelity Neural Audio Compression paper.
    - Multi Band Diffusion: An EnCodec compatible decoder using diffusion.
      - https://github.com/facebookresearch/audiocraft/blob/main/docs/MBD.md
        
        MultiBand Diffusion AudioCraft provides the code and models for MultiBand Diffusion, From Discrete Tokens to High Fidelity Audio using MultiBand Diffusion. MultiBand diffusion is a collection of 4 models that can decode tokens from EnCodec tokenizer into waveform audio.
    - MAGNeT: A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.
      - https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md
        
        MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer AudioCraft provides the code and models for MAGNeT, Masked Audio Generation using a Single Non-Autoregressive Transformer.
        
        MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions. It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike prior work on masked generative audio Transformers, such as SoundStorm and VampNet, MAGNeT doesn't require semantic token conditioning, model cascading or audio prompting, and employs a full text-to-audio using a single non-autoregressive Transformer.

Neural Audio Codecs

https://haoheliu.github.io/SemantiCodec/
- SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
- Highlights
  - Ultra-Low bit rate We focus on bitrate between 0.31 kbps and 1.43 kbps, with token rate of 25, 50, or 100 per second.
  - Strong semantic in the audio token Indicated by classification accuracy.
  - Supporting variable vocabulary sizes One model that supporting four different vocabulary sizes.
- https://github.com/haoheliu/SemantiCodec
  - SemantiCodec
- https://arxiv.org/abs/2405.00233
  - SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
  - Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these challenges, we introduce SemantiCodec, a novel codec designed to compress audio into fewer than a hundred tokens per second across diverse audio types, including speech, general audio, and music, without compromising quality. SemantiCodec features a dual-encoder architecture: a semantic encoder using a self-supervised AudioMAE, discretized using k-means clustering on extensive audio data, and an acoustic encoder to capture the remaining details. The semantic and acoustic encoder outputs are used to reconstruct audio via a diffusion-model-based decoder. SemantiCodec is presented in three variants with token rates of 25, 50, and 100 per second, supporting a range of ultra-low bit rates between 0.31 kbps and 1.43 kbps. Experimental results demonstrate that SemantiCodec significantly outperforms the state-of-the-art Descript codec on reconstruction quality. Our results also suggest that SemantiCodec contains significantly richer semantic information than all evaluated audio codecs, even at significantly lower bitrates.
https://github.com/yangdongchao/AcademiCodec
- AcademiCodec: An Open Source Audio Codec Model for Academic Research
- Audio codec models are widely used in audio communication as a crucial technique for compressing audio into discrete representations. Nowadays, audio codec models are increasingly utilized in generation fields as intermediate representations. For instance, AudioLM is ann audio generation model that uses the discrete representation of SoundStream as a training target, while VALL-E employs the Encodec model as an intermediate feature to aid TTS tasks. Despite their usefulness, two challenges persist: (1) training these audio codec models can be difficult due to the lack of publicly available training processes and the need for large-scale data and GPUs; (2) achieving good reconstruction performance requires many codebooks, which increases the burden on generation models. In this study, we propose a group-residual vector quantization (GRVQ) technique and use it to develop a novel \textbf{Hi}gh \textbf{Fi}delity Audio Codec model, HiFi-Codec, which only requires 4 codebooks. We train all the models using publicly available TTS data such as LibriTTS, VCTK, AISHELL, and more, with a total duration of over 1000 hours, using 8 GPUs. Our experimental results show that HiFi-Codec outperforms Encodec in terms of reconstruction performance despite requiring only 4 codebooks. To facilitate research in audio codec and generation, we introduce AcademiCodec, the first open-source audio codec toolkit that offers training codes and pre-trained models for Encodec, SoundStream, and HiFi-Codec.
- https://github.com/yangdongchao/AcademiCodec#what-the-difference-between-soundstream-encodec-and-hifi-codec
  - In our view, the mian difference between SoundStream and Encodec is the different Discriminator choice. For Encodec, it only uses a STFT-dicriminator, which forces the STFT-spectrogram be more real. SoundStream use two types of Discriminator, one forces the waveform-level to be more real, one forces the specrogram-level to be more real. In our code, we adopt the waveform-level discriminator from HIFI-GAN. The spectrogram level discrimimator from Encodec. In thoery, we think SoundStream enjoin better performance.
- https://arxiv.org/abs/2305.02765
  - HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
  - Audio codec models are widely used in audio communication as a crucial technique for compressing audio into discrete representations. Nowadays, audio codec models are increasingly utilized in generation fields as intermediate representations. For instance, AudioLM is an audio generation model that uses the discrete representation of SoundStream as a training target, while VALL-E employs the Encodec model as an intermediate feature to aid TTS tasks. Despite their usefulness, two challenges persist: (1) training these audio codec models can be difficult due to the lack of publicly available training processes and the need for large-scale data and GPUs; (2) achieving good reconstruction performance requires many codebooks, which increases the burden on generation models. In this study, we propose a group-residual vector quantization (GRVQ) technique and use it to develop a novel \textbf{Hi}gh \textbf{Fi}delity Audio Codec model, HiFi-Codec, which only requires 4 codebooks. We train all the models using publicly available TTS data such as LibriTTS, VCTK, AISHELL, and more, with a total duration of over 1000 hours, using 8 GPUs. Our experimental results show that HiFi-Codec outperforms Encodec in terms of reconstruction performance despite requiring only 4 codebooks. To facilitate research in audio codec and generation, we introduce AcademiCodec, the first open-source audio codec toolkit that offers training codes and pre-trained models for Encodec, SoundStream, and HiFi-Codec.
https://github.com/descriptinc/descript-audio-codec
- Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN
- State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio
- - With Descript Audio Codec, you can compress 44.1 KHz audio into discrete codes at a low 8 kbps bitrate.
  - That's approximately 90x compression while maintaining exceptional fidelity and minimizing artifacts.
  - Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio.
  - It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.)
- https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5
  - Descript Audio Codec Welcome to the demo page for the paper “High Fidelity Compression Algorithm with Improved RVQGAN”. Here, we provide samples from our ablation studies and other competitive baselines.
- https://arxiv.org/abs/2306.06546
  - High-Fidelity Audio Compression with Improved RVQGAN
  - Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high quality neural compression model that can compress high-dimensional natural signals into lower dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 KHz audio into tokens at just 8kbps bandwidth. We achieve this by combining advances in high-fidelity audio generation with better vector quantization techniques from the image domain, along with improved adversarial and reconstruction losses. We compress all domains (speech, environment, music, etc.) with a single universal model, making it widely applicable to generative modeling of all audio. We compare with competing audio compression algorithms, and find our method outperforms them significantly. We provide thorough ablations for every design choice, as well as open-source code and trained model weights. We hope our work can lay the foundation for the next generation of high-fidelity audio modeling.
- https://github.com/DBraun/DAC-JAX
  - DAC-JAX
  - A JAX Implementation of the Descript Audio Codec
  - Descript Audio Codec (.dac) is a high-fidelity general neural audio codec introduced in the paper "High-Fidelity Audio Compression with Improved RVQGAN". This repository is an unofficial JAX implementation of the PyTorch-based DAC and has no affiliation with Descript.
https://github.com/AudiogenAI/agc
- Audiogen Codec (agc) We are announcing the open source release of Audiogen Codec (agc) 🎉. A low compression 48khz stereo neural audio codec for general audio, optimizing for audio fidelity 🎵.
  
  It comes in two flavors:
  - agc-continuous 🔄 KL regularized, 32 channels, 100hz.
  - agc-discrete 🔢 24 stages of residual vector quantization, 50hz.
  AGC (Audiogen Codec) is a convolutional autoencoder based on the DAC architecture, which holds SOTA 🏆. We found that training with EMA and adding a perceptual loss term with CLAP features improved performance. These codecs, being low compression, outperform Meta's EnCodec and DAC on general audio as validated from internal blind ELO games 🎲.
  
  We trained (relatively) very low compression codecs in the pursuit of solving a core issue regarding general music and audio generation, low acoustic quality and audible artifacts, which hinder industry use for these models 🚫🎶. Our hope is to encourage researchers to build hierarchical generative audio models that can efficiently use high sequence length representations without sacrificing semantic abilities 🧠.
  
  This codec will power Audiogen's upcoming models. Stay tuned! 🚀
- https://audiogen.notion.site/Audiogen-Codec-Examples-546fe64596f54e20be61deae1c674f20
  - Audiogen Codec Examples
https://github.com/facebookresearch/encodec
- EnCodec: High Fidelity Neural Audio Compression
- State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
- We provide our two multi-bandwidth models:
  - A causal model operating at 24 kHz on monophonic audio trained on a variety of audio data.
  - A non-causal model operating at 48 kHz on stereophonic audio trained on music-only data.
  The 24 kHz model can compress to 1.5, 3, 6, 12 or 24 kbps, while the 48 kHz model support 3, 6, 12 and 24 kbps. We also provide a pre-trained language model for each of the models, that can further compress the representation by up to 40% without any further loss of quality.
- https://github.com/facebookresearch/encodec#-transformers
  - Encodec has now been added to Transformers. For more information, please refer to Transformers' Encodec docs
    - https://huggingface.co/docs/transformers/main/en/model_doc/encodec
  - Using 🤗 Transformers, you can leverage Encodec at scale along with all the other supported models and datasets.
- https://arxiv.org/abs/2210.13438
  - High Fidelity Neural Audio Compression
  - We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks. It consists in a streaming encoder-decoder architecture with quantized latent space trained in an end-to-end fashion. We simplify and speed-up the training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produce high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained representation by up to 40%, while staying faster than real time. We provide a detailed description of the key design choices of the proposed model including: training objective, architectural changes and a study of various perceptual loss functions. We present an extensive subjective evaluation (MUSHRA tests) together with an ablation study for a range of bandwidths and audio domains, including speech, noisy-reverberant speech, and music. Our approach is superior to the baselines methods across all evaluated settings, considering both 24 kHz monophonic and 48 kHz stereophonic audio.

Audio Separation / Stem Splitting / Sound Demixing / Music Source Separation

https://mvsep.com/
- Music & Voice Separation MVSEP performs separation of audio on voice and music parts
- https://mvsep.com/en/quality_checker
  - Quality Checker
  - On this page you can find a tools for checking the quality of models for splitting tracks into different stems like vocal, bass, drums etc. As well as a table of the last performed checks.
- https://mvsep.com/en/quality
  - Comparison of algorithms quality
  - There are a lot of algorithms at MVSep now. Which algorithm to choose?
    - If you need good isolated vocals or instrumental then use one of: Ultimate Vocal Remover HQ, MDX-B, Demucs3 (Model B)
    - If you need good bass, drums, other: Demucs3 (Model B)
    For comparsion of algorithm we use SDR (signal-to-distortion ratio) metric. The larger the metric the better the result of algorithm.
- https://mvsep.com/quality_checker/synth_leaderboard
  - Synth Leaderboard (Full)
- https://mvsep.com/quality_checker/synth_leaderboard?ensemble=0
  - Synth Leaderboard (Single Models)
- https://mvsep.com/quality_checker/synth_leaderboard?ensemble=1
  - Synth Leaderboard (Ensemble)
- https://mvsep.com/quality_checker/multisong_leaderboard
  - Multisong Leaderboard
  - https://mvsep.com/quality_checker/multisong_leaderboard?sort=instrum
    - Multisong Leaderboard (Instrum)
  - https://mvsep.com/quality_checker/multisong_leaderboard?sort=vocals
    - Multisong Leaderboard (Vocals)
  - https://mvsep.com/quality_checker/multisong_leaderboard?sort=bass
    - Multisong Leaderboard (Bass)
  - https://mvsep.com/quality_checker/multisong_leaderboard?sort=drums
    - Multisong Leaderboard (Drums)
  - https://mvsep.com/quality_checker/multisong_leaderboard?sort=other
    - Multisong Leaderboard (Other)
- https://mvsep.com/quality_checker/other_leaderboards
  - Other Leaderboards
  - Piano
  - Lead/Back Vocals
  - Guitar
  - Medley Vox
  - Strings
  - Wind
  - DNR v3 Test
  - Super Resolution Checker for Music
  - Drums Separation (5 stems)
  - Male/Female vocals separation
https://paperswithcode.com/task/music-source-separation
- Music Source Separation
- Music source separation is the task of decomposing music into its constitutive components, e. g., yielding separated stems for the vocals, bass, and drums.
- https://paperswithcode.com/task/music-source-separation#benchmarks
  - Benchmarks
    
    These leaderboards are used to track progress in Music Source Separation
    - MUSDB18
    - MUSDB18-HQ
    - Slakh2100
  - https://paperswithcode.com/sota/music-source-separation-on-musdb18
    - Music Source Separation on MUSDB18
    - Leaderboard
  - https://paperswithcode.com/sota/music-source-separation-on-musdb18-hq
    - Music Source Separation on MUSDB18-HQ
    - Leaderboard
  - https://paperswithcode.com/sota/music-source-separation-on-slakh2100
    - Music Source Separation on Slakh2100
    - Leaderboard
- https://paperswithcode.com/task/music-source-separation#task-libraries
  - Libraries
    
    Use these libraries to find Music Source Separation models and implementations
- https://paperswithcode.com/task/music-source-separation#datasets
  - Datasets
- https://paperswithcode.com/task/music-source-separation#papers-list
  - Papers
https://transactions.ismir.net/search?q=Music%20Source%20Separation
- Searching for: Music Source Separation
https://transactions.ismir.net/articles/10.5334/tismir.171
- The Sound Demixing Challenge 2023 – Music Demixing Track
- This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX’23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best performing system achieved an improvement of over 1.6dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems and report here the results. Finally, we provide our insights into the organization of the competition and our prospects for future editions.
https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/edit?tab=t.0#heading=h.roiuj54hzww3
- Instrumental, vocal & other stems separation & mix/master guide - UVR/MDX/Demucs/GSEP & others
https://ultimatevocalremover.com/
- Ultimate Vocal Remover v5
- https://github.com/Anjok07/ultimatevocalremovergui
  - Ultimate Vocal Remover GUI
  - GUI for a Vocal Remover that uses Deep Neural Networks.
  - This application uses state-of-the-art source separation models to remove vocals from audio files. UVR's core developers trained all of the models provided in this package (except for the Demucs v3 and v4 4-stem models).
  - Anjok07/ultimatevocalremovergui#344
    - The answer to the most asked question: What is the model which provides the best results? [Read this, very important info inside!]
  - Anjok07/ultimatevocalremovergui#430
    - BSRNN code is release
    - Band-Split RNN code is release : yoongi43/music_source_separation
    - https://github.com/yoongi43/music_source_separation
      - Models
        
        Band-split rnn
        
        Band-split Conformer
    - Would love to see this added, as it's currently top of the MUSDB18 benchmarks for vocals:
      
      https://paperswithcode.com/sota/music-source-separation-on-musdb18
      
      SDR (vocals): 10.47
      
      https://paperswithcode.com/sota/music-source-separation-on-musdb18-hq
      
      SDR (vocals): 10.47
      
      See also:
      
      https://github.com/amanteur/BandSplitRNN-Pytorch
      
      Unofficial PyTorch implementation of Music Source Separation with Band-split RNN
      
      https://github.com/crlandsc/music-demixing-with-band-split-rnn
      
      An unofficial PyTorch implementation of Music Source Separation with Band-split RNN for MDX-23 ("Label Noise" Track)
- https://buymeacoffee.com/uvr5/vip-model-download-instructions
  - VIP Model Download Instructions:
    - Make you have UVR v5.4 installed
    - Click the "Settings" button (it's the wrench icon to the left of the conversion button)
    - Go to the "Download Center" tab
    - Click the button with the key icon at the bottom.
    - Input the following and make sure "VIP" is in all caps -
      
      User Code: VIP
      
      Download Code: 02aeb35c203ed0a9
    Now you will see the VIP models available for download!
https://github.com/stemrollerapp/stemroller
- StemRoller
  
  StemRoller is the first free app which enables you to separate vocal and instrumental stems from any song with a single click! StemRoller uses Facebook's state-of-the-art Demucs algorithm for demixing songs and integrates search results from YouTube.
  
  Simply type the name/artist of any song into the search bar and click the Split button that appears in the results! You'll need to wait several minutes for splitting to complete. Once stems have been extracted, you'll see an Open button next to the song - click that to access your stems!
https://audiostrip.co.uk/#isolate
- AudioStrip
- Near Perfect Instrumental And Vocal Isolation For Free!
https://www.lalal.ai/
- LALAL.AI
- Extract vocal, accompaniment and various instruments from any audio and video
- A next-generation vocal remover and music source separation service for fast, easy and precise stem extraction. Remove vocal, instrumental, drums, bass, piano, electric guitar, acoustic guitar, and synthesizer tracks without quality loss.
- https://www.lalal.ai/apps-and-plugins/
  - AI Vocal Remover App & Plugins Enhance your audio and video editing with the powerful AI tools available across multiple platforms. Extract 10 stems and remove noise on iOS, Android, Windows, macOS, and Linux.
https://vocalremover.org/
- Vocal Remover and Isolation
- Separate voice from music out of a song free with powerful AI algorithms
https://github.com/facebookresearch/demucs
- Demucs Music Source Separation
- Code for the paper Hybrid Spectrogram and Waveform Source Separation
- Important: As I am no longer working at Meta, this repository is not maintained anymore. I've created a fork at github.com/adefossez/demucs. Note that this project is not actively maintained anymore and only important bug fixes will be processed on the new repo.
- https://github.com/adefossez/demucs
  - This is the officially maintained Demucs now that I (Alexandre Défossez) have left Meta to join Kyutai. Note that I'm not actively working on Demucs anymore, so expect slow replies and no new feature for now.

Audio Super Resolution

https://github.com/haoheliu/versatile_audio_super_resolution
- AudioSR: Versatile Audio Super-resolution at Scale
- Versatile audio super resolution (any -> 48kHz) with AudioSR.
- Pass your audio in, AudioSR will make it high fidelity!
  
  Work on all types of audio (e.g., music, speech, dog, raining, ...) & all sampling rates.
- https://replicate.com/nateraw/audio-super-resolution
  - AudioSR: Versatile Audio Super-resolution at Scale

Unsorted

https://cassetteai.com/
- Cassette is your Copilot for AI Music Generation.
  
  Our cutting edge Artificial Intelligence technology built using Latent Diffusion models (LDMs) makes music production, customization & listening available to everyone. Creating music is now as simple as writing a prompt.

Infrastructure and Hosting (Cloud GPUs, etc)

TODO: Fill this section out with more details.

Runpod

https://www.runpod.io/
- Runpod
- AI infrastructure developers trust
- Everything you need to train, deploy, and scale AI all in one place.
- https://www.runpod.io/pricing
- https://www.runpod.io/changelog
  - Runpod Changelog
  - Release notes on what's new, improved, and fixed
- https://www.runpod.io/product/runpod-hub
  - Runpod Hub
  - The fastest way to fork and deploy open-source AI.
  - Customize, launch, and contribute to open-source packages–from GitHub to production
- https://www.runpod.io/product/cloud-gpus
  - Cloud GPUs
  - High-performance GPUs on demand.
  - Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend.
- https://www.runpod.io/product/serverless
  - Serverless
  - Dedicated Serverless GPU API endpoints
  - Skip the infrastructure headaches. Our auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating.
- https://www.runpod.io/product/instant-clusters
  - Instant Clusters
  - 3,200 Gbps Infiniband GPU Clusters On-Demand
  - Launch high-performance multi-node GPU clusters for AI, ML, LLMs, and HPC workloads—fully optimized, rapidly deployed, and cost-effective.
- https://docs.runpod.io/overview
  - Docs
  - Explore our guides and examples to deploy your AI/ML application on Runpod.
  - Runpod is a cloud computing platform built for AI, machine learning, and general compute needs. Whether you’re running deep learning models, training AI, or deploying cloud-based applications, Runpod provides scalable, high-performance GPU and CPU resources to power your workloads.

Vector Databases/Search, Similarity Search, Clustering, etc

See Also:
- Vector Embedding Databases (0xdevalias' gist - subsection)
TODO: add more things here

Faiss

https://github.com/facebookresearch/faiss
- Faiss
- A library for efficient similarity search and clustering of dense vectors.
- Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed primarily at Meta's Fundamental AI Research group.
- https://faiss.ai/

Benchmarks / Leaderboards

See also:
- Agent Benchmarks / Leaderboards
- Code Embeddings - Benchmarks / Leaderboards / etc (0xdevalias' gist - subsection)
https://chat.lmsys.org/
- LMSYS Chatbot Arena Leaderboard
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- Open LLM Leaderboard
https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard
- Open Medical-LLM Leaderboard
- https://huggingface.co/blog/leaderboard-medicalllm
  - The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare
https://github.com/EleutherAI/lm-evaluation-harness
- Language Model Evaluation Harness
- A framework for few-shot evaluation of language models.
https://github.com/openai/evals
- OpenAI Evals
- Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
- Evals provide a framework for evaluating large language models (LLMs) or systems built using LLMs. We offer an existing registry of evals to test different dimensions of OpenAI models and the ability to write your own custom evals for use cases you care about. You can also use your data to build private evals which represent the common LLMs patterns in your workflow without exposing any of that data publicly.
  
  If you are building with LLMs, creating high quality evals is one of the most impactful things you can do. Without evals, it can be very difficult and time intensive to understand how different model versions might affect your use case.
https://github.com/openai/simple-evals
- This repository contains a lightweight library for evaluating language models. We are open sourcing it so we can be transparent about the accuracy numbers we're publishing alongside our latest models (starting with gpt-4-turbo-2024-04-09). Evals are sensitive to prompting, and there's significant variation in the formulations used in recent publications and libraries. Some use few-shot prompts or role playing prompts ("You are an expert software programmer..."). These approaches are carryovers from evaluating base models (rather than instruction/chat-tuned models) and from models that were worse at following instructions.
  
  For this library, we are emphasizing the zero-shot, chain-of-thought setting, with simple instructions like "Solve the following multiple choice problem". We believe that this prompting technique is a better reflection of the models' performance in realistic usage.

Prompts / Prompt Engineering / etc

https://github.com/mshumer/gpt-prompt-engineer
- gpt-prompt-engineer Prompt engineering is kind of like alchemy. There's no clear way to predict what will work best. It's all about experimenting until you find the right prompt. gpt-prompt-engineer is a tool that takes this experimentation to a whole new level.
  
  Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best.
- Prompt Testing: The real magic happens after the generation. The system tests each prompt against all the test cases, comparing their performance and ranking them using an ELO rating system.
- ELO Rating System: Each prompt starts with an ELO rating of 1200. As they compete against each other in generating responses to the test cases, their ELO ratings change based on their performance. This way, you can easily see which prompts are the most effective.
  - https://en.wikipedia.org/wiki/Elo_rating_system
    - The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess.
    - The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.
    - A player's Elo rating is a number which may change depending on the outcome of rated games played. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. If the higher-rated player wins, then only a few rating points will be taken from the lower-rated player. However, if the lower-rated player scores an upset win, many rating points will be transferred. The lower-rated player will also gain a few points from the higher rated player in the event of a draw. This means that this rating system is self-correcting. Players whose ratings are too low or too high should, in the long run, do better or worse correspondingly than the rating system predicts and thus gain or lose rating points until the ratings reflect their true playing strength.
https://github.com/dair-ai/Prompt-Engineering-Guide
- Prompt Engineering Guide
- https://www.promptingguide.ai/
https://github.com/daveshap/ChatGPT_Custom_Instructions
- Repo of custom instructions that you can use for ChatGPT
https://github.com/daveshap/PTSD_prompts
- GPT based PTSD experiments - USE AT OWN RISK - EXPERIMENTAL ONLY
https://github.com/yzfly/Awesome-Multimodal-Prompts
- Awesome Multimodal Prompts
- Prompts of GPT-4V & DALL-E3 to full utilize the multi-modal ability. GPT4V Prompts, DALL-E3 Prompts.
https://arxiv.org/abs/2402.03620
- Self-Discover: Large Language Models Self-Compose Reasoning Structures Submitted on 6 Feb 2024 We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.

Other Useful Tools / Libraries / etc

Unsorted

See Also
- https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#profiling
https://github.com/pypa/pipx
- pipx — Install and Run Python Applications in Isolated Environments
- https://pipx.pypa.io/stable/
  - pipx is a tool to help you install and run end-user applications written in Python. It's roughly similar to macOS's brew, JavaScript's npx, and Linux's apt.
    
    It's closely related to pip. In fact, it uses pip, but is focused on installing and managing Python packages that can be run from the command line directly as applications.
    - https://pipx.pypa.io/stable/comparisons/
      - Comparison to Other Tools
    - https://pipx.pypa.io/stable/how-pipx-works/
      - How pipx works
https://pipedream.com/requestbin
- Request Bin
- Inspect webhooks and HTTP requests Get a URL to collect HTTP or webhook requests and inspect them in a human-friendly way. Optionally connect APIs, run code and return a custom response on each request.
https://github.com/googleapis/release-please
- Release Please Release Please automates CHANGELOG generation, the creation of GitHub releases, and version bumps for your projects.
  
  It does so by parsing your git history, looking for Conventional Commit messages, and creating release PRs.
  
  It does not handle publication to package managers or handle complex branch management.
- https://github.com/google-github-actions/release-please-action
  - automated releases based on conventional commits
  - Release Please Action Automate releases with Conventional Commit Messages.
- https://www.conventionalcommits.org/
https://github.com/winstonjs/winston
- winston A logger for just about everything.
- https://github.com/winstonjs/winston#usage
https://github.com/tldraw/tldraw
- a very good whiteboard
- tldraw is a collaborative digital whiteboard available at tldraw.com. Its editor, user interface, and other underlying libraries are open source and available in this repository. They are also distributed on npm. You can use tldraw to create a drop-in whiteboard for your product or as the foundation on which to build your own infinite canvas applications.
- https://tldraw.dev/
  - You can use the Tldraw React component to embed a fully featured and extendable whiteboard in your app.
  - For multiplayer whiteboards, you can plug the component into the collaboration backend of your choice.
  - You can use the Editor API to create, update, and delete shapes, control the camera—or do just about anything else. You can extend tldraw with your own custom shapes and custom tools. You can use our user interface overrides to change the contents of menus and toolbars, or else hide the UI and replace it with your own.
  - If you want to go even deeper, you can use the TldrawEditor component as a more minimal engine without the default tldraw shapes or user interface.
JavaScript (full text) Search Libraries
- https://www.npmjs.com/search?q=full%20text%20search
- https://byby.dev/js-search-libraries
- https://github.com/nextapps-de/flexsearch
  - Next-Generation full text search library for Browser and Node.js
  - Web's fastest and most memory-flexible full-text search library with zero dependencies.
  - When it comes to raw search speed FlexSearch outperforms every single searching library out there and also provides flexible search capabilities like multi-field search, phonetic transformations or partial matching.
    
    Depending on the used options it also provides the most memory-efficient index. FlexSearch introduce a new scoring algorithm called "contextual index" based on a pre-scored lexical dictionary architecture which actually performs queries up to 1,000,000 times faster compared to other libraries. FlexSearch also provides you a non-blocking asynchronous processing model as well as web workers to perform any updates or queries on the index in parallel through dedicated balanced threads.
  - https://github.com/nextapps-de/flexsearch#consumption
    - Memory Consumption
  - https://nextapps-de.github.io/flexsearch/bench/
    - Benchmark of Full-Text-Search Libraries (Stress Test)
  - https://nextapps-de.github.io/flexsearch/bench/match.html
    - Relevance Scoring Comparison
  - https://github.com/angeloashmore/react-use-flexsearch
    - React hook to search a FlexSearch index
    - The useFlexSearch hook takes your search query, index, and store and returns results as an array. Searches are memoized to ensure efficient searching.
- https://github.com/krisk/fuse
  - Lightweight fuzzy-search, in JavaScript
  - Fuse.js is a lightweight fuzzy-search, in JavaScript, with zero dependencies.
  - https://www.fusejs.io/
- https://github.com/weixsong/elasticlunr.js
  - Based on lunr.js, but more flexible and customized.
  - Elasticlunr.js Elasticlunr.js is a lightweight full-text search engine developed in JavaScript for browser search and offline search. Elasticlunr.js is developed based on Lunr.js, but more flexible than lunr.js. Elasticlunr.js provides Query-Time boosting, field search, more rational scoring/ranking methodology, fast computation speed and so on. Elasticlunr.js is a bit like Solr, but much smaller and not as bright, but also provide flexible configuration, query-time boosting, field search and other features.
  - Contributor Welcome!!! As I'm now focusing on new domain, hope that someone who interested on this project could help to maintain this repository.
  - http://elasticlunr.com/
- https://github.com/olivernn/lunr.js
  - Lunr.js A bit like Solr, but much smaller and not as bright
  - Lunr.js is a small, full-text search library for use in the browser. It indexes JSON documents and provides a simple search interface for retrieving documents that best match text queries.
  - For web applications with all their data already sitting in the client, it makes sense to be able to search that data on the client too. It saves adding extra, compacted services on the server. A local search index will be quicker, there is no network overhead, and will remain available and usable even without a network connection.
  - https://lunrjs.com/
- https://github.com/apache/solr
  - Apache Solr
  - Solr is the popular, blazing fast open source search platform for all your enterprise, e-commerce, and analytics needs, built on Apache Lucene.

Node-based UI's, Graph Execution, Flow Based Programming, etc

https://github.com/xyflow/awesome-node-based-uis
- A curated list with resources about node-based UIs
https://github.com/xyflow/xyflow
- React Flow | Svelte Flow - Powerful open source libraries for building node-based UIs with React (https://reactflow.dev) or Svelte (https://svelteflow.dev). Ready out-of-the-box and infinitely customizable.
- https://www.xyflow.com/
  - Powerful open source libraries for building node-based UIs with React or Svelte. Ready out-of-the-box and infinitely customizable
- https://reactflow.dev
  - Wire Your Ideas with React Flow A customizable React component for building node-based editors and interactive diagrams
- https://svelteflow.dev
  - Wire Your Ideas with Svelte Flow A customizable Svelte component for building node-based editors and interactive diagrams by the creators of React Flow
- https://github.com/tisoap/react-flow-smart-edge
  - React Flow Smart Edge Custom Edges for React Flow that never intersect with other nodes, using pathfinding.
- https://github.com/beeglebug/behave-flow
  - Behave Flow Behave Flow is a UI for editing behave-graph behaviour graphs using react-flow
  - https://github.com/bhouston/behave-graph
    - Behave-Graph Open, extensible, small and simple behaviour-graph execution engine.
    - Behave-Graph is a standalone library that implements the concept of "behavior graphs" as a portable TypeScript library with no required external run-time dependencies. Behavior graphs are expressive, deterministic, and extensible state machines that can encode arbitrarily complex behavior.
      
      Behavior graphs are used extensively in game development as a visual scripting language. For example, look at Unreal Engine Blueprints or Unity's Visual Scripting or NVIDIA Omniverse's OmniGraph behavior graphs.
      
      This library is intended to follow industry best practices in terms of behavior graphs. It is also designed to be compatible with these existing implementations in terms of capabilities. Although, like all node-based systems, behavior graphs are always limited by their node implementations.
    - https://github.com/bhouston/behave-graph#command-line-examples
      - Command Line Examples The example behavior graphs are in the /examples folder. You can execute these from the command line to test out how this library works.
    - https://github.com/bhouston/behave-graph/tree/main/docs
      - https://github.com/bhouston/behave-graph/blob/main/docs/Abstractions.md
        
        Abstractions
        
        Behave-graph is designed as a light weight library that can be plugged into other engines, such as Three.js or Babylon.js. In order to simplify pluggin into other engines, it defines the functionality required for interfacing with these engines as "abstractions", which can then be implemented by the engines.
      - https://github.com/bhouston/behave-graph/blob/main/docs/ExecutionModel.md
        
        Behave-Graph Execution Pseudocode
        
        Based nearly exactly from http://github.com/bhouston/behave-graph, specifically these files:
        
        https://github.com/bhouston/behave-graph/blob/main/packages/core/src/Execution/Engine.ts
        
        https://github.com/bhouston/behave-graph/blob/main/packages/core/src/Execution/Fiber.ts
        
        https://github.com/bhouston/behave-graph/blob/main/packages/core/src/Execution/resolveSocketValue.ts
      - https://github.com/bhouston/behave-graph/blob/main/docs/TypesOfNodes.md
      - https://github.com/bhouston/behave-graph/blob/main/docs/Values.md
        
        Behave-graph supports a pluggable value system where you can easily add new values to the system. Values are what are passed between nodes via sockets.
        
        Values are registered into the central registry as instances of the ValueType class. The value type class controls creation, serialization, deserialization.
    - bhouston/behave-graph#166
      - Merge behave-flow as a package @behave-graph/react-flow
      - https://github.com/bhouston/behave-graph/tree/main/packages/flow
        
        Behave Flow Behave Flow is a UI for editing behave-graph behaviour graphs using react-flow.
https://github.com/retejs/rete
- JavaScript framework for visual programming
- Rete.js is a framework for creating visual interfaces and workflows. It provides out-of-the-box solutions for visualization using various libraries and frameworks, as well as solutions for processing graphs based on dataflow and control flow approaches.
- https://retejs.org/
  - A tailorable TypeScript-first framework for creating processing-oriented node-based editors
  - https://retejs.org/examples
    - https://retejs.org/examples/processing/dataflow
      - Data Flow
        
        This example showcases a data processing pipeline using rete-engine, where data flows from left to right through nodes. Each node features a data method, which receives arrays of incoming data from their respective input sockets and delivers an object containing data corresponding to the output sockets. To initiate their execution, you can make use of the engine.fetch method by specifying the identifier of the target node. Consequently, the engine will execute all predecessors recursively, extracting their output data and delivering it to the specified node.
    - https://retejs.org/examples/processing/control-flow
      - Control Flow
        
        This example showcases an executing of schema via control flow using rete-engine, where each node dynamically decides which of its outgoing nodes will receive control. Each node features an execute method that takes an input port key as a control source, and a function for conveying control to outgoing nodes through a defined output port. To initiate the execution of the flow, you can use engine.execute method, specifying the identifier of the starting node. Consequently, the outgoing nodes will be executed sequentially, starting from the designated node.
    - https://retejs.org/examples/processing/hybrid-engine
      - Hybrid Engine
        
        This example shows how rete-engine allows for the simultaneous integration of both dataflow and control flow. Consequently, certain nodes serve as data sources, others manage the flow, and a third set incorporates both of these approaches.
    - https://retejs.org/examples/modules
      - This example showcases a schema reusability technique, where processing is carried out using DataflowEngine. This is accomplished by creating a dedicated Module node that loads a nested schema containing Input and Output nodes, subsequently generating corresponding sockets. As a result, the module node initializes the engine, feeds it with input data, executes it, and retrieves the output data.
    - https://retejs.org/examples/scopes
      - Scopes
        
        The structures shown in this example may also be referred to as subgraphs or nested nodes. This functionality is achieved using the advanced rete-scopes-plugin plugin. Changing a node's parent is easy: simply long-press the node and move it over the new parent node.
    - https://retejs.org/examples/selectable-connections
      - Selectable connections The editor doesn't offer a built-in connection selection feature. However, if you're using BidirectFlow and can't delete connections from UI, or you need to select connections for other purposes, you can create a custom connection and sync it with AreaExtensions.selector
    - https://retejs.org/examples/reroute
      - Reroute This particular example shows the usage of a plugin designed for user-controlled connection rerouting. Users can insert rerouting points by clicking on a connection or remove them by right-clicking. These points can be dragged or selected by users (similarly to nodes) to move multiple points at once.
    - https://retejs.org/examples/codegen
      - Code generation This example showcases the embedding of Rete Studio's Playground, enabling you to input JavaScript code and check its graph representation, which can also be transformed into JavaScript code.
      - https://github.com/retejs/rete-studio
        
        Rete Studio Rete Studio is a general-purpose code generation tool powered by Rete.js. Its primary goal is to seamlessly bridge the gap between textual and visual programming languages. With Rete Studio, you can transform a textual programming language into a visual representation, which can then be transformed back into textual language.
        
        https://studio.retejs.org/
        
        A general-purpose code generation tool powered by Rete.js
        
        https://studio.retejs.org/playground
        
        https://studio.retejs.org/lab
        
        https://studio.retejs.org/editor
  - https://retejs.org/docs
    - Visualization: you can choose React.js, Vue.js, Angular or Svelte to visualize nodes, sockets, controls, and connections. These visual components can be tailored to your specific needs by creating custom components for each framework, and they can all coexist in a single editor.
    - Processing: the framework offers various types of engines that enable processing diagrams based on their nature, including dataflow and control flow. These types can be combined within the same graph.
  - https://retejs.org/docs/development/rete-kit
    - The purpose of this tool is to improve efficiency when developing plugins or projects using this framework.
  - https://retejs.org/docs/api/rete-engine
    - DataflowEngine is a plugin that integrates Dataflow with NodeEditor making it easy to use. Additionally, it provides a cache for the data of each node in order to avoid recurring calculations.
    - ControlFlowEngine is a plugin that integrates ControlFlow with NodeEditor making it easy to use
https://github.com/graphology/graphology
- Graphology graphology is a robust & multipurpose Graph object for JavaScript and TypeScript.
  
  It aims at supporting various kinds of graphs with the same unified interface.
  
  A graphology graph can therefore be directed, undirected or mixed, allow self-loops or not, and can be simple or support parallel edges.
  
  Along with this Graph object, one will also find a comprehensive standard library full of graph theory algorithms and common utilities such as graph generators, layouts, traversals etc.
  
  Finally, graphology graphs are able to emit a wide variety of events, which makes them ideal to build interactive renderers for the browser.
- https://graphology.github.io/
https://github.com/cytoscape/cytoscape.js
- Graph theory (network) library for visualisation and analysis
- https://js.cytoscape.org/
  - https://js.cytoscape.org/#notation
    - Cytoscape.js supports many different graph theory usecases. It supports directed graphs, undirected graphs, mixed graphs, loops, multigraphs, compound graphs (a type of hypergraph), and so on.
  - https://js.cytoscape.org/#core/graph-manipulation
https://github.com/jagenjo/litegraph.js
- A graph node engine and editor written in Javascript similar to PD or UDK Blueprints, comes with its own editor in HTML5 Canvas2D. The engine can run client side or server side using Node. It allows to export graphs as JSONs to be included in applications independently.
https://github.com/noflo/noflo
- NoFlo: Flow-based programming for JavaScript NoFlo is an implementation of flow-based programming for JavaScript running on both Node.js and the browser. From WikiPedia:
  
  In computer science, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.
- NoFlo itself is just a library for implementing flow-based programs in JavaScript. There is an ecosystem of tools around NoFlo and the fbp protocol that make it more powerful. Here are some of them:
  - Flowhub -- browser-based visual programming IDE for NoFlo and other flow-based systems
  - noflo-nodejs -- command-line interface for running NoFlo programs on Node.js
  - noflo-browser-app -- template for building NoFlo programs for the web
  - noflo-assembly -- industrial approach for designing NoFlo programs
  - fbp-spec -- data-driven tests for NoFlo and other FBP environments
  - flowtrace -- tool for retroactive debugging of NoFlo programs. Supports visual replay with Flowhub
  See also the list of reusable NoFlo modules on NPM.
- https://noflojs.org/
  - https://noflojs.org/visualize/
    - FBP Graph Visualizer
- https://flowhub.io/ide/
  - Flowhub IDE is a tool for building full-stack applications in a visual way. With the ecosystem of flow-based programming environments, you can use Flowhub to create anything from distributed data processing applications to internet-connected artworks.
- https://flowbased.github.io/fbp-protocol/
  - FBP Network Protocol The Flow-Based Programming network protocol (FBP protocol) has been designed primarily for flow-based programming interfaces like the Flowhub to communicate with various FBP runtimes. However, it can also be utilized for communication between different runtimes, for example server-to-server or server-to-microcontroller.
  - https://github.com/flowbased/fbp
    - FBP flow definition language parser The fbp library provides a parser for a domain-specific language for flow-based-programming (FBP), used for defining graphs for FBP programming environments like NoFlo, MicroFlo and MsgFlo.
- https://en.wikipedia.org/wiki/Flow-based_programming
  - In computer programming, flow-based programming (FBP) is a programming paradigm that defines applications as networks of black box processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.
  - https://en.wikipedia.org/wiki/Component-based_software_engineering
    - Component-based software engineering (CBSE), also called component-based development (CBD), is a style of software engineering that aims to build software out of loosely-coupled, modular components. It emphasizes the separation of concerns among different parts of a software system.

https://nodered.org/

Node-RED is a programming tool for wiring together hardware devices, APIs and online services in new and interesting ways.

It provides a browser-based editor that makes it easy to wire together flows using the wide range of nodes in the palette that can be deployed to its runtime in a single-click.
https://github.com/node-red/node-red
- Low-code programming for event-driven applications

https://nodered.org/docs/api/modules/v/1.3/@node-red_runtime.html

@node-red/runtime This module provides the core runtime component of Node-RED. It does not include the Node-RED editor. All interaction with this module is done using the api provided.

https://github.com/node-red/node-red/blob/master/packages/node_modules/%40node-red/runtime/lib/index.js#L125-L234

```
var redNodes = require("./nodes");
```

function start() {

Start the runtime

return redNodes.load().then(function() {

return redNodes.loadContextsPlugin().then(function () {
  redNodes.loadFlows().then(() => { redNodes.startFlows() }).catch(function(err) {});
  started = true;
});

https://github.com/node-red/node-red/blob/master/packages/node_modules/%40node-red/runtime/lib/nodes/index.js#L198-L267

var registry = require("@node-red/registry");
var flows = require("../flows");
var context = require("./context");

module.exports = {
    // Lifecycle
    init: init,
    load: registry.load,

    // ..snip..

    // Flow handling
    loadFlows:  flows.load,
    startFlows: flows.startFlows,
    stopFlows:  flows.stopFlows,
    setFlows:   flows.setFlows,
    getFlows:   flows.getFlows,

    addFlow:     flows.addFlow,
    getFlow:     flows.getFlow,
    updateFlow:  flows.updateFlow,
    removeFlow:  flows.removeFlow,

    // ..snip..

    // Contexts
    loadContextsPlugin: context.load,
    closeContextsPlugin: context.close,
    listContextStores: context.listStores,
};

Unsorted

https://github.com/google-gemini/cookbook
- Gemini API Cookbook
- A collection of guides and examples for the Gemini API.
- This is a collection of guides and examples for the Gemini API, including quickstart tutorials for writing prompts and using different features of the API, and examples of things you can build.
- https://ai.google.dev/gemini-api/docs
  - Get started with Gemini API
https://github.com/NaturalNode/natural/
- Natural
- general natural language facilities for node
- "Natural" is a general natural language facility for nodejs. It offers a broad range of functionalities for natural language processing.
- https://naturalnode.github.io/natural/
  - “Natural” is a general natural language facility for nodejs. Tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflections are currently supported.
  - https://naturalnode.github.io/natural/tfidf.html
    - tf-idf Term Frequency–Inverse Document Frequency (tf-idf) is implemented to determine how important a word (or words) is to a document relative to a corpus.
- https://blog.logrocket.com/natural-language-processing-node-js/
  - Natural language processing with Node.js
https://github.com/pytorch/torchtune
- A Native-PyTorch Library for LLM Fine-tuning
- https://github.com/pytorch/torchtune#llama3
  - torchtune supports fine-tuning for the Llama3 8B models with support for 70B on its way. We currently support LoRA, QLoRA and Full-finetune on a single GPU as well as LoRA and Full fine-tune on multiple devices.
- https://pytorch.org/blog/torchtune-fine-tune-llms/
https://llama.meta.com/llama3/
- Meta Llama 3 Now available with both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications
- https://github.com/meta-llama/llama3
  - Meta Llama 3
  - The official Meta Llama 3 GitHub site
  - We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.
    
    This release includes model weights and starting code for pre-trained and instruction-tuned Llama 3 language models — including sizes of 8B to 70B parameters.
    
    This repository is a minimal example of loading Llama 3 models and running inference. For more detailed examples, see llama-recipes.
    - https://github.com/meta-llama/llama-recipes
      - Llama Recipes: Examples to get started using the Llama models from Meta
      - Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.
https://zapier.com/blog/train-chatgpt-to-write-like-you/
- How to train ChatGPT to write like you
https://github.com/EleutherAI/gpt-neox
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
- GPT-NeoX This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training. This library is in widespread use in academic, industry, and government labs, including by researchers at Oak Ridge National Lab, CarperAI, Stability AI, Together.ai, Korea University, Carnegie Mellon University, and the University of Tokyo among others. Uniquely among similar libraries GPT-NeoX supports a wide variety of systems and hardwares, including launching via Slurm, MPI, and the IBM Job Step Manager, and has been run at scale on AWS, CoreWeave, ORNL Summit, ORNL Frontier, LUMI, and others.
  
  If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend you use the Hugging Face transformers library instead which supports GPT-NeoX models.
- https://github.com/EleutherAI/gpt-neox#why-gpt-neox
  - Why GPT-NeoX?
    
    GPT-NeoX leverages many of the same features and technologies as the popular Megatron-DeepSpeed library but with substantially increased usability and novel optimizations. Major features include:
    - Distributed training with ZeRO and 3D parallelism
    - A wide variety of systems and hardwares, including launching via Slurm, MPI, and the IBM Job Step Manager, and has been run at scale on AWS, CoreWeave, ORNL Summit, ORNL Frontier, LUMI, and others.
    - Cutting edge architectural innovations including rotary and alibi positional embeddings, parallel feedforward attention layers, and flash attention.
    - Predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 & 2
    - Curriculum Learning
    - Easy connections with the open source ecosystem, including Hugging Face's tokenizers and transformers libraries, logging via WandB, and evaluation via our Language Model Evaluation Harness.
https://microsoft.github.io/promptflow/
- Prompt flow Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.
  
  With prompt flow, you will be able to:
  - Create flows that link LLMs, prompts, Python code and other tools together in a executable workflow.
  - Debug and iterate your flows, especially the interaction with LLMs with ease.
  - Evaluate your flows, calculate quality and performance metrics with larger datasets.
  - Integrate the testing and evaluation into your CI/CD system to ensure quality of your flow.
  - Deploy your flows to the serving platform you choose or integrate into your app’s code base easily.
  - (Optional but highly recommended) Collaborate with your team by leveraging the cloud version of Prompt flow in Azure AI.
- https://microsoft.github.io/promptflow/concepts/concept-flows.html
  - Flows
  - While how LLMs work may be elusive to many developers, how LLM apps work is not - they essentially involve a series of calls to external services such as LLMs/databases/search engines, or intermediate data processing, all glued together.
- https://microsoft.github.io/promptflow/reference/index.html
  - Reference
- https://github.com/microsoft/autogen/tree/main/samples/apps/promptflow-autogen
  - Pomptflow Autogen Example
https://github.com/stanfordnlp/dspy
- DSPy: The framework for programming—not prompting—foundation models
- DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system without DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.
  
  To make this more systematic and much more powerful, DSPy does two things. First, it separates the flow of your program (modules) from the parameters (LM prompts and weights) of each step. Second, DSPy introduces new optimizers, which are LM-driven algorithms that can tune the prompts and/or the weights of your LM calls, given a metric you want to maximize.
  
  DSPy can routinely teach powerful models like GPT-3.5 or GPT-4 and local models like T5-base or Llama2-13b to be much more reliable at tasks, i.e. having higher quality and/or avoiding specific failure patterns. DSPy optimizers will "compile" the same program into different instructions, few-shot prompts, and/or weight updates (finetunes) for each LM. This is a new paradigm in which LMs and their prompts fade into the background as optimizable pieces of a larger system that can learn from data. tldr; less prompting, higher scores, and a more systematic approach to solving hard tasks with LMs.
- https://dspy-docs.vercel.app/
  - DSPy - Programming—not prompting—Language Models
  - The Way of DSPy
    - Systematic Optimization: Choose from a range of optimizers to enhance your program. Whether it's generating refined instructions, or fine-tuning weights, DSPy's optimizers are engineered to maximize efficiency and effectiveness.
    - Modular Approach: With DSPy, you can build your system using predefined modules, replacing intricate prompting techniques with straightforward, effective solutions.
    - Cross-LM Compatibility: Whether you're working with powerhouse models like GPT-3.5 or GPT-4, or local models such as T5-base or Llama2-13b, DSPy seamlessly integrates and enhances their performance in your system.
https://github.com/sgl-project/sglang
- SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
- https://lmsys.org/blog/2024-01-17-sglang/
  - Fast and Expressive LLM Inference with RadixAttention and SGLang
  - On the backend, we propose RadixAttention, a technique for automatic and efficient KV cache reuse across multiple LLM generation calls.
  - On the frontend, we develop a flexible domain-specific language embedded in Python to control the generation process. This language can be executed in either interpreter mode or compiler mode.
  - KV cache reuse means different prompts with the same prefix can share the intermediate KV cache and avoid redundant memory and computation.
  - To systematically exploit these reuse opportunities, we introduce RadixAttention, a novel technique for automatic KV cache reuse during runtime. Instead of discarding the KV cache after finishing a generation request, our approach retains the KV cache for both prompts and generation results in a radix tree. This data structure enables efficient prefix search, insertion, and eviction. We implement a Least Recently Used (LRU) eviction policy, complemented by a cache-aware scheduling policy, to enhance the cache hit rate.
  - On the frontend, we introduce SGLang, a domain-specific language embedded in Python. It allows you to express advanced prompting techniques, control flow, multi-modality, decoding constraints, and external interaction easily. A SGLang function can be run through various backends, such as OpenAI, Anthropic, Gemini, and local models.
  - Figure 5 shows a concrete example. It implements a multi-dimensional essay judge utilizing the branch-solve-merge prompting technique. This function uses LLMs to evaluate the quality of an essay from multiple dimensions, merges the judgments, generates a summary, and assigns a final grade.
  - The syntax of SGLang is largely inspired by Guidance. However, we additionally introduce new primitives and handle intra-program parallelism and batching
    - https://github.com/guidance-ai/guidance
      - Guidance is an efficient programming paradigm for steering language models. With Guidance, you can control how output is structured and get high-quality output for your use case—while reducing latency and cost vs. conventional prompting or fine-tuning. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops, tool use) and generation seamlessly.
  - SGLang outperformed the baseline systems in all benchmarks, achieving up to 5 times higher throughput. It also excelled in terms of latency, particularly for the first token latency, where a prefix cache hit can be significantly beneficial. These improvements are attributed to the automatic KV cache reuse with RadixAttention, the intra-program parallelism enabled by the interpreter, and the co-design of the frontend and backend systems. Additionally, our ablation study revealed no noticeable overhead even in the absence of cache hits, leading us to always enable the RadixAttention feature in the runtime.
https://github.com/mozilla-Ocho/llamafile
- Distribute and run LLMs with a single file
- llamafile lets you distribute and run LLMs with a single file Our goal is to make open source large language models much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
- https://hacks.mozilla.org/2023/11/introducing-llamafile/
  - Introducing llamafile
https://github.com/microsoft/LLMLingua
- To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.
- https://github.com/microsoft/LLMLingua/blob/main/examples/Retrieval.ipynb
  - We know that LLMs have a 'lost in the middle' issue, where the position of key information in the prompt significantly impacts the final result.
  - How to build an accurate positional relationship between the document and the question has become an important issue. We evaluated the effects of four types of reranker methods on a dataset (NaturalQuestions Multi-document QA that is very close to the actual RAG scenario, e.g. BingChat).
  - The results show that reranker-based methods are significantly better than embedding methods. The LongLLMLingua method is even better than the current SoTA reranker methods, and it can more accurately capture the relationship between the query and the document, thus alleviating the 'lost in the middle' issue.
- https://llmlingua.com/
  - (Long)LLMLingua | Designing a Language for LLMs via Prompt Compression
- https://blog.llamaindex.ai/longllmlingua-bye-bye-to-middle-loss-and-save-on-your-rag-costs-via-prompt-compression-54b559b9ddf7
  - LongLLMLingua: Bye-bye to Middle Loss and Save on Your RAG Costs via Prompt Compression
https://github.com/apoorvumang/prompt-lookup-decoding
- In several LLM use cases where you're doing input grounded generation (summarization, document QA, multi-turn chat, code editing), there is high n-gram overlap between LLM input (prompt) and LLM output. This could be entity names, phrases, or code chunks that the LLM directly copies from the input while generating the output. Prompt lookup exploits this pattern to speed up autoregressive decoding in LLMs.
- On both summarization and context-QA, we get a relatively consistent 2.4x speedup (on average).
- https://twitter.com/apoorv_umang/status/1728831397153104255
  - Prompt lookup decoding: Get 2x-4x reduction in latency for input grounded LLM generation with no drop in quality using this speculative decoding technique
- huggingface/transformers#27722
  - Adding support for prompt lookup decoding (variant of assisted generation)
- ggml-org/llama.cpp#4226
  - lookahead-prompt: add example
https://github.com/vercel/ai
- Vercel AI SDK The Vercel AI SDK is a library for building AI-powered streaming text and chat UIs.
- Build AI-powered applications with React, Svelte, Vue, and Solid
- https://sdk.vercel.ai/docs
  - Vercel AI SDK An open source library for building AI-powered user interfaces.
    
    The Vercel AI SDK is an open-source library designed to help developers build conversational streaming user interfaces in JavaScript and TypeScript. The SDK supports React/Next.js, Svelte/SvelteKit, and Vue/Nuxt as well as Node.js, Serverless, and the Edge Runtime.
https://github.com/oobabooga/text-generation-webui
- A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
- Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation.
- https://github.com/oobabooga/text-generation-webui-extensions
  - This is a directory of extensions for oobabooga/text-generation-webui
https://github.com/huggingface/chat-ui
- Open source codebase powering the HuggingChat app
https://github.com/lm-sys/FastChat
- FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
- FastChat powers Chatbot Arena, serving over 5 million chat requests for 30+ LLMs
  - https://chat.lmsys.org/
- Arena has collected over 100K human votes from side-by-side LLM battles to compile an online LLM Elo leaderboard
  - https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
https://github.com/philipturner/metal-benchmarks
- Apple GPU microarchitecture
- This document thoroughly explains the Apple GPU microarchitecture, focusing on its GPGPU performance. Details include latencies for each ALU assembly instruction, cache sizes, and the number of unique instruction pipelines. This document enables evidence-based reasoning about performance on the Apple GPU, helping people diagnose bottlenecks in real-world software. It also compares Apple silicon to generations of AMD and Nvidia microarchitectures, showing where it might exhibit different performance patterns. Finally, the document examines how Apple's design choices improve power efficiency compared to other vendors.
  
  This repository also contains open-source benchmarking scripts. They allow anyone to reproduce and verify the author's claims about performance. A complementary library reports the hardware specifications of any Apple-designed GPU.
  - https://github.com/philipturner/applegpuinfo
    - Print all known information about the GPU on Apple-designed chips
    - This is a mini-framework for querying parameters of an Apple-designed GPU. It also contains a command-line tool, gpuinfo, which reports information similarly to clinfo. It was co-authored with an AI.
    - https://github.com/Oblomov/clinfo
      - Print all known information about all available OpenCL platforms and devices in the system
      - clinfo is a simple command-line application that enumerates all possible (known) properties of the OpenCL platform and devices available on the system.
https://github.com/tinygrad/tinygrad
- You like pytorch? You like micrograd? You love tinygrad! ❤️
- This may not be the best deep learning framework, but it is a deep learning framework.
  
  Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.
- https://tinygrad.org/
https://github.com/microsoft/DirectML
- DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.

0xdevalias/ai-ml-toolkit.md

AI / ML Toolkit

Table of Contents

See Also

My Other Related Deepdive Gist's and Projects

OpenRouter

ollama

llama.cpp

node-llama-cpp

vLLM

LiteLLM

Protocols / Standards / etc

Agent2Agent Protocol (A2A)

Model Context Protocol (MCP)

SDKs / Toolkits / etc

Google Agent Development Kit (ADK)

Vercel AI SDK / Toolkit

GenKit

LangChain, LangServe, LangSmith, LangFlow, LangGraph, etc

AI Agents / Assistants / etc

Agent Benchmarks / Leaderboards

OpenAI Assistants / ChatGPT custom GPTs

OpenGPTs

GitHub Copilot Agent / Workspace / CLI

Claude Code

OpenAI Codex CLI

Codex CLI: frontier reasoning in the terminal

aider

llm

Continue - Custom AI Code Assistant

Cline

Roo-Code (formerly Roo Cline)

Autogen / FLAML / etc

OpenHands (formerly OpenDevin)

SWE-agent

ChatDev

AutoCoder

OpenCodeInterpreter

OpenInterpreter

Unsorted

Code Generation / Execution

Code Leaderboards / Benchmarks

Vision / Multimodal

OpenAI

LLaVA / etc

Unsorted

Image Generation

Automatic1111 (Stable Diffusion WebUI)

ComfyUI

Unsorted

Song / Audio Generation

Udio

Suno

Stable Audio

AudioCraft: MusicGen, AudioGen, etc

Neural Audio Codecs

Audio Separation / Stem Splitting / Sound Demixing / Music Source Separation

Audio Super Resolution

Unsorted

See Also

Infrastructure and Hosting (Cloud GPUs, etc)

Runpod

Vector Databases/Search, Similarity Search, Clustering, etc

Faiss

Benchmarks / Leaderboards

Prompts / Prompt Engineering / etc

Other Useful Tools / Libraries / etc

Unsorted

Node-based UI's, Graph Execution, Flow Based Programming, etc

Unsorted

wd021 commented Jul 9, 2025

Uh oh!