#!/usr/bin/env python3
"""
Dynamic and Adaptive Python Environment in a Bubblewrap Sandbox
Overview:
This project demonstrates a minimal, safe, and self-contained Python execution
environment using bubblewrap (bwrap) and uv. The goal is to provide a lightweight
alternative to Docker for running agent code—in this case, dynamically generated
Python code along with its dependencies—within an isolated sandbox.
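As a rough illustration of the approach described above, the following sketch builds a bubblewrap command that runs a script with `uv run` inside a fresh set of namespaces. It is not the gist's implementation. The helper name `build_bwrap_command`, the `/work` mount point, the list of read-only system directories and the cache location are all assumptions, and the sketch assumes `uv` is installed under `/usr` (if it lives in `~/.local/bin`, that directory must be bound as well). The options themselves (`--ro-bind`, `--bind`, `--proc`, `--dev`, `--tmpfs`, `--unshare-all`, `--share-net`, `--setenv`, `--chdir`, `--die-with-parent`, `--new-session`) are standard bwrap flags.

```python
from pathlib import Path


def build_bwrap_command(workdir: Path, script: str, allow_network: bool = False) -> list[str]:
    """Return an argv list that runs `uv run <script>` inside a bubblewrap sandbox.

    Hypothetical helper: the mounts below are one plausible minimal set for a
    typical Linux host, not a vetted security policy.
    """
    cmd = [
        "bwrap",
        "--die-with-parent",  # tear the sandbox down if the launching process exits
        "--new-session",      # start a new session (mitigates TIOCSTI terminal injection)
        "--unshare-all",      # unshare every namespace bwrap supports, network included
    ]
    if allow_network:
        cmd.append("--share-net")  # re-enable networking, e.g. so uv can fetch packages
    # Read-only system directories the interpreter and uv need (assumed set).
    for path in ("/usr", "/bin", "/lib", "/lib64", "/etc"):
        if Path(path).exists():
            cmd += ["--ro-bind", path, path]
    cmd += [
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", str(workdir), "/work",  # the only writable host directory
        "--setenv", "HOME", "/tmp",
        "--setenv", "UV_CACHE_DIR", "/tmp/uv-cache",  # keep uv's cache inside the tmpfs
        "--chdir", "/work",
        "uv", "run", script,
    ]
    return cmd
```

To execute it, pass the list to `subprocess.run(cmd, check=True)`. Because the cache sits in a tmpfs, dependencies are re-downloaded on every run; binding a persistent cache directory trades some isolation for speed.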
@grahama1970
grahama1970 / unsloth_student_teacher_training.py
Last active February 17, 2025 01:53
Student-Teacher-GRPO-Proof-of-Concept
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Enhanced Unsloth GRPO with Student-Teacher Reward Mechanism
------------------------------------------------------------
This script extends Unsloth's GRPO by implementing a novel student-teacher reward
mechanism for improved reasoning chains.
Key Components:
1. Student Model (Unsloth GRPO):
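GRPO (Group Relative Policy Optimization) samples several completions per prompt and scores each one relative to the rest of its group. A minimal sketch of how a student-teacher reward could feed into that step follows. `blended_reward` and its weighting are hypothetical stand-ins for the mechanism the gist describes; `group_relative_advantages` shows the usual group normalization (subtract the group mean, divide by the group standard deviation). This sketch uses the population standard deviation, and implementations differ on that detail.

```python
import statistics


def blended_reward(student_correct: bool, teacher_score: float, teacher_weight: float = 0.5) -> float:
    """Hypothetical reward: mix a binary correctness signal with a teacher's
    0..1 rating of the student's reasoning chain."""
    if not 0.0 <= teacher_score <= 1.0:
        raise ValueError("teacher_score must be in [0, 1]")
    return (1.0 - teacher_weight) * float(student_correct) + teacher_weight * teacher_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one prompt's group of sampled completions:
    each reward is centered on the group mean and scaled by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the advantages are centered within each group, completions only gain credit by beating their siblings for the same prompt, which is what lets GRPO drop a separate value model.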
@grahama1970
grahama1970 / errors_to_json.py
Created February 15, 2025 15:43
Snippet: Example of using the bulk insert command for Python-Arango
import re
import json
from loguru import logger
from pathlib import Path
from arango import ArangoClient
from arango.database import StandardDatabase
from typing import Dict, Any
from search_tool.arangodb.generate_schema_for_llm.utils.arango_utils import initialize_database
from search_tool.shared_utils.json_utils import save_json_to_file
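A minimal sketch of batched bulk insertion with python-arango's `import_bulk`, assuming the documents already carry a `_key`. The helper names are hypothetical. `on_duplicate="replace"` is a real `import_bulk` option that overwrites documents whose `_key` already exists, and the returned dict includes `created` and `updated` counts.

```python
from typing import Any, Iterator


def batched(docs: list[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_insert(collection: Any, docs: list[dict[str, Any]], batch_size: int = 1000) -> int:
    """Import `docs` in batches and return how many were created or replaced.

    `collection` is expected to be a python-arango collection, e.g.
    `db.collection("errors")` (the collection name is illustrative).
    """
    written = 0
    for batch in batched(docs, batch_size):
        result = collection.import_bulk(batch, on_duplicate="replace")
        written += result.get("created", 0) + result.get("updated", 0)
    return written
```

Batching keeps each HTTP request bounded in size; for small document sets a single `import_bulk` call (or `insert_many`) is simpler.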
@grahama1970
grahama1970 / arango_dump.sh
Last active February 13, 2025 20:29
Arango Dump and Restore bash scripts
#!/bin/bash
# arangodump_db.sh
# This script dumps an ArangoDB database from a Docker container with a low memory batch size.
# The dump path follows the structure: backups/<database_name>/<timestamp>/arangodump
#
# Usage: ./arangodump_db.sh
# Make sure arangodump is in your PATH.
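The dump layout described in the header comment can be sketched in Python as follows. This is an illustration, not the gist's script: the function names, the timestamp format and the default endpoint are assumptions. `--server.endpoint`, `--server.username`, `--server.database`, `--output-directory` and `--overwrite` are documented arangodump options; `--batch-size` is assumed here to be the low-memory knob the script refers to, so check `arangodump --help` for your version before relying on it.

```python
from datetime import datetime
from pathlib import Path


def dump_directory(base: Path, database: str, now: datetime) -> Path:
    """Build backups/<database_name>/<timestamp>/arangodump."""
    timestamp = now.strftime("%Y%m%d_%H%M%S")  # timestamp format is an assumption
    return base / database / timestamp / "arangodump"


def arangodump_command(database: str, out_dir: Path, batch_size_bytes: int = 1_048_576,
                       endpoint: str = "tcp://127.0.0.1:8529", username: str = "root") -> list[str]:
    """Return an arangodump argv list (hypothetical helper)."""
    return [
        "arangodump",
        "--server.endpoint", endpoint,
        "--server.username", username,
        "--server.database", database,
        "--output-directory", str(out_dir),
        "--overwrite", "true",
        # Assumed flag: cap the size of each fetched batch to keep memory use low.
        "--batch-size", str(batch_size_bytes),
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`.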
@grahama1970
grahama1970 / create_error_code_collection.py
Last active February 12, 2025 13:27
This script handles the creation and population of an ArangoDB collection containing error codes and messages. It downloads an error code dataset from a specified URL, parses it into JSON format, and stores it in a collection for easy reference and querying. The script is designed to be run as a standalone utility to initialize or update the err…
"""
This script handles the creation and population of an ArangoDB collection containing error codes and messages.
It downloads an error code dataset from a specified URL, parses it into JSON format, and stores it in a collection
for easy reference and querying. The script is designed to be run as a standalone utility to initialize or update
the error code collection in an ArangoDB database.
"""
import requests
import csv
from arango import ArangoClient
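A minimal sketch of the parse step, turning downloaded CSV text into documents ready for insertion. The column names `code`, `name` and `message` are assumptions about the dataset, and the function name is hypothetical. Using the error code as `_key` lets a re-run update existing documents (for example with `on_duplicate="replace"`) instead of duplicating them.

```python
import csv
import io
from typing import Any


def parse_error_codes(csv_text: str) -> list[dict[str, Any]]:
    """Turn CSV rows into ArangoDB documents keyed by error code.

    Assumes hypothetical columns `code`, `name`, `message`.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = (row.get("code") or "").strip()
        if not code:
            continue  # skip blank or malformed rows
        docs.append({
            "_key": code,  # stable key so repeated imports hit the same document
            "code": int(code) if code.isdigit() else code,
            "name": (row.get("name") or "").strip(),
            "message": (row.get("message") or "").strip(),
        })
    return docs
```

In the full flow the text would come from `requests.get(url, timeout=30)`, followed by `response.raise_for_status()` and `response.text`.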
@grahama1970
grahama1970 / 01_README.md
Last active February 8, 2025 13:18
This README details how SmolAgents leverages async tools like SummarizationTool for efficient LiteLLM processing. It explains handling long text with chunking, parallel API calls, and synthesis while maintaining a simple agent interface. Async tools improve performance, cost efficiency, and reliability. 🚀

🛠️ Async Tools for SmolAgents (LiteLLM)

This directory contains tools for handling LLM operations, most notably document summarization, using asynchronous (async) techniques. Async tools allow SmolAgents to process large inputs efficiently by splitting the work into smaller chunks and running API calls in parallel, while keeping integration with the agent framework simple.
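The chunk, fan-out and synthesize pattern described above can be sketched with `asyncio` alone. This is an offline illustration: `summarize_chunk` is a placeholder where a real tool would await an LLM call such as LiteLLM's `acompletion`, character-based chunking stands in for token-aware splitting, and all function names are hypothetical.

```python
import asyncio


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


async def summarize_chunk(chunk: str) -> str:
    """Placeholder for an async LLM call; returns the first sentence so the sketch runs offline."""
    await asyncio.sleep(0)  # yield to the event loop as a real network call would
    return chunk.split(".")[0].strip()


async def summarize_long_text(text: str, max_chars: int = 2000, max_concurrency: int = 5) -> str:
    semaphore = asyncio.Semaphore(max_concurrency)  # cap the number of in-flight API calls

    async def limited(chunk: str) -> str:
        async with semaphore:
            return await summarize_chunk(chunk)

    # Map: summarize all chunks concurrently; gather returns results in submission order.
    partials = await asyncio.gather(*(limited(c) for c in chunk_text(text, max_chars)))
    # Reduce: a real tool would send the partial summaries to the LLM once more.
    return " | ".join(p for p in partials if p)
```

Because `asyncio.gather` preserves submission order, the synthesis step sees partial summaries in document order even when the calls finish out of order, and the semaphore keeps concurrency within provider rate limits.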


📏 Preventing Context Length Issues

The Problem

Large text documents can exceed the fixed context window of LLMs (e.g. GPT-4), which may result in:

@grahama1970
grahama1970 / 00_README.md
Last active February 7, 2025 20:44
This README details how SmolAgents leverages async tools like SummarizationTool for efficient LLM processing. It explains handling long text with chunking, parallel API calls, and synthesis while maintaining a simple agent interface. Async tools improve performance, cost efficiency, and reliability. 🚀

🚨 Bug Report: SmolAgents CodeAgent Truncates Input Before Tool Execution

Summary

SmolAgents' CodeAgent incorrectly processes input before invoking tools, leading to truncation of long text even when a tool is explicitly designed to handle it. This defeats the purpose of tools and prevents reliable execution in workflows requiring large text processing.

Steps to Reproduce

  1. Define a Tool (e.g., SummarizationTool) that accepts long text and chunks it internally.
  2. Register the Tool with CodeAgent.
  3. Send a large text input via agent.run(prompt).
  4. Observe the Issue:
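For step 4, one way to observe the truncation is to record the length of the text the tool actually receives and compare it with the length of the document that was sent. The decorator below is a hypothetical helper, not part of SmolAgents; it assumes the tool does its work in a `forward(self, text, ...)` method, which is where SmolAgents `Tool` subclasses implement their logic.

```python
import functools
from typing import Callable


def record_input_length(log: list[int]) -> Callable:
    """Hypothetical decorator: append len(text) to `log` on every call to the
    wrapped method, so the received size can be compared with the original input."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(func)
        def wrapper(self, text: str, *args, **kwargs) -> str:
            log.append(len(text))
            return func(self, text, *args, **kwargs)
        return wrapper
    return decorator
```

Decorate the tool's `forward` with `@record_input_length(lengths)`, run the agent, and check whether `lengths[-1]` is smaller than `len(document)`.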
@grahama1970
grahama1970 / 01_docker-compose-sglang.yml
Last active February 5, 2025 21:19
Trying to get Qwen2.5-VL-7B to work with CUDA 12.8.
services:
  # ---------------------------
  # SGLang Service
  # ---------------------------
  sglang-service:
    # image: lmsysorg/sglang:latest
    build:
      context: .
      dockerfile: Dockerfile_v2.sglang
    container_name: sglang-service
from arango import ArangoClient
from loguru import logger


def validate_embeddings(db, collection_name, dimension):
    """Validate embeddings with boolean AQL result."""
    try:
        # Collection name interpolation (must be sanitized)
        query = f"""
        RETURN COUNT(
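The comment in `validate_embeddings` notes that the interpolated collection name must be sanitized. An alternative is to let AQL bind both values: `@@collection` is AQL's syntax for a collection bind parameter, passed in python-arango as the `"@collection"` key of `bind_vars`. The `embedding` field name and the query body are assumptions, since the original query is cut off.

```python
from typing import Any

# `@@collection` binds the collection itself, so its name is never spliced into
# the query string. The `embedding` field name is an assumption.
VALIDATE_EMBEDDINGS_AQL = """
RETURN COUNT(
    FOR doc IN @@collection
        FILTER !IS_ARRAY(doc.embedding) OR LENGTH(doc.embedding) != @dimension
        LIMIT 1
        RETURN 1
) == 0
"""


def validate_embeddings(db: Any, collection_name: str, dimension: int) -> bool:
    """Return True if every document has an `embedding` array of `dimension` values."""
    cursor = db.aql.execute(
        VALIDATE_EMBEDDINGS_AQL,
        bind_vars={"@collection": collection_name, "dimension": dimension},
    )
    return bool(next(cursor))  # the query returns a single boolean
```

`LIMIT 1` lets the subquery stop at the first invalid document, so the check stays cheap on large collections.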