@decagondev
Created April 4, 2026 17:37
VegSentinel

Building Real-Time Produce Quality Detection with YOLOv9 and SOLID Modular Design

Before You Start: Pre-Search (time: 1-2 hours)

You must complete the Pre-Search appendix before writing a single line of code. Your Pre-Search output — the full saved AI conversation or detailed notes — is a required part of your final submission.

This week's methodology emphasis is SOLID principles and modular architecture. Pre-Search forces you to map every decision (YOLOv9 variant selection, Pygame event architecture, dataset pipeline, dependency injection) before any implementation. You will not be allowed to refactor core modules after the MVP deadline. The goal is production-grade maintainability: a codebase where the detector, UI, trainer, and data layers can be swapped or extended independently.

Background

In the retail and grocery sector, companies such as Walmart, Whole Foods, Kroger, and computer-vision startups like Clarifruit and Afresh Technologies deploy YOLO-based systems to automate fresh-produce inspection at scale. These tools identify spoilage and defects in real time, cutting food waste by 30-40% and replacing inconsistent manual checks that cost stores millions annually in labor and shrink.

You will build VegSentinel: a standalone desktop application that loads a YOLOv9 model, processes images via PIL, detects vegetables, classifies each as good or rotten, and visualizes results in a Pygame UI. The core technical challenge is creating a fully modular, SOLID-compliant system that supports both inference (file or webcam) and an interactive trainer UI for building and exporting custom good/rotten datasets. Every module must follow Single Responsibility and Dependency Inversion so the same detector can later run headless on edge devices or in a cloud pipeline.

Gate: Project completion + interviews required for Austin admission.

Project Overview

One-week sprint with three deadlines:

| Checkpoint | Deadline | Focus |
| --- | --- | --- |
| Pre-Search | Before any coding | Constraints, architecture, stack decisions |
| MVP | Tuesday EOD (24 hrs) | Core detection + basic Pygame inference UI |
| Early Submission | Friday EOD (4 days) | Trainer UI, webcam, dataset export |
| Final | Sunday 10:59 PM CT | Polish, documentation, performance targets |

MVP Requirements (24 Hours)

Hard gate. All items required to pass:

  • YOLOv9 model loaded via official Python API; PIL used to preprocess any input image to tensor
  • Pygame window (800×600 minimum) with two modes toggled by keyboard: Inference and Idle
  • File upload via drag-and-drop or dialog; image displayed with overlaid bounding boxes, class labels ("good" / "rotten"), and confidence scores
  • At least one vegetable detected and classified correctly on the three provided sample images (good apple, rotten banana, mixed vegetables)
  • Clean modular structure: separate detector.py, ui.py, and image_processor.py modules with no circular imports
  • Application launches and runs inference end-to-end on a standard laptop without crashes
  • Packaged executable (PyInstaller) included in repo root with clear README.md run instructions

A simple YOLOv9 detector with clean SOLID modules beats a complex multi-model system with tangled Pygame code.

Core Technical Requirements

Detection Layer

| Feature | Requirements |
| --- | --- |
| Model Loading | Load YOLOv9 (nano or small variant) from local weights; support CPU/GPU auto-detection |
| PIL Preprocessing | Convert any image (JPG/PNG) to an RGB tensor; resize to model input size while preserving aspect ratio |
| Inference | Run `model.predict()` with a 0.45 confidence threshold and NMS; return a list of (bbox, class_id, confidence) |
| Class Mapping | Map YOLO class indices to "good_vegetable" or "rotten_vegetable" (or multi-class vegetable + quality) |
| Error Handling | Graceful fallback if no objects are detected; log PIL/YOLO exceptions without crashing the UI |
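The aspect-preserving resize in the PIL Preprocessing row is commonly done as a "letterbox" transform: scale the long side to the model input size, then pad the short side. A minimal sketch, assuming a 640 px input and the gray pad color YOLO implementations typically use (the helper name `letterbox` and both defaults are illustrative, not part of the spec):

```python
from PIL import Image

def letterbox(img: Image.Image, size: int = 640, fill=(114, 114, 114)) -> Image.Image:
    """Resize to a square model input while preserving aspect ratio,
    padding the short side and centering the image on the canvas."""
    img = img.convert("RGB")
    scale = size / max(img.width, img.height)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size), fill)
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas
```

The padded square can then be converted to the tensor layout your YOLOv9 loader expects; remember to keep the scale and offsets around so detected boxes can be mapped back to original pixel coordinates.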

UI / Rendering Layer

| Feature | Requirements |
| --- | --- |
| Pygame Window | 800×600 default; resizable; title "VegSentinel — Produce Quality Detector" |
| Overlay Rendering | Draw rectangles, class text, and confidence using Pygame surfaces; color-code green/red |
| Mode Switching | Keyboard (I = Inference, T = Trainer, ESC = quit); no global state leaks |
| Input Handling | File drag-and-drop, "Open Image" button, webcam toggle |
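The mode-switching row boils down to a small state machine. The sketch below keeps it as pure Python so it is testable without a display; in the real app the `key` argument would come from `event.key` in a `pygame.KEYDOWN` event (`pygame.K_i`, `pygame.K_t`, `pygame.K_ESCAPE`). All names here are illustrative:

```python
from enum import Enum, auto
from typing import Optional

class Mode(Enum):
    IDLE = auto()
    INFERENCE = auto()
    TRAINER = auto()

def next_mode(current: Mode, key: str) -> Optional[Mode]:
    """Return the next UI mode for a key press; None signals quit.
    Plain strings stand in for pygame key constants in this sketch."""
    if key == "escape":
        return None                       # ESC = quit
    transitions = {"i": Mode.INFERENCE, "t": Mode.TRAINER}
    return transitions.get(key, current)  # unknown keys leave the mode unchanged
```

Keeping the transition logic out of the event loop like this makes it unit-testable and keeps mode state local, which helps satisfy the "no global state leaks" requirement.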

Data & Training Pipeline Layer

| Feature | Requirements |
| --- | --- |
| Sample Loader | Load a folder of images; display thumbnails in a Pygame grid |
| Label Assignment | Click-to-assign good/rotten per image or per detected object |
| YOLO Dataset Export | Generate images/ and labels/ folders with normalized .txt annotation files |
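The export row requires the normalized YOLO label format (`class x_center y_center width height`, each coordinate in [0, 1]). A minimal sketch of the pixel-to-normalized conversion; the function name and six-decimal formatting are assumptions:

```python
def to_yolo_line(cls_id: int, bbox: tuple, img_w: int, img_h: int) -> str:
    """Convert a pixel-space (x1, y1, x2, y2) box into one line of a
    YOLO .txt label file: 'class x_center y_center width height'."""
    x1, y1, x2, y2 = bbox
    xc = (x1 + x2) / 2 / img_w   # box center, normalized by image width
    yc = (y1 + y2) / 2 / img_h   # box center, normalized by image height
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

One such line per annotated object, written to `labels/<stem>.txt` alongside `images/<stem>.jpg`, yields a dataset YOLO training scripts can consume directly.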

We will test:

  1. Upload a photo of a fresh carrot → single green box labeled "good_vegetable" with conf ≥ 0.6
  2. Upload a photo of a rotten tomato → red box labeled "rotten_vegetable" with conf ≥ 0.6
  3. Drag-and-drop multiple images → all processed sequentially without memory leak
  4. Webcam live feed (30 fps target) → boxes update in real time on vegetable movement
  5. Trainer mode: load 10 sample images → assign labels → export valid YOLO-format dataset
  6. Package executable on a clean Windows/Mac machine → runs without Python installed
  7. Edge case: image with zero vegetables → shows "No produce detected" message
  8. Edge case: corrupt image file → does not crash; shows user-friendly error

Performance Targets

| Metric | Target | Measurement Method |
| --- | --- | --- |
| Inference latency (single image, 640 px) | < 250 ms (CPU) / < 80 ms (GPU) | Average of 50 runs |
| Webcam FPS | ≥ 8 FPS | Real-time overlay update |
| Dataset export time (50 images) | < 8 seconds | End-to-end timing |
| Memory usage (peak) | < 1.2 GB | psutil monitoring |
| Model size on disk | ≤ 45 MB (nano variant) | File size check |
| Classification accuracy (held-out test set) | ≥ 82% on rotten class | Manual test set of 40 images |
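The "average of 50 runs" measurement can be scripted with `time.perf_counter`. A sketch under assumed conventions (the function name and the 5-call warmup, which lets model caches settle before timing, are illustrative):

```python
import time
from statistics import mean

def measure_latency(fn, *args, runs: int = 50, warmup: int = 5) -> float:
    """Average wall-clock latency of fn(*args) in milliseconds,
    over `runs` timed calls after `warmup` untimed calls."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1000.0)
    return mean(samples)
```

Calling this with your detector's `detect` and a fixed 640 px test image after every major change gives a consistent number to log against the < 250 ms CPU target.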

Domain-Specific Deep Dive

Signature technical challenge: Interactive dataset builder and YOLOv9 fine-tuning pipeline inside a Pygame UI.

The trainer must let users rapidly build a custom good/rotten produce dataset that can be used to fine-tune the base YOLOv9 model.

Required capabilities (example flows):

  • Load directory of sample images → display grid of thumbnails
  • Click an image → enter "annotation mode" → draw bounding box with mouse drag (Pygame events) → assign class via hotkeys (G = good, R = rotten)
  • Auto-detect option: run current model inference, then let user correct labels
  • Export button → writes images + normalized .txt label files in standard YOLO format (class x_center y_center width height)

Tool / API signatures you must implement:

```python
class ProduceDetector:
    def __init__(self, weights_path: str, device: str = "auto"): ...
    def detect(self, pil_image: Image.Image) -> List[Detection]: ...
    # Detection = namedtuple("Detection", ["bbox", "cls", "conf"])

class DatasetBuilder:
    def add_image(self, path: str, annotations: List[Annotation]): ...
    def export_yolo_dataset(self, output_dir: str): ...
```

Evaluation criteria (exact input → expected output):

  • Input: clear photo of fresh broccoli → Output: one box, class=good_vegetable, conf > 0.7
  • Input: photo of moldy cucumber → Output: one box, class=rotten_vegetable, conf > 0.65
  • Input: mixed good/rotten basket → Output: multiple boxes with correct per-object labels

Implement at least 4 of the following:

  • Real-time webcam annotation with live preview
  • Support for 4+ vegetable types (carrot, banana, tomato, lettuce) with quality suffix
  • Confidence heatmap overlay on rotten regions
  • One-click "fine-tune" trigger that calls a training script (5 epochs minimum) and reloads updated weights
  • Dataset versioning (timestamped exports)

Performance targets specific to this section:

  • Bounding-box drawing latency < 16 ms per frame
  • Export 100 annotated images in < 12 seconds
  • Post-fine-tune mAP@0.5 ≥ 0.78 on 40-image validation set

AI Cost Analysis (Required)

Development & Testing Costs (track in a cost_log.md):

  • Local GPU/CPU training time per epoch (hours)
  • Number of fine-tuning runs and total images processed
  • Peak VRAM usage during inference
  • Any cloud GPU hours (Colab / RunPod) if used

Production Cost Projections

| Scale | Daily Inferences | Projected Daily Cost (edge device) | Projected Daily Cost (cloud GPU) | Notes |
| --- | --- | --- | --- | --- |
| 100 users | 500 | $0.00 (local) | $0.45 | Power draw only |
| 1K users | 5,000 | $0.00 | $4.20 | |
| 10K users | 50,000 | $0.00 | $38.00 | |
| 100K users | 500,000 | $0.00 | $340.00 | Scale to multiple edge units |

Include assumptions:

  • Average store scans 500 produce items per day
  • Model runs on NVIDIA Jetson or Intel NUC (CPU-only fallback)
  • Fine-tuning performed once per week on 200 new labeled images

Technical Stack

| Layer | Technology (choose any that help you ship) |
| --- | --- |
| Backend | Python 3.10+ |
| AI / Model | YOLOv9 (official implementation or Ultralytics-compatible) |
| UI / Rendering | Pygame 2.x |
| Image Processing | PIL (Pillow) + OpenCV (for webcam capture) |
| Storage | Local filesystem (YOLO dataset folders) or SQLite |
| Deployment | PyInstaller (single executable) |

Use whatever stack helps you ship. Complete the Pre-Search process to make informed decisions.

Build Strategy

Priority Order (start with hardest subsystem first):

  1. Core ProduceDetector class with YOLOv9 inference and PIL preprocessing (SOLID SRP enforced)
  2. Pygame rendering loop with overlay drawing and event handling
  3. Modular input handlers (file drag, webcam thread) using Dependency Inversion
  4. DatasetBuilder and annotation UI with mouse-driven bounding boxes
  5. Export pipeline to YOLO format
  6. Training script wrapper (optional fine-tune trigger)
  7. Packaging with PyInstaller + README
  8. Performance instrumentation and logging

Critical Guidance:

  • Use dependency injection for the detector so UI never imports YOLO directly
  • Keep Pygame surface updates in a single render() method
  • Never mutate global state — pass data explicitly
  • Write unit tests for detection output shape before UI integration
  • Profile inference latency after every major change
  • Document every module’s single responsibility in code comments
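The first guidance point, injecting the detector so the UI never imports YOLO directly, can be sketched with `typing.Protocol`. This is an illustrative wiring, not the required implementation; the `Detection` fields mirror the API signatures given earlier in the spec, while `Detector` and `InferenceUI` are assumed names:

```python
from typing import List, NamedTuple, Protocol
from PIL import Image

class Detection(NamedTuple):
    bbox: tuple   # (x1, y1, x2, y2) in pixels
    cls: str      # "good_vegetable" or "rotten_vegetable"
    conf: float

class Detector(Protocol):
    """The abstraction the UI depends on; no YOLO import appears here."""
    def detect(self, pil_image: Image.Image) -> List[Detection]: ...

class InferenceUI:
    def __init__(self, detector: Detector):  # detector is injected, never constructed here
        self.detector = detector

    def process(self, img: Image.Image) -> List[Detection]:
        return self.detector.detect(img)
```

Because the UI only sees the `Detector` protocol, unit tests can pass in a fake detector, and swapping YOLOv9 for a future model touches only the concrete class behind the protocol.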

Required Documentation

Submit a 1-2 page architecture document (ARCHITECTURE.md) containing:

| Section | Content |
| --- | --- |
| Modules & Responsibilities | One paragraph per module proving SRP and how it follows SOLID |
| Data Flow Diagram | Text-based diagram (Mermaid or ASCII) showing image → processor → detector → UI |
| Extension Points | Where new vegetable classes or models can be added without touching core code |
| Trade-off Decisions | Why Pygame over Tkinter/Streamlit; which YOLOv9 variant was chosen |

Submission Requirements

Deadline: Sunday 10:59 PM CT

| Deliverable | Requirements |
| --- | --- |
| GitHub Repository | Public, clean history, requirements.txt, PyInstaller spec file |
| Demo Video (3-5 min) | Walkthrough of file, webcam, trainer, and fine-tune flows |
| Pre-Search Document | Full saved conversation or notes |
| Architecture Document | ARCHITECTURE.md as specified |
| AI / Compute Cost Log | cost_log.md with all numbers |
| Packaged Executable | .exe or .app in GitHub Releases |
| Deployed Application | Runnable on the reviewer's machine with zero setup |
| Social Post | LinkedIn or X post tagging @GauntletAI with a 30-second clip |

Interview Preparation

Technical Topics:

  • How you enforced Dependency Inversion between UI and detector
  • Trade-offs between YOLOv9 nano vs. medium on retail hardware
  • Pygame event loop design and why it does not violate Open/Closed
  • Dataset annotation format decisions and why YOLO .txt was chosen
  • Performance bottlenecks you hit and how you resolved them
  • How the modular design would allow swapping to a future YOLOv10 or edge TFLite model

Mindset & Growth:

  • One decision you reversed after Pre-Search and why
  • How pressure of the 24-hour MVP changed your engineering approach
  • What surprised you most about integrating YOLOv9 with Pygame
  • One SOLID violation you caught during code review and fixed

Final Note

A simple YOLOv9 detector with clean SOLID modules beats a complex multi-model system with tangled Pygame code.

Gate: Project completion + interviews required for Austin admission.


Appendix: Pre-Search Checklist

Complete this before writing code. Save your AI conversation as a reference document.

Phase 1: Define Your Constraints (5 sections)

  1. Scale & Load Expectations

    • What is realistic throughput for a single store checkout or back-room inspection station (images per minute)?
    • Will the app run continuously during an 8-hour shift?
    • What is the maximum number of simultaneous webcam feeds you must support?
    • How many produce items per basket on average?
  2. Hardware & Budget

    • Target hardware: standard retail PC (CPU-only) or GPU-enabled?
    • What is your maximum acceptable model size on disk?
    • Local GPU hours budget for fine-tuning during development?
    • Any power-consumption limits for edge deployment?
  3. Timeline & Scope

    • Which features are explicitly MVP vs. nice-to-have within 7 days?
    • How many sample vegetable types will you support by final deadline?
    • Do you need multi-language labels or just English?
  4. Data Sensitivity & Compliance

    • Will any real customer/store images be used? If yes, how will you anonymize?
    • Are there any food-safety regulatory requirements for the classification output?
    • How will you handle biased datasets (e.g., only one vegetable type)?
  5. Team / Skills

    • Which SOLID principles are you personally weakest on?
    • Have you used Pygame event loops before?
    • Experience level with YOLO annotation formats?

Phase 2: Architecture Discovery (6 sections)

  1. YOLOv9 Model Selection

    • Which YOLOv9 variant (nano/small/medium) gives best speed/accuracy on your hardware?
    • Pre-trained on COCO vs. custom produce weights — which starting point?
    • How will you handle custom class mapping (good/rotten)?
  2. Pygame UI Architecture

    • How will you structure the main loop to support both inference and trainer modes without violating Open/Closed?
    • Strategy for mouse-driven bounding-box drawing without blocking inference?
    • How to separate rendering from business logic (Dependency Inversion)?
  3. Image Processing Pipeline

    • PIL vs. OpenCV for webcam frames — performance and compatibility trade-offs?
    • Exact resize strategy to maintain YOLO aspect ratio?
    • Threading plan for webcam capture so UI never freezes?
  4. Dataset Management

    • Exact folder structure and .txt format required by YOLOv9 training?
    • How will you version datasets so previous exports are never overwritten?
    • Strategy for auto-generating initial bounding boxes using current model?
  5. Modular Design Decisions

    • Concrete classes/interfaces for Detector, ImageProcessor, DatasetBuilder?
    • Where will you apply Interface Segregation (e.g., separate annotation vs. export APIs)?
    • Plan for Liskov Substitution if you later swap YOLOv9 for another detector?
  6. Training Integration

    • Will fine-tuning run inside the app or as a separate CLI script called from UI?
    • How many epochs and what batch size are realistic in one training run?

Phase 3: Post-Stack Refinement (5 sections)

  1. Security & Failure Modes

    • How will you validate image files before passing to PIL/YOLO?
    • Strategy for handling out-of-memory on very large images?
    • Graceful degradation if webcam disconnects mid-session?
  2. Testing Strategy

    • Unit tests for detection output shape and confidence thresholds?
    • How will you create a 40-image held-out test set for accuracy?
    • Integration test plan for full file → detect → render flow?
  3. Tooling & Observability

    • Logging library and level for inference latency?
    • How will you surface FPS and memory usage in the UI?
    • Profiler choice for identifying Pygame bottlenecks?
  4. Deployment & Packaging

    • PyInstaller command and hidden imports needed for YOLOv9 + Pygame?
    • Cross-platform testing plan (Windows/Mac)?
    • Single-file executable size target?
  5. Observability & Iteration

    • How will you capture user feedback on misclassified produce for next training round?
    • Metrics dashboard or simple CSV log for accuracy over time?
    • Plan for A/B testing different YOLOv9 variants in production?

Total questions: 49. Answer every one in your Pre-Search document before touching code. This is the difference between a shippable SOLID application and a fragile prototype.
