Building Real-Time Produce Quality Detection with YOLOv9 and SOLID Modular Design
You must complete the Pre-Search appendix before writing a single line of code. Your Pre-Search output — the full saved AI conversation or detailed notes — is a required part of your final submission.
This week's methodology emphasis is SOLID principles and modular architecture. Pre-Search forces you to map every decision (YOLOv9 variant selection, Pygame event architecture, dataset pipeline, dependency injection) before any implementation. You will not be allowed to refactor core modules after the MVP deadline. The goal is production-grade maintainability: a codebase where the detector, UI, trainer, and data layers can be swapped or extended independently.
In the retail and grocery sector, companies such as Walmart, Whole Foods, Kroger, and computer-vision startups like Clarifruit and Afresh Technologies deploy YOLO-based systems to automate fresh-produce inspection at scale. These tools identify spoilage and defects in real time, cutting food waste by 30-40% and replacing inconsistent manual checks that cost stores millions annually in labor and shrink.
You will build VegSentinel: a standalone desktop application that loads a YOLOv9 model, processes images via PIL, detects vegetables, classifies each as good or rotten, and visualizes results in a Pygame UI. The core technical challenge is creating a fully modular, SOLID-compliant system that supports both inference (file or webcam) and an interactive trainer UI for building and exporting custom good/rotten datasets. Every module must follow Single Responsibility and Dependency Inversion so the same detector can later run headless on edge devices or in a cloud pipeline.
Gate: Project completion + interviews required for Austin admission.
One-week sprint with three deadlines:
| Checkpoint | Deadline | Focus |
|---|---|---|
| Pre-Search | Before any coding | Constraints, architecture, stack decisions |
| MVP | Tuesday EOD (24 hrs) | Core detection + basic Pygame inference UI |
| Early Submission | Friday EOD (4 days) | Trainer UI, webcam, dataset export |
| Final | Sunday 10:59 PM CT | Polish, documentation, performance targets |
Hard gate. All items required to pass:
- YOLOv9 model loaded via official Python API; PIL used to preprocess any input image to tensor
- Pygame window (800×600 minimum) with two modes toggled by keyboard: Inference and Idle
- File upload via drag-and-drop or dialog; image displayed with overlaid bounding boxes, class labels ("good" / "rotten"), and confidence scores
- At least one vegetable detected and classified correctly on the three provided sample images (good apple, rotten banana, mixed vegetables)
- Clean modular structure: separate detector.py, ui.py, and image_processor.py modules with no circular imports
- Application launches and runs inference end-to-end on a standard laptop without crashes
- Packaged executable (PyInstaller) included in repo root with clear README.md run instructions
A simple YOLOv9 detector with clean SOLID modules beats a complex multi-model system with tangled Pygame code.
| Feature | Requirements |
|---|---|
| Model Loading | Load YOLOv9 (nano or small variant) from local weights; support CPU/GPU auto-detection |
| PIL Preprocessing | Convert any image (JPG/PNG) to RGB tensor; resize to model input size while preserving aspect ratio |
| Inference | Run model.predict() with confidence threshold 0.45 and NMS; return list of (bbox, class_id, confidence) |
| Class Mapping | Map YOLO class indices to "good_vegetable" or "rotten_vegetable" (or multi-class vegetable + quality) |
| Error Handling | Graceful fallback if no objects detected; log PIL/YOLO exceptions without crashing UI |
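The "resize while preserving aspect ratio" requirement above is commonly implemented as a letterbox: scale the image to fit the model's square input, then pad the remainder. A minimal PIL sketch (the 640px size and gray padding color are assumptions, not mandated by the spec):

```python
from PIL import Image

def letterbox(img: Image.Image, size: int = 640, fill=(114, 114, 114)) -> Image.Image:
    """Resize to a square canvas while preserving aspect ratio, padding with gray."""
    img = img.convert("RGB")
    scale = size / max(img.width, img.height)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size), fill)
    # center the resized image on the padded canvas
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas
```

When drawing boxes back onto the original image, remember to invert this transform (subtract the padding offset, divide by the scale).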
| Feature | Requirements |
|---|---|
| Pygame Window | 800×600 default; resizable; title "VegSentinel — Produce Quality Detector" |
| Overlay Rendering | Draw rectangles, class text, and confidence using Pygame surfaces; color-code green/red |
| Mode Switching | Keyboard (I = Inference, T = Trainer, ESC = quit); no global state leaks |
| Input Handling | File drag-and-drop, button for "Open Image", webcam toggle |
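One way to satisfy "no global state leaks" is to keep mode switching as a pure state machine, tested separately from the Pygame loop; the real event loop would translate pygame.KEYDOWN events into the key strings used here. The mode names and function below are illustrative, not part of the required API:

```python
# Pure mode state machine: no pygame import, so it is testable without a window.
MODES = {"i": "inference", "t": "trainer"}

def next_mode(current: str, key: str) -> str:
    """Return the new UI mode for a keypress; unknown keys keep the current mode."""
    if key == "escape":
        return "quit"
    return MODES.get(key.lower(), current)
```

In the event loop you would call something like `mode = next_mode(mode, key_name(event))` and branch rendering on `mode`, keeping all state local to the loop.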
| Feature | Requirements |
|---|---|
| Sample Loader | Load folder of images; display thumbnails in Pygame grid |
| Label Assignment | Click-to-assign good/rotten per image or per detected object |
| YOLO Dataset Export | Generate images/ and labels/ folders with normalized .txt annotation files |
We will test:
- Upload a photo of a fresh carrot → single green box labeled "good_vegetable" with conf ≥ 0.6
- Upload a photo of a rotten tomato → red box labeled "rotten_vegetable" with conf ≥ 0.6
- Drag-and-drop multiple images → all processed sequentially without memory leak
- Webcam live feed (8 FPS minimum, 30 FPS stretch target) → boxes update in real time on vegetable movement
- Trainer mode: load 10 sample images → assign labels → export valid YOLO-format dataset
- Package executable on a clean Windows/Mac machine → runs without Python installed
- Edge case: image with zero vegetables → shows "No produce detected" message
- Edge case: corrupt image file → does not crash; shows user-friendly error
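The corrupt-file edge case above maps to a small guard around PIL's loader; forcing the decode up front surfaces errors before the image reaches the UI. A sketch (the function name is illustrative):

```python
from PIL import Image, UnidentifiedImageError

def load_image_safe(path: str):
    """Return an RGB PIL image, or None if the file is missing or corrupt."""
    try:
        with Image.open(path) as img:
            img.load()  # force decode now so errors surface here, not mid-render
            return img.convert("RGB")
    except (FileNotFoundError, UnidentifiedImageError, OSError):
        return None
```

The UI can then branch on `None` to show the "user-friendly error" instead of crashing.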
| Metric | Target | Measurement Method |
|---|---|---|
| Inference latency (single image, 640px) | < 250 ms (CPU) / < 80 ms (GPU) | Average of 50 runs |
| Webcam FPS | ≥ 8 FPS | Real-time overlay update |
| Dataset export time (50 images) | < 8 seconds | End-to-end timing |
| Memory usage (peak) | < 1.2 GB | psutil monitoring |
| Model size on disk | ≤ 45 MB (nano variant) | File size check |
| Classification accuracy (held-out test set) | ≥ 82% on rotten class | Manual test set of 40 images |
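The "average of 50 runs" measurement method can be a small timing helper; a few warmup calls (an assumption, not in the spec) keep lazy initialization out of the numbers:

```python
import statistics
import time

def benchmark(fn, *args, runs: int = 50, warmup: int = 3) -> float:
    """Return the mean latency of fn(*args) in milliseconds over `runs` timed calls."""
    for _ in range(warmup):  # let caches and lazy model init settle first
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)
```

Usage would be `benchmark(detector.detect, sample_image)` after any major change, per the guidance later in this brief.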
Signature technical challenge: Interactive dataset builder and YOLOv9 fine-tuning pipeline inside a Pygame UI.
The trainer must let users rapidly build a custom good/rotten produce dataset that can be used to fine-tune the base YOLOv9 model.
Required capabilities (example flows):
- Load directory of sample images → display grid of thumbnails
- Click an image → enter "annotation mode" → draw bounding box with mouse drag (Pygame events) → assign class via hotkeys (G = good, R = rotten)
- Auto-detect option: run current model inference, then let user correct labels
- Export button → writes images + normalized .txt label files in standard YOLO format (class x_center y_center width height)
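The normalization step behind that export format is easy to get wrong, so here is a minimal sketch: convert a pixel-space (x1, y1, x2, y2) box into one YOLO label line, with all values scaled to [0, 1]. The function name is illustrative:

```python
def to_yolo_line(cls_id: int, bbox, img_w: int, img_h: int) -> str:
    """Convert a pixel-space (x1, y1, x2, y2) box to a normalized YOLO label line:
    'class x_center y_center width height', all coordinates in [0, 1]."""
    x1, y1, x2, y2 = bbox
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

One such line per annotated object goes into labels/<image_stem>.txt alongside the image in images/.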
Tool / API signatures you must implement:
```python
class ProduceDetector:
    def __init__(self, weights_path: str, device: str = "auto"): ...
    def detect(self, pil_image: Image.Image) -> List[Detection]: ...  # Detection = namedtuple(bbox, cls, conf)

class DatasetBuilder:
    def add_image(self, path: str, annotations: List[Annotation]): ...
    def export_yolo_dataset(self, output_dir: str): ...
```

Evaluation criteria (exact input → expected output):
- Input: clear photo of fresh broccoli → Output: one box, class=good_vegetable, conf > 0.7
- Input: photo of moldy cucumber → Output: one box, class=rotten_vegetable, conf > 0.65
- Input: mixed good/rotten basket → Output: multiple boxes with correct per-object labels
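One way to back those criteria with the spec's Detection namedtuple is to map raw model rows into typed records, applying the 0.45 confidence threshold from the feature table. The raw row layout and the class-index mapping here are assumptions about your model head, not requirements:

```python
from collections import namedtuple

Detection = namedtuple("Detection", ["bbox", "cls", "conf"])  # matches the spec's API

CLASS_NAMES = {0: "good_vegetable", 1: "rotten_vegetable"}    # assumed index mapping

def to_detections(raw, conf_threshold: float = 0.45):
    """Map raw rows (x1, y1, x2, y2, class_id, confidence) to Detection records,
    dropping anything under the confidence threshold."""
    out = []
    for x1, y1, x2, y2, cls_id, conf in raw:
        if conf >= conf_threshold:
            out.append(Detection((x1, y1, x2, y2), CLASS_NAMES[int(cls_id)], conf))
    return out
```

Keeping this mapping in one place means a class-scheme change (e.g. per-vegetable quality classes) touches only CLASS_NAMES.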
Implement at least 4 of the following:
- Real-time webcam annotation with live preview
- Support for 4+ vegetable types (carrot, banana, tomato, lettuce) with quality suffix
- Confidence heatmap overlay on rotten regions
- One-click "fine-tune" trigger that calls a training script (5 epochs minimum) and reloads updated weights
- Dataset versioning (timestamped exports)
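The dataset-versioning stretch feature can be as simple as stamping each export directory and creating the standard YOLO subfolders up front; the directory naming scheme here is one possible convention:

```python
from datetime import datetime
from pathlib import Path

def make_export_dir(root: str = "exports") -> Path:
    """Create a fresh timestamped export directory (e.g. exports/dataset_20250101_120000)
    with YOLO images/ and labels/ subfolders, so earlier exports are never overwritten."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out = Path(root) / f"dataset_{stamp}"
    (out / "images").mkdir(parents=True, exist_ok=True)
    (out / "labels").mkdir(parents=True, exist_ok=True)
    return out
```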
Performance targets specific to this section:
- Bounding-box drawing latency < 16 ms per frame
- Export 100 annotated images in < 12 seconds
- Post-fine-tune mAP@0.5 ≥ 0.78 on 40-image validation set
Development & Testing Costs (track in a cost_log.md):
- Local GPU/CPU training time per epoch (hours)
- Number of fine-tuning runs and total images processed
- Peak VRAM usage during inference
- Any cloud GPU hours (Colab / RunPod) if used
Production Cost Projections
| Scale | Daily Inferences | Projected Daily Cost (edge device) | Projected Daily Cost (cloud GPU) | Notes |
|---|---|---|---|---|
| 100 users | 500 | $0.00 (local) | $0.45 | Power draw only |
| 1K users | 5,000 | $0.00 | $4.20 | – |
| 10K users | 50,000 | $0.00 | $38.00 | – |
| 100K users | 500,000 | $0.00 | $340.00 | Scale to multiple edge units |
Include assumptions:
- Average store scans 500 produce items per day
- Model runs on NVIDIA Jetson or Intel NUC (CPU-only fallback)
- Fine-tuning performed once per week on 200 new labeled images
| Layer | Technology (choose any that help you ship) |
|---|---|
| Backend | Python 3.10+ |
| AI / Model | YOLOv9 (official implementation or Ultralytics-compatible) |
| UI / Rendering | Pygame 2.x |
| Image Processing | PIL (Pillow) + OpenCV (for webcam capture) |
| Storage | Local filesystem (YOLO dataset folders) or SQLite |
| Deployment | PyInstaller (single executable) |
Use whatever stack helps you ship. Complete the Pre-Search process to make informed decisions.
Priority Order (start with hardest subsystem first):
- Core ProduceDetector class with YOLOv9 inference and PIL preprocessing (SOLID SRP enforced)
- Pygame rendering loop with overlay drawing and event handling
- Modular input handlers (file drag, webcam thread) using Dependency Inversion
- DatasetBuilder and annotation UI with mouse-driven bounding boxes
- Export pipeline to YOLO format
- Training script wrapper (optional fine-tune trigger)
- Packaging with PyInstaller + README
- Performance instrumentation and logging
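The Dependency Inversion item above can be made concrete with a small Protocol: the UI depends on an abstraction, and the concrete YOLOv9 class is injected at startup. The class names here are illustrative, not mandated:

```python
from typing import List, Protocol

class Detector(Protocol):
    """Abstraction the UI depends on; concrete YOLOv9 code lives elsewhere."""
    def detect(self, image) -> List: ...

class InferenceUI:
    def __init__(self, detector: Detector):
        self._detector = detector  # injected, never constructed here

    def process(self, image):
        return self._detector.detect(image)

class FakeDetector:
    """Test double: lets the UI be exercised without model weights."""
    def detect(self, image) -> List:
        return [("bbox", "good_vegetable", 0.9)]
```

Because InferenceUI never imports YOLO, swapping in a future YOLOv10 or TFLite detector only requires another class satisfying the Protocol.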
Critical Guidance:
- Use dependency injection for the detector so UI never imports YOLO directly
- Keep Pygame surface updates in a single render() method
- Never mutate global state; pass data explicitly
- Write unit tests for detection output shape before UI integration
- Profile inference latency after every major change
- Document every module’s single responsibility in code comments
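The "unit tests for detection output shape" guidance can start from a single contract checker for the (bbox, cls, conf) tuples; the function name is illustrative:

```python
def check_detection_shape(detections) -> bool:
    """Validate the (bbox, cls, conf) contract before wiring the UI:
    a 4-number bbox, a string class label, and a confidence in [0, 1]."""
    for det in detections:
        bbox, cls, conf = det
        if len(bbox) != 4:
            return False
        if not isinstance(cls, str):
            return False
        if not (0.0 <= conf <= 1.0):
            return False
    return True
```

Running this against the detector's output on the three provided sample images catches interface drift before any Pygame code exists.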
Submit a 1-2 page architecture document (ARCHITECTURE.md) containing:
| Section | Content |
|---|---|
| Modules & Responsibilities | One paragraph per module proving SRP and how it follows SOLID |
| Data Flow Diagram | Text-based diagram (Mermaid or ASCII) showing image → processor → detector → UI |
| Extension Points | Where new vegetable classes or models can be added without touching core code |
| Trade-off Decisions | Why Pygame over Tkinter/Streamlit; YOLOv9 variant chosen |
Deadline: Sunday 10:59 PM CT
| Deliverable | Requirements |
|---|---|
| GitHub Repository | Public, clean history, requirements.txt, PyInstaller spec file |
| Demo Video (3-5 min) | Walkthrough of file, webcam, trainer, and fine-tune flow |
| Pre-Search Document | Full saved conversation or notes |
| Architecture Document | ARCHITECTURE.md as specified |
| AI / Compute Cost Log | cost_log.md with all numbers |
| Packaged Executable | .exe or .app in GitHub Releases |
| Deployed Application | Runnable on reviewer’s machine with zero setup |
| Social Post | LinkedIn or X post tagging @GauntletAI with 30-second clip |
Technical Topics:
- How you enforced Dependency Inversion between UI and detector
- Trade-offs between YOLOv9 nano vs. medium on retail hardware
- Pygame event loop design and why it does not violate Open/Closed
- Dataset annotation format decisions and why YOLO .txt was chosen
- Performance bottlenecks you hit and how you resolved them
- How the modular design would allow swapping to a future YOLOv10 or edge TFLite model
Mindset & Growth:
- One decision you reversed after Pre-Search and why
- How pressure of the 24-hour MVP changed your engineering approach
- What surprised you most about integrating YOLOv9 with Pygame
- One SOLID violation you caught during code review and fixed
Complete this before writing code. Save your AI conversation as a reference document.
Scale & Load Expectations
- What is realistic throughput for a single store checkout or back-room inspection station (images per minute)?
- Will the app run continuously during an 8-hour shift?
- What is the maximum number of simultaneous webcam feeds you must support?
- How many produce items per basket on average?
Hardware & Budget
- Target hardware: standard retail PC (CPU-only) or GPU-enabled?
- What is your maximum acceptable model size on disk?
- Local GPU hours budget for fine-tuning during development?
- Any power-consumption limits for edge deployment?
Timeline & Scope
- Which features are explicitly MVP vs. nice-to-have within 7 days?
- How many sample vegetable types will you support by final deadline?
- Do you need multi-language labels or just English?
Data Sensitivity & Compliance
- Will any real customer/store images be used? If yes, how will you anonymize?
- Are there any food-safety regulatory requirements for the classification output?
- How will you handle biased datasets (e.g., only one vegetable type)?
Team / Skills
- Which SOLID principles are you personally weakest on?
- Have you used Pygame event loops before?
- Experience level with YOLO annotation formats?
YOLOv9 Model Selection
- Which YOLOv9 variant (nano/small/medium) gives best speed/accuracy on your hardware?
- Pre-trained on COCO vs. custom produce weights — which starting point?
- How will you handle custom class mapping (good/rotten)?
Pygame UI Architecture
- How will you structure the main loop to support both inference and trainer modes without violating Open/Closed?
- Strategy for mouse-driven bounding-box drawing without blocking inference?
- How to separate rendering from business logic (Dependency Inversion)?
Image Processing Pipeline
- PIL vs. OpenCV for webcam frames — performance and compatibility trade-offs?
- Exact resize strategy to maintain YOLO aspect ratio?
- Threading plan for webcam capture so UI never freezes?
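One answer to the webcam-threading question is a background grabber with a small bounded queue, so the Pygame loop never blocks and stale frames are dropped. The frame source is injected as a plain callable (e.g. a wrapper around cv2.VideoCapture.read), which keeps this testable without a camera; all names here are illustrative:

```python
import queue
import threading

class FrameGrabber:
    """Capture frames on a background thread so the UI loop never blocks."""
    def __init__(self, read_frame, maxsize: int = 2):
        self._read = read_frame
        self._frames = queue.Queue(maxsize=maxsize)  # small buffer: drop stale frames
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        while not self._stop.is_set():
            frame = self._read()
            if frame is None:
                break  # camera disconnected: exit cleanly
            try:
                self._frames.put(frame, timeout=0.1)
            except queue.Full:
                pass  # UI is behind; drop the frame

    def start(self):
        self._thread.start()
        return self

    def next_frame(self, timeout: float = 0.5):
        """Return the next frame, or None if nothing arrived in time."""
        try:
            return self._frames.get(timeout=timeout)
        except queue.Empty:
            return None

    def stop(self):
        self._stop.set()
        self._thread.join(timeout=1.0)
```

Returning None from the source doubles as the graceful-degradation path for a mid-session webcam disconnect.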
Dataset Management
- Exact folder structure and .txt format required by YOLOv9 training?
- How will you version datasets so previous exports are never overwritten?
- Strategy for auto-generating initial bounding boxes using current model?
Modular Design Decisions
- Concrete classes/interfaces for Detector, ImageProcessor, DatasetBuilder?
- Where will you apply Interface Segregation (e.g., separate annotation vs. export APIs)?
- Plan for Liskov Substitution if you later swap YOLOv9 for another detector?
Training Integration
- Will fine-tuning run inside the app or as a separate CLI script called from UI?
- How many epochs and what batch size are realistic in one training run?
Security & Failure Modes
- How will you validate image files before passing to PIL/YOLO?
- Strategy for handling out-of-memory on very large images?
- Graceful degradation if webcam disconnects mid-session?
Testing Strategy
- Unit tests for detection output shape and confidence thresholds?
- How will you create a 40-image held-out test set for accuracy?
- Integration test plan for full file → detect → render flow?
Tooling & Observability
- Logging library and level for inference latency?
- How will you surface FPS and memory usage in the UI?
- Profiler choice for identifying Pygame bottlenecks?
Deployment & Packaging
- PyInstaller command and hidden imports needed for YOLOv9 + Pygame?
- Cross-platform testing plan (Windows/Mac)?
- Single-file executable size target?
Observability & Iteration
- How will you capture user feedback on misclassified produce for next training round?
- Metrics dashboard or simple CSV log for accuracy over time?
- Plan for A/B testing different YOLOv9 variants in production?
Total questions: 59. Answer every one in your Pre-Search document before touching code. This is the difference between a shippable SOLID application and a fragile prototype.