Building Real-Time Produce Quality Detection with YOLOv9 and SOLID Modular Design
You must complete the Pre-Search appendix before writing a single line of code. Your Pre-Search output — the full saved AI conversation or detailed notes — is a required part of your final submission.
This week's methodology emphasis is SOLID principles and modular architecture. Pre-Search forces you to map every decision (YOLOv9 variant selection, Pygame event architecture, dataset pipeline, dependency injection) before any implementation. You will not be allowed to refactor core modules after the MVP deadline. The goal is production-grade maintainability: a codebase where the detector, UI, trainer, and data layers can be swapped or extended independently.
In the retail and grocery sector, companies such as Walmart, Whole Foods, Kroger, and computer-vision startups like Clarifruit and Afresh Technologies deploy YOLO-based systems to automate fresh-produce inspection at scale. These tools identify spoilage and defects in real time, cutting food waste by 30-40% and replacing inconsistent manual checks that cost stores millions annually in labor and shrink.
You will build VegSentinel: a standalone desktop application that loads a YOLOv9 model, processes images via PIL, detects vegetables, classifies each as good or rotten, and visualizes results in a Pygame UI. The core technical challenge is creating a fully modular, SOLID-compliant system that supports both inference (file or webcam) and an interactive trainer UI for building and exporting custom good/rotten datasets. Every module must follow Single Responsibility and Dependency Inversion so the same detector can later run headless on edge devices or in a cloud pipeline.
Gate: Project completion + interviews required for Austin admission.
One-week sprint with three deadlines:
| Checkpoint | Deadline | Focus |
|---|---|---|
| Pre-Search | Before any coding | Constraints, architecture, stack decisions |
| MVP | Tuesday EOD (24 hrs) | Core detection + basic Pygame inference UI |
| Early Submission | Friday EOD (4 days) | Trainer UI, webcam, dataset export |
| Final | Sunday 10:59 PM CT | Polish, documentation, performance targets |
Hard gate. All items required to pass:
- YOLOv9 model loaded via official Python API; PIL used to preprocess any input image to tensor
- Pygame window (800×600 minimum) with two modes toggled by keyboard: Inference and Idle
- File upload via drag-and-drop or dialog; image displayed with overlaid bounding boxes, class labels ("good" / "rotten"), and confidence scores
- At least one vegetable detected and classified correctly on the three provided sample images (good apple, rotten banana, mixed vegetables)
- Clean modular structure: separate detector.py, ui.py, and image_processor.py modules with no circular imports
- Application launches and runs inference end-to-end on a standard laptop without crashes
- Packaged executable (PyInstaller) included in repo root with clear README.md run instructions
A simple YOLOv9 detector with clean SOLID modules beats a complex multi-model system with tangled Pygame code.
| Feature | Requirements |
|---|---|
| Model Loading | Load YOLOv9 (nano or small variant) from local weights; support CPU/GPU auto-detection |
| PIL Preprocessing | Convert any image (JPG/PNG) to RGB tensor; resize to model input size while preserving aspect ratio |
| Inference | Run model.predict() with confidence threshold 0.45 and NMS; return list of (bbox, class_id, confidence) |
| Class Mapping | Map YOLO class indices to "good_vegetable" or "rotten_vegetable" (or multi-class vegetable + quality) |
| Error Handling | Graceful fallback if no objects detected; log PIL/YOLO exceptions without crashing UI |
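The "resize while preserving aspect ratio" requirement above is commonly implemented as a letterbox: scale the image to fit the model's square input, then pad the remainder. A minimal PIL sketch (the 640px size and gray padding color are assumptions, not mandated by the spec):

```python
from PIL import Image

def letterbox(img: Image.Image, size: int = 640, fill=(114, 114, 114)) -> Image.Image:
    """Resize to a square canvas while preserving aspect ratio, padding with gray."""
    img = img.convert("RGB")
    scale = size / max(img.width, img.height)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size), fill)
    # center the resized image on the padded canvas
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas
```

When drawing boxes back onto the original image, remember to invert this transform (subtract the padding offset, divide by the scale).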
| Feature | Requirements |
|---|---|
| Pygame Window | 800×600 default; resizable; title "VegSentinel — Produce Quality Detector" |
| Overlay Rendering | Draw rectangles, class text, and confidence using Pygame surfaces; color-code green/red |
| Mode Switching | Keyboard (I = Inference, T = Trainer, ESC = quit); no global state leaks |
| Input Handling | File drag-and-drop, button for "Open Image", webcam toggle |
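One way to satisfy "no global state leaks" is to keep mode switching as a pure state machine, tested separately from the Pygame loop; the real event loop would translate pygame.KEYDOWN events into the key strings used here. The mode names and function below are illustrative, not part of the required API:

```python
# Pure mode state machine: no pygame import, so it is testable without a window.
MODES = {"i": "inference", "t": "trainer"}

def next_mode(current: str, key: str) -> str:
    """Return the new UI mode for a keypress; unknown keys keep the current mode."""
    if key == "escape":
        return "quit"
    return MODES.get(key.lower(), current)
```

In the event loop you would call something like `mode = next_mode(mode, key_name(event))` and branch rendering on `mode`, keeping all state local to the loop.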
| Feature | Requirements |
|---|---|
| Sample Loader | Load folder of images; display thumbnails in Pygame grid |
| Label Assignment | Click-to-assign good/rotten per image or per detected object |
| YOLO Dataset Export | Generate images/ and labels/ folders with normalized .txt annotation files |
We will test:
- Upload a photo of a fresh carrot → single green box labeled "good_vegetable" with conf ≥ 0.6
- Upload a photo of a rotten tomato → red box labeled "rotten_vegetable" with conf ≥ 0.6
- Drag-and-drop multiple images → all processed sequentially without memory leak
- Webcam live feed (8 FPS minimum, 30 FPS stretch target) → boxes update in real time on vegetable movement
- Trainer mode: load 10 sample images → assign labels → export valid YOLO-format dataset
- Package executable on a clean Windows/Mac machine → runs without Python installed
- Edge case: image with zero vegetables → shows "No produce detected" message
- Edge case: corrupt image file → does not crash; shows user-friendly error
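The corrupt-file edge case above maps to a small guard around PIL's loader; forcing the decode up front surfaces errors before the image reaches the UI. A sketch (the function name is illustrative):

```python
from PIL import Image, UnidentifiedImageError

def load_image_safe(path: str):
    """Return an RGB PIL image, or None if the file is missing or corrupt."""
    try:
        with Image.open(path) as img:
            img.load()  # force decode now so errors surface here, not mid-render
            return img.convert("RGB")
    except (FileNotFoundError, UnidentifiedImageError, OSError):
        return None
```

The UI can then branch on `None` to show the "user-friendly error" instead of crashing.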
| Metric | Target | Measurement Method |
|---|---|---|
| Inference latency (single image, 640px) | < 250 ms (CPU) / < 80 ms (GPU) | Average of 50 runs |
| Webcam FPS | ≥ 8 FPS | Real-time overlay update |
| Dataset export time (50 images) | < 8 seconds | End-to-end timing |
| Memory usage (peak) | < 1.2 GB | psutil monitoring |
| Model size on disk | ≤ 45 MB (nano variant) | File size check |
| Classification accuracy (held-out test set) | ≥ 82% on rotten class | Manual test set of 40 images |
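The "average of 50 runs" measurement method can be a small timing helper; a few warmup calls (an assumption, not in the spec) keep lazy initialization out of the numbers:

```python
import statistics
import time

def benchmark(fn, *args, runs: int = 50, warmup: int = 3) -> float:
    """Return the mean latency of fn(*args) in milliseconds over `runs` timed calls."""
    for _ in range(warmup):  # let caches and lazy model init settle first
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)
```

Usage would be `benchmark(detector.detect, sample_image)` after any major change, per the guidance later in this brief.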
Signature technical challenge: Interactive dataset builder and YOLOv9 fine-tuning pipeline inside a Pygame UI.
The trainer must let users rapidly build a custom good/rotten produce dataset that can be used to fine-tune the base YOLOv9 model.
Required capabilities (example flows):
- Load directory of sample images → display grid of thumbnails
- Click an image → enter "annotation mode" → draw bounding box with mouse drag (Pygame events) → assign class via hotkeys (G = good, R = rotten)
- Auto-detect option: run current model inference, then let user correct labels
- Export button → writes images + normalized .txt label files in standard YOLO format (class x_center y_center width height)
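The normalization step behind that export format is easy to get wrong, so here is a minimal sketch: convert a pixel-space (x1, y1, x2, y2) box into one YOLO label line, with all values scaled to [0, 1]. The function name is illustrative:

```python
def to_yolo_line(cls_id: int, bbox, img_w: int, img_h: int) -> str:
    """Convert a pixel-space (x1, y1, x2, y2) box to a normalized YOLO label line:
    'class x_center y_center width height', all coordinates in [0, 1]."""
    x1, y1, x2, y2 = bbox
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

One such line per annotated object goes into labels/<image_stem>.txt alongside the image in images/.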
Tool / API signatures you must implement:
```python
class ProduceDetector:
    def __init__(self, weights_path: str, device: str = "auto"): ...
    def detect(self, pil_image: Image.Image) -> List[Detection]: ...  # Detection = namedtuple(bbox, cls, conf)

class DatasetBuilder:
    def add_image(self, path: str, annotations: List[Annotation]): ...
    def export_yolo_dataset(self, output_dir: str): ...
```

Evaluation criteria (exact input → expected output):
- Input: clear photo of fresh broccoli → Output: one box, class=good_vegetable, conf > 0.7
- Input: photo of moldy cucumber → Output: one box, class=rotten_vegetable, conf > 0.65
- Input: mixed good/rotten basket → Output: multiple boxes with correct per-object labels
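One way to back those criteria with the spec's Detection namedtuple is to map raw model rows into typed records, applying the 0.45 confidence threshold from the feature table. The raw row layout and the class-index mapping here are assumptions about your model head, not requirements:

```python
from collections import namedtuple

Detection = namedtuple("Detection", ["bbox", "cls", "conf"])  # matches the spec's API

CLASS_NAMES = {0: "good_vegetable", 1: "rotten_vegetable"}    # assumed index mapping

def to_detections(raw, conf_threshold: float = 0.45):
    """Map raw rows (x1, y1, x2, y2, class_id, confidence) to Detection records,
    dropping anything under the confidence threshold."""
    out = []
    for x1, y1, x2, y2, cls_id, conf in raw:
        if conf >= conf_threshold:
            out.append(Detection((x1, y1, x2, y2), CLASS_NAMES[int(cls_id)], conf))
    return out
```

Keeping this mapping in one place means a class-scheme change (e.g. per-vegetable quality classes) touches only CLASS_NAMES.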
Implement at least 4 of the following:
- Real-time webcam annotation with live preview
- Support for 4+ vegetable types (carrot, banana, tomato, lettuce) with quality suffix
- Confidence heatmap overlay on rotten regions
- One-click "fine-tune" trigger that calls a training script (5 epochs minimum) and reloads updated weights
- Dataset versioning (timestamped exports)
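The dataset-versioning stretch feature can be as simple as stamping each export directory and creating the standard YOLO subfolders up front; the directory naming scheme here is one possible convention:

```python
from datetime import datetime
from pathlib import Path

def make_export_dir(root: str = "exports") -> Path:
    """Create a fresh timestamped export directory (e.g. exports/dataset_20250101_120000)
    with YOLO images/ and labels/ subfolders, so earlier exports are never overwritten."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out = Path(root) / f"dataset_{stamp}"
    (out / "images").mkdir(parents=True, exist_ok=True)
    (out / "labels").mkdir(parents=True, exist_ok=True)
    return out
```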
Performance targets specific to this section:
- Bounding-box drawing latency < 16 ms per frame
- Export 100 annotated images in < 12 seconds
- Post-fine-tune mAP@0.5 ≥ 0.78 on 40-image validation set
Development & Testing Costs (track in a cost_log.md):
- Local GPU/CPU training time per epoch (hours)
- Number of fine-tuning runs and total images processed
- Peak VRAM usage during inference
- Any cloud GPU hours (Colab / RunPod) if used
Production Cost Projections
| Scale | Daily Inferences | Projected Daily Cost (edge device) | Projected Daily Cost (cloud GPU) | Notes |
|---|---|---|---|---|
| 100 users | 500 | $0.00 (local) | $0.45 | Power draw only |
| 1K users | 5,000 | $0.00 | $4.20 | – |
| 10K users | 50,000 | $0.00 | $38.00 | – |
| 100K users | 500,000 | $0.00 | $340.00 | Scale to multiple edge units |
Include assumptions:
- Average store scans 500 produce items per day
- Model runs on NVIDIA Jetson or Intel NUC (CPU-only fallback)
- Fine-tuning performed once per week on 200 new labeled images
| Layer | Technology (choose any that help you ship) |
|---|---|
| Backend | Python 3.10+ |
| AI / Model | YOLOv9 (official implementation or Ultralytics-compatible) |
| UI / Rendering | Pygame 2.x |
| Image Processing | PIL (Pillow) + OpenCV (for webcam capture) |
| Storage | Local filesystem (YOLO dataset folders) or SQLite |
| Deployment | PyInstaller (single executable) |
Use whatever stack helps you ship. Complete the Pre-Search process to make informed decisions.
Priority Order (start with hardest subsystem first):
- Core ProduceDetector class with YOLOv9 inference and PIL preprocessing (SOLID SRP enforced)
- Pygame rendering loop with overlay drawing and event handling
- Modular input handlers (file drag, webcam thread) using Dependency Inversion
- DatasetBuilder and annotation UI with mouse-driven bounding boxes
- Export pipeline to YOLO format
- Training script wrapper (optional fine-tune trigger)
- Packaging with PyInstaller + README
- Performance instrumentation and logging
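The Dependency Inversion item above can be made concrete with a small Protocol: the UI depends on an abstraction, and the concrete YOLOv9 class is injected at startup. The class names here are illustrative, not mandated:

```python
from typing import List, Protocol

class Detector(Protocol):
    """Abstraction the UI depends on; concrete YOLOv9 code lives elsewhere."""
    def detect(self, image) -> List: ...

class InferenceUI:
    def __init__(self, detector: Detector):
        self._detector = detector  # injected, never constructed here

    def process(self, image):
        return self._detector.detect(image)

class FakeDetector:
    """Test double: lets the UI be exercised without model weights."""
    def detect(self, image) -> List:
        return [("bbox", "good_vegetable", 0.9)]
```

Because InferenceUI never imports YOLO, swapping in a future YOLOv10 or TFLite detector only requires another class satisfying the Protocol.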
Critical Guidance:
- Use dependency injection for the detector so UI never imports YOLO directly
- Keep Pygame surface updates in a single render() method
- Never mutate global state; pass data explicitly
- Write unit tests for detection output shape before UI integration
- Profile inference latency after every major change
- Document every module’s single responsibility in code comments
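The "unit tests for detection output shape" guidance can start from a single contract checker for the (bbox, cls, conf) tuples; the function name is illustrative:

```python
def check_detection_shape(detections) -> bool:
    """Validate the (bbox, cls, conf) contract before wiring the UI:
    a 4-number bbox, a string class label, and a confidence in [0, 1]."""
    for det in detections:
        bbox, cls, conf = det
        if len(bbox) != 4:
            return False
        if not isinstance(cls, str):
            return False
        if not (0.0 <= conf <= 1.0):
            return False
    return True
```

Running this against the detector's output on the three provided sample images catches interface drift before any Pygame code exists.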
Submit a 1-2 page architecture document (ARCHITECTURE.md) containing:
| Section | Content |
|---|---|
| Modules & Responsibilities | One paragraph per module proving SRP and how it follows SOLID |
| Data Flow Diagram | Text-based diagram (Mermaid or ASCII) showing image → processor → detector → UI |
| Extension Points | Where new vegetable classes or models can be added without touching core code |
| Trade-off Decisions | Why Pygame over Tkinter/Streamlit; YOLOv9 variant chosen |
Deadline: Sunday 10:59 PM CT
| Deliverable | Requirements |
|---|---|
| GitHub Repository | Public, clean history, requirements.txt, PyInstaller spec file |
| Demo Video (3-5 min) | Walkthrough of file, webcam, trainer, and fine-tune flow |
| Pre-Search Document | Full saved conversation or notes |
| Architecture Document | ARCHITECTURE.md as specified |
| AI / Compute Cost Log | cost_log.md with all numbers |
| Packaged Executable | .exe or .app in GitHub Releases |
| Deployed Application | Runnable on reviewer’s machine with zero setup |
| Social Post | LinkedIn or X post tagging @GauntletAI with 30-second clip |
Technical Topics:
- How you enforced Dependency Inversion between UI and detector
- Trade-offs between YOLOv9 nano vs. medium on retail hardware
- Pygame event loop design and why it does not violate Open/Closed
- Dataset annotation format decisions and why YOLO .txt was chosen
- Performance bottlenecks you hit and how you resolved them
- How the modular design would allow swapping to a future YOLOv10 or edge TFLite model
Mindset & Growth:
- One decision you reversed after Pre-Search and why
- How pressure of the 24-hour MVP changed your engineering approach
- What surprised you most about integrating YOLOv9 with Pygame
- One SOLID violation you caught during code review and fixed
Complete this before writing code. Save your AI conversation as a reference document.
Scale & Load Expectations
- What is realistic throughput for a single store checkout or back-room inspection station (images per minute)?
- Will the app run continuously during an 8-hour shift?
- What is the maximum number of simultaneous webcam feeds you must support?
- How many produce items per basket on average?
Hardware & Budget
- Target hardware: standard retail PC (CPU-only) or GPU-enabled?
- What is your maximum acceptable model size on disk?
- Local GPU hours budget for fine-tuning during development?
- Any power-consumption limits for edge deployment?
Timeline & Scope
- Which features are explicitly MVP vs. nice-to-have within 7 days?
- How many sample vegetable types will you support by final deadline?
- Do you need multi-language labels or just English?
Data Sensitivity & Compliance
- Will any real customer/store images be used? If yes, how will you anonymize?
- Are there any food-safety regulatory requirements for the classification output?
- How will you handle biased datasets (e.g., only one vegetable type)?
Team / Skills
- Which SOLID principles are you personally weakest on?
- Have you used Pygame event loops before?
- Experience level with YOLO annotation formats?
YOLOv9 Model Selection
- Which YOLOv9 variant (nano/small/medium) gives best speed/accuracy on your hardware?
- Pre-trained on COCO vs. custom produce weights — which starting point?
- How will you handle custom class mapping (good/rotten)?
Pygame UI Architecture
- How will you structure the main loop to support both inference and trainer modes without violating Open/Closed?
- Strategy for mouse-driven bounding-box drawing without blocking inference?
- How to separate rendering from business logic (Dependency Inversion)?
Image Processing Pipeline
- PIL vs. OpenCV for webcam frames — performance and compatibility trade-offs?
- Exact resize strategy to maintain YOLO aspect ratio?
- Threading plan for webcam capture so UI never freezes?
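One answer to the webcam-threading question is a background grabber with a small bounded queue, so the Pygame loop never blocks and stale frames are dropped. The frame source is injected as a plain callable (e.g. a wrapper around cv2.VideoCapture.read), which keeps this testable without a camera; all names here are illustrative:

```python
import queue
import threading

class FrameGrabber:
    """Capture frames on a background thread so the UI loop never blocks."""
    def __init__(self, read_frame, maxsize: int = 2):
        self._read = read_frame
        self._frames = queue.Queue(maxsize=maxsize)  # small buffer: drop stale frames
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        while not self._stop.is_set():
            frame = self._read()
            if frame is None:
                break  # camera disconnected: exit cleanly
            try:
                self._frames.put(frame, timeout=0.1)
            except queue.Full:
                pass  # UI is behind; drop the frame

    def start(self):
        self._thread.start()
        return self

    def next_frame(self, timeout: float = 0.5):
        """Return the next frame, or None if nothing arrived in time."""
        try:
            return self._frames.get(timeout=timeout)
        except queue.Empty:
            return None

    def stop(self):
        self._stop.set()
        self._thread.join(timeout=1.0)
```

Returning None from the source doubles as the graceful-degradation path for a mid-session webcam disconnect.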
Dataset Management
- Exact folder structure and .txt format required by YOLOv9 training?
- How will you version datasets so previous exports are never overwritten?
- Strategy for auto-generating initial bounding boxes using current model?
Modular Design Decisions
- Concrete classes/interfaces for Detector, ImageProcessor, DatasetBuilder?
- Where will you apply Interface Segregation (e.g., separate annotation vs. export APIs)?
- Plan for Liskov Substitution if you later swap YOLOv9 for another detector?
Training Integration
- Will fine-tuning run inside the app or as a separate CLI script called from UI?
- How many epochs and what batch size are realistic in one training run?
Security & Failure Modes
- How will you validate image files before passing to PIL/YOLO?
- Strategy for handling out-of-memory on very large images?
- Graceful degradation if webcam disconnects mid-session?
Testing Strategy
- Unit tests for detection output shape and confidence thresholds?
- How will you create a 40-image held-out test set for accuracy?
- Integration test plan for full file → detect → render flow?
Tooling & Observability
- Logging library and level for inference latency?
- How will you surface FPS and memory usage in the UI?
- Profiler choice for identifying Pygame bottlenecks?
Deployment & Packaging
- PyInstaller command and hidden imports needed for YOLOv9 + Pygame?
- Cross-platform testing plan (Windows/Mac)?
- Single-file executable size target?
Observability & Iteration
- How will you capture user feedback on misclassified produce for next training round?
- Metrics dashboard or simple CSV log for accuracy over time?
- Plan for A/B testing different YOLOv9 variants in production?
Total questions: 59. Answer every one in your Pre-Search document before touching code. This is the difference between a shippable SOLID application and a fragile prototype.