LYDGEN — LLM Agent Guide

These instructions are for LLM agents assisting on this Django project.

1. Project Context & Role

  • Project: New Django + HTMX project with modern Docker tooling
  • Current Phase: (see docs/PLAN.md)
  • Role: Act as a senior Django engineer familiar with modern Python/Django patterns
  • Tone: Concise, actionable guidance; cite official docs when helpful

2. Development Environment

Prerequisites

  • Docker Engine ≥ 25 with Docker Compose v2
  • Python 3.12 (containerized)
  • 4GB+ RAM, 2GB+ disk space

Quick Start Commands

# Setup
cp .env.example .env
docker-compose up -d

# Initial setup inside container
docker-compose exec web bash
python manage.py migrate
python manage.py createsuperuser --noinput

Key URLs

3. Workflow Expectations

  1. Phase Discipline: Work only within current phase and scope

  2. Development Flow:

    • Use docker-compose exec web bash for Django commands
    • Run pytest for testing (≥90% coverage required)
    • Check docker-compose logs -f for debugging
    • Verify CI passes before suggesting changes
  3. Code Changes:

    • Provide ready-to-use code snippets
    • Include migration commands when models change
    • Write tests for new functionality
    • Follow conventional commit messages

4. Technical Standards

Topic           Standard
Settings        Split: base.py, dev.py, prod.py, test.py
Testing         pytest, ≥90% coverage, factory_boy for fixtures
Logging         Structured JSON (prod), human-readable (dev)
Observability   OpenTelemetry + Prometheus + Jaeger
Commits         Conventional (feat:, fix:, docs:, etc.)
Python          3.12, type hints, dataclasses, modern patterns
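
For the settings split, the layout is usually just a shared base module plus thin overrides. A minimal sketch (assuming the config/settings/ package described in the spec below; values are illustrative, not final):

# config/settings/base.py -- shared defaults (sketch)
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent.parent  # project root

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # project apps are appended here as they are created
]

# config/settings/dev.py -- development overrides
from .base import *  # noqa: F401,F403

DEBUG = True
ALLOWED_HOSTS = ["*"]

# config/settings/prod.py -- production overrides
from .base import *  # noqa: F401,F403

DEBUG = False
SECURE_HSTS_SECONDS = 31536000  # HTTPS only + HSTS, per the security section of the spec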

5. Core Models

Current Schema

(update when decided)

6. Health & Monitoring

Health Endpoints

  • /healthz - Liveness (basic app health)
  • /readyz - Readiness (DB + Redis + migrations)
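
A minimal sketch of both views (assuming plain function views, redis-py for the Redis ping, and Django's MigrationExecutor for the pending-migration check; REDIS_URL is an assumed setting name):

# config/health.py -- liveness/readiness views (illustrative sketch)
import redis
from django.conf import settings
from django.db import connection
from django.db.migrations.executor import MigrationExecutor
from django.http import JsonResponse


def healthz(request):
    # Liveness: the process is up and can serve a response.
    return JsonResponse({"status": "ok"})


def readyz(request):
    checks = {}
    # Database reachable?
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
        checks["db"] = "ok"
    except Exception as exc:
        checks["db"] = str(exc)
    # Redis reachable? (REDIS_URL is an assumed setting name)
    try:
        redis.Redis.from_url(settings.REDIS_URL).ping()
        checks["redis"] = "ok"
    except Exception as exc:
        checks["redis"] = str(exc)
    # Any unapplied migrations?
    executor = MigrationExecutor(connection)
    pending = executor.migration_plan(executor.loader.graph.leaf_nodes())
    checks["migrations"] = "ok" if not pending else f"{len(pending)} pending"
    ready = all(v == "ok" for v in checks.values())
    return JsonResponse(checks, status=200 if ready else 503)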

Observability

  • Metrics: Prometheus endpoint at /metrics
  • Tracing: OpenTelemetry → Jaeger
  • Logs: Structured JSON with request IDs

7. Testing Strategy

# Run full suite
pytest

# With coverage
pytest --cov=src --cov-report=html

# Specific tests
pytest tests/test_models.py -v
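
The factory_boy convention the standards table calls for, illustrated with the Keyword model defined in the spec further down (the import path assumes the apps/ layout; factory and test names are illustrative):

# tests/leads/test_keyword.py -- illustrative factory + test
import factory
import pytest

from apps.leads.models import Keyword


class KeywordFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Keyword

    term = factory.Sequence(lambda n: f"plumber-{n}")
    city = "Seattle"
    is_active = True


@pytest.mark.django_db
def test_keyword_defaults_to_active():
    keyword = KeywordFactory()
    assert keyword.is_active is True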

8. Common References

9. Troubleshooting

Common Issues

# Reset everything
docker-compose down -v && docker-compose up -d

# Container shell access
docker-compose exec web bash

# View logs
docker-compose logs -f

# Database issues
docker-compose exec web python manage.py migrate

10. Next Actions

Before implementing features, verify:

  1. Does the change fit within current scope?
  2. Are tests included with ≥90% coverage?
  3. Do Docker containers need rebuilding?
  4. Are environment variables documented in .env.example?

When in doubt, ask for clarification on scope and requirements!


Update this guide when project conventions or requirements change.

Lydgen Lead-Gen Pipeline – Technical Specification (New Project)

1. System Overview

  • Core Purpose & Goals
    Automate discovery, triage, and deep auditing of King County SMB websites, surfacing “Hot” modernization leads complete with a branded PDF and outreach copy.
  • Primary Use-Cases
    1. Seed keyword → receive ranked lead list.
    2. Review Hot leads, skim Warm leads, ignore Cold.
    3. Generate/inspect full audit, copy-paste tailored email.
  • High-Level Architecture
    Browser-based Django admin → REST/HTMX views → Celery worker pool (Redis broker) → PostgreSQL.
    Chrome-inside-Docker container for Lighthouse/axe.
    Local disk volume for artifacts.
    External: Google Maps Places API.

2. Technology & Tools

Layer            Choice                                            Notes
Backend          Python 3.12, Django 5                             ASGI-ready
Task Queue       Celery 5, Redis 7                                 configurable concurrency
DB / ORM         PostgreSQL 16, Django ORM                         pgvector optional
Scraping         googlemaps SDK, Requests, BeautifulSoup
Tech Detection   PyWappalyzer (JSON sigs cached)
Headless Audit   Chrome 118 in alpine container
PDF              WeasyPrint (+ custom CSS)
Frontend         HTMX 1.9 + Tailwind CSS v3
DevOps           Docker Compose (web, worker, chrome, redis, db)
CI/CD            GitHub Actions: test → build → push images

3. Project Structure

lydgen/
├── manage.py
├── config/        # settings, celery, routing
├── apps/
│   ├── leads/     # Keyword, Site, Score, Audit models
│   ├── discovery/ # DiscoveryRun, agents
│   ├── sniff/     # SniffSnapshot, tech-detect utils
│   └── dashboard/ # HTMX views, templates
├── static/        # Tailwind build
└── docker/
    ├── web.Dockerfile
    └── chrome.Dockerfile

Conventions: snake_case modules, singular model names, migrations per app.

4. Feature Specification

4.1 Keyword Management & Discovery Trigger (simple)

  • Story: Admin creates keyword → clicks “Run”.
  • Details: POST /admin/leads/keyword/{id}/run/ spawns Celery task.
  • Edge-Cases: Duplicate domain merges into existing Site.
  • UI: Inline button + last-run stats.
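
A sketch of what the run endpoint might look like (view and URL names are illustrative; the Celery task is assumed to be the discover_keyword job listed under Background Jobs):

# apps/leads/views.py -- enqueue a discovery run for one keyword (sketch)
from django.contrib.admin.views.decorators import staff_member_required
from django.shortcuts import get_object_or_404, redirect
from django.views.decorators.http import require_POST

from apps.discovery.tasks import discover_keyword
from apps.leads.models import Keyword


@staff_member_required
@require_POST
def run_keyword(request, pk):
    keyword = get_object_or_404(Keyword, pk=pk, is_active=True)
    discover_keyword.delay(keyword.id)  # DiscoveryRun row is created inside the task
    return redirect("admin:leads_keyword_change", keyword.id)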

4.2 Sniff Agent (moderate)

  • Fetch if no snapshot < SNAPSHOT_REFRESH_DAYS.
  • Parse headers, HTML, run Wappalyzer; JSON stored.
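
The freshness check might look like this (a sketch; SNAPSHOT_REFRESH_DAYS comes from settings, probe_site stands in for the headers/HTML/Wappalyzer probe, and field names follow the schema table in §5):

# apps/sniff/tasks.py -- skip the probe when a fresh snapshot exists (sketch)
from datetime import timedelta

from celery import shared_task
from django.conf import settings
from django.utils import timezone

from apps.sniff.models import SniffSnapshot
from apps.sniff.services.probe import probe_site  # requests + BeautifulSoup + PyWappalyzer


@shared_task
def sniff_site(site_id):
    cutoff = timezone.now() - timedelta(days=settings.SNAPSHOT_REFRESH_DAYS)
    latest = (
        SniffSnapshot.objects.filter(site_id=site_id)
        .order_by("-captured_at")
        .first()
    )
    if latest and latest.captured_at >= cutoff:
        return latest.id  # fresh enough, reuse the cached snapshot
    data = probe_site(site_id)
    snapshot = SniffSnapshot.objects.create(site_id=site_id, data_json=data)
    return snapshot.id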

4.3 Scoring Agent (simple)

  • Pull weights from singleton ScoreConfig.
  • Compute score; bucket; upsert Score.
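
A sketch of that computation (weight keys and signal names are illustrative; thresholds come from the singleton ScoreConfig row):

# apps/leads/services/scoring.py -- weight, bucket, upsert (sketch)
from apps.leads.models import Score, ScoreConfig


def compute_score(site_id, signals):
    """signals: findings from the latest SniffSnapshot, e.g. {"no_ssl": 1, "old_cms": 1}."""
    config = ScoreConfig.objects.get(pk=1)            # singleton row, editable in admin
    weights = config.weights_json                     # e.g. {"no_ssl": 30, "old_cms": 25}
    value = sum(weights.get(key, 0) * hit for key, hit in signals.items())
    value = max(0, min(100, value))                   # clamp to the schema's 0-100 range

    if value >= config.hot_threshold:
        bucket = "hot"
    elif value >= config.warm_threshold:
        bucket = "warm"
    else:
        bucket = "cold"

    score, _ = Score.objects.update_or_create(
        site_id=site_id,
        defaults={"value": value, "reasons_json": signals, "rank_bucket": bucket},
    )
    return score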

4.4 Audit Agent (hairy)

  • Spin Chrome container; run Lighthouse categories (perf, bp, seo, a11y).
  • axe-core scan; screenshot; WeasyPrint PDF.
  • Store Audit; link file paths.
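
A rough sketch of the runner (the Lighthouse CLI flags are standard, but the artifact paths, the remote-debugging port, and the report template are assumptions; axe-core and the screenshot step are omitted for brevity):

# apps/audit/services/lighthouse_runner.py -- Lighthouse + PDF (sketch)
import json
import subprocess

from weasyprint import HTML


def run_lighthouse(url, out_path="/artifacts/lighthouse.json"):
    # Assumes the lighthouse CLI is available and a remote-debugging
    # Chrome (the chrome container) is listening on port 9222.
    subprocess.run(
        [
            "lighthouse", url,
            "--only-categories=performance,accessibility,best-practices,seo",
            "--output=json",
            f"--output-path={out_path}",
            "--port=9222",
        ],
        check=True,
    )
    with open(out_path) as fh:
        return json.load(fh)


def render_pdf(html_string, pdf_path):
    # WeasyPrint renders the branded report template to the artifact volume.
    HTML(string=html_string).write_pdf(pdf_path)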

4.5 Dashboard (simple)

  • HTMX table with filters, live status pills, actions: Force Audit, Mark Contacted, Delete Artifacts.

(Additional minor features listed in v0.6 are implicitly included.)

5. Database Schema

Table            Fields (essential)
keyword          id, term (unique), city, is_active, created_at
discovery_run    id, keyword_id, started_at, status, stats_json
site             id, domain (unique), url, name, address_json, first_seen
sniff_snapshot   id, site_id, discovery_run_id, captured_at, data_json
score            id, site_id, value, reasons_json, rank_bucket, updated_at
audit            id, site_id, pdf_path, lighthouse_json, created_at
score_config     id=1, weights_json, hot_threshold, warm_threshold

Foreign keys ON DELETE CASCADE except Site (protect). Unique index on (site_id, captured_at) day-bucket for snapshots.
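
Two of these tables sketched as Django models, reflecting the delete rules above (field types are plausible guesses, not final):

# apps/leads/models.py -- excerpt matching the schema above (sketch)
from django.db import models


class Site(models.Model):
    domain = models.CharField(max_length=255, unique=True)
    url = models.URLField()
    name = models.CharField(max_length=255, blank=True)
    address_json = models.JSONField(default=dict, blank=True)
    first_seen = models.DateTimeField(auto_now_add=True)


class Score(models.Model):
    # Site is protected; other relations cascade, per the note above.
    site = models.ForeignKey(Site, on_delete=models.PROTECT, related_name="scores")
    value = models.PositiveSmallIntegerField()
    reasons_json = models.JSONField(default=dict)
    rank_bucket = models.CharField(max_length=8)  # hot / warm / cold
    updated_at = models.DateTimeField(auto_now=True)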

6. Server Actions

6.1 CRUD / Endpoints (class-based views)

Verb   Path                      Auth    Action
GET    /dashboard/leads          staff   Paginated HTMX partial
POST   /keyword/{id}/run         staff   Kick Celery task
POST   /site/{id}/force-audit    staff   Queue Audit

ORM examples in appendix.
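
A sketch of the GET /dashboard/leads view as a class-based HTMX endpoint (template paths, the rank filter parameter, and the "scores" related name are assumptions):

# apps/dashboard/views.py -- paginated HTMX partial for the lead table (sketch)
from django.contrib.admin.views.decorators import staff_member_required
from django.utils.decorators import method_decorator
from django.views.generic import ListView

from apps.leads.models import Site


@method_decorator(staff_member_required, name="dispatch")
class LeadListView(ListView):
    model = Site
    paginate_by = 25
    template_name = "dashboard/lead_list.html"

    def get_queryset(self):
        qs = super().get_queryset().order_by("-first_seen")
        bucket = self.request.GET.get("rank")
        if bucket:
            qs = qs.filter(scores__rank_bucket=bucket)
        return qs

    def get_template_names(self):
        # HTMX requests get just the table partial; full page loads get the wrapper.
        if self.request.headers.get("HX-Request"):
            return ["dashboard/partials/lead_table.html"]
        return [self.template_name]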

6.2 Background Jobs

  • discover_keyword(keyword_id) → batch Google calls respecting MAX_MAPS_QPS.
  • sniff_site(site_id) → probe + snapshot.
  • score_site(site_id) → compute & save.
  • audit_site(site_id) → Chrome+PDF.
  • Nightly cleanup_artifacts() deletes > ARTIFACT_RETENTION_DAYS.
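
Declared roughly as below (a sketch; Celery's rate_limit option is used here to mirror MAX_MAPS_QPS, and task bodies are stubbed):

# Illustrative task declarations (bodies stubbed)
from celery import shared_task


@shared_task(rate_limit="5/s")   # guardrail mirroring MAX_MAPS_QPS
def discover_keyword(keyword_id):
    ...  # batch Google Places calls, create/merge Site rows, record DiscoveryRun stats


@shared_task
def cleanup_artifacts():
    ...  # delete PDFs/PNGs older than ARTIFACT_RETENTION_DAYS (scheduled via Celery beat)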

7. Design System

  • Palette: Dark #181E26 bg, accent #9AD7BA, CTAs #EE7F22.
  • Typography: Lexend (headlines), Inter (body).
  • Components: Status pill, score badge, modal confirm.
  • Accessibility: WCAG AA color contrast, keyboard nav.

8. Component Architecture

  • Backend: services/*.py per agent; repositories for DB ops.
  • Frontend: Pure HTMX partials, minimal JS; Turbo-style optimistic UI.

9. Authentication & Authorization

  • Stock Django auth; superuser & staff.
  • @staff_member_required decorators on dashboard/actions.

10. Data Flow

  1. Keyword run → Celery chain: discovery ↠ sniff ↠ score ↠ (conditional) audit.
  2. Results pushed to DB → HTMX poll endpoint updates UI.
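
The per-site chain could be composed like this (a sketch; a small maybe_audit task handles the conditional step, since a plain chain does not branch):

# Per-site pipeline kicked off by the discovery task (sketch)
from celery import chain, shared_task


@shared_task
def maybe_audit(site_id):
    # Only Hot leads get the expensive Chrome/Lighthouse pass.
    from apps.audit.tasks import audit_site
    from apps.leads.models import Score

    score = Score.objects.filter(site_id=site_id).first()
    if score and score.rank_bucket == "hot":
        audit_site.delay(site_id)


def enqueue_pipeline(site_id):
    # Immutable signatures: each task receives site_id, not the previous result.
    from apps.leads.tasks import score_site
    from apps.sniff.tasks import sniff_site

    chain(sniff_site.si(site_id), score_site.si(site_id), maybe_audit.si(site_id)).delay()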

11. Payment Integration

Not applicable (internal tool).

12. Analytics Integration

Optional self-hosted Plausible snippet for page views; task timings are logged to the DB.

13. Security & Compliance

  • HTTPS only; HSTS.
  • Secrets via Docker .env.
  • Google API key restricted to server IP.
  • GDPR: 60-day artifact purge; PII minimal.

14. Environment Configuration & Deployment

  • Local: make dev → Compose stack.
  • Staging/Prod: Fly.io or Render; env files; rolling deploy.
  • CI: GitHub Actions: lint → pytest → docker-build → push.

15. Testing

  • Unit: pytest + FactoryBoy for agents.
  • Integration: django-test-migrations, Celery-worker harness.
  • E2E: Playwright headless Chrome for dashboard flows.
  • Performance: Locust script for Maps QPS; Lighthouse run time threshold.

Summary & Next Steps

Lydgen MVP is a Django + Celery stack with cached probe layers, admin-editable scoring, and headless audits — all deployable via Docker.


Implementation Plan


0. Preparation

  • Step 0.1: Create repository & baseline docs
    • Task: New private GitHub repo lydgen; add README.md, CONTRIBUTING.md, and MIT LICENSE.
    • Files:
      • README.md – project vision, local setup TL;DR
      • docs/CONTRIBUTING.md – coding conventions, commit style
      • LICENSE – MIT text
    • Dependencies: None
    • User Instructions: Create repo, push initial commit.

1. Project Bootstrap

  • Step 1.1: Scaffold Django project & apps

    • Task:
      • django-admin startproject config .
      • python manage.py startapp leads
      • python manage.py startapp discovery
      • python manage.py startapp sniff
      • python manage.py startapp dashboard
    • Files (≈10 auto-generated + below):
      • config/settings/base.py – split settings (base/dev/prod)
      • config/settings/dev.py, config/settings/prod.py
    • Dependencies: Step 0.1
  • Step 1.2: Add Docker & Compose

    • Task: Containerize web, worker, redis, postgres, chrome.
    • Files:
      • docker/web.Dockerfile – installs Python, poetry/pip, Chrome deps
      • docker/chrome.Dockerfile – lightweight headless Chrome
      • docker-compose.yml – services definition
      • .dockerignore
    • Dependencies: Step 1.1
    • User Instructions: Install Docker Engine locally.
  • Step 1.3: Dependency management

    • Task: Poetry or pip-tools; pin major libs.
    • Files:
      • pyproject.toml or requirements.in|txt – django 5, celery 5, redis, googlemaps, pywappalyzer, weasyprint, htmx, tailwind, pytest, black, pre-commit
      • .pre-commit-config.yaml – black/isort/flake8
    • Dependencies: Step 1.1
    • User Instructions: Run pre-commit install.

2. Core Models & Migrations

  • Step 2.1: Define data models in leads/models.py

    • Task: Add Keyword, Site, Score, Audit, ScoreConfig.
    • Files:
      • leads/models.py
      • leads/admin.py (register + inline “Run Discovery”)
    • Dependencies: Step 1.1
  • Step 2.2: Discovery & Sniff models

    • Task:
      • discovery/models.py – DiscoveryRun
      • sniff/models.py – SniffSnapshot
    • Files: 2 models + admin registration.
    • Dependencies: Step 2.1
  • Step 2.3: Initial migrations & fixture

    • Task: python manage.py makemigrations + create singleton ScoreConfig fixture.
    • Files:
      • leads/fixtures/score_config.json
    • Dependencies: Steps 2.1-2.2

3. Celery Integration

  • Step 3.1: Configure Celery in config/celery.py

    • Task: Basic broker (Redis), result backend, autodiscovery (see the sketch at the end of this section).
    • Files:
      • config/celery.py
      • config/__init__.py (import Celery app)
    • Dependencies: Step 1.1
  • Step 3.2: Add worker service in Compose

    • Task: Extend docker-compose.yml with worker (command: celery -A config worker -l info).
    • Dependencies: Step 1.2
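
A minimal config/celery.py covering Steps 3.1 and 7.1 (standard Celery-with-Django wiring; the beat entry assumes cleanup_artifacts lives in apps.leads.tasks):

# config/celery.py
import os

from celery import Celery
from celery.schedules import crontab

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.dev")

app = Celery("lydgen")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# Nightly artifact cleanup (Step 7.1)
app.conf.beat_schedule = {
    "cleanup-artifacts": {
        "task": "apps.leads.tasks.cleanup_artifacts",
        "schedule": crontab(hour=3, minute=0),
    },
}

# config/__init__.py
# from .celery import app as celery_app
# __all__ = ("celery_app",)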

4. Agent Services

  • Step 4.1: Discovery Agent task

    • Task: In discovery/tasks.py implement Google Maps search, create/update Site + DiscoveryRun.
    • Files:
      • discovery/tasks.py
      • discovery/services/google_maps.py helper
    • Dependencies: Step 3.1
  • Step 4.2: Sniff Agent task

    • Task: Probe site (requests, SSL, headers, pywappalyzer); cache to SniffSnapshot.
    • Files:
      • sniff/tasks.py
      • sniff/services/probe.py
    • Dependencies: Step 4.1
  • Step 4.3: Scoring Agent task

    • Task: Pull weights from ScoreConfig, calculate, bucket, save Score.
    • Files:
      • leads/tasks.py
      • leads/services/scoring.py
    • Dependencies: Step 4.2
  • Step 4.4: Audit Agent task

    • Task: Launch Chrome container, run Lighthouse & axe, screenshot, WeasyPrint PDF → save Audit.
    • Files:
      • audit/tasks.py (new module inside leads or audit app)
      • audit/services/lighthouse_runner.py
    • Dependencies: Step 4.3

5. Dashboard & HTMX Views

  • Step 5.1: Tailwind & base templates

    • Task: Install Tailwind CLI, create base.html with dark theme.
    • Files:
      • templates/base.html
      • static/css/tailwind.css, build config
    • Dependencies: Step 1.3
  • Step 5.2: Lead list & filters

    • Task: dashboard/views.py HTMX view returning partial table; add filters (rank, stage).
    • Files:
      • dashboard/views.py
      • templates/dashboard/lead_list.html
      • dashboard/urls.py
    • Dependencies: Step 5.1
  • Step 5.3: Lead actions

    • Task: HTMX endpoints for Force Audit, Mark Contacted, Delete Artifacts.
    • Files:
      • update dashboard/views.py
      • templates/dashboard/partials/*.html
    • Dependencies: Step 5.2

6. Admin UX Enhancements

  • Step 6.1: Keyword inline action

    • Task: Custom admin button to enqueue DiscoveryRun.
    • Files:
      • leads/admin.py override change_view.
    • Dependencies: Step 2.1
  • Step 6.2: ScoreConfig JSON editor

    • Task: Use JSONEditorWidget for weights; validate thresholds.
    • Files:
      • leads/admin.py tweak ModelAdmin
    • Dependencies: Step 2.3

7. Artifact Management

  • Step 7.1: Cron-like cleanup task
    • Task: Celery beat or Django-Q; delete files older than ARTIFACT_RETENTION_DAYS.
    • Files:
      • config/celery.py – add beat schedule
      • leads/tasks.py – cleanup_artifacts
    • Dependencies: Step 3.1

8. Testing & Quality Gate

  • Step 8.1: Unit tests for models & scoring

    • Task: FactoryBoy fixtures; assert bucket logic.
    • Files:
      • tests/leads/test_scoring.py
      • tests/conftest.py
    • Dependencies: Steps 2-4
  • Step 8.2: Integration test for Discovery→Score pipeline

    • Task: Mock Google API; run Celery tasks synchronously (eager).
    • Files:
      • tests/pipeline/test_full_flow.py
    • Dependencies: Step 4.3

  • Step 8.3: Playwright E2E for dashboard filters

    • Task: Headless Chromium; snapshot table states.
    • Files:
      • tests/e2e/dashboard.spec.ts
    • Dependencies: Step 5.3

9. CI/CD & Deployment

  • Step 9.1: GitHub Actions workflow

    • Task: Lint → test → build Docker images; push to GHCR.
    • Files:
      • .github/workflows/ci.yml
    • Dependencies: Step 8.3
  • Step 9.2: Production deploy script

    • Task: Terraform or simple Fly.io fly.toml; define secrets, volumes.
    • Files:
      • deploy/fly.toml or render.yaml
    • Dependencies: Step 9.1
    • User Instructions: Create Fly.io app, set env vars (DB url, API key).

10. Documentation & Final QA

  • Step 10.1: Update README with setup & usage

    • Task: Add local dev, seed data, common commands.
    • Files:
      • README.md
    • Dependencies: All prior steps
  • Step 10.2: Manual smoke test checklist

    • Task: Verify keyword run → Hot lead → PDF generated; dashboard actions.
    • Files:
      • docs/SMOKE_TEST.md
    • Dependencies: Step 10.1

Summary

This plan bootstraps a Dockerised Django 5 application with Celery workers, implements the full agent pipeline (Discovery→Sniff→Score→Audit), caches probe data, and exposes an HTMX admin dashboard. Steps progress logically: models → tasks → views → quality gates → deploy. Each step limits changes to <20 files, making it safe for iterative, automated code generation.

Key dependencies

  • Celery configuration (Steps 3.x) must precede agent task creation.
  • Dashboard views rely on models and scoring logic being in place.

Potential complexities

  • Headless Chrome inside containers: ensure proper fonts and sandbox flags.
  • Google Maps quota: secure API key, monitor costs.

Once Steps 0-3 run successfully, agents can be executed locally to iterate quickly on scraping accuracy before finishing UI and deployment.

Lydgen Lead-Gen Pipeline — Spec v0.6 (MVP Lock)

Convert King County SMB keyword lists into ranked, audit-ready modernization leads.

Target Audience

Solo consultants & micro-agencies seeking actionable, high-value leads with outdated web stacks.

Core Data Models

Model                     Key Fields / Relationships                                 Purpose
Keyword                   term (str, unique), city (nullable), is_active (bool)      Admin-managed seed terms for discovery runs
DiscoveryRun              FK keyword, started_at, status, stats_json                 Tracks a single Google Maps scrape for one keyword
Site                      domain (unique), url, name, address_json, first_seen       Canonical business/site entity
SniffSnapshot             FK site, FK discovery_run, captured_at, sniff_data_json    Cached low-cost probe results
Score                     FK site, value (int 0-100), reasons_json, rank_bucket      Latest score & bucket
Audit                     FK site, pdf_path, lighthouse_json, created_at             Deep audit artifacts for Hot leads
ScoreConfig (Singleton)   weights_json, hot_threshold, warm_threshold                Admin-editable scoring params (replaces YAML)

Why this structure?
Keyword drives discovery; Site deduplicates businesses hit by multiple keywords.
SniffSnapshot lets Sniff Agent skip refetch if the last snapshot is < X days old.

Pipeline Stages

  1. Discovery Agent

    • Runs per Keyword; stores or links to existing Site rows.
    • Reuses cached address/name if the place_id already exists.
  2. Sniff Agent

    • Checks cache: use the most-recent SniffSnapshot if it is less than SNAPSHOT_REFRESH_DAYS old; else probe.
    • Saves new snapshot & updates Site basic fields (SSL, CMS).
  3. Scoring Agent

    • Pulls weights & thresholds from ScoreConfig (admin UI).
    • Writes/updates Score row; sets rank_bucket (Hot/Warm/Cold).
  4. Audit Agent (Hot only)

    • Generates Audit row with Lighthouse, axe-core, screenshot, PDF.
  5. Dashboard

    • List/filter by rank_bucket, keyword, stage.
    • Actions: Force Audit, Mark Contacted, Retire Keyword.

Ops & Settings

Setting                                 Default   Notes
SNAPSHOT_REFRESH_DAYS                   14        Skip sniffing if snapshot newer than X days
ARTIFACT_RETENTION_DAYS                 60        Cron deletes old PDFs/PNGs
MAX_MAPS_QPS, MAX_CHROME_CONCURRENCY    5 / 2     Rate guardrails
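
In settings these would typically be read from the environment (a sketch; names and defaults match the table above):

# config/settings/base.py -- operational knobs (sketch)
import os

SNAPSHOT_REFRESH_DAYS = int(os.getenv("SNAPSHOT_REFRESH_DAYS", "14"))
ARTIFACT_RETENTION_DAYS = int(os.getenv("ARTIFACT_RETENTION_DAYS", "60"))
MAX_MAPS_QPS = int(os.getenv("MAX_MAPS_QPS", "5"))
MAX_CHROME_CONCURRENCY = int(os.getenv("MAX_CHROME_CONCURRENCY", "2"))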

Workflow Summary


Keyword → DiscoveryRun → Site
                          ↘ SniffSnapshot (cached)
                             ↘ Score (Hot/Warm/Cold)
                                ↘ Audit (Hot only)

Design & Admin Enhancements

  • Keyword admin: inline “Run Discovery” button.
  • ScoreConfig admin: JSON textarea for weights, numeric inputs for thresholds.
  • Dashboard: HTMX live status pills (⏳ new, 👁 sniffed, 🔍 scored, 📝 audited).