LYDGEN — LLM Agent Guide

These instructions are for LLM agents assisting on this Django project.

1. Project Context & Role

  • Project: New Django + HTMX project with modern Docker tooling
  • Current Phase: (see docs/PLAN.md)
  • Role: Act as a senior Django engineer familiar with modern Python/Django patterns
  • Tone: Concise, actionable guidance; cite official docs when helpful

2. Development Environment

Prerequisites

  • Docker Engine ≥ 25 with Docker Compose v2
  • Python 3.12 (containerized)
  • 4GB+ RAM, 2GB+ disk space

Quick Start Commands

# Setup
cp .env.example .env
docker-compose up -d

# Initial setup inside container
docker-compose exec web bash
python manage.py migrate
python manage.py createsuperuser --noinput

Key URLs

3. Workflow Expectations

  1. Phase Discipline: Work only within current phase and scope

  2. Development Flow:

    • Use docker-compose exec web bash for Django commands
    • Run pytest for testing (≥90% coverage required)
    • Check docker-compose logs -f for debugging
    • Verify CI passes before suggesting changes
  3. Code Changes:

    • Provide ready-to-use code snippets
    • Include migration commands when models change
    • Write tests for new functionality
    • Follow conventional commit messages

4. Technical Standards

Topic           Standard
Settings        Split: base.py, dev.py, prod.py, test.py
Testing         pytest, ≥90% coverage, factory_boy for fixtures
Logging         Structured JSON (prod), human-readable (dev)
Observability   OpenTelemetry + Prometheus + Jaeger
Commits         Conventional (feat:, fix:, docs:, etc.)
Python          3.12, type hints, dataclasses, modern patterns
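
For the settings split, the layout is usually just a shared base module plus thin overrides. A minimal sketch (assuming the config/settings/ package described in the spec below; values are illustrative, not final):

# config/settings/base.py -- shared defaults (sketch)
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent.parent  # project root

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # project apps are appended here as they are created
]

# config/settings/dev.py -- development overrides
from .base import *  # noqa: F401,F403

DEBUG = True
ALLOWED_HOSTS = ["*"]

# config/settings/prod.py -- production overrides
from .base import *  # noqa: F401,F403

DEBUG = False
SECURE_HSTS_SECONDS = 31536000  # HTTPS only + HSTS, per the security section of the spec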

5. Core Models

Current Schema

(update when decided)

6. Health & Monitoring

Health Endpoints

  • /healthz - Liveness (basic app health)
  • /readyz - Readiness (DB + Redis + migrations)
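
A minimal sketch of both views (assuming plain function views, redis-py for the Redis ping, and Django's MigrationExecutor for the pending-migration check; REDIS_URL is an assumed setting name):

# config/health.py -- liveness/readiness views (illustrative sketch)
import redis
from django.conf import settings
from django.db import connection
from django.db.migrations.executor import MigrationExecutor
from django.http import JsonResponse


def healthz(request):
    # Liveness: the process is up and can serve a response.
    return JsonResponse({"status": "ok"})


def readyz(request):
    checks = {}
    # Database reachable?
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
        checks["db"] = "ok"
    except Exception as exc:
        checks["db"] = str(exc)
    # Redis reachable? (REDIS_URL is an assumed setting name)
    try:
        redis.Redis.from_url(settings.REDIS_URL).ping()
        checks["redis"] = "ok"
    except Exception as exc:
        checks["redis"] = str(exc)
    # Any unapplied migrations?
    executor = MigrationExecutor(connection)
    pending = executor.migration_plan(executor.loader.graph.leaf_nodes())
    checks["migrations"] = "ok" if not pending else f"{len(pending)} pending"
    ready = all(v == "ok" for v in checks.values())
    return JsonResponse(checks, status=200 if ready else 503)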

Observability

  • Metrics: Prometheus endpoint at /metrics
  • Tracing: OpenTelemetry → Jaeger
  • Logs: Structured JSON with request IDs

7. Testing Strategy

# Run full suite
pytest

# With coverage
pytest --cov=src --cov-report=html

# Specific tests
pytest tests/test_models.py -v
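
The factory_boy convention the standards table calls for, illustrated with the Keyword model defined in the spec further down (the import path assumes the apps/ layout; factory and test names are illustrative):

# tests/leads/test_keyword.py -- illustrative factory + test
import factory
import pytest

from apps.leads.models import Keyword


class KeywordFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Keyword

    term = factory.Sequence(lambda n: f"plumber-{n}")
    city = "Seattle"
    is_active = True


@pytest.mark.django_db
def test_keyword_defaults_to_active():
    keyword = KeywordFactory()
    assert keyword.is_active is True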

8. Common References

9. Troubleshooting

Common Issues

# Reset everything
docker-compose down -v && docker-compose up -d

# Container shell access
docker-compose exec web bash

# View logs
docker-compose logs -f

# Database issues
docker-compose exec web python manage.py migrate

10. Next Actions

Before implementing features, verify:

  1. Does the change fit within current scope?
  2. Are tests included with ≥90% coverage?
  3. Do Docker containers need rebuilding?
  4. Are environment variables documented in .env.example?

When in doubt, ask for clarification on scope and requirements!


Update this guide when project conventions or requirements change.

Lydgen Lead-Gen Pipeline – Technical Specification (New Project)

1. System Overview

  • Core Purpose & Goals
    Automate discovery, triage, and deep auditing of King County SMB websites, surfacing “Hot” modernization leads complete with a branded PDF and outreach copy.
  • Primary Use-Cases
    1. Seed keyword → receive ranked lead list.
    2. Review Hot leads, skim Warm leads, ignore Cold.
    3. Generate/inspect full audit, copy-paste tailored email.
  • High-Level Architecture
    Browser-based Django admin → REST/HTMX views → Celery worker pool (Redis broker) → PostgreSQL.
    Chrome-inside-Docker container for Lighthouse/axe.
    Local disk volume for artifacts.
    External: Google Maps Places API.

2. Technology & Tools

Layer            Choice                                            Notes
Backend          Python 3.12, Django 5                             ASGI-ready
Task Queue       Celery 5, Redis 7                                 configurable concurrency
DB / ORM         PostgreSQL 16, Django ORM                         pgvector optional
Scraping         googlemaps SDK, Requests, BeautifulSoup
Tech Detection   PyWappalyzer (JSON sigs cached)
Headless Audit   Chrome 118 in alpine container
PDF              WeasyPrint (+ custom CSS)
Frontend         HTMX 1.9 + Tailwind CSS v3
DevOps           Docker Compose (web, worker, chrome, redis, db)
CI/CD            GitHub Actions: test → build → push images

3. Project Structure

lydgen/
├── manage.py
├── config/        # settings, celery, routing
├── apps/
│   ├── leads/     # Keyword, Site, Score, Audit models
│   ├── discovery/ # DiscoveryRun, agents
│   ├── sniff/     # SniffSnapshot, tech-detect utils
│   └── dashboard/ # HTMX views, templates
├── static/        # Tailwind build
└── docker/
    ├── web.Dockerfile
    └── chrome.Dockerfile

Conventions: snake_case modules, singular model names, migrations per app.

4. Feature Specification

4.1 Keyword Management & Discovery Trigger (simple)

  • Story: Admin creates keyword → clicks “Run”.
  • Details: POST /admin/leads/keyword/{id}/run/ spawns Celery task.
  • Edge-Cases: Duplicate domain merges into existing Site.
  • UI: Inline button + last-run stats.
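
A sketch of what the run endpoint might look like (view and URL names are illustrative; the Celery task is assumed to be the discover_keyword job listed under Background Jobs):

# apps/leads/views.py -- enqueue a discovery run for one keyword (sketch)
from django.contrib.admin.views.decorators import staff_member_required
from django.shortcuts import get_object_or_404, redirect
from django.views.decorators.http import require_POST

from apps.discovery.tasks import discover_keyword
from apps.leads.models import Keyword


@staff_member_required
@require_POST
def run_keyword(request, pk):
    keyword = get_object_or_404(Keyword, pk=pk, is_active=True)
    discover_keyword.delay(keyword.id)  # DiscoveryRun row is created inside the task
    return redirect("admin:leads_keyword_change", keyword.id)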

4.2 Sniff Agent (moderate)

  • Fetch if no snapshot < SNAPSHOT_REFRESH_DAYS.
  • Parse headers, HTML, run Wappalyzer; JSON stored.
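
The freshness check might look like this (a sketch; SNAPSHOT_REFRESH_DAYS comes from settings, probe_site stands in for the headers/HTML/Wappalyzer probe, and field names follow the schema table in §5):

# apps/sniff/tasks.py -- skip the probe when a fresh snapshot exists (sketch)
from datetime import timedelta

from celery import shared_task
from django.conf import settings
from django.utils import timezone

from apps.sniff.models import SniffSnapshot
from apps.sniff.services.probe import probe_site  # requests + BeautifulSoup + PyWappalyzer


@shared_task
def sniff_site(site_id):
    cutoff = timezone.now() - timedelta(days=settings.SNAPSHOT_REFRESH_DAYS)
    latest = (
        SniffSnapshot.objects.filter(site_id=site_id)
        .order_by("-captured_at")
        .first()
    )
    if latest and latest.captured_at >= cutoff:
        return latest.id  # fresh enough, reuse the cached snapshot
    data = probe_site(site_id)
    snapshot = SniffSnapshot.objects.create(site_id=site_id, data_json=data)
    return snapshot.id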

4.3 Scoring Agent (simple)

  • Pull weights from singleton ScoreConfig.
  • Compute score; bucket; upsert Score.
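
A sketch of that computation (weight keys and signal names are illustrative; thresholds come from the singleton ScoreConfig row):

# apps/leads/services/scoring.py -- weight, bucket, upsert (sketch)
from apps.leads.models import Score, ScoreConfig


def compute_score(site_id, signals):
    """signals: findings from the latest SniffSnapshot, e.g. {"no_ssl": 1, "old_cms": 1}."""
    config = ScoreConfig.objects.get(pk=1)            # singleton row, editable in admin
    weights = config.weights_json                     # e.g. {"no_ssl": 30, "old_cms": 25}
    value = sum(weights.get(key, 0) * hit for key, hit in signals.items())
    value = max(0, min(100, value))                   # clamp to the schema's 0-100 range

    if value >= config.hot_threshold:
        bucket = "hot"
    elif value >= config.warm_threshold:
        bucket = "warm"
    else:
        bucket = "cold"

    score, _ = Score.objects.update_or_create(
        site_id=site_id,
        defaults={"value": value, "reasons_json": signals, "rank_bucket": bucket},
    )
    return score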

4.4 Audit Agent (hairy)

  • Spin Chrome container; run Lighthouse categories (perf, bp, seo, a11y).
  • axe-core scan; screenshot; WeasyPrint PDF.
  • Store Audit; link file paths.
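
A rough sketch of the runner (the Lighthouse CLI flags are standard, but the artifact paths, the remote-debugging port, and the report template are assumptions; axe-core and the screenshot step are omitted for brevity):

# apps/audit/services/lighthouse_runner.py -- Lighthouse + PDF (sketch)
import json
import subprocess

from weasyprint import HTML


def run_lighthouse(url, out_path="/artifacts/lighthouse.json"):
    # Assumes the lighthouse CLI is available and a remote-debugging
    # Chrome (the chrome container) is listening on port 9222.
    subprocess.run(
        [
            "lighthouse", url,
            "--only-categories=performance,accessibility,best-practices,seo",
            "--output=json",
            f"--output-path={out_path}",
            "--port=9222",
        ],
        check=True,
    )
    with open(out_path) as fh:
        return json.load(fh)


def render_pdf(html_string, pdf_path):
    # WeasyPrint renders the branded report template to the artifact volume.
    HTML(string=html_string).write_pdf(pdf_path)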

4.5 Dashboard (simple)

  • HTMX table with filters, live status pills, actions: Force Audit, Mark Contacted, Delete Artifacts.

(Additional minor features listed in v0.6 are implicitly included.)

5. Database Schema

Table            Fields (essential)
keyword          id, term (unique), city, is_active, created_at
discovery_run    id, keyword_id, started_at, status, stats_json
site             id, domain (unique), url, name, address_json, first_seen
sniff_snapshot   id, site_id, discovery_run_id, captured_at, data_json
score            id, site_id, value, reasons_json, rank_bucket, updated_at
audit            id, site_id, pdf_path, lighthouse_json, created_at
score_config     id=1, weights_json, hot_threshold, warm_threshold

Foreign keys ON DELETE CASCADE except Site (protect). Unique index on (site_id, captured_at) day-bucket for snapshots.
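
Two of these tables sketched as Django models, reflecting the delete rules above (field types are plausible guesses, not final):

# apps/leads/models.py -- excerpt matching the schema above (sketch)
from django.db import models


class Site(models.Model):
    domain = models.CharField(max_length=255, unique=True)
    url = models.URLField()
    name = models.CharField(max_length=255, blank=True)
    address_json = models.JSONField(default=dict, blank=True)
    first_seen = models.DateTimeField(auto_now_add=True)


class Score(models.Model):
    # Site is protected; other relations cascade, per the note above.
    site = models.ForeignKey(Site, on_delete=models.PROTECT, related_name="scores")
    value = models.PositiveSmallIntegerField()
    reasons_json = models.JSONField(default=dict)
    rank_bucket = models.CharField(max_length=8)  # hot / warm / cold
    updated_at = models.DateTimeField(auto_now=True)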

6. Server Actions

6.1 CRUD / Endpoints (class-based views)

Verb   Path                      Auth    Action
GET    /dashboard/leads          staff   Paginated HTMX partial
POST   /keyword/{id}/run         staff   Kick Celery task
POST   /site/{id}/force-audit    staff   Queue Audit

ORM examples in appendix.
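
A sketch of the GET /dashboard/leads view as a class-based HTMX endpoint (template paths, the rank filter parameter, and the "scores" related name are assumptions):

# apps/dashboard/views.py -- paginated HTMX partial for the lead table (sketch)
from django.contrib.admin.views.decorators import staff_member_required
from django.utils.decorators import method_decorator
from django.views.generic import ListView

from apps.leads.models import Site


@method_decorator(staff_member_required, name="dispatch")
class LeadListView(ListView):
    model = Site
    paginate_by = 25
    template_name = "dashboard/lead_list.html"

    def get_queryset(self):
        qs = super().get_queryset().order_by("-first_seen")
        bucket = self.request.GET.get("rank")
        if bucket:
            qs = qs.filter(scores__rank_bucket=bucket)
        return qs

    def get_template_names(self):
        # HTMX requests get just the table partial; full page loads get the wrapper.
        if self.request.headers.get("HX-Request"):
            return ["dashboard/partials/lead_table.html"]
        return [self.template_name]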

6.2 Background Jobs

  • discover_keyword(keyword_id) → batch Google calls respecting MAX_MAPS_QPS.
  • sniff_site(site_id) → probe + snapshot.
  • score_site(site_id) → compute & save.
  • audit_site(site_id) → Chrome+PDF.
  • Nightly cleanup_artifacts() deletes > ARTIFACT_RETENTION_DAYS.
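
Declared roughly as below (a sketch; Celery's rate_limit option is used here to mirror MAX_MAPS_QPS, and task bodies are stubbed):

# Illustrative task declarations (bodies stubbed)
from celery import shared_task


@shared_task(rate_limit="5/s")   # guardrail mirroring MAX_MAPS_QPS
def discover_keyword(keyword_id):
    ...  # batch Google Places calls, create/merge Site rows, record DiscoveryRun stats


@shared_task
def cleanup_artifacts():
    ...  # delete PDFs/PNGs older than ARTIFACT_RETENTION_DAYS (scheduled via Celery beat)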

7. Design System

  • Palette: Dark #181E26 bg, accent #9AD7BA, CTAs #EE7F22.
  • Typography: Lexend (headlines), Inter (body).
  • Components: Status pill, score badge, modal confirm.
  • Accessibility: WCAG AA color contrast, keyboard nav.

8. Component Architecture

  • Backend: services/*.py per agent; repositories for DB ops.
  • Frontend: Pure HTMX partials, minimal JS; Turbo-style optimistic UI.

9. Authentication & Authorization

  • Stock Django auth; superuser & staff.
  • @staff_member_required decorators on dashboard/actions.

10. Data Flow

  1. Keyword run → Celery chain: discovery ↠ sniff ↠ score ↠ (conditional) audit.
  2. Results pushed to DB → HTMX poll endpoint updates UI.
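
The per-site chain could be composed like this (a sketch; a small maybe_audit task handles the conditional step, since a plain chain does not branch):

# Per-site pipeline kicked off by the discovery task (sketch)
from celery import chain, shared_task


@shared_task
def maybe_audit(site_id):
    # Only Hot leads get the expensive Chrome/Lighthouse pass.
    from apps.audit.tasks import audit_site
    from apps.leads.models import Score

    score = Score.objects.filter(site_id=site_id).first()
    if score and score.rank_bucket == "hot":
        audit_site.delay(site_id)


def enqueue_pipeline(site_id):
    # Immutable signatures: each task receives site_id, not the previous result.
    from apps.leads.tasks import score_site
    from apps.sniff.tasks import sniff_site

    chain(sniff_site.si(site_id), score_site.si(site_id), maybe_audit.si(site_id)).delay()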

11. Payment Integration

Not applicable (internal tool).

12. Analytics Integration

Optional self-hosted Plausible snippet for page views; task timings are logged to the DB.

13. Security & Compliance

  • HTTPS only; HSTS.
  • Secrets via Docker .env.
  • Google API key restricted to server IP.
  • GDPR: 60-day artifact purge; PII minimal.

14. Environment Configuration & Deployment

  • Local: make dev → Compose stack.
  • Staging/Prod: Fly.io or Render; env files; rolling deploy.
  • CI: GitHub Actions: lint → pytest → docker-build → push.

15. Testing

  • Unit: pytest + FactoryBoy for agents.
  • Integration: django-test-migrations, Celery-worker harness.
  • E2E: Playwright headless Chrome for dashboard flows.
  • Performance: Locust script for Maps QPS; Lighthouse run time threshold.

Summary & Next Steps

Lydgen MVP is a Django + Celery stack with cached probe layers, admin-editable scoring, and headless audits — all deployable via Docker.


Implementation Plan


0. Preparation

  • Step 0.1: Create repository & baseline docs
    • Task: New private GitHub repo lydgen; add README.md, CONTRIBUTING.md, and MIT LICENSE.
    • Files:
      • README.md – project vision, local setup TL;DR
      • docs/CONTRIBUTING.md – coding conventions, commit style
      • LICENSE – MIT text
    • Dependencies: None
    • User Instructions: Create repo, push initial commit.

1. Project Bootstrap

  • Step 1.1: Scaffold Django project & apps

    • Task:
      • django-admin startproject config .
      • python manage.py startapp leads
      • python manage.py startapp discovery
      • python manage.py startapp sniff
      • python manage.py startapp dashboard
    • Files (≈10 auto-generated + below):
      • config/settings/base.py – split settings (base/dev/prod)
      • config/settings/dev.py, config/settings/prod.py
    • Dependencies: Step 0.1
  • Step 1.2: Add Docker & Compose

    • Task: Containerize web, worker, redis, postgres, chrome.
    • Files:
      • docker/web.Dockerfile – installs Python, poetry/pip, Chrome deps
      • docker/chrome.Dockerfile – lightweight headless Chrome
      • docker-compose.yml – services definition
      • .dockerignore
    • Dependencies: Step 1.1
    • User Instructions: Install Docker Engine locally.
  • Step 1.3: Dependency management

    • Task: Poetry or pip-tools; pin major libs.
    • Files:
      • pyproject.toml or requirements.in|txt – django 5, celery 5, redis, googlemaps, pywappalyzer, weasyprint, htmx, tailwind, pytest, black, pre-commit
      • .pre-commit-config.yaml – black/isort/flake8
    • Dependencies: Step 1.1
    • User Instructions: Run pre-commit install.

2. Core Models & Migrations

  • Step 2.1: Define data models in leads/models.py

    • Task: Add Keyword, Site, Score, Audit, ScoreConfig.
    • Files:
      • leads/models.py
      • leads/admin.py (register + inline “Run Discovery”)
    • Dependencies: Step 1.1
  • Step 2.2: Discovery & Sniff models

    • Task:
      • discovery/models.py – DiscoveryRun
      • sniff/models.py – SniffSnapshot
    • Files: 2 models + admin registration.
    • Dependencies: Step 2.1
  • Step 2.3: Initial migrations & fixture

    • Task: python manage.py makemigrations + create singleton ScoreConfig fixture.
    • Files:
      • leads/fixtures/score_config.json
    • Dependencies: Steps 2.1-2.2

3. Celery Integration

  • Step 3.1: Configure Celery in config/celery.py

    • Task: Basic broker (Redis), result backend, autodiscovery (see the sketch at the end of this section).
    • Files:
      • config/celery.py
      • config/__init__.py (import Celery app)
    • Dependencies: Step 1.1
  • Step 3.2: Add worker service in Compose

    • Task: Extend docker-compose.yml with worker (command: celery -A config worker -l info).
    • Dependencies: Step 1.2
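
A minimal config/celery.py covering Steps 3.1 and 7.1 (standard Celery-with-Django wiring; the beat entry assumes cleanup_artifacts lives in apps.leads.tasks):

# config/celery.py
import os

from celery import Celery
from celery.schedules import crontab

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.dev")

app = Celery("lydgen")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# Nightly artifact cleanup (Step 7.1)
app.conf.beat_schedule = {
    "cleanup-artifacts": {
        "task": "apps.leads.tasks.cleanup_artifacts",
        "schedule": crontab(hour=3, minute=0),
    },
}

# config/__init__.py
# from .celery import app as celery_app
# __all__ = ("celery_app",)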

4. Agent Services

  • Step 4.1: Discovery Agent task

    • Task: In discovery/tasks.py implement Google Maps search, create/update Site + DiscoveryRun.
    • Files:
      • discovery/tasks.py
      • discovery/services/google_maps.py helper
    • Dependencies: Step 3.1
  • Step 4.2: Sniff Agent task

    • Task: Probe site (requests, SSL, headers, pywappalyzer); cache to SniffSnapshot.
    • Files:
      • sniff/tasks.py
      • sniff/services/probe.py
    • Dependencies: Step 4.1
  • Step 4.3: Scoring Agent task

    • Task: Pull weights from ScoreConfig, calculate, bucket, save Score.
    • Files:
      • leads/tasks.py
      • leads/services/scoring.py
    • Dependencies: Step 4.2
  • Step 4.4: Audit Agent task

    • Task: Launch Chrome container, run Lighthouse & axe, screenshot, WeasyPrint PDF → save Audit.
    • Files:
      • audit/tasks.py (new module inside leads or audit app)
      • audit/services/lighthouse_runner.py
    • Dependencies: Step 4.3

5. Dashboard & HTMX Views

  • Step 5.1: Tailwind & base templates

    • Task: Install Tailwind CLI, create base.html with dark theme.
    • Files:
      • templates/base.html
      • static/css/tailwind.css, build config
    • Dependencies: Step 1.3
  • Step 5.2: Lead list & filters

    • Task: dashboard/views.py HTMX view returning partial table; add filters (rank, stage).
    • Files:
      • dashboard/views.py
      • templates/dashboard/lead_list.html
      • dashboard/urls.py
    • Dependencies: Step 5.1
  • Step 5.3: Lead actions

    • Task: HTMX endpoints for Force Audit, Mark Contacted, Delete Artifacts.
    • Files:
      • update dashboard/views.py
      • templates/dashboard/partials/*.html
    • Dependencies: Step 5.2

6. Admin UX Enhancements

  • Step 6.1: Keyword inline action

    • Task: Custom admin button to enqueue DiscoveryRun.
    • Files:
      • leads/admin.py override change_view.
    • Dependencies: Step 2.1
  • Step 6.2: ScoreConfig JSON editor

    • Task: Use JSONEditorWidget for weights; validate thresholds.
    • Files:
      • leads/admin.py tweak ModelAdmin
    • Dependencies: Step 2.3

7. Artifact Management

  • Step 7.1: Cron-like cleanup task
    • Task: Celery beat or Django-Q; delete files older than ARTIFACT_RETENTION_DAYS.
    • Files:
      • config/celery.py – add beat schedule
      • leads/tasks.py – cleanup_artifacts
    • Dependencies: Step 3.1

8. Testing & Quality Gate

  • Step 8.1: Unit tests for models & scoring

    • Task: FactoryBoy fixtures; assert bucket logic.
    • Files:
      • tests/leads/test_scoring.py
      • tests/conftest.py
    • Dependencies: Steps 2-4
  • Step 8.2: Integration test for Discovery→Score pipeline

    • Task: Mock Google API; run Celery tasks synchronously (eager).
    • Files:
      • tests/pipeline/test_full_flow.py
    • Dependencies: Step 4.3

  • Step 8.3: Playwright E2E for dashboard filters

    • Task: Headless Chromium; snapshot table states.
    • Files:
      • tests/e2e/dashboard.spec.ts
    • Dependencies: Step 5.3

9. CI/CD & Deployment

  • Step 9.1: GitHub Actions workflow

    • Task: Lint → test → build Docker images; push to GHCR.
    • Files:
      • .github/workflows/ci.yml
    • Dependencies: Step 8.3
  • Step 9.2: Production deploy script

    • Task: Terraform or simple Fly.io fly.toml; define secrets, volumes.
    • Files:
      • deploy/fly.toml or render.yaml
    • Dependencies: Step 9.1
    • User Instructions: Create Fly.io app, set env vars (DB url, API key).

10. Documentation & Final QA

  • Step 10.1: Update README with setup & usage

    • Task: Add local dev, seed data, common commands.
    • Files:
      • README.md
    • Dependencies: All prior steps
  • Step 10.2: Manual smoke test checklist

    • Task: Verify keyword run → Hot lead → PDF generated; dashboard actions.
    • Files:
      • docs/SMOKE_TEST.md
    • Dependencies: Step 10.1

Summary

This plan bootstraps a Dockerised Django 5 application with Celery workers, implements the full agent pipeline (Discovery→Sniff→Score→Audit), caches probe data, and exposes an HTMX admin dashboard. Steps progress logically: models → tasks → views → quality gates → deploy. Each step limits changes to <20 files, making it safe for iterative, automated code generation.

Key dependencies

  • Celery configuration (Steps 3.x) must precede agent task creation.
  • Dashboard views rely on models and scoring logic being in place.

Potential complexities

  • Headless Chrome inside containers: ensure proper fonts and sandbox flags.
  • Google Maps quota: secure API key, monitor costs.

Once Steps 0-3 run successfully, agents can be executed locally to iterate quickly on scraping accuracy before finishing UI and deployment.

Lydgen Lead-Gen Pipeline — Spec v0.6 (MVP Lock)

Convert King County SMB keyword lists into ranked, audit-ready modernization leads.

Target Audience

Solo consultants & micro-agencies seeking actionable, high-value leads with outdated web stacks.

Core Data Models

Model                     Key Fields / Relationships                                 Purpose
Keyword                   term (str, unique), city (nullable), is_active (bool)      Admin-managed seed terms for discovery runs
DiscoveryRun              FK keyword, started_at, status, stats_json                 Tracks a single Google Maps scrape for one keyword
Site                      domain (unique), url, name, address_json, first_seen       Canonical business/site entity
SniffSnapshot             FK site, FK discovery_run, captured_at, sniff_data_json    Cached low-cost probe results
Score                     FK site, value (int 0-100), reasons_json, rank_bucket      Latest score & bucket
Audit                     FK site, pdf_path, lighthouse_json, created_at             Deep audit artifacts for Hot leads
ScoreConfig (Singleton)   weights_json, hot_threshold, warm_threshold                Admin-editable scoring params (replaces YAML)

Why this structure?
Keyword drives discovery; Site deduplicates businesses hit by multiple keywords.
SniffSnapshot lets Sniff Agent skip refetch if the last snapshot is < X days old.

Pipeline Stages

  1. Discovery Agent

    • Runs per Keyword; stores or links to existing Site rows.
    • Reuses cached address/name if the place_id already exists.
  2. Sniff Agent

    • Checks cache: use the most-recent SniffSnapshot if it is less than SNAPSHOT_REFRESH_DAYS old; else probe.
    • Saves new snapshot & updates Site basic fields (SSL, CMS).
  3. Scoring Agent

    • Pulls weights & thresholds from ScoreConfig (admin UI).
    • Writes/updates Score row; sets rank_bucket (Hot/Warm/Cold).
  4. Audit Agent (Hot only)

    • Generates Audit row with Lighthouse, axe-core, screenshot, PDF.
  5. Dashboard

    • List/filter by rank_bucket, keyword, stage.
    • Actions: Force Audit, Mark Contacted, Retire Keyword.

Ops & Settings

Setting                                 Default   Notes
SNAPSHOT_REFRESH_DAYS                   14        Skip sniffing if snapshot newer than X days
ARTIFACT_RETENTION_DAYS                 60        Cron deletes old PDFs/PNGs
MAX_MAPS_QPS, MAX_CHROME_CONCURRENCY    5 / 2     Rate guardrails
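
In settings these would typically be read from the environment (a sketch; names and defaults match the table above):

# config/settings/base.py -- operational knobs (sketch)
import os

SNAPSHOT_REFRESH_DAYS = int(os.getenv("SNAPSHOT_REFRESH_DAYS", "14"))
ARTIFACT_RETENTION_DAYS = int(os.getenv("ARTIFACT_RETENTION_DAYS", "60"))
MAX_MAPS_QPS = int(os.getenv("MAX_MAPS_QPS", "5"))
MAX_CHROME_CONCURRENCY = int(os.getenv("MAX_CHROME_CONCURRENCY", "2"))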

Workflow Summary


Keyword → DiscoveryRun → Site
                          ↘ SniffSnapshot (cached)
                             ↘ Score (Hot/Warm/Cold)
                                ↘ Audit (Hot only)

Design & Admin Enhancements

  • Keyword admin: inline “Run Discovery” button.
  • ScoreConfig admin: JSON textarea for weights, numeric inputs for thresholds.
  • Dashboard: HTMX live status pills (⏳ new, 👁 sniffed, 🔍 scored, 📝 audited).