Lydgen Lead-Gen Pipeline – Technical Specification (New Project)
1. System Overview
Core Purpose & Goals
Automate discovery, triage, and deep auditing of King County SMB websites, surfacing “Hot” modernization leads complete with a branded PDF and outreach copy.
Primary Use-Cases
Seed keyword → receive ranked lead list.
Review Hot leads, skim Warm leads, ignore Cold.
Generate/inspect full audit, copy-paste tailored email.
High-Level Architecture
Browser-based Django admin → REST/HTMX views → Celery worker pool (Redis broker) → PostgreSQL.
Chrome-inside-Docker container for Lighthouse/axe.
Local disk volume for artifacts.
External: Google Maps Places API.
Task: HTMX endpoints for Force Audit, Mark Contacted, Delete Artifacts.
Files:
dashboard/views.py – update
templates/dashboard/partials/*.html
Dependencies: Step 5.2
6. Admin UX Enhancements
Step 6.1: Keyword inline action
Task: Custom admin button to enqueue DiscoveryRun.
Files:
leads/admin.py – override change_view
Dependencies: Step 2.1
Step 6.2: ScoreConfig JSON editor
Task: Use JSONEditorWidget for weights; validate thresholds.
Files:
leads/admin.py – tweak ModelAdmin
Dependencies: Step 2.3
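The threshold validation in Step 6.2 might look roughly like the sketch below. The spec does not define the ScoreConfig schema, so the key names (weights, thresholds, hot, warm) and the 0-100 score range are assumptions made purely for illustration.

```python
import json


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_score_config(raw: str) -> dict:
    """Parse a ScoreConfig JSON string and check basic invariants.

    Hypothetical schema (not from the spec):
        {"weights": {"<signal>": number, ...},
         "thresholds": {"hot": number, "warm": number}}
    """
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON: {exc}") from exc

    if not isinstance(config, dict):
        raise ValueError("ScoreConfig must be a JSON object")

    weights = config.get("weights")
    if not isinstance(weights, dict) or not weights:
        raise ValueError("'weights' must be a non-empty object")
    for name, value in weights.items():
        if not _is_number(value) or value < 0:
            raise ValueError(f"Weight '{name}' must be a non-negative number")

    thresholds = config.get("thresholds")
    if not isinstance(thresholds, dict):
        raise ValueError("'thresholds' must be an object")
    hot, warm = thresholds.get("hot"), thresholds.get("warm")
    if not _is_number(hot) or not _is_number(warm):
        raise ValueError("'hot' and 'warm' thresholds must be numbers")
    # Assumed ordering: Warm starts below Hot, both inside a 0-100 range.
    if not 0 <= warm < hot <= 100:
        raise ValueError("Thresholds must satisfy 0 <= warm < hot <= 100")

    return config
```

In the admin, a function like this could be called from the form's clean method, re-raising the ValueError as a Django ValidationError so the message appears next to the JSON editor.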
7. Artifact Management
Step 7.1: Cron-like cleanup task
Task: Celery beat or Django-Q; delete files older than ARTIFACT_RETENTION_DAYS.
Files:
config/celery.py – add beat schedule
leads/tasks.py – cleanup_artifacts
Dependencies: Step 3.1
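The core of cleanup_artifacts can be sketched without any Celery wiring, as below. The 30-day default and the function signature are assumptions; the spec names ARTIFACT_RETENTION_DAYS but not its value, and the real task would presumably read it from settings.

```python
import time
from pathlib import Path
from typing import List, Optional

ARTIFACT_RETENTION_DAYS = 30  # assumed default; the spec does not give a value
SECONDS_PER_DAY = 86400


def cleanup_artifacts(
    root: Path,
    retention_days: int = ARTIFACT_RETENTION_DAYS,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete regular files under `root` last modified before the cutoff.

    Returns the deleted paths so the caller can log them.
    `now` is injectable (epoch seconds) to keep the function testable.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY

    deleted = []
    # Materialise the listing first so deletions do not affect the walk.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

Wrapping this in a Celery task and registering it in the beat schedule in config/celery.py would then be a thin layer on top. Empty directories are left in place in this sketch.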
8. Testing & Quality Gate
Step 8.1: Unit tests for models & scoring
Task: FactoryBoy fixtures; assert bucket logic.
Files:
tests/leads/test_scoring.py
tests/conftest.py
Dependencies: Steps 2-4
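As a rough idea of the bucket logic these tests would assert, here is a minimal sketch using plain dicts rather than FactoryBoy fixtures. The weighted-sum scoring model, the signal names, and the threshold values are all hypothetical; the spec only says that weights and thresholds come from ScoreConfig.

```python
from typing import Dict


def score_site(signals: Dict[str, bool], weights: Dict[str, float]) -> float:
    """Sum the weights of every signal that fired; unknown signals add 0."""
    return sum(weights.get(name, 0.0) for name, fired in signals.items() if fired)


def bucket(score: float, hot: float, warm: float) -> str:
    """Map a score to Hot / Warm / Cold using inclusive lower bounds."""
    if score >= hot:
        return "Hot"
    if score >= warm:
        return "Warm"
    return "Cold"


def test_bucket_boundaries():
    # Hypothetical weights and thresholds, for illustration only.
    weights = {"no_ssl": 30, "no_mobile_viewport": 25, "outdated_cms": 20}
    all_bad = {"no_ssl": True, "no_mobile_viewport": True, "outdated_cms": True}
    some_bad = {"no_ssl": True, "no_mobile_viewport": False, "outdated_cms": True}
    clean = {"no_ssl": False, "no_mobile_viewport": False, "outdated_cms": False}

    assert bucket(score_site(all_bad, weights), hot=70, warm=40) == "Hot"
    assert bucket(score_site(some_bad, weights), hot=70, warm=40) == "Warm"
    assert bucket(score_site(clean, weights), hot=70, warm=40) == "Cold"
    # A score exactly on a threshold belongs to the higher bucket.
    assert bucket(70, hot=70, warm=40) == "Hot"
    assert bucket(40, hot=70, warm=40) == "Warm"
```

Testing the exact threshold values is the part most worth keeping, since off-by-one comparisons are the likeliest regression when thresholds become editable in the admin.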
Step 8.2: Integration test for Discovery→Score pipeline
Task: Mock Google API; run Celery tasks synchronously (eager).
Files:
tests/pipeline/test_full_flow.py
Dependencies: Step 4.3
Step 8.3: Playwright E2E for dashboard filters
Task: Headless Chromium; snapshot table states.
Files:
tests/e2e/dashboard.spec.ts
Dependencies: Step 5.3
9. CI/CD & Deployment
Step 9.1: GitHub Actions workflow
Task: Lint → test → build Docker images; push to GHCR.
Files:
.github/workflows/ci.yml
Dependencies: Step 8.3
Step 9.2: Production deploy script
Task: Terraform or a simple Fly.io fly.toml; define secrets and volumes.
Files:
deploy/fly.toml or render.yaml
Dependencies: Step 9.1
User Instructions: Create Fly.io app, set env vars (DB url, API key).
10. Documentation & Final QA
Step 10.1: Update README with setup & usage
Task: Add local dev, seed data, common commands.
Files:
README.md
Dependencies: All prior steps
Step 10.2: Manual smoke test checklist
Task: Verify keyword run → Hot lead → PDF generated; dashboard actions.
Files:
docs/SMOKE_TEST.md
Dependencies: Step 10.1
Summary
This plan bootstraps a Dockerised Django 5 application with Celery workers, implements the full agent pipeline (Discovery→Sniff→Score→Audit), caches probe data, and exposes an HTMX admin dashboard. Steps progress logically: models → tasks → views → quality gates → deploy. Each step limits changes to <20 files, making it safe for iterative, automated code generation.
Key dependencies
Celery configuration (Steps 3.x) must precede agent task creation.
Dashboard views rely on models and scoring logic being in place.
Potential complexities
Headless Chrome inside containers: ensure proper fonts and sandbox flags.
Google Maps quota: secure API key, monitor costs.
Once Steps 0-3 run successfully, agents can be executed locally to iterate quickly on scraping accuracy before finishing UI and deployment.
Why this structure? Keyword drives discovery; Site deduplicates businesses hit by multiple keywords. SniffSnapshot lets the Sniff Agent skip a refetch if the last snapshot is less than REFRESH_DAYS old.
Pipeline Stages
Discovery Agent
Runs per Keyword; stores or links to existing Site rows.
Reuses cached address/name if the place_id already exists.
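The store-or-link behaviour of the Discovery Agent can be illustrated with an in-memory stand-in for the Site table. The dataclass fields and the record_place helper are hypothetical; only place_id, name, and address are mentioned in the spec.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Site:
    """In-memory stand-in for the Site model (field names assumed)."""
    place_id: str
    name: str
    address: str
    keywords: Set[str] = field(default_factory=set)


def record_place(
    sites: Dict[str, Site], keyword: str, place_id: str, name: str, address: str
) -> Site:
    """Link a discovered place to a keyword, creating the Site on first sight."""
    site = sites.get(place_id)
    if site is None:
        site = Site(place_id=place_id, name=name, address=address)
        sites[place_id] = site
    # An existing Site keeps its cached name and address;
    # only the keyword link is added.
    site.keywords.add(keyword)
    return site
```

With Django models, the same pattern would likely map onto get_or_create keyed on place_id, with name and address passed as defaults.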
Sniff Agent
Checks cache: uses the most recent SniffSnapshot if it is less than REFRESH_DAYS old; otherwise probes the site.
Saves new snapshot & updates Site basic fields (SSL, CMS).
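The Sniff Agent's cache check can be sketched as follows. The 14-day value for REFRESH_DAYS, the dict-shaped snapshot, and the taken_at key are assumptions for illustration; the real agent would work with SniffSnapshot rows.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

REFRESH_DAYS = 14  # assumed value; the spec only names the setting


def is_fresh(taken_at: datetime, now: datetime, refresh_days: int = REFRESH_DAYS) -> bool:
    """True if the snapshot is strictly younger than the refresh window."""
    return now - taken_at < timedelta(days=refresh_days)


def sniff(
    latest: Optional[dict],
    probe: Callable[[], dict],
    now: datetime,
    refresh_days: int = REFRESH_DAYS,
) -> Tuple[dict, bool]:
    """Return (snapshot, probed).

    Reuses `latest` when it is fresh; otherwise calls `probe()` and
    stamps the result with `now` as a new snapshot.
    """
    if latest is not None and is_fresh(latest["taken_at"], now, refresh_days):
        return latest, False
    snapshot = dict(probe())
    snapshot["taken_at"] = now
    return snapshot, True


# Example: a 3-day-old snapshot is reused, so the probe is never called.
now = datetime(2025, 1, 15, tzinfo=timezone.utc)
cached = {"taken_at": now - timedelta(days=3), "ssl": True, "cms": "WordPress"}
snapshot, probed = sniff(cached, probe=lambda: {"ssl": False, "cms": None}, now=now)
```

Passing now and probe in as arguments keeps the freshness rule testable without network access or clock patching.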
Scoring Agent
Pulls weights & thresholds from ScoreConfig (admin UI).