@martinbowling
Created January 3, 2025 18:03
o1 pro generated PRD for an Autonomous AI Link Placement agent

Below is a sample Product Requirements Document (PRD) for an autonomous AI Agent that handles outbound link placement requests. It outlines goals, requirements, workflows, and success criteria.


1. Overview

1.1 Purpose

The purpose of this AI Agent is to autonomously process requests to place outbound links to a specified website or page. The Agent will:

  1. Scrape the target site or page to extract its main topics.
  2. Distill the scraped content into 5 key categories using a Large Language Model (LLM).
  3. Search a database of owned or partner sites/pages for the best contextual fit.
  4. Insert a link naturally into an existing piece of content if possible.
  5. If no suitable content is found, create a new post and insert the outbound link.
  6. Notify the requestor via email that their link has been placed.

1.2 Scope

  • In Scope:

    • Automatically scraping and summarizing target URLs.
    • Categorizing topics via LLM analysis.
    • Matching target categories with a database of owned/partner pages.
    • Inserting links or creating new posts where needed.
    • Sending confirmation email to requestor.
  • Out of Scope:

    • Manual or human-led content editing.
    • Paid link placements or specialized outreach management.
    • Complex SEO optimization or compliance checks outside of basic category matching.

2. Goals and Success Criteria

  1. Accurate Categorization: The LLM should reliably distill the target page’s content into five overarching categories.

    • Success Metric: 90% of categories must be contextually relevant to the page’s content.
  2. Relevant Link Placement: Ensure outbound links are inserted in a contextually appropriate area of an existing or newly created page.

    • Success Metric: 95% of link insertions should align topically with the surrounding text.
  3. Efficient Workflow: The agent should handle the entire process end-to-end with minimal human intervention.

    • Success Metric: 85% of link requests fulfilled automatically without manual overrides.
  4. Timely Communication: Send confirmation emails to requestors within a reasonable time.

    • Success Metric: 100% of requestors receive an email within 24 hours.

3. User Stories and Use Cases

  1. User Story #1: As a digital marketing manager, I want to request a link to my website so that I can improve SEO and referral traffic.

    • Acceptance Criteria: The AI Agent processes the request, identifies relevant site content, places the link, and sends a confirmation email.
  2. User Story #2: As a site owner, I want the AI Agent to create a new post if no existing pages match the content categories so that I can ensure each outbound link is placed in a contextually relevant environment.

    • Acceptance Criteria: The AI Agent either finds a suitable existing page or auto-generates a new post with the link and notifies the requestor.
  3. User Story #3: As a marketing platform integrator, I want to track how many link requests are successfully fulfilled without manual intervention.

    • Acceptance Criteria: The system logs each request and whether it was auto-fulfilled or flagged for manual review.

4. Functional Requirements

4.1 Link Request Intake

  • FR-1: The Agent shall expose an API endpoint or webhook to receive link requests.
  • FR-2: Each request must include the target URL (the website/page to link to) and the recipient’s email address for confirmation.
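A minimal sketch of the FR-1/FR-2 intake validation, assuming a JSON request body; the field names `target_url` and `email` are illustrative, not mandated by this PRD:

```python
import json
import re


def parse_link_request(payload: str) -> dict:
    """Validate an incoming link-request payload (FR-1/FR-2).

    Expects a JSON body with `target_url` and `email`. Raises
    ValueError on malformed input so the endpoint can return 400.
    """
    data = json.loads(payload)
    url = data.get("target_url", "")
    email = data.get("email", "")
    if not url.startswith(("http://", "https://")):
        raise ValueError("target_url must be an absolute HTTP(S) URL")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        raise ValueError("email address is malformed")
    return {"target_url": url, "email": email}
```

The same validator can back either an HTTP endpoint or a webhook consumer, since FR-1 allows both.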

4.2 Scraping and Categorization

  • FR-3: The Agent shall scrape the target URL to collect text-based content from the page (HTML parsing, ignoring ads, scripts, and non-relevant elements).
  • FR-4: The Agent shall feed the scraped content to an LLM to distill it into 5 primary categories or topics.
    • Example Output: ["Fitness", "Nutrition", "Exercise Equipment", "Home Workouts", "Lifestyle"]
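FR-3 and FR-4 might be sketched as follows using the standard-library HTML parser; the `llm_call` parameter is a placeholder for whatever LLM client the deployment actually uses, and the prompt wording is an assumption:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style blocks (FR-3)."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


def categorize(text: str, llm_call) -> list:
    """Ask the LLM for exactly 5 broad categories (FR-4).

    `llm_call` is an injected callable (prompt -> str) so the LLM
    interface stays swappable, per NFR-8.
    """
    prompt = (
        "Distill the following page content into exactly 5 broad "
        "topic categories, one per line:\n\n" + text[:4000]
    )
    lines = [l.strip() for l in llm_call(prompt).splitlines() if l.strip()]
    return lines[:5]
```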

4.3 Database Matching

  • FR-5: The Agent shall match the 5 categories to a database of existing sites/pages.
    • Implementation Detail: The database entries should have metadata (topics, tags, relevant keywords).
  • FR-6: The Agent should prioritize results based on closest topic matching to ensure contextual relevance.
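One simple way to realize FR-5/FR-6 is overlap scoring between the five categories and each entry's topic metadata; the 0.4 threshold is an assumed tuning value, not a requirement from this PRD:

```python
def match_score(categories, entry_topics) -> float:
    """Fraction of the target's categories covered by a DB entry's
    topics, case-insensitive (FR-5)."""
    cats = {c.lower() for c in categories}
    topics = {t.lower() for t in entry_topics}
    return len(cats & topics) / len(cats) if cats else 0.0


def best_match(categories, entries, threshold=0.4):
    """Return the highest-scoring entry (FR-6), or None if nothing
    clears the minimum relevance threshold so FR-8 can create a
    new post instead."""
    scored = [(match_score(categories, e["topics"]), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if scored and scored[0][0] >= threshold:
        return scored[0][1]
    return None
```

A production matcher would likely add embedding similarity on `content_snippet`, but set overlap gives a transparent baseline.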

4.4 Link Placement

  • FR-7: If an existing site or page is deemed contextually suitable, the Agent shall scan the text to find a natural insertion point for the outbound link.
    • Implementation Detail: The insertion point should be at a relevant sentence or paragraph.
  • FR-8: If no existing site/page meets a minimum relevance threshold, the Agent shall create a new post.
    • Implementation Detail: The new post should incorporate the target categories into the content and naturally include the outbound link.
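The FR-7 insertion-point logic could start as keyword-guided sentence selection; the fallback of appending at the end is an assumption, and a production agent would use a stronger relevance model:

```python
import re


def insert_link(content: str, categories, target_url: str, anchor_text: str) -> str:
    """Insert an HTML link after the first sentence that mentions one
    of the target categories (FR-7); append at the end as a last
    resort."""
    link = f' <a href="{target_url}">{anchor_text}</a>'
    sentences = re.split(r"(?<=[.!?])\s+", content)
    lowered = [c.lower() for c in categories]
    for i, sentence in enumerate(sentences):
        if any(cat in sentence.lower() for cat in lowered):
            sentences[i] = sentence + link
            return " ".join(sentences)
    return content + link
```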

4.5 Notification

  • FR-9: After successful link placement, the Agent shall send an email to the requestor, confirming the publication location and providing a direct link to it.
  • FR-10: Email must include at least the following details:
    1. URL of the new or existing post that contains the placed link.
    2. Short confirmation message regarding the request status.
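Composing the FR-9/FR-10 message with the standard library might look like this; the sender address and wording are placeholders, and actual delivery would go through `smtplib.SMTP` over TLS per NFR-6:

```python
from email.message import EmailMessage


def build_confirmation(requestor: str, placed_on_url: str, created_new_post: bool) -> EmailMessage:
    """Compose the confirmation email (FR-9), including the placement
    URL and a short status message (FR-10)."""
    msg = EmailMessage()
    msg["From"] = "agent@example.com"  # placeholder sender address
    msg["To"] = requestor
    msg["Subject"] = "Your link has been placed"
    where = "a newly created post" if created_new_post else "an existing page"
    msg.set_content(
        f"Your outbound link was placed on {where}:\n{placed_on_url}\n\n"
        "Status: completed."
    )
    return msg
```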

4.6 Logging and Reporting

  • FR-11: The Agent shall log each request, its categorization, placement details (existing page or new post), and notification status.
  • FR-12: The Agent shall provide an interface (dashboard or API) to retrieve logs and analytics on link placements.

5. Non-Functional Requirements

5.1 Performance

  • NFR-1: The Agent should handle up to 100 concurrent link placement requests without significant performance degradation.
  • NFR-2: The categorization and insertion process should complete within an average of 2 minutes per request under normal load.

5.2 Reliability

  • NFR-3: The system should be available 99.9% of the time (excluding planned maintenance).
  • NFR-4: The system must handle network timeouts gracefully and retry failed scrapes up to 3 times.
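The NFR-4 retry behavior can be isolated in a small wrapper; exponential backoff is an assumed policy (the PRD only mandates up to 3 attempts), and `fetch` stands in for whatever HTTP client the agent uses:

```python
import time


def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0):
    """Call `fetch(url)`, retrying up to `attempts` times with
    exponential backoff between failures (NFR-4)."""
    last_err = None
    for i in range(attempts):
        try:
            return fetch(url)
        except Exception as err:
            last_err = err
            if i < attempts - 1:
                time.sleep(base_delay * (2 ** i))
    raise last_err
```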

5.3 Security

  • NFR-5: The system should store minimal personally identifiable information (email addresses only) and securely hash or encrypt it if needed for compliance.
  • NFR-6: Data at rest should be encrypted, and data in transit (scraping, emailing) should use HTTPS/TLS.

5.4 Maintainability

  • NFR-7: Codebase should be modular, with separate modules for scraping, classification, DB matching, link insertion, and notification.
  • NFR-8: The LLM interface should be abstracted so future LLM upgrades or changes can be made with minimal refactoring.

6. Architecture and Workflow

```mermaid
flowchart LR
    A["Receive Link Request<br/>(API/Webhook)"] --> B["Scrape Target Page"]
    B --> C["Use LLM to Distill 5<br/>Key Categories"]
    C --> D["Match Categories<br/>in Database"]
    D --> E{"Suitable Page Found?"}
    E -- Yes --> F["Insert Link<br/>in Existing Content"]
    E -- No --> G["Generate New Post<br/>with AI Assistance"]
    F --> H["Send Confirmation Email"]
    G --> H
    H --> I["Log Request and Status"]
```
  1. Receive Link Request: The Agent receives the request containing the target URL and email.
  2. Scrape Target Page: Using an HTML parser, the Agent extracts textual content.
  3. Categorize via LLM: The Agent prompts an LLM to identify five key categories from the text.
  4. DB Matching: The Agent queries the internal DB of sites/pages, filtering by category.
  5. Decision: If a suitable existing page is found, proceed to Insert Link. If not, Generate New Post.
  6. Notification and Logging: Send an email confirmation and log the entire operation.
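The six workflow steps above can be tied together in one orchestration function; every step is an injected callable, which keeps the modules swappable as NFR-7 requires (the parameter names are illustrative):

```python
def handle_request(target_url, email, scrape, categorize, find_page,
                   insert, create_post, send_email, log):
    """End-to-end workflow from the diagram: scrape, categorize,
    match, place (or create), notify, and log."""
    content = scrape(target_url)                      # step 2
    categories = categorize(content)                  # step 3
    page = find_page(categories)                      # step 4
    if page is not None:                              # step 5: decision
        placed_on = insert(page, target_url)
        created_new = False
    else:
        placed_on = create_post(categories, target_url)
        created_new = True
    send_email(email, placed_on, created_new)         # step 6
    log({"target_url": target_url, "placed_on_url": placed_on,
         "created_new_post": created_new, "email_sent": True})
    return placed_on
```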

7. Data Model

Target Site Data

  • target_url: string (URL to be linked to)
  • scraped_content: text (scraped HTML content)
  • categories: array of strings (5 main categories)

DB Entry

  • site_id: unique identifier
  • url: string (the URL of the owned/partner site/page)
  • topics: array of strings (e.g., ["Fitness", "Exercise Equipment"])
  • content_snippet: text (optional for matching preview)
  • last_updated: timestamp

Placement Log

  • request_id: unique identifier
  • target_url: string
  • placed_on_url: string
  • created_new_post: boolean
  • timestamp: timestamp
  • email_sent: boolean
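The DB Entry and Placement Log records above map naturally onto dataclasses; these are sketches of the schema, not a mandated persistence layer:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class SiteEntry:
    """One owned/partner page in the matching database."""
    site_id: str
    url: str
    topics: List[str]
    content_snippet: str = ""
    last_updated: datetime = field(default_factory=_now)


@dataclass
class PlacementLog:
    """Audit record for a single link request (FR-11)."""
    request_id: str
    target_url: str
    placed_on_url: str
    created_new_post: bool
    email_sent: bool
    timestamp: datetime = field(default_factory=_now)
```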

8. Edge Cases & Assumptions

  1. Edge Cases

    • No Scraped Content: If the target URL is empty or behind a login, scraping may fail. The Agent should log the error and notify the requestor.
    • Multiple Potential Matches: If multiple existing pages are equally suitable, the Agent should select the best match based on a relevance score.
    • Truncated Content: Very large pages might cause token limitations in the LLM. Summaries should be chunked or truncated accordingly.
  2. Assumptions

    • The LLM used is robust enough to handle short to medium-length content (up to a few thousand words).
    • Database metadata is well-maintained (topics are accurate).
    • Email system integration is available and configured (e.g., SMTP credentials, service account, etc.).
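For the truncated-content edge case above, a simple chunker keeps scraped pages within the LLM's context window; a character budget is a crude stand-in for the model's real token limit, and the window sizes are assumed defaults:

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200):
    """Split oversized scraped content into overlapping character
    windows so each piece fits the LLM's context; the overlap keeps
    sentences that straddle a boundary visible in both chunks."""
    if len(text) <= max_chars:
        return [text]
    chunks, start = [], 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += step
    return chunks
```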

9. Risks & Mitigations

  1. LLM Misclassification: If the LLM produces inaccurate categories, the link placement might be contextually misaligned.

    • Mitigation: Include a confidence threshold in the classification step, and fall back to partial manual review when confidence is low.
  2. Over-Optimization for Specific SEO Terms: The Agent might incorrectly assume certain keywords are relevant.

    • Mitigation: Use a balanced approach by extracting broad topics rather than narrow SEO terms.
  3. Database Consistency: If topics in the database are outdated, matches may be incorrect.

    • Mitigation: Regularly update the site/page metadata via automated scans.

10. Testing & Validation

  1. Unit Tests

    • Scraping function: Ensure correct text extraction from varied HTML structures.
    • LLM categorization: Mock LLM calls, verify the system handles normal + edge-case outputs.
  2. Integration Tests

    • End-to-end flow: Pass a test link, confirm the system inserts it into the correct page or creates a new post.
    • Email notification: Verify the email is sent with correct content and logs are updated.
  3. User Acceptance Tests (UAT)

    • Conduct scenario-based tests with real or staging websites to ensure the link insertion is naturally integrated and relevant.

11. Deployment & Rollout

  • Stage 1: Deploy in a test environment with dummy or staging sites in the database, verifying flows.
  • Stage 2: Pilot with a limited set of production sites to ensure performance and reliability metrics are met.
  • Stage 3: Full rollout, incorporate scheduled maintenance and monitoring to handle increased load.

12. Maintenance & Future Improvements

  1. Adaptive Learning: Enhance the LLM’s training or prompting based on user feedback for improved categorization over time.
  2. Multiple Languages: Support scraping and categorizing content in non-English languages.
  3. Advanced SEO Practices: Incorporate domain authority or other SEO data to guide link placements.
  4. Content Style Guidelines: Extend the content generation module to match brand voice or specific style guides.

13. Approval & Sign-off

  • Product Owner: Name, Title, Date
  • Engineering Lead: Name, Title, Date
  • QA Lead: Name, Title, Date

End of Document

This PRD provides a complete outline of the requirements, expected behaviors, success metrics, and technical details for developing an autonomous AI Agent that handles outbound link placements.
