AWS Claude Sonnet 3.7 Quota Increase Request

Request Details

Use-Case Description:

We at XXXXX are building a Model Context Protocol (MCP) client that acts as an agentic AI root cause analysis bot, integrated into our enterprise observability and incident management platform. The product follows a phased deployment:

Phase 1 (Current - Development): Supporting 1-3 XXXXX developers who are actively building and testing the workflow. During this phase, our engineering team needs sufficient quota to iteratively test and refine the system's capabilities to:

  • Use Claude to analyze user intent and determine appropriate investigative actions
  • Invoke multiple MCP server tools to fetch operational data from XXXXX platforms (XXXXX)
  • Process large API response payloads including complex JSON structures, documentation, and code samples from our enterprise monitoring systems
  • Synthesize findings across multiple data sources to identify root causes of complex enterprise infrastructure issues

Phase 2 (Production - Within 3-6 months): Scaling to support a cohort of 1-50 enterprise customers in a production environment. At this stage, the system will be business-critical for our customers, directly impacting mean time to resolution (MTTR) for production incidents across their infrastructure.

The large context window (up to 200k tokens) is essential in both phases: to correlate information and identify causal relationships between system components during incident analysis, the bot must process multiple large datasets from XXXXX monitoring systems at the same time. As established leaders in the XXXXX space, we are committed to delivering AI-powered solutions that meet the standards our enterprise customers expect. We are requesting initial quotas to support the development phase and plan to request increased quotas before production deployment.

Model ID(s):

Sonnet 3.7

Region(s):

us-east-2

Limit type:

Both RPM and TPM

Phase 1 (Development - Current Need)

Requested TPM (tokens per minute):

  • Steady-state: 200,000 TPM

    • Reasoning: Three XXXXX developers are actively testing. Each complete interaction processes approximately 200,000 tokens (input + output across multiple steps), and each developer starts an interaction every 15 minutes on average during development and testing.
  • Peak: 600,000 TPM

    • Reasoning: During intensive testing periods or when simulating enterprise incident response scenarios, developers may generate 3x the normal request volume.
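The TPM figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch using only the numbers stated in the request. The function name is illustrative, and reading the steady-state quota as "one full interaction inside a single minute" is an assumption, not something the request states.

```python
# Figures come from the request text; the function name is illustrative only.
def time_averaged_tpm(developers: int, tokens_per_interaction: int,
                      minutes_between_interactions: float) -> float:
    """Tokens per minute if every developer's interactions were spread evenly."""
    return developers * tokens_per_interaction / minutes_between_interactions


STEADY_STATE_TPM = 200_000        # requested steady-state quota
PEAK_TPM = 600_000                # requested peak quota
TOKENS_PER_INTERACTION = 200_000  # input + output across all steps

average = time_averaged_tpm(3, TOKENS_PER_INTERACTION, 15)
interactions_per_minute = STEADY_STATE_TPM / TOKENS_PER_INTERACTION
peak_multiplier = PEAK_TPM / STEADY_STATE_TPM

print(f"time-averaged load: {average:,.0f} TPM")
print(f"interactions covered per minute at steady state: {interactions_per_minute:.0f}")
print(f"peak multiplier: {peak_multiplier:.0f}x")
```

Under these assumptions the time-averaged load is well below the requested quota. The requested figures are sized for the minute in which an interaction actually runs, not for the average.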

Requested RPM (requests per minute):

  • Steady-state: 6 RPM

    • Reasoning: Each developer interaction generates multiple Claude requests (initial query analysis, 3-5 data processing steps, and final synthesis), or 5-7 requests in total. With 3 developers each completing an interaction every 15 minutes, we budget approximately 6 individual Claude requests per minute, enough for one interaction's requests to be issued within a single minute.
  • Peak: 18 RPM

    • Reasoning: During intensive testing periods, request rates could triple as developers rapidly iterate on test cases for enterprise use cases.
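The RPM reasoning can be checked the same way. The sketch below counts requests per interaction from the steps listed above and compares the time-averaged rate with the requested quota. All constants come from the request text, the helper name is hypothetical, and the even-spacing model is an assumption.

```python
DEVELOPERS = 3
MINUTES_BETWEEN_INTERACTIONS = 15
STEADY_STATE_RPM = 6   # requested steady-state quota
PEAK_RPM = 18          # requested peak quota


def requests_per_interaction(data_steps: int) -> int:
    """One intent-analysis call, N data-processing calls, one synthesis call."""
    return 1 + data_steps + 1


low = requests_per_interaction(3)
high = requests_per_interaction(5)

# Average request rate if interactions were spread evenly over time.
avg_low = DEVELOPERS * low / MINUTES_BETWEEN_INTERACTIONS
avg_high = DEVELOPERS * high / MINUTES_BETWEEN_INTERACTIONS

print(f"requests per interaction: {low}-{high}")
print(f"time-averaged rate: {avg_low:.1f}-{avg_high:.1f} RPM")
print(f"peak / steady-state: {PEAK_RPM / STEADY_STATE_RPM:.0f}x")
```

The steady-state figure of 6 RPM sits in the middle of the per-interaction request count, so it covers one interaction issuing all of its requests within a minute. The peak figure triples that.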

Average input tokens per request:

25,000 tokens

Average output tokens per request:

15,000 tokens

Percentage of requests with input tokens greater than 25k:

60%
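As a cross-check, the per-request averages can be multiplied out against the per-interaction and per-minute figures in this phase. This is a minimal sketch using only the numbers stated above. It assumes every request is exactly average, which real traffic will not be.

```python
AVG_INPUT_TOKENS = 25_000
AVG_OUTPUT_TOKENS = 15_000
STEADY_STATE_RPM = 6
STEADY_STATE_TPM = 200_000
PEAK_TPM = 600_000

tokens_per_request = AVG_INPUT_TOKENS + AVG_OUTPUT_TOKENS

# An interaction is 5-7 requests (intent analysis, 3-5 data steps, synthesis).
tokens_low = 5 * tokens_per_request
tokens_high = 7 * tokens_per_request

# Token load if all steady-state requests landed in the same minute.
tokens_at_steady_rpm = STEADY_STATE_RPM * tokens_per_request

print(f"tokens per request: {tokens_per_request:,}")
print(f"tokens per interaction: {tokens_low:,}-{tokens_high:,}")
print(f"tokens at {STEADY_STATE_RPM} RPM: {tokens_at_steady_rpm:,}")
print(f"within steady-state TPM: {tokens_at_steady_rpm <= STEADY_STATE_TPM}")
print(f"within peak TPM: {tokens_at_steady_rpm <= PEAK_TPM}")
```

The low end of the range matches the roughly 200,000 tokens per interaction quoted in the TPM reasoning. Six average-sized requests in one minute come to 240,000 tokens, slightly above the steady-state TPM figure but well inside the peak.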

Phase 2 (Production - Future Need, for planning purposes)

Requested TPM (tokens per minute):

  • Steady-state: 1,000,000 TPM
  • Peak: 3,000,000 TPM

Requested RPM (requests per minute):

  • Steady-state: 30 RPM
  • Peak: 90 RPM

Average input tokens per request:

25,000 tokens

Average output tokens per request:

15,000 tokens

Percentage of requests with input tokens greater than 25k:

60%
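The Phase 2 figures appear to be the Phase 1 figures scaled by a constant factor. The sketch below computes the ratios from the numbers in both phases. The dictionary layout is illustrative only.

```python
# Requested quotas copied from the two phases of the request.
phases = {
    "phase_1": {"tpm": 200_000, "peak_tpm": 600_000, "rpm": 6, "peak_rpm": 18},
    "phase_2": {"tpm": 1_000_000, "peak_tpm": 3_000_000, "rpm": 30, "peak_rpm": 90},
}

p1, p2 = phases["phase_1"], phases["phase_2"]

tpm_growth = p2["tpm"] / p1["tpm"]
rpm_growth = p2["rpm"] / p1["rpm"]
peak_ratios = {name: (q["peak_tpm"] / q["tpm"], q["peak_rpm"] / q["rpm"])
               for name, q in phases.items()}

print(f"TPM growth, phase 1 to phase 2: {tpm_growth:.0f}x")
print(f"RPM growth, phase 1 to phase 2: {rpm_growth:.0f}x")
for name, (tpm_ratio, rpm_ratio) in peak_ratios.items():
    print(f"{name}: peak/steady TPM {tpm_ratio:.0f}x, peak/steady RPM {rpm_ratio:.0f}x")
```

Both limits grow by the same factor, and the peak-to-steady ratio is the same in each phase, so the per-request token profile is unchanged between phases.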

Additional Context

We understand that we will need to submit a separate quota increase request before scaling to production levels, but providing our planned growth trajectory helps demonstrate the strategic importance of this project to XXXXX's enterprise observability and incident management offerings. As established enterprise software providers with a large customer base, we're committed to working with AWS to ensure our usage patterns are optimized and sustainable.

Technical Implementation Details

The MCP client architecture involves:

  1. User Intent Analysis Layer: Claude analyzes user queries about incidents or system anomalies to determine appropriate investigative actions.

  2. Data Retrieval Layer: MCP server tools fetch relevant operational data:

    • XXXXX from XXXXX monitoring systems
    • XXXXX
    • XXXXX
    • XXXXX
    • XXXXX
    • XXXXX
    • XXXXX
    • XXXXX
    • XXXXX
    • XXXXX
  3. Analysis Layer: Claude processes and correlates data across these sources, identifying patterns and causal relationships.

  4. Presentation Layer: Results are formatted for the UX client with actionable insights and recommendations.

Each user interaction typically involves 3-5 data-processing Claude API calls, in addition to the initial intent analysis and final synthesis, as the system iteratively gathers and analyzes information. Large context windows are required to maintain conversation history and process multiple data sources simultaneously.
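The four layers can be sketched as a simple loop. This is a hypothetical illustration, not the actual client. The model and the MCP tools are passed in as plain callables, and every name here (`investigate`, `Investigation`, the prompt strings, the fake tools) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical signatures: a model call maps a prompt to text,
# and an MCP tool maps a query to a payload string.
ModelFn = Callable[[str], str]
ToolFn = Callable[[str], str]


@dataclass
class Investigation:
    query: str
    findings: List[str] = field(default_factory=list)
    model_calls: int = 0


def investigate(query: str, model: ModelFn, tools: Dict[str, ToolFn],
                max_steps: int = 5) -> Investigation:
    state = Investigation(query=query)

    # 1. User intent analysis: ask the model which tools to run.
    plan = model(f"Pick tools for: {query}. Available: {sorted(tools)}")
    state.model_calls += 1
    chosen = [name for name in tools if name in plan][:max_steps]

    # 2 and 3. Data retrieval, then analysis with the history so far in context.
    for name in chosen:
        payload = tools[name](query)
        history = "\n".join(state.findings)
        state.findings.append(model(f"{history}\nAnalyze {name} output: {payload}"))
        state.model_calls += 1

    # 4. Presentation: one final synthesis call over all findings.
    state.findings.append(model("Summarize root cause from:\n" + "\n".join(state.findings)))
    state.model_calls += 1
    return state


# Stand-ins for the real model and MCP servers, for demonstration only.
def fake_model(prompt: str) -> str:
    if prompt.startswith("Pick tools"):
        return "logs, metrics"
    return f"note ({len(prompt)} chars of context)"


fake_tools: Dict[str, ToolFn] = {
    "logs": lambda q: "error burst at 12:00",
    "metrics": lambda q: "cpu at 95%",
    "traces": lambda q: "slow span in checkout",
}

result = investigate("Why is checkout slow?", fake_model, fake_tools)
print(result.model_calls)
```

Because each analysis prompt carries the accumulated findings, context size grows with every step, which is why the per-request input token count matters as much as the request count.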
