We at XXXXX are building a Model Context Protocol (MCP) client that functions as an agentic AI root cause analysis bot, which will be integrated into our enterprise observability and incident management platform. This enterprise-grade product follows a phased deployment approach:
Phase 1 (Current - Development): Supporting 1-3 XXXXX developers who are actively building and testing the workflow. During this phase, our engineering team needs sufficient quota to iteratively test and refine the system's ability to:
- Use Claude to analyze user intent and determine appropriate investigative actions
- Invoke multiple MCP server tools to fetch operational data from XXXXX platforms (XXXXX)
- Process large API response payloads including complex JSON structures, documentation, and code samples from our enterprise monitoring systems
- Synthesize findings across multiple data sources to identify root causes of complex enterprise infrastructure issues
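The loop described above can be sketched as a tool-dispatch layer: Claude's intent analysis emits tool-use blocks, and the client routes each one to an MCP-backed fetcher and returns the payload for the next analysis step. This is a minimal illustration only; the tool name (`fetch_recent_alerts`), payload shapes, and stubbed fetcher are invented for the sketch and are not XXXXX's actual MCP tool catalog.

```python
# Illustrative sketch of the client's tool-dispatch layer. Tool names and
# payloads are hypothetical; a real fetcher would call an MCP server.
import json
from typing import Any, Callable

# Registry mapping Claude tool-use names to MCP-backed fetchers.
TOOLS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {}

def tool(name: str):
    """Register a fetcher so the loop can dispatch Claude's tool-use blocks."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("fetch_recent_alerts")
def fetch_recent_alerts(args: dict[str, Any]) -> dict[str, Any]:
    # Placeholder: a real implementation would invoke the MCP server tool.
    return {"service": args["service"],
            "alerts": [{"severity": "critical", "metric": "p99_latency"}]}

def dispatch(tool_use: dict[str, Any]) -> dict[str, Any]:
    """Run one tool-use block emitted by Claude and wrap it as a tool result."""
    result = TOOLS[tool_use["name"]](tool_use["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": json.dumps(result),  # Claude receives the payload as text
    }

if __name__ == "__main__":
    block = {"id": "toolu_01", "name": "fetch_recent_alerts",
             "input": {"service": "checkout"}}
    print(dispatch(block)["content"])
```

Each dispatched result is appended to the conversation and fed back to Claude, which either requests more data or synthesizes a root cause.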
Phase 2 (Production - Within 3-6 months): Scaling to support a cohort of 1-50 enterprise customers in a production environment. At this stage, the system will be business-critical for our customers, directly impacting mean time to resolution (MTTR) for production incidents across their infrastructure.
The large context window (up to 200k tokens) is essential throughout both phases: the bot must simultaneously process multiple large datasets from XXXXX monitoring systems to correlate information and identify causal relationships between system components during incident analysis. As established leaders in the XXXXX space, we're committed to delivering AI-powered solutions that meet the high standards our enterprise customers expect. We're requesting initial quotas to support our development phase, with a plan to request increased quotas before production deployment.
Claude 3.7 Sonnet
us-east-2
Both RPM and TPM
- Steady-state: 200,000 TPM
  - Reasoning: With 3 XXXXX developers actively testing, each interaction processes approximately 200,000 tokens (input + output across multiple steps). Interactions occur every 15 minutes on average during development and testing, but a single interaction's tokens can land within one minute, so the quota must accommodate a full interaction.
- Peak: 600,000 TPM
  - Reasoning: During intensive testing periods or when simulating enterprise incident response scenarios, developers may generate 3x the normal request volume.
- Steady-state: 6 RPM
  - Reasoning: Each developer interaction generates multiple Claude requests (initial query analysis, 3-5 data processing steps, final synthesis). Because these requests arrive in quick succession, an active interaction produces approximately 6 individual Claude requests per minute.
- Peak: 18 RPM
  - Reasoning: During intensive testing periods, request rates could triple as developers rapidly iterate on test cases for enterprise use cases.
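The development-phase figures can be reproduced with a back-of-the-envelope check, assuming (as in the reasoning above) that one interaction's tokens and requests can all burst within a single minute; the constants come directly from the stated reasoning.

```python
# Back-of-the-envelope check of the development-phase quota figures.
TOKENS_PER_INTERACTION = 200_000   # input + output across all steps
CALLS_PER_INTERACTION = 1 + 4 + 1  # intent analysis + ~4 data steps + synthesis
PEAK_MULTIPLIER = 3                # intensive testing / incident simulation

steady_tpm = TOKENS_PER_INTERACTION   # one full interaction within a minute
peak_tpm = steady_tpm * PEAK_MULTIPLIER
steady_rpm = CALLS_PER_INTERACTION    # its calls arrive in quick succession
peak_rpm = steady_rpm * PEAK_MULTIPLIER

print(steady_tpm, peak_tpm, steady_rpm, peak_rpm)  # 200000 600000 6 18
```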
25,000 tokens
15,000 tokens
60%
- Steady-state: 1,000,000 TPM
- Peak: 3,000,000 TPM
- Steady-state: 30 RPM
- Peak: 90 RPM
25,000 tokens
15,000 tokens
60%
We understand that we will need to submit a separate quota increase request before scaling to production levels, but providing our planned growth trajectory helps demonstrate the strategic importance of this project to XXXXX's enterprise observability and incident management offerings. As an established enterprise software provider with a large customer base, we're committed to working with AWS to ensure our usage patterns are optimized and sustainable.
The MCP client architecture involves:
- User Intent Analysis Layer: Claude analyzes user queries about incidents or system anomalies to determine appropriate investigative actions.
- Data Retrieval Layer: MCP server tools fetch relevant operational data:
  - XXXXX from XXXXX monitoring systems
  - XXXXX
  - XXXXX
  - XXXXX
  - XXXXX
  - XXXXX
  - XXXXX
  - XXXXX
  - XXXXX
  - XXXXX
- Analysis Layer: Claude processes and correlates data across these sources, identifying patterns and causal relationships.
- Presentation Layer: Results are formatted for the UX client with actionable insights and recommendations.
Each user interaction typically involves 3-5 Claude API calls as the system iteratively gathers and analyzes information, with large context windows required to maintain conversation history and process multiple data sources simultaneously.
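This iterative gather-and-analyze pattern can be illustrated with a stubbed loop, where a growing message history keeps earlier tool results in context across calls. The model stub, tool name, and message shapes below are invented for illustration and stand in for real Claude API calls and MCP fetches.

```python
# Illustrative sketch of one user interaction: several model calls, with the
# message history carrying earlier tool payloads forward. fake_claude is a
# stub standing in for the real API.
from typing import Any

def fake_claude(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Stub: request more data for two rounds, then synthesize a root cause."""
    data_rounds = sum(1 for m in messages if m["role"] == "tool")
    if data_rounds < 2:
        return {"action": "fetch", "tool": "monitoring_metrics"}
    return {"action": "answer", "root_cause": "correlated across sources"}

def investigate(query: str, max_calls: int = 5) -> tuple[str, int]:
    """Drive one interaction; returns (finding, number of model calls used)."""
    history: list[dict[str, Any]] = [{"role": "user", "content": query}]
    for call in range(1, max_calls + 1):
        reply = fake_claude(history)
        if reply["action"] == "answer":
            return reply["root_cause"], call
        # Data Retrieval Layer: fetch via MCP, feed the payload back in.
        history.append({"role": "tool", "content": f"payload from {reply['tool']}"})
    return "inconclusive", max_calls

if __name__ == "__main__":
    cause, calls = investigate("Why is checkout latency spiking?")
    print(cause, calls)  # call count falls in the 3-5 range described above
```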