Jordanh1996 / repro_prompt_too_long.py
Created March 27, 2026 20:57
Reproduction: SummarizationMiddleware token underestimation with ChatAnthropicVertex
"""Reproduction: SummarizationMiddleware token underestimation with ChatAnthropicVertex.
LangChain's _get_approximate_token_counter checks model._llm_type == "anthropic-chat",
but ChatAnthropicVertex._llm_type returns "anthropic-chat-vertexai". This causes the
token counter to use 4.0 chars/token instead of 3.3, underestimating by ~16%.
The summarization middleware never triggers, and the API rejects the prompt.
Additionally:
- use_usage_metadata_scaling is gated on response_metadata["model_provider"], which
ChatAnthropicVertex never sets. The scaling safety net is a no-op.
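The ratio mismatch described above can be sketched in isolation. This is a minimal, hedged illustration: the two classes below are hypothetical stand-ins (not the real langchain-anthropic / langchain-google-vertexai classes), and approximate_tokens only mirrors the exact-match gate described in the text, not the actual _get_approximate_token_counter implementation.

```python
# Hypothetical stand-ins for the two chat model classes (assumption: only the
# _llm_type strings match the real ones, per the description above).
ANTHROPIC_CHARS_PER_TOKEN = 3.3  # ratio used when the model is recognized as Anthropic
DEFAULT_CHARS_PER_TOKEN = 4.0    # generic fallback ratio


class FakeChatAnthropic:
    _llm_type = "anthropic-chat"


class FakeChatAnthropicVertex:
    _llm_type = "anthropic-chat-vertexai"  # exact == comparison misses this


def approximate_tokens(model, text: str) -> int:
    # Mirrors the exact-match gate: only the literal "anthropic-chat" string
    # gets the tighter 3.3 chars/token ratio; everything else falls back to 4.0.
    if model._llm_type == "anthropic-chat":
        ratio = ANTHROPIC_CHARS_PER_TOKEN
    else:
        ratio = DEFAULT_CHARS_PER_TOKEN
    return round(len(text) / ratio)


text = "x" * 132_000  # ~40k tokens at 3.3 chars/token
print(approximate_tokens(FakeChatAnthropic(), text))        # 40000
print(approximate_tokens(FakeChatAnthropicVertex(), text))  # 33000 (underestimated)
```

With the fallback ratio, the Vertex model's estimate stays well below the real count, so a summarization threshold set near the context limit is never crossed.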