Jordanh1996 / repro_prompt_too_long.py
Created March 27, 2026 20:57
Reproduction: SummarizationMiddleware token underestimation with ChatAnthropicVertex
"""Reproduction: SummarizationMiddleware token underestimation with ChatAnthropicVertex.
LangChain's _get_approximate_token_counter checks model._llm_type == "anthropic-chat",
but ChatAnthropicVertex._llm_type returns "anthropic-chat-vertexai". This causes the
token counter to use 4.0 chars/token instead of 3.3, underestimating by ~16%.
The summarization middleware never triggers, and the API rejects the prompt.
Additionally:
- use_usage_metadata_scaling is gated on response_metadata["model_provider"], which
ChatAnthropicVertex never sets. The scaling safety net is a no-op.
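The ratio mismatch described above can be sketched in isolation. This is a minimal, hedged illustration: the two classes below are hypothetical stand-ins (not the real langchain-anthropic / langchain-google-vertexai classes), and approximate_tokens only mirrors the exact-match gate described in the text, not the actual _get_approximate_token_counter implementation.

```python
# Hypothetical stand-ins for the two chat model classes (assumption: only the
# _llm_type strings match the real ones, per the description above).
ANTHROPIC_CHARS_PER_TOKEN = 3.3  # ratio used when the model is recognized as Anthropic
DEFAULT_CHARS_PER_TOKEN = 4.0    # generic fallback ratio


class FakeChatAnthropic:
    _llm_type = "anthropic-chat"


class FakeChatAnthropicVertex:
    _llm_type = "anthropic-chat-vertexai"  # exact == comparison misses this


def approximate_tokens(model, text: str) -> int:
    # Mirrors the exact-match gate: only the literal "anthropic-chat" string
    # gets the tighter 3.3 chars/token ratio; everything else falls back to 4.0.
    if model._llm_type == "anthropic-chat":
        ratio = ANTHROPIC_CHARS_PER_TOKEN
    else:
        ratio = DEFAULT_CHARS_PER_TOKEN
    return round(len(text) / ratio)


text = "x" * 132_000  # ~40k tokens at 3.3 chars/token
print(approximate_tokens(FakeChatAnthropic(), text))        # 40000
print(approximate_tokens(FakeChatAnthropicVertex(), text))  # 33000 (underestimated)
```

With the fallback ratio, the Vertex model's estimate stays well below the real count, so a summarization threshold set near the context limit is never crossed.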