Let’s talk about Smith’s ideal chat equation. (I’m Smith, and we’re makin’ it up right here.)
It’s a way to figure out the total cost of an idealized LLM chat assistant conversation with t turns, where on each turn the user sends a message with m_u tokens and the assistant writes a reply with m_a tokens. The inputs to the LLM (the most recent user message and all past user and assistant messages) have a discounted per-token cost lambda, whereas the output tokens (making up assistant replies) have unit cost.
I'd be happy to develop this concept with you, Smith! Let's formalize your idea for calculating the cost of an idealized LLM chat conversation.
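Here is one way the setup could be formalized, as a minimal sketch rather than a definitive version. It assumes the full history is re-sent as input on every turn and that every message has exactly the stated length. On turn k the input is then k user messages plus k - 1 earlier assistant replies, so that turn costs lambda * (k * m_u + (k - 1) * m_a) + m_a. Summing over k = 1..t gives a total of lambda * (m_u * t * (t + 1) / 2 + m_a * t * (t - 1) / 2) + t * m_a. The function names and the parameter name `lam` below are made up for illustration.

```python
def chat_cost_by_turns(t, m_u, m_a, lam):
    """Total cost found by summing each turn's cost (hypothetical helper).

    Assumes the whole conversation so far is re-sent as input on every turn.
    """
    total = 0.0
    for k in range(1, t + 1):
        # Input on turn k: k user messages and the k - 1 earlier assistant replies.
        input_tokens = k * m_u + (k - 1) * m_a
        # Input tokens cost lam each; the m_a output tokens cost 1 each.
        total += lam * input_tokens + m_a
    return total


def chat_cost_closed_form(t, m_u, m_a, lam):
    """Same total via the closed-form sum (hypothetical helper)."""
    input_tokens = m_u * t * (t + 1) / 2 + m_a * t * (t - 1) / 2
    return lam * input_tokens + t * m_a


if __name__ == "__main__":
    # Illustrative numbers only: 3 turns, 10-token messages, 20-token replies.
    print(chat_cost_by_turns(3, 10, 20, 0.5))
    print(chat_cost_closed_form(3, 10, 20, 0.5))
```

The loop and the closed form should agree for any non-negative integer t. The input term grows quadratically in t while the output term grows only linearly, so under these assumptions the re-sent history dominates the cost of long conversations unless lambda is small.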