What Is Token Yield?

Token Yield is the ratio of necessary spend to total spend in an enterprise AI agent deployment. Necessary spend is the cost of queries that could not have been answered from cache. Total spend is necessary spend plus the Rediscovery Tax, the money spent re-answering queries that had already been answered. In a deployment with a 42% cache hit rate and a Token Yield of 58%, 42 cents of every dollar spent went to a frontier LLM unnecessarily. Excipio raises Token Yield by eliminating the Rediscovery Tax component of total spend.

What is Token Yield?

Token Yield is the ratio of necessary spend to total spend in an enterprise AI agent deployment. Necessary spend covers queries that genuinely required a frontier LLM. Total spend includes those queries plus every dollar spent re-answering questions the system had already answered. Token Yield = necessary spend divided by total spend.

How do you calculate Token Yield?

Token Yield equals necessary LLM spend divided by total LLM spend. Suppose your agents spend $1M per month and Excipio caches 42% of queries. Assuming cached queries cost about the same on average as uncached ones, necessary spend is $580K. Token Yield is 0.58, or 58%. The remaining 42% is Rediscovery Tax. Excipio raises Token Yield by removing that $420K from total spend, the denominator.
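The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the ratio as defined here: the function name `token_yield` and the variable names are hypothetical, not part of any Excipio API, and the $420K Rediscovery Tax figure assumes that cached queries cost about the same on average as uncached ones.

```python
def token_yield(necessary_spend: float, total_spend: float) -> float:
    """Return necessary spend divided by total spend (hypothetical helper)."""
    if total_spend <= 0:
        raise ValueError("total_spend must be positive")
    return necessary_spend / total_spend


# Figures from the example: $1M monthly spend, 42% of it Rediscovery Tax
# (assumes cached and uncached queries cost the same on average).
total_spend = 1_000_000
rediscovery_tax = 420_000
necessary_spend = total_spend - rediscovery_tax  # 580_000

current_yield = token_yield(necessary_spend, total_spend)

# If the Rediscovery Tax is removed, total spend shrinks to necessary spend.
yield_after_caching = token_yield(necessary_spend, total_spend - rediscovery_tax)

print(f"Token Yield before caching: {current_yield:.2f}")
print(f"Token Yield after caching:  {yield_after_caching:.2f}")
```

In this sketch, removing the Rediscovery Tax leaves the numerator unchanged and shrinks only the denominator, which is why the ratio rises.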

What is a good Token Yield benchmark for enterprise AI?

Uncached deployments have a Token Yield near zero because nearly all spend includes some Rediscovery Tax component. A 42% cache hit rate, typical for Excipio deployments on day one, produces a Token Yield of 0.58. Well-tuned deployments reach 0.65 to 0.75 as the cache compounds.

How does Token Yield relate to cache hit rate?

Cache hit rate measures the fraction of queries answered from cache. Token Yield measures the fraction of total spend on queries that genuinely required a new LLM call. They are complementary. Caching low-cost queries, even at a high hit rate, improves Token Yield less than caching high-frequency, expensive queries.
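A small sketch may make the difference concrete. It compares two made-up workloads with the same share of repeat queries but different per-query costs. The `Query` class, its field names, and the dollar figures are all hypothetical and chosen purely for illustration. Here the hit rate is counted per query, while Token Yield is weighted by spend.

```python
from dataclasses import dataclass


@dataclass
class Query:
    cost_usd: float         # hypothetical cost of sending this query to the LLM
    already_answered: bool  # True if the answer could have come from cache


def cache_hit_rate(queries: list[Query]) -> float:
    """Fraction of queries (by count) that were answerable from cache."""
    return sum(q.already_answered for q in queries) / len(queries)


def token_yield(queries: list[Query]) -> float:
    """Fraction of spend (by cost) on queries that needed a new LLM call."""
    total = sum(q.cost_usd for q in queries)
    necessary = sum(q.cost_usd for q in queries if not q.already_answered)
    return necessary / total


# Workload A: the 4 repeat queries are cheap, the 6 new ones are expensive.
cheap_repeats = [Query(1.0, True)] * 4 + [Query(10.0, False)] * 6

# Workload B: the 4 repeat queries are expensive, the 6 new ones are cheap.
expensive_repeats = [Query(10.0, True)] * 4 + [Query(1.0, False)] * 6

for name, workload in [("A", cheap_repeats), ("B", expensive_repeats)]:
    print(name, f"hit rate={cache_hit_rate(workload):.2f}",
          f"token yield={token_yield(workload):.2f}")
```

Both workloads have the same hit rate, but workload B starts with a much lower Token Yield, so caching its expensive repeat queries recovers far more spend.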

Why does Token Yield compound over time?

Every cache hit adds a validated query-answer pair to the enterprise knowledge graph. As the graph grows, the semantic match rate increases. A deployment starting at 42% cache hit rate typically reaches 60 to 70% within 90 days as the graph compounds. Token Yield rises in parallel.