Token Yield is the ratio of necessary spend to total spend in an enterprise AI agent deployment. Necessary spend is the cost of queries that could not have been answered from cache. Total spend includes both necessary queries and Rediscovery Tax spend on queries already answered. A deployment with 42% cache hit rate and a Token Yield of 58% means 42 cents of every dollar spent reached the frontier unnecessarily. Excipio raises Token Yield by eliminating the Rediscovery Tax component of total spend.
Token Yield is the ratio of necessary spend to total spend in an enterprise AI agent deployment. Necessary spend covers queries that genuinely required a frontier LLM. Total spend includes those queries plus every dollar spent re-answering questions the system already knew. Token Yield = necessary spend divided by total spend.
Token Yield equals necessary LLM spend divided by total LLM spend. If your agents spend $1M per month and Excipio caches 42% of queries, necessary spend is $580K. Token Yield is 0.58, or 58%. The remaining 42% is Rediscovery Tax. Excipio raises Token Yield by eliminating that 42% from the denominator.
Uncached deployments have a Token Yield near zero because nearly all spend includes some Rediscovery Tax component. A 42% cache hit rate, typical for Excipio deployments on day one, produces a Token Yield of 0.58. Well-tuned deployments reach 0.65 to 0.75 as the cache compounds.
Cache hit rate measures the fraction of queries answered from cache. Token Yield measures the fraction of total spend on queries that genuinely required a new LLM call. They are complementary. A high cache hit rate on low-cost queries produces less Token Yield improvement than caching high-frequency, expensive queries.
Every cache hit adds a validated query-answer pair to the enterprise knowledge graph. As the graph grows, the semantic match rate increases. A deployment starting at 42% cache hit rate typically reaches 60 to 70% within 90 days as the graph compounds. Token Yield rises in parallel.