LLM caches: saving tokens without dropping quality

A caching proxy in front of a language model can cut the token bill significantly, but a careless design introduces subtle risks. This article covers which cache types work in production, where the usual traps lie, and how to add them without degrading the user experience.

November 29, 2025 · 6 min

Artificial Intelligence

LiteLLM: A Proxy to Unify Model Providers

When an application talks to two or more LLM providers, sooner or later a proxy ends up sitting between them. LiteLLM offers one in particular, and this is an honest look at what it gains and what it costs.

March 3, 2024 · 5 min