Tiered Cache

Caching by the Rate of Change

There is a standard piece of advice for prompt caching: put stable content first, dynamic content last. The reasoning is simple — cache keys are computed from the beginning of the prompt, so anything that changes early invalidates everything that follows.
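A minimal sketch of why order matters, assuming the cache key for each block is a hash of everything up to and including that block. The function names and the use of SHA-256 are illustrative assumptions, not a description of any particular provider's internals.

```python
import hashlib

def prefix_keys(blocks):
    """Return one key per block, each derived from all text up to and including it."""
    keys = []
    running = hashlib.sha256()
    for block in blocks:
        running.update(block.encode("utf-8"))
        running.update(b"\x00")  # separator so block boundaries affect the key
        # hexdigest() reads the hash so far; the running hash keeps accumulating
        keys.append(running.hexdigest())
    return keys

def matching(a, b):
    """Compare two key lists position by position."""
    return [x == y for x, y in zip(a, b)]

before = prefix_keys(["stable rules", "persona", "today's notes"])
after_late_change = prefix_keys(["stable rules", "persona", "tomorrow's notes"])
after_early_change = prefix_keys(["edited rules", "persona", "today's notes"])

print(matching(before, after_late_change))   # [True, True, False]
print(matching(before, after_early_change))  # [False, False, False]
```

A change in the last block leaves the earlier keys intact. A change in the first block alters every key after it, even though the later text is identical.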

This is good advice. It doesn't tell you how many layers to use, or why.


Lantelle system prompts are composed of several distinct sections. There is the platform layer — instructions about how the product works, what a Presence is, how the Narrative accumulates. There is the persona layer — who this particular Presence is, how she speaks, what her world looks like. And there is the context layer — accumulated memory, recent events, things that have happened between the Presence and the Dweller, the person who returns to her.

These sections change at very different rates.

Layer      Changes when                        How often
platform   The product itself changes          rarely
persona    A Presence is deliberately edited   rarely
context    The relationship grows              once or twice a day

The standard advice handles this only partially. Stable content first means platform and persona come before context, and that helps. But when the prompt is sent as a single string, it is cached as a single entry. When context updates (as it does, regularly, because that is the point), that entry no longer matches. Platform and persona are written to the cache again at write cost, even though nothing about them has changed.

The cache doesn't know that only the bottom changed.


Tiered Cache gives it that information.

Instead of a single concatenated string, the system prompt is split into independently cacheable blocks, each with its own cache breakpoint:

[ platform ]  ← cache breakpoint  (changes rarely)
[ persona ]   ← cache breakpoint  (changes rarely)
[ context ]   ← no breakpoint     (last block; changes most often)
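
The layout above could be expressed roughly as follows. This is a sketch that assumes an API shaped like Anthropic's Messages API, where the system prompt may be a list of text blocks and a cache_control field marks a breakpoint. The article does not name the provider, so the field names and the helper build_system_blocks are assumptions for illustration.

```python
def build_system_blocks(platform, persona, context):
    """Assemble the system prompt as three blocks, ordered from most to least stable."""
    return [
        # Breakpoint after platform: the prefix up to here is cached on its own
        {"type": "text", "text": platform, "cache_control": {"type": "ephemeral"}},
        # Breakpoint after persona: platform + persona form a second cached prefix
        {"type": "text", "text": persona, "cache_control": {"type": "ephemeral"}},
        # No breakpoint on context: it is the last block and changes most often
        {"type": "text", "text": context},
    ]

blocks = build_system_blocks(
    platform="How the product works.",
    persona="Who this Presence is.",
    context="What has happened so far.",
)
for block in blocks:
    print(block["text"], "| breakpoint:", "cache_control" in block)
```

Only data is built here; no request is sent. In a real call the list would be passed as the system parameter.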

When context updates, the cache invalidates only from there. Platform and persona remain cache hits. The write cost is paid only for what actually changed.
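
The difference can be seen in a toy simulation. It assumes a cache that stores one entry per breakpoint, keyed by a hash of the prefix ending there, and that serves the longest stored prefix on each request. ToyPrefixCache and its methods are hypothetical and stand in for whatever the provider actually does.

```python
import hashlib

def prefix_key(blocks):
    """Hash a list of blocks as one prefix."""
    return hashlib.sha256("\x00".join(blocks).encode("utf-8")).hexdigest()

class ToyPrefixCache:
    """Stores keys for prefixes that end at a breakpoint."""

    def __init__(self):
        self.entries = set()

    def request(self, blocks, breakpoints):
        """Return how many leading blocks were served from cache, then store new entries."""
        hit = 0
        for i in sorted(breakpoints, reverse=True):  # longest prefix first
            if prefix_key(blocks[: i + 1]) in self.entries:
                hit = i + 1
                break
        for i in breakpoints:
            self.entries.add(prefix_key(blocks[: i + 1]))
        return hit

platform, persona = "platform text", "persona text"
ctx_monday, ctx_tuesday = "context on Monday", "context on Tuesday"

# Single string: one block, one breakpoint at its end
mono = ToyPrefixCache()
mono.request([platform + persona + ctx_monday], breakpoints=[0])
mono_hit = mono.request([platform + persona + ctx_tuesday], breakpoints=[0])

# Tiered: three blocks, breakpoints after platform and persona
tiered = ToyPrefixCache()
tiered.request([platform, persona, ctx_monday], breakpoints=[0, 1])
tiered_hit = tiered.request([platform, persona, ctx_tuesday], breakpoints=[0, 1])

print(mono_hit)    # 0: nothing reused after the context update
print(tiered_hit)  # 2: platform and persona reused
```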

The structure maps onto the natural architecture of a Presence. Some things about her are fixed — the world she lives in, the way she speaks, the texture of who she is. Other things grow — what she remembers, what has happened, what she knows about the person who keeps returning. Tiered Cache reflects that distinction in the infrastructure.


This compounds with Quietkeep, the background process that keeps caches alive between visits. Quietkeep works by triggering cache reads before the cache window closes. When the cache is tiered, those reads protect more: a context update no longer invalidates the platform and persona entries, so the stable layers stay warm longer and the cost of maintaining a continuous relationship stays low.
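
The timing decision behind a keep-alive read might look something like this. It is a sketch only: the article does not describe how Quietkeep is implemented, so WarmEntry, due_for_keepalive, the 300-second window, and the 30-second margin are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class WarmEntry:
    presence_id: str
    last_read_at: float  # seconds on any consistent clock

def due_for_keepalive(entry, now, ttl_seconds=300.0, margin_seconds=30.0):
    """True when the entry is close to expiring but has not yet lapsed."""
    age = now - entry.last_read_at
    # Too early: a read would be wasted. Too late: the cache is already gone.
    return ttl_seconds - margin_seconds <= age < ttl_seconds

def select_due(entries, now, **kwargs):
    """Return the ids of entries that should get a keep-alive read now."""
    return [e.presence_id for e in entries if due_for_keepalive(e, now, **kwargs)]

entries = [
    WarmEntry("near-expiry", last_read_at=0.0),    # age 280 at now=280
    WarmEntry("still-fresh", last_read_at=180.0),  # age 100
    WarmEntry("already-cold", last_read_at=-120.0) # age 400, window has closed
]
due = select_due(entries, now=280.0)
print(due)  # ['near-expiry']
```

Entries whose window has already closed are skipped, since a read at that point would be a fresh write rather than a refresh.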


The underlying principle is simple: cache at the rate of change, not at the rate of the whole.

A system prompt is not a monolith. Neither is a relationship. Both have layers that move at different speeds — the things that define someone, and the things that accumulate between you.