If caching were cheap, you'd cache everything.
Precompute the feed. Precompute the search. Precompute the report. Every user's dashboard, ready and waiting. Every autocomplete prefix, not just the popular ones. Every tenant isolated, not sharing key prefixes. Every branch with its own state.
You'd stop asking "should we cache this?" and start asking "why wouldn't we?"
But caching isn't cheap, so you don't. You ration.
We laid out what would have to exist for caching to feel free. Then we built Cinch to deliver it.
Free to create. Spinning up a cache costs nothing. No provisioning delay.
Cost scales with usage. Pay for what you use, not peak capacity.
Built on cheap primitives. NVMe, not just RAM.
Exceptional performance. Meet or approach RAM speed for real workloads.
Instant wake-up. Sleeping caches come back in milliseconds.
Redis® compatible. Drop-in. No migration pain.
Here's how.
#Most of your cache is cold
Traditional Redis® keeps everything in RAM. That's the only way to guarantee sub-millisecond latency for every key. But here's the thing: you don't need every key to be equally fast.
Access patterns follow power laws. The famous 80/20 rule says 20% of your keys get 80% of your reads. But within that hot 20%, it's 80/20 again. And again. Your top 1% of keys might handle 50% of all requests. Your top 0.1% might handle 25%.
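To make that concrete, here's a toy simulation that draws reads from a Zipf-like distribution and reports how concentrated they are. The key count, read count, and exponent are arbitrary assumptions for illustration, not measurements of any real workload.

```python
import random
from collections import Counter

# Toy model of power-law access: 1M reads over 100k keys, drawn from a
# Zipf-like distribution. The exponent (0.9) is an illustrative assumption.
random.seed(0)
N_KEYS, N_READS = 100_000, 1_000_000
weights = [1 / (rank ** 0.9) for rank in range(1, N_KEYS + 1)]
hits_per_key = Counter(random.choices(range(N_KEYS), weights=weights, k=N_READS))

hits = sorted(hits_per_key.values(), reverse=True)
for fraction in (0.001, 0.01, 0.20):
    top_n = int(N_KEYS * fraction)
    share = sum(hits[:top_n]) / N_READS
    print(f"top {fraction:.1%} of keys serve {share:.0%} of reads")
```

With this (arbitrary) exponent the shares come out roughly in line with the numbers above. The exact split depends on your workload, but the shape is the same at every zoom level: a thin slice of keys serves most of the traffic.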
Here's what this means in practice: if your hottest keys get sub-millisecond reads and your warm keys get 1-2ms reads, users experience a fast application. Aggregate performance is what matters. Nobody notices if a 99th-percentile key takes 2ms instead of 0.3ms; that extra millisecond is dwarfed by the network round trip.
Traditional Redis® pays for sub-millisecond latency on keys that get accessed once a week. That's wasteful. If you're willing to trade a couple hundred microseconds on cold keys, the economics change completely.
#The Accord for caching
There's a concept from industrial economics worth understanding: capital efficiency. The fastest option isn't always the best option.
Think about how we move cargo. Trains are slow and cheap. Ships are slower and cheaper still. Trucks are faster and more expensive. Planes are fastest and very expensive. Each serves a purpose. You don't air-freight coal, and you don't send organs by container ship.
Caching has the same problem, but with only two options:
Dump truck (your database). Handles any load, but slow. You wouldn't commute in one.
McLaren (RAM-only Redis®). Blazing fast, expensive, overkill for 95% of trips.
Most applications are commuting in dump trucks because McLarens are too expensive. That's not a choice. It's a lack of options.
Cinch is the Honda Accord. Still sporty, way faster than a dump truck, but efficient enough for daily driving. Good enough performance at a price that lets you actually use it.
We're not trying to be the fastest cache. We're trying to be fast enough for real workloads while staying capital efficient. For most applications, that's a better trade.
#Breaking up the big cache
The old model is one giant Redis® cluster for everything. Shared state, shared capacity, shared problems. You ration because provisioning is slow and caches are expensive.
But remember: access patterns are fractal. Most tenants are idle. Most branches are stale. Most agents are between tasks. That big shared cache? Most of it is cold most of the time.
So break it up.
Cinch makes it trivially easy to spin up caches. One per tenant. One per branch. One per agent. One per task. Create them in milliseconds, delete them when you're done. They cost almost nothing when idle, and they wake instantly when needed.
Instant provisioning means you can create caches on demand. Scale to zero means idle caches don't drain your budget. Cheap tiered storage means cold keys and cold caches cost almost nothing. For truly infrequent data, storage can move to network or cloud, with sub-50ms wakeup when it's needed again.
The fractal access pattern makes all of this work. At every level of the stack.
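Here's a sketch of the cache-per-task lifecycle. How you actually provision and tear down a Cinch cache isn't shown: the `create_cache` and `delete_cache` stubs are placeholders for whatever provisioning call you use. The point is that a cache can be scoped to a unit of work and thrown away with it.

```python
from contextlib import contextmanager
import redis

def create_cache(name: str) -> str:
    """Placeholder: provision a cache named `name` and return its connection URL."""
    raise NotImplementedError

def delete_cache(name: str) -> None:
    """Placeholder: tear the cache down when the work is finished."""
    raise NotImplementedError

@contextmanager
def task_cache(task_id: str):
    """A cache that lives exactly as long as one task."""
    url = create_cache(f"task-{task_id}")
    client = redis.Redis.from_url(url, decode_responses=True)
    try:
        yield client
    finally:
        client.close()
        delete_cache(f"task-{task_id}")

# Usage: scratch state for one agent run, gone when the run ends.
# with task_cache("agent-run-123") as cache:
#     cache.set("step:1:output", "...")
```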
#Under the hood
Cinch is a Redis®-compatible cache built on tiered storage. The protocol is identical. Any Redis® client works unchanged. But the architecture is different.
##Two tiers
In-memory buffer. Your hottest keys live here. Sub-millisecond reads. This is real RAM, just like traditional Redis®, but only for the data that actually needs it.
Fast NVMe storage. Everything else. Low-single-digit-millisecond reads. Modern NVMe is fast enough that most applications can't tell the difference.
Hot keys automatically bubble up to the buffer based on access patterns. Cold keys sink down to storage. You don't manage this; it just happens. The system learns what's hot and keeps it fast.
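A minimal sketch of the idea, not Cinch's actual promotion policy (which isn't described here): a bounded in-memory buffer in front of a larger store, with promotion on access and least-recently-used demotion.

```python
from collections import OrderedDict

class TwoTierStore:
    """Toy two-tier cache: small hot buffer, larger cold store."""

    def __init__(self, buffer_capacity: int):
        self.buffer = OrderedDict()   # hot tier: RAM-like, bounded
        self.storage = {}             # cold tier: NVMe-like, effectively unbounded
        self.capacity = buffer_capacity

    def set(self, key, value):
        self._promote(key, value)

    def get(self, key):
        if key in self.buffer:
            self.buffer.move_to_end(key)       # refresh recency
            return self.buffer[key]
        if key in self.storage:
            value = self.storage.pop(key)
            self._promote(key, value)          # hot key bubbles up
            return value
        return None

    def _promote(self, key, value):
        self.buffer[key] = value
        self.buffer.move_to_end(key)
        while len(self.buffer) > self.capacity:
            cold_key, cold_value = self.buffer.popitem(last=False)
            self.storage[cold_key] = cold_value  # cold key sinks down
```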
##Auto-stop
When a cache is inactive, it stops. Your data persists on NVMe, but compute costs drop to near zero. You're not paying for a server sitting idle at 3am waiting for traffic that won't come until morning.
When a request comes in, the cache wakes in milliseconds. Fast enough that your application doesn't notice. Fast enough that you can spin up hundreds of caches and only pay for the ones actually being used.
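If you want to see this for yourself, time the first command after an idle period. The connection string below is a placeholder; any Redis® client works.

```python
import time
import redis

# Placeholder URL: substitute the connection string for your own cache.
cache = redis.Redis.from_url("rediss://default:<password>@<your-cache>.example:6379")

start = time.perf_counter()
cache.ping()  # first command after an idle period triggers the wake-up
print(f"wake-up + round trip: {(time.perf_counter() - start) * 1000:.1f} ms")
```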
##Redis® compatible
We implement the Redis® protocol. Any Redis® client works unchanged. No SDK, no migration, no rewrite. Point your connection string at Cinch and go.
This matters because Redis® is everywhere. Decades of libraries, tutorials, and battle-tested patterns. We're not asking you to learn something new. We're asking you to pay less for something you already know.
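For example, with the standard redis-py client (the connection string is a placeholder for whatever your dashboard gives you):

```python
import redis

# Same client, same commands; only the connection string changes.
cache = redis.Redis.from_url(
    "rediss://default:<password>@<your-cache>.example:6379",
    decode_responses=True,
)

cache.set("user:42:dashboard", '{"widgets": []}', ex=300)  # cache for 5 minutes
print(cache.get("user:42:dashboard"))
```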
#The economics
In-memory buffer: $10/GB/mo. Your hottest keys live here.
NVMe storage: $1/GB/mo. Everything else.
Most applications keep about 20% of their data hot. Assuming that ratio, here's the estimated monthly cost by total cached data size:
| Provider | 1GB | 10GB | 50GB | 100GB |
|---|---|---|---|---|
| AWS ElastiCache | $126* | $126* | ~$500 | ~$950 |
| Upstash | $20 | ~$60 | ~$200 | ~$350 |
| Cinch | $3 | $28 | $140 | $280 |
* AWS minimum instance is ~13GB
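The Cinch column falls straight out of the two prices above and the 20% hot-ratio assumption:

```python
# Reproduces the Cinch column above: 20% of data in the $10/GB buffer,
# the remaining 80% on $1/GB NVMe. The 20% ratio is the post's assumption.
BUFFER_PRICE_GB, NVME_PRICE_GB, HOT_RATIO = 10.0, 1.0, 0.20

def cinch_monthly_cost(total_gb: float) -> float:
    hot_gb = total_gb * HOT_RATIO
    return hot_gb * BUFFER_PRICE_GB + (total_gb - hot_gb) * NVME_PRICE_GB

for gb in (1, 10, 50, 100):
    print(f"{gb:>3} GB -> ${cinch_monthly_cost(gb):.2f}/mo")
# 1 GB -> $2.80, 10 GB -> $28.00, 50 GB -> $140.00, 100 GB -> $280.00
```

The 1 GB figure rounds up to the $3 shown in the table.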
Depending on cache size, that's roughly 3-40x cheaper than traditional managed Redis®. Not by cutting corners on performance, but by building for how caches are actually used.
Those savings matter, but the real win is what you build when cache stops being a constraint. When cache is cheap enough, you stop optimizing and start building.
#Cache everywhere
When caches are this cheap, you stop sharing and start isolating.
Per-tenant caches. Each customer gets their own cache. Complete data isolation. Not key prefixes, actual separation. A misbehaving tenant can't blow out everyone else's keys. A noisy neighbor can't spike everyone's latency.
Per-environment caches. Every PR gets its own staging cache. Every developer gets their own dev environment. Fork production data, test against it, delete when done. No more "who's using the staging Redis® right now?"
Per-user caches. Give every active user their own fast layer. Precompute their dashboard, their feed, their recommendations. Inactive users cost almost nothing thanks to auto-stop. When they come back, their cache wakes up.
Ephemeral caches for agents. AI agents running multi-step tasks spin up a dedicated cache, use it for the duration of the job, and discard it. No cleanup, no collision with other tasks. Cache-per-task makes sense when caches are disposable.
Edge caches. Put caches close to users. When each cache costs a few dollars idle, you can afford to have one in every region. Latency drops, user experience improves, and you're not paying for capacity you don't use.
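Every item on this list reduces to the same pattern: resolve a scope (tenant, branch, user, task) to its own connection string instead of its own key prefix. Here's a minimal per-tenant sketch; the URLs are hypothetical, and in practice they'd come from wherever you keep tenant configuration.

```python
import redis

# Hypothetical per-tenant connection strings.
TENANT_CACHE_URLS = {
    "acme":   "rediss://default:<pw>@acme-cache.example:6379",
    "globex": "rediss://default:<pw>@globex-cache.example:6379",
}

_clients: dict[str, redis.Redis] = {}

def cache_for(tenant_id: str) -> redis.Redis:
    """Return (and memoize) the dedicated client for one tenant's cache."""
    if tenant_id not in _clients:
        _clients[tenant_id] = redis.Redis.from_url(
            TENANT_CACHE_URLS[tenant_id], decode_responses=True
        )
    return _clients[tenant_id]

# No key prefixes: the same key in two tenants' caches can never collide.
cache_for("acme").set("report:latest", "...")
cache_for("globex").set("report:latest", "...")
```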
#Standing on shoulders
We're not the first to question the all-RAM orthodoxy. A wave of companies has been rethinking database economics, and we've learned from all of them.
Neon separated compute and storage for Postgres. They proved you could have scale-to-zero databases with instant branching. The architecture was controversial ("Postgres needs local storage!") until it worked.
Turso put SQLite on the edge with libSQL. They proved embedded databases could be distributed. The idea sounded crazy ("SQLite isn't for production!") until companies started using it.
Upstash pioneered serverless Redis®. They proved pay-per-request caching had a market. They moved the needle on Redis® economics.
Each of these challenged assumptions about what databases need to cost. We're continuing that tradition by applying the capital efficiency lens to caching specifically.
#What Cinch isn't
We're optimizing for a specific point in the design space. That means tradeoffs.
If you need guaranteed sub-millisecond P99 on every single request, stick with in-memory Redis®. We're optimizing for aggregate performance and capital efficiency, not worst-case tail latency.
Pub/Sub with thousands of long-lived connections isn't our sweet spot. Persistent connections break auto-stop. We're built for request-response patterns.
We implement the Redis® protocol and your clients work unchanged, but we're not pretending to be byte-for-byte identical. Some edge cases may behave differently.
For most applications (web apps, APIs, AI agents, background jobs), these tradeoffs are invisible. You get dramatically better economics and don't notice the difference. That's the point.
#Try it out
Starter tier: $1/month. 100MB storage, 10MB buffer, up to 3 caches (including forks).
We're in private alpha. We're looking for early users who want to cache more than they currently can afford to.
JOIN THE WAITLIST →