Memory for multi-model teams

Hit the limit?
Switch the model.
Keep going.

One private memory every AI model reads from. Hit a limit, switch to any other model or platform, and pick up exactly where you stopped. Self-hosted. Beta. Honest.

Get on the list

Read limitations first →

00 / Read this first

Not every team
should use this.

Five real limitations, stated plainly. If any of these is a dealbreaker, this product isn't for you — and that's fine.

It does not import your old chats

"No tool can pull your Claude.ai history: not us, not anyone. Your memory starts empty and fills from the day your team starts saving. If you wanted a magic 'import everything' button, this isn't it."

Garbage in, garbage out

"It recalls what you save. Save lazy dumps, get noise. Save clean decisions, get sharp recall. The tool won't fix sloppy input."

You run a server

"Self-hosted means your data, your machine, your control. It also means someone sets it up and keeps it running. We make that as painless as we can, but it isn't click-and-forget."

It's a beta

"Small team, rough edges, things change. Need a finished, guaranteed product today? Wait a few months. Want in early and don't mind dust? Keep reading."

It won't make a cheap model smart

"Memory carries context. It doesn't upgrade the brain. DeepSeek with your memory is still DeepSeek, just one that isn't starting from zero."

Still here? Good. Now the part where it actually helps.

01 / What it actually does

Switch models, keep the thread

Save in Claude. Recall in opencode, ChatGPT, DeepSeek, Gemma — whatever you run. No re-pasting, no re-explaining. The limit stops the chat, not the work.

Your memory is yours

Each teammate gets an isolated namespace. You can't read theirs, they can't read yours. Isolation is enforced, not promised.

Ask however you want

"Harga" finds "pricing." It matches on meaning, not exact words — so recall works even when you phrase it differently.

Nothing leaves your server

Self-hosted on your own box. No third party holds your team's memory. You keep the keys, literally.

02 / How it works

"save this to my memory"

✓ Saved — "Architecture: FastAPI gateway, self-hosted on your own server."

Save from your chat

Tell your AI: "save this to my memory." It summarizes and stores it. One line, done.

Claude (rate limited) → switch to opencode, DeepSeek, or ChatGPT

Hit a limit, switch models

Out of Claude credits? Move to opencode or whatever you run. Your memory is already there.

🔍 pricing decision

Pricing set at $8/mo — decided 3 days ago

Team size capped at 24 users

Monthly billing, no trial period

Recall and continue

Ask the new model. It pulls the relevant context by meaning and picks up where you stopped.

03 / Under the hood

No black box. Here's exactly what runs and why each piece is there. If a term's new to you, the plain-English bit after it is the part that matters.

You type (Claude / opencode) → Front door (Caddy, HTTPS) → The brain (Gateway API) → Stores it (Qdrant + SQLite)

Gateway API

The one service everything talks to. It takes your text, decides what to do with it, and enforces that your API key only ever touches your own memory. Written in Python. We built it ourselves; it's the part that's actually "ViMemory."

Embedding model (all-MiniLM)

A small model (all-MiniLM) that converts text into a list of numbers. Text with similar meaning gets similar numbers, which is how "harga" can find "pricing." It runs locally on your server, so it's free and nothing gets sent out to an API.
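To make "similar meaning gets similar numbers" concrete, here is a minimal sketch of cosine similarity, a common way to compare two such number-lists. The three-number vectors are hand-written toy values, not real all-MiniLM output, and the names `cosine_similarity` and `toy_embeddings` are ours for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# ASSUMPTION: toy vectors written by hand to show the idea.
# A real embedding model produces hundreds of numbers per text.
toy_embeddings = {
    "harga": [0.9, 0.1, 0.0],
    "pricing": [0.8, 0.2, 0.1],
    "deployment": [0.0, 0.2, 0.9],
}

query = toy_embeddings["harga"]

# Score every other entry against the query vector.
scores = {
    word: cosine_similarity(query, vector)
    for word, vector in toy_embeddings.items()
    if word != "harga"
}

# The entry whose vector points in the most similar direction wins.
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

With these toy values, "pricing" scores far higher than "deployment" because its vector points in nearly the same direction as the query.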

Qdrant

A vector database. It stores those number-lists and, when you ask something, finds the closest matches by meaning instead of exact keywords. This is the engine behind "recall the right context."
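As a rough picture of what "finds the closest matches" means, the sketch below does a brute-force nearest-neighbour search in plain Python, limited to one teammate's namespace. It is a stand-in for the idea, not Qdrant's API. `StoredMemory`, `recall`, the toy vectors, and the choice of Euclidean distance are all assumptions made to keep the example short.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredMemory:
    namespace: str  # hypothetical: one namespace per teammate
    text: str
    vector: list    # toy vector, written by hand


def recall(query_vector, memories, namespace, top_k=2):
    """Return the top_k closest memory texts inside a single namespace."""
    # Filter first, so another teammate's memories are never even ranked.
    candidates = [m for m in memories if m.namespace == namespace]
    # Smaller distance means closer in meaning (under this toy metric).
    ranked = sorted(candidates, key=lambda m: math.dist(query_vector, m.vector))
    return [m.text for m in ranked[:top_k]]


memories = [
    StoredMemory("alice", "Pricing set at $8/mo", [0.9, 0.1, 0.0]),
    StoredMemory("alice", "Architecture: FastAPI gateway", [0.1, 0.9, 0.1]),
    StoredMemory("alice", "Monthly billing, no trial period", [0.7, 0.0, 0.3]),
    StoredMemory("bob", "Bob's private pricing notes", [0.95, 0.05, 0.0]),
]

# A toy query vector standing in for the embedding of "pricing decision".
results = recall([1.0, 0.0, 0.0], memories, namespace="alice")
print(results)
```

A real vector database avoids scanning every entry by using an index, but the ranking idea is the same: closest vectors first, filtered to the caller's own data.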

SQLite

A plain, file-based database holding the source of truth: who saved what, when, the original text, the tags, and the hashed API keys. Boring on purpose. Boring is reliable.
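One way such a store could look, sketched with Python's built-in `sqlite3` and `hashlib`. The table names, column names, helper functions, and the choice of SHA-256 are assumptions for illustration; the actual ViMemory schema is not described here. The point is that only a hash of each key is stored, and every read or write is scoped to the owner that the key resolves to.

```python
import hashlib
import secrets
import sqlite3


def hash_key(api_key):
    """Hash an API key so the raw key never has to be stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# In-memory database for the sketch; a real deployment would use a file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE api_keys (key_hash TEXT PRIMARY KEY, owner TEXT NOT NULL);
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    owner TEXT NOT NULL,
    saved_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    original_text TEXT NOT NULL,
    tags TEXT NOT NULL DEFAULT ''
);
""")


def issue_key(owner):
    """Create a random key, store only its hash, return the raw key once."""
    api_key = secrets.token_urlsafe(32)
    conn.execute("INSERT INTO api_keys (key_hash, owner) VALUES (?, ?)",
                 (hash_key(api_key), owner))
    return api_key


def owner_for(api_key):
    """Resolve a raw key to its owner, or None if the key is unknown."""
    row = conn.execute("SELECT owner FROM api_keys WHERE key_hash = ?",
                       (hash_key(api_key),)).fetchone()
    return row[0] if row else None


def save_memory(api_key, text, tags=""):
    owner = owner_for(api_key)
    if owner is None:
        raise PermissionError("unknown API key")
    conn.execute("INSERT INTO memories (owner, original_text, tags) VALUES (?, ?, ?)",
                 (owner, text, tags))


def list_memories(api_key):
    owner = owner_for(api_key)
    if owner is None:
        raise PermissionError("unknown API key")
    # The WHERE clause is what keeps one teammate out of another's rows.
    rows = conn.execute("SELECT original_text FROM memories WHERE owner = ? ORDER BY id",
                        (owner,)).fetchall()
    return [row[0] for row in rows]


alice_key = issue_key("alice")
bob_key = issue_key("bob")
save_memory(alice_key, "Pricing set at $8/mo", tags="pricing")
save_memory(bob_key, "Team size capped at 24 users")
print(list_memories(alice_key))
```

Because the owner is derived from the key on every call, a caller has no parameter to pass that would select someone else's namespace.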

Caddy

A reverse proxy that puts automatic HTTPS in front of everything. It's the only piece exposed to the internet; Qdrant and the database sit behind it, unreachable from outside.

Containers

All of the above run as containers started by one command. That's how a non-developer can stand the whole thing up on a server without wiring each piece by hand.

04 / Who it's for

Right for you if…

✓ Your team runs Claude alongside other models
✓ You keep hitting limits mid-work
✓ You're tired of re-pasting the same context
✓ You want your data on your own server
✓ You can tolerate a beta to get in early

Skip it if…

✕ You want old chat history auto-imported
✕ You won't or can't run a small server
✕ You need a polished, finished product now
✕ You're a solo user on one model only
✕ You expect memory to make models smarter

05 / Early access

If the honesty didn't scare you off, get in line.

On price: we haven't set one. It's a beta — early teams get in free while we figure out if this is worth charging for. If and when there's a price, you hear it from us first. No surprise invoice.

Open the waitlist form →

Opens our Notion form in a new tab. We onboard a few teams at a time, personally — that's not scarcity marketing, it's a small team being honest about capacity.

ViMemory context flow: Claude Desktop to opencode via ViMemory
