ClawMem – Open-source agent memory with SOTA local GPU retrieval

So I've been building ClawMem, an open-source context engine that gives AI coding agents persistent memory across sessions. It works with Claude Code (hooks + MCP) and OpenClaw (ContextEngine plugin + REST API), and both can share the same SQLite vault, so your CLI agent and your voice/chat agent build on the same memory without syncing anything.

The retrieval architecture is a Frankenstein, which is pretty much always my process. I pulled the best parts from recent projects and research and stitched them together: [QMD](https://github.com/tobi/qmd) for the multi-signal retrieval pipeline (BM25 + vector + RRF + query expansion + cross-encoder reranking), [SAME](https://github.com/sgx-labs/statelessagent) for composite scoring with content-type half-lives and co-activation reinforcement, [MAGMA](https://arxiv.org/abs/2501.13956) for intent classification with multi-graph traversal (semantic, temporal, and causal beam search), [A-MEM](https://arxiv.org/abs/2510.02178) for self-evolving memory notes, and [Engram](https://github.com/Gentleman-Programming/engram) for deduplication patterns and temporal navigation. None of these were designed to work together; making them coherent was most of the work.

On the inference side, QMD's original stack uses a 300MB embedding model, a 1.1GB query-expansion LLM, and a 600MB reranker. These run via llama-server on a GPU or in-process through node-llama-cpp (Metal, Vulkan, or CPU). But the more interesting path is the SOTA upgrade: ZeroEntropy's distillation-paired zembed-1 + zerank-2. These are currently the top-ranked embedding and reranking models on MTEB, and they're designed to work together: the reranker was distilled from the same teacher as the embedder, so they share a semantic space. You need ~12GB of VRAM to run both, but retrieval quality is noticeably better than with the default stack. There's also a cloud embedding option if you're tight on VRAM or prefer to offload embedding to a cloud model.
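To make the "BM25 + vector + RRF" part concrete, here's a minimal sketch of Reciprocal Rank Fusion, the standard way to merge two rankings without comparing their raw scores. This is illustrative, not ClawMem's actual code — the real pipeline (inherited from QMD) adds query expansion and a cross-encoder rerank on top, and the function name and `k` constant here are my own.

```typescript
type Ranking = string[]; // doc IDs, best first

// Fuse any number of rankings: each list contributes 1 / (k + rank + 1)
// per document. k (conventionally 60) damps the influence of top ranks,
// so agreement across lists matters more than position in any one list.
function rrfFuse(rankings: Ranking[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const bm25 = ["note-3", "note-1", "note-7"];
const vector = ["note-1", "note-9", "note-3"];
console.log(rrfFuse([bm25, vector]));
// "note-1" and "note-3" appear in both lists, so they rise to the top
```

The appeal of RRF is exactly why it fits a Frankenstein architecture: BM25 scores and cosine similarities live on incompatible scales, but ranks are always comparable.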
For Claude Code specifically, ClawMem hooks into lifecycle events. Context-surfacing fires on every prompt to inject relevant memory, decision-extractor and handoff-generator capture session state, and a feedback loop reinforces notes that actually get referenced. That handles about 90% of retrieval automatically; the other 10% is covered by 28 MCP tools for explicit queries. For OpenClaw, it registers as a ContextEngine plugin with the same hook-to-lifecycle mapping, plus 5 REST API tools for the agent to call directly.

It runs on Bun with a single SQLite vault (WAL mode, FTS5 + vec0). Everything is on-device; there's no cloud dependency unless you opt into cloud embedding, and the whole system is self-contained.

This is a polished WIP, not a finished product. I'm a solo dev. The codebase is around 19K lines, and the main store module is a 4K-line god object that probably needs splitting. And of course, the system is only as good as what you index: a vault with three memory files gives deservedly thin results, while one with your project docs, research notes, and decision records gives something actually useful.

Two questions I'd genuinely like input on:

1. Has anyone else tried running SOTA embedding + reranking models locally for agent memory, and is the quality difference worth the VRAM?
2. For those running multiple agent interfaces (CLI + voice/chat), how are you handling shared memory today?
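The SAME-derived scoring mentioned above (content-type half-lives plus co-activation reinforcement) can be sketched in a few lines. Everything here is an assumption for illustration — the content types, half-life values, and reinforcement factor are mine, not ClawMem's actual constants — but it shows the shape of the idea: relevance decays exponentially at a rate chosen per content type, and notes that keep getting surfaced together earn a boost.

```typescript
// Hypothetical half-lives per content type (days). A decision record stays
// relevant far longer than a session handoff or a scratch note.
const HALF_LIFE_DAYS: Record<string, number> = {
  decision: 90,
  handoff: 7,
  scratch: 1,
};

function compositeScore(
  similarity: number,    // raw retrieval score in [0, 1]
  type: string,
  ageDays: number,
  coActivations: number, // times this note was surfaced alongside a used note
): number {
  const halfLife = HALF_LIFE_DAYS[type] ?? 30; // assumed default
  const decay = Math.pow(0.5, ageDays / halfLife); // exponential half-life decay
  const reinforcement = 1 + 0.1 * coActivations;   // mild boost per co-activation
  return similarity * decay * reinforcement;
}

// At equal similarity and equal age, the decision outlives the handoff:
console.log(compositeScore(0.8, "decision", 30, 0) > compositeScore(0.8, "handoff", 30, 0));
// → true (0.5^(30/90) ≈ 0.79 vs 0.5^(30/7) ≈ 0.05)
```

The co-activation term is what closes the feedback loop described above: notes the agent actually uses get reinforced, so they resurface more readily in later sessions.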

  • AI Agent
  • Code Generation
  • Integrations
Mar 22, 2026

AI Summary

ClawMem is an open-source context engine that provides AI coding agents with persistent memory across sessions, supporting both Claude Code and OpenClaw with a shared SQLite vault. It integrates multiple retrieval techniques and offers SOTA local GPU retrieval options.

Best For

AI developers, Machine learning engineers, Researchers working with AI agents

Why It Matters

ClawMem enables AI coding agents to maintain persistent, shared memory across sessions by integrating advanced retrieval architectures and local GPU inference.

Key Features

  • Persistent memory for AI coding agents across sessions
  • Unified SQLite vault for shared memory between CLI and chat agents
  • Multi-signal retrieval pipeline (BM25, vector, RRF, query expansion, cross-encoder reranking)
  • SOTA local GPU retrieval with optimized embedding and reranking models

Use Cases

  • A freelance software developer uses ClawMem to maintain a consistent understanding of multiple client projects across different coding sessions and tools, ensuring continuity and reducing the need to re-familiarize themselves with project specifics.
  • A data scientist integrates ClawMem with their CLI agent to access and recall past research findings, experimental parameters, and analytical insights, accelerating the iterative process of data exploration and model development.
  • A hobbyist AI enthusiast leverages ClawMem to build a personalized AI assistant that remembers conversations and preferences across voice and chat interfaces, creating a more cohesive and context-aware user experience.