Deep Dive: How OpenClaw's Memory System Works

A comprehensive look at OpenClaw's file-first memory system, exploring its hybrid search architecture, automatic memory flush, and implementation details.

Introduction

OpenClaw is an open-source AI agent framework that stands out for its sophisticated memory system. Unlike traditional RAG (Retrieval-Augmented Generation) systems that rely on vector databases, OpenClaw takes a file-first approach: Markdown files are the source of truth, and the memory system is designed to help AI agents remember context across conversations.

In this deep dive, we'll explore how OpenClaw's memory system works under the hood, examining its architecture, implementation details, and unique innovations that make it production-ready.

Architecture Overview

OpenClaw implements a file-based, Markdown-driven memory system with semantic search capabilities. The core philosophy is simple yet powerful: files are the source of truth — the AI agent only retains what gets written to disk.

[Figure: OpenClaw memory system architecture]

Key Components

  1. Markdown Storage Layer: Plain text files in the workspace directory

  2. Vector Search Engine: SQLite-based with hybrid (BM25 + vector) retrieval

  3. Embedding Providers: Auto-selection across local, OpenAI, and Gemini

  4. Automatic Memory Flush: Pre-compaction trigger to persist context

Memory Types & Storage Structure

OpenClaw uses a two-tier memory design to balance short-term context with long-term knowledge:

1. Ephemeral Memory (Daily Logs)

Location: memory/YYYY-MM-DD.md

Daily logs are append-only files that capture day-to-day activities, decisions, and context. The system automatically:

  • Creates a new file each day

  • Loads today's and yesterday's logs at session start

  • Provides a running context window for recent work

Evidence from memory.md:

"Daily log (append-only). Read today + yesterday at session start."

2. Durable Memory (Curated Knowledge)

Location: MEMORY.md

This is the curated long-term memory file containing:

  • Important decisions and preferences

  • Project conventions and patterns

  • Long-term todos and goals

  • Critical facts that should persist

Important: MEMORY.md is only loaded in private sessions, never in group contexts, to protect sensitive information.

3. Session Memory

Location: sessions/YYYY-MM-DD-<slug>.md

When starting a new session, OpenClaw can automatically save the previous conversation to a timestamped file with a descriptive slug (generated by an LLM). These session transcripts are indexed and searchable, allowing agents to recall past conversations.
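Putting the tiers together, a typical workspace might look like the following (the dates and session slug are illustrative; the exact workspace root depends on configuration):

```
workspace/
├── MEMORY.md                                  # durable, curated long-term memory
├── memory/
│   ├── 2026-01-29.md                          # yesterday's daily log
│   └── 2026-01-30.md                          # today's daily log
└── sessions/
    └── 2026-01-30-memory-system-research.md   # saved, indexed session transcript
```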

[Figure: Session workflow]

Core Implementation: MemoryIndexManager

The central class managing all memory operations is MemoryIndexManager (manager.ts:119-232).

Key responsibilities:

  • Singleton pattern with caching: Prevents duplicate indexes (INDEX_CACHE)

  • Per-agent isolation: Separate SQLite stores via agentId

  • File watching: Debounced sync on file changes

  • Provider fallback chain: Graceful degradation across embedding providers

  • Session integration: Tracks and indexes conversation transcripts
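To make those responsibilities concrete, here is a rough TypeScript sketch of what such a manager's public surface could look like. The names below are illustrative only; they are not OpenClaw's actual exports.

```typescript
// Illustrative sketch; OpenClaw's real MemoryIndexManager API differs in detail.
interface MemorySearchHit {
  path: string;       // source Markdown file
  startLine: number;  // chunk start line (for source attribution)
  endLine: number;    // chunk end line
  score: number;      // fused hybrid score
  snippet: string;    // ~700-character excerpt
}

interface MemoryIndexManagerLike {
  // One cached instance per agentId, so repeated lookups reuse the same SQLite store.
  sync(): Promise<void>;                                       // debounced re-index of changed files
  search(query: string, limit?: number): Promise<MemorySearchHit[]>;
  indexSessionTranscript(sessionPath: string): Promise<void>;  // make past conversations searchable
  close(): void;                                               // release SQLite handles
}
```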

Markdown Chunking Algorithm

One of the critical aspects of any memory system is how content is chunked before embedding. OpenClaw uses a sophisticated sliding window algorithm with overlap preservation.

Algorithm Details

Source: internal.ts:144-215

Characteristics:

  • Target: ~400 tokens per chunk (~1600 chars approximation)

  • Overlap: 80 tokens (~320 chars) between consecutive chunks

  • Line-aware: Preserves line boundaries with line numbers

  • Hash-based deduplication: Each chunk gets SHA-256 hash for cache lookup
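A minimal sketch of the sliding-window idea, using the documented targets (~1600 characters per chunk, ~320 characters of overlap) and SHA-256 hashing; the real implementation in internal.ts handles line boundaries and edge cases more carefully:

```typescript
import { createHash } from "node:crypto";

interface Chunk {
  text: string;
  startLine: number;
  endLine: number;
  hash: string; // SHA-256 of the chunk text, used as the embedding-cache key
}

const TARGET_CHARS = 1600; // ~400 tokens at ~4 chars/token
const OVERLAP_CHARS = 320; // ~80 tokens of overlap between consecutive chunks

export function chunkMarkdown(content: string): Chunk[] {
  const lines = content.split("\n");
  const chunks: Chunk[] = [];
  let start = 0;

  while (start < lines.length) {
    let end = start;
    let size = 0;
    // Grow the chunk line by line until we reach the target size.
    while (end < lines.length && size + lines[end].length + 1 <= TARGET_CHARS) {
      size += lines[end].length + 1;
      end++;
    }
    if (end === start) end = start + 1; // always make progress on very long lines

    const text = lines.slice(start, end).join("\n");
    chunks.push({
      text,
      startLine: start + 1,
      endLine: end,
      hash: createHash("sha256").update(text).digest("hex"),
    });

    if (end >= lines.length) break;
    // Step the window back so ~OVERLAP_CHARS of text is shared with the next chunk.
    let overlap = 0;
    let next = end;
    while (next > start + 1 && overlap < OVERLAP_CHARS) {
      next--;
      overlap += lines[next].length + 1;
    }
    start = next;
  }
  return chunks;
}
```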

[Figure: Chunking algorithm]

Why This Approach?

  1. Overlap prevents context loss: Related information at chunk boundaries stays connected

  2. Line numbers: Enable precise source attribution (path + line range)

  3. Token approximation: 4 chars ≈ 1 token is reasonable for English text

  4. Hash stability: Same content → same hash → cache hit → no re-embedding

Hybrid Search: BM25 + Vector

OpenClaw doesn't rely solely on vector similarity. Instead, it uses weighted score fusion combining two complementary retrieval methods:

1. Vector Search (Semantic Similarity)

Great for conceptual matches:

  • "gateway host" ≈ "machine running gateway"

  • "authentication flow" ≈ "login process"

Uses cosine similarity over embeddings stored in SQLite via the sqlite-vec extension.

2. BM25 Search (Lexical Matching)

Excellent for exact tokens:

  • Error codes: ERR_CONNECTION_REFUSED

  • Function names: handleUserAuth()

  • IDs and unique identifiers

Uses SQLite's FTS5 (Full-Text Search) virtual tables.

Hybrid Merge Algorithm

Source: hybrid.ts:39-111

Default weights: 70% vector + 30% text

BM25 score normalization (hybrid.ts:34-37) converts the BM25 rank (lower is better) into a score in the [0, 1] range so the two result sets can be fused.
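As an illustration of that step (the exact formulas in hybrid.ts may differ), one common rank-to-score mapping followed by the 70/30 weighted sum looks like this:

```typescript
// Illustrative fusion sketch; the exact formulas in hybrid.ts may differ.

// One common rank-to-score mapping: a result's ordinal position in the BM25
// result list (0 = best match) becomes 1 / (1 + position), which lies in (0, 1].
function normalizeBm25(position: number): number {
  return 1 / (1 + position);
}

interface ScoredChunk {
  id: number;
  vectorScore?: number; // cosine similarity, if the chunk matched semantically
  textScore?: number;   // normalized BM25 score (see normalizeBm25), if it matched lexically
}

const VECTOR_WEIGHT = 0.7; // default: 70% vector ...
const TEXT_WEIGHT = 0.3;   // ... plus 30% text

// Weighted score fusion: a chunk found by both methods gets credit from both,
// so neither retrieval method can dominate on its own.
function fuse(chunks: ScoredChunk[]): Array<{ id: number; score: number }> {
  return chunks
    .map((c) => ({
      id: c.id,
      score: VECTOR_WEIGHT * (c.vectorScore ?? 0) + TEXT_WEIGHT * (c.textScore ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```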

[Figure: Hybrid search]

Embedding Provider System

OpenClaw supports three embedding providers with intelligent auto-selection:

Auto-Selection Chain

Source: embeddings.ts:135-167
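Conceptually, auto-selection tries each provider in order and falls through on failure. A hedged sketch (the candidate factories and probe call are placeholders, not OpenClaw's real exports):

```typescript
// Illustrative fallback chain: local first, then OpenAI, then Gemini.
interface EmbeddingProvider {
  name: "local" | "openai" | "gemini";
  embed(texts: string[]): Promise<number[][]>;
}

async function selectProvider(
  candidates: Array<() => Promise<EmbeddingProvider>>,
): Promise<EmbeddingProvider> {
  for (const create of candidates) {
    try {
      const provider = await create();
      await provider.embed(["probe"]); // cheap health check before committing to the provider
      return provider;
    } catch {
      // Missing model file, missing API key, or network error: try the next candidate.
    }
  }
  throw new Error("No embedding provider available");
}
```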

Provider Implementations

1. Local Provider

Source: embeddings.ts:65-111

  • Uses node-llama-cpp for local inference

  • Default model: hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf (~600MB)

  • Auto-downloads missing models

  • Requires: pnpm approve-builds (native compilation)

Pros: Privacy, no API costs, offline operation

Cons: Requires ~1GB disk space, slower than cloud APIs

2. OpenAI Provider

Source: embeddings-openai.ts

  • Default model: text-embedding-3-small (1536 dimensions)

  • Supports Batch API for bulk indexing (50% cost reduction)

  • Fast and reliable

3. Gemini Provider

Source: embeddings-gemini.ts

  • Default model: gemini-embedding-001 (768 dimensions)

  • Async batch endpoint support

  • Free tier available

Batch Embedding Optimization

For large memory files, embedding every chunk individually would be expensive and slow. OpenClaw implements batch processing with caching to optimize this.

Cache-First Strategy

Source: manager.ts:1769-1848
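A simplified sketch of the cache-first flow follows. The table name, column layout, and JSON storage format are assumptions for illustration; the real code stores vectors more compactly and routes misses through the Batch APIs where available.

```typescript
import Database from "better-sqlite3";

// Hypothetical cache table: embedding_cache(hash TEXT PRIMARY KEY, embedding TEXT).
async function embedChunks(
  db: InstanceType<typeof Database>,
  chunks: Array<{ hash: string; text: string }>,
  embedBatch: (texts: string[]) => Promise<number[][]>,
): Promise<Map<string, number[]>> {
  const result = new Map<string, number[]>();
  const lookup = db.prepare("SELECT embedding FROM embedding_cache WHERE hash = ?");
  const insert = db.prepare("INSERT OR REPLACE INTO embedding_cache (hash, embedding) VALUES (?, ?)");

  // 1. Serve everything possible from the SHA-256 keyed cache.
  const misses: Array<{ hash: string; text: string }> = [];
  for (const chunk of chunks) {
    const row = lookup.get(chunk.hash) as { embedding: string } | undefined;
    if (row) {
      result.set(chunk.hash, JSON.parse(row.embedding) as number[]);
    } else {
      misses.push(chunk);
    }
  }

  // 2. Embed only the misses in one batch request, then write them back to the cache.
  if (misses.length > 0) {
    const vectors = await embedBatch(misses.map((c) => c.text));
    misses.forEach((chunk, i) => {
      result.set(chunk.hash, vectors[i]);
      insert.run(chunk.hash, JSON.stringify(vectors[i]));
    });
  }
  return result;
}
```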

Batch Features

  1. SHA-256 hash-based deduplication: Same content → same embedding (cache hit)

  2. OpenAI Batch API: 50% cost reduction compared to sync API

  3. Gemini async batches: Similar cost savings

  4. Failure tolerance: Auto-disable after 2 failures, fallback to sync

  5. Concurrency: Default 2 parallel batch jobs

Cost Savings Example

Indexing 10,000 chunks with text-embedding-3-small:

  • Sync API: 10,000 × $0.00002 = $0.20

  • Batch API: 10,000 × $0.00001 = $0.10

  • With 50% cache hit: 5,000 × $0.00001 = $0.05

SQLite Schema & Vector Storage

OpenClaw uses SQLite as its storage backend with several specialized tables:

Core Tables

Source: memory-schema.ts:9-75
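The real DDL lives in memory-schema.ts; the simplified shape below is inferred from the behaviour described in this article, with illustrative table and column names:

```typescript
// Simplified, illustrative schema; not the actual DDL from memory-schema.ts.
export const SCHEMA_SQL = `
-- One row per indexed file, with a content hash for delta detection.
CREATE TABLE IF NOT EXISTS files (
  path       TEXT PRIMARY KEY,
  hash       TEXT NOT NULL,
  indexed_at INTEGER NOT NULL
);

-- One row per chunk, carrying the line range used for source attribution.
CREATE TABLE IF NOT EXISTS chunks (
  id         INTEGER PRIMARY KEY,
  path       TEXT NOT NULL REFERENCES files(path),
  start_line INTEGER NOT NULL,
  end_line   INTEGER NOT NULL,
  hash       TEXT NOT NULL,
  text       TEXT NOT NULL
);

-- SHA-256 keyed embedding cache, shared across files and sessions.
CREATE TABLE IF NOT EXISTS embedding_cache (
  hash      TEXT NOT NULL,
  model     TEXT NOT NULL,
  embedding BLOB NOT NULL,
  PRIMARY KEY (hash, model)
);

-- FTS5 virtual table backing the BM25 side of hybrid search.
CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(text);
`;
```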

Vector Acceleration

Source: manager.ts:677-689

OpenClaw uses the sqlite-vec extension for in-database vector similarity queries:

  • Stores embeddings as FLOAT[] columns in a virtual table

  • Performs cosine similarity search entirely in SQL

  • Falls back to a pure JavaScript implementation if the extension is unavailable
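As an illustration of how such a query can run in SQL with sqlite-vec (the table name, dimension, and exact syntax are assumptions and may differ from OpenClaw's; with unit-normalized embeddings, ranking by L2 distance orders results the same way cosine similarity would):

```typescript
import Database from "better-sqlite3";
import * as sqliteVec from "sqlite-vec";

const db = new Database("memory-index.db");
sqliteVec.load(db); // OpenClaw falls back to a JS similarity scan if the extension fails to load

// Hypothetical vector table holding one 1536-dimensional embedding per chunk.
db.exec("CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(embedding float[1536])");

// KNN query: pass the query embedding as a JSON array and let the extension rank by distance.
function nearestChunks(queryEmbedding: number[], k: number) {
  return db
    .prepare(
      `SELECT rowid, distance
         FROM vec_chunks
        WHERE embedding MATCH ? AND k = ?
        ORDER BY distance`,
    )
    .all(JSON.stringify(queryEmbedding), k) as Array<{ rowid: number; distance: number }>;
}
```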

Schema Design Benefits

  1. Embedding cache: Prevents re-embedding identical content across files

  2. FTS5: Fast lexical search without external dependencies

  3. Virtual tables: Efficient vector operations without loading all data into memory

  4. Delta tracking: File hash comparison for incremental updates

Automatic Memory Flush

One of OpenClaw's most innovative features is automatic memory flush before context compaction.

The Problem

Long conversations eventually hit the context window limit. When this happens, the system must "compact" (summarize or truncate) older messages. Without intervention, valuable context gets lost.

The Solution

Source: memory.md:37-74

When a session is close to auto-compaction, OpenClaw triggers a silent, agentic turn that reminds the model to write durable memory before the context is compacted.

The flush prompt and its trigger behavior are configurable.

Trigger Logic

Memory flush activates when the session's token usage crosses a soft threshold just below the auto-compaction limit. For a 200K context window, this means the flush turn fires shortly before compaction would start discarding older messages; a sketch of the check follows.
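A hedged sketch of the kind of check involved (the 0.8 ratio and all field names here are placeholders, not OpenClaw's actual configuration values):

```typescript
// Illustrative trigger check; OpenClaw's actual thresholds and config keys differ.
interface FlushState {
  contextWindowTokens: number;  // e.g. 200_000
  usedTokens: number;           // tokens consumed so far in the session
  flushedThisCycle: boolean;    // only one flush per compaction cycle
  readOnlySandbox: boolean;     // no point flushing if we cannot write files
}

// Fire the silent "write durable memory now" turn shortly before compaction.
function shouldTriggerMemoryFlush(state: FlushState, softThresholdRatio = 0.8): boolean {
  if (state.flushedThisCycle || state.readOnlySandbox) return false;
  return state.usedTokens >= state.contextWindowTokens * softThresholdRatio;
}
```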

Behavior

  • Usually silent (NO_REPLY response) if nothing important to save

  • One flush per compaction cycle to avoid spam

  • Skipped in read-only sandbox mode (no file write access)

  • Gives the agent a final chance to extract insights before truncation

[Figure: Memory flush behavior flow]

Session Memory Integration

OpenClaw can automatically save and index past conversations, making them searchable in future sessions.

Session Save Handler

Source: session-memory handler.ts:64-183

Session Indexing

Source: manager.ts:1101-1197

Features

  1. JSONL parsing: Extracts user/assistant messages from session transcripts

  2. Delta-based incremental indexing: Only processes new messages

  3. Debounced background sync: Default thresholds:

    • 100KB of new data, OR

    • 50 new messages

  4. LLM-generated slugs: Descriptive filenames like 2026-01-30-memory-system-research.md
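Using the documented defaults, the debounce decision reduces to a check like this (the function and field names are illustrative):

```typescript
// Illustrative debounce check using the documented defaults (100KB or 50 messages).
const SYNC_BYTES_THRESHOLD = 100 * 1024;
const SYNC_MESSAGE_THRESHOLD = 50;

interface SessionDelta {
  newBytes: number;     // bytes appended to the JSONL transcript since the last index
  newMessages: number;  // messages appended since the last index
}

function shouldReindexSession(delta: SessionDelta): boolean {
  return (
    delta.newBytes >= SYNC_BYTES_THRESHOLD ||
    delta.newMessages >= SYNC_MESSAGE_THRESHOLD
  );
}
```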

Why Session Indexing Matters

Imagine working on a project for weeks. You might ask:

  • "When did we decide to use TypeScript?"

  • "What was that bug we fixed in the auth flow?"

  • "What approach did we try for caching last week?"

With session indexing, the agent can search past conversations and recall decisions made weeks ago.

Memory Search Tools

The memory system exposes two tools to agents:

Source: memory-tool.ts:22-69

1. memory_search

Runs hybrid search over indexed memory files and session transcripts.

Returns: Snippets (~700 chars) with:

  • File path

  • Line range (start_line, end_line)

  • Relevance score

  • Snippet text

Use cases:

  • "What did I decide about the API design?"

  • "When did we last discuss authentication?"

  • "What are my current todos?"

2. memory_get

Reads specific memory files with optional line range filtering.

Use cases:

  • Reading full MEMORY.md for comprehensive context

  • Fetching a specific daily log

  • Retrieving exact lines after search narrows down location

Tool Availability

Both tools are only enabled when memorySearch.enabled resolves to true in the agent configuration.
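Only the memorySearch.enabled key is documented here; assuming a JSON-style agent configuration (the surrounding file name and structure are not specified in this article), enabling both tools looks roughly like:

```json
{
  "memorySearch": {
    "enabled": true
  }
}
```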

Key Innovations

OpenClaw's memory system introduces several novel design decisions:

1. File-First Philosophy

No database as source of truth — just Markdown files. This means:

  • Human-readable, version-controllable memory

  • Easy backup and migration

  • Debuggable with standard text tools

  • No vendor lock-in

2. Hybrid Retrieval

Combining BM25 + vector search gives balanced precision/recall:

  • Vector search catches semantic matches

  • BM25 catches exact terms and rare tokens

  • Weighted fusion prevents either from dominating

3. Provider Auto-Selection

Local → OpenAI → Gemini fallback chain with:

  • Graceful degradation

  • User transparency (tool results show provider used)

  • No manual configuration required for most users

4. Batch Optimization

Uses discounted Batch APIs for:

  • 50% cost reduction on bulk indexing

  • Better resource utilization

  • Automatic fallback to sync on failure

5. Cache-First Embedding

SHA-256 hash deduplication prevents re-embedding:

  • Same paragraph across files → embed once

  • Session replay with same messages → cache hit

  • Significant cost savings on repeated content

6. Delta-Based Sync

Incremental session indexing with:

  • Byte/message thresholds

  • Debounced background sync

  • No full reindex on every message

7. Pre-Compaction Flush

Automatic context → memory transfer before truncation:

  • Prevents context loss

  • No manual intervention required

  • Silent when nothing important to save

8. Per-Agent Isolation

Separate SQLite stores per agent ID:

  • Multi-agent workflows don't cross-contaminate

  • Each agent has its own memory namespace

  • Supports different embedding models per agent

Performance Characteristics

Tuning constants are defined in manager.ts:92-110.

Typical performance:

  • Local embedding: ~50 tokens/sec (node-llama-cpp on M1 Mac)

  • OpenAI embedding: ~1000 tokens/sec (with batching)

  • Search latency: <100ms for 10K chunks (hybrid search)

  • Index size: ~5KB per 1K tokens (with 1536-dim embeddings)

Comparison with Traditional RAG

How does OpenClaw's approach differ from typical RAG systems?

| Aspect | Traditional RAG | OpenClaw Memory |
|---|---|---|
| Source of truth | Vector database | Markdown files |
| Search method | Vector only | Hybrid (BM25 + vector) |
| Storage | Pinecone/Weaviate/Chroma | SQLite |
| Embedding | Always remote API | Local-first with fallback |
| Chunking | Fixed-size | Line-aware with overlap |
| Caching | Usually none | SHA-256 hash-based |
| Updates | Full reindex | Delta-based incremental |
| Context preservation | Manual | Automatic pre-compaction flush |
| Human-readable | No | Yes (plain Markdown) |
| Cost optimization | Limited | Batch API + caching |

Use Cases

OpenClaw's memory system shines in scenarios where:

  1. Long-running projects: Work on a codebase for weeks/months with persistent context

  2. Personal AI assistants: Remember preferences, habits, and long-term goals

  3. Research workflows: Accumulate knowledge over time, build on past insights

  4. Multi-agent systems: Each agent maintains its own memory space

  5. Offline-first applications: Local embedding provider works without internet

Limitations & Trade-offs

No system is perfect. OpenClaw's memory system has trade-offs:

Storage Growth

Daily logs and session transcripts accumulate over time. A year of daily use could generate:

  • ~365 daily logs

  • ~1000 session files

  • ~500MB SQLite index

Mitigation: Archive old logs manually, or implement retention policies.

Embedding Drift

Different providers use different embedding models (1536-dim vs 768-dim). Switching providers requires reindexing.

Mitigation: The system tracks the embedding model per chunk and handles mismatches gracefully.

Limited Full-Text Search Features

SQLite FTS5 is solid but lacks features like:

  • Fuzzy matching

  • Typo tolerance

  • Advanced ranking signals

Why it's acceptable: Most queries are semantic (vector) anyway, and BM25 handles exact matches well.

No Cross-File Context

Each chunk is embedded independently. A concept spanning multiple files might not be connected.

Mitigation: Use section headers and explicit cross-references in Markdown.

Future Directions

Based on the codebase, potential enhancements could include:

  1. Graph-based memory: Link related memories explicitly

  2. Importance scoring: Prioritize frequently-accessed memories

  3. Automatic summarization: Compress old daily logs periodically

  4. Multi-modal embeddings: Index images, code, diagrams

  5. Federated memory: Share curated memories across agents/teams

  6. Retention policies: Auto-archive old sessions

Conclusion

OpenClaw's memory system represents a thoughtful evolution of RAG architecture. By prioritizing file-first storage, hybrid search, and automatic context preservation, it addresses real pain points in long-running AI agent workflows.

Key takeaways:

  • Files are the source of truth: Human-readable, version-controllable memory

  • Hybrid retrieval works: BM25 + vector gives better results than either alone

  • Cache everything: SHA-256 deduplication prevents redundant embedding costs

  • Incremental is better: Delta-based sync scales to large memory stores

  • Automate memory management: Pre-compaction flush prevents context loss

For developers building AI agents, OpenClaw's memory system offers a production-ready blueprint that balances performance, cost, and developer experience.



This analysis is based on OpenClaw commit f99e3dd (January 2026).
