Deep Dive: How OpenClaw's Memory System Works

A comprehensive look at OpenClaw's file-first memory system, exploring its hybrid search architecture, automatic memory flush, and implementation details.

Introduction

OpenClaw is an open-source AI agent framework that stands out for its sophisticated memory system. Unlike traditional RAG (Retrieval-Augmented Generation) systems that rely on vector databases, OpenClaw takes a file-first approach: Markdown files are the source of truth, and the memory system is designed to help AI agents remember context across conversations.

In this deep dive, we'll explore how OpenClaw's memory system works under the hood, examining its architecture, implementation details, and unique innovations that make it production-ready.

Architecture Overview

OpenClaw implements a file-based, Markdown-driven memory system with semantic search capabilities. The core philosophy is simple yet powerful: files are the source of truth — the AI agent only retains what gets written to disk.

[Figure: OpenClaw memory system architecture]

Key Components

  1. Markdown Storage Layer: Plain text files in the workspace directory

  2. Vector Search Engine: SQLite-based with hybrid (BM25 + vector) retrieval

  3. Embedding Providers: Auto-selection across local, OpenAI, and Gemini

  4. Automatic Memory Flush: Pre-compaction trigger to persist context

Memory Types & Storage Structure

OpenClaw uses a two-tier memory design to balance short-term context with long-term knowledge:

1. Ephemeral Memory (Daily Logs)

Location: memory/YYYY-MM-DD.md

Daily logs are append-only files that capture day-to-day activities, decisions, and context. The system automatically:

  • Creates a new file each day

  • Loads today's and yesterday's logs at session start

  • Provides a running context window for recent work

Evidence from memory.md:

"Daily log (append-only). Read today + yesterday at session start."

2. Durable Memory (Curated Knowledge)

Location: MEMORY.md

This is the curated long-term memory file containing:

  • Important decisions and preferences

  • Project conventions and patterns

  • Long-term todos and goals

  • Critical facts that should persist

Important: MEMORY.md is only loaded in private sessions, never in group contexts, to protect sensitive information.

3. Session Memory

Location: sessions/YYYY-MM-DD-<slug>.md

When starting a new session, OpenClaw can automatically save the previous conversation to a timestamped file with a descriptive slug (generated by an LLM). These session transcripts are indexed and searchable, allowing agents to recall past conversations.
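Putting the tiers together, a typical workspace might look like the following (the dates and session slug are illustrative; the exact workspace root depends on configuration):

```
workspace/
├── MEMORY.md                                  # durable, curated long-term memory
├── memory/
│   ├── 2026-01-29.md                          # yesterday's daily log
│   └── 2026-01-30.md                          # today's daily log
└── sessions/
    └── 2026-01-30-memory-system-research.md   # saved, indexed session transcript
```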

[Figure: Session workflow]

Core Implementation: MemoryIndexManager

The central class managing all memory operations is MemoryIndexManager (manager.ts:119-232).

Key responsibilities:

  • Singleton pattern with caching: Prevents duplicate indexes (INDEX_CACHE)

  • Per-agent isolation: Separate SQLite stores via agentId

  • File watching: Debounced sync on file changes

  • Provider fallback chain: Graceful degradation across embedding providers

  • Session integration: Tracks and indexes conversation transcripts
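To make those responsibilities concrete, here is a rough TypeScript sketch of what such a manager's public surface could look like. The names below are illustrative only; they are not OpenClaw's actual exports.

```typescript
// Illustrative sketch; OpenClaw's real MemoryIndexManager API differs in detail.
interface MemorySearchHit {
  path: string;       // source Markdown file
  startLine: number;  // chunk start line (for source attribution)
  endLine: number;    // chunk end line
  score: number;      // fused hybrid score
  snippet: string;    // ~700-character excerpt
}

interface MemoryIndexManagerLike {
  // One cached instance per agentId, so repeated lookups reuse the same SQLite store.
  sync(): Promise<void>;                                       // debounced re-index of changed files
  search(query: string, limit?: number): Promise<MemorySearchHit[]>;
  indexSessionTranscript(sessionPath: string): Promise<void>;  // make past conversations searchable
  close(): void;                                               // release SQLite handles
}
```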

Markdown Chunking Algorithm

One of the critical aspects of any memory system is how content is chunked before embedding. OpenClaw uses a sophisticated sliding window algorithm with overlap preservation.

Algorithm Details

Source: internal.ts:144-215

Characteristics:

  • Target: ~400 tokens per chunk (~1600 chars approximation)

  • Overlap: 80 tokens (~320 chars) between consecutive chunks

  • Line-aware: Preserves line boundaries with line numbers

  • Hash-based deduplication: Each chunk gets SHA-256 hash for cache lookup
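A minimal sketch of the sliding-window idea, using the documented targets (~1600 characters per chunk, ~320 characters of overlap) and SHA-256 hashing; the real implementation in internal.ts handles line boundaries and edge cases more carefully:

```typescript
import { createHash } from "node:crypto";

interface Chunk {
  text: string;
  startLine: number;
  endLine: number;
  hash: string; // SHA-256 of the chunk text, used as the embedding-cache key
}

const TARGET_CHARS = 1600; // ~400 tokens at ~4 chars/token
const OVERLAP_CHARS = 320; // ~80 tokens of overlap between consecutive chunks

export function chunkMarkdown(content: string): Chunk[] {
  const lines = content.split("\n");
  const chunks: Chunk[] = [];
  let start = 0;

  while (start < lines.length) {
    let end = start;
    let size = 0;
    // Grow the chunk line by line until we reach the target size.
    while (end < lines.length && size + lines[end].length + 1 <= TARGET_CHARS) {
      size += lines[end].length + 1;
      end++;
    }
    if (end === start) end = start + 1; // always make progress on very long lines

    const text = lines.slice(start, end).join("\n");
    chunks.push({
      text,
      startLine: start + 1,
      endLine: end,
      hash: createHash("sha256").update(text).digest("hex"),
    });

    if (end >= lines.length) break;
    // Step the window back so ~OVERLAP_CHARS of text is shared with the next chunk.
    let overlap = 0;
    let next = end;
    while (next > start + 1 && overlap < OVERLAP_CHARS) {
      next--;
      overlap += lines[next].length + 1;
    }
    start = next;
  }
  return chunks;
}
```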

[Figure: Chunking algorithm]

Why This Approach?

  1. Overlap prevents context loss: Related information at chunk boundaries stays connected

  2. Line numbers: Enable precise source attribution (path + line range)

  3. Token approximation: 4 chars ≈ 1 token is reasonable for English text

  4. Hash stability: Same content → same hash → cache hit → no re-embedding

Hybrid Search: BM25 + Vector

OpenClaw doesn't rely solely on vector similarity. Instead, it uses weighted score fusion combining two complementary retrieval methods:

1. Vector Search (Semantic Similarity)

Great for conceptual matches:

  • "gateway host" ≈ "machine running gateway"

  • "authentication flow" ≈ "login process"

Uses cosine similarity over embeddings stored in SQLite via the sqlite-vec extension.

2. BM25 Search (Lexical Matching)

Excellent for exact tokens:

  • Error codes: ERR_CONNECTION_REFUSED

  • Function names: handleUserAuth()

  • IDs and unique identifiers

Uses SQLite's FTS5 (Full-Text Search) virtual tables.

Hybrid Merge Algorithm

Source: hybrid.ts:39-111

Default weights: 70% vector + 30% text

BM25 score normalization (hybrid.ts:34-37) converts the BM25 rank (lower is better) into a score in the [0, 1] range so the two result sets can be fused.
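As an illustration of that step (the exact formulas in hybrid.ts may differ), one common rank-to-score mapping followed by the 70/30 weighted sum looks like this:

```typescript
// Illustrative fusion sketch; the exact formulas in hybrid.ts may differ.

// One common rank-to-score mapping: a result's ordinal position in the BM25
// result list (0 = best match) becomes 1 / (1 + position), which lies in (0, 1].
function normalizeBm25(position: number): number {
  return 1 / (1 + position);
}

interface ScoredChunk {
  id: number;
  vectorScore?: number; // cosine similarity, if the chunk matched semantically
  textScore?: number;   // normalized BM25 score (see normalizeBm25), if it matched lexically
}

const VECTOR_WEIGHT = 0.7; // default: 70% vector ...
const TEXT_WEIGHT = 0.3;   // ... plus 30% text

// Weighted score fusion: a chunk found by both methods gets credit from both,
// so neither retrieval method can dominate on its own.
function fuse(chunks: ScoredChunk[]): Array<{ id: number; score: number }> {
  return chunks
    .map((c) => ({
      id: c.id,
      score: VECTOR_WEIGHT * (c.vectorScore ?? 0) + TEXT_WEIGHT * (c.textScore ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```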

[Figure: Hybrid search]

Embedding Provider System

OpenClaw supports three embedding providers with intelligent auto-selection:

Auto-Selection Chain

Source: embeddings.ts:135-167
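Conceptually, auto-selection tries each provider in order and falls through on failure. A hedged sketch (the candidate factories and probe call are placeholders, not OpenClaw's real exports):

```typescript
// Illustrative fallback chain: local first, then OpenAI, then Gemini.
interface EmbeddingProvider {
  name: "local" | "openai" | "gemini";
  embed(texts: string[]): Promise<number[][]>;
}

async function selectProvider(
  candidates: Array<() => Promise<EmbeddingProvider>>,
): Promise<EmbeddingProvider> {
  for (const create of candidates) {
    try {
      const provider = await create();
      await provider.embed(["probe"]); // cheap health check before committing to the provider
      return provider;
    } catch {
      // Missing model file, missing API key, or network error: try the next candidate.
    }
  }
  throw new Error("No embedding provider available");
}
```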

Provider Implementations

1. Local Provider

Source: embeddings.ts:65-111

  • Uses node-llama-cpp for local inference

  • Default model: hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf (~600MB)

  • Auto-downloads missing models

  • Requires: pnpm approve-builds (native compilation)

Pros: Privacy, no API costs, offline operation

Cons: Requires ~1GB disk space, slower than cloud APIs

2. OpenAI Provider

Source: embeddings-openai.ts

  • Default model: text-embedding-3-small (1536 dimensions)

  • Supports Batch API for bulk indexing (50% cost reduction)

  • Fast and reliable

3. Gemini Provider

Source: embeddings-gemini.ts

  • Default model: gemini-embedding-001 (768 dimensions)

  • Async batch endpoint support

  • Free tier available

Batch Embedding Optimization

For large memory files, embedding every chunk individually would be expensive and slow. OpenClaw implements batch processing with caching to optimize this.

Cache-First Strategy

Source: manager.ts:1769-1848
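A simplified sketch of the cache-first flow follows. The table name, column layout, and JSON storage format are assumptions for illustration; the real code stores vectors more compactly and routes misses through the Batch APIs where available.

```typescript
import Database from "better-sqlite3";

// Hypothetical cache table: embedding_cache(hash TEXT PRIMARY KEY, embedding TEXT).
async function embedChunks(
  db: InstanceType<typeof Database>,
  chunks: Array<{ hash: string; text: string }>,
  embedBatch: (texts: string[]) => Promise<number[][]>,
): Promise<Map<string, number[]>> {
  const result = new Map<string, number[]>();
  const lookup = db.prepare("SELECT embedding FROM embedding_cache WHERE hash = ?");
  const insert = db.prepare("INSERT OR REPLACE INTO embedding_cache (hash, embedding) VALUES (?, ?)");

  // 1. Serve everything possible from the SHA-256 keyed cache.
  const misses: Array<{ hash: string; text: string }> = [];
  for (const chunk of chunks) {
    const row = lookup.get(chunk.hash) as { embedding: string } | undefined;
    if (row) {
      result.set(chunk.hash, JSON.parse(row.embedding) as number[]);
    } else {
      misses.push(chunk);
    }
  }

  // 2. Embed only the misses in one batch request, then write them back to the cache.
  if (misses.length > 0) {
    const vectors = await embedBatch(misses.map((c) => c.text));
    misses.forEach((chunk, i) => {
      result.set(chunk.hash, vectors[i]);
      insert.run(chunk.hash, JSON.stringify(vectors[i]));
    });
  }
  return result;
}
```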

Batch Features

  1. SHA-256 hash-based deduplication: Same content → same embedding (cache hit)

  2. OpenAI Batch API: 50% cost reduction compared to sync API

  3. Gemini async batches: Similar cost savings

  4. Failure tolerance: Auto-disable after 2 failures, fallback to sync

  5. Concurrency: Default 2 parallel batch jobs

Cost Savings Example

Indexing 10,000 chunks with text-embedding-3-small:

  • Sync API: 10,000 × $0.00002 = $0.20

  • Batch API: 10,000 × $0.00001 = $0.10

  • With 50% cache hit: 5,000 × $0.00001 = $0.05

SQLite Schema & Vector Storage

OpenClaw uses SQLite as its storage backend with several specialized tables:

Core Tables

Source: memory-schema.ts:9-75
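The real DDL lives in memory-schema.ts; the simplified shape below is inferred from the behaviour described in this article, with illustrative table and column names:

```typescript
// Simplified, illustrative schema; not the actual DDL from memory-schema.ts.
export const SCHEMA_SQL = `
-- One row per indexed file, with a content hash for delta detection.
CREATE TABLE IF NOT EXISTS files (
  path       TEXT PRIMARY KEY,
  hash       TEXT NOT NULL,
  indexed_at INTEGER NOT NULL
);

-- One row per chunk, carrying the line range used for source attribution.
CREATE TABLE IF NOT EXISTS chunks (
  id         INTEGER PRIMARY KEY,
  path       TEXT NOT NULL REFERENCES files(path),
  start_line INTEGER NOT NULL,
  end_line   INTEGER NOT NULL,
  hash       TEXT NOT NULL,
  text       TEXT NOT NULL
);

-- SHA-256 keyed embedding cache, shared across files and sessions.
CREATE TABLE IF NOT EXISTS embedding_cache (
  hash      TEXT NOT NULL,
  model     TEXT NOT NULL,
  embedding BLOB NOT NULL,
  PRIMARY KEY (hash, model)
);

-- FTS5 virtual table backing the BM25 side of hybrid search.
CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(text);
`;
```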

Vector Acceleration

Source: manager.ts:677-689

OpenClaw uses the sqlite-vec extension for in-database vector similarity queries:

  • Stores embeddings as FLOAT[] columns in a virtual table

  • Performs cosine similarity search entirely in SQL

  • Falls back to a pure JavaScript implementation if the extension is unavailable
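As an illustration of how such a query can run in SQL with sqlite-vec (the table name, dimension, and exact syntax are assumptions and may differ from OpenClaw's; with unit-normalized embeddings, ranking by L2 distance orders results the same way cosine similarity would):

```typescript
import Database from "better-sqlite3";
import * as sqliteVec from "sqlite-vec";

const db = new Database("memory-index.db");
sqliteVec.load(db); // OpenClaw falls back to a JS similarity scan if the extension fails to load

// Hypothetical vector table holding one 1536-dimensional embedding per chunk.
db.exec("CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(embedding float[1536])");

// KNN query: pass the query embedding as a JSON array and let the extension rank by distance.
function nearestChunks(queryEmbedding: number[], k: number) {
  return db
    .prepare(
      `SELECT rowid, distance
         FROM vec_chunks
        WHERE embedding MATCH ? AND k = ?
        ORDER BY distance`,
    )
    .all(JSON.stringify(queryEmbedding), k) as Array<{ rowid: number; distance: number }>;
}
```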

Schema Design Benefits

  1. Embedding cache: Prevents re-embedding identical content across files

  2. FTS5: Fast lexical search without external dependencies

  3. Virtual tables: Efficient vector operations without loading all data into memory

  4. Delta tracking: File hash comparison for incremental updates

Automatic Memory Flush

One of OpenClaw's most innovative features is automatic memory flush before context compaction.

The Problem

Long conversations eventually hit the context window limit. When this happens, the system must "compact" (summarize or truncate) older messages. Without intervention, valuable context gets lost.

The Solution

Source: memory.md:37-74

When a session is close to auto-compaction, OpenClaw triggers a silent, agentic turn that reminds the model to write durable memory before the context is compacted.

The flush prompt and its trigger behavior are configurable.

Trigger Logic

Memory flush activates when the session's token usage crosses a soft threshold just below the auto-compaction limit. For a 200K context window, this means the flush turn fires shortly before compaction would start discarding older messages; a sketch of the check follows.
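A hedged sketch of the kind of check involved (the 0.8 ratio and all field names here are placeholders, not OpenClaw's actual configuration values):

```typescript
// Illustrative trigger check; OpenClaw's actual thresholds and config keys differ.
interface FlushState {
  contextWindowTokens: number;  // e.g. 200_000
  usedTokens: number;           // tokens consumed so far in the session
  flushedThisCycle: boolean;    // only one flush per compaction cycle
  readOnlySandbox: boolean;     // no point flushing if we cannot write files
}

// Fire the silent "write durable memory now" turn shortly before compaction.
function shouldTriggerMemoryFlush(state: FlushState, softThresholdRatio = 0.8): boolean {
  if (state.flushedThisCycle || state.readOnlySandbox) return false;
  return state.usedTokens >= state.contextWindowTokens * softThresholdRatio;
}
```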

Behavior

  • Usually silent (NO_REPLY response) if nothing important to save

  • One flush per compaction cycle to avoid spam

  • Skipped in read-only sandbox mode (no file write access)

  • Gives the agent a final chance to extract insights before truncation

[Figure: Memory flush behavior flow]

Session Memory Integration

OpenClaw can automatically save and index past conversations, making them searchable in future sessions.

Session Save Handler

Source: session-memory handler.ts:64-183

Session Indexing

Source: manager.ts:1101-1197

Features

  1. JSONL parsing: Extracts user/assistant messages from session transcripts

  2. Delta-based incremental indexing: Only processes new messages

  3. Debounced background sync: Default thresholds:

    • 100KB of new data, OR

    • 50 new messages

  4. LLM-generated slugs: Descriptive filenames like 2026-01-30-memory-system-research.md
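Using the documented defaults, the debounce decision reduces to a check like this (the function and field names are illustrative):

```typescript
// Illustrative debounce check using the documented defaults (100KB or 50 messages).
const SYNC_BYTES_THRESHOLD = 100 * 1024;
const SYNC_MESSAGE_THRESHOLD = 50;

interface SessionDelta {
  newBytes: number;     // bytes appended to the JSONL transcript since the last index
  newMessages: number;  // messages appended since the last index
}

function shouldReindexSession(delta: SessionDelta): boolean {
  return (
    delta.newBytes >= SYNC_BYTES_THRESHOLD ||
    delta.newMessages >= SYNC_MESSAGE_THRESHOLD
  );
}
```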

Why Session Indexing Matters

Imagine working on a project for weeks. You might ask:

  • "When did we decide to use TypeScript?"

  • "What was that bug we fixed in the auth flow?"

  • "What approach did we try for caching last week?"

With session indexing, the agent can search past conversations and recall decisions made weeks ago.

Memory Search Tools

The memory system exposes two tools to agents:

Source: memory-tool.ts:22-69

1. memory_search

Runs hybrid search over indexed memory files and session transcripts.

Returns: Snippets (~700 chars) with:

  • File path

  • Line range (start_line, end_line)

  • Relevance score

  • Snippet text

Use cases:

  • "What did I decide about the API design?"

  • "When did we last discuss authentication?"

  • "What are my current todos?"

2. memory_get

Reads specific memory files with optional line range filtering.

Use cases:

  • Reading full MEMORY.md for comprehensive context

  • Fetching a specific daily log

  • Retrieving exact lines after search narrows down location

Tool Availability

Both tools are only enabled when memorySearch.enabled resolves to true in the agent configuration.
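Only the memorySearch.enabled key is documented here; assuming a JSON-style agent configuration (the surrounding file name and structure are not specified in this article), enabling both tools looks roughly like:

```json
{
  "memorySearch": {
    "enabled": true
  }
}
```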

Key Innovations

OpenClaw's memory system introduces several novel design decisions:

1. File-First Philosophy

No database as source of truth — just Markdown files. This means:

  • Human-readable, version-controllable memory

  • Easy backup and migration

  • Debuggable with standard text tools

  • No vendor lock-in

2. Hybrid Retrieval

Combining BM25 + vector search gives balanced precision/recall:

  • Vector search catches semantic matches

  • BM25 catches exact terms and rare tokens

  • Weighted fusion prevents either from dominating

3. Provider Auto-Selection

Local → OpenAI → Gemini fallback chain with:

  • Graceful degradation

  • User transparency (tool results show provider used)

  • No manual configuration required for most users

4. Batch Optimization

Uses discounted Batch APIs for:

  • 50% cost reduction on bulk indexing

  • Better resource utilization

  • Automatic fallback to sync on failure

5. Cache-First Embedding

SHA-256 hash deduplication prevents re-embedding:

  • Same paragraph across files → embed once

  • Session replay with same messages → cache hit

  • Significant cost savings on repeated content

6. Delta-Based Sync

Incremental session indexing with:

  • Byte/message thresholds

  • Debounced background sync

  • No full reindex on every message

7. Pre-Compaction Flush

Automatic context → memory transfer before truncation:

  • Prevents context loss

  • No manual intervention required

  • Silent when nothing important to save

8. Per-Agent Isolation

Separate SQLite stores per agent ID:

  • Multi-agent workflows don't cross-contaminate

  • Each agent has its own memory namespace

  • Supports different embedding models per agent

Performance Characteristics

Tuning constants are defined in manager.ts:92-110.

Typical performance:

  • Local embedding: ~50 tokens/sec (node-llama-cpp on M1 Mac)

  • OpenAI embedding: ~1000 tokens/sec (with batching)

  • Search latency: <100ms for 10K chunks (hybrid search)

  • Index size: ~5KB per 1K tokens (with 1536-dim embeddings)

Comparison with Traditional RAG

How does OpenClaw's approach differ from typical RAG systems?

| Aspect | Traditional RAG | OpenClaw Memory |
|---|---|---|
| Source of truth | Vector database | Markdown files |
| Search method | Vector only | Hybrid (BM25 + vector) |
| Storage | Pinecone/Weaviate/Chroma | SQLite |
| Embedding | Always remote API | Local-first with fallback |
| Chunking | Fixed-size | Line-aware with overlap |
| Caching | Usually none | SHA-256 hash-based |
| Updates | Full reindex | Delta-based incremental |
| Context preservation | Manual | Automatic pre-compaction flush |
| Human-readable | No | Yes (plain Markdown) |
| Cost optimization | Limited | Batch API + caching |

Use Cases

OpenClaw's memory system shines in scenarios where:

  1. Long-running projects: Work on a codebase for weeks/months with persistent context

  2. Personal AI assistants: Remember preferences, habits, and long-term goals

  3. Research workflows: Accumulate knowledge over time, build on past insights

  4. Multi-agent systems: Each agent maintains its own memory space

  5. Offline-first applications: Local embedding provider works without internet

Limitations & Trade-offs

No system is perfect. OpenClaw's memory system has trade-offs:

Storage Growth

Daily logs and session transcripts accumulate over time. A year of daily use could generate:

  • ~365 daily logs

  • ~1000 session files

  • ~500MB SQLite index

Mitigation: Archive old logs manually, or implement retention policies.

Embedding Drift

Different providers use different embedding models (1536-dim vs 768-dim). Switching providers requires reindexing.

Mitigation: The system tracks the embedding model per chunk and handles mismatches gracefully.

Limited Full-Text Search Features

SQLite FTS5 is solid but lacks features like:

  • Fuzzy matching

  • Typo tolerance

  • Advanced ranking signals

Why it's acceptable: Most queries are semantic (vector) anyway, and BM25 handles exact matches well.

No Cross-File Context

Each chunk is embedded independently. A concept spanning multiple files might not be connected.

Mitigation: Use section headers and explicit cross-references in Markdown.

Future Directions

Based on the codebase, potential enhancements could include:

  1. Graph-based memory: Link related memories explicitly

  2. Importance scoring: Prioritize frequently-accessed memories

  3. Automatic summarization: Compress old daily logs periodically

  4. Multi-modal embeddings: Index images, code, diagrams

  5. Federated memory: Share curated memories across agents/teams

  6. Retention policies: Auto-archive old sessions

Conclusion

OpenClaw's memory system represents a thoughtful evolution of RAG architecture. By prioritizing file-first storage, hybrid search, and automatic context preservation, it addresses real pain points in long-running AI agent workflows.

Key takeaways:

  • Files are the source of truth: Human-readable, version-controllable memory

  • Hybrid retrieval works: BM25 + vector gives better results than either alone

  • Cache everything: SHA-256 deduplication prevents redundant embedding costs

  • Incremental is better: Delta-based sync scales to large memory stores

  • Automate memory management: Pre-compaction flush prevents context loss

For developers building AI agents, OpenClaw's memory system offers a production-ready blueprint that balances performance, cost, and developer experience.



This analysis is based on OpenClaw commit f99e3dd (January 2026).
