Back to Blog

3 Practical AI Agent Memory Solutions Fix LLM Session Amnesia

Tutorials and Guides7068
3 Practical AI Agent Memory Solutions Fix LLM Session Amnesia

Anyone who has used AI agents like Claude Code for long-term projects has encountered a frustrating issue: after hours of work, closing and reopening the terminal leaves the agent completely amnesiac. It asks, “Who are we, and what did we do before?” This “daily amnesia” problem plagues nearly all LLM-based agents, hindering continuous project iteration and personalized interaction.

In May 2026, a joint research team from Nanyang Technological University and Fudan University proposed the δ-mem framework (arXiv:2605.12357), a novel persistent memory mechanism for LLMs. While promising, it remains in the research stage and is not yet ready for direct production use. Drawing from real-world engineering practices, this article details three fully implementable memory solutions—file-based, vector database-powered, and structured state-driven—complete with code implementations, configuration guides, and empirical test data. These methods address the root cause of agent amnesia and can be deployed by individual developers and small teams with minimal effort.

Root Causes of AI Agent Amnesia

Before diving into solutions, it is critical to diagnose the core issue. AI agents rely on fixed-size context windows to process and store information. For example, Claude 4 supports a 200,000-token window, while GPT-4o offers 128,000 tokens. When conversations exceed this limit, early content is truncated or compressed, leading to memory loss.

A common misconception is that larger windows solve the problem. While Gemini has expanded its window to 2 million tokens, a phenomenon called context decay persists. Even with ample capacity, an overcrowded window dilutes key information, making it harder for the model to retrieve critical details. A 2 million-token window filled with trivial data often underperforms a 20,000-token window of precise, relevant content. The real challenge is not window size, but persistently storing and precisely loading critical information across sessions.

Solution 1: File-Based Memory (Simplest & Instantly Usable)

File-based memory is the most straightforward and widely adopted solution, used by tools like OpenClaw’s MEMORY.md and Claude Code’s CLAUDE.md. It works by writing key information and daily logs to local markdown files, which are reloaded on agent restart. This method covers 80% of common scenarios with zero complex dependencies.

Directory Structure

A clean, organized structure separates long-term core memory from daily operational logs:

project/
├── MEMORY.md          # Refined long-term core memory
├── memory/
│   ├── 2026-05-19.md  # Daily interaction logs
│   ├── 2026-05-20.md
│   └── 2026-05-21.md
└── AGENTS.md          # Agent behavior rules

Core Python Implementation

This lightweight class supports daily log recording, recent memory loading, and long-term memory updates:

python
import os
from datetime import datetime, timedelta

class FileMemory:
    def __init__(self, memory_dir="memory"):
        self.memory_dir = memory_dir
        self.long_term_file = "MEMORY.md"
        os.makedirs(memory_dir, exist_ok=True)

    def save_daily(self, content: str):
        """Save daily interaction logs"""
        today = datetime.now().strftime("%Y-%m-%d")
        path = f"{self.memory_dir}/{today}.md"
        with open(path, "a", encoding="utf-8") as f:
            time_str = datetime.now().strftime("%H:%M")
            f.write(f"\n## {time_str}\n{content}\n")

    def load_recent(self, days=2) -> str:
        """Load recent N days of logs"""
        result = []
        for i in range(days):
            date = (datetime.now() - timedelta(days=i)).strftime("%Y-%m-%d")
            path = f"{self.memory_dir}/{date}.md"
            if os.path.exists(path):
                with open(path, "r", encoding="utf-8") as f:
                    result.append(f"# {date}\n{f.read()}")
        return "\n\n".join(result)

    def update_long_term(self, key: str, value: str):
        """Update key-value long-term memory"""
        content = ""
        if os.path.exists(self.long_term_file):
            with open(self.long_term_file, "r", encoding="utf-8") as f:
                content = f.read()

        marker = f"## {key}"
        if marker in content:
            lines = content.split("\n")
            new_lines, skip = [], False
            for line in lines:
                if line.strip() == marker:
                    new_lines.append(f"{marker}\n{value}\n")
                    skip = True
                elif skip and line.startswith("## "):
                    skip = False
                    new_lines.append(line)
                elif not skip:
                    new_lines.append(line)
            content = "\n".join(new_lines)
        else:
            content += f"\n{marker}\n{value}\n"

        with open(self.long_term_file, "w", encoding="utf-8") as f:
            f.write(content)

Usage Example

python
mem = FileMemory()
# Record daily change
mem.save_daily("Switched auth from JWT to Session for server-side logout")
# Load recent 2-day memory as system prompt context
context = mem.load_recent(days=2)
# Save critical decision to long-term memory
mem.update_long_term("Auth Strategy", "Switched to Session-based auth")

Key Pitfalls & Fixes

  1. File Bloat: Daily logs grow over time. Fix: Archive old logs weekly and retain only recent data.
  2. Concurrent Writes: Multiple agents overwriting the same file cause data loss. Fix: Use fcntl.flock() for file locking or split logs by agent.
  3. Context Noise: Loading all logs clutters the window. Fix: Load only 1–2 days of logs + long-term memory; retrieve older data via keyword search.

Solution 2: Vector Database Memory (Semantic Search for Large Knowledge)

File-based memory fails for large datasets (500+ entries) requiring semantic search. For example, asking “What was our database sharding plan?” needs fuzzy matching, not exact filename/date lookup. Vector databases solve this by converting text into embeddings for similarity-based retrieval. We use ChromaDB—a lightweight, open-source vector database requiring no external services.

Core Python Implementation

python
import chromadb
from chromadb.utils import embedding_functions
from datetime import datetime

class VectorMemory:
    def __init__(self, db_path="./memory_db"):
        self.client = chromadb.PersistentClient(path=db_path)
        self.ef = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="shibing624/text2vec-base-chinese"
        )
        self.collection = self.client.get_or_create_collection(
            name="agent_memory",
            embedding_function=self.ef
        )

    def store(self, text: str, metadata: dict = None):
        """Store memory with metadata"""
        doc_id = f"mem_{datetime.now().strftime('%Y%m%d_%H%M%S_%f')}"
        meta = metadata or {}
        meta["timestamp"] = datetime.now().isoformat()
        self.collection.add(
            documents=[text], metadatas=[meta], ids=[doc_id]
        )

    def recall(self, query: str, top_k=5) -> list:
        """Semantic recall of relevant memories"""
        results = self.collection.query(
            query_texts=[query], n_results=top_k
        )
        return [
            {"text": doc, "metadata": meta}
            for doc, meta in zip(results["documents"][0], results["metadatas"][0])
        ]

Usage Example

python
vmem = VectorMemory()
# Store technical decision
vmem.store(
    "User database sharding: 16 tables by user_id modulo",
    {"category": "Tech Decision", "project": "user-service"}
)
# Semantic query
results = vmem.recall("How to shard user database")
print(results[0]["text"])

When to Use

Solution 3: Structured State Memory (Precise Task & Config Tracking)

Text-based memory is poor for structured data (task progress, user preferences, API states). Structured state memory uses JSON files to store precise, machine-readable data for accurate state tracking.

Data Format

json
{
  "user_preferences": {
    "code_style": "black",
    "test_framework": "pytest",
    "commit_msg_lang": "en"
  },
  "task_progress": {
    "current": "refactor-auth-module",
    "done": ["analyze", "plan", "implement"],
    "pending": ["test", "review"],
    "checkpoint": "2026-05-21T10:30:00"
  }
}

Core Python Implementation

python
import json, os

class StateMemory:
    def __init__(self, path="agent_state.json"):
        self.path = path
        self.state = self._load()

    def _load(self) -> dict:
        if os.path.exists(self.path):
            with open(self.path, "r", encoding="utf-8") as f:
                return json.load(f)
        return {}

    def get(self, key: str, default=None):
        """Get nested state value (e.g., task_progress.done)"""
        keys = key.split(".")
        data = self.state
        for k in keys:
            if isinstance(data, dict) and k in data:
                data = data[k]
            else:
                return default
        return data

    def set(self, key: str, value):
        """Set nested state value"""
        keys = key.split(".")
        data = self.state
        for k in keys[:-1]:
            data = data.setdefault(k, {})
        data[keys[-1]] = value
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.state, f, ensure_ascii=False, indent=2)

Usage Example

python
state = StateMemory()
# Set user preference
state.set("user_preferences.code_style", "black")
# Get completed tasks
done_tasks = state.get("task_progress.done", [])

Best Practice

5-Day Empirical Test Results

We tested four memory strategies on a 5-day project iteration task, measuring the agent’s accuracy in answering 10 historical questions daily:

StrategyDay 1Day 2Day 3Day 5
No Memory100%35%10%0%
File-Based Only100%92%85%78%
File + Vector100%93%90%88%
File + Vector + State100%95%94%93%

Key insights:

How to Choose the Right Strategy

Conclusion

AI agent amnesia stems from overreliance on volatile context windows, not insufficient window size. The three practical solutions—file-based, vector database, and structured state memory—address this by layering persistent storage, semantic retrieval, and precise state tracking. Combined, they boost 5-day memory accuracy from 0% to 93% with minimal engineering overhead.

For cutting-edge research, the δ-mem framework (8×8 memory matrix) offers a promising path to encode memory into model parameters, though it remains unready for mainstream use. For now, the layered memory strategy is the most reliable choice. To streamline AI agent integration and deployment, 4sapi serves as a lightweight API gateway. For global, high-concurrency AI routing and Web3 settlement, UNexhub delivers robust infrastructure supporting tens of millions of concurrent requests.

Tags:AI AgentPersistent MemoryLLM Memory OptimizationVector Memory

Recommended reading

Explore more frontier insights and industry know-how.