Building a Claude Code Multi-Agent System with TEI + Qdrant
TL;DR
I vectorized code with TEI (Text Embeddings Inference), stored it in Qdrant, and built a system where 9 Claude Code agents search code instead of reading it.
1. Why I Built This
Limitations of Claude Code
Claude Code is powerful, but when it needs to understand the overall codebase, it opens and reads files one by one.
Read src/auth.ts → 200 lines consumed
Read src/middleware.ts → 150 lines consumed
Read src/utils/token.ts → 100 lines consumed
...
There are unavoidable cases where the entire codebase must be read, such as code quality reviews or security audits, and the token consumption adds up quickly.
I figured that agents handling code reviews or security checks could understand the code without the Read tool, reducing token costs. So I gave it a try.
2. Architecture
| Layer | Components |
|---|---|
| Claude Code | 9 agents (Morpheus, Neo, Seraph, ...) |
| ↓ MCP Protocol (stdio) | |
| mcp-code-rag | Rust binary. Indexing/search bridge |
| ↓ HTTP / gRPC | |
| TEI (e5-base) | Text → vector conversion |
| Qdrant | Vector storage + similarity search |
Components
| Component | Role | Location |
|---|---|---|
| mcp-code-rag | MCP server. Handles indexing/search requests | Local (Rust binary) |
| TEI (e5-base) | Text → 768-dimensional vector conversion | Docker |
| Qdrant | Vector storage + cosine similarity search | Docker |
| Claude Code Agents | Search via MCP tools when exploring code | Local |
3. Why This Stack
The goal was not to find the latest embedding model or the optimal combination. The goal was to test whether a RAG pipeline actually works on a GPU-less NAS. So I chose a lightweight, easy-to-self-host stack.
TEI + e5-base
- TEI: the official Hugging Face embedding server. One Docker command and it's deployed
- e5-base: 768 dimensions. Practical enough at ~50ms/query on CPU
- Supports asymmetric search with `query:` / `passage:` prefixes, which is advantageous for matching short queries against long code chunks
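As a minimal sketch of that asymmetric prefixing (the helper names here are illustrative, not the actual mcp-code-rag API):

```rust
/// e5-family models expect asymmetric prefixes before embedding:
/// short search queries get "query: " and indexed code chunks get
/// "passage: ". Both are then compared by cosine similarity.
/// Illustrative helpers, not the real mcp-code-rag functions.
fn as_query(text: &str) -> String {
    format!("query: {}", text.trim())
}

fn as_passage(chunk: &str) -> String {
    format!("passage: {}", chunk.trim())
}
```

Embedding "query: authentication logic" against chunks stored as "passage: fn verify_token(...) ..." is what lets a short natural-language question land on a long code block.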
Why the MCP Server Is Written in Rust
- Single binary: deploy one executable. Zero runtime dependencies
- Memory: ~15MB resident. Python servers are 200MB+
- Claude Code MCP: stdio-based JSON-RPC. Rust's serde + tokio is ideal
- External communication: async HTTP + gRPC handling
4. Core Flows
4.1 Indexing (Code → Vectors)
```
Project directory
    ↓
[walkdir] File traversal (max depth: 10)
    ↓
[filter] 52 directories + 60 extensions excluded
         (node_modules, target, .git, images, binaries...)
    ↓
[incremental check] Compare file modification times
                    → process only changed files
    ↓
[chunk] Split into 1500-char units (300-char overlap)
    ↓
[TEI] Each chunk → "passage: {content}" → 768-dim vector
    ↓
[Qdrant] UUID v5 based upsert (batches of 100)
         Collection: "code-rag-{project_name}"
```

Why 1500-char chunking?
- Too large → search precision drops (noise)
- Too small → a single function splits across multiple chunks, losing context
- 1500 chars + 300-char overlap → prevents splitting at function boundaries
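The chunking step can be sketched as follows. This version splits purely on character counts; the real indexer may additionally respect line or function boundaries:

```rust
/// Split text into fixed-size chunks with overlap, as in the indexing
/// pipeline above (1500 chars with a 300-char overlap). Each new chunk
/// starts `overlap` characters before the previous one ended, so a
/// function cut at a chunk boundary still appears whole in one chunk.
fn chunk(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size);
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start = end - overlap; // step back to create the overlap window
    }
    chunks
}
```

With `size = 1500` and `overlap = 300`, a 3000-char file yields three chunks covering 0..1500, 1200..2700, and 2400..3000.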
4.2 Search Example: Code Review Agent
```
[code_review_agent] Start security audit
    ↓
search_codebase("authentication logic", project_name="my-app")
    ↓
Search results received (Top 10, similarity ≥ 0.5 only)
    ↓
search_codebase("SQL query generation", project_name="my-app")
    ↓
search_codebase("user input validation", project_name="my-app")
    ↓
Generate security report from collected code snippets
```

Result: 0 Reads, 3 searches to complete the report. It analyzes security vulnerabilities across the codebase without opening a single file.
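The "Top 10, similarity ≥ 0.5" filtering above can be sketched like this; the `Hit` struct and function name are illustrative, not the actual mcp-code-rag types:

```rust
/// Post-filter search hits: keep only results at or above the
/// similarity threshold, sorted best-first, truncated to a limit.
/// Mirrors the "Top 10, similarity >= 0.5" behavior described above.
#[derive(Debug, Clone)]
struct Hit {
    path: String,
    score: f32, // cosine similarity from Qdrant
}

fn top_hits(mut hits: Vec<Hit>, min_score: f32, limit: usize) -> Vec<Hit> {
    hits.retain(|h| h.score >= min_score);
    // Scores are finite floats, so total_cmp gives a safe total order.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.truncate(limit);
    hits
}
```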
4.3 Auto-Indexing
```
Agent: search_codebase("error handling", project_name="my-app")
    ↓
[Collection check] Does "my-app" exist?
    ↓
    ├─ No  → trigger auto-indexing → search after indexing completes
    │
    └─ Yes → search immediately
```

Agents don't need to call indexing directly. The index is created automatically on the first search.
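The collection check reduces to a simple lookup. In this sketch an in-memory set stands in for the real Qdrant collection listing, and `ensure_indexed` is an illustrative name:

```rust
use std::collections::HashSet;

/// Auto-indexing decision: if the project's collection is missing,
/// trigger indexing before the first search; otherwise search at once.
/// Collection naming follows the "code-rag-{project_name}" scheme
/// from the indexing section. Returns true if indexing was triggered.
fn ensure_indexed(existing: &mut HashSet<String>, project: &str) -> bool {
    let collection = format!("code-rag-{}", project);
    if existing.contains(&collection) {
        false // collection exists: search immediately
    } else {
        // The real server would run the full indexing pipeline here.
        existing.insert(collection);
        true
    }
}
```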
5. Real-World Numbers
| Metric | Value |
|---|---|
| TEI embedding speed | ~50ms/query (CPU) |
| Indexing (1000 files) | ~3 min (incremental: changed files only) |
| Search response | ~200ms (cache miss), ~5ms (cache hit) |
| mcp-code-rag memory | ~15MB |
| Vector dimensions | 768 (e5-base) |
| Cache TTL | 5 min |
| Max indexable files | 5,000 |
| Chunk size | 1,500 chars (300-char overlap) |
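The ~5ms cache hits come from a short-lived response cache with a 5-minute TTL. A sketch of such a cache (struct and method names are illustrative, not the actual internals):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Query-response cache with a fixed TTL, matching the "5 min TTL"
/// figure above: entries older than the TTL are treated as misses.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<String>)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns cached results only while the entry is still fresh.
    fn get(&self, query: &str) -> Option<&Vec<String>> {
        self.entries
            .get(query)
            .filter(|entry| entry.0.elapsed() < self.ttl)
            .map(|entry| &entry.1)
    }

    fn put(&mut self, query: String, results: Vec<String>) {
        self.entries.insert(query, (Instant::now(), results));
    }
}
```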
6. MCP Tools
mcp-code-rag provides 2 MCP tools to Claude Code.
| Tool | Description |
|---|---|
| search_codebase | Search code with natural language queries. Pass a project name to auto-switch collections + auto-index if not yet indexed |
| refresh_index | Manually index by specifying a project path. Incremental β only processes changed files |
By enforcing a rule that agents call search_codebase first when exploring code, you can prevent the token waste of traversing files with the Read tool.
Strengths
- Index once, search instantly thereafter: indexing is slow only the first time; subsequent runs are incremental and process only changed files. A search itself takes ~200ms
- Zero token consumption: no files are opened with Read, so the context window isn't spent on searches
- Code exploration via natural language: find relevant code immediately with queries like "authentication handling" or "SQL queries"
- Strong for codebase-wide tasks: well-suited for broad pattern analysis like security audits and code reviews
Limitations
Questions that demand precise semantics, like "Is error handling properly implemented?", cannot be answered by vector similarity alone. For such cases you need a rerank model, or a hybrid approach that combines keyword search (BM25) with vector search.
| Use Case | Current System | Needs Addition |
|---|---|---|
| Understanding overall code structure | Suitable | None |
| Security pattern scanning | Suitable | None |
| Precise logic search | Insufficient | Rerank, hybrid search |
| Error handling tracing | Insufficient | Rerank + AST analysis |
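If hybrid search were added, one common way to merge a BM25 ranking with a vector-search ranking is Reciprocal Rank Fusion (RRF). This is a sketch of a possible extension, not part of the current system:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each document scores the sum over ranked
/// lists of 1 / (k + rank), with 1-based ranks. Documents that appear
/// high in both the keyword and the vector ranking float to the top.
/// k (often 60) damps the influence of any single list.
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in rankings {
        for (i, doc) in list.iter().enumerate() {
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused
}
```

A document ranked first by both BM25 and vector search beats one ranked first by only a single list, which is exactly the behavior a hybrid setup wants.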
7. Getting Started
Prerequisites
- Docker environment (for running Qdrant, TEI)
- Rust (for building mcp-code-rag)
- Claude Code (Pro plan)
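As a sketch, registering the server in a project-level `.mcp.json` might look like the following; the binary path and server name are placeholders, not values from the project:

```json
{
  "mcpServers": {
    "code-rag": {
      "command": "/usr/local/bin/mcp-code-rag",
      "args": []
    }
  }
}
```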
Once everything is installed and the MCP server is registered with Claude Code, it fetches only the code you need via semantic search, with zero token waste.
Log
- 2026-01-29: created