Figure: architecture diagram of a developer building an MCP Server as a bridge between an AI client and external tools

Building an MCP Server from Scratch: A Complete Developer Guide

An LLM cannot directly query your PostgreSQL, read /etc/app.conf, or call an internal REST API. The reason is not that the model is "dumb" but that there is no standardized tool channel. Model Context Protocol (MCP) is an open, JSON-RPC-based protocol from Anthropic: Claude, Cursor, and GPT connect to external capabilities through a single wire format. This guide covers the full pipeline: Tools, Resources, Prompts, HTTP transport, latency profiling, and production deployment on Apple Silicon with Unified Memory.

1. Three bottlenecks: why AI needs an MCP Server

Let us pin down the problems before writing the first line of code:

  1. Tool silos: Function Calling (OpenAI proprietary), Plugins (walled garden), LangChain Tools (framework lock-in). Switching providers means rewriting the integration layer: the classic N×M problem.
  2. Data unreachable: training cutoff, no live config stream, no internal docs in context. The model has no controlled data plane.
  3. Actions not executable: pure chat does not send HTTP requests, write files, or execute SQL. A standardized tool surface is needed.

If you have already read "MCP as the HTTP of the AI era", this guide goes straight to implementation. Target audience: backend and AI engineers who work in Python or TypeScript.

2. What MCP is: wire protocol and architecture

2.1 Evolution stack

Function Calling → Plugins → MCP (Nov 2024, Anthropic, open spec). One server implementation serves multiple clients (Cursor, Claude Desktop, VS Code Copilot, Gemini CLI). AAIF governance, 10,000+ registered servers in 2026.

2.2 Topology: Client ↔ Server ↔ triad

┌────────────────────┐         ┌─────────────────────┐
│   MCP Client       │ ◄─────► │   MCP Server        │
│  (Claude / Cursor) │  JSON   │  (your code)        │
│                    │  -RPC   │                     │
└────────────────────┘         └─────────────────────┘
                                        │
                          ┌─────────────┼─────────────┐
                          ▼             ▼             ▼
                       Tools       Resources      Prompts
                    (mutations)  (read-only)   (templates)
  • Client: model-side runtime, orchestrates tools/call.
  • Server: exposes capability surface.
  • Tools: side-effect functions — search, calc, DB write path.
  • Resources: URI-addressable read streams — zero mutation guarantee.
  • Prompts: injectable multi-turn templates.

2.3 Transport layer

JSON-RPC 2.0 over:

  • stdio: subprocess pipe, typical round-trip <1 ms on same machine — zero TCP overhead.
  • HTTP + SSE / Streamable HTTP: remote, concurrent clients, measurable RTT 10–200 ms depending on region.

Lifecycle: initialize (capability negotiation) → discovery (tools/list, resources/list) → tools/call hot path → shutdown.
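To make the lifecycle concrete, here is a minimal sketch of the JSON-RPC 2.0 messages a client sends during a session. It uses only the standard library. The client name and version are placeholders, the protocol version string is the one from the initial spec revision (newer revisions exist), and say_hello is the tool defined in section 4. Over stdio each message travels as one line of JSON.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request (expects a response with the same id)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def notification(method: str) -> dict:
    """Build a JSON-RPC 2.0 notification (no id, no response expected)."""
    return {"jsonrpc": "2.0", "method": method}


handshake = [
    # 1. Handshake: the client announces its protocol version and capabilities.
    request("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholder
    }),
    # 2. The client confirms that initialization is complete.
    notification("notifications/initialized"),
    # 3. Discovery: which tools does the server expose?
    request("tools/list"),
    # 4. Hot path: invoke one tool with arguments.
    request("tools/call", {"name": "say_hello", "arguments": {"name": "dev"}}),
]

for message in handshake:
    print(json.dumps(message))  # one JSON object per line
```

In practice the SDK client builds these messages for you; writing them out by hand is mainly useful when reading Inspector traces or debugging a transport.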

2.4 Decision matrix: MCP vs alternatives

Dimension               MCP                    OpenAI FC        LangChain Tools
Wire standard           Open JSON-RPC          Vendor-specific  Framework-bound
Transport               stdio / HTTP           HTTP only        HTTP
Cross-model             Claude, GPT, Gemini    OpenAI only      Partial
Resources/Prompts       First-class            N/A              N/A
Self-host perf control  Full (CPU/RAM tuning)  Cloud-bound      Variable

3. Dev environment and project layout

3.1 Runtime choice

  • Python: mcp + FastMCP, minimal boilerplate, asyncio-native.
  • TypeScript: @modelcontextprotocol/sdk, if your stack already runs on Node/Bun.

3.2 Bootstrap

python -m venv .venv && source .venv/bin/activate
pip install mcp httpx pydantic

npm init -y && npm install @modelcontextprotocol/sdk

3.3 Layout

my-mcp-server/
├── server.py
├── tools/
├── resources/
├── prompts/
├── tests/
└── pyproject.toml

3.4 Debug toolchain

  • MCP Inspector: interactive tools/call replay.
  • Claude Desktop: claude_desktop_config.json.
  • Cursor: .cursor/mcp.json.

4. First MCP Server: Hello World

4.1 Minimal server

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-first-server")

@mcp.tool()
def say_hello(name: str) -> str:
    """Greet by name"""
    return f"Hello, {name}! First MCP tool invocation."

if __name__ == "__main__":
    mcp.run()

4.2 Verify

python server.py
npx @modelcontextprotocol/inspector python server.py

Inspector: tools/list → say_hello; tools/call with {"name": "dev"}. Baseline latency over stdio is typically <5 ms.

4.3 Client wiring

{
  "mcpServers": {
    "my-first-server": {
      "command": "python",
      "args": ["/absolute/path/to/server.py"]
    }
  }
}

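Use absolute paths only. The client launches the server from its own working directory, so relative paths are a frequent cause of servers that fail to start or reconnect over stdio.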
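One way to avoid hand-typed paths is to generate the config fragment. The sketch below is a hypothetical helper (build_client_config is not part of any SDK) that resolves the script path and uses the current interpreter, so a virtualenv's Python is picked up as well.

```python
import json
import sys
from pathlib import Path


def build_client_config(server_name: str, script: str) -> dict:
    """Return an mcpServers config entry with absolute interpreter and script paths."""
    script_path = Path(script).resolve()  # absolute, regardless of the current directory
    return {
        "mcpServers": {
            server_name: {
                "command": sys.executable,  # the interpreter running this script
                "args": [str(script_path)],
            }
        }
    }


config = build_client_config("my-first-server", "server.py")
print(json.dumps(config, indent=2))
```

Run it from the project directory with the virtualenv activated, then paste the output into the client's config file.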

5. Tools: functions on the tools/call hot path

5.1 Schema generation

The function signature and docstring are converted to a JSON Schema automatically. Naming: snake_case, verb-first (search_web, read_file).
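The idea behind schema generation can be sketched with the standard library alone. This is a simplified illustration, not FastMCP's actual implementation: sketch_tool_schema is a hypothetical helper that handles only a few scalar types, while the real SDK handles far more.

```python
import inspect
import json
from typing import get_type_hints

# Simplified mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func) -> dict:
    """Derive a tools/list-style entry from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is mandatory
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def search_web(query: str, max_results: int = 5) -> str:
    """Search the web and return the top results."""
    return ""


schema = sketch_tool_schema(search_web)
print(json.dumps(schema, indent=2))
```

The practical consequence: type hints and docstrings are the tool's public contract, so the model sees exactly what you write there.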

5.2 Pydantic input

from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="Search query")
    max_results: int = Field(default=5)
    language: str = Field(default="ru")

@mcp.tool()
def web_search(input: SearchInput) -> list[dict]:
    ...

5.3 Five production-grade tools

  1. Calculator: safe eval subset, no arbitrary code exec.
  2. File I/O: chroot whitelist, block path traversal.
  3. HTTP: httpx, timeout 30 s, connection pool reuse.
  4. DB: read-only SELECT, parameterized — zero DDL.
  5. Time: UTC datetime.now(timezone.utc).
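As an illustration of the first item, a calculator can avoid eval entirely by walking the parsed expression tree and allowing only arithmetic nodes. This is a minimal sketch: safe_calculate is a hypothetical name, and exponentiation is left out on purpose because expressions like 9**9**9 can exhaust CPU and memory.

```python
import ast
import operator

# Only these operators are allowed; everything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a whitelisted arithmetic AST node."""
    if isinstance(node, ast.Constant):
        # Accept plain numbers only (bool is a subclass of int, so exclude it).
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculate(expression: str):
    """Evaluate an arithmetic expression without executing arbitrary code."""
    tree = ast.parse(expression, mode="eval")
    return _evaluate(tree.body)


print(safe_calculate("2 + 2"))
print(safe_calculate("-(3 - 1) * 2.5"))
```

Names, attribute access, and function calls all hit the final ValueError, so an input such as __import__('os') is rejected instead of executed.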

5.4 Async tools (I/O-bound)

import httpx

@mcp.tool()
async def fetch_url(url: str) -> str:
    async with httpx.AsyncClient(timeout=30.0, limits=httpx.Limits(max_connections=20)) as client:
        r = await client.get(url)
        r.raise_for_status()
        return r.text

The async path is critical under concurrent tools/call: a synchronous call blocks the event loop, and P99 latency spikes.
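The effect is easy to reproduce without a network. In the sketch below, asyncio.sleep stands in for awaited I/O and time.sleep for a blocking call placed inside a coroutine; both tool functions are hypothetical stand-ins, not MCP APIs.

```python
import asyncio
import time


async def async_tool(delay: float) -> str:
    await asyncio.sleep(delay)  # yields to the event loop, like awaiting a socket
    return "ok"


async def blocking_tool(delay: float) -> str:
    time.sleep(delay)  # holds the event loop: no other call can make progress
    return "ok"


async def timed(tool, calls: int, delay: float) -> float:
    """Run `calls` invocations concurrently and return the wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(tool(delay) for _ in range(calls)))
    return time.perf_counter() - start


async_elapsed = asyncio.run(timed(async_tool, calls=5, delay=0.05))
blocking_elapsed = asyncio.run(timed(blocking_tool, calls=5, delay=0.05))
print(f"async:    {async_elapsed:.3f} s")
print(f"blocking: {blocking_elapsed:.3f} s")
```

The awaited calls overlap, so total time stays close to one delay, while the blocking calls run back to back. If a dependency offers only a synchronous API, asyncio.to_thread is one way to keep it off the event loop.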

5.5 Error contract

  • Structured JSON errors, not stack traces to client.
  • Hard timeout on external I/O (≤30 s).
  • Least-privilege file/DB access.
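The first two rules can be enforced in one place with a decorator. This is a sketch under assumptions: tool_error_contract is a hypothetical helper, and the {"ok": ..., "error": {...}} envelope is an illustrative shape, not something the MCP spec prescribes.

```python
import asyncio
import functools
import json


def tool_error_contract(timeout_s: float = 30.0):
    """Wrap an async tool: enforce a hard timeout and return structured JSON errors."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            try:
                result = await asyncio.wait_for(func(*args, **kwargs), timeout=timeout_s)
                return json.dumps({"ok": True, "result": result})
            except asyncio.TimeoutError:
                error = {"code": "timeout", "message": f"exceeded {timeout_s} s"}
            except Exception as exc:
                # Expose only the error class and message; keep the traceback in server logs.
                error = {"code": type(exc).__name__, "message": str(exc)}
            return json.dumps({"ok": False, "error": error})
        return wrapper
    return decorator


@tool_error_contract(timeout_s=0.05)
async def slow_tool() -> str:
    await asyncio.sleep(1)  # simulates an upstream service that hangs
    return "done"


@tool_error_contract()
async def failing_tool() -> str:
    raise ValueError("bad input")


print(asyncio.run(slow_tool()))
print(asyncio.run(failing_tool()))
```

When combined with @mcp.tool(), such a wrapper would sit closest to the function so that the registered tool is the wrapped one.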

6. Resources: read-only data plane

6.1 Semantics

Resource = data provider, Tool = action executor. resources/read — idempotent, no side effects.

6.2 Static + dynamic

import json

@mcp.resource("config://app-settings")
def get_app_settings() -> str:
    return json.dumps({"version": "1.0", "env": "production"})

# `db` stands for the application's own data-access layer; it is not defined in this snippet.
@mcp.resource("user://{user_id}/profile")
def get_user_profile(user_id: str) -> str:
    return json.dumps(db.query_user(user_id))

6.3 Filesystem resource server

List dir → read file → optional watchfiles for resource update notifications. Root whitelist mandatory.
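A root whitelist comes down to one check: resolve the requested path and verify it still lies under the allowed root. A minimal sketch, with resolve_inside_root as a hypothetical helper and a temporary directory standing in for the real notes root:

```python
import tempfile
from pathlib import Path


def resolve_inside_root(root: Path, requested: str) -> Path:
    """Resolve `requested` against `root` and reject anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes root: {requested}")
    return candidate


root = Path(tempfile.mkdtemp())  # stand-in for the configured root directory
inside = resolve_inside_root(root, "notes/todo.md")
print(inside)

for attempt in ("../outside.txt", "/etc/passwd", "notes/../../outside.txt"):
    try:
        resolve_inside_root(root, attempt)
    except PermissionError as exc:
        print("blocked:", exc)
```

Checking the resolved path rather than the raw string is what defeats traversal through ".." segments, absolute paths, and symlinks.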

7. Prompts: reusable injection templates

from mcp.types import PromptMessage, TextContent

@mcp.prompt()
def code_review_prompt(language: str, code: str) -> list[PromptMessage]:
    return [
        PromptMessage(
            role="user",
            content=TextContent(
                type="text",
                text=f"""Code review {language}:
1. Correctness
2. Security surface
3. Hot path performance

```{language}
{code}
```"""
            )
        )
    ]

8. HTTP transport: remote MCP Server

8.1 stdio vs HTTP+SSE

Metric              stdio             HTTP + SSE
Deploy              Local subprocess  Remote host
Latency             <1 ms             RTT 10–200 ms (network)
Concurrent clients  1                 N (connection pool)
Throughput ceiling  Single pipe       Horizontal scale + load balancer
Use case            Local dev         7×24 production, team shared

8.2 HTTP implementation

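from mcp.server.fastmcp import FastMCP
# Official Python SDK: host and port are constructor settings; the transport is chosen in run().
mcp = FastMCP("remote-server", host="0.0.0.0", port=8000)
if __name__ == "__main__":
    mcp.run(transport="streamable-http")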

8.3 Auth hardening

  • Bearer Token middleware.
  • IP allowlist for internal clients.
  • Rate limit ~10 tools/call/s per API key.
  • CORS strict origin.
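Two of these controls fit in a few lines of standard-library Python. The sketch below shows a per-key token bucket sized for roughly 10 calls per second and a constant-time bearer token check. TokenBucket and bearer_token_valid are hypothetical helpers; wiring them into middleware depends on the HTTP framework in use.

```python
import hmac
import time


class TokenBucket:
    """Token bucket: refills at `rate_per_s`, allows bursts up to `capacity`."""

    def __init__(self, rate_per_s: float, capacity: float, now=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = capacity
        self._now = now  # injectable clock, which makes the limiter testable
        self.updated = now()

    def allow(self) -> bool:
        current = self._now()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (current - self.updated) * self.rate)
        self.updated = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def bearer_token_valid(authorization_header: str, expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    scheme, _, presented = authorization_header.partition(" ")
    return scheme.lower() == "bearer" and hmac.compare_digest(
        presented.encode(), expected_token.encode()
    )


buckets = {}  # one bucket per API key


def allow_call(api_key: str) -> bool:
    bucket = buckets.setdefault(api_key, TokenBucket(rate_per_s=10, capacity=10))
    return bucket.allow()


print(bearer_token_valid("Bearer s3cret", "s3cret"), allow_call("key-1"))
```

hmac.compare_digest avoids leaking how many leading characters of the token matched, which a plain == comparison can do through timing.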

9. Debug, tests, failure modes

9.1 Inspector workflow

  1. npx @modelcontextprotocol/inspector python server.py
  2. Inspect tools/resources/prompts lists.
  3. Manual tools/call, validate JSON wire.
  4. Chaos: timeout, bad params.

9.2 Unit test

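import pytest
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Requires pytest-asyncio; assumes server.py exposes a `calculate` tool.
@pytest.mark.asyncio
async def test_calculator_tool():
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("calculate", {"expression": "2 + 2"})
            assert result.content[0].text == "4"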

9.3 Failure matrix

Symptom                 Root cause              Fix
Tool missing in client  Bad path, no restart    Absolute path, restart Cursor
JSON serialize fail     datetime in response    Serialize to str/dict
Timeout disconnect      Sync blocking I/O       async + split long jobs
Permission denied       Path outside whitelist  Configure root dir
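The JSON serialization row is the easiest one to fix in code. A minimal sketch using the default hook of json.dumps; to_jsonable is a hypothetical helper name and the payload is made up for illustration.

```python
import json
from datetime import date, datetime, timezone


def to_jsonable(value):
    """Fallback for json.dumps: convert date and datetime objects to ISO 8601 strings."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(f"not JSON serializable: {type(value).__name__}")


payload = {
    "status": "ok",
    "checked_at": datetime(2026, 1, 15, 12, 30, tzinfo=timezone.utc),
}

encoded = json.dumps(payload, default=to_jsonable)
print(encoded)
```

Raising TypeError for anything else keeps unexpected types visible instead of silently stringifying them.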

10. Production: Docker, hosting, observability

10.1 Dockerfile

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "server.py"]

10.2 Hosting options 2026

  • Railway/Render: $5–20/mo, cold start penalty.
  • Cloud Run/Lambda: per-invocation billing, latency spikes on cold.
  • Remote Mac Apple Silicon: M-series Unified Memory 16–64 GB, so embedding + ChromaDB + MCP server run without swap thrashing; launchd keepalive; Neural Engine offload for local embed models.

10.3 Observability stack

  • Structured log per tools/call: tool_name, duration_ms, status.
  • Prometheus: call_rate, P50/P99 latency, error_ratio.
  • GET /health + protocol version header.
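The structured log line can come from a small decorator. This sketch covers synchronous tools only and emits one JSON record per call with the three fields named above. log_tool_call and the logger name are assumptions; async tools would need an async variant of the wrapper.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("mcp.tools")  # logger name is an arbitrary choice


def log_tool_call(func):
    """Emit one JSON log record per call: tool_name, duration_ms, status."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            status = "error"
            raise  # the caller still sees the original exception
        finally:
            record = {
                "tool_name": func.__name__,
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
                "status": status,
            }
            logger.info(json.dumps(record))
    return wrapper


@log_tool_call
def add(a: int, b: int) -> int:
    return a + b


logging.basicConfig(level=logging.INFO, format="%(message)s")
add(2, 3)
```

The same three fields map directly onto the Prometheus metrics: count records for call rate, use duration_ms for latency percentiles, and use status for the error ratio.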

10.4 Performance note: Apple Silicon

Vector search on an M4 with 24 GB Unified Memory: a ChromaDB in-process query takes ~15–40 ms on 100K chunks (the benchmark depends on the embedding dimension). On an Intel laptop with 16 GB RAM plus swap, P99 can stretch into seconds. Metal-accelerated frameworks (Core ML embed) on Apple Silicon give 2–4× throughput vs CPU-only x86 for batch embedding.

11. Case study: personal knowledge base MCP Server

11.1 Requirements

  • Semantic search over Markdown notes.
  • Create/update notes (whitelist path).
  • Token budget: do not pull the whole corpus into context.

11.2 Stack

  • ChromaDB / Qdrant embedded.
  • Embeddings: text-embedding-3-small or local nomic-embed-text on the Neural Engine.
  • watchfiles → incremental reindex.
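Whatever triggers the reindex, the incremental part reduces to detecting which notes actually changed. A minimal sketch using content hashes and the standard library only: changed_notes is a hypothetical helper, the seen mapping would be persisted next to the index in a real setup, and the temporary directory stands in for the notes folder.

```python
import hashlib
import tempfile
from pathlib import Path


def changed_notes(root: Path, seen: dict) -> list:
    """Return relative paths of Markdown files whose content hash changed since the last pass."""
    changed = []
    for path in sorted(root.rglob("*.md")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        key = str(path.relative_to(root))
        if seen.get(key) != digest:
            seen[key] = digest  # remember the new hash for the next pass
            changed.append(key)
    return changed


root = Path(tempfile.mkdtemp())  # stand-in for the notes directory
(root / "a.md").write_text("first draft")
(root / "b.md").write_text("stable note")

seen = {}
first_pass = changed_notes(root, seen)   # everything is new on the first pass
(root / "a.md").write_text("second draft")
second_pass = changed_notes(root, seen)  # only the edited file is reported
print(first_pass, second_pass)
```

Only the paths returned by each pass need to be re-chunked and re-embedded, which keeps the embedding cost proportional to edits rather than to corpus size.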

11.3 Core tools

  1. index_notes: scan, chunk, embed, upsert.
  2. search_notes: Top-K cosine similarity + source path.
  3. write_note: atomic write inside the whitelist.
  4. notes://{path} resource: full doc read.

Token savings vs full corpus injection: 90%+. The search_notes hot path on a remote Apple Silicon Mac typically completes in <200 ms end-to-end with a warm index.
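The ranking step of search_notes is plain Top-K cosine similarity. The sketch below does it in pure Python over a toy in-memory index with hand-written 3-dimensional vectors. A real server would delegate this to ChromaDB or Qdrant and use model-generated embeddings, so treat the data and function signatures as illustrative.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_notes(query_vec, index: dict, top_k: int = 3) -> list:
    """Return the top_k most similar notes as {path, score} records, best first."""
    scored = [(cosine_similarity(query_vec, vec), path) for path, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"path": path, "score": round(score, 4)} for score, path in scored[:top_k]]


# Toy index: source path -> embedding (made-up values for illustration).
index = {
    "notes/mcp.md": [0.9, 0.1, 0.0],
    "notes/docker.md": [0.1, 0.9, 0.0],
    "notes/recipes.md": [0.0, 0.1, 0.9],
}

hits = search_notes([1.0, 0.0, 0.0], index, top_k=2)
for hit in hits:
    print(hit)
```

Returning only the top_k chunks with their source paths is what produces the token savings: the model receives a few relevant passages and can fetch a full document through the notes://{path} resource when needed.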

12. MCP ecosystem and roadmap 2026

12.1 Reference servers

  • mcp-server-filesystem, github, brave-search, postgres, slack.

Spec: modelcontextprotocol.io; Python SDK: github.com/modelcontextprotocol/python-sdk.

12.2 2026 trends

  • Native MCP in all major AI IDEs.
  • AAIF certification, OAuth 2.1, granular tool ACL.
  • Audit logs for enterprise compliance.

12.3 Next steps

13. Summary: from laptop stdio to a production node

We covered the full stack: protocol → env → Hello World → Tools/Resources/Prompts → HTTP → Inspector → Docker → knowledge base case study. MCP is the de facto standard for AI tooling in 2026.

The limits of local stdio are obvious: closing the lid drops the connection; the embedding model and vector index eat RAM; several MCP servers compete for CPU, and P99 tools/call latency degrades. A production setup (vector RAG, remote HTTP, long Agent sessions) on an always-on Apple Silicon node gives predictable latency: Unified Memory for the in-process vector DB, launchd for process supervision, and the same toolchain as Cursor/Claude Desktop.

SFTPMAC remote Mac rental is a 7×24 Apple Silicon host for MCP Servers and Agent pipelines: low-latency HTTP+SSE, native Python/Node, and SFTP/rsync for syncing notes and config. It beats running "a home Mac as a prod server". Which tool will you write first? Start with Hello World today.