MCP server in Python or TypeScript?

Python with mcp/FastMCP is the fastest path for backend and AI work; TypeScript with @modelcontextprotocol/sdk suits Node and full-stack teams. Both SDKs are official, and the wire format is identical.

stdio or HTTP+SSE?

stdio: latency under 1 ms, a single client, local development. HTTP+SSE: 10–200 ms RTT, multi-tenant, Bearer auth, production on a remote host.

Tools or Resources?

Tools are actions with side effects (search, calc, write). Resources are read-only, URI-addressable data providers (config, files).

Local or on a remote Mac?

Experiments: stdio on a laptop. Production with vector search, an embedding pipeline, and a P99 latency budget: an always-on (24/7) Apple Silicon remote Mac with Unified Memory.

Building an MCP Server from Scratch: A Complete Developer Guide

An LLM cannot query your PostgreSQL directly, read /etc/app.conf, or call an internal REST API. The reason is not a "dumb model" but the lack of a standardized tool channel. Model Context Protocol (MCP) is an open JSON-RPC protocol from Anthropic: Claude, Cursor, and GPT-based clients connect to external capabilities through a single wire format. This guide covers the full pipeline: Tools, Resources, Prompts, HTTP transport, latency profiling, and production on Apple Silicon with Unified Memory.

1. Three bottlenecks: why AI needs an MCP Server

We pin down the problems before writing the first line of code:

Tool silos: Function Calling (OpenAI proprietary), Plugins (walled garden), LangChain Tools (framework lock-in). Switching providers means rewriting the integration layer: the classic N×M problem.
Data unreachable: training cutoff, no live config stream, no internal docs in context; the model has no controlled data plane.
Actions not executable: pure chat cannot send HTTP requests, write files, or run SQL; it needs a standardized tool surface.

If you have already read "MCP as the HTTP of the AI era", this guide goes straight to implementation. Target audience: backend and AI engineers working in Python or TypeScript.

2. Что такое MCP: wire protocol и architecture

2.1 Evolution stack

Function Calling → Plugins → MCP (Nov 2024, Anthropic, open spec). One server implementation serves multiple clients (Cursor, Claude Desktop, VS Code Copilot, Gemini CLI). AAIF governance, 10,000+ registered servers in 2026.

2.2 Topology: Client ↔ Server ↔ triad

┌────────────────────┐         ┌─────────────────────┐
│   MCP Client       │ ◄─────► │   MCP Server        │
│  (Claude / Cursor) │  JSON   │  (your code)        │
│                    │  -RPC   │                     │
└────────────────────┘         └─────────────────────┘
                                        │
                          ┌─────────────┼─────────────┐
                          ▼             ▼             ▼
                       Tools       Resources      Prompts
                    (mutations)  (read-only)   (templates)

Client: model-side runtime, orchestrates tools/call.
Server: exposes capability surface.
Tools: side-effect functions — search, calc, DB write path.
Resources: URI-addressable read streams — zero mutation guarantee.
Prompts: injectable multi-turn templates.

2.3 Transport layer

JSON-RPC 2.0 over:

stdio: subprocess pipe, typical round-trip <1 ms on same machine — zero TCP overhead.
HTTP + SSE / Streamable HTTP: remote, concurrent clients, measurable RTT 10–200 ms depending on region.

Lifecycle: initialize (protocol version and capability negotiation) → notifications/initialized → discovery (tools/list, resources/list) → tools/call hot path → shutdown.
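To make the lifecycle concrete, here is a minimal sketch of the JSON-RPC 2.0 messages a client sends, in order. The protocol version string and the clientInfo values are assumptions chosen for illustration; a real client fills them in from its own SDK.

```python
import json
from typing import Optional


def request(msg_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request; the response echoes the same id."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method: str) -> dict:
    """Build a JSON-RPC 2.0 notification (no id, so no response is expected)."""
    return {"jsonrpc": "2.0", "method": method}


# Messages a client sends, in lifecycle order.
handshake = [
    request(1, "initialize", {
        "protocolVersion": "2025-06-18",  # assumed spec revision
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
    }),
    notification("notifications/initialized"),
    request(2, "tools/list"),
    request(3, "tools/call", {"name": "say_hello", "arguments": {"name": "dev"}}),
]

if __name__ == "__main__":
    for message in handshake:
        print(json.dumps(message))
```

Over stdio each message travels as one line of JSON, which is why a stray print() to stdout inside a server can corrupt the stream.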

2.4 Decision matrix: MCP vs alternatives

Dimension	MCP	OpenAI FC	LangChain Tools
Wire standard	Open JSON-RPC	Vendor-specific	Framework-bound
Transport	stdio / HTTP	HTTP only	HTTP
Cross-model	Claude, GPT, Gemini	OpenAI only	Partial
Resources/Prompts	First-class	N/A	N/A
Self-host perf control	Full (CPU/RAM tuning)	Cloud-bound	Variable

3. Dev environment and project layout

3.1 Runtime choice

Python: mcp + FastMCP, minimal boilerplate, asyncio-native.
TypeScript: @modelcontextprotocol/sdk, if the stack already runs on Node/Bun.

3.2 Bootstrap

python -m venv .venv && source .venv/bin/activate
pip install mcp httpx pydantic

npm init -y && npm install @modelcontextprotocol/sdk

3.3 Layout

my-mcp-server/
├── server.py
├── tools/
├── resources/
├── prompts/
├── tests/
└── pyproject.toml

3.4 Debug toolchain

MCP Inspector: interactive tools/call replay.
Claude Desktop: claude_desktop_config.json.
Cursor: .cursor/mcp.json.

4. First MCP Server: Hello World

4.1 Minimal server

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-first-server")

@mcp.tool()
def say_hello(name: str) -> str:
    """Greet by name"""
    return f"Hello, {name}! First MCP tool invocation."

if __name__ == "__main__":
    mcp.run()

4.2 Verify

python server.py
npx @modelcontextprotocol/inspector python server.py

Inspector: tools/list → say_hello; tools/call with {"name": "dev"}. Baseline latency over stdio is typically under 5 ms.

4.3 Client wiring

{
  "mcpServers": {
    "my-first-server": {
      "command": "python",
      "args": ["/absolute/path/to/server.py"]
    }
  }
}

Use absolute paths only: relative paths are the #1 cause of flaky stdio reconnects.
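One way to avoid hand-typed paths is to generate the entry. The sketch below builds the mcpServers block with absolute paths; the helper name build_client_config is hypothetical, and it assumes you run it with the same interpreter (for example the venv one) that should launch the server.

```python
import json
import sys
from pathlib import Path


def build_client_config(server_name: str, script: str) -> dict:
    """Return an mcpServers entry with an absolute interpreter and script path."""
    script_path = Path(script).expanduser().resolve()
    return {
        "mcpServers": {
            server_name: {
                "command": sys.executable,  # interpreter running this script
                "args": [str(script_path)],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_client_config("my-first-server", "server.py"), indent=2))
```

Paste the printed JSON into claude_desktop_config.json or .cursor/mcp.json and restart the client.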

5. Tools: functions on the tools/call hot path

5.1 Schema generation

The SDK derives the JSON Schema automatically from the function signature and docstring. Naming: snake_case, verb-first (search_web, read_file).
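As an illustration of what this generation step amounts to, here is a simplified sketch built on inspect. It is not the SDK's implementation: the helper sketch_tool_descriptor and its small type map are assumptions, and the real SDK covers far more types through Pydantic.

```python
import inspect
from typing import Any, Callable

# Simplified mapping; anything unknown falls back to "string" in this sketch.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_descriptor(fn: Callable[..., Any]) -> dict:
    """Build a tools/list-style entry from a function's signature and docstring."""
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the argument is mandatory
        else:
            properties[name]["default"] = param.default
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def search_web(query: str, max_results: int = 5) -> list:
    """Search the web and return matching results."""
    return []
```

Calling sketch_tool_descriptor(search_web) marks query as required and records the default for max_results, which is the information a client needs to build a valid tools/call.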

5.2 Pydantic input

from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="Search query")
    max_results: int = Field(default=5)
    language: str = Field(default="ru")

@mcp.tool()
def search_web(input: SearchInput) -> list[dict]:  # verb-first, per the naming rule in 5.1
    ...

5.3 Five production-grade tools

Calculator: safe eval subset, no arbitrary code exec.
File I/O: chroot whitelist, block path traversal.
HTTP: httpx, timeout 30 s, connection pool reuse.
DB: read-only SELECT, parameterized — zero DDL.
Time: UTC datetime.now(timezone.utc).
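For the calculator, a "safe eval subset" can be sketched by walking the AST and allowing only numeric literals and four operators. The function name safe_calculate is hypothetical, and the operator set is an assumption; exponentiation is left out on purpose because huge powers are an easy denial-of-service vector.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / on numeric literals; reject everything else."""
    def evaluate(node: ast.AST) -> float:
        # type(...) check (not isinstance) so that True/False are rejected
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)
```

Names, calls, and attribute access never reach eval(), so an input such as __import__('os') fails with ValueError instead of executing.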

5.4 Async tools (I/O-bound)

import httpx

@mcp.tool()
async def fetch_url(url: str) -> str:
    async with httpx.AsyncClient(timeout=30.0, limits=httpx.Limits(max_connections=20)) as client:
        r = await client.get(url)
        r.raise_for_status()
        return r.text

The async path is critical under concurrent tools/call: a synchronous tool blocks the event loop, and P99 latency spikes.
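When a dependency only offers a synchronous API, one common workaround is to push the call to a worker thread. This is a sketch under assumptions: blocking_lookup is a hypothetical stand-in for a sync driver call, and the 0.1 s sleep simulates its I/O wait.

```python
import asyncio
import time


def blocking_lookup(key: str) -> str:
    """Stand-in for a synchronous driver call (hypothetical)."""
    time.sleep(0.1)
    return key.upper()


async def lookup(key: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_lookup, key)


async def main() -> list:
    # The three calls overlap instead of running back to back.
    return await asyncio.gather(lookup("a"), lookup("b"), lookup("c"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

asyncio.to_thread helps with I/O-bound waits; CPU-bound work still contends for the GIL and is better moved to a separate process.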

5.5 Error contract

Structured JSON errors, not stack traces to client.
Hard timeout on external I/O (≤30 s).
Least-privilege file/DB access.
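The first two rules can be combined in one wrapper. The sketch below is an assumption about how such a contract might look: the helper run_with_contract and the field names ok, error, code, and message are illustrative, not part of the MCP spec or the SDK.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable


async def run_with_contract(
    call: Callable[[], Awaitable[Any]], timeout_s: float = 30.0
) -> str:
    """Await a tool body and always return a JSON string, never a traceback."""
    try:
        result = await asyncio.wait_for(call(), timeout=timeout_s)
        return json.dumps({"ok": True, "result": result})
    except asyncio.TimeoutError:
        return json.dumps({"ok": False, "error": {
            "code": "timeout", "message": f"no result within {timeout_s} s"}})
    except Exception as exc:
        # Expose only the exception type; keep details in server-side logs.
        return json.dumps({"ok": False, "error": {
            "code": "internal_error", "message": type(exc).__name__}})
```

Inside an async tool you would wrap the I/O part, for example `return await run_with_contract(lambda: fetch(url))`, where fetch is your own coroutine function.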

6. Resources: read-only data plane

6.1 Semantics

Resource = data provider, Tool = action executor. resources/read — idempotent, no side effects.

6.2 Static + dynamic

import json

@mcp.resource("config://app-settings")
def get_app_settings() -> str:
    return json.dumps({"version": "1.0", "env": "production"})

@mcp.resource("user://{user_id}/profile")
def get_user_profile(user_id: str) -> str:
    return json.dumps(db.query_user(user_id))
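To show how a templated URI such as user://{user_id}/profile can map onto function arguments, here is a simplified matcher. It is a sketch of the idea only, not the SDK's routing code; compile_uri_template and match_uri are hypothetical names, and parameters are assumed not to contain a slash.

```python
import re
from typing import Optional


def compile_uri_template(template: str) -> "re.Pattern[str]":
    """Turn 'user://{user_id}/profile' into a regex with named groups."""
    parts = re.split(r"\{(\w+)\}", template)
    # Even indexes are literal text, odd indexes are parameter names.
    pattern = "".join(
        f"(?P<{part}>[^/]+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")


def match_uri(template: str, uri: str) -> Optional[dict]:
    """Return the extracted parameters, or None if the URI does not match."""
    found = compile_uri_template(template).match(uri)
    return found.groupdict() if found else None
```

Because extracted values come straight from the client, treat them like any other untrusted input before passing them to a database query.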

6.3 Filesystem resource server

List dir → read file → optional watchfiles for resource update notifications. Root whitelist mandatory.
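The root whitelist can be enforced with a small resolver that runs before any file access. This is a minimal sketch: resolve_inside_root and PathNotAllowed are hypothetical names, and it assumes a single allowed root directory.

```python
from pathlib import Path


class PathNotAllowed(PermissionError):
    """Raised when a requested path escapes the allowed root."""


def resolve_inside_root(root: Path, requested: str) -> Path:
    """Resolve a client-supplied path and refuse anything outside root."""
    root = root.resolve()
    candidate = (root / requested).resolve()  # collapses '..' and follows symlinks
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PathNotAllowed(requested)
    return candidate
```

Resolving before the check matters: comparing raw strings would let notes/../../etc/passwd through, and an absolute path in requested would silently replace the root.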

7. Prompts: reusable injection templates

from mcp.types import PromptMessage, TextContent

@mcp.prompt()
def code_review_prompt(language: str, code: str) -> list[PromptMessage]:
    return [
        PromptMessage(
            role="user",
            content=TextContent(
                type="text",
                text=f"""Code review {language}:
1. Correctness
2. Security surface
3. Hot path performance

```{language}
{code}
```"""
            )
        )
    ]

8. HTTP transport: remote MCP Server

8.1 stdio vs HTTP+SSE

Metric	stdio	HTTP + SSE
Deploy	Local subprocess	Remote host
Latency	<1 ms RTT	10–200 ms (network)
Concurrent clients	1	N (connection pool)
Throughput ceiling	Single pipe	Horizontal scale + load balancer
Use case	Local dev	7×24 production, team shared

8.2 HTTP implementation

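# In the official Python SDK (1.x), host and port are FastMCP settings; the transport is selected in run()
mcp = FastMCP("remote-server", host="0.0.0.0", port=8000)
if __name__ == "__main__":
    mcp.run(transport="streamable-http")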

8.3 Auth hardening

Bearer Token middleware.
IP allowlist for internal clients.
Rate limit ~10 tools/call/s per API key.
CORS strict origin.
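Two of these controls fit in a few lines of standard-library code. The sketch below shows a constant-time Bearer check and a per-key token bucket; the names bearer_token_ok and TokenBucket are hypothetical, and wiring them into your HTTP framework's middleware is left out because that part depends on the framework.

```python
import hmac
import time
from typing import Optional


def bearer_token_ok(authorization_header: str, expected_token: str) -> bool:
    """Check 'Bearer <token>' using a constant-time comparison."""
    scheme, _, token = authorization_header.partition(" ")
    return scheme.lower() == "bearer" and hmac.compare_digest(
        token.encode(), expected_token.encode()
    )


class TokenBucket:
    """Allow `rate` calls per second per key, with bursts up to `capacity`."""

    def __init__(self, rate: float = 10.0, capacity: float = 10.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self._state = {}  # key -> (tokens, last_seen)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self._state.get(key, (self.capacity, now))
        # Refill in proportion to the time elapsed since the last call.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        self._state[key] = (tokens - 1.0 if allowed else tokens, now)
        return allowed
```

The bucket state lives in process memory, so with several server replicas behind a load balancer each replica enforces its own limit; a shared store would be needed for a global one.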

9. Debug, tests, failure modes

9.1 Inspector workflow

npx @modelcontextprotocol/inspector python server.py
Inspect tools/resources/prompts lists.
Manual tools/call, validate JSON wire.
Chaos: timeout, bad params.

9.2 Unit test

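import pytest
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Requires the pytest-asyncio plugin; assumes server.py exposes a "calculate" tool.
@pytest.mark.asyncio
async def test_calculator_tool():
    # Spawn the server as a subprocess and talk to it over stdio
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("calculate", {"expression": "2 + 2"})
            assert result.content[0].text == "4"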

9.3 Failure matrix

Symptom	Root cause	Fix
Tool missing in client	Bad path, no restart	Absolute path, restart Cursor
JSON serialize fail	datetime in response	Serialize to str/dict
Timeout disconnect	Sync blocking I/O	async + split long jobs
Permission denied	Path outside whitelist	Configure root dir
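For the serialization row, a fallback handler passed to json.dumps is usually enough. A minimal sketch, assuming datetime, date, and Decimal are the offending types; to_jsonable and dump_tool_result are hypothetical helper names.

```python
import json
from datetime import date, datetime, timezone
from decimal import Decimal
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Fallback for json.dumps: convert types JSON cannot represent."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)  # string keeps exact precision
    raise TypeError(f"not JSON serializable: {type(value).__name__}")


def dump_tool_result(payload: dict) -> str:
    """Serialize a tool result, applying the fallback only where needed."""
    return json.dumps(payload, default=to_jsonable)


if __name__ == "__main__":
    print(dump_tool_result({"at": datetime.now(timezone.utc), "price": Decimal("9.90")}))
```

Raising TypeError for unknown types keeps the failure loud, so a new unsupported type shows up in tests rather than as a silently wrong value.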

10. Production: Docker, hosting, observability

10.1 Dockerfile

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "server.py"]

10.2 Hosting options 2026

Railway/Render: $5–20/mo, cold start penalty.
Cloud Run/Lambda: per-invocation billing, latency spikes on cold.
Remote Mac Apple Silicon: M-series Unified Memory of 16–64 GB runs embedding + ChromaDB + MCP server without swap thrashing; launchd keepalive; Neural Engine offload for local embedding models.

10.3 Observability stack

Structured log per tools/call: tool_name, duration_ms, status.
Prometheus: call_rate, P50/P99 latency, error_ratio.
GET /health + protocol version header.
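The per-call log line and the latency percentiles can be prototyped before a metrics backend is in place. This sketch covers synchronous tools only and keeps samples in a plain list; the decorator name observed and the buffer durations_ms are assumptions, and in production you would export to Prometheus instead.

```python
import functools
import json
import logging
import statistics
import time
from typing import Any, Callable

logger = logging.getLogger("mcp.tools")
durations_ms = []  # in-memory sample buffer for this sketch


def observed(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Log tool_name, duration_ms and status for each call of a sync tool."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        status = "ok"
        try:
            return fn(*args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            elapsed = (time.perf_counter() - start) * 1000
            durations_ms.append(elapsed)
            logger.info(json.dumps({"tool_name": fn.__name__,
                                    "duration_ms": round(elapsed, 3),
                                    "status": status}))
    return wrapper


def latency_percentiles(samples: list) -> dict:
    """Return P50 and P99 from raw samples (needs at least two values)."""
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points
    return {"p50": cuts[49], "p99": cuts[98]}
```

Place @observed below @mcp.tool() so the registered function is the wrapped one; functools.wraps preserves the signature that schema generation inspects.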

10.4 Performance note: Apple Silicon

Vector search on an M4 with 24 GB Unified Memory: a ChromaDB in-process query takes ~15–40 ms on 100K chunks (the benchmark depends on embedding dimension). On an Intel laptop with 16 GB RAM plus swap, P99 can reach seconds. Metal-accelerated frameworks (Core ML embeddings) on Apple Silicon give 2–4× throughput vs CPU-only x86 for batch embedding.

11. Case study: personal knowledge base MCP Server

11.1 Requirements

Semantic search по Markdown notes.
Create/update notes (whitelist path).
Token budget: do not pull the whole corpus into context.

11.2 Stack

ChromaDB / Qdrant embedded.
Embeddings: text-embedding-3-small or a local nomic-embed-text on the Neural Engine.
watchfiles → incremental reindex.
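Incremental reindexing needs a way to decide which notes actually changed. The sketch below uses content hashes rather than watchfiles events, which is a swapped-in technique chosen so the example runs on the standard library alone; plan_reindex is a hypothetical helper that a watcher callback or a periodic job could call.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used as a change marker."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_reindex(notes_dir: Path, known: dict) -> tuple:
    """Compare current *.md files with the stored digests.

    Returns (changed_or_new, deleted) as POSIX-style paths relative to
    notes_dir and updates `known` in place.
    """
    current = {
        p.relative_to(notes_dir).as_posix(): file_digest(p)
        for p in sorted(notes_dir.rglob("*.md"))
    }
    changed = [rel for rel, digest in current.items() if known.get(rel) != digest]
    deleted = [rel for rel in known if rel not in current]
    known.clear()
    known.update(current)
    return changed, deleted
```

Only the paths in the first list need to be chunked and embedded again; the second list tells you which vectors to drop from the index.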

11.3 Core tools

index_notes: scan, chunk, embed, upsert.
search_notes: Top-K cosine similarity + source path.
write_note: atomic write inside the whitelist.
notes://{path} resource: full doc read.
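The ranking step inside search_notes can be illustrated in pure Python. This is a brute-force sketch for clarity: a vector DB such as ChromaDB or Qdrant does the same job with an index, and the dict-based index layout here (source path mapped to one vector) is an assumption.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list, index: dict, k: int = 3) -> list:
    """Return the k (source_path, score) pairs most similar to the query."""
    scored = [(path, cosine(query, vec)) for path, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    demo_index = {"a.md": [1.0, 0.0], "b.md": [0.0, 1.0], "c.md": [1.0, 1.0]}
    print(top_k([1.0, 0.0], demo_index, k=2))
```

Returning the source path with each score lets the client follow up with a notes://{path} read for only the documents it needs.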

Token savings vs full corpus injection: 90%+. The search_notes hot path on an Apple Silicon remote Mac is typically under 200 ms end-to-end with a warm index.

12. MCP ecosystem and roadmap 2026

12.1 Reference servers

mcp-server-filesystem, github, brave-search, postgres, slack.

Spec: modelcontextprotocol.io; Python SDK: github.com/modelcontextprotocol/python-sdk.

12.2 2026 trends

Native MCP in all major AI IDEs.
AAIF certification, OAuth 2.1, granular tool ACL.
Audit logs for enterprise compliance.

12.3 Next steps

Read full MCP spec.
Ship an open-source server on GitHub.
MCP + Agent (Cursor Agent Skills).

13. Summary: from laptop stdio to a production node

We covered the full stack: protocol → env → Hello World → Tools/Resources/Prompts → HTTP → Inspector → Docker → knowledge base case study. MCP is the de facto standard for AI tooling in 2026.

The limits of local stdio are obvious: closing the lid disconnects the client; the embedding model and vector index eat RAM; several MCP servers compete for CPU, and tools/call P99 degrades. A production setup (vector RAG, remote HTTP, long Agent sessions) on an always-on Apple Silicon node gives predictable latency: Unified Memory for the in-process vector DB, launchd for process supervision, and the same toolchain as Cursor and Claude Desktop.

SFTPMAC remote Mac rental provides a 24/7 Apple Silicon host for MCP Servers and Agent pipelines: low-latency HTTP+SSE, native Python/Node, and SFTP/rsync for syncing notes and config. It beats running a home Mac as a production server. Which tool will you write first? Start with Hello World today.