Building an MCP Server from Scratch: A Complete Developer Guide
An LLM cannot directly query your PostgreSQL, read /etc/app.conf, or call an internal REST API. The reason is not that the model is "dumb" but that there is no standardized tool channel. Model Context Protocol (MCP) is an open, JSON-RPC-based protocol from Anthropic: Claude, Cursor, and GPT-based clients connect to external capabilities through a single wire format. This guide walks the full pipeline: Tools, Resources, Prompts, HTTP transport, latency profiling, and production deployment on Apple Silicon with Unified Memory.
1. Three bottlenecks: why AI needs an MCP Server
Pin down the problems before the first line of code:
- Tool silos: Function Calling (OpenAI proprietary), Plugins (walled garden), LangChain Tools (framework lock-in). Switching providers means rewriting the integration layer: the classic N×M problem, with N clients each wired by hand to M tools.
- Data unreachable: training cutoff, no live config stream, no internal docs in context; there is no controlled data plane.
- Actions not executable: pure chat cannot send HTTP requests, write files, or run SQL; it needs a standardized tool surface.
If you have already read "MCP as the HTTP of the AI era", this guide goes straight to implementation. Target audience: backend/AI engineers working in Python or TypeScript.
2. What MCP is: wire protocol and architecture
2.1 Evolution stack
Function Calling → Plugins → MCP (Nov 2024, Anthropic, open spec). One server implementation, multiple clients (Cursor, Claude Desktop, VS Code Copilot, Gemini CLI). AAIF governance, 10,000+ registered servers in 2026.
2.2 Topology: Client ↔ Server ↔ triad
┌────────────────────┐         ┌─────────────────────┐
│ MCP Client         │ ◄─────► │ MCP Server          │
│ (Claude / Cursor)  │  JSON   │ (your code)         │
│                    │  -RPC   │                     │
└────────────────────┘         └─────────────────────┘
                                          │
                            ┌─────────────┼─────────────┐
                            ▼             ▼             ▼
                          Tools       Resources      Prompts
                       (mutations)   (read-only)   (templates)
- Client: model-side runtime, orchestrates tools/call.
- Server: exposes capability surface.
- Tools: side-effect functions — search, calc, DB write path.
- Resources: URI-addressable read streams — zero mutation guarantee.
- Prompts: injectable multi-turn templates.
2.3 Transport layer
JSON-RPC 2.0 over:
- stdio: subprocess pipe, typical round-trip <1 ms on same machine — zero TCP overhead.
- HTTP + SSE / Streamable HTTP: remote, concurrent clients, measurable RTT 10–200 ms depending on region.
Lifecycle: initialize (protocol version and capability negotiation) → notifications/initialized → discovery (tools/list, resources/list) → tools/call hot path → shutdown.
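To make the lifecycle concrete, here is a minimal sketch of the JSON-RPC 2.0 messages a client might send, built with the standard library only. The helper names, the client name, and the chosen protocolVersion string are illustrative assumptions; the say_hello tool is the one defined in the Hello World section below.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # request ids must be unique within a session

def request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request; the response carries the same id."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

def notification(method: str) -> dict:
    """Build a JSON-RPC 2.0 notification; no id, so no response is expected."""
    return {"jsonrpc": "2.0", "method": method}

handshake = [
    request("initialize", {
        "protocolVersion": "2024-11-05",  # assumed version string for this sketch
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
    }),
    notification("notifications/initialized"),
    request("tools/list"),
    request("tools/call", {"name": "say_hello", "arguments": {"name": "dev"}}),
]

for msg in handshake:
    print(json.dumps(msg))
```

Over stdio each message travels as one line of JSON; an SDK normally builds these for you, so the sketch is only meant to show what is on the wire.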
2.4 Decision matrix: MCP vs alternatives
| Dimension | MCP | OpenAI FC | LangChain Tools |
|---|---|---|---|
| Wire standard | Open JSON-RPC | Vendor-specific | Framework-bound |
| Transport | stdio / HTTP | HTTP only | HTTP |
| Cross-model | Claude, GPT, Gemini | OpenAI only | Partial |
| Resources/Prompts | First-class | N/A | N/A |
| Self-host perf control | Full (CPU/RAM tuning) | Cloud-bound | Variable |
3. Dev environment and project layout
3.1 Runtime choice
- Python: mcp + FastMCP, minimal boilerplate, asyncio-native.
- TypeScript: @modelcontextprotocol/sdk, if the stack is already on Node/Bun.
3.2 Bootstrap
python -m venv .venv && source .venv/bin/activate
pip install mcp httpx pydantic
npm init -y && npm install @modelcontextprotocol/sdk
3.3 Layout
my-mcp-server/
├── server.py
├── tools/
├── resources/
├── prompts/
├── tests/
└── pyproject.toml
3.4 Debug toolchain
- MCP Inspector: interactive tools/call replay.
- Claude Desktop: claude_desktop_config.json.
- Cursor: .cursor/mcp.json.
4. First MCP Server: Hello World
4.1 Minimal server
from mcp.server.fastmcp import FastMCP

# The server name is what clients show in their server lists.
mcp = FastMCP("my-first-server")

@mcp.tool()
def say_hello(name: str) -> str:
    """Greet by name"""
    return f"Hello, {name}! First MCP tool invocation."

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
4.2 Verify
python server.py
npx @modelcontextprotocol/inspector python server.py
Inspector: tools/list → say_hello; tools/call with {"name": "dev"}; baseline latency is typically <5 ms over stdio.
4.3 Client wiring
{
  "mcpServers": {
    "my-first-server": {
      "command": "python",
      "args": ["/absolute/path/to/server.py"]
    }
  }
}
Use absolute paths only: the client launches the server from its own working directory, so relative paths are a leading cause of flaky stdio reconnects.
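One way to avoid hand-typed paths is to generate the entry from the project directory. The sketch below is a convenience under stated assumptions: the client_config helper is hypothetical, and using sys.executable as the command (so the client starts the same virtualenv interpreter) is a design choice of this example, not something the config format requires.

```python
import json
import sys
from pathlib import Path

def client_config(server_name: str, script: str) -> dict:
    """Build an mcpServers entry with an absolute script path."""
    return {
        "mcpServers": {
            server_name: {
                "command": sys.executable,              # interpreter running this script
                "args": [str(Path(script).resolve())],  # absolute regardless of cwd
            }
        }
    }

config = client_config("my-first-server", "server.py")
print(json.dumps(config, indent=2))
```

Run it from the directory that contains server.py and paste the output into the client's config file.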
5. Tools: functions on the tools/call hot path
5.1 Schema generation
Signature + docstring → JSON Schema auto. Naming: snake_case, verb-first (search_web, read_file).
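The idea of deriving a schema from a signature can be illustrated with the standard library alone. This is a simplified sketch, not the SDK's actual implementation: sketch_input_schema and its type table are hypothetical, and real schema generation covers many more types (optionals, lists, nested models).

```python
import inspect
from typing import get_type_hints

# Deliberately small mapping; unknown annotations fall back to "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def sketch_input_schema(func) -> dict:
    """Derive a minimal tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        prop = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)          # no default means the argument is required
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }

def search_web(query: str, max_results: int = 5) -> list:
    """Search the web and return the top results."""
    return []

schema = sketch_input_schema(search_web)
```

The takeaway for tool authors: precise type hints, sensible defaults, and a clear docstring are what the model ends up seeing.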
5.2 Pydantic input
from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="Search query")
    max_results: int = Field(default=5)
    language: str = Field(default="ru")

# `mcp` is the FastMCP instance created in section 4.1.
@mcp.tool()
def web_search(input: SearchInput) -> list[dict]:
    ...
5.3 Five production-grade tools
- Calculator: safe eval subset, no arbitrary code exec.
- File I/O: chroot whitelist, block path traversal.
- HTTP: httpx, timeout 30 s, connection pool reuse.
- DB: read-only SELECT, parameterized, zero DDL.
- Time: UTC datetime.now(timezone.utc).
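For the Calculator item, one common way to get a "safe eval subset" is to walk the parsed AST and allow only arithmetic nodes. The sketch below assumes that approach; safe_calculate, the operator whitelist, and the size limits are illustrative choices, not part of any SDK.

```python
import ast
import operator
from typing import Union

_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_calculate(expression: str) -> Union[int, float]:
    """Evaluate +, -, *, /, %, ** on numeric literals; reject everything else."""
    def _eval(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            left, right = _eval(node.left), _eval(node.right)
            # Guard against huge powers that would burn CPU and memory.
            if isinstance(node.op, ast.Pow) and (abs(left) > 10**6 or abs(right) > 100):
                raise ValueError("power operands too large")
            return _BIN_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attributes, subscripts, etc. all end up here.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    if len(expression) > 200:
        raise ValueError("expression too long")
    return _eval(ast.parse(expression, mode="eval").body)
```

Malformed input raises SyntaxError from ast.parse, and arithmetic errors such as ZeroDivisionError propagate, so the tool wrapper should still translate exceptions into a clean error for the client.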
5.4 Async tools (I/O-bound)
import httpx

@mcp.tool()
async def fetch_url(url: str) -> str:
    # Hard 30 s timeout; at most 20 connections for this client.
    async with httpx.AsyncClient(timeout=30.0, limits=httpx.Limits(max_connections=20)) as client:
        r = await client.get(url)
        r.raise_for_status()  # turn 4xx/5xx responses into exceptions
        return r.text
The async path is critical under concurrent tools/call: a sync call blocks the event loop and P99 latency spikes.
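The effect is easy to reproduce without any MCP code. In this small experiment (the tool names and delays are made up for illustration), five concurrent calls that use a blocking sleep run one after another, while five that await run side by side.

```python
import asyncio
import time

async def blocking_tool(delay: float) -> None:
    time.sleep(delay)           # sync call: holds the event loop for the whole delay

async def async_tool(delay: float) -> None:
    await asyncio.sleep(delay)  # yields to the loop so other calls can progress

async def timed(tool, n: int, delay: float) -> float:
    """Run n concurrent calls and return the total wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(tool(delay) for _ in range(n)))
    return time.perf_counter() - start

blocking_s = asyncio.run(timed(blocking_tool, 5, 0.05))
async_s = asyncio.run(timed(async_tool, 5, 0.05))
print(f"blocking: {blocking_s:.2f} s, async: {async_s:.2f} s")
```

The blocking variant takes roughly the sum of all delays, the async variant roughly the longest single delay. For unavoidable sync libraries, asyncio.to_thread is one way to keep the loop free.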
5.5 Error contract
- Structured JSON errors, not stack traces to client.
- Hard timeout on external I/O (≤30 s).
- Least-privilege file/DB access.
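The first two rules can be combined in one wrapper. This is a minimal sketch under assumptions: call_with_contract and the {"ok", "error": {"code", "message"}} envelope are hypothetical conventions for this example, not a shape mandated by the MCP spec.

```python
import asyncio
import json
import logging

logger = logging.getLogger("mcp.tools")

async def call_with_contract(tool, *args, timeout_s: float = 30.0, **kwargs) -> str:
    """Run an async tool under a hard timeout and always return a JSON string."""
    try:
        result = await asyncio.wait_for(tool(*args, **kwargs), timeout=timeout_s)
        return json.dumps({"ok": True, "result": result})
    except asyncio.TimeoutError:
        return json.dumps({"ok": False, "error": {
            "code": "timeout", "message": f"tool exceeded {timeout_s} s"}})
    except Exception as exc:
        # The full traceback goes to the server log; the client sees only the type.
        logger.exception("tool %s failed", getattr(tool, "__name__", tool))
        return json.dumps({"ok": False, "error": {
            "code": "internal_error", "message": type(exc).__name__}})

async def slow_tool() -> str:
    await asyncio.sleep(1.0)
    return "done"

reply = json.loads(asyncio.run(call_with_contract(slow_tool, timeout_s=0.05)))
print(reply)
```

Keeping the exception message out of the reply avoids leaking paths, SQL, or credentials that sometimes end up in error text.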
6. Resources: read-only data plane
6.1 Semantics
Resource = data provider, Tool = action executor. resources/read — idempotent, no side effects.
6.2 Static + dynamic
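import json

# Static resource: fixed URI, same payload on every read.
@mcp.resource("config://app-settings")
def get_app_settings() -> str:
    return json.dumps({"version": "1.0", "env": "production"})

# Dynamic resource: {user_id} from the URI template is passed as an argument.
@mcp.resource("user://{user_id}/profile")
def get_user_profile(user_id: str) -> str:
    # `db` is the application's data-access object, defined elsewhere.
    return json.dumps(db.query_user(user_id))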
6.3 Filesystem resource server
List dir → read file → optional watchfiles for resource update notifications. Root whitelist mandatory.
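The root whitelist is the part most worth getting right. A minimal sketch with pathlib, assuming a single allowed root; resolve_inside_root and list_resources are hypothetical helper names, and the *.md filter is just an example.

```python
from pathlib import Path

def resolve_inside_root(root: Path, requested: str) -> Path:
    """Return the resolved path if it stays under root, else raise PermissionError."""
    root = root.resolve()
    candidate = (root / requested).resolve()  # collapses ".." and follows symlinks
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes allowed root: {requested}")
    return candidate

def list_resources(root: Path) -> list[str]:
    """List Markdown files under root as relative POSIX-style paths."""
    root = root.resolve()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md") if p.is_file())
```

Resolving before comparing matters: a check on the raw string would miss both "../" sequences and symlinks that point outside the root. An absolute requested path replaces the root in the join and is rejected by the same check.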
7. Prompts: reusable injection templates
from mcp.types import PromptMessage, TextContent
@mcp.prompt()
def code_review_prompt(language: str, code: str) -> list[PromptMessage]:
return [
PromptMessage(
role="user",
content=TextContent(
type="text",
text=f"""Code review {language}:
1. Correctness
2. Security surface
3. Hot path performance
```{language}
{code}
```"""
)
)
]
8. HTTP transport: remote MCP Server
8.1 stdio vs HTTP+SSE
| Metric | stdio | HTTP + SSE |
|---|---|---|
| Deploy | Local subprocess | Remote host |
| Latency | <1 ms RTT | 10–200 ms (network) |
| Concurrent clients | 1 | N (connection pool) |
| Throughput ceiling | Single pipe | Horizontal scale + load balancer |
| Use case | Local dev | 24/7 production, team shared |
8.2 HTTP implementation
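from mcp.server.fastmcp import FastMCP
# Python SDK: host/port are constructor settings; run() selects the transport.
mcp = FastMCP("remote-server", host="0.0.0.0", port=8000)
if __name__ == "__main__":
    mcp.run(transport="streamable-http")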
8.3 Auth hardening
- Bearer Token middleware.
- IP allowlist for internal clients.
- Rate limit ~10 tools/call/s per API key.
- CORS strict origin.
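The bearer check and the per-key rate limit can be sketched framework-free. Everything here is an assumption for illustration: the TokenBucket class, the authorize helper, and the in-memory bucket table (a multi-process deployment would need shared state such as Redis).

```python
import hmac
import time

class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""
    def __init__(self, rate: float = 10.0, capacity: float = 10.0):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

_buckets: dict[str, TokenBucket] = {}  # one bucket per API key, in memory only

def authorize(auth_header: str, valid_keys: set[str]) -> bool:
    """Check an 'Authorization: Bearer <key>' value, then the per-key rate limit."""
    scheme, _, key = auth_header.partition(" ")
    known = any(hmac.compare_digest(key.encode(), k.encode()) for k in valid_keys)
    if scheme != "Bearer" or not known:
        return False
    return _buckets.setdefault(key, TokenBucket()).allow()
```

hmac.compare_digest compares in constant time, which avoids leaking key prefixes through response timing. In a real server this logic would sit in HTTP middleware in front of the MCP endpoint.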
9. Debug, tests, failure modes
9.1 Inspector workflow
- npx @modelcontextprotocol/inspector python server.py
- Inspect tools/resources/prompts lists.
- Manual tools/call, validate JSON wire.
- Chaos: timeout, bad params.
9.2 Unit test
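import pytest
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_calculator_tool():
    # Spawn the server as a subprocess and talk to it over stdio.
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("calculate", {"expression": "2 + 2"})
            assert result.content[0].text == "4"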
9.3 Failure matrix
| Symptom | Root cause | Fix |
|---|---|---|
| Tool missing in client | Bad path, no restart | Absolute path, restart Cursor |
| JSON serialize fail | datetime in response | Serialize to str/dict |
| Timeout disconnect | Sync blocking I/O | async + split long jobs |
| Permission denied | Path outside whitelist | Configure root dir |
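The "JSON serialize fail" row is the easiest to reproduce and fix. A small sketch using the default hook of json.dumps; the to_jsonable helper and the sample row are illustrative.

```python
import json
from datetime import date, datetime, timezone
from decimal import Decimal

def to_jsonable(value):
    """Fallback for json.dumps: convert common non-JSON types to strings."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)
    raise TypeError(f"not JSON serializable: {type(value).__name__}")

row = {"id": 1, "created_at": datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)}
payload = json.dumps(row, default=to_jsonable)
print(payload)
```

Without default=to_jsonable the same json.dumps call raises TypeError, which is what surfaces to the client as a failed tool call.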
10. Production: Docker, hosting, observability
10.1 Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "server.py"]
10.2 Hosting options 2026
- Railway/Render: $5–20/mo, cold start penalty.
- Cloud Run/Lambda: per-invocation billing, latency spikes on cold.
- Remote Mac Apple Silicon: M-series Unified Memory 16–64 GB; embedding + ChromaDB + MCP server without swap thrashing; launchd keepalive; Neural Engine offload for local embed models.
10.3 Observability stack
- Structured log per tools/call: tool_name, duration_ms, status.
- Prometheus: call_rate, P50/P99 latency, error_ratio.
- GET /health + protocol version header.
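The per-call structured log can be added with a decorator. This is a sketch for sync tools only; logged_tool is a hypothetical name, and an async tool would need an async wrapper with the same try/finally shape.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("mcp.calls")

def logged_tool(func):
    """Wrap a sync tool so that each call emits one JSON log line."""
    @functools.wraps(func)  # keeps the name, docstring, and signature visible
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            # Runs on both success and failure, so every call is recorded.
            logger.info(json.dumps({
                "tool_name": func.__name__,
                "duration_ms": round((time.perf_counter() - start) * 1000, 3),
                "status": status,
            }))
    return wrapper
```

The same three fields map directly onto the Prometheus metrics in the list: call rate from the line count, latency percentiles from duration_ms, and error ratio from status.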
10.4 Performance note: Apple Silicon
Vector search on an M4 with 24 GB Unified Memory: ChromaDB in-process query ~15–40 ms on 100K chunks (the benchmark depends on embedding dimension). On an Intel laptop with 16 GB RAM + swap, P99 can stretch into seconds. Metal-accelerated frameworks (Core ML embed) on Apple Silicon give 2–4× throughput vs CPU-only x86 for batch embedding.
11. Case study: personal knowledge base MCP Server
11.1 Requirements
- Semantic search over Markdown notes.
- Create/update notes (whitelist path).
- Token budget: do not drag the whole corpus into context.
11.2 Stack
- ChromaDB / Qdrant embedded.
- Embeddings: text-embedding-3-small or local nomic-embed-text on the Neural Engine.
- watchfiles → incremental reindex.
11.3 Core tools
- index_notes: scan, chunk, embed, upsert.
- search_notes: Top-K cosine similarity + source path.
- write_note: atomic write inside the whitelist.
- notes://{path} resource: full doc read.
Token savings vs full corpus injection: 90%+. The search_notes hot path on a remote Apple Silicon Mac is typically <200 ms end-to-end with a warm index.
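The core of search_notes, Top-K by cosine similarity with the source path attached, fits in a few lines of pure Python. This sketch stands in for what ChromaDB or Qdrant would do at scale; the three-dimensional vectors and note paths are made-up toy data, and real embeddings have hundreds of dimensions.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_notes(query_vec, index, k=3):
    """index: list of (source_path, chunk_text, embedding) tuples."""
    scored = [(cosine(query_vec, emb), path, text) for path, text, emb in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [{"score": round(s, 4), "source": p, "text": t} for s, p, t in scored[:k]]

# Toy index: in the real server these rows come from index_notes.
index = [
    ("notes/mcp.md", "MCP uses JSON-RPC 2.0", [0.9, 0.1, 0.0]),
    ("notes/docker.md", "Expose port 8000", [0.0, 0.2, 0.9]),
    ("notes/tools.md", "Tools run on tools/call", [0.7, 0.6, 0.1]),
]
hits = search_notes([1.0, 0.0, 0.0], index, k=2)
for hit in hits:
    print(hit)
```

Only the K returned chunks reach the model's context, which is where the token savings come from; the notes://{path} resource remains available when the full document is needed.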
12. MCP ecosystem и roadmap 2026
12.1 Reference servers
- mcp-server-filesystem, github, brave-search, postgres, slack.
Spec: modelcontextprotocol.io; Python SDK: github.com/modelcontextprotocol/python-sdk.
12.2 2026 trends
- Native MCP in all major AI IDEs.
- AAIF certification, OAuth 2.1, granular tool ACL.
- Audit logs for enterprise compliance.
12.3 Next steps
- Read full MCP spec.
- Ship an open-source server on GitHub.
- MCP + Agent (Cursor Agent Skills).
13. Summary: from laptop stdio to a production node
We covered the full stack: protocol → env → Hello World → Tools/Resources/Prompts → HTTP → Inspector → Docker → knowledge base case study. MCP is the de facto standard for AI tooling in 2026.
The limits of local stdio are obvious: lid close = disconnect; the embedding model + vector index eat RAM; several MCP servers compete for CPU, and P99 for tools/call degrades. A production setup (vector RAG, remote HTTP, long Agent sessions) on an always-on Apple Silicon node gives predictable latency: Unified Memory for the in-process vector DB, launchd for process supervision, and the same toolchain as Cursor/Claude Desktop.
SFTPMAC remote Mac rental: a 24/7 Apple Silicon host for MCP Server and Agent pipelines, with low-latency HTTP+SSE, native Python/Node, and SFTP/rsync for syncing notes/config. A better fit than a home Mac used as a prod server. Which tool will you write first? Start with Hello World today.