LangGraph or CrewAI for production?

Use LangGraph for complex state transitions and a strict SLA. Use CrewAI for a fast PoC with role-based teams. Migrate to LangGraph before go-live if you need a checkpointer and conditional edges.

How does MCP differ from A2A?

MCP is the vertical connection from an agent to external tools and resources. A2A is horizontal coordination and delegation between agents. The 2026 reference design uses both protocols in a two-layer architecture.

How many agents should you run in production?

Practical ceiling: 3–8 agents. Beyond that you get context pollution, exponential debug cost, and runaway token spend. Extend capabilities through MCP tools rather than by adding more agents.

2026 Multi-Agent AI Architecture: Production Orchestration and Decision Matrix

In 2026 a single LLM agent cannot carry composite business pipelines under load. Google Agent Bake-Off (2025): multi-agent teams reach up to 6× the success rate on composite tasks; AdaptOrch reports a 12–23 % quality gain from adaptive topology. Below is an in-depth breakdown of MAS: core concepts, six orchestration patterns, a LangGraph/CrewAI/AutoGen matrix, the MCP+A2A stack, production engineering, observability, pitfalls, a decision tree, 2026 trends, and a bridge to SFTPMAC Remote Mac.

1. Why a single agent does not survive in production

A single-agent PoC looks convincing. Under real load, four structural limits surface:

Context bottleneck: long history plus tool output fills the 128K window; in a 10-step research pipeline, intermediate results are lost as early as step 7. Measurable: >40 % quality drop without state export between steps.
Diluted expertise: one system prompt covering code review, legal, and analytics means no domain reaches audit-grade depth. Role separation gives a measurable lift.
Serial inefficiency: three independent tasks in a single queue spend 100 % of the parallelizable stretches in idle wait. Fan-out/fan-in cuts P95 latency by 40–60 %.
Single point of failure: one hallucination or failed tool call stalls the whole flow. Supervisor-worker allows per-worker retry without a global abort.

The numbers are not about "more agents is better" but about correct decomposition plus an orchestration layer. Without that, the post-mortem turns into guesswork.

2. MAS: core concepts and 3 control modes

A Multi-Agent System is a set of autonomous agents operating under shared state, a comms protocol, and an orchestration layer. Four principles for stable production:

Role specialization: one agent, one responsibility; system prompt and toolset are strictly scoped.
Tool isolation: agent A gets read-only DB access, agent B gets write-only. Least privilege per role.
State isolation: session keys, checkpointer IDs, and MCP connections are per agent. Otherwise context bleeds between tenants.
Replaceability: worker models are hot-swappable; the supervisor routing contract is immutable.
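
The tool and state isolation principles above can be sketched in a few lines. This is a minimal illustration, not a framework API: AgentSpec, the tool names db.select and db.insert, and the tenant:agent key format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical per-agent contract: one role, a fixed tool allowlist."""
    name: str
    role: str
    allowed_tools: frozenset

    def session_key(self, tenant_id: str) -> str:
        # State isolation: the key is scoped by tenant AND agent, so two
        # agents (or two tenants) never share a session or checkpoint.
        return f"{tenant_id}:{self.name}"

    def authorize(self, tool: str) -> None:
        # Tool isolation: reject anything outside the allowlist.
        if tool not in self.allowed_tools:
            raise PermissionError(f"{self.name} may not call {tool}")


# Assumed example agents: one read-only, one write-only.
reader = AgentSpec("analyst", "read-only analytics", frozenset({"db.select"}))
writer = AgentSpec("ingestor", "write-only ingestion", frozenset({"db.insert"}))

reader.authorize("db.select")  # allowed, returns None
```

Calling reader.authorize("db.insert") raises PermissionError, which is the point: the check lives in the agent contract rather than in the prompt.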

Control mode	Characteristics	Typical scenario
Centralized	One orchestrator dispatches and aggregates	Finance, healthcare: strict audit trail
Decentralized	Peer negotiation and delegation	Brainstorming, exploratory research
Hierarchical	Supervisor → worker → sub-worker	Large-scale codegen, multi-hop research

3. Six orchestration patterns

Over 90 % of production MAS fit into these six. Choose explicitly: an implicit hybrid breaks debugging and capacity planning.

3.1 Sequential pipeline

A → B → C, fixed order. LangGraph: add_edge("researcher", "writer"). Use case: research → draft → edit.
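
A framework-free sketch of the fixed A → B → C order, assuming each agent is a plain function over a shared state dict. The researcher, writer, and editor bodies are placeholders for LLM calls, and run_pipeline is a hypothetical helper, not a LangGraph function.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Step = Callable[[State], State]


def researcher(state: State) -> State:
    # Placeholder for an LLM call that gathers notes on the topic.
    return {**state, "notes": f"notes on {state['topic']}"}


def writer(state: State) -> State:
    # Placeholder: turns the notes into a draft.
    return {**state, "draft": f"draft from {state['notes']}"}


def editor(state: State) -> State:
    # Placeholder "edit": upper-cases the draft.
    return {**state, "final": state["draft"].upper()}


def run_pipeline(steps: List[Step], state: State) -> State:
    # Fixed order: each step sees everything the previous steps produced.
    for step in steps:
        state = step(state)
    return state


result = run_pipeline([researcher, writer, editor], {"topic": "mcp"})
print(result["final"])  # DRAFT FROM NOTES ON MCP
```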

3.2 Parallel fan-out / fan-in

The supervisor sends a task to three workers in parallel, then aggregates the results. LangGraph Send or AutoGen GroupChat. Running web search, a DB query, and static analysis at the same time is a typical win.
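
A minimal sketch of fan-out/fan-in with asyncio.gather from the standard library. The worker names follow the example above; asyncio.sleep stands in for an I/O-bound tool call, and the 0.1 s delays are arbitrary.

```python
import asyncio
import time


async def worker(name: str, delay: float) -> str:
    # asyncio.sleep stands in for an I/O-bound tool call or LLM request.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def fan_out_fan_in() -> list:
    # Fan-out: start all three workers at once.
    # Fan-in: gather returns results in the order the awaitables were passed.
    return await asyncio.gather(
        worker("web_search", 0.1),
        worker("db_query", 0.1),
        worker("static_analysis", 0.1),
    )


start = time.perf_counter()
results = asyncio.run(fan_out_fan_in())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Run serially, the three calls would take about 0.3 s; run concurrently, wall time is close to the slowest single call.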

3.3 Hierarchical supervisor-worker

Supervisor: task decomposition, worker selection, QA gate. CrewAI Process.hierarchical or LangGraph conditional edges.

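from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    needs_code: bool

def supervisor_node(state: AgentState) -> dict:
    # Node: must return a state update, not a node name.
    return {"needs_code": bool(state.get("needs_code"))}

def route(state: AgentState) -> str:
    # Router: returns the name of the next node.
    return "coder" if state["needs_code"] else "researcher"

# coder_agent and researcher_agent: worker node functions defined elsewhere.
graph = StateGraph(AgentState)
graph.add_node("supervisor", supervisor_node)
graph.add_node("coder", coder_agent)
graph.add_node("researcher", researcher_agent)
graph.add_edge(START, "supervisor")
graph.add_conditional_edges("supervisor", route, ["coder", "researcher"])
graph.add_edge("coder", END)
graph.add_edge("researcher", END)
app = graph.compile()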

3.4 Swarm coordination

Peer-to-peer messaging until consensus. Creatively powerful; in production it needs a hard cap on rounds (for example 15) and duplicate-state detection. Otherwise you get an endless "got it, acknowledged" exchange.
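
One way to implement the round cap and duplicate-state detection, sketched under the assumption that the swarm's shared state is a JSON-serializable dict. run_swarm, fingerprint, and the acknowledger agent are hypothetical names for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, List

State = Dict[str, object]


def fingerprint(state: State) -> str:
    # Stable hash of the shared state; sort_keys makes key order irrelevant.
    payload = json.dumps(state, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_swarm(agents: List[Callable[[State], State]], state: State,
              max_rounds: int = 15) -> tuple:
    seen = {fingerprint(state)}
    for round_no in range(1, max_rounds + 1):
        for agent in agents:
            state = agent(state)
        fp = fingerprint(state)
        if fp in seen:
            # A state that was already seen means the peers are looping.
            return state, f"duplicate state at round {round_no}"
        seen.add(fp)
    return state, "round limit reached"


def acknowledger(state: State) -> State:
    # Pathological peer: replies "ack" without making progress.
    return {**state, "last_message": "ack"}


final_state, reason = run_swarm([acknowledger], {"last_message": "start"})
print(reason)  # duplicate state at round 2
```

Hashing the whole state is the simplest choice; a real system might fingerprint only the fields that indicate progress.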

3.5 Blackboard architecture

Shared memory (Redis, PostgreSQL JSONB) for intermediate results; agents read/write asynchronously. Overnight batch jobs and long-running analytics are the sweet spot.
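
A small in-memory sketch of the blackboard idea. The Blackboard class is a hypothetical stand-in for Redis or PostgreSQL, and the extractor and summarizer agents are invented for illustration; the point is that agents coordinate only through the shared store.

```python
import asyncio
from typing import Any, Dict


class Blackboard:
    """In-memory stand-in for a shared store such as Redis or PostgreSQL."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._changed = asyncio.Condition()

    async def write(self, key: str, value: Any) -> None:
        async with self._changed:
            self._data[key] = value
            self._changed.notify_all()

    async def wait_for(self, key: str) -> Any:
        # Block until some agent has published the key.
        async with self._changed:
            await self._changed.wait_for(lambda: key in self._data)
            return self._data[key]


async def extractor(board: Blackboard) -> None:
    await board.write("rows", [3, 4, 5])


async def summarizer(board: Blackboard) -> None:
    rows = await board.wait_for("rows")
    await board.write("total", sum(rows))


async def main() -> int:
    board = Blackboard()
    # The summarizer is scheduled first and simply waits for its input.
    await asyncio.gather(summarizer(board), extractor(board))
    return await board.wait_for("total")


total = asyncio.run(main())
print(total)  # 12
```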

3.6 Hybrid mode

About 80 % of real systems: parallel fan-out under a supervisor, with a writer pipeline on the output. LangGraph subgraphs package subflows modularly.

4. LangGraph vs CrewAI vs AutoGen: selection matrix

Axis	LangGraph	CrewAI	AutoGen
State management	Checkpointer, persistence out of the box	Task-scoped, custom memory	Conversation history
Branching / loops	StateGraph, explicit control	Limited process types	Dynamic GroupChat
Learning curve	Medium to high	Low (YAML + roles)	Medium
Production readiness	★★★★★	★★★☆☆	★★★★☆
PoC velocity	★★★☆☆	★★★★★	★★★★☆
MCP integration	Official adapter	Custom tool wrappers	Function calling path
Metal / Apple Silicon local inference	Ollama node in the graph, UMA-friendly	Via custom tools	Code-exec loops, CPU-heavy

Rule of thumb: complex state machine + SLA → LangGraph. Role-based PoC within a week → CrewAI, migrate before go-live. Human-in-the-loop dialog → AutoGen v0.4+. On M4 unified memory, 5–8 agents plus Ollama q4_K_M is a realistic on-prem slice; on a memory-constrained x86 VPS the same stack is likely to run into swap thrashing.

5. MCP + A2A: vertical tools, horizontal agents

Reference stack 2026: MCP down, A2A across. Confusing the two protocols creates architectural debt.

MCP: agent → external tools/DB/API. JSON-RPC 2.0, tools/list, tools/call. Details: MCP as the HTTP of the AI era.
A2A: horizontal delegation. The Agent Card from Google's A2A protocol (capabilities, endpoints) plus a JSON-RPC task handoff from orchestrator to worker.

Minimal Agent Card:

{
  "name": "code-reviewer-agent",
  "description": "Security and quality review for PR diffs",
  "url": "https://agent.internal/a2a/v1",
  "capabilities": { "streaming": true, "pushNotifications": true },
  "skills": [{ "id": "security-scan", "name": "Security Scan" }]
}

MCP alone does not delegate between agents. A2A alone does not connect to PostgreSQL. A full MAS needs both layers.
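
For the MCP layer, the JSON-RPC 2.0 messages behind tools/list and tools/call look roughly like this. A minimal sketch that only builds the request envelopes: the tool name query_orders and its arguments are hypothetical, and no transport (stdio or HTTP) is shown.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # JSON-RPC request ids, unique per connection


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    # JSON-RPC 2.0 envelope: version tag, id, method, optional params.
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Discover what the server offers.
list_req = jsonrpc_request("tools/list")

# Invoke one tool by name. "query_orders" is a made-up example tool.
call_req = jsonrpc_request(
    "tools/call",
    {"name": "query_orders", "arguments": {"customer_id": 42}},
)

wire = json.dumps(call_req)  # what would be written to the transport
print(wire)
```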

6. Production engineering: state, HITL, circuit breaker, token budget

Demo green ≠ prod stable. Seven steps to an audit-ready deploy:

Decompose the use case: 3–8 specialized agents; input/output schemas in JSON Schema, versioned.
Pick a pattern: sequential, fan-out, or hierarchical; implement it in a LangGraph StateGraph.
Wire MCP: minimal MCP servers per agent (stdio/HTTP); permissions isolated.
A2A contract: Agent Cards, with task_id, timeout, and retry policy in the JSON-RPC payload.
Persistence: SqliteSaver or a Redis checkpointer, so a restart loses no state (RPO < 1 min).
HITL: before a DB write, billing API call, or outbound email, pause the graph with interrupt_before on that node.
Circuit breaker + token budget: max 3 retries per worker; session cap of 50K tokens in / 20K out, enforced via middleware.
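
The last step can be sketched as plain middleware. This is an illustration under assumptions, not a library API: TokenBudget, BudgetExceeded, and call_with_retries are hypothetical names, and the worker is assumed to return (result, tokens_in, tokens_out). The defaults mirror the figures above.

```python
class BudgetExceeded(RuntimeError):
    pass


class TokenBudget:
    """Session-level cap; defaults mirror the 50K in / 20K out figures above."""

    def __init__(self, max_in: int = 50_000, max_out: int = 20_000) -> None:
        self.max_in, self.max_out = max_in, max_out
        self.used_in = self.used_out = 0

    def charge(self, tokens_in: int, tokens_out: int) -> None:
        if (self.used_in + tokens_in > self.max_in
                or self.used_out + tokens_out > self.max_out):
            raise BudgetExceeded("session token budget exhausted")
        self.used_in += tokens_in
        self.used_out += tokens_out


def call_with_retries(worker, budget: TokenBudget, max_retries: int = 3):
    # worker() is assumed to return (result, tokens_in, tokens_out).
    last_error = None
    for _ in range(1 + max_retries):  # first attempt plus up to 3 retries
        try:
            result, t_in, t_out = worker()
            budget.charge(t_in, t_out)
            return result
        except BudgetExceeded:
            raise  # never retry once the budget is gone
        except Exception as exc:
            last_error = exc
    raise RuntimeError("circuit open: worker kept failing") from last_error


attempts = {"n": 0}


def flaky_worker():
    # Fails twice with a timeout, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("tool call timed out")
    return "ok", 1_200, 300


budget = TokenBudget()
outcome = call_with_retries(flaky_worker, budget)
print(outcome, attempts["n"], budget.used_in)  # ok 3 1200
```

Note that this sketch only charges successful attempts; a production version would also account for tokens burned by failed calls.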

Cost reference (June 2026): 5 agents × 10 research rounds. GPT-4.1: $0.80–$2.40/run; DeepSeek V3: $0.05–$0.20/run. Without a token budget, monthly burn is unpredictable, which is a FinOps nightmare.

7. Observability: MAST failure distribution, distributed tracing

The MAST framework (UC Berkeley, 2025) is a failure taxonomy for multi-agent systems. Approximate failure distribution in production:

Ambiguous specification: ~42 %
Tool/API errors: ~28 %
Coordination failure: ~18 %
Other (model quality): ~12 %

Metric	Alert threshold	Tooling
E2E latency P95	> 60 s	OpenTelemetry + Grafana
Tool call failure rate	> 5 % / 5 min	LangSmith / Langfuse
Tokens per task	> 120 % budget	Custom middleware
LLM-as-a-Judge score	< 3.5 / 5.0	Batch eval pipeline
Agent loop detection	Same state ≥ 5×	StateGraph cycle counter
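
The thresholds in the table can be encoded as a single check. A minimal sketch: evaluate_alerts and the metric field names are hypothetical, and the sample values are invented to trip two of the five alerts.

```python
from collections import Counter
from typing import Dict, List


def evaluate_alerts(m: Dict[str, object]) -> List[str]:
    """Return the names of the metrics that breach the table's thresholds."""
    alerts = []
    if m["latency_p95_s"] > 60:
        alerts.append("e2e_latency_p95")
    # Failure rate over the current 5-minute window.
    if m["tool_calls"] and m["tool_failures"] / m["tool_calls"] > 0.05:
        alerts.append("tool_call_failure_rate")
    if m["tokens_used"] > 1.2 * m["token_budget"]:
        alerts.append("tokens_per_task")
    if m["judge_score"] < 3.5:
        alerts.append("llm_judge_score")
    # Loop detection: the same state fingerprint seen 5 or more times.
    repeats = Counter(m["state_hashes"])
    if repeats and max(repeats.values()) >= 5:
        alerts.append("agent_loop")
    return alerts


sample = {
    "latency_p95_s": 42.0,
    "tool_calls": 200,
    "tool_failures": 14,       # 7 % failure rate
    "tokens_used": 61_000,
    "token_budget": 50_000,    # 122 % of budget
    "judge_score": 4.1,
    "state_hashes": ["a", "b", "a", "c"],
}
fired = evaluate_alerts(sample)
print(fired)  # ['tool_call_failure_rate', 'tokens_per_task']
```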

Every invocation gets a trace_id. OpenTelemetry spans: supervisor → worker → MCP tool call. SLA for incident triage: root cause in < 30 s. Without this you end up debugging with grep over jsonl at 3 a.m.

8. Common pitfalls: the demo → prod gap

Context pollution: with a shared session ID, worker B reads garbage from worker A. Isolate per agent (per-account-channel-peer discipline).
Infinite loops: in a swarm without a stop condition, agents exchange acknowledgments forever. Use a hard round limit plus duplicate-state detection.
Agent sprawl: beyond 10 agents, debug cost grows exponentially. Cap at 3–8; extend via MCP tools.
Demo-prod gap: a Jupyter notebook without a checkpointer, auth, or rate limits will not survive the night shift. Complete all 7 steps from §6 before deploying.

9. Decision tree

Serial or parallel? → Serial: sequential pipeline; independent chunks: fan-out/fan-in.
Dynamic routing? → Yes: LangGraph conditional edges or hierarchical supervisor.
Human approval? → Yes: LangGraph interrupt + HITL UI; alt: AutoGen UserProxy.
PoC deadline ≤ 1 week? → CrewAI start, LangGraph migration before go-live.
External tools primary? → Build MCP servers first (MCP server from scratch).
Inter-agent delegation? → Yes: A2A Agent Cards; No: internal supervisor routing is often enough.
24/7 uptime? → Yes: §10 Remote Mac architecture.
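
The questions above can be read as a simple function from yes/no answers to a list of recommendations. A sketch only: recommend and its parameter names are hypothetical, and the returned strings paraphrase the list.

```python
from typing import List


def recommend(*, parallel: bool, dynamic_routing: bool, human_approval: bool,
              poc_within_week: bool, tools_primary: bool,
              delegation: bool, always_on: bool) -> List[str]:
    # Each answer maps to one line of the decision tree, in the same order.
    picks = ["fan-out/fan-in" if parallel else "sequential pipeline"]
    if dynamic_routing:
        picks.append("LangGraph conditional edges or hierarchical supervisor")
    if human_approval:
        picks.append("LangGraph interrupt + HITL UI")
    if poc_within_week:
        picks.append("start on CrewAI, migrate to LangGraph before go-live")
    if tools_primary:
        picks.append("build MCP servers first")
    picks.append("A2A Agent Cards" if delegation else "internal supervisor routing")
    if always_on:
        picks.append("always-on host (section 10)")
    return picks


plan = recommend(parallel=True, dynamic_routing=True, human_approval=False,
                 poc_within_week=False, tools_primary=True,
                 delegation=False, always_on=True)
for line in plan:
    print("-", line)
```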

10. 2026 trends and the SFTPMAC Remote Mac bridge

Four vectors for H2 2026:

Federated orchestration: a cross-org Agent Card registry with access policies.
Multimodal agents: fan-out over image/audio/video for CV and design review pipelines.
Adaptive topology: agent count and routing mutate at runtime (AdaptOrch lineage).
EU AI Act: from August 2026, HITL logs, explainability, and data governance for high-risk AI. Design the checkpointer and audit trail in advance.

LangGraph graphs, an MCP server fleet, a vector DB, and an OpenTelemetry collector need a 24/7 host. A laptop loses checkpointer state, stdio MCP processes get orphaned, and overnight batches break on sleep.

The 6 patterns, 3 frameworks, and the MCP+A2A two-layer stack can be validated locally on a Mac. For an SLA (P95 < 60 s, 99.5 % uptime) you need launchd, ≥32 GB of unified memory on Apple Silicon, and SFTP-synced config, not a spot VM that OOMs under multi-agent load.

Bottom line: multi-agent orchestration delivers a measurable lift only with explicit ops design and an always-on gateway host. A dev laptop covers neither uptime nor incident response.

SFTPMAC Remote Mac rental: 5–8 agents plus multiple MCP servers on a single Apple Silicon node, with unified memory and no discrete VRAM bottleneck, a macOS allowedPaths sandbox for tools, and SFTP sync from CI to prod. The OpenClaw gateway and the MAS share one host, so a config push needs no manual ssh at 2 a.m. If you need an uninterrupted production pipeline rather than a weekend demo, a 24/7 remote Mac is cheaper than fixing OOMs on a cheap x86 VPS.