2026 OpenClaw sessions_spawn Missing From the Tool List, allowAgents allowed:none, Subagent Nesting, and a Layered Doctor Runbook

Executive summary: three confusions that send teams in circles

The first confusion treats the absence of sessions_spawn from the rendered tool palette as proof that the feature was removed upstream. In practice the gateway often still ships the primitive while the active runtime profile suppresses it, for one of three reasons: the selected model endpoint does not negotiate function calling, allowAgents resolved to an empty set for that channel session, or the management WebSocket never stayed stable long enough for the client to synchronize capabilities.

The second confusion reads allowed:none as a broken access-control list when it is frequently an evaluation snapshot: no agent identity satisfied the policy predicate for the inbound message, so the tool surface collapses to a safe minimum. Editing random booleans without tracing which list entry should have matched wastes hours and leaves security holes.

The third confusion jumps straight to nginx buffering or Caddy TLS whenever anything flakes, even when local loopback is already failing. Proxies matter, but they belong in the story only after the same process passes cleanly on 127.0.0.1. Following the layered sequence in this runbook keeps teams from filing expensive firewall tickets for what is really application misconfiguration.

Finally, remember that the production hardening described in the SSRF guide can silently drop outbound callbacks that some tool implementations rely on when they wrap HTTP-based assistants. A gateway that is secure on paper but starves legitimate egress looks identical to a dead model route in the UI.

Symptom grammar: when the tool list lies by omission

OpenClaw surfaces tools to the model only when the stack agrees on three axes: the transport is healthy, the policy graph names at least one eligible agent for the session, and the model contract includes tool or function slots. If any axis fails, observability may still show a green banner while individual primitives vanish.

sessions_spawn is a high-privilege primitive relative to read-only fetchers. Gateways therefore guard it behind combinations of agent allow lists, channel trust, and model capability flags. When operators expect to fork nested automations from a parent bot user, they should verify that the parent session actually maps to an agent entry with child tooling enabled, not merely that a default agent exists elsewhere in the file.
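To make the three-axis rule concrete, here is a minimal Python sketch of how a gateway might filter its catalog. It is not OpenClaw's implementation: the SessionContext fields, the hypothetical web_fetch tool, and the rule that sessions_spawn needs both a tool-capable model and an eligible agent with child tooling are assumptions drawn from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical classification: only sessions_spawn is named in the article.
HIGH_PRIVILEGE = {"sessions_spawn"}


@dataclass
class SessionContext:
    transport_healthy: bool            # management WebSocket synchronized capabilities
    eligible_agents: list              # allowAgents result; empty means allowed:none
    model_supports_tools: bool         # route negotiated function/tool slots
    child_tooling: dict = field(default_factory=dict)  # agent id -> may spawn children


def tool_surface(ctx, catalog):
    """Return the tools a session would see if all three axes must agree."""
    if not ctx.transport_healthy:
        # The client never synchronized, so the UI shows nothing new.
        return []
    # sessions_spawn needs a tool-capable model AND an eligible agent with child tooling;
    # a default agent defined elsewhere does not count unless it is in eligible_agents.
    spawn_allowed = ctx.model_supports_tools and any(
        ctx.child_tooling.get(agent, False) for agent in ctx.eligible_agents
    )
    return [tool for tool in catalog if tool not in HIGH_PRIVILEGE or spawn_allowed]


catalog = ["web_fetch", "sessions_spawn"]  # web_fetch is a hypothetical read-only tool
print(tool_surface(SessionContext(True, ["parent"], True, {"parent": True}), catalog))  # ['web_fetch', 'sessions_spawn']
print(tool_surface(SessionContext(True, [], True, {"parent": True}), catalog))  # ['web_fetch'] (allowed:none)
print(tool_surface(SessionContext(True, ["parent"], True, {"default": True}), catalog))  # ['web_fetch']
```

The last call shows the trap from the paragraph above: a spawn-capable agent exists in the file, but not for this session, so the primitive disappears.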

allowAgents diagnostics deserve literal reading. allowed:none means the evaluator produced zero passing identities for the current message context. Common causes include mismatched channel user identifiers, tenant scoping that filters the list, stale cached policy after a reload that never happened, and accidental duplication of agent identifiers so the resolver short-circuits. Capture the diagnostic blob immediately; it is more trustworthy than human memory of YAML indentation.
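One way to see why allowed:none happens is to model the evaluator and have it emit the diagnostic blob itself. The sketch below is hypothetical: the field names (channel_users, tenant, policy_generation) and the first-entry-wins handling of duplicate identifiers are assumptions chosen to illustrate the causes listed above, not OpenClaw's schema.

```python
import json


def evaluate_allow_agents(agents, message):
    """Hypothetical allowAgents evaluator returning (allowed ids, diagnostic blob)."""
    allowed, rejections, shadowed, seen = [], {}, [], set()
    for agent in agents:
        agent_id = agent["id"]
        if agent_id in seen:
            # First entry wins, so a later (perhaps correct) duplicate is never evaluated.
            shadowed.append(agent_id)
            continue
        seen.add(agent_id)
        tenant = agent.get("tenant")
        if tenant is not None and tenant != message["tenant"]:
            rejections[agent_id] = "tenant scope mismatch"
        elif message["channel_user"] not in agent.get("channel_users", ()):
            rejections[agent_id] = "channel user not listed"
        else:
            allowed.append(agent_id)
    blob = {
        "allowed": allowed or "none",
        "rejections": rejections,
        "shadowed_duplicates": shadowed,
        "policy_generation": message.get("policy_generation"),
    }
    return allowed, blob


agents = [
    {"id": "ops", "channel_users": ["tg:1001"]},  # stale entry listed first
    {"id": "ops", "channel_users": ["tg:42"]},    # the entry that should match
]
message = {"tenant": "acme", "channel_user": "tg:42", "policy_generation": 7}
allowed, blob = evaluate_allow_agents(agents, message)
print(json.dumps(blob, sort_keys=True))
```

The correct entry exists, yet the result is allowed:none because a stale duplicate shadows it; the blob states that directly, which is why capturing it beats re-reading indentation.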

When both symptoms appear together, resist the urge to patch only one. An empty allow list starves the graph, and a model without tools makes the starvation invisible because the assistant never attempts structured calls. Treat the pair as a correlated incident until evidence splits them.

Nesting subagents under agents.list and why placement matters

Many teams first model a flat agents.list array where every entry is a peer. That layout works until you need hierarchical delegation: a primary agent that may spawn constrained workers with different model routes, rate limits, or secret scopes. OpenClaw expects those children under agents.list[].subagents so inheritance and allow lists compose predictably.

Incorrect nesting is a frequent reason why operators believe sessions_spawn is missing. The parent entry loads, channels bind, yet the child definitions sit in a commented experimental block or under a key the parser ignores. The gateway then exposes only the parent capabilities, and if the parent policy forbids spawning, the UI never lists the primitive.
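A structural check catches children parked under keys the parser ignores. This minimal sketch runs on an already-parsed config (plain dicts, so it stays stdlib-only); KNOWN_AGENT_KEYS is a hypothetical subset, so replace it with the keys your build's schema actually accepts.

```python
# Hypothetical subset of per-agent keys; check your build's schema for the real list.
KNOWN_AGENT_KEYS = {"id", "model", "tools", "subagents"}


def find_nesting_problems(config):
    """Report agents.list entries whose children sit where the parser will not look."""
    problems = []
    for i, agent in enumerate(config.get("agents", {}).get("list", [])):
        where = f"agents.list[{i}] ({agent.get('id', '?')})"
        for key, value in agent.items():
            if key in KNOWN_AGENT_KEYS:
                continue
            agent_like = isinstance(value, list) and bool(value) and all(
                isinstance(item, dict) and "id" in item for item in value
            )
            if agent_like:
                problems.append(f"{where}: '{key}' holds agent-like entries; move them under 'subagents'")
            else:
                problems.append(f"{where}: unknown key '{key}' is ignored")
        for j, child in enumerate(agent.get("subagents", [])):
            if not isinstance(child, dict) or "id" not in child:
                problems.append(f"{where}.subagents[{j}]: entry has no 'id'")
    return problems


# Parsed from YAML/JSON by your loader of choice; plain dicts keep the sketch stdlib-only.
broken = {"agents": {"list": [{"id": "parent", "experimental_subagents": [{"id": "worker"}]}]}}
fixed = {"agents": {"list": [{"id": "parent", "subagents": [{"id": "worker"}]}]}}
for problem in find_nesting_problems(broken):
    print(problem)
print(find_nesting_problems(fixed))  # []
```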

Validate structure with the same discipline you use for Kubernetes manifests: schema first, then semantics. After each edit, restart or reload according to the install guide so you are not staring at a stale process that read the previous file generation. Containerized deployments amplify this failure mode because volume mounts and entrypoint wrappers sometimes point at a different path than the editor window.
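Two quick checks separate "my edit is wrong" from "the process never saw my edit". The sketch below compares content fingerprints between the file in your editor and the file at the path the process reads (for example the mounted path inside the container), and tests whether the file changed after the process started. How you obtain the process start time (from ps, your supervisor, or the container runtime) is an assumption left to you.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """Short SHA-256 of the file bytes; equal fingerprints mean identical content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:12]


def same_generation(edited_path, runtime_path):
    """Compare the file in your editor with the file the process actually reads."""
    return fingerprint(edited_path) == fingerprint(runtime_path)


def changed_since_start(config_path, process_started_at):
    """True if the file was modified after the process started (epoch seconds),
    so the running process may still hold the previous generation."""
    return Path(config_path).stat().st_mtime > process_started_at
```

If the fingerprints differ, you are editing the wrong file; if they match but the file changed after start, reload before reading any further diagnostics.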

When subagents specify their own model overrides, confirm that those routes still support parallel tool calls if your playbook expects them. A child routed to a completion-only endpoint will not magically gain tools because the parent had them. Inheritance is about policy and secrets, not about retroactively upgrading vendor APIs.
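The point that inheritance covers policy and secrets but not vendor capability can be written as a resolution step. In this hypothetical sketch, INHERITED_KEYS and the route table are illustrative; it deliberately refuses to assume a child copies its parent's model route and flags the gap instead, in line with setting routes explicitly.

```python
# Hypothetical route table; in practice read capability hints from `openclaw models status`.
ROUTES_WITH_TOOLS = {"primary-tools": True, "cheap-completion": False}

# Keys this sketch treats as inherited; the real inheritance rules may differ.
INHERITED_KEYS = ("secrets_scope", "rate_limit", "policy")


def effective_subagent(parent, child, routes=ROUTES_WITH_TOOLS):
    """Resolve a child's effective settings and flag tool-capability gaps."""
    resolved = {key: child.get(key, parent.get(key)) for key in INHERITED_KEYS}
    route = child.get("model_route")
    resolved["model_route"] = route
    warnings = []
    if route is None:
        # Do not assume the parent's route is copied; set it explicitly.
        warnings.append("model_route unset on child")
    elif not routes.get(route, False):
        # Capability belongs to the route, so inheritance cannot add it.
        warnings.append(f"route '{route}' lacks function calling; child tools will not appear")
    resolved["warnings"] = warnings
    return resolved


parent = {"secrets_scope": "ops", "rate_limit": 10, "policy": "strict", "model_route": "primary-tools"}
worker = {"id": "worker", "rate_limit": 2, "model_route": "cheap-completion"}
print(effective_subagent(parent, worker))
```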

Models must support function calling for structured tools

Even perfect YAML cannot conjure tool calling on a model deployment that only exposes plain text completions. OpenClaw negotiates capabilities with the upstream according to the configured driver and feature flags. If the remote surface lacks function or tool slots, the gateway prunes high-risk entries such as sessions_spawn from the catalog the assistant sees.

Use openclaw models status as the honest broker between marketing pages and reality. It should list the active route, fallback ordering, and any capability warnings emitted by the client library. When migrating from a chat-completions style deployment to a tools-native deployment, schedule the switch in staging and capture before-and-after JSON from doctor output.

Mixed fleets amplify mistakes. A parent agent might still point at a tool-capable model while a nested subagent inherits a cheaper route without tools to save tokens. The nested session then appears broken while top-level chat looks fine. Align economics and capability explicitly rather than hoping inheritance copies model identifiers you never set.

Latency-sensitive channels such as Telegram streaming or Teams cards, discussed in the channel production article, compound the issue because partial tool failures surface as truncated messages rather than hard errors. Correlate model capability flags with chunk boundaries when users report “ghost” replies.

Layered runbook: status, then models status, then doctor, then logs

Layer one: openclaw status. Establish whether the local daemon or container believes itself healthy, which version string is running, and which interfaces are bound. If status already reports dependency failures, fix those before interpreting missing tools. Version skew between CLI and server is a classic reason doctor looks fine on a laptop while the server never loaded the new module.

Layer two: openclaw models status. Confirm the active model route, authentication health, and capability hints. If this layer fails, no amount of channel debugging in the gateway operations guide will resurrect tools. Capture output for attachments when opening vendor tickets.

Layer three: openclaw doctor. Run with structured JSON when available so you can diff runs over time. Map doctor findings to concrete keys: filesystem permissions, missing secrets, invalid webhook URLs, adapter timeouts. Doctor is not a replacement for logs, but it compresses dozens of silent misconfigurations into actionable bullets.
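Diffing doctor runs does not require knowing the output schema. This sketch flattens two saved JSON documents (for example files written by redirecting `openclaw doctor --json`, if your build supports that flag) into dotted keys and reports what was added, removed, or changed. It assumes only that both files are valid JSON; the file names in the comment are hypothetical.

```python
import json
import sys


def flatten(node, prefix=""):
    """Turn nested JSON into {"a.b[0].c": value} so two runs compare key by key."""
    if isinstance(node, dict):
        flat = {}
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else str(key)))
        return flat
    if isinstance(node, list):
        flat = {}
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
        return flat
    return {prefix: node}


def diff_runs(before, after):
    old, new = flatten(before), flatten(after)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python doctor_diff.py doctor-before.json doctor-after.json
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        print(json.dumps(diff_runs(json.load(f1), json.load(f2)), indent=2))
```

List positions become part of the key, so a reordered findings array shows up as changes; sort arrays before saving if your build's ordering is unstable.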

Layer four: logs. Tie gateway logs to channel adapter logs using a shared identifier. Increase verbosity temporarily on a staging host rather than blinding production with permanent trace noise. Pay attention to lines mentioning tool registration, policy denial, or spawn rejection codes.
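Correlating gateway and adapter logs by a shared identifier is easy to script once the line format is known. The sketch assumes each line starts with an ISO-8601 UTC timestamp and carries a `corr=<id>` field; the field name and the flagged phrases are hypothetical stand-ins for whatever your build actually emits.

```python
import re

CORRELATION = re.compile(r"\bcorr=(\S+)")
# Phrases taken from the runbook; real log wording is an assumption to adjust.
INTERESTING = ("tool registration", "policy denial", "spawn rejected")


def correlate(sources, correlation_id):
    """Merge lines sharing one correlation id from several logs, oldest first.

    Assumes each line starts with an ISO-8601 UTC timestamp (so string order is
    time order) and carries a hypothetical corr=<id> field.
    """
    hits = []
    for source, lines in sources.items():
        for line in lines:
            match = CORRELATION.search(line)
            if match and match.group(1) == correlation_id:
                timestamp = line.split(" ", 1)[0]
                flagged = any(term in line for term in INTERESTING)
                hits.append((timestamp, source, flagged, line))
    return sorted(hits)


gateway = [
    "2026-01-05T10:00:01Z corr=abc tool registration ok",
    "2026-01-05T10:00:03Z corr=xyz heartbeat",
]
adapter = ["2026-01-05T10:00:02Z corr=abc spawn rejected code=403"]
for ts, source, flagged, line in correlate({"gateway": gateway, "adapter": adapter}, "abc"):
    print("!" if flagged else " ", source, line)
```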

Only after all four layers look coherent should you widen scope to reverse proxies or egress firewalls. Skipping layers turns every incident into a network hunt.

Decision matrix: classify before rewriting configuration

Use the matrix during incidents and postmortems so arguments refer to signals, not vibes.

Primary signal	Likely layer	First action	Deeper reading
sessions_spawn missing, loopback healthy	Model capability or policy	models status, then allowAgents trace	Function calling section above
allowAgents allowed:none	Agent graph or channel identity	Verify agents.list and subagents mapping	Gateway doctor guide
Intermittent tool catalog resets	WebSocket stability	Compare direct port versus public hostname	Reverse proxy guide
Tool calls hang after model reply	Outbound callback blocked	Test egress to vendor URL from gateway host	SSRF hardening guide
Upgrade changed behavior	Version skew or defaults	status plus install rollback path	Install and rollback guide
Rich media channels misbehave only under load	Adapter limits and streaming	Split traffic, review streaming timeouts	Channel production article

When two rows both match, work top to bottom through the runbook, not left to right across the table in a panic.
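The matrix can also live in code so triage notes stay consistent across incidents. In this sketch the first actions are copied from the table above, while the runbook step assigned to each signal is one reasonable reading of it, not an official mapping; triage() orders matched signals so the earliest layer is worked first.

```python
# Runbook order from this article: four local layers, then proxies, then hardening.
RUNBOOK = ["status", "models status", "doctor", "logs", "reverse proxy", "egress hardening"]

# Signal -> (runbook step, first action). First actions are copied from the table;
# the step assigned to each signal is this sketch's reading, not an official mapping.
MATRIX = {
    "upgrade changed behavior": ("status", "status plus install rollback path"),
    "sessions_spawn missing, loopback healthy": ("models status", "models status, then allowAgents trace"),
    "allowagents allowed:none": ("doctor", "verify agents.list and subagents mapping"),
    "rich media channels misbehave only under load": ("logs", "split traffic, review streaming timeouts"),
    "intermittent tool catalog resets": ("reverse proxy", "compare direct port versus public hostname"),
    "tool calls hang after model reply": ("egress hardening", "test egress to vendor URL from gateway host"),
}


def triage(signals):
    """Return (step, signal, first action) for known signals, earliest runbook step first."""
    matched = []
    for signal in signals:
        key = signal.strip().lower()
        if key in MATRIX:
            step, action = MATRIX[key]
            matched.append((RUNBOOK.index(step), key, action))
    return [(RUNBOOK[i], key, action) for i, key, action in sorted(matched)]


for row in triage(["Tool calls hang after model reply", "allowAgents allowed:none"]):
    print(row)
```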

CLI examples: capture evidence before editing YAML

Run on the same host that executes the gateway, or inside the container namespace, so paths and environment variables match reality.

```bash
# Layer 1: process and build identity
openclaw status

# Layer 2: model routes and capability hints
openclaw models status

# Layer 3: structured health (check --json and other flags against your build's --help)
openclaw doctor --json

# Layer 4: follow logs, then filter for your session's correlation identifier
# macOS example: substitute your service's process name or log file path
log stream --style syslog --predicate 'process == "openclaw"'
# or: tail -n 200 /var/log/openclaw/gateway.log
```

Archive command output next to the configuration diff in your ticket system. Future you will thank present you.
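A small wrapper can do the archiving for you. This sketch runs the first three layers in order, writes each transcript to a timestamped folder, and stops at the first failing layer. It assumes a non-zero exit code signals failure and that your build accepts `--json` on doctor; the runner parameter exists so the logic can be exercised without the CLI installed.

```python
import datetime
import pathlib
import subprocess

# Layers 1-3 from the runbook; confirm --json against your build's --help.
LAYERS = [
    ("status", ["openclaw", "status"]),
    ("models-status", ["openclaw", "models", "status"]),
    ("doctor", ["openclaw", "doctor", "--json"]),
]


def capture(out_dir, runner=subprocess.run):
    """Run each layer in order, save its output, and stop at the first non-zero exit.

    Assumes a non-zero exit code means that layer failed; adjust if your build
    reports problems only in its output.
    """
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = pathlib.Path(out_dir) / stamp
    target.mkdir(parents=True, exist_ok=True)
    results = []
    for name, cmd in LAYERS:
        proc = runner(cmd, capture_output=True, text=True)
        (target / f"{name}.txt").write_text(proc.stdout + proc.stderr)
        results.append((name, proc.returncode))
        if proc.returncode != 0:
            break  # fix this layer before reading the ones after it
    return target, results
```

Call `capture("evidence")` on the gateway host (or inside the container) and attach the resulting folder to the ticket next to the configuration diff.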

Reverse proxy WebSocket upgrades and why tooling depends on them

Many teams terminate TLS on nginx or Caddy in front of OpenClaw. When Upgrade and Connection headers are mishandled, browsers and desktop clients may still load static pages while the management WebSocket silently retries. The UI then shows stale capability lists, which looks exactly like missing tools.
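To test the upgrade path without a browser, you can perform the handshake by hand. The expected Sec-WebSocket-Accept value is fixed by RFC 6455 (SHA-1 of the client key plus a constant GUID, base64-encoded), so a proxy that drops or rewrites headers shows up as a concrete mismatch. The host, port, and path you probe are assumptions; the probe speaks plain TCP, so wrap the socket with Python's ssl module to test a TLS edge.

```python
import base64
import hashlib
import os
import socket

# Constant GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key):
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_request(host_header, path, key):
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    ).encode("ascii")


def check_upgrade(raw, key):
    """List problems in a handshake response; an empty list means the upgrade succeeded."""
    head = raw.decode("latin-1").split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    problems = []
    parts = status_line.split()
    if len(parts) < 2 or parts[1] != "101":
        problems.append(f"expected 101 Switching Protocols, got {status_line!r}")
    if headers.get("upgrade", "").lower() != "websocket":
        problems.append("Upgrade header missing or rewritten")
    if "upgrade" not in headers.get("connection", "").lower():
        problems.append("Connection header lacks 'Upgrade'")
    if headers.get("sec-websocket-accept") != expected_accept(key):
        problems.append("Sec-WebSocket-Accept does not match the key")
    return problems


def probe(host, port, path="/"):
    """Plain-TCP handshake for loopback or internal paths; wrap with ssl for a TLS edge."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_request(f"{host}:{port}", path, key))
        raw = b""
        while b"\r\n\r\n" not in raw:  # read until the end of the response headers
            chunk = sock.recv(4096)
            if not chunk:
                break
            raw += chunk
    return check_upgrade(raw, key)
```

Running probe() against loopback, the internal service name, and the public hostname turns "the UI feels stale" into a list of concrete header failures per path.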

Follow the reverse proxy guide for timeout alignment, buffering directives, and allowedOrigins constraints. Idle timeouts shorter than long-running model streams cause subtle partial failures; tooling registration often arrives early in the session, so short timeouts sometimes mask the issue until the first spawn attempt.

Always compare three paths: raw localhost, internal service name inside the cluster, and public hostname through the proxy. If only the public hostname fails, you have isolated the defect to the edge. If all three fail, return to the layered runbook.
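The three-path comparison reduces to a small decision. In this sketch the URLs are placeholders (18789 is only an assumed default port, so substitute yours), and a plain HTTP GET stands in for reachability; it says nothing about WebSocket upgrades, which need their own handshake test.

```python
import urllib.request

# Hypothetical endpoints; substitute your port, service name, and public hostname.
PATHS = {
    "loopback": "http://127.0.0.1:18789/",
    "internal": "http://openclaw.internal:18789/",
    "public": "https://gateway.example.com/",
}


def reachable(url, timeout=5.0):
    """True for a 2xx/3xx answer; HTTP errors, DNS failures, and timeouts count as False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return False


def localize(results):
    """Map per-path reachability to where the investigation should go next."""
    if not results["loopback"]:
        return "loopback fails: return to the layered runbook"
    if results["internal"] and not results["public"]:
        return "only the public hostname fails: the defect is at the edge"
    if not results["internal"]:
        return "loopback works but the internal path fails: check service networking"
    return "all paths answer: compare WebSocket upgrades on each path"
```

Usage: `print(localize({name: reachable(url) for name, url in PATHS.items()}))`.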

Production hardening: when SSRF guards block legitimate callbacks

The production hardening guide explains why OpenAI-style webhooks and arbitrary fetch tooling are dangerous without egress controls. Well-intentioned teams sometimes deploy deny-by-default egress rules that also block vendor endpoints required for tool confirmation flows.

When hardening, maintain an explicit allow list document tied to configuration as code. Each hostname should list the owning team, rotation cadence, and the feature that breaks if removed. Security reviewers can approve narrow holes when rationale is concrete.
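Keeping the allow list as code lets review tooling enforce the metadata the paragraph asks for. This is a hypothetical sketch: the EgressEntry fields mirror owner, rotation cadence, and the feature that breaks, and is_allowed() uses exact hostname matching so look-alike domains such as api.vendor.example.evil.example are rejected.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class EgressEntry:
    hostname: str            # exact host, no wildcards in this sketch
    owner: str               # owning team
    rotation: str            # credential or review cadence, e.g. "90d"
    breaks_if_removed: str   # feature that fails when this host is blocked


def validate(entries):
    """Return review problems: duplicates and missing metadata."""
    problems, seen = [], set()
    for entry in entries:
        host = entry.hostname.lower()
        if host in seen:
            problems.append(f"{host}: duplicate entry")
        seen.add(host)
        for name in ("owner", "rotation", "breaks_if_removed"):
            if not getattr(entry, name).strip():
                problems.append(f"{host}: missing {name}")
    return problems


def is_allowed(url, entries):
    """Exact hostname match, so look-alike suffixes do not slip through."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == entry.hostname.lower() for entry in entries)


ALLOW = [
    EgressEntry("api.vendor.example", "platform", "90d", "tool confirmation callbacks"),
    EgressEntry("hooks.vendor.example", "", "30d", "inbound webhook verification"),
]
print(validate(ALLOW))  # ['hooks.vendor.example: missing owner']
```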

Distinguish user-initiated fetches from gateway-initiated callbacks. SSRF protections often target the former while forgetting that model drivers may call back to platform URLs for telemetry or token exchange. Use staging mirrors of production firewalls to catch these overlaps before launch.
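Distinguishing the two request origins can be expressed as two different policies. In this sketch, user-initiated fetches are refused when the host resolves to a loopback, private, link-local, or reserved address, while gateway-initiated callbacks must hit an explicitly allowed host. The origin labels and the allow set are assumptions; a production guard should also pin the resolved address for the actual connection, because resolving once and fetching later leaves a DNS-rebinding gap.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolves_inward(host, resolver=socket.getaddrinfo):
    """True if any address for host is loopback, private, link-local, or reserved."""
    for *_, sockaddr in resolver(host, None):
        addr = ipaddress.ip_address(sockaddr[0])
        if addr.is_loopback or addr.is_private or addr.is_link_local or addr.is_reserved:
            return True
    return False


def decide(url, origin, allow_hosts, resolver=socket.getaddrinfo):
    """Apply a different egress rule depending on who initiated the request."""
    host = (urlsplit(url).hostname or "").lower()
    if origin == "gateway":
        # Driver callbacks (telemetry, token exchange): documented vendor hosts only.
        return host in allow_hosts
    if origin == "user":
        # User-initiated fetches: refuse anything that resolves into internal networks.
        return bool(host) and not resolves_inward(host, resolver)
    raise ValueError(f"unknown origin: {origin!r}")
```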

Rotate webhook secrets through a secrets manager rather than shell history, and re-audit egress whenever new tools add domains.

Field glossary: vocabulary for precise postmortems

Tool surface is the filtered set of callable functions for a session. sessions_spawn is the delegation primitive; it appears only when policy and capability align. allowAgents is the evaluated identity set for the inbound context, and allowed:none simply means no row matched the policy predicate. agents.list holds top-level agents; subagents nest under parents per schema, carrying inherited or overridden secret scope, rate limit, and model route choices. Function calling is the upstream feature that exposes tool slots; a completion-only endpoint cannot fake it. Capability hints and client warnings belong in openclaw models status output.

The gateway mediates channels, models, and policy. Channel adapters map Telegram, Teams, or other surfaces into sessions. The management WebSocket synchronizes UI control state; when it flakes, tool catalogs look stale. Doctor summarizes local health; structured logging plus a shared correlation identifier ties gateway and adapter lines together during triage.

At the edge, TLS termination, correct WebSocket upgrade headers, sane idle timeout values, and tight allowedOrigins keep operator consoles honest. Proxies that strip auth headers or corporate browsers with strict content policies can mimic application bugs, so document approved clients.

SSRF defenses restrict server-side fetches; pair them with an explicit egress allow list so legitimate webhook and vendor callbacks still flow. Version skew between CLI and server binaries, plus volume mount drift in containers, explains many phantom regressions; staging parity for firewalls catches hardening mistakes before launch.

A no-op tool in CI prompts is a cheap contract test. Watch observability cardinality when tagging sessions, and trim runbook debt by storing command transcripts alongside configuration diffs. Tenant boundaries isolate customers on shared hosts. A hosted remote Mac, in this article, means rented capacity with managed ingress.

FAQ and why teams rent SFTPMAC remote Mac capacity

Why does the UI show sessions_spawn missing even though the binary is new?

Often the server is older than the CLI, the model route lacks function calling, allowAgents resolved empty, or the WebSocket through your proxy never stabilized. Prove each layer with commands before blaming releases.

What does allowAgents allowed:none mean operationally?

No configured agent satisfied the policy predicate for that message context. Audit channel identity mapping, list duplication, and subagent placement rather than randomly widening wildcards.

Should production SSRF hardening block all model callbacks?

No. Deny unexpected destinations while explicitly allowing documented vendor hosts. Otherwise tools that rely on HTTP callbacks will fail closed and mimic model outages.

Summary: Missing sessions_spawn and allowed:none usually indicate stacked issues across model capability, agent graph nesting, transport health, and egress policy. Walk status, models status, doctor, and logs in order, then proxies, then hardening.

Limits of DIY: Maintaining Mac hosts, TLS edges, webhook rules, and observability across regions consumes senior hours. SFTPMAC packages hosted remote Mac capacity with ingress patterns tuned for automation teams, redirecting engineering time toward agents and products instead of recursive tunnel debugging.

Explore SFTPMAC plans and regions when you need a stable remote Mac hub for OpenClaw-style gateways alongside SFTP-first workflows.
