
2026 OpenClaw Behind Nginx or Caddy: TLS, WebSocket, and allowedOrigins Production Checklist

Running OpenClaw on 127.0.0.1:18789 feels finished until you put a real hostname and HTTPS in front of it. Suddenly the console shows 403 on static assets, the dashboard websocket never stays up, or allowedOrigins rejects a URL you swear matches. This guide walks through Nginx and Caddy reverse-proxy patterns that terminate TLS, preserve Upgrade and Connection headers, and align gateway origin policy with what browsers actually send. It ties the proxy layer to the gateway doctor playbook, the 3.x production hardening article, and the cloud deploy FAQ so you can tell proxy misconfiguration apart from token or channel failures.


Ten failure modes that appear the minute HTTPS goes live

First, origin mismatch breaks browser security checks. Developers test against http://127.0.0.1:18789 where any localhost origin looks equivalent. Production users load https://agents.example.com. Fetch calls, EventSource streams, and bundled admin tools compare the page origin to entries in allowedOrigins. A missing scheme, an extra trailing slash, or a forgotten staging hostname produces 403 responses that look like authentication failures even though tokens never reached the gateway logic.
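Origin matching is usually an exact string comparison, which is why a trailing slash or an explicit default port in config silently breaks it. The following Python helper is an illustrative sketch of browser-style origin normalization, not OpenClaw's actual implementation:

```python
from urllib.parse import urlsplit

def normalize_origin(url: str) -> str:
    """Reduce a URL to the scheme://host[:port] form browsers send as the
    Origin header: lowercase, default ports (80/443) dropped, path and
    trailing slash ignored."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if parts.port in (None, {"http": 80, "https": 443}.get(scheme)):
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{parts.port}"

# Hypothetical allowedOrigins entries, normalized once at load time:
allowed = {normalize_origin(o) for o in [
    "https://agents.example.com",
    "http://127.0.0.1:18789",
]}

# A trailing slash or explicit :443 no longer causes a false reject,
# but a scheme mismatch still does:
assert normalize_origin("https://agents.example.com/") in allowed
assert normalize_origin("https://agents.example.com:443") in allowed
assert normalize_origin("http://agents.example.com") not in allowed
```

Normalizing both sides before comparison turns "it matches when I squint" into a deterministic check you can unit-test alongside your config.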

Second, WebSocket upgrades disappear inside the proxy. OpenClaw dashboards and some channel bridges expect an HTTP/1.1 upgrade path. Generic reverse-proxy recipes copy-pasted from static file hosts omit proxy_set_header Upgrade $http_upgrade; and Connection "upgrade". Symptoms include sockets that open, echo a frame, then reset because the intermediary buffered the handshake incorrectly or downgraded to HTTP/2 on the same port without a cleartext upgrade path.

Third, TLS double-termination or wrong trust anchors waste days. Teams sometimes enable HTTPS inside Node while also terminating at Nginx, leading to cipher mismatch loops or certificate chains that browsers reject. Alternatively they forward TLS passthrough without SNI-aware routing and wonder why ACME challenges fail. Keep a single termination story: public clients talk TLS to the proxy, the proxy talks plain HTTP to loopback unless you intentionally use mTLS inside a private mesh.

Fourth, mixed-content warnings appear when marketing pages stay on HTTP while the admin UI redirects to HTTPS, so browsers block subresources.

Fifth, corporate proxies strip the Sec-WebSocket-Protocol header some clients require; compare curl -v through the public hostname against loopback behavior, or take packet captures.

Sixth, rate limiting at the edge returns 429 responses that operators confuse with OpenClaw throttling.

Seventh, Web Application Firewalls sometimes block JSON bodies that resemble SSRF probes even though the traffic is legitimate OpenAI-compatible calls documented in the hardening guide.

Eighth, IPv6 AAAA records point at a stale VM while IPv4 works, producing intermittent websocket failures that correlate with DNS selection rather than application bugs.

Ninth, load-balancer health checks hit the wrong virtual host and mark nodes unhealthy despite a warm process on port 18789.

Tenth, log aggregation hides the proxy layer entirely, so on-call assumes openclaw doctor is lying when nginx already returned 502 before the request touched Node.

Threat model: why production wants a reverse proxy in front of OpenClaw

Exposing Node directly on port 443 works for homelab demos. Production benefits from a dedicated edge that handles ACME renewal, OCSP stapling, TLS cipher policy, and request size limits before untrusted clients reach your gateway. That edge is also where you attach bot scoring, GeoIP blocks, and centralized access logs that compliance teams expect.

Nginx remains the default choice when teams already operate it for other services. You get fine-grained map directives, custom error pages, and mature rate-limit zones. Operational cost is higher because certificate renewal and config reload discipline sit with your platform team.

Caddy appeals when you want automatic HTTPS with minimal boilerplate and readable Caddyfiles. Trade-offs include slightly different defaults for header pass-through, so you still verify websocket behavior rather than assume magic fixes everything.

OpenClaw itself should trust only the loopback or a unix socket when possible. Bind public listeners through the proxy, not through 0.0.0.0:18789 on cloud images unless your threat model explicitly allows it. Pair that posture with the separation of compat tokens and channel secrets described in the hardening article so a compromised edge rule does not equal a full messaging takeover.

Document which hops may append X-Forwarded-For. If OpenClaw or plugins derive client IP for auditing, spoofed headers from random internet hosts must be ignored. Only the immediate proxy IP should be trusted to add forwarding metadata. This overlaps with webhook verification ordering: authenticate before parsing large bodies, as emphasized in production SSRF guidance.

When you operate multiple environments, clone proxy snippets per hostname instead of sharing one wildcard server block that accidentally serves staging certificates to production hostnames. Keep environment-specific allowedOrigins arrays checked into separate secret stores to avoid a typo that grants staging browsers access to production gateways.

Finally, align observability: emit structured logs from both proxy and OpenClaw with a shared request identifier when possible. That identifier makes it obvious whether a timeout happened between client and nginx or between nginx and Node.

Reference industry practice for TLS deployment lifecycles: NIST Special Publication 800-52 Rev. 2 provides federal TLS guidance that many private teams mirror when choosing cipher suites and certificate rotation windows. You do not need to copy every control, but documenting which profile you follow accelerates security questionnaires and on-call handoffs.

Decision matrix: Nginx versus Caddy for OpenClaw gateways

Use the table during architecture review, then capture the chosen row inside your internal runbook alongside firewall rules.

Edge | Best when | WebSocket ergonomics | Operational notes
Nginx plus certbot | You already run nginx for APIs and static sites | Explicit Upgrade headers; watch http2 on the same port | You manage renew hooks and config tests
Caddy automatic HTTPS | Small team wants minimal TLS ceremony | Usually straightforward; verify custom transports | Watch header defaults versus nginx habits
Cloud load balancer plus nginx | Multi-AZ with health checks | LB may need idle timeout raised for websockets | Document two-layer timeout math
Direct Node TLS | Strictly internal mesh with mTLS | Node handles upgrades natively but loses WAF hooks | Requires strong patch discipline on Node crypto

After you pick an edge, rehearse failure modes from the gateway ops guide: loopback health, gateway HTTP, doctor JSON, then channel bridges. Skipping straight to Telegram tests wastes hours when nginx already returns 502.

Quantify success criteria before go-live: ninety-five percent of websocket sessions should stay up for at least thirty minutes during soak tests, TLS handshakes should stay below three hundred milliseconds at p95 from your target office networks, and certificate expiry alerts should fire at least fourteen days before renewal.
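Those targets can be encoded as a pass/fail gate over soak-test samples. A minimal sketch, with illustrative sample data and a simple nearest-rank p95 (swap in your metrics library's percentile if you have one):

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples."""
    ordered = sorted(samples_ms)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

def soak_passes(handshakes_ms: list[float], session_minutes: list[float]) -> bool:
    """Go-live gate from the text: p95 TLS handshake under 300 ms and at
    least 95% of websocket sessions surviving 30 minutes or more."""
    long_enough = sum(1 for m in session_minutes if m >= 30)
    return (p95(handshakes_ms) < 300
            and long_enough / len(session_minutes) >= 0.95)

assert soak_passes([120.0] * 95 + [500.0] * 5, [45] * 96 + [5] * 4)
assert not soak_passes([120.0] * 90 + [500.0] * 10, [45] * 100)
```

Running the gate in CI against exported soak metrics keeps "ready for go-live" from becoming a matter of opinion.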

Vendor-specific managed Mac or remote build hosts may already include curated proxy templates. Compare them against this checklist rather than treating vendor snippets as immutable truth.

Minimal fragments: loopback upstream, websocket headers, long reads

Adapt hostnames and TLS contact email. Test with curl -I https://your.hostname/health through the public name after deployment.

# --- Nginx (illustrative server block; certbot manages the certificate lines) ---
server {
    listen 443 ssl;
    server_name your.hostname;

    location / {
        proxy_pass http://127.0.0.1:18789;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}

# --- Caddy (illustrative Caddyfile site block; automatic HTTPS for the address) ---
your.hostname {
    reverse_proxy 127.0.0.1:18789 {
        header_up X-Forwarded-Proto {scheme}
        header_up X-Forwarded-For {remote_host}
    }
    # WebSocket pass-through is enabled by default; tune the reverse_proxy
    # transport read_timeout for long-lived sockets.
}
Keep comments in version control explaining why timeouts exceed six hundred seconds. Future engineers otherwise trim them during a cleanup and silently break nightly automation dashboards.

Timeouts, buffers, and observability baselines

Set proxy_read_timeout and proxy_send_timeout to at least 3600 seconds on routes that carry interactive websockets or long polling from assistant clients. Shorter defaults suited to REST APIs terminate healthy sessions. For HTTP-only compat endpoints, you may keep sixty to one hundred twenty seconds to fail fast on stuck upstreams.
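The split can live in two location blocks. An illustrative nginx fragment, assuming compat traffic sits under a /v1/ prefix (adjust to your actual routes):

```nginx
location /v1/ {                      # HTTP-only compat endpoints
    proxy_pass http://127.0.0.1:18789;
    proxy_read_timeout 120s;         # fail fast on stuck upstreams
    proxy_send_timeout 120s;
}

location / {                         # dashboard and websocket upgrades
    proxy_pass http://127.0.0.1:18789;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;        # interactive sessions stay open
    proxy_send_timeout 3600s;
}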

Body size limits should reflect maximum webhook or media payloads you accept. If the hardening guide caps outbound media fetches at twenty megabytes, align inbound limits so attackers cannot POST gigabyte bodies that exhaust disk before OpenClaw rejects them.
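In nginx terms, that alignment is a single directive; the 20m value here assumes the twenty-megabyte cap from the hardening guide:

```nginx
# Reject oversized POST bodies at the edge with 413, before any bytes
# reach Node or touch disk.
client_max_body_size 20m;
```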

Log TLS protocol version and cipher at the edge weekly to catch clients stuck on deprecated algorithms. Pair those metrics with OpenClaw gateway logs filtered by route family, as recommended in the production article, so security reviews see both layers.

Run synthetic checks every five minutes: TCP connect to 443, full TLS handshake, HTTP GET to /health, and one websocket upgrade with a pre-shared test token. Alert when any step deviates from baseline latency by more than three standard deviations.
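The deviation rule reduces to one comparison once your metrics store supplies a baseline mean and standard deviation per step. A minimal sketch (function name and numbers are illustrative):

```python
def deviates(sample_ms: float, baseline_mean: float, baseline_std: float) -> bool:
    """Alert when a synthetic-check step drifts more than three standard
    deviations from its recorded baseline latency."""
    return abs(sample_ms - baseline_mean) > 3 * baseline_std

# TLS handshake baseline of 200 ms with 50 ms spread:
assert deviates(900.0, 200.0, 50.0)        # 14 sigma -> page someone
assert not deviates(320.0, 200.0, 50.0)    # 2.4 sigma -> within baseline
```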

Capacity planning: expect at least two concurrent websocket connections per active operator during business hours, plus burst traffic when CI systems hit OpenAI-compatible endpoints. Translate that into worker connections and file descriptor ulimits on the proxy host.
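A back-of-envelope version of that translation, using the two-sockets-per-operator rule from the text; the CI burst figure and 2x headroom are assumptions to replace with your own measurements:

```python
def fd_budget(operators: int, ci_burst: int, headroom: float = 2.0) -> int:
    """File-descriptor budget for the proxy host. Each proxied connection
    consumes two descriptors (client side plus upstream side)."""
    concurrent = operators * 2 + ci_burst
    return int(concurrent * 2 * headroom)

# 50 operators + 100 CI connections -> 200 concurrent -> 400 fds -> 2x headroom
assert fd_budget(50, 100) == 800
```

Compare the result against `ulimit -n` and nginx worker_connections on the proxy host before load testing, not after.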

Document rollback: keep the previous nginx or Caddy configuration file under a dated path so nginx -t failure does not leave the team editing live production without a known-good restore point.

Publish a one-page diagram that shows DNS, load balancer, proxy, loopback OpenClaw, and outbound API paths so auditors see the full trust boundary at a glance.

FAQ, cross-links, and when hosted remote Mac simplifies the stack

Doctor is green but browsers still see 403. Where do I look?

Start at the proxy access logs for the exact path returning 403. If nginx logged it, OpenClaw never issued the response. If only application logs show 403, re-check allowedOrigins and authentication headers next.

Do I need separate server names for OpenAI-compatible routes?

Not mandatory, but many teams isolate /v1 compat traffic on its own virtual host to apply different rate limits and WAF rules without touching operator dashboards.

Can I reuse the cloud FAQ firewall section verbatim?

Use it as a baseline, then add your proxy ports 80 and 443, health-check source IPs, and any IPv6 ranges your provider publishes.

Summary: Terminate TLS at a reverse proxy, forward websocket upgrades explicitly, align allowedOrigins with real browser origins, and triage failures in layers from edge to loopback to channels.

Limitation: You still operate certificates, proxy configs, and gateway upgrades. Teams that want Apple-native toolchains colocated with stable ingress for both SFTP artifacts and long-running gateways often rent a managed remote Mac instead of owning every network hop.

SFTPMAC packages remote Mac capacity with operator-tended connectivity so your engineers focus on OpenClaw workflows instead of colocation tickets. The same machines that receive CI uploads over SFTP can host gateways with predictable uplinks when your architecture demands tight coupling between build outputs and automation.

Self-managed fleets accrue hidden time: chasing ACME failures during holidays, reconciling nginx diffs across regions, and proving websocket reliability to enterprise customers. When those costs exceed the price of dedicated hosting, moving the gateway footprint to a provider that monitors power, cooling, and upstream BGP becomes a rational trade.

Revisit this architecture quarterly because public cloud IP ranges, corporate TLS inspection policies, and OpenClaw release notes all move. Lightweight architecture decision records prevent the next hire from undoing your proxy timeouts during a well-meaning cleanup.

Explore SFTPMAC plans when you want managed remote Mac nodes with stable ingress for gateways and file delivery in one operational story.