
2026 Hermes Agent Skills Advanced Guide: SKILL.md, Bundles & GEPA Self-Evolution Decision Guide

In early 2026 Nous Research shipped Hermes Agent, and within two months the project crossed 160,000 GitHub stars. The headline is not a bigger model—it is "the agent that grows with you." Under that promise sits a standardized, evolvable, cross-session Skills system. This guide skips first-time install (see our Hermes Agent step-by-step install guide) and goes straight to the architecture decisions: SKILL.md format, Progressive Disclosure, Skill Bundles, conditional activation, Tap publishing, and how GEPA + DSPy turn skills into assets that improve with every run.

1. Three pain points: why the Hermes Skills system deserves deep study

Many developers treat Hermes as another chat wrapper and miss the module with the highest long-term ROI:

  1. Runaway token cost: stuffing SOPs into the system prompt burns tokens on every turn. Progressive Disclosure loads only each skill's name and description up front, so the full body of a skill costs nothing until it is activated.
  2. Knowledge that never compounds: one-off prompts vanish with the session. Skills are cross-session procedural memory—versionable, shareable, and evolvable.
  3. Workflows that need repeated slash commands: complex tasks require multiple /skill-name invocations. Skill Bundles pack related skills into one slash command.

Advanced readers should answer five questions before shipping: How does Progressive Disclosure control tokens? How do conditional activations work? How do Bundles trigger full workflows? How does DSPy + GEPA auto-improve Skills? What community repos are worth installing today? The sections below address each one.

2. Core concepts: Skills are not Prompts, Skills are not Memory

Confusing the three leads to wrong architecture choices. A useful mnemonic: Prompt = sticky note (valid for this turn); Memory = notebook (permanent notes, always nearby); Skill = SOP manual (step-by-step procedure, opened when needed).

| Dimension | Plain Prompt | Memory | Skills |
| --- | --- | --- | --- |
| Persistence | Current conversation | Cross-session, permanent | Cross-session, permanent |
| Load timing | Always in context | Injected every session | On demand (key difference) |
| Token cost | Every turn | Small and stable | Zero until activated |
| Content type | Any intent description | User preferences / facts | Procedural steps (how to do something) |
| Maintainer | User manually | Agent automatically | User and Agent |
| Shareability | Awkward | Private | Publishable as community Tap |

3. SKILL.md format and Progressive Disclosure

3.1 Base structure (agentskills.io open standard)

All Hermes Skills follow the agentskills.io open standard, ensuring portability across Hermes, Claude Code, and Cursor.

---
name: my-skill                    # Required: lowercase + hyphens, max 64 chars
description: |                    # Required: max 1024 chars; start with "Use when..."
  Use when the user needs to [...].
  Handles [...] and [...].
version: 1.0.0
license: MIT
compatibility: Requires git, docker
allowed-tools: Bash(git:*) Read
metadata:
  hermes:
    tags: [devops, automation]
    category: software-development
    related_skills: [github-pr-workflow, test-driven-development]
    requires_toolsets: [terminal]
    fallback_for_toolsets: [web]
---

# My Skill Title

## Overview
## When to Use
## Procedure
## Common Pitfalls
## Verification Checklist
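
The two required fields carry hard limits, so it can help to lint them before committing a skill. The following is a minimal sketch, not an official validator: `check_frontmatter` is a hypothetical helper, the limits come from the comments in the example above, and allowing digits in `name` is an assumption.

```python
import re

# Limits taken from the comments in the SKILL.md example; digits in names are an assumption.
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
MAX_NAME_LEN = 64
MAX_DESCRIPTION_LEN = 1024


def check_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of problems; an empty list means both fields pass."""
    problems = []
    if not name:
        problems.append("name is required")
    elif len(name) > MAX_NAME_LEN:
        problems.append(f"name exceeds {MAX_NAME_LEN} characters")
    elif not NAME_PATTERN.fullmatch(name):
        problems.append("name must use lowercase letters, digits, and hyphens")

    text = description.strip()
    if not text:
        problems.append("description is required")
    else:
        if len(text) > MAX_DESCRIPTION_LEN:
            problems.append(f"description exceeds {MAX_DESCRIPTION_LEN} characters")
        if not text.startswith("Use when"):
            problems.append('description should start with "Use when"')
    return problems
```

Calling `check_frontmatter("my-skill", "Use when the user needs to deploy.")` returns an empty list, while a description such as "Helps with code." is flagged for the missing "Use when" opener.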

3.2 Skill directory layout (modular design)

~/.hermes/skills/
└── my-category/
    └── my-skill/
        ├── SKILL.md              # Main file (core steps; aim for ≤500 lines)
        ├── references/           # API references (loaded on demand)
        ├── templates/            # Reusable templates
        └── scripts/              # Scripts the agent can execute directly
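
The layout above is easy to scaffold with a few lines of Python. This is an illustrative sketch only: `scaffold_skill` is a hypothetical helper, not a Hermes command, and the stub frontmatter it writes is a placeholder to be filled in by hand.

```python
import tempfile
from pathlib import Path

SUBDIRS = ("references", "templates", "scripts")


def scaffold_skill(root: Path, category: str, name: str) -> Path:
    """Create <root>/<category>/<name>/ with SKILL.md and the three subfolders."""
    skill_dir = root / category / name
    for sub in SUBDIRS:
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():  # never overwrite an existing skill
        skill_md.write_text(
            f"---\nname: {name}\ndescription: |\n  Use when ...\n---\n\n# {name}\n",
            encoding="utf-8",
        )
    return skill_dir


if __name__ == "__main__":
    # Demo in a throwaway directory; pass Path.home() / ".hermes" / "skills" for real use.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_skill(Path(tmp), "my-category", "my-skill")
        print(sorted(p.name for p in created.iterdir()))
```

Because the function skips an existing SKILL.md, it is safe to rerun on a skill that already has content.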

3.3 Progressive Disclosure: three load levels (token control core)

| Level | Content | Trigger | Token cost |
| --- | --- | --- | --- |
| Level 0 | name + description | Every session start, all skills | ~3K tokens (all skills combined) |
| Level 1 | Full SKILL.md body | User /skill-name or LLM decides needed | Depends on file length |
| Level 2 | references/ and scripts/ files | LLM decides during execution | Per file, on demand |

Writing tip: description is the only Level 0 signal—the LLM uses it to decide whether to load the full skill. Clarify when to use rather than what it is.
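
To make the three levels concrete, here is a toy model of what enters the context at each step. It is a sketch under stated assumptions, not Hermes's loader: the `Skill` and `SkillContext` classes are hypothetical, and the token estimate uses a rough "4 characters per token" heuristic.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class Skill:
    name: str
    description: str
    body: str                                                  # Level 1 content
    references: dict[str, str] = field(default_factory=dict)   # Level 2 files


class SkillContext:
    """Tracks which text has been placed in context at each disclosure level."""

    def __init__(self, skills: list[Skill]):
        self.skills = {s.name: s for s in skills}
        # Level 0: every skill contributes only its name and description.
        self.loaded = [f"{s.name}: {s.description}" for s in skills]

    def activate(self, name: str) -> None:
        """Level 1: load the full SKILL.md body of one skill."""
        self.loaded.append(self.skills[name].body)

    def open_reference(self, name: str, filename: str) -> None:
        """Level 2: load a single reference file on demand."""
        self.loaded.append(self.skills[name].references[filename])

    def token_cost(self) -> int:
        return sum(estimate_tokens(chunk) for chunk in self.loaded)
```

In this model, installing more skills grows only the Level 0 line items; the larger bodies and reference files add cost only after `activate` or `open_reference` is called.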

4. Skill Bundles: one command triggers a complete workflow

Skill Bundles are a 2026 Hermes addition and remain underused. A Bundle is lightweight YAML that packs multiple related skills into one slash command. Running /bundle-name loads every listed skill at once.

File location: ~/.hermes/skill-bundles/<slug>.yaml

name: backend-dev
description: |
  Full backend feature workflow — code review, TDD, and PR management.
skills:
  - github-code-review
  - test-driven-development
  - github-pr-workflow
instruction: |
  Always write failing tests first before implementation.
  Never push directly to main.

Advanced scenarios:

  • AI researcher workflow (research-session): arxiv + deep-research + plan + excalidraw—query recent papers and sketch architecture each session.
  • MLOps deploy pipeline (mlops-deploy): vllm + llama-cpp + github-pr-workflow + systematic-debugging—benchmark inference before and after deploy and log quantization settings.

Bundle priority rules:

  • When a Bundle and a single Skill share a name, the Bundle wins.
  • Missing skills in a Bundle are skipped without error; the loader reports what was absent.
  • Bundles do not alter the system prompt, so they stay Prompt Cache friendly.

A similar bundle can be created from the CLI:

hermes bundles create backend-dev \
  --skills github-code-review,test-driven-development,github-pr-workflow \
  --instruction "Always write failing tests first"
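
The priority rules above can be sketched as a small resolver. This is an illustration of the described behavior, not Hermes's implementation: `Bundle` and `resolve_command` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Bundle:
    name: str
    skills: list[str]
    instruction: str = ""


def resolve_command(command: str, bundles: dict[str, Bundle], installed: set[str]):
    """Resolve a slash command to (skills_to_load, missing_skills, instruction)."""
    if command in bundles:  # a bundle wins over a skill with the same name
        bundle = bundles[command]
        loaded = [s for s in bundle.skills if s in installed]
        # Missing skills are skipped, but reported back to the caller.
        missing = [s for s in bundle.skills if s not in installed]
        return loaded, missing, bundle.instruction
    if command in installed:
        return [command], [], ""
    raise KeyError(f"unknown command: /{command}")
```

Returning the missing list instead of raising mirrors the "skipped without error, but reported" rule, so the caller can decide whether a partial workflow is acceptable.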

5. Conditional activation: environment-aware skills

Skills can auto-show or hide based on which tools are available in the current session. Configure under metadata.hermes:

metadata:
  hermes:
    requires_toolsets: [web]
    requires_tools: [web_search]
    fallback_for_toolsets: [browser]
    fallback_for_tools: [browser_navigate]

| Field | Behavior |
| --- | --- |
| requires_toolsets | Hide this skill when listed toolsets are missing |
| requires_tools | Hide this skill when listed tools are missing |
| fallback_for_toolsets | Hide when listed toolsets exist (acts as fallback) |
| fallback_for_tools | Hide when listed tools exist |

Classic pattern: free vs paid search. Set duckduckgo-search with fallback_for_tools: [web_search]: when FIRECRAWL_KEY or BRAVE_SEARCH_KEY is configured, paid web_search activates and the DuckDuckGo skill disappears to save tokens; when the API is unavailable, the fallback surfaces automatically.
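
The four fields reduce to a single visibility check. The sketch below is an interpretation, not the actual Hermes logic: `is_visible` is a hypothetical function, and it assumes "any listed item" semantics (one missing requirement hides the skill, and one present fallback target hides it).

```python
def is_visible(hermes_meta: dict, toolsets: set[str], tools: set[str]) -> bool:
    """Decide whether a skill should be listed, given the session's toolsets and tools."""
    # requires_*: hide when any listed item is missing.
    if not set(hermes_meta.get("requires_toolsets", [])) <= toolsets:
        return False
    if not set(hermes_meta.get("requires_tools", [])) <= tools:
        return False
    # fallback_for_*: hide when any listed item is present (assumption: "any", not "all").
    if set(hermes_meta.get("fallback_for_toolsets", [])) & toolsets:
        return False
    if set(hermes_meta.get("fallback_for_tools", [])) & tools:
        return False
    return True


# The DuckDuckGo pattern from the paragraph above.
duckduckgo = {"fallback_for_tools": ["web_search"]}
print(is_visible(duckduckgo, toolsets=set(), tools=set()))
print(is_visible(duckduckgo, toolsets={"web"}, tools={"web_search"}))
```

With no paid search tool in the session the fallback skill is listed; once `web_search` is available it is hidden.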

Platform-aware example: telegram-notify can set requires_toolsets: [messaging] and platforms: [telegram, discord]. The hermes skills TUI lets you toggle skills independently for CLI, Telegram, and Discord.

6. Skills Hub and open-source ecosystem

6.1 Official install channels

hermes skills install official/research/arxiv
hermes skills install https://example.com/SKILL.md --name my-skill
hermes skills install github:openai/skills/k8s
hermes skills tap add github:my-org/my-skills

6.2 Repos worth bookmarking

| Repository | Description | Highlights |
| --- | --- | --- |
| ChuckSRQ/awesome-hermes-skills | Curated production-grade skills | Deep Research, MLOps, Apple integrations; 23 skills with GitHub Copilot |
| amanning3390/hermeshub | Community skill registry | Security scanning, API, and marketplace features |
| kevinnft/ai-agent-skills | 191 skills, 28 categories | One-command install for Hermes, Claude Code, and Cursor |
| NousResearch/hermes-agent | Official main repo | Authoritative source with all built-in Skills and authoring specs |

The agentskills.io open standard means Skills work across Hermes, Claude Code, Cursor, and OpenCode. Validate compliance with skills-ref validate ./my-skill.

7. Publishing your own Skill Tap: team and community sharing

Create a GitHub repo as a Tap so your team—or the wider community—can subscribe to your skill set. This is one of the least-documented advanced techniques.

my-skills-tap/
├── skills.sh.json
├── mlops/vllm-deploy/SKILL.md
├── research/paper-summarizer/SKILL.md
└── README.md

skills.sh.json controls Hub category display. Team deployment:

hermes skills tap add github:your-org/your-skills-tap
hermes skills tap add github:your-org/private-skills --token $GH_TOKEN
hermes skills tap update
hermes skills tap list
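
Before pushing a Tap, it can be worth checking that every skill folder actually contains a SKILL.md. The sketch below assumes the `<category>/<skill>/SKILL.md` layout shown in the tree above; both helper functions are hypothetical, and the format of skills.sh.json is deliberately not touched here.

```python
import tempfile
from pathlib import Path


def list_tap_skills(tap_root: Path) -> list[str]:
    """Return 'category/skill' for every <category>/<skill>/SKILL.md under tap_root."""
    return sorted(
        p.parent.relative_to(tap_root).as_posix()
        for p in tap_root.glob("*/*/SKILL.md")
    )


def skill_dirs_missing_manifest(tap_root: Path) -> list[str]:
    """Return second-level directories that have no SKILL.md (hidden dirs ignored)."""
    missing = []
    for d in tap_root.glob("*/*"):
        rel = d.relative_to(tap_root)
        if any(part.startswith(".") for part in rel.parts):
            continue  # skip .git and other hidden folders
        if d.is_dir() and not (d / "SKILL.md").is_file():
            missing.append(rel.as_posix())
    return sorted(missing)


if __name__ == "__main__":
    # Demo on a throwaway copy of the layout shown above.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "mlops" / "vllm-deploy").mkdir(parents=True)
        (root / "mlops" / "vllm-deploy" / "SKILL.md").write_text("---\n", encoding="utf-8")
        print(list_tap_skills(root), skill_dirs_missing_manifest(root))
```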

Version management tip: put ~/.hermes/skills/ under Git. On each additional device, run git pull && hermes skills reset to sync your skills and rebuild the built-in ones.

8. Self-evolving skills with GEPA + DSPy

This is Hermes's most distinctive capability versus peer tools. GEPA (Genetic-Pareto Prompt Evolution) is an ICLR 2026 Oral result, integrated in hermes-agent-self-evolution. The idea: do not fine-tune model weights—analyze execution traces, generate variants, and run multi-objective Pareto optimization on SKILL.md text. Each optimization run costs roughly $2–10 (API calls only; no GPU required).

8.1 GEPA five-stage evolution pipeline

  1. Stage 1 — trace collection: read full reasoning traces from SQLite (tool calls, branches, errors).
  2. Stage 2 — reflective failure analysis: LLM produces actionable side information—not just "it failed," but why.
  3. Stage 3 — targeted mutation: generate 10–20 SKILL.md variants aimed at failure causes.
  4. Stage 4 — multi-objective Pareto evaluation: optimize success rate, token efficiency, and speed together.
  5. Stage 5 — human review PR: best variant becomes a PR; ships after human approval.
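
Stage 4 is the step that benefits most from a concrete picture. The sketch below shows plain Pareto-front selection over the three objectives named above; it is not the code used in hermes-agent-self-evolution, and the `VariantScore` type and the sample scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantScore:
    name: str
    success_rate: float  # higher is better
    tokens: int          # lower is better
    seconds: float       # lower is better


def dominates(a: VariantScore, b: VariantScore) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = (
        a.success_rate >= b.success_rate
        and a.tokens <= b.tokens
        and a.seconds <= b.seconds
    )
    strictly_better = (
        a.success_rate > b.success_rate
        or a.tokens < b.tokens
        or a.seconds < b.seconds
    )
    return no_worse and strictly_better


def pareto_front(variants: list[VariantScore]) -> list[VariantScore]:
    """Keep every variant that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


# Hypothetical scores for illustration only.
candidates = [
    VariantScore("baseline", 0.70, 1200, 9.0),
    VariantScore("v1", 0.85, 1100, 8.0),
    VariantScore("v2", 0.90, 1500, 8.5),
    VariantScore("v3", 0.80, 1300, 9.5),
]
print([v.name for v in pareto_front(candidates)])
```

Here v1 beats the baseline and v3 on all three objectives, while v2 trades more tokens for a higher success rate, so both v1 and v2 survive. A Pareto front keeps such trade-offs visible instead of collapsing them into one weighted score, which is what leaves the final choice to the human reviewer in Stage 5.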

8.2 Quick start

git clone https://github.com/NousResearch/hermes-agent-self-evolution
cd hermes-agent-self-evolution && pip install -r requirements.txt
export HERMES_AGENT_PATH=~/.hermes

python -m evolution.skills.evolve_skill \
    --skill github-code-review --iterations 10 --eval-source synthetic

python -m evolution.skills.evolve_skill \
    --skill github-code-review --iterations 10 --eval-source sessiondb

8.3 Four safety guardrails

  1. Full test suite: pytest tests/ -q must pass 100%.
  2. Size limits: Skills ≤ 15KB; tool descriptions ≤ 500 characters.
  3. Prompt Cache compatibility: no mid-session edits that invalidate cache.
  4. Semantic preservation: variants must not drift from the skill's core purpose.
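
The size guardrail is simple enough to check locally before an evolution run. This is a sketch, not the project's own gate: `check_size_limits` is a hypothetical helper, and reading "15KB" as 15 × 1024 bytes of UTF-8 is an assumption.

```python
MAX_SKILL_BYTES = 15 * 1024          # "15KB" read as 15 * 1024 bytes (assumption)
MAX_TOOL_DESCRIPTION_CHARS = 500


def check_size_limits(skill_text: str, tool_descriptions: dict[str, str]) -> list[str]:
    """Return a list of guardrail violations; an empty list means the variant passes."""
    violations = []
    # Measure encoded bytes so multi-byte text (for example CJK) counts at its real size.
    size = len(skill_text.encode("utf-8"))
    if size > MAX_SKILL_BYTES:
        violations.append(f"SKILL.md is {size} bytes (limit {MAX_SKILL_BYTES})")
    for tool, description in tool_descriptions.items():
        if len(description) > MAX_TOOL_DESCRIPTION_CHARS:
            violations.append(
                f"description of {tool} is {len(description)} characters "
                f"(limit {MAX_TOOL_DESCRIPTION_CHARS})"
            )
    return violations
```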

8.4 Five-phase evolution roadmap (official status)

| Phase | Target | Engine | Status |
| --- | --- | --- | --- |
| Phase 1 | Skill files (SKILL.md) | DSPy + GEPA | Shipped |
| Phase 2 | Tool descriptions | DSPy + GEPA | Planned |
| Phase 3 | System prompt fragments | DSPy + GEPA | Planned |
| Phase 4 | Tool implementation code | Darwinian Evolver | Planned |
| Phase 5 | Continuous improvement loop | Automated pipeline | Planned |

Cross-platform trace fusion: because Skills follow agentskills.io, you can feed Claude Code or Gemini CLI traces into GEPA:

python -m evolution.skills.evolve_skill \
    --skill github-code-review --iterations 10 --eval-source mixed \
    --trace-dirs ~/.claude/traces,~/.hermes/sessions

9. Plugin skills: extending Hermes boundaries

Plugins package skills under a namespace (plugin:skill): they do not appear in the default skills_list (less noise), activate only on explicit user invocation (opt-in), and skills inside a plugin can reference each other.

skill_view("superpowers:writing-plans")
# Loading also surfaces sibling skills in the same plugin

Declare in the plugin plugin.yaml:

name: my-hermes-plugin
skills:
  - name: writing-plans
    path: skills/writing-plans/SKILL.md
  - name: editing
    path: skills/editing/SKILL.md
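
The plugin:skill namespace can be modeled as a two-step lookup. The sketch below is illustrative only: `parse_skill_ref` and `resolve_plugin_skill` are hypothetical helpers, and the in-memory `plugins` mapping stands in for whatever Hermes builds from plugin.yaml.

```python
from typing import Optional


def parse_skill_ref(ref: str) -> tuple[Optional[str], str]:
    """Split 'plugin:skill' into (plugin, skill); a bare name yields (None, name)."""
    plugin, sep, skill = ref.partition(":")
    if not sep:
        return None, ref
    if not plugin or not skill:
        raise ValueError(f"malformed skill reference: {ref!r}")
    return plugin, skill


def resolve_plugin_skill(ref: str, plugins: dict[str, dict[str, str]]):
    """Return (path, sibling_skill_names) for an explicit 'plugin:skill' reference."""
    plugin, skill = parse_skill_ref(ref)
    if plugin is None:
        # Plugin skills are opt-in, so a bare name is never resolved here.
        raise KeyError("plugin skills need an explicit 'plugin:skill' reference")
    entries = plugins[plugin]
    siblings = sorted(name for name in entries if name != skill)
    return entries[skill], siblings


# Mirrors the plugin.yaml above.
plugins = {
    "my-hermes-plugin": {
        "writing-plans": "skills/writing-plans/SKILL.md",
        "editing": "skills/editing/SKILL.md",
    }
}
print(resolve_plugin_skill("my-hermes-plugin:writing-plans", plugins))
```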

10. Advanced authoring tips (engineer view)

10.1 description drives activation precision

Too vague: "Helps with code."
Better: "Use when reviewing a pull request... Do NOT use for writing new code."

10.2 Pitfalls separate good skills from noise

High-quality Pitfalls name specific failure modes, root causes, and fix steps—fragile CSS selectors, GitHub API rate limits, large diff token overflow, and chunking strategies to handle them.

10.3 Scripting and skill_manage

Reference scripts/extract_schema.py in the Procedure section and, if the script fails, have the agent load references/manual-extract.md as the fallback. Agents can maintain skills themselves via skill_manage with action='patch' or action='create'. Set skills.agent_writes_require_approval: true in config.yaml to put a human approval gate in front of those writes.

10.4 Skill size control

| Skill size | Recommendation |
| --- | --- |
| < 500 lines | Keep everything in SKILL.md |
| 500–1000 lines | Move deep reference to references/ |
| > 1000 lines | Split strongly; consider two skills |
| > 15KB | Exceeds GEPA limit; must split |
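
The tiers above translate directly into a small lookup. This is a sketch of the table rather than a Hermes feature: `size_recommendation` is a hypothetical helper, the 15KB check is assumed to take priority over the line-count tiers, and "15KB" is read as 15 × 1024 bytes.

```python
def size_recommendation(skill_text: str) -> str:
    """Map a SKILL.md's size to the recommendation from the table above."""
    size_bytes = len(skill_text.encode("utf-8"))
    line_count = len(skill_text.splitlines())
    if size_bytes > 15 * 1024:  # hard limit, checked first (assumption)
        return "Exceeds GEPA limit; must split"
    if line_count > 1000:
        return "Split strongly; consider two skills"
    if line_count >= 500:
        return "Move deep reference to references/"
    return "Keep everything in SKILL.md"
```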

11. Case study: building Skills for your tech blog workflow

Build a blog-workflow Bundle packing seo-keyword-research, outline-generator, code-example-validator, bilingual-checker, and publish-to-platform:

name: blog-workflow
description: Full tech blog writing workflow.
skills:
  - seo-keyword-research
  - outline-generator
  - code-example-validator
  - bilingual-checker
  - publish-to-platform
instruction: |
  Always research SEO keywords before writing.
  Ensure all code examples are tested and runnable.
  Generate both Chinese and English title options.

A custom seo-keyword-research skill should query long-tail terms at session start ("how to X," "X vs Y," platform-specific agent terminology), output 3–5 primary keywords plus 10–15 long-tail entries, and note terminology differences across locales.

12. FAQ

Q: What is the difference between Skills and MCP?
Skills are procedural knowledge documents; MCP is a tool interface. MCP provides database access; a Skill teaches the agent how to run a migration correctly. They complement each other.

Q: Why does the agent still use an old Skill after I edited it?
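A running session keeps the version of the skill it has already loaded. Run /reset to start a new session, or reinstall with --now to apply the change immediately (at the cost of invalidating the Prompt Cache).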

Q: Are GEPA-evolved skills safe?
Every evolved variant must pass the four guardrails and then a human-reviewed PR; the semantic preservation check rejects variants that drift from the skill's core purpose. Still review every diff before merging.

Q: How do I reuse Hermes Skills in Claude Code?
Copy to ~/.claude/skills/ or use the kevinnft/ai-agent-skills multi-platform installer.

Q: Does Chinese content in Skills hurt token efficiency?
Chinese runs roughly 1–1.5 tokens per character, similar to English density. Keep description in English (or bilingual) for sharper LLM matching.

13. Resources and remote Mac 7x24 decision

13.1 Official and community resources

13.2 Deployment scenario decision matrix

| Scenario | Local Mac / laptop | Remote Mac 7x24 (SFTPMAC) |
| --- | --- | --- |
| GEPA evolution + sessiondb | Lid close breaks traces; incomplete samples | Continuous session traces; richer evolution data |
| Telegram/Discord Gateway | Sleep and Wi-Fi drops take the bot offline | launchd supervision keeps Gateway online |
| Team Tap sync | Each developer's ~/.hermes drifts | Unified node + SFTP/rsync for skills directory |
| Skill Bundles long sessions | Memory and tokens compete with other apps | Apple Silicon unified memory; stable multi-skill runs |

13.3 Summary: from local experiments to production agent nodes

This guide covered the full Hermes Skills stack: concept comparison, SKILL.md and Progressive Disclosure, Skill Bundles, conditional activation, Hub ecosystem, Tap publishing, GEPA self-evolution, Plugin extensions, authoring tips, and a blog workflow case study. Mastering it upgrades your agent from a disposable prompt into a versionable, shareable, self-improving procedural asset.

Local Hermes has clear limits: laptop lid close breaks connectivity, GEPA sessiondb traces stay incomplete, Telegram Gateway drops when the machine sleeps, and team Taps drift across developer machines. Teams that need 7x24 uptime for evolution traces, Gateway hosting, or unified ~/.hermes/skills/ sync should run Hermes on an Apple Silicon remote Mac—native launchd supervision, macOS-aligned toolchains, and SFTP/rsync for secure skills directory sync.

SFTPMAC remote Mac rental targets Hermes Agent Skills workflows: 7x24 Gateway, continuous GEPA trace collection, and team Tap directory sync—a better production entry than a home Mac pulling double duty. Start with your first SKILL.md today and let the agent compound from every session.