Skill System Core Knowledge Summary
Original: Do You Really Understand Skills?
Author: Li Xiaoyu
Source: BestBlogs.dev
Core Conclusion
There are 5 execution modes for Skills, not just function calling. All 16 official Anthropic Skills are driven by the skill_run sandbox command, without using Tools: declarations.
Core formula:
Skill Execution Power = SKILL.md body quality × (Agent base tools + skill_run sandbox capabilities)

1. Five Execution Modes
Mode 1: Pure Prompt Injection
| Item | Description |
|---|---|
| Representatives | frontend-design, brand-guidelines, algorithmic-art |
| Structure | Only SKILL.md, no scripts/ |
| Principle | Skill = carefully crafted system prompt providing domain knowledge and behavioral constraints |
| Execution | LLM writes code directly based on injected guidelines |
Key Insight: Encode expert tacit knowledge (aesthetic taste, design methodology) into explicit rules expressed in LLM-understandable ways.
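As a sketch, a pure prompt-injection skill might consist of nothing but a SKILL.md like the one below. The skill name, frontmatter fields, and rules are hypothetical illustrations, not taken from any official skill:

```markdown
---
name: style-guide
description: Apply the team's visual style rules when generating UI code.
---

# Style Guide

- Primary color is #0B5FFF; never use pure black (#000) for body text.
- Headings use "Inter"; body text uses "Source Serif".
- All generated UI code must meet WCAG AA contrast ratios.
```

There are no scripts to run: the body is injected into the system message, and the LLM writes code directly under these constraints.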
Mode 2: Script Execution
| Item | Description |
|---|---|
| Representatives | pdf, pptx, xlsx, webapp-testing |
| Structure | SKILL.md (user manual) + scripts/ (pre-built scripts) |
| Principle | SKILL.md teaches LLM how to use scripts; complex tasks call pre-built tools |
| Execution | skill_run("python3 scripts/xxx.py") executes in sandbox |
Difference Between Two Types of Code:
| | SKILL.md Code Examples | scripts/ Pre-built Scripts |
|---|---|---|
| Purpose | Teach the LLM how to write code | Black-box tools executed directly |
| Who Executes | LLM writes code + Agent base tools | skill_run in the sandbox |
| Suitable For | Simple one-off operations | Complex workflows that require verification |
Token Efficiency: 1 skill_run (20 tokens) vs 8 function calls (1600-4000 tokens)
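To illustrate why one generic entry point is enough, here is a minimal sketch of a skill_run-style executor. The function name, workspace layout, and environment variables mirror the text, but the implementation itself is an assumption, not the framework's actual code:

```python
import os
import subprocess
import tempfile

def skill_run(command, workspace=None):
    """Run an arbitrary shell command inside a workspace directory.

    Illustrative stand-in for the framework's skill_run tool: one generic
    entry point instead of one registered function per operation.
    """
    ws = workspace or tempfile.mkdtemp(prefix="ws_")
    out_dir = os.path.join(ws, "out")
    os.makedirs(out_dir, exist_ok=True)
    # Mirror the kind of auto-injected variables the text describes.
    env = {**os.environ, "WORKSPACE_DIR": ws, "OUTPUT_DIR": out_dir}
    result = subprocess.run(
        command, shell=True, cwd=ws, env=env,
        capture_output=True, text=True, timeout=60,
    )
    return {"exit_code": result.returncode,
            "stdout": result.stdout, "stderr": result.stderr}

# The LLM only has to emit one short call, e.g.:
#   skill_run("python3 scripts/fill_form.py input.pdf")
print(skill_run("echo hello")["stdout"].strip())  # → hello
```

Because the command is a plain string, any operation a shell can express costs the same handful of tokens to invoke.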
Mode 3: Library Calling
| Item | Description |
|---|---|
| Representatives | slack-gif-creator |
| Structure | SKILL.md (API docs) + core/ (Python library) |
| Principle | LLM writes code on the fly (import core.xxx), combining library functions to complete the task |
| Characteristics | LLM changes from "caller" to "developer" |
Key Difference:
| | Script Execution (pdf) | Library Calling (slack-gif-creator) |
|---|---|---|
| File Structure | scripts/xxx.py (standalone executable) | core/xxx.py (Python module) |
| Calling Method | skill_run("python3 scripts/xxx.py") | LLM writes a script that imports core.xxx |
| LLM Role | Caller (runs pre-built scripts) | Developer (combines library functions) |
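The role shift from caller to developer can be shown with a toy example. The functions below stand in for a skill's core/ library; the real slack-gif-creator API is different, and everything here is illustrative:

```python
# Toy stand-in for a skill's core/ library (hypothetical API).
def make_frames(text, n):
    """Pretend to render n animation frames for the given text."""
    return [f"frame({text},{i})" for i in range(n)]

def assemble_gif(frames):
    """Pretend to assemble frames into a GIF, returning an output path."""
    return f"out/{len(frames)}_frames.gif"

# What the LLM writes on the fly: it composes library functions it has
# only read about in SKILL.md, instead of invoking one fixed tool.
frames = make_frames("deploy!", n=8)
path = assemble_gif(frames)
print(path)  # → out/8_frames.gif
```

The library authors provide the building blocks; the combination is decided per task by the model.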
Mode 4: Progressive Document Loading
| Item | Description |
|---|---|
| Representatives | pptx, mcp-builder |
| Structure | SKILL.md (routing table) + multiple detailed docs (editing.md, pptxgenjs.md) |
| Principle | Three-layer information model, load on demand |
| Advantage | 46% token efficiency improvement, avoiding loading all documents at once |
Three-Layer Information Model:
Layer 1: description (~50 words)
"Presentation creation, editing, and analysis..."
→ Always present in the available-skills list; the LLM judges whether the skill needs loading

Layer 2: SKILL.md body (~200 lines)
Quick Reference routing table + design guidelines + QA workflow
→ Injected into the system message after loading; the LLM knows the general direction

Layer 3: editing.md / pptxgenjs.md (on demand)
Detailed operation steps + code examples
→ Loaded on demand by the LLM through skill_select_docs

Token Savings:
- Load all at once: ~28.7KB ≈ 7000 tokens
- Progressive loading (editing task): ~3750 tokens (46% savings)
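The 46% figure follows directly from the two token counts quoted above:

```python
full_load = 7000     # load all docs at once: ~28.7 KB ≈ 7000 tokens
progressive = 3750   # SKILL.md body plus only the doc the task needs
savings = 1 - progressive / full_load
print(f"{savings:.0%}")  # → 46%
```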
Mode 5: Orchestration
| Item | Description |
|---|---|
| Representatives | skill-creator |
| Structure | Ultra-detailed orchestration guide (32KB) + sub-agent instructions + script toolchain |
| Principle | SKILL.md defines complete multi-stage pipeline |
| Characteristics | Uses elements from all four previous modes |
Execution Flow:
Capture Intent → Interview → Write SKILL.md → Run Tests
→ Evaluate → Improve → Repeat → Package

Essence: SKILL.md is not teaching the LLM how to use a tool; it is orchestrating the LLM's execution of a complex multi-step project.
2. Skill Framework Core Architecture
Data Flow (7-Step Chain)
SKILL.md → FsSkillRepository (scanning/parsing)
→ Skill Object → SkillToolSet (6 management tools + skill_run)
→ DynamicSkillToolSet (Token optimization)
→ SkillsRequestProcessor (request injection)
→ LLM function_call → Sandbox execution

Core Components
| Component | Function |
|---|---|
| state_delta | CQRS architecture, decouples writers and readers |
| skill_run | Universal execution engine, supports arbitrary shell commands |
| Incremental Hash | compute_dir_digest(), zero overhead for repeated calls |
| Sandbox Isolation | /tmp/ws_xxx/ workspace, read-only protection + symlinks |
| Environment Variables | $WORKSPACE_DIR, $OUTPUT_DIR, $SKILL_NAME auto-injected |
skill_run Sandbox Workspace Layout
/tmp/ws_session123/
├── skills/pdf/ ← Skill directory (read-only protection)
│ ├── SKILL.md
│ ├── scripts/
│ ├── out/ → ../../out ← Symlink
│ └── work/ → ../../work ← Symlink
├── out/ ← $OUTPUT_DIR
├── work/ ← $WORK_DIR
└── runs/ ← Execution records

Auto-Injected Environment Variables
| Variable | Points To | Purpose |
|---|---|---|
| $WORKSPACE_DIR | /tmp/ws_session123/... | Access the workspace root |
| $SKILLS_DIR | $WORKSPACE_DIR/skills | Reference other skill files |
| $WORK_DIR | $WORKSPACE_DIR/work | Store intermediate files |
| $OUTPUT_DIR | $WORKSPACE_DIR/out | Store final output |
| $RUN_DIR | $WORKSPACE_DIR/runs/run_xxx | Access execution records |
| $SKILL_NAME | pdf | Know which skill is running |
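A pre-built script can rely on these variables instead of hard-coded paths. A minimal sketch follows; the script logic and filename are illustrative, and the fallbacks exist only so the sketch runs outside the sandbox:

```python
import os

# Inside the sandbox these are auto-injected; fall back to "." so the
# sketch stays runnable standalone.
work_dir = os.environ.get("WORK_DIR", ".")
output_dir = os.environ.get("OUTPUT_DIR", ".")
skill_name = os.environ.get("SKILL_NAME", "unknown")

# Write the final artifact to $OUTPUT_DIR, never to a fixed path.
report_path = os.path.join(output_dir, f"{skill_name}_report.txt")
with open(report_path, "w") as f:
    f.write(f"intermediate files live in: {work_dir}\n")
print(report_path)
```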
Three-Layer Information Model (Token Economics)
| Layer | Trigger Condition | Token Consumption | Content |
|---|---|---|---|
| L0 | Always injected | ~30/skill | name + description |
| L1 | After skill_load | ~500-2000 | SKILL.md body |
| L2 | After skill_select_docs | ~1000-5000 | Detailed reference documents |
3. Why Not Use Function Calling?
Anthropic's answer: teaching an LLM to write code is more flexible than registering APIs for it.
Three Main Reasons
The LLM Itself Is the Best Code Executor
- After seeing code examples, it can improvise a ninth or tenth operation the author never thought of
- Function calling limits flexibility: only predefined functions can be called
skill_run Is the Universal Fallback
- Any language, any command: if the shell can run it, skill_run can execute it
- No need to define tool Schema for every operation
Token Efficiency
- 1 skill_run (20 tokens) vs 8 function calls (1600-4000 tokens)
- 10 rounds of dialogue gap: 16k - 40k tokens
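The quoted 10-round gap is simply the per-round difference accumulated; all figures are taken from the text above:

```python
rounds = 10
function_call_range = (1600, 4000)  # 8 calls per round at ~200-500 tokens each
skill_run_cost = 20                 # one skill_run invocation per round

gap = tuple(rounds * (cost - skill_run_cost) for cost in function_call_range)
print(gap)  # → (15800, 39800), i.e. roughly 16k-40k tokens over 10 rounds
```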
Tools: Scenarios Are Narrow
- Only needed for internal APIs and databases that can't be reached through code alone
- All 16 Anthropic Skills can be completed through code + scripts
Framework Supports But Not Used
The framework does implement the complete Tools: declaration → DynamicSkillToolSet → function calling chain, but none of Anthropic's own Skills use it: the framework reserves the capability, yet practice showed it isn't needed.
4. Practical Recommendations
| Scenario | Recommended Mode | Key Actions |
|---|---|---|
| Teaching standards/style | Pure Prompt Injection | Write good SKILL.md |
| Operating specific file formats | Script Execution | Write scripts/ + usage instructions |
| Flexible API combination | Library Calling | Write core/ + API docs |
| Large knowledge volume | Progressive Loading | SKILL.md as routing table |
| Complex multi-step workflow | Orchestration | Define pipeline + sub-agents |
Starting Advice: First write a pure Prompt Injection skill with only SKILL.md, get it running, then gradually add scripts, libraries, and documents as needed.
5. Architecture Highlights
1. CQRS Decoupling
Writers (_tools.py) and readers (_skill_processor.py) are decoupled through state_delta and can evolve independently.
2. Funnel Loading
"Coarse filter, then fine selection": the three information layers are injected progressively to maximize token efficiency.
3. Declarative Binding
Tools are bound by matching name strings rather than by import references, keeping the design loosely coupled.
4. Unified IR
FunctionDeclaration serves as the intermediate representation, shielding the differences between LLM APIs.
5. Incremental Staging
compute_dir_digest() calculates directory hash, zero overhead for repeated calls.
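The text doesn't show compute_dir_digest()'s implementation. A common way to get a cheap, change-sensitive digest (a sketch under that assumption, not the framework's code) is to hash each file's path, size, and mtime rather than its contents:

```python
import hashlib
import os

def compute_dir_digest(root):
    """Digest a directory tree without reading file contents.

    Hashing path + size + mtime means an unchanged directory yields the
    same digest on every call, so repeated staging costs almost nothing.
    """
    h = hashlib.sha256()
    for dirpath, _, filenames in sorted(os.walk(root), key=lambda t: t[0]):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            h.update(path.encode())
            h.update(str(st.st_size).encode())
            h.update(str(st.st_mtime_ns).encode())
    return h.hexdigest()
```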
6. Symlink Transparency
The out/ and work/ symlinks let scripts access shared directories transparently: a script never needs to know it is running in a sandbox.
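On a POSIX system the symlink trick can be reproduced in a few lines (paths are illustrative): the skill directory's out/ is a relative link to the shared workspace out/, so a script writing to a path relative to its own directory lands in the shared directory.

```python
import os
import tempfile

ws = tempfile.mkdtemp(prefix="ws_")
skill_dir = os.path.join(ws, "skills", "pdf")
shared_out = os.path.join(ws, "out")
os.makedirs(skill_dir)
os.makedirs(shared_out)

# skills/pdf/out -> ../../out : from the script's point of view,
# "out/" is just a local subdirectory.
os.symlink(os.path.join("..", "..", "out"), os.path.join(skill_dir, "out"))

with open(os.path.join(skill_dir, "out", "result.txt"), "w") as f:
    f.write("done")

print(os.listdir(shared_out))  # → ['result.txt']
```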
6. Unified View of Five Modes
Skill Execution Mode Spectrum
◄── Lightweight ────────────────────────────────── Heavyweight ──►
Pure Prompt Progressive Library Script Orchestration
Injection Loading Calling Execution
────────── ────────── ────────── ────────── ──────────
frontend- pptx slack-gif- pdf skill-
design mcp-builder creator xlsx creator
────────── ────────── ────────── ────────── ──────────
Only inject Inject + Inject + Inject + Inject +
body to on-demand LLM writes pre-built multi-step
system load docs code scripts workflow
message import lib skill_run

Base Shared by All Modes
| Framework Mechanism | Mode 1 | Mode 2 | Mode 3 | Mode 4 | Mode 5 |
|---|---|---|---|---|---|
| skill_load → body injection | ✅ | ✅ | ✅ | ✅ | ✅ |
| skill_select_docs | ❌ | ✅ | ❌ | ✅ | ✅ |
| skill_run (pre-built scripts) | ❌ | ✅ | ❌ | ✅ | ✅ |
| skill_run (LLM-written scripts) | ❌ | ❌ | ✅ | ❌ | ❌ |
| Tools: declaration (function calling) | ❌ | ❌ | ❌ | ❌ | ❌ |
| Sub-agent orchestration | ❌ | ❌ | ❌ | ❌ | ✅ |
Summary
Anthropic's design philosophy for the Skill system is essentially a profound understanding of LLM capability boundaries:
They didn't make Skill a "register more APIs for LLM" system, but a "teach LLM more knowledge" system.
Because LLM's strongest capability is not calling APIs, but understanding natural language instructions and autonomously solving problems.
SKILL.md is essentially an extremely efficient knowledge transfer method: using the fewest tokens to inject the most critical domain knowledge, operation standards, and tool usage into LLM's context.