Skill System Core Knowledge Summary
Original: Do You Really Understand Skills?
Author: Li Xiaoyu
Source: BestBlogs.dev
Core Conclusion
There are 5 execution modes for Skills, not just function calling. All 16 official Anthropic Skills are driven by the skill_run sandbox command, without using Tools: declarations.
Core formula:
Skill Execution Power = SKILL.md body quality × (Agent base tools + skill_run sandbox capabilities)

1. Five Execution Modes
Mode 1: Pure Prompt Injection
| Item | Description |
|---|---|
| Representatives | frontend-design, brand-guidelines, algorithmic-art |
| Structure | Only SKILL.md, no scripts/ |
| Principle | Skill = carefully crafted system prompt providing domain knowledge and behavioral constraints |
| Execution | LLM writes code directly based on injected guidelines |
Key Insight: Encode expert tacit knowledge (aesthetic taste, design methodology) into explicit rules expressed in LLM-understandable ways.
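As a sketch, a pure prompt-injection skill might consist of nothing but a SKILL.md like the one below. The skill name, frontmatter fields, and rules are hypothetical illustrations, not taken from any official skill:

```markdown
---
name: style-guide
description: Apply the team's visual style rules when generating UI code.
---

# Style Guide

- Primary color is #0B5FFF; never use pure black (#000) for body text.
- Headings use "Inter"; body text uses "Source Serif".
- All generated UI code must meet WCAG AA contrast ratios.
```

There are no scripts to run: the body is injected into the system message, and the LLM writes code directly under these constraints.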
Mode 2: Script Execution
| Item | Description |
|---|---|
| Representatives | pdf, pptx, xlsx, webapp-testing |
| Structure | SKILL.md (user manual) + scripts/ (pre-built scripts) |
| Principle | SKILL.md teaches LLM how to use scripts; complex tasks call pre-built tools |
| Execution | skill_run("python3 scripts/xxx.py") executes in sandbox |
Difference Between Two Types of Code:
| | SKILL.md Code Examples | scripts/ Pre-built Scripts |
|---|---|---|
| Purpose | Teach the LLM how to write code | Black-box tools executed directly |
| Who Executes | LLM writes code + Agent base tools | skill_run in the sandbox |
| Suitable For | Simple one-off operations | Complex workflows that require verification |
Token Efficiency: 1 skill_run (20 tokens) vs 8 function calls (1600-4000 tokens)
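To illustrate why one generic entry point is enough, here is a minimal sketch of a skill_run-style executor. The function name, workspace layout, and environment variables mirror the text, but the implementation itself is an assumption, not the framework's actual code:

```python
import os
import subprocess
import tempfile

def skill_run(command, workspace=None):
    """Run an arbitrary shell command inside a workspace directory.

    Illustrative stand-in for the framework's skill_run tool: one generic
    entry point instead of one registered function per operation.
    """
    ws = workspace or tempfile.mkdtemp(prefix="ws_")
    out_dir = os.path.join(ws, "out")
    os.makedirs(out_dir, exist_ok=True)
    # Mirror the kind of auto-injected variables the text describes.
    env = {**os.environ, "WORKSPACE_DIR": ws, "OUTPUT_DIR": out_dir}
    result = subprocess.run(
        command, shell=True, cwd=ws, env=env,
        capture_output=True, text=True, timeout=60,
    )
    return {"exit_code": result.returncode,
            "stdout": result.stdout, "stderr": result.stderr}

# The LLM only has to emit one short call, e.g.:
#   skill_run("python3 scripts/fill_form.py input.pdf")
print(skill_run("echo hello")["stdout"].strip())  # → hello
```

Because the command is a plain string, any operation a shell can express costs the same handful of tokens to invoke.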
Mode 3: Library Calling
| Item | Description |
|---|---|
| Representatives | slack-gif-creator |
| Structure | SKILL.md (API docs) + core/ (Python library) |
| Principle | LLM writes code on the fly (import core.xxx), combining library functions to complete the task |
| Characteristics | LLM changes from "caller" to "developer" |
Key Difference:
| | Script Execution (pdf) | Library Calling (slack-gif-creator) |
|---|---|---|
| File Structure | scripts/xxx.py (standalone executable) | core/xxx.py (Python module) |
| Calling Method | skill_run("python3 scripts/xxx.py") | LLM writes a script that imports core.xxx |
| LLM Role | Caller (runs pre-built scripts) | Developer (combines library functions) |
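The role shift from caller to developer can be shown with a toy example. The functions below stand in for a skill's core/ library; the real slack-gif-creator API is different, and everything here is illustrative:

```python
# Toy stand-in for a skill's core/ library (hypothetical API).
def make_frames(text, n):
    """Pretend to render n animation frames for the given text."""
    return [f"frame({text},{i})" for i in range(n)]

def assemble_gif(frames):
    """Pretend to assemble frames into a GIF, returning an output path."""
    return f"out/{len(frames)}_frames.gif"

# What the LLM writes on the fly: it composes library functions it has
# only read about in SKILL.md, instead of invoking one fixed tool.
frames = make_frames("deploy!", n=8)
path = assemble_gif(frames)
print(path)  # → out/8_frames.gif
```

The library authors provide the building blocks; the combination is decided per task by the model.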
Mode 4: Progressive Document Loading
| Item | Description |
|---|---|
| Representatives | pptx, mcp-builder |
| Structure | SKILL.md (routing table) + multiple detailed docs (editing.md, pptxgenjs.md) |
| Principle | Three-layer information model, load on demand |
| Advantage | 46% token efficiency improvement, avoiding loading all documents at once |
Three-Layer Information Model:
Layer 1: description (~50 words)
"Presentation creation, editing, and analysis..."
→ Always present in the available-skills list; the LLM judges whether the skill needs loading

Layer 2: SKILL.md body (~200 lines)
Quick Reference routing table + design guidelines + QA workflow
→ Injected into the system message after loading; the LLM knows the general direction

Layer 3: editing.md / pptxgenjs.md (on demand)
Detailed operation steps + code examples
→ Loaded on demand by the LLM through skill_select_docs

Token Savings:
- Load all at once: ~28.7KB ≈ 7000 tokens
- Progressive loading (editing task): ~3750 tokens (46% savings)
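The 46% figure follows directly from the two token counts quoted above:

```python
full_load = 7000     # load all docs at once: ~28.7 KB ≈ 7000 tokens
progressive = 3750   # SKILL.md body plus only the doc the task needs
savings = 1 - progressive / full_load
print(f"{savings:.0%}")  # → 46%
```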
Mode 5: Orchestration
| Item | Description |
|---|---|
| Representatives | skill-creator |
| Structure | Ultra-detailed orchestration guide (32KB) + sub-agent instructions + script toolchain |
| Principle | SKILL.md defines complete multi-stage pipeline |
| Characteristics | Uses elements from all four previous modes |
Execution Flow:
Capture Intent → Interview → Write SKILL.md → Run Tests
→ Evaluate → Improve → Repeat → Package

Essence: SKILL.md is not teaching the LLM how to use a tool; it is orchestrating the LLM's execution of a complex multi-step project.
2. Skill Framework Core Architecture
Data Flow (7-Step Chain)
SKILL.md → FsSkillRepository (scanning/parsing)
→ Skill Object → SkillToolSet (6 management tools + skill_run)
→ DynamicSkillToolSet (Token optimization)
→ SkillsRequestProcessor (request injection)
→ LLM function_call → Sandbox execution

Core Components
| Component | Function |
|---|---|
| state_delta | CQRS architecture, decouples writers and readers |
| skill_run | Universal execution engine, supports arbitrary shell commands |
| Incremental Hash | compute_dir_digest(), zero overhead for repeated calls |
| Sandbox Isolation | /tmp/ws_xxx/ workspace, read-only protection + symlinks |
| Environment Variables | $WORKSPACE_DIR, $OUTPUT_DIR, $SKILL_NAME auto-injected |
skill_run Sandbox Workspace Layout
/tmp/ws_session123/
├── skills/pdf/ ← Skill directory (read-only protection)
│ ├── SKILL.md
│ ├── scripts/
│ ├── out/ → ../../out ← Symlink
│ └── work/ → ../../work ← Symlink
├── out/ ← $OUTPUT_DIR
├── work/ ← $WORK_DIR
└── runs/ ← Execution records

Auto-Injected Environment Variables
| Variable | Points To | Purpose |
|---|---|---|
| $WORKSPACE_DIR | /tmp/ws_session123/... | Access the workspace root |
| $SKILLS_DIR | $WORKSPACE_DIR/skills | Reference other skill files |
| $WORK_DIR | $WORKSPACE_DIR/work | Store intermediate files |
| $OUTPUT_DIR | $WORKSPACE_DIR/out | Store final output |
| $RUN_DIR | $WORKSPACE_DIR/runs/run_xxx | Access execution records |
| $SKILL_NAME | pdf | Know which skill is running |
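A pre-built script can rely on these variables instead of hard-coded paths. A minimal sketch follows; the script logic and filename are illustrative, and the fallbacks exist only so the sketch runs outside the sandbox:

```python
import os

# Inside the sandbox these are auto-injected; fall back to "." so the
# sketch stays runnable standalone.
work_dir = os.environ.get("WORK_DIR", ".")
output_dir = os.environ.get("OUTPUT_DIR", ".")
skill_name = os.environ.get("SKILL_NAME", "unknown")

# Write the final artifact to $OUTPUT_DIR, never to a fixed path.
report_path = os.path.join(output_dir, f"{skill_name}_report.txt")
with open(report_path, "w") as f:
    f.write(f"intermediate files live in: {work_dir}\n")
print(report_path)
```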
Three-Layer Information Model (Token Economics)
| Layer | Trigger Condition | Token Consumption | Content |
|---|---|---|---|
| L0 | Always injected | ~30/skill | name + description |
| L1 | After skill_load | ~500-2000 | SKILL.md body |
| L2 | After skill_select_docs | ~1000-5000 | Detailed reference documents |
3. Why Not Use Function Calling?
Anthropic's answer: teaching an LLM to write code is more flexible than registering APIs for it.
Three Main Reasons
The LLM Itself Is the Best Code Executor
- After seeing code examples, it can improvise a ninth or tenth operation the author never thought of
- Function calling limits flexibility: only predefined functions can be called
skill_run Is the Universal Fallback
- Any language, any command: if the shell can run it, skill_run can execute it
- No need to define tool Schema for every operation
Token Efficiency
- 1 skill_run (20 tokens) vs 8 function calls (1600-4000 tokens)
- 10 rounds of dialogue gap: 16k - 40k tokens
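The quoted 10-round gap is simply the per-round difference accumulated; all figures are taken from the text above:

```python
rounds = 10
function_call_range = (1600, 4000)  # 8 calls per round at ~200-500 tokens each
skill_run_cost = 20                 # one skill_run invocation per round

gap = tuple(rounds * (cost - skill_run_cost) for cost in function_call_range)
print(gap)  # → (15800, 39800), i.e. roughly 16k-40k tokens over 10 rounds
```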
Tools: Scenarios Are Narrow
- Only needed for internal APIs and databases that can't be reached through code alone
- All 16 Anthropic Skills can be completed through code + scripts
Framework Supports But Not Used
The framework does implement the complete Tools: declaration → DynamicSkillToolSet → function calling chain, but none of Anthropic's own Skills use it: the framework reserves the capability, yet practice showed it isn't needed.
4. Practical Recommendations
| Scenario | Recommended Mode | Key Actions |
|---|---|---|
| Teaching standards/style | Pure Prompt Injection | Write good SKILL.md |
| Operating specific file formats | Script Execution | Write scripts/ + usage instructions |
| Flexible API combination | Library Calling | Write core/ + API docs |
| Large knowledge volume | Progressive Loading | SKILL.md as routing table |
| Complex multi-step workflow | Orchestration | Define pipeline + sub-agents |
Starting Advice: First write a pure Prompt Injection skill with only SKILL.md, get it running, then gradually add scripts, libraries, and documents as needed.
5. Architecture Highlights
1. CQRS Decoupling
Writers (_tools.py) and readers (_skill_processor.py) are decoupled through state_delta and can evolve independently.
2. Funnel Loading
"Coarse filter, then fine selection": the three information layers are injected progressively to maximize token efficiency.
3. Declarative Binding
Tools are bound by matching name strings rather than by import references, keeping the design loosely coupled.
4. Unified IR
FunctionDeclaration serves as the intermediate representation, shielding the differences between LLM APIs.
5. Incremental Staging
compute_dir_digest() calculates directory hash, zero overhead for repeated calls.
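The text doesn't show compute_dir_digest()'s implementation. A common way to get a cheap, change-sensitive digest (a sketch under that assumption, not the framework's code) is to hash each file's path, size, and mtime rather than its contents:

```python
import hashlib
import os

def compute_dir_digest(root):
    """Digest a directory tree without reading file contents.

    Hashing path + size + mtime means an unchanged directory yields the
    same digest on every call, so repeated staging costs almost nothing.
    """
    h = hashlib.sha256()
    for dirpath, _, filenames in sorted(os.walk(root), key=lambda t: t[0]):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            h.update(path.encode())
            h.update(str(st.st_size).encode())
            h.update(str(st.st_mtime_ns).encode())
    return h.hexdigest()
```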
6. Symlink Transparency
The out/ and work/ symlinks let scripts access shared directories transparently: a script never needs to know it is running in a sandbox.
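On a POSIX system the symlink trick can be reproduced in a few lines (paths are illustrative): the skill directory's out/ is a relative link to the shared workspace out/, so a script writing to a path relative to its own directory lands in the shared directory.

```python
import os
import tempfile

ws = tempfile.mkdtemp(prefix="ws_")
skill_dir = os.path.join(ws, "skills", "pdf")
shared_out = os.path.join(ws, "out")
os.makedirs(skill_dir)
os.makedirs(shared_out)

# skills/pdf/out -> ../../out : from the script's point of view,
# "out/" is just a local subdirectory.
os.symlink(os.path.join("..", "..", "out"), os.path.join(skill_dir, "out"))

with open(os.path.join(skill_dir, "out", "result.txt"), "w") as f:
    f.write("done")

print(os.listdir(shared_out))  # → ['result.txt']
```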
6. Unified View of Five Modes
Skill Execution Mode Spectrum
◄── Lightweight ────────────────────────────────── Heavyweight ──►
Pure Prompt Progressive Library Script Orchestration
Injection Loading Calling Execution
────────── ────────── ────────── ────────── ──────────
frontend- pptx slack-gif- pdf skill-
design mcp-builder creator xlsx creator
────────── ────────── ────────── ────────── ──────────
Only inject Inject + Inject + Inject + Inject +
body to on-demand LLM writes pre-built multi-step
system load docs code scripts workflow
message import lib skill_run

Base Shared by All Modes
| Framework Mechanism | Mode 1 | Mode 2 | Mode 3 | Mode 4 | Mode 5 |
|---|---|---|---|---|---|
| skill_load → body injection | ✅ | ✅ | ✅ | ✅ | ✅ |
| skill_select_docs | ❌ | ✅ | ❌ | ✅ | ✅ |
| skill_run (pre-built scripts) | ❌ | ✅ | ❌ | ✅ | ✅ |
| skill_run (LLM-written scripts) | ❌ | ❌ | ✅ | ❌ | ❌ |
| Tools: declaration (function calling) | ❌ | ❌ | ❌ | ❌ | ❌ |
| Sub-agent orchestration | ❌ | ❌ | ❌ | ❌ | ✅ |
Summary
Anthropic's design philosophy for the Skill system is essentially a profound understanding of LLM capability boundaries:
They didn't make Skill a "register more APIs for LLM" system, but a "teach LLM more knowledge" system.
Because LLM's strongest capability is not calling APIs, but understanding natural language instructions and autonomously solving problems.
SKILL.md is essentially an extremely efficient knowledge transfer method: using the fewest tokens to inject the most critical domain knowledge, operation standards, and tool usage into LLM's context.