Progressive Disclosure: Layered Architecture for Claude Code

Giving coding agents such as Claude Code task-specific instructions in .md files is one of the most effective ways to get good results quickly without wasting the context window.

In AI-assisted software development, one of the most significant challenges is managing the context window well. When working with systems like Claude Code, every token counts: how efficiently the available context is used determines not only the quality of responses but also the cost and performance of the whole system. This is where progressive disclosure comes in: a layered architecture that changes how AI agents access and process information.

The Context Window Problem: A Bottleneck to Overcome

Before diving into progressive disclosure architecture, it's essential to understand the nature of the problem this methodology solves. The context window represents the maximum amount of information a language model can process at a single moment. No matter how advanced models like Claude are, this window remains limited and precious.

Imagine having to consult an entire encyclopedia to answer a simple question: would you load all volumes simultaneously or first search the general index? The answer is obvious, but traditionally many AI systems have followed precisely the "encyclopedic" approach, loading enormous amounts of documentation, skills, and references into the context window, regardless of their actual relevance to the specific task.

This inefficiency translates into three concrete problems:

Premature saturation of the context window with irrelevant information
Increased computational costs and response times
Potential degradation of response quality due to "information noise"

Progressive Disclosure: A Layered Index for Artificial Intelligence

Progressive disclosure architecture represents a radical paradigm shift. Instead of loading all available content, the system operates through progressive levels of detail, revealing information only when actually necessary.

The First Layer: Frontmatter and Metadata

When Claude Code receives a new task, it does not start by loading entire skills or documentation. It first looks only at the frontmatter at the top of each available skill's .md file. This frontmatter contains essential metadata:

---
name: "Form Validation"
description: "Advanced form validation management with React Hook Form and Zod"
category: "frontend"
tags: ["forms", "validation", "react"]
---

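This operation costs very few tokens, typically between 10 and 30 per skill, allowing Claude to quickly scan dozens of different skills without significantly impacting the context window. It's the equivalent of browsing a book's index before deciding which chapter to read. Of the fields shown, name and description are the ones Anthropic's Agent Skills format requires; category and tags are a convention of this example.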
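As a rough illustration of this first layer, the sketch below reads only the frontmatter block from a skill file's text and stops at the closing `---`, so the body is never parsed. The function name `read_frontmatter` and the simplified `key: value` parsing are assumptions made for illustration; this is not how Claude Code itself parses skills, and a real implementation would use a proper YAML parser.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Return the key/value pairs found between the leading '---' fences.

    Hypothetical helper: handles only flat `key: value` lines, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter present
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing fence reached: stop before the body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return {}  # unterminated frontmatter is treated as absent


skill_text = '''---
name: "Form Validation"
description: "Advanced form validation management with React Hook Form and Zod"
category: "frontend"
---
# Form Validation Skill
(body that is never parsed during discovery)
'''

meta = read_frontmatter(skill_text)
print(meta["name"], "|", meta["category"])
```

Returning as soon as the closing fence is seen is what keeps the cost of discovery independent of how long the rest of the file is.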

The Second Layer: SKILL.md and Overall Framework

Once the frontmatter has identified potentially relevant skills, Claude loads the body of each skill's main SKILL.md file. A typical SKILL.md looks like this:

# Form Validation Skill

## Objective
Implement robust and user-friendly form validation systems

## Guiding Principles
- Client-side and server-side validation
- Immediate user feedback
- Schema-based validation with Zod

## Structure
- forms.md: Form patterns and components
- validation.md: Advanced validation logic
- error-handling.md: UX error handling

The SKILL.md file acts as a conceptual map, offering the overall framework without entering implementation details. It allows Claude to understand the skill's overall architecture, basic rules, and fundamental principles, enabling informed decisions about which specific details might be needed.
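To make the "conceptual map" idea concrete, the following sketch extracts the file names listed under the Structure heading of a SKILL.md, which is the information needed to decide what to load next. The `list_references` function and the assumption that references are written as `- file.md: summary` bullets come from the example above, not from any fixed format.

```python
SKILL_MD = """# Form Validation Skill

## Objective
Implement robust and user-friendly form validation systems

## Structure
- forms.md: Form patterns and components
- validation.md: Advanced validation logic
- error-handling.md: UX error handling
"""


def list_references(skill_md: str) -> dict[str, str]:
    """Map each file named under '## Structure' to its one-line summary."""
    references: dict[str, str] = {}
    in_structure = False
    for line in skill_md.splitlines():
        if line.startswith("## "):
            # Track whether the current section is the Structure section.
            in_structure = line.strip() == "## Structure"
        elif in_structure and line.startswith("- "):
            filename, _, summary = line[2:].partition(":")
            references[filename.strip()] = summary.strip()
    return references


print(list(list_references(SKILL_MD)))
```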

The Third Layer: Specific On-Demand References

Only at this point, if the task explicitly requires it, does Claude load specific reference files. If the task involves creating a registration form, forms.md will be loaded. If instead it's about implementing complex validation logic, validation.md will be consulted.

This approach keeps the context window focused on information directly applicable to the current task, which improves both the efficiency and the relevance of generated responses.
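The selection step can be pictured as a lookup from task wording to reference files. The sketch below is a deliberately naive keyword match; in practice the model itself decides which files to read, and the `REFERENCES` table and `pick_references` function are hypothetical names introduced only for illustration.

```python
# Hypothetical mapping from reference file to the keywords that suggest it.
REFERENCES = {
    "forms.md": {"form", "registration", "input"},
    "validation.md": {"validation", "schema", "zod"},
    "error-handling.md": {"error", "feedback"},
}


def pick_references(task: str) -> list[str]:
    """Return the reference files whose keywords appear in the task text."""
    words = set(task.lower().replace(",", " ").split())
    return [name for name, keywords in REFERENCES.items() if words & keywords]


print(pick_references("Create a registration form"))
print(pick_references("Implement complex validation logic"))
```

A task that matches nothing returns an empty list, which corresponds to the case where the SKILL.md overview alone is enough.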

Concrete Advantages of Layered Architecture

Context Window Efficiency

The most immediate benefit is better use of the available context. In a traditional scenario, loading 10 complete skills could consume 50,000-100,000 tokens. With progressive disclosure, the same discovery step requires only about 200-500 tokens for frontmatter scanning, plus 2,000-5,000 tokens for the relevant SKILL.md files, leaving ample room for the content that is actually needed.

System Scalability

As an AI agent's knowledge base grows, the traditional approach quickly becomes unsustainable. At the 5,000-10,000 tokens per skill implied above, preloading 50 or 100 skills would take 250,000 to 1,000,000 tokens, more than many context windows can hold. With progressive disclosure, discovery cost still grows linearly with the number of skills, but at only tens of tokens per skill, so it stays small even for large skill libraries.
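The difference in growth rates is easy to check with the article's own estimates. The sketch below assumes 30 tokens of frontmatter per skill (the upper bound quoted earlier) and 5,000 tokens per complete skill (the low end of 50,000 tokens for 10 skills); both constants are rough estimates, not measurements.

```python
FRONTMATTER_TOKENS_PER_SKILL = 30   # upper bound quoted earlier in the article
FULL_SKILL_TOKENS = 5_000           # 50,000 tokens / 10 skills, the low estimate


def discovery_cost(skill_count: int) -> int:
    """Tokens spent scanning frontmatter for every installed skill."""
    return skill_count * FRONTMATTER_TOKENS_PER_SKILL


def preload_cost(skill_count: int) -> int:
    """Tokens spent loading every skill in full."""
    return skill_count * FULL_SKILL_TOKENS


for n in (10, 50, 100):
    print(f"{n:>3} skills: scan {discovery_cost(n):>5,} vs preload {preload_cost(n):>7,}")
```

Both costs are linear in the number of skills; what changes is the constant, which differs here by a factor of more than 100.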

Operational Cost Reduction

Because model usage is typically billed in proportion to the tokens processed, context window optimization translates directly into cost savings, especially at high request volumes.

Response Quality Improvement

Counter-intuitively, less information can generate better responses. By eliminating "noise" represented by irrelevant documentation, Claude can focus exclusively on elements pertinent to the task, producing more focused and precise outputs.

Practical Implementation: File Organization

The effectiveness of progressive disclosure depends heavily on how the files are organized. Here is one possible layout:

skills/
├── frontend/
│   ├── forms/
│   │   ├── SKILL.md          # Overall framework
│   │   ├── forms.md          # Form patterns
│   │   ├── validation.md     # Validation
│   │   └── error-handling.md # Error handling
│   ├── state-management/
│   │   ├── SKILL.md
│   │   └── ...
├── backend/
│   ├── api-design/
│   │   ├── SKILL.md
│   │   └── ...

Each directory represents an autonomous skill, with the SKILL.md file as entry point and reference files as modular deep-dives.
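With this layout, finding every skill's entry point is a single recursive search for SKILL.md. The sketch below builds a throwaway copy of the tree in a temporary directory so it can run anywhere; `find_skills` is a hypothetical helper, not part of Claude Code.

```python
import tempfile
from pathlib import Path


def find_skills(root: Path) -> list[str]:
    """Return skill directories (relative to root) that contain a SKILL.md."""
    return sorted(
        path.parent.relative_to(root).as_posix()
        for path in root.rglob("SKILL.md")
    )


# Build a throwaway copy of the layout above so the example is runnable.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "skills"
    for skill in ("frontend/forms", "frontend/state-management", "backend/api-design"):
        skill_dir = root / skill
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text("---\nname: placeholder\n---\n")
    # Reference files are not entry points and must not be reported as skills.
    (root / "frontend/forms/validation.md").write_text("reference file")

    skills = find_skills(root)
    print(skills)
```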

Best Practices for Effective Frontmatter

Frontmatter quality determines the entire system's effectiveness. Here are the characteristics of optimal frontmatter:

---
name: "API Authentication & Authorization"
description: "Implementing JWT, OAuth2 auth systems, RBAC permission management"
category: "backend"
tags: ["auth", "security", "jwt", "oauth"]
complexity: "intermediate"
related: ["api-design", "security-best-practices"]
---

Key elements:

Concise but descriptive name: must immediately communicate the skill's scope
Keyword-rich description: facilitates semantic matching with tasks
Specific tags: allow quick filtering by technologies or patterns
Structured metadata: complexity and related help Claude contextualize
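These criteria can be partly checked automatically. The sketch below is a minimal linter for already-parsed frontmatter; the `lint_frontmatter` function, the five-word threshold, and the decision to treat tags as expected are all assumptions chosen for this example rather than rules of any official format.

```python
REQUIRED_FIELDS = ("name", "description")
MIN_DESCRIPTION_WORDS = 5  # arbitrary threshold chosen for this sketch


def lint_frontmatter(meta: dict) -> list[str]:
    """Return human-readable problems found in one skill's frontmatter."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(meta.get(field, "")).strip():
            problems.append(f"missing {field}")
    description = str(meta.get("description", ""))
    if description and len(description.split()) < MIN_DESCRIPTION_WORDS:
        problems.append("description too short to match tasks reliably")
    if not meta.get("tags"):
        problems.append("no tags")
    return problems


good = {
    "name": "API Authentication & Authorization",
    "description": "Implementing JWT, OAuth2 auth systems, RBAC permission management",
    "tags": ["auth", "security", "jwt", "oauth"],
}
vague = {"name": "Auth", "description": "Auth stuff"}

print(lint_frontmatter(good))
print(lint_frontmatter(vague))
```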

Execution Flow: A Concrete Example

Let's see how Claude Code operates with progressive disclosure in a real scenario:

Received task: "Create a login form with email and password validation, user-friendly error handling"

Phase 1: Frontmatter Scanning

Claude scans the frontmatter of all available skills (cost: ~300 tokens for 30 skills) and identifies two as relevant:

# forms/SKILL.md frontmatter
name: "Form Validation"
description: "Form management with validation"

# auth/SKILL.md frontmatter
name: "Authentication Patterns"
description: "Login, registration, password recovery patterns"

Phase 2: SKILL.md Loading

Claude loads the SKILL.md files of the two identified skills (cost: ~3,000 tokens) and determines that:

For the form, it needs to consult forms/forms.md and forms/validation.md
For the auth context, it needs auth/login-patterns.md

Phase 3: Specific References

Claude loads only the three identified reference files (cost: ~6,000 tokens). At this point it has everything it needs to generate the solution, having consumed about 9,300 tokens instead of the 50,000+ that preloading everything would have required.
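The total can be verified by adding up the three phases. The figures below are the approximate costs quoted in the walkthrough, so the result is an estimate rather than a measurement.

```python
# Approximate per-phase costs quoted in the walkthrough above.
phases = {
    "frontmatter scan (30 skills)": 300,
    "two SKILL.md files": 3_000,
    "three reference files": 6_000,
}
total = sum(phases.values())
preload_estimate = 50_000  # the article's lower bound for loading everything

for name, tokens in phases.items():
    print(f"{name:<30} {tokens:>6,}")
print(f"{'total':<30} {total:>6,}")
print(f"share of preload estimate: {total / preload_estimate:.1%}")
```

Under these estimates the progressive path uses a little under a fifth of the tokens that preloading would.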

Advanced Architectural Considerations

Intelligent Caching

Once a reference file is loaded, it could remain cached for subsequent related tasks, further improving efficiency. An agent could apply caching strategies based on how likely a reference is to remain relevant for the next operations.
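A minimal sketch of such a cache, using Python's standard `functools.lru_cache`, is shown below. The `load_reference` function and the in-memory `FAKE_DISK` stand-in are hypothetical; nothing here reflects how Claude Code manages files internally.

```python
from functools import lru_cache

# Stand-in for files on disk so the example runs without touching the filesystem.
FAKE_DISK = {"forms/validation.md": "# Validation patterns ..."}
read_log: list[str] = []  # records every real "disk" read


@lru_cache(maxsize=32)
def load_reference(path: str) -> str:
    """Load a reference file, remembering the result for later tasks."""
    read_log.append(path)
    return FAKE_DISK[path]


load_reference("forms/validation.md")  # first task: reads from "disk"
load_reference("forms/validation.md")  # related task: served from the cache
print(len(read_log))
print(load_reference.cache_info().hits)
```

A cache like this saves repeated file reads; whether cached content should also stay in the context window is a separate decision, since it continues to occupy tokens there.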

Dependency Resolution

Some skills conceptually depend on others. The SKILL.md frontmatter could declare these dependencies through fields such as related and requires, allowing Claude to preload references that will almost certainly be needed.

---
name: "Advanced Form Patterns"
related: ["form-validation", "state-management"]
requires: ["form-validation"]  # Mandatory dependency
---
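One way to act on a mandatory dependency field is a depth-first walk that loads dependencies before the skill that needs them. In the sketch below, the `REQUIRES` registry, the `resolve` function, and the `schema-basics` skill are hypothetical, added only to show a two-level chain.

```python
from typing import Optional

# Hypothetical registry: skill name -> skills listed in its `requires` field.
REQUIRES = {
    "advanced-form-patterns": ["form-validation"],
    "form-validation": ["schema-basics"],
    "schema-basics": [],
    "state-management": [],
}


def resolve(skill: str, seen: Optional[set] = None) -> list[str]:
    """Return `skill` and its mandatory dependencies, dependencies first."""
    seen = set() if seen is None else seen
    if skill in seen:
        return []  # already scheduled, which also guards against cycles
    seen.add(skill)
    order: list[str] = []
    for dependency in REQUIRES.get(skill, []):
        order.extend(resolve(dependency, seen))
    order.append(skill)
    return order


print(resolve("advanced-form-patterns"))
```

Only `requires` is followed here; entries in `related` are treated as optional hints and left for the agent to load on demand.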

Adaptive Loading

More sophisticated systems could implement adaptive loading strategies, where Claude learns from experience which references tend to be needed together, proactively optimizing loading.
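A very simple version of this idea is to count which references have been loaded together in past tasks and suggest frequent companions. The sketch below uses an invented `history` list and a hypothetical `likely_companions` function; it illustrates co-occurrence counting only and says nothing about how a production system would learn.

```python
from collections import Counter
from itertools import combinations

# Hypothetical history: the reference files each past task ended up loading.
history = [
    {"forms.md", "validation.md"},
    {"forms.md", "validation.md", "error-handling.md"},
    {"forms.md", "error-handling.md"},
    {"validation.md"},
]

# Count every unordered pair of files that appeared in the same task.
pair_counts: Counter = Counter()
for loaded in history:
    for pair in combinations(sorted(loaded), 2):
        pair_counts[pair] += 1


def likely_companions(reference: str, min_count: int = 2) -> list[str]:
    """References loaded together with `reference` at least `min_count` times."""
    companions = []
    for (a, b), count in pair_counts.items():
        if count >= min_count and reference in (a, b):
            companions.append(b if a == reference else a)
    return sorted(companions)


print(likely_companions("forms.md"))
```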

Limitations and Critical Issues to Consider

Like any architecture, progressive disclosure also presents some challenges:

Additional Latency

Progressive loading can add latency, since content is retrieved in multiple phases. In contexts where speed is critical, it might be necessary to balance progressive disclosure with strategic pre-loading.

Maintenance Complexity

Keeping frontmatter accurate and descriptions up to date requires discipline. Outdated frontmatter can lead Claude to ignore relevant skills or load the wrong ones.

Over-Segmentation

There's a risk of excessively fragmenting knowledge, creating too many small files that complicate overall management. Finding the right level of granularity is an art.

A Necessary Paradigm Shift

Progressive disclosure architecture isn't simply a technical optimization, but represents a fundamental change in how we conceive the relationship between AI agents and knowledge bases. Just as humans don't memorize entire encyclopedias but develop strategies to effectively access information when needed, so AI systems must evolve toward more intelligent and selective approaches.

For those developing with Claude Code or other agent-based systems, adopting this architecture means not only saving tokens and costs, but building intrinsically more scalable, maintainable, and performant systems. It's an investment in infrastructure that grows sustainably with knowledge base expansion.

Progressive disclosure teaches us a fundamental lesson: in artificial intelligence as in life, more information doesn't always equal better understanding. Sometimes, true intelligence lies in knowing what to ignore and when to seek more.
