From Prompts to SOPs: The Rise of Intelligence Engineering
Contact Me
- Blog: https://cugtyt.github.io/blog/llm-application/index
- Email: cugtyt@qq.com
- GitHub: Cugtyt@GitHub
We’ve seen the evolution of how we work with large language models. First came Prompt Engineering—the art of crafting inputs to elicit better outputs. Then Context Engineering emerged—managing what information the model sees, when it sees it, and how it’s structured. Now we’re entering a new phase: Intelligence Engineering—the discipline of architecting agent systems that transform raw model reasoning into structured, repeatable, production-grade workflows.
As base models become increasingly powerful and commoditized, the differentiation shifts to the agent layer—how you structure intelligence for your specific use case.
The key insight is this: a model’s intelligence is necessary but not sufficient for production deployment. A coding model can generate code. But to actually fix a bug in your codebase, following your team’s conventions, respecting your security constraints, and validating against your test suite—that requires more than raw intelligence. It requires an SOP (Standard Operating Procedure)—a structured guide that tells the agent how to approach the task, what steps to follow, what constraints to respect, and how to know when it’s done.
Intelligence Engineering is about designing these SOPs. It’s the practice of encoding procedural knowledge into agent systems so that generic model capabilities become scenario-specific solutions. Different products implement SOPs differently—Claude calls them Skills, GitHub Copilot uses SKILL.md and agent.md files, Cursor has its own config formats—but they all serve the same purpose: guiding intelligence toward production-ready outcomes.
Two Layers of Intelligence
When we talk about “AI intelligence” in production systems, we’re really talking about two distinct layers:
Model Layer: The Thinking Capability
The model layer is where raw intelligence lives. This is what the LLM provides:
- Token generation: Producing coherent text, code, or structured output
- Reasoning: Following logical chains, analyzing problems, considering alternatives
- Knowledge retrieval: Drawing on training data to inform responses
- Pattern recognition: Understanding context, intent, and domain-specific patterns
This layer is powerful but generic. A coding model trained on millions of repositories can write Python, debug JavaScript, and explain algorithms. But it writes “public knowledge” code—solutions that work in general but may not fit your specific situation.
Agent Layer: The Doing Capability
The agent layer is where intelligence becomes actionable. This is what transforms a model into a production system:
- Memory: Maintaining context across interactions and sessions
- Tools: Accessing external systems—file systems, APIs, databases, terminals
- Lifecycle management: Knowing when to act, when to wait, when to ask for input
- SOPs: Following structured procedures to achieve specific outcomes
The agent layer doesn’t add more “thinking”—it adds structure to thinking. It defines how the model’s intelligence gets applied to real problems.
The key distinction: A model can write code. An agent with SOPs knows how to fix your specific bug in your specific codebase following your specific workflow.
SOPs: The Core of Intelligence Engineering
An SOP (Standard Operating Procedure) is what tells an agent how to achieve a task. It’s not separate pieces (tools, configs, instructions)—it’s one cohesive procedure that combines all of them into actionable guidance.
Different products implement SOPs in different formats, but they all contain the same core components:
Components of an SOP
| Component | What It Defines | Example |
|---|---|---|
| Tool Definitions | What actions are available | read_file, edit_file, run_terminal, search_code |
| Workflow Logic | How to sequence actions | “Read file → identify issue → propose fix → validate → apply” |
| Instructions | Domain guidance and constraints | Procedural steps, constraints, best practices |
| Success Criteria | How to know when done | “Tests pass, no new errors introduced, follows style guide” |
Example: A “Bug Fix” SOP
name: bug-fix
description: Fix a reported bug in the codebase
tools:
- read_file # Examine source code
- grep_search # Find related code and usages
- get_errors # Check current errors/warnings
- edit_file # Apply fixes
- run_terminal # Run tests and validation
instructions: |
When fixing a bug:
1. First understand the bug - read the error message, reproduce if possible
2. Locate the source - search for related code, understand the context
3. Identify root cause - don't just fix symptoms, find the actual problem
4. Consider impact - check what else uses this code (list_code_usages)
5. Propose minimal fix - change as little as necessary
6. Validate - run tests, check for new errors
7. If tests fail, iterate - don't stop until validation passes
constraints:
- Never modify test files to make tests pass
- Preserve existing code style and patterns
- If fix requires architectural changes, stop and report
success_criteria:
- Original error no longer occurs
- All existing tests pass
- No new errors introduced
This SOP encodes how to fix bugs—not just “write code that fixes it” but the complete procedure a skilled developer would follow.
Intelligence Engineering in Practice
The clearest evidence for Intelligence Engineering is this: the same base model behaves completely differently across different agent products. Why? Because each product wraps the model in different SOPs.
GitHub Copilot uses SKILL.md and agent.md files. Claude Code has its Skills system with MCP (Model Context Protocol) for runtime injection—meaning new SOPs can be added on the fly without rebuilding the agent. Cursor uses .cursorrules for project-level guidance. Custom agents might use YAML or JSON workflow configs.
The implementation varies, but they’re all doing the same thing: encoding procedural knowledge that tells the agent how to apply its intelligence to specific scenarios. A coding model is a coding model. What makes these products feel different isn’t the model—it’s the SOPs wrapped around it.
From Generic Output to Production Solution
When you ask a model to “fix this bug,” it draws on public knowledge—common patterns, general best practices, typical solutions. This produces reasonable output, but not necessarily your output. It might use a different style, miss your validation requirements, or skip your required review steps.
SOPs bridge this gap by encoding three critical elements:
Constraints (what NOT to do): Never commit directly to main, never bypass type checking, never include credentials in code.
Procedures (what MUST be done): Always run linter before proposing changes, always check for existing tests, always verify backward compatibility.
Evaluation (how to know when DONE): All tests pass, no new lint errors, type checking succeeds.
With SOPs, the agent doesn’t just produce output that “works”—it produces output that follows your procedures, respects your constraints, and meets your evaluation criteria. Intelligence Engineering is about ensuring the agent works your way.
The Intelligence Engineering Stack
Here’s how all the pieces fit together:
┌─────────────────────────────────────────────────────────────┐
│ Scenario Requirements & Evaluation │
│ (What "success" means for this specific use case) │
├─────────────────────────────────────────────────────────────┤
│ SOPs (Skills, agent.md, configs) │
│ Tools + Instructions + Workflow + Success Criteria │
│ (The complete procedure for achieving specific outcomes) │
├─────────────────────────────────────────────────────────────┤
│ Agent Runtime │
│ Lifecycle management, memory, tool execution │
│ (The execution engine that runs SOPs) │
├─────────────────────────────────────────────────────────────┤
│ Model Layer │
│ Token generation, reasoning, knowledge │
│ (The raw intelligence that powers everything) │
└─────────────────────────────────────────────────────────────┘
Each layer amplifies the one below:
- Model Layer provides reasoning capability
- Agent Runtime structures that reasoning into executable workflows—following a standardized agent lifecycle (conversation → LLM call → tool call → loop until done) that makes SOPs portable across implementations
- SOPs constrain and guide those workflows toward specific outcomes
- Scenario Requirements define what success looks like
Because the Agent Runtime follows a uniform pattern, SOPs can be written once and work across different agent products. The standardization at the runtime layer is what makes Intelligence Engineering scalable.
The gap between “impressive demo” and “production deployment” lives in the upper layers. Intelligence Engineering is the discipline of designing those layers.
The Discipline of Intelligence Engineering
Intelligence Engineering treats SOPs as first-class engineering artifacts—regardless of whether they’re implemented as Skills, agent.md files, or custom configs:
Design
- Define clear SOPs for each task type
- Specify tools, instructions, workflows, and success criteria
- Consider edge cases and failure modes
- Choose the right implementation format for your platform
Version Control
- SOPs should be versioned like code
- Changes to procedures should be reviewed
- Rollback capability when SOPs degrade performance
Testing
- Test SOPs against representative scenarios
- Measure success rates, efficiency, quality
- Regression testing when SOPs or models change
Iteration
- Observe agent behavior in production
- Identify where SOPs fail or underperform
- Refine procedures based on real-world feedback
Conclusion
The evolution is clear:
| Paradigm | Core Question | Focus |
|---|---|---|
| Prompt Engineering | “What should I say to the model?” | Crafting inputs |
| Context Engineering | “What should the model know?” | Managing information |
| Intelligence Engineering | “How should the system behave?” | Designing SOPs |
We have powerful models. They can think, reason, and generate. But thinking isn’t doing. The agent layer—through SOPs—bridges that gap.
Intelligence Engineering is the discipline of encoding procedural knowledge into agent systems. Whether you call them Skills, agent.md, .cursorrules, or something else—they’re all SOPs. They’re all ways of telling an agent how to apply intelligence to get real work done.
The model brings reasoning. The SOP brings the structure. Together, they solve production problems.