Not All Agents Are Created Equal
The term “AI agent” covers everything from a simple LLM call with a search tool to a fully autonomous swarm of specialized agents coordinating across systems. These aren’t just different scales — they’re fundamentally different architectures with different risk profiles, failure modes, and governance needs.
Understanding which pattern you’re building — and what can go wrong — is the first step toward building agents that are production-ready, not just demo-ready.
The 6 Autonomy Levels
ARIAS classifies every detected agent into one of six autonomy levels, based on how much independent decision-making the agent performs:
L1: Tool-Augmented LLM
What it is. A single LLM call enhanced with tools (search, calculator, API lookup). The LLM decides which tool to call, but execution is one-shot — no loops, no iteration.
Example. OpenAI function calling with a weather API tool. User asks “What’s the weather in London?” → LLM calls weather tool → returns result.
What can go wrong. Tool selection errors (LLM picks the wrong tool), excessive tool permissions (search tool that also has write access), no output validation (tool returns garbage, LLM presents it as fact).
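The L1 pattern can be sketched in a few lines. This is a minimal illustration, not a real function-calling integration: the "LLM" is a keyword-routing stub, and the weather tool returns canned data. The point is the shape — one tool selection, one execution, output validated before it reaches the user.

```python
def weather_tool(city):
    # Stand-in for a real weather API call.
    return {"city": city, "temp_c": 14, "conditions": "overcast"}

TOOLS = {"weather": weather_tool}

def select_tool(user_query):
    # Stub for the LLM's tool-selection step.
    return "weather" if "weather" in user_query.lower() else None

def answer(user_query, city):
    tool_name = select_tool(user_query)
    if tool_name is None:
        return "No suitable tool available."
    result = TOOLS[tool_name](city)
    # Output validation: don't present unvalidated tool output as fact.
    if not isinstance(result.get("temp_c"), (int, float)):
        return "Tool returned invalid data."
    return f"{result['city']}: {result['temp_c']}°C, {result['conditions']}"
```

Note the one-shot structure: there is no loop, so the worst case is a single bad response, not runaway behavior.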
L2: Structured Pipeline
What it is. A fixed sequence of LLM calls and transformations — RAG chains, prompt sequences, multi-step extraction. The flow is deterministic; the LLM provides intelligence at each step but doesn’t decide the flow.
Example. RAG pipeline: retrieve documents → rerank → generate answer → validate output.
What can go wrong. Context window overflow (too many retrieved documents), retrieval poisoning (adversarial documents in the vector store), cascading errors (bad retrieval → bad generation → bad validation passes anyway).
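Here is a toy version of that pipeline, assuming keyword-overlap retrieval in place of a real vector store and string templates in place of LLM calls. The flow is fixed in code; only the data varies:

```python
def retrieve(query, store, k=3):
    # Naive keyword-overlap scoring; a real system would query a vector store.
    scored = sorted(store, key=lambda d: -sum(w in d for w in query.split()))
    return scored[:k]

def rerank(query, docs):
    # Stand-in for a cross-encoder reranker.
    return sorted(docs, key=lambda d: -d.count(query.split()[0]))

def generate(query, docs):
    # Stand-in for the answer-generation LLM call.
    return f"Answer to '{query}' based on {len(docs)} documents."

def validate(answer, docs):
    # Guard against cascading errors: refuse to answer with no evidence.
    return answer if docs else "Insufficient context to answer."

def rag_pipeline(query, store):
    docs = retrieve(query, store)
    docs = rerank(query, docs)
    return validate(generate(query, docs), docs)
```

Because the flow is deterministic, each stage can be unit-tested in isolation — which is exactly why L2 systems are the easiest to govern.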
L3: Reactive Agent
What it is. An agent that responds to inputs by selecting actions from available tools, with the ability to call tools multiple times. Function calling with `tool_choice="auto"`. The agent reacts but doesn’t plan ahead.
Example. Customer support agent with access to knowledge base search, ticket creation, and escalation tools.
What can go wrong. Tool selection confusion (too many tools, LLM picks the wrong one), no error handling (tool fails, agent retries indefinitely), missing boundaries (agent takes actions outside its intended scope).
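The failure modes above suggest two guardrails that belong in any L3 agent: a tool allow-list and a retry cap. A sketch, with `choose_tool` and `run_tool` standing in for the LLM and the tool runtime:

```python
ALLOWED_TOOLS = {"search_kb", "create_ticket", "escalate"}
MAX_RETRIES = 3

def reactive_agent(request, choose_tool, run_tool):
    for attempt in range(MAX_RETRIES):
        tool = choose_tool(request)
        if tool not in ALLOWED_TOOLS:
            # Boundary enforcement: refuse actions outside the agent's scope.
            return {"status": "refused", "tool": tool}
        try:
            return {"status": "ok", "result": run_tool(tool, request)}
        except RuntimeError:
            # Tool failure: retry, but never indefinitely.
            continue
    return {"status": "failed", "attempts": MAX_RETRIES}
```

Without the allow-list, a confused tool selection becomes an out-of-scope action; without the retry cap, a failing tool becomes an infinite loop.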
L4: Iterative Agent (ReAct)
What it is. A reasoning-and-acting loop where the agent plans, executes, observes results, and iterates. ReAct, chain-of-thought with tools, or plan-and-execute patterns.
Example. Research agent that searches for information, evaluates what it found, decides what to search next, and synthesizes a final answer.
What can go wrong. Unbounded loops (agent iterates forever), goal drift (agent’s reasoning diverges from the original task), cost explosion (each iteration costs tokens), state corruption (accumulated context becomes contradictory).
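Each of those failure modes maps to a concrete guard. A minimal loop skeleton, assuming `plan_step` wraps the LLM call and reports its token cost:

```python
def react_loop(task, plan_step, max_iters=5, token_budget=1000):
    history, tokens_used = [], 0
    for i in range(max_iters):  # unbounded-loop guard
        thought, action, cost, done = plan_step(task, history)
        tokens_used += cost
        if tokens_used > token_budget:  # cost-explosion guard
            return {"status": "budget_exceeded", "iters": i + 1}
        history.append((thought, action))
        if done:
            return {"status": "done", "iters": i + 1, "history": history}
    return {"status": "max_iters", "iters": max_iters}
```

A production loop would also check goal drift (is the latest thought still about the original task?) and validate accumulated state, but the iteration cap and token budget alone eliminate the two most expensive failure modes.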
L5: Multi-Agent Coordinator
What it is. Multiple specialized agents coordinated by a supervisor or through structured handoffs. Each agent has a specific role and tools.
Example. Code review system with a “reviewer” agent, a “security scanner” agent, and a “documentation checker” agent, coordinated by a “lead” agent.
What can go wrong. Circular delegation (agent A delegates to B delegates to A), conflicting objectives (reviewer says “approve” while scanner says “block”), shared state corruption (agents overwrite each other’s memory), privilege escalation (low-privilege agent delegates to high-privilege agent).
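Circular delegation in particular is cheap to prevent if the coordinator tracks the handoff chain. A sketch of that idea, using the agent roles from the example above:

```python
def delegate(chain, to_agent):
    # Reject any handoff that would revisit an agent already in the chain.
    if to_agent in chain:
        raise ValueError("circular delegation: " + " -> ".join(chain + [to_agent]))
    return chain + [to_agent]

chain = ["lead"]
chain = delegate(chain, "reviewer")
chain = delegate(chain, "security_scanner")
# delegate(chain, "lead") would raise ValueError: lead is already in the chain.
```

The same chain doubles as an audit record: when something goes wrong, it tells you exactly which agent handed work to which.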
L6: Autonomous Swarm
What it is. Self-organizing agents that dynamically spawn, coordinate, and terminate without human oversight. The system evolves its own agent topology based on the task.
Example. Autonomous research system that creates specialized agents on the fly, divides work, and aggregates results.
What can go wrong. Everything from L5, plus: uncontrolled agent spawning, resource exhaustion, loss of human oversight, emergent behaviors that weren’t designed or tested, impossibility of audit (who decided what and why?).
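At L6 the minimum viable control is a spawn guard: a hard cap on live agents plus an audit log of every spawn decision. A sketch (the class and its interface are illustrative, not a real framework API):

```python
class SwarmGuard:
    def __init__(self, max_agents=10):
        self.max_agents = max_agents
        self.live = set()
        self.audit_log = []

    def spawn(self, parent, role):
        if len(self.live) >= self.max_agents:
            # Uncontrolled-spawning guard: deny, but record the attempt.
            self.audit_log.append(("denied", parent, role))
            return None
        agent_id = f"{role}-{len(self.audit_log)}"
        self.live.add(agent_id)
        # Audit trail: who decided to create what, and why it was allowed.
        self.audit_log.append(("spawned", parent, role))
        return agent_id
```

The log answers the "who decided what and why" question after the fact; the cap bounds resource exhaustion before it.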
The Pattern Determines the Risk
As autonomy increases from L1 to L6, several things change:
| | L1-L2 | L3-L4 | L5-L6 |
|---|---|---|---|
| Decision surface | Small, bounded | Medium, iterative | Large, emergent |
| Failure modes | Predictable | Partially predictable | Unpredictable |
| Testing approach | Unit tests work | Integration tests needed | Behavioral testing required |
| Governance need | Basic (output validation) | Moderate (iteration limits, timeouts) | Critical (human gates, audit trails, drift detection) |
| Blast radius | Isolated | Component | System-wide |
An L1 agent with a misconfigured tool causes a bad response. An L5 agent with a misconfigured delegation chain causes cascading failures across your entire system.
What ARIAS Does for Each Pattern
ARIAS detects the architecture pattern of every agent and tailors its maturity assessment accordingly:
- L1-L2: Checks tool descriptions, output validation, error handling basics
- L3-L4: Adds iteration limits, timeout enforcement, retry strategies, cost tracking
- L5-L6: Adds delegation chain analysis, cross-agent conflict detection, human oversight gates, audit trail requirements
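The checks are cumulative: an agent at a higher level must also pass everything required of the levels below it. A hypothetical illustration of that lookup (this is not ARIAS's actual configuration format; the check names paraphrase the list above):

```python
# Checks required per autonomy band, keyed by the top level of the band.
TIERED_CHECKS = {
    2: ["tool_descriptions", "output_validation", "error_handling"],
    4: ["iteration_limits", "timeouts", "retry_strategy", "cost_tracking"],
    6: ["delegation_chain_analysis", "conflict_detection",
        "human_oversight_gates", "audit_trails"],
}

def checks_for(level):
    # Accumulate checks from the lowest band up to the agent's own band.
    checks = []
    for tier in sorted(TIERED_CHECKS):
        checks += TIERED_CHECKS[tier]
        if level <= tier:
            break
    return checks
```

So an L4 agent inherits the L1-L2 basics plus the iteration and cost controls, while an L6 agent must satisfy all three tiers.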
The maturity bar rises with autonomy — and ARIAS ensures your governance scales with your architecture.
This is the first article in our Agent Design Patterns series. Deep dives into each pattern are coming next.
ARIAS is the control plane for AI agents. Start your free trial to see how your agents are classified.