What Is an AI Agent? A Clear Definition
Every vendor calls their chatbot an ‘agent’ now. Most of them aren’t. Here is the three-ingredient definition that cuts through the hype, and a clean test you can apply yourself.
The three-ingredient definition
Strip away the marketing and an AI agent is three things:
- A loop. The system can run multiple model invocations back-to-back, with each invocation’s output influencing the next input.
- Tools. The model can call functions, search a database, send an email, run code, modify a file. Tools are how the agent affects the world beyond text.
- A goal and a stopping condition. The loop continues until the goal is achieved or an exit condition fires.
A system missing any of these ingredients isn’t an agent. A chatbot with no tools is a chatbot. A single LLM call with a tool attached is automation, not an agent. An open-ended loop with no goal is just a model burning your money.
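The three ingredients fit in a few lines of code. This is a minimal sketch, not a production framework: `call_model` and `search_docs` are hypothetical stand-ins for a real LLM client and a real tool.

```python
def search_docs(query: str) -> str:
    """Hypothetical tool: stands in for a real docs search."""
    return f"results for {query!r}"

TOOLS = {"search_docs": search_docs}

def run_agent(goal: str, call_model, max_iters: int = 10) -> str:
    """Each model call sees prior results; stops on goal or iteration limit."""
    history = [f"Goal: {goal}"]
    for _ in range(max_iters):                 # ingredient 1: a loop
        action = call_model(history)           # model decides the next step
        if action["type"] == "done":           # ingredient 3: stopping condition
            return action["answer"]
        result = TOOLS[action["tool"]](action["input"])  # ingredient 2: tools
        history.append(f"{action['tool']} -> {result}")  # output feeds next input
    return "gave up: iteration limit reached"  # hard exit, never loop forever
```

Note that the iteration cap is part of the definition, not an afterthought: without it, ingredient three is missing and you have the open-ended loop described above.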
Chatbot vs agent: a clean test
Given any system marketed as an agent, ask these four questions:
- Can the model take actions that change state outside the conversation?
- Can it plan a sequence of actions and adjust based on results?
- Does it have a clear goal it’s working toward?
- Can it decide to stop on its own (succeed or give up)?
Four yeses: it’s an agent. One no: it’s a chatbot with extra steps. Two or more nos: the vendor is selling hype.
A retrieval-augmented chatbot that searches docs and synthesises an answer isn’t an agent: it doesn’t modify state, it executes a single plan, and the user decides when to stop. Useful, but not an agent.
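The scoring rule is mechanical enough to write down. A toy sketch of the four-question test, with the RAG chatbot above as the worked example:

```python
def classify(changes_state: bool, plans_and_adapts: bool,
             has_goal: bool, stops_itself: bool) -> str:
    """Apply the four-question test: count the nos and classify."""
    nos = 4 - sum([changes_state, plans_and_adapts, has_goal, stops_itself])
    if nos == 0:
        return "agent"
    if nos == 1:
        return "chatbot with extra steps"
    return "hype"

# The RAG chatbot: no state change, no multi-step plan, user decides
# when to stop -> three nos -> "hype" as an agent claim, however useful
# it is as a chatbot.
```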
The autonomy spectrum
“Agent” isn’t binary. It’s a spectrum from heavy human oversight to fully autonomous. The spectrum matters because different points on it have very different risk profiles.
- Suggest: the agent drafts, a human approves every action. (GitHub Copilot Chat.)
- Approve: the agent acts, but high-impact actions require a human click. (Claude Computer Use with a confirm dialog on risky calls.)
- Supervised loop: the agent runs autonomously for minutes-to-hours, a human reviews the outcome. (Coding agents like Claude Code doing long refactors.)
- Autonomous: the agent runs continuously without human intervention, only escalating on policy-defined exceptions. (Production SRE agents like Nova’s remediation bots.)
The autonomy level should match the consequence of a bad action. Code refactors are reversible with git, so supervised loops are fine. Database writes aren’t, so production agents typically stay in approve mode for those.
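Matching autonomy to consequence can be encoded as a simple policy gate. This is a sketch under assumed names (`Tier`, `MAX_TIER`, the action labels are all illustrative), showing the shape of the check, not any particular framework’s API:

```python
from enum import Enum

class Tier(Enum):
    SUGGEST = 1      # human approves every action
    APPROVE = 2      # high-impact actions need a human click
    SUPERVISED = 3   # agent runs, human reviews the outcome
    AUTONOMOUS = 4   # escalates only on policy exceptions

# Hypothetical policy: the highest tier at which each action may run
# unattended, driven by reversibility.
MAX_TIER = {
    "git_commit": Tier.SUPERVISED,  # reversible with git
    "db_write": Tier.APPROVE,       # not reversible: human click required
    "email_send": Tier.APPROVE,     # can't unsend
}

def needs_human(action: str, agent_tier: Tier) -> bool:
    """Escalate when the agent's tier exceeds what this action allows.
    Unknown actions default to the most restrictive tier."""
    return agent_tier.value > MAX_TIER.get(action, Tier.SUGGEST).value
```

The default-to-`SUGGEST` for unknown actions is the important design choice: new tools start in the most supervised tier and earn their way up, which is the same trust progression the pitfalls below recommend for whole agents.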
Four pitfalls when building one
Common failure modes across 2024-2025 agent projects:
- No exit condition. The agent plans, acts, observes, plans again, acts again, forever. The goal was fuzzy enough that it never thinks it’s done. Every agent needs either a clear success criterion or a hard iteration limit.
- Too much freedom, not enough scaffolding. Teams often hand the agent a vague prompt and 40 tools. The search space is too large; the agent flails. Structure the problem: give it 3-5 tools at a time, narrow the scope, run in stages.
- No audit trail. Every tool call should be logged with inputs, outputs, and the reasoning that led to it. When the agent does something weird, you need the trace to debug. Without it, you’re guessing.
- Running in the wrong autonomy tier. A team builds a coding agent and gives it write access to production. First incident costs a week. Start in suggest mode, earn trust, move up.
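The audit-trail pitfall in particular has a cheap fix: wrap every tool call so the record is written even when the call fails. A minimal sketch, with an in-memory list standing in for real log storage:

```python
import time

AUDIT_LOG: list[dict] = []  # in production: durable, queryable storage

def logged_call(tool_name: str, tool_fn, args: dict, reason: str):
    """Record inputs, outputs, and the agent's stated reasoning for
    every tool call, even when the call raises."""
    record = {"ts": time.time(), "tool": tool_name,
              "args": args, "reason": reason}
    try:
        record["output"] = tool_fn(**args)
        return record["output"]
    finally:
        AUDIT_LOG.append(record)  # written on success and on failure
```

The `reason` field is the piece teams most often omit: inputs and outputs tell you what the agent did, but only the recorded reasoning tells you why, and that is what you need when debugging the weird cases.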
What real production agents look like in 2025
Three live examples, chosen for variety:
SRE incident-triage agent: watches alerts, correlates them against the service map, checks recent deploys, proposes (or in autonomous mode, executes) a remediation from a human-approved runbook. Tools: metrics API, deploy history, kubectl-equivalent. Stopping condition: alert clears or human is paged.
Coding agent (Claude Code, GitHub Copilot Agent): given a task description and a repo, writes code, runs tests, iterates until tests pass. Tools: read/write files, run shell, search. Stopping condition: tests pass or max iterations hit.
Browser-use agent (computer-use models): takes a natural-language goal (“book me a flight to London on Friday”) and controls a browser to accomplish it. Tools: mouse and keyboard via screenshots + coordinates. Stopping condition: goal achieved or human intervention.
The common shape: narrow domain, explicit tools, clear stopping. The generic “do anything you want” agent is still a research problem.
Where this is heading
Three trends reshaping agents in 2025:
- Better reasoning models. The underlying LLMs are becoming much better at multi-step planning, which directly improves every agent built on them.
- Standardised tool protocols. MCP (Model Context Protocol) and similar standards are making tools portable across agent frameworks.
- Long-running agents. Agents that persist for hours or days, with durable memory and resumption, are going mainstream. This demands new patterns for state management and cost control.
The next 12 months will make “just a chatbot with tools” feel increasingly dated. Real agents, ones that operate for hours against live systems with minimal supervision, are becoming the default shape of AI products.