"We want a chatbot." We hear this sentence every week during scoping. And nine times out of ten, we have to dig: behind the word chatbot sometimes hides an actual chatbot — a question-answering system over knowledge — but often an AI agent, that is, a system capable of making decisions and executing actions in external tools.
The difference isn't cosmetic. It drives cost (by a factor of 1 to 5), implementation complexity, risk profile and roll-out strategy. This article offers a clear method to decide, based on what we observe on DevHighWay projects in 2025-2026.
Chatbot vs agent: a clear technical boundary
A modern chatbot is a Q&A system built on RAG (Retrieval-Augmented Generation). The user asks a question, the system retrieves relevant passages from a vector database (Qdrant, Pinecone, pgvector), passes them to the LLM with the question, and returns a synthesized answer. Closed scope, low error criticality, mature architecture.
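To make the retrieve-then-answer flow concrete, here is a minimal sketch. It is not a production setup: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, the knowledge-base sentences are invented for illustration, and the LLM call is left out (only the prompt it would receive is built).

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase words (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, passages: list, k: int = 2) -> list:
    """Return the k passages most similar to the question."""
    q = embed(question)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def build_prompt(question: str, passages: list) -> str:
    """Assemble the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"

# Hypothetical knowledge base, invented for this example.
KNOWLEDGE_BASE = [
    "Standard delivery times: 3 to 5 business days.",
    "Returns accepted within 30 days of purchase.",
    "Invoices can be downloaded from the customer portal.",
]

question = "What are your delivery times?"
top_passages = retrieve(question, KNOWLEDGE_BASE, k=1)
prompt = build_prompt(question, top_passages)
```

The point of the sketch is the shape of the pipeline: one retrieval, one prompt, one answer, with no state changed anywhere.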
An AI agent works differently. It receives a goal, plans a sequence of actions, calls tools (APIs, databases, business functions), evaluates intermediate results, and iterates until reaching the goal — or failing cleanly. Typical patterns: ReAct, Plan-and-Execute, multi-agent. Scope is open, the decision tree combinatorial, operational risk real.
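The loop described above (decide, call a tool, observe, iterate, stop cleanly) can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation of ReAct or of any framework: `scripted_policy` replaces the LLM with hard-coded decisions, and the tools `get_order` and `reschedule` are hypothetical stubs.

```python
from typing import Callable

# Hypothetical tools; in a real system these would wrap APIs or business functions.
def get_order(order_id: str) -> dict:
    return {"order_id": order_id, "delivery_day": "Thursday"}

def reschedule(order_id: str, day: str) -> dict:
    return {"order_id": order_id, "delivery_day": day}

TOOLS: dict = {"get_order": get_order, "reschedule": reschedule}

def scripted_policy(goal: dict, history: list) -> dict:
    """Stand-in for the LLM: picks the next step from the goal and past observations."""
    if not history:
        return {"tool": "get_order", "args": {"order_id": goal["order_id"]}}
    last = history[-1]["observation"]
    if last["delivery_day"] != goal["day"]:
        return {"tool": "reschedule", "args": {"order_id": goal["order_id"], "day": goal["day"]}}
    return {"tool": None, "answer": f"Delivery set to {goal['day']}"}

def run_agent(goal: dict, policy: Callable, max_steps: int = 5) -> dict:
    """Iterate until the policy declares the goal reached, or fail cleanly after max_steps."""
    history = []
    for _ in range(max_steps):
        step = policy(goal, history)
        if step["tool"] is None:
            return {"status": "done", "answer": step["answer"], "steps": len(history)}
        observation = TOOLS[step["tool"]](**step["args"])
        history.append({"step": step, "observation": observation})
    return {"status": "failed", "answer": None, "steps": len(history)}

result = run_agent({"order_id": "A-42", "day": "Saturday"}, scripted_policy)
```

The `max_steps` bound is what "failing cleanly" means here: the loop cannot run forever, and the caller always gets an explicit status.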
Step 1 — List the desired interactions
The decisive test fits in one question: will the user leave the interaction with information, or with an action performed? "What are your delivery times?" is an information question. "Push my delivery to Saturday" is an action request. If 90% of interactions fall in the first group, you have a chatbot case.
Many projects mix both. In that case, the right approach isn't necessarily a single agent doing both: it's often a RAG chatbot that detects action requests and escalates them to a dedicated workflow, which is simpler to build and more predictable to operate.
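As a rough sketch of both ideas in this step, the snippet below labels a sample of messages as information or action, computes the information share, and routes action requests to a workflow. The keyword list is a deliberately naive stand-in for a real intent classifier (typically an LLM call), and the route names are hypothetical.

```python
# Naive stand-in for an intent classifier: a hypothetical list of action verbs.
ACTION_VERBS = {"push", "cancel", "change", "update", "create", "refund", "reschedule"}

def classify(message: str) -> str:
    """Label a message 'action' if it contains an action verb, else 'information'."""
    words = {w.strip(".,!?\"'").lower() for w in message.split()}
    return "action" if words & ACTION_VERBS else "information"

def route(message: str) -> str:
    """Information goes to the RAG chatbot, actions are escalated to a dedicated workflow."""
    return "escalate_to_workflow" if classify(message) == "action" else "answer_with_rag"

def information_share(messages: list) -> float:
    """Fraction of messages that are pure information requests."""
    labels = [classify(m) for m in messages]
    return labels.count("information") / len(labels)

sample = [
    "What are your delivery times?",
    "Push my delivery to Saturday",
    "Where can I find my invoice?",
    "Do you ship abroad?",
]
share = information_share(sample)  # 3 of the 4 sample messages are information requests
```

Run on a few hundred real conversation logs rather than four invented messages, the same measurement gives the share to compare against the 90% mark.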
Step 2 — Identify external state changes
If the interaction modifies state in a third-party system (create a Zendesk ticket, update a Salesforce deal, trigger a Stripe payment, launch an n8n workflow), you're in agent territory. Tool use (the LLM's ability to call external functions) becomes central. Architecture, observability and risk profile shift radically. Four levels can be distinguished:
- Read-only (display an order status, retrieve an invoice): manageable in a chatbot with RAG enriched by read APIs
- Simple writes (create a ticket, add a note): a lightweight agent, for example on the OpenAI Assistants API
- Complex multi-step writes (chain 4-5 calls with dependencies and rollback): LangGraph or AutoGen
- Critical actions (transfer, deletion, contract): an agent with mandatory human validation during the first months
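The four levels above can be expressed as a small decision function, which is handy when sorting a backlog of desired interactions. The `Interaction` fields and the returned labels are assumptions made for this sketch. The order of the checks (criticality first, then writes, then number of steps) is one reasonable reading of the list, not the only one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    name: str
    writes_state: bool      # does it modify state in a third-party system?
    steps: int = 1          # number of dependent tool calls
    critical: bool = False  # transfer, deletion, contract...

def tier(i: Interaction) -> str:
    """Map an interaction to one of the four levels of the list above."""
    if i.critical:
        return "agent with human validation"
    if not i.writes_state:
        return "chatbot with RAG and read APIs"
    if i.steps > 1:
        return "multi-step agent"
    return "lightweight agent"

backlog = [
    Interaction("display order status", writes_state=False),
    Interaction("create ticket", writes_state=True),
    Interaction("reschedule and refund", writes_state=True, steps=4),
    Interaction("bank transfer", writes_state=True, critical=True),
]
tiers = {i.name: tier(i) for i in backlog}
```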
Step 3 — Evaluate error criticality
The cost of an error defines the level of investment in guardrails. A customer-service chatbot that answers poorly costs user frustration and a human callback. An accounting agent that errs costs a wrong entry, sometimes hard to audit. At equal volume, the impact isn't comparable.
Our practical rule: under €50 average impact per error, a well-designed autonomous agent is defensible. Between €50 and €500, we enforce a human-assisted mode for at least the first 90 days. Above €500, we require permanent human validation or a restricted scope. These thresholds protect both the project and end users.
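The rule translates directly into a threshold function. The handling of the exact boundary values is an assumption of this sketch: €50 and €500 are both placed in the middle band, since the text says "under €50" and "above €500".

```python
def oversight_mode(avg_impact_eur: float) -> str:
    """Return the oversight level for a given average impact per error, in euros."""
    if avg_impact_eur < 50:
        return "autonomous"
    if avg_impact_eur <= 500:
        # Assumption: the boundaries 50 and 500 fall in the assisted band.
        return "human-assisted for at least 90 days"
    return "permanent human validation or restricted scope"

modes = {impact: oversight_mode(impact) for impact in (10, 50, 200, 500, 2000)}
```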
Step 4 — Pick the right framework
The framework market stabilized in 2025-2026 around a few players. For a RAG chatbot: LangChain or LlamaIndex for orchestration, Qdrant or Pinecone for the vector store, Cohere Rerank for relevance, RAGAS for evaluation. The LLM is your choice: GPT-4 Turbo, Claude 3.7 Sonnet, or Mistral Large 2 when sovereignty matters.
For an agent, the choice depends on workflow complexity. OpenAI Assistants API to start fast on a simple case. LangGraph for stateful workflows with branching and resumes. AutoGen for multi-agent with role specialization. Anthropic tool use for robustness in production on a monolithic agent. vLLM to serve a self-hosted model if sovereignty requires it.
- Chatbot €5-15k implementation: RAG + cloud LLM, 4-8 week deployment, OPEX €100-500/month
- Simple agent €20-35k: 2-5 tools, OpenAI Assistants API or lightweight LangGraph, 8-12 week deployment
- Complex agent €35-60k: multi-step workflow, full observability, possibly multi-agent — 12-20 weeks
- Token consumption: an agent consumes 2 to 5 times more tokens than an equivalent chatbot — a key OPEX factor
Step 5 — Design observability from day zero
Observability isn't a nice-to-have, it's a prerequisite. For a chatbot, LangSmith or OpenAI Evals cover the essentials: prompt traces, automated quality scores, post-conversation CSAT, alerting on drift. For an agent, add a structured audit log of every tool call (input, output, duration, status) and session replay for debugging.
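A structured audit log of tool calls does not need a platform to get started. The sketch below wraps any tool in a decorator that records input, output, duration and status. The in-memory `AUDIT_LOG` list is a stand-in for a real log sink, and `create_ticket` is a hypothetical tool. Tools are assumed to be called with keyword arguments only, which keeps the recorded input self-describing.

```python
import json
import time
from functools import wraps

AUDIT_LOG: list = []  # stand-in for a real log sink (file, database, tracing backend)

def audited(tool):
    """Record input, output, duration and status of every call to the wrapped tool."""
    @wraps(tool)
    def wrapper(**kwargs):
        start = time.perf_counter()
        record = {"tool": tool.__name__, "input": kwargs}
        try:
            output = tool(**kwargs)
            record.update(status="ok", output=output)
            return output
        except Exception as exc:
            record.update(status="error", output=repr(exc))
            raise
        finally:
            # Runs on success and on failure, so no call goes unlogged.
            record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
            AUDIT_LOG.append(record)
    return wrapper

@audited
def create_ticket(subject: str) -> dict:
    # Hypothetical tool: a real one would call the ticketing system's API.
    return {"ticket_id": 1, "subject": subject}

create_ticket(subject="Late delivery")
last_line = json.dumps(AUDIT_LOG[-1])  # one JSON line per tool call
```

Because each record is plain JSON, the same lines can later feed session replay or per-step alerting without changing the tools themselves.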
Without observability, an agent in production becomes an unmanageable black box within a few weeks. Incidents pile up without diagnosis, users lose trust, the project dies. Budget 10 to 15% of implementation for observability — it's a defensive investment, not a side expense.
Step 6 — Plan a progressive roll-out
No AI agent should go directly from demo to autonomous mode. Roll-out happens in three distinct phases. First silent: the agent runs in parallel with humans, its decisions are logged without execution, comparisons are drawn. Then assisted: the agent proposes an action, a human validates in one click. Finally autonomous: the agent acts alone, with automatic escalation on low-confidence cases.
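The three phases can be enforced in code as a gate in front of every action, so that moving from one phase to the next is a configuration change rather than a rewrite. This is a sketch under assumptions: the `confidence` field on the proposal, the 0.8 threshold and the returned labels are all hypothetical.

```python
from enum import Enum

class Phase(Enum):
    SILENT = "silent"
    ASSISTED = "assisted"
    AUTONOMOUS = "autonomous"

def handle(proposal: dict, phase: Phase, approved: bool = False,
           min_confidence: float = 0.8) -> str:
    """Decide what happens to an action proposed by the agent in the current phase."""
    if phase is Phase.SILENT:
        return "logged_only"              # compared with the human decision later
    if phase is Phase.ASSISTED:
        return "executed" if approved else "not_executed"
    if proposal["confidence"] < min_confidence:
        return "escalated_to_human"       # automatic escalation on low confidence
    return "executed"

proposal = {"action": "reschedule", "confidence": 0.95}
outcomes = {
    "silent": handle(proposal, Phase.SILENT),
    "assisted_approved": handle(proposal, Phase.ASSISTED, approved=True),
    "autonomous": handle(proposal, Phase.AUTONOMOUS),
    "autonomous_low": handle({"action": "refund", "confidence": 0.5}, Phase.AUTONOMOUS),
}
```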
This 8 to 16-week progression costs calendar time but avoids the production incidents that kill internal trust. On our agent projects, the silent phase, often neglected, is the one that reveals the most structural reasoning flaws. Better to catch them offline than in customer billing.
What's the real 24-month budget?
Beyond initial implementation, the operating cost of an AI agent often exceeds that of a chatbot by a factor of 3 to 5 over 24 months. Three buckets explain this. Token consumption first: a multi-step agent can consume 5,000 to 20,000 tokens per session versus 1,500 to 4,000 for an equivalent RAG chatbot — the difference adds up quickly to thousands of euros per month at serious volume.
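A quick calculation shows how the per-session figures turn into a monthly bill. The token ranges come from the paragraph above. The volume of 50,000 sessions per month and the blended price of €10 per million tokens are hypothetical placeholders to replace with your own numbers.

```python
def monthly_token_cost(sessions_per_month: int, tokens_per_session: int,
                       eur_per_million_tokens: float) -> float:
    """Monthly LLM spend in euros for a given volume and per-session consumption."""
    return sessions_per_month * tokens_per_session * eur_per_million_tokens / 1_000_000

SESSIONS = 50_000   # hypothetical monthly volume
PRICE = 10.0        # hypothetical blended price, in euros per million tokens

chatbot_range = (monthly_token_cost(SESSIONS, 1_500, PRICE),
                 monthly_token_cost(SESSIONS, 4_000, PRICE))
agent_range = (monthly_token_cost(SESSIONS, 5_000, PRICE),
               monthly_token_cost(SESSIONS, 20_000, PRICE))
```

Under these assumptions the chatbot lands between €750 and €2,000 per month and the agent between €2,500 and €10,000.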
Observability and security make up the second bucket. An agent in production requires a structured audit log, session replay, per-step alerts and automated evaluations of reasoning quality, in other words the equivalent of a lightweight MLOps platform. Count an extra €8-15k in year 1, and €200-600/month recurring in tooling (LangSmith, Langfuse, Datadog). The third bucket is human-in-the-loop, particularly during assisted phases: an operator validating 100 decisions a day at 30 seconds per decision spends about 50 minutes a day on pure validation, and once context switching and the review of doubtful cases are counted, up to 0.5 additional FTE has to be budgeted.
- RAG chatbot, 24 months — €15-30k impl + €3-12k/year OPEX = ~€21-54k total
- Simple agent, 24 months — €25-45k impl + €12-36k/year OPEX = ~€49-117k total
- Complex agent with HITL, 24 months — €40-70k impl + €40-100k/year OPEX = ~€120-270k total
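The totals above are implementation plus two years of OPEX, taken at the low end and at the high end of each range. The short sketch below reproduces them. The function and dictionary names are ours, and all amounts are in thousands of euros.

```python
def total_24_months(impl_k: tuple, opex_per_year_k: tuple) -> tuple:
    """Low and high 24-month totals: implementation plus two years of OPEX (in k€)."""
    low = impl_k[0] + 2 * opex_per_year_k[0]
    high = impl_k[1] + 2 * opex_per_year_k[1]
    return (low, high)

# (implementation range, yearly OPEX range), in k€, taken from the list above.
SCENARIOS = {
    "RAG chatbot": ((15, 30), (3, 12)),
    "Simple agent": ((25, 45), (12, 36)),
    "Complex agent with HITL": ((40, 70), (40, 100)),
}

totals = {name: total_24_months(impl, opex) for name, (impl, opex) in SCENARIOS.items()}
```

Replacing the ranges with your own quotes and volumes gives a comparable 24-month figure for each option.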
The pitfalls that wreck conversational projects
Beyond classic technical mistakes, three pitfalls recur and invalidate entire projects.
- Using an agent for a chatbot case: 3x to 5x cost overrun, increased operational complexity, unjustified operational risk — when a well-built RAG would have covered 95% of the need
- Using a chatbot for an action case: the user asks "push my delivery", the bot replies "here's our delivery policy" — guaranteed frustration, guaranteed churn
- Skipping the silent phase: going straight to autonomous on a medium- or high-criticality agent generates 3 to 5 public incidents that will kill the project politically
What's next?
The chatbot vs AI agent choice is a €30,000-100,000 decision over 24 months. It deserves real scoping, not a decision made in a 30-minute meeting. Our method fits in six steps, but the first one (properly qualifying information vs action) prevents most subsequent mistakes.
- Start with a free audit including scoping of conversational use cases
- Check our support packages for chatbots or agents — starting at €199/month
- Get in touch for 30 minutes of free scoping and a reasoned chatbot vs agent recommendation
Choosing between chatbot and agent is choosing between operational simplicity and execution power. Both have their place — you just have to put them in the right spot, on the right case, at the right time.