LLM 101: Part 6 — AI Agents: Or, Why Your Chatbot Needs Opposable Thumbs
By Ahmed M. Adly (@RealAhmedAdly)
Previously: We covered the whole pipeline from training to RAG. Now let's talk about what happens when you give an LLM the ability to actually do things.
You asked for more, so here we are. After five parts explaining how LLMs work and how to make them useful, it's time to talk about the logical next step: what happens when you stop treating them like oracle-machines-that-answer-questions and start treating them like autonomous assistants that can take action?
Welcome to AI agents. The concept is simple. The execution is... less so.
What Even Is an AI Agent?
Let's start with what it's not: it's not just a chatbot. A chatbot is stuck in the conversation. You ask, it responds. That's the entire interaction. It can't look things up (unless it has RAG), it can't send emails, it can't book your flights, it can't do literally anything except generate text in response to your text.
An agent is different. An agent can:
- Use tools: Call APIs, query databases, run code, search the web
- Take actions: Actually do things in external systems
- Plan: Break down complex tasks into steps
- Remember: Maintain context across multiple interactions
- Reason iteratively: Try something, see if it worked, adjust, try again
Think of it like this: a chatbot is a very smart parrot. An agent is a very smart intern. The parrot can tell you things (and occasionally make stuff up). The intern can actually get tasks done (and occasionally mess them up).
The Architecture of an Agent
At its core, an agent is an LLM wrapped in a framework that gives it superpowers. Here are the components:
1. The LLM (The Brain)
Still just a text prediction machine, but now it's generating more than answers. It's generating plans and actions. The LLM decides:
- What steps are needed to accomplish a task
- Which tool to use next
- How to interpret the results of previous actions
- When it's done (or when it's stuck)
The LLM is doing the reasoning. Everything else is scaffolding to let that reasoning interact with the world.
2. Tools (The Hands)
Tools are functions the agent can call. Examples:
- Web search: Look up current information
- Calculator: Do actual math (remember, LLMs are terrible at this)
- Database query: Fetch specific data
- API calls: Send emails, create tickets, book appointments
- Code execution: Run Python/JavaScript to process data
- File operations: Read/write files
Each tool has a description that tells the LLM what it does and when to use it. The LLM reads these descriptions and decides which tool to call based on the task at hand.
The agent doesn't have all possible tools. You give it a toolkit relevant to the task domain. A customer service agent gets access to CRM APIs and knowledge bases. A coding agent gets access to code execution and file systems. A research agent gets web search and document retrieval.
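To make "description" concrete, here's roughly what a tool spec looks like in the JSON-schema style that several chat APIs use for function calling. The flight_search tool and its fields are made up for illustration; the point is that the description doubles as instructions the LLM reads when deciding whether and how to call it.

```python
# A hypothetical tool spec in the JSON-schema style many chat APIs accept.
# The name, description, and parameters are exactly what the LLM sees.
flight_search_tool = {
    "name": "flight_search",
    "description": (
        "Search for available flights. Use this when the user asks about "
        "flights, prices, or availability. Dates must be in ISO format."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string", "description": "Departure airport code, e.g. 'SFO'"},
            "destination": {"type": "string", "description": "Arrival airport code, e.g. 'NRT'"},
            "date": {"type": "string", "description": "Departure date, YYYY-MM-DD"},
        },
        "required": ["origin", "destination", "date"],
    },
}
```

Vague descriptions are where a lot of agent failures start: if the LLM can't tell from the description when to use a tool, it guesses.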
3. Memory (The Notepad)
Agents need to remember:
- Short-term memory: What happened in this session? What have I tried? What did the user ask for?
- Long-term memory: What did we discuss last week? What are this user's preferences?
Short-term memory is usually just the conversation context (the messages back and forth). Long-term memory requires actual storage — often a database with retrieval, basically RAG for past interactions.
Without memory, the agent is starting from scratch every time. With memory, it can build on past work and avoid repeating mistakes.
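Here's a minimal sketch of how that often gets wired up, assuming you already have some kind of vector store for long-term recall. The `vector_store` object and its `search`/`add` methods are stand-ins, not a real library API.

```python
class AgentMemory:
    """Short-term memory is the running message list; long-term memory is
    retrieval over archived sessions (RAG for past interactions).
    `vector_store` is a stand-in with illustrative search/add methods."""
    def __init__(self, vector_store):
        self.messages = []          # short-term: this session only
        self.store = vector_store   # long-term: persists across sessions

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def recall(self, query, k=3):
        # Pull relevant snippets from earlier sessions into the prompt.
        return self.store.search(query, top_k=k)

    def archive_session(self, user_id):
        # Save this session so future runs can recall it.
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)
        self.store.add(text=transcript, metadata={"user_id": user_id})
```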
4. Planning & Reasoning Loop (The Process)
This is where the magic happens. The agent doesn't just take one action and stop. It follows a loop:
- Observe: What's the current situation? What's the goal?
- Think: What should I do next? Which tool should I use?
- Act: Execute the chosen action (call a tool)
- Observe: What were the results? Did it work?
- Repeat: Until the goal is achieved or the agent gives up
This is sometimes called the ReAct loop (Reasoning + Acting). The LLM generates thoughts (reasoning) and actions (tool calls) in sequence until it solves the problem.
Example flow:
User: "Book me a flight to Tokyo for next week"
Agent (thinks): I need to find flights. I should search for available options.
Agent (acts): [Calls flight_search API with dates and destination]
Agent (observes): Got 3 flight options. User didn't specify preferences.
Agent (thinks): I should ask about preferences or pick a reasonable default.
Agent (acts): [Returns options to user]
User: "The cheapest one"
Agent (thinks): I need to book option 3, which was cheapest.
Agent (acts): [Calls flight_booking API]
Agent (observes): Booking succeeded. Confirmation number received.
Agent (thinks): Task complete.
Agent (acts): [Returns confirmation to user]
The key: the agent is making decisions at each step based on results from previous steps. It's not following a hardcoded flowchart. It's reasoning its way through the problem.
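Stripped of framework plumbing, that loop is surprisingly small. Here's a minimal ReAct-style sketch in Python; `call_llm` and the entries in `tools` are placeholders for whatever model API and tools you actually use, and the reply format is an assumption for illustration, not a real library's interface.

```python
import json

def run_agent(task, tools, call_llm, max_steps=10):
    """Bare-bones reasoning loop. `tools` maps tool names to
    {"fn": callable, "spec": description-dict}; `call_llm` stands in for
    your model API and returns either a final answer or a tool call."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Think: ask the LLM for the next step, given everything so far.
        reply = call_llm(history, tool_specs=[t["spec"] for t in tools.values()])
        history.append({"role": "assistant", "content": reply["content"]})

        if reply["type"] == "final_answer":
            return reply["content"]

        # Act: run the tool the LLM chose, with the arguments it generated.
        result = tools[reply["tool_name"]]["fn"](**reply["arguments"])

        # Observe: append the result so the next iteration can reason over it.
        history.append({"role": "tool", "content": json.dumps(result)})

    return "Stopped: hit the step limit without finishing."
```

Everything else frameworks give you (memory, retries, tracing) hangs off a loop like this one.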
How This Actually Works: The Technical Bits
Let's get slightly more specific. When you ask an agent to do something:
Step 1: The agent receives your request: "Find the cheapest hotel near the conference center in Austin for March 15-17."
Step 2: The agent plans. The LLM generates a thought process:
- "I need to know where the conference center is"
- "Then I need to search for hotels nearby"
- "Then I need to compare prices"
- "Then I need to return the cheapest option"
Step 3: The agent acts. It calls the first tool (maybe web_search for the conference center location) and gets results.
Step 4: The agent evaluates. Did that work? Do I have what I need for the next step? The LLM reads the results and decides what to do next.
Step 5: Repeat. Call the next tool (hotel_search with location parameters), evaluate the results, continue until done.
Step 6: Return results. Once the agent has accomplished the goal (or gotten stuck), it returns a final response to you.
The LLM is steering the whole process. At each step, it's being given:
- The original task
- The conversation history
- Descriptions of available tools
- Results from previous tool calls
And it generates:
- Its reasoning ("I think I should do X because Y")
- The next action (tool call with parameters)
- Or a final answer if it's done
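Concretely, "given" means all of that gets packed into the messages for each LLM call. The layout below is illustrative rather than tied to any particular API, and the tools and flight data are invented for the example.

```python
# Roughly what one step's LLM call contains once everything is packed in.
# Format and content are illustrative, not a specific provider's API.
messages = [
    {"role": "system", "content":
        "You are a travel-booking assistant.\n"
        "Tools available:\n"
        "- flight_search(origin, destination, date): find flights\n"
        "- flight_booking(flight_id): book a chosen flight\n"
        "Think step by step, then either call a tool or give a final answer."},
    {"role": "user", "content": "Book me a flight to Tokyo for next week"},
    {"role": "assistant", "content":
        "Thought: I should look up flights first.\n"
        "Action: flight_search(origin='SFO', destination='NRT', date='2025-03-15')"},
    {"role": "tool", "content":
        '{"results": [{"id": "F1", "price": 820}, {"id": "F2", "price": 940}, {"id": "F3", "price": 780}]}'},
    # The next LLM call gets all of the above and must decide: another tool
    # call, a clarifying question back to the user, or a final answer.
]
```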
What Makes a Good Agent Framework?
You could build all this yourself, but people have already done the work. Popular agent frameworks include:
LangChain: The Swiss Army knife. Lots of tools, lots of features, somewhat complex. Good for prototyping.
AutoGPT: Early autonomous agent. Tries to do everything itself with minimal human input. Ambitious but often gets lost.
BabyAGI: Simpler than AutoGPT. Creates task lists and works through them. More focused.
ReAct: A pattern more than a framework. Explicit reasoning steps before each action. More predictable and easier to debug than fully autonomous approaches.
Custom: Many companies just build their own. You need an LLM, a prompt template that explains the reasoning loop, and tool-calling infrastructure. It's not trivial but not impossible.
What these frameworks provide:
- Tool integration abstractions (easy to add new tools)
- Memory management
- The reasoning loop logic
- Error handling (what happens when a tool fails?)
- Prompt templates that make the agent behavior predictable
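The error-handling bullet above is worth seeing up close. The usual trick is to catch the failure and hand it back to the model as an observation so it can retry or change course, instead of crashing the run. A sketch that would slot into the loop from earlier (same assumed reply format):

```python
import json

def safe_tool_call(tools, reply):
    """Execute a tool call requested by the LLM, turning failures into
    observations the model can read and recover from."""
    name, args = reply.get("tool_name"), reply.get("arguments", {})
    if name not in tools:
        # Tool hallucination: the model asked for something that doesn't exist.
        return f"Error: no tool named '{name}'. Available tools: {sorted(tools)}"
    try:
        return json.dumps(tools[name]["fn"](**args))
    except TypeError as e:
        # Bad or missing parameters: tell the model exactly what went wrong.
        return f"Error calling {name}: {e}. Check the parameter names and types."
    except Exception as e:
        return f"Error: {name} failed with: {e}"
```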
What Agents Are Good At
When they work, agents are remarkable:
Complex, multi-step tasks: "Research competitors' pricing, summarize in a spreadsheet, and email it to my team." No human intervention needed.
Tasks requiring multiple data sources: "Find all customer complaints about shipping from last month across email, Slack, and support tickets."
Workflows with conditional logic: "If the server is down, alert the on-call engineer. If it's a false alarm, log it and do nothing."
Tasks that require real-time data: "What's the status of order #12345?" Agent queries the database, no stale information.
Repetitive research tasks: "For each company in this list, find their website, revenue, and recent news."
What Agents Are Terrible At
Reality check time:
Reliability: They fail. A lot. Maybe 70-80% success rate for complex tasks if you're lucky. Tools return errors, the LLM misinterprets results, steps get skipped. It's better than nothing but nowhere near 99% reliable.
Cost: Every step is an LLM call plus tool execution. A task that takes 15 reasoning steps costs 15x a single chatbot response. This adds up fast.
Latency: All those steps take time. A simple question gets answered in 2 seconds. An agent workflow might take 30 seconds or more.
Unpredictability: Same task, different behavior. The LLM might take a different path each time. Usually fine, occasionally baffling.
Tool hallucination: Sometimes the agent will try to call tools that don't exist or pass invalid parameters. You need robust error handling.
Going off the rails: Agents can get stuck in loops, waste actions on irrelevant steps, or decide the task is impossible when it isn't.
Security: Giving an LLM the ability to execute code or call APIs is scary. It needs guardrails. What if it decides to delete your database as part of "cleaning up"?
The Hard Parts No One Tells You About
1. Tool design is critical. If your tools are poorly designed (vague descriptions, bad error messages, confusing parameters), the agent will use them wrong. You spend more time debugging tools than the agent itself.
2. Prompt engineering is an art. The system prompt that explains the reasoning loop, the tools, and the agent's behavior is everything. Tiny changes can dramatically affect reliability.
3. Error recovery is brutal. What happens when a tool fails? Does the agent retry? Give up? Try a different approach? This logic needs to be built in and is surprisingly hard to get right.
4. Observability is essential. You need to see what the agent is thinking at each step: logs, traces, visualizations (a minimal sketch follows this list). Without this, debugging is impossible.
5. Cost control is necessary. An agent that gets stuck in a loop can burn through your API budget in minutes. You need max step limits, timeouts, and monitoring.
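Observability (point 4) doesn't have to start fancy. Logging every thought, action, and result as structured records is enough to replay a run and see where it went sideways. This is a minimal sketch, not any framework's built-in tracing:

```python
import json
import time

def log_step(trace_file, step, thought, action, result):
    """Append one agent step as a JSON line so the run can be replayed later."""
    record = {
        "ts": time.time(),
        "step": step,
        "thought": thought,            # what the LLM said it was doing
        "action": action,              # which tool it called, with what arguments
        "result": str(result)[:2000],  # truncate huge tool outputs
    }
    with open(trace_file, "a") as f:
        f.write(json.dumps(record) + "\n")
```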
When Should You Use an Agent?
Be honest with yourself:
Use an agent when:
- The task genuinely requires multiple steps across different systems
- You're okay with 70-80% success rate
- Human oversight is available for when it fails
- The time/cost savings justify the complexity
Don't use an agent when:
- A simple API call or script would work
- You need 99%+ reliability
- The task is truly novel every time (not enough patterns to learn)
- You're not willing to invest in tooling and observability
Real talk: Most problems don't need agents. They need better integrations, better workflows, or just a well-prompted LLM. Agents are for the messy middle ground where hardcoded logic is too brittle but human intervention is too expensive.
Practical Example: A Customer Support Agent
Let's walk through a realistic use case.
Goal: Answer customer questions about their orders.
Tools the agent has:
- search_knowledge_base(query): RAG over support docs
- get_order_status(order_id): Query database
- get_shipping_info(order_id): Query shipping provider API
- send_email(recipient, subject, body): Send follow-up emails
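As Python stubs, the toolkit might look like this. The signatures come from the list above; the bodies are placeholders for your docs index, order database, and carrier API, and the docstrings are what you'd surface as the tool descriptions the LLM reads.

```python
# Stub versions of the support agent's toolkit. Bodies are placeholders.
def search_knowledge_base(query: str) -> list[str]:
    """RAG over support docs: return the passages most relevant to `query`."""
    ...

def get_order_status(order_id: int) -> dict:
    """Look up an order's status in the orders database."""
    ...

def get_shipping_info(order_id: int) -> dict:
    """Query the shipping provider's API for tracking details."""
    ...

def send_email(recipient: str, subject: str, body: str) -> bool:
    """Send a follow-up email; returns True if accepted for delivery."""
    ...
```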
Example interaction:
Customer: "Where's my order? Order #45231"
Agent thinks: "Customer wants order status. I should look it up."
Agent calls: get_order_status(45231)
Result: "Status: Shipped. Expected delivery: Nov 18"
Agent thinks: "Customer might want tracking details too."
Agent calls: get_shipping_info(45231)
Result: "Tracking: XYZ123. Current location: Distribution center."
Agent responds: "Your order #45231 has shipped! Expected delivery is Nov 18. Current status: at distribution center. Tracking number: XYZ123. You can track it here: [link]"
Total cost: 3 LLM calls, 2 tool calls, ~5 seconds. Solved without human intervention.
If the order status was "Problem - Address Invalid," the agent might instead offer to update the address or escalate to a human agent. The reasoning loop lets it handle variations without hardcoding every scenario.
Building Your First Agent
If you're curious enough to try:
Step 1: Pick a framework. Start with LangChain or build a simple ReAct loop. Don't overcomplicate.
Step 2: Define a narrow use case. "Answer questions about company policies using our handbook" is good. "Automate all of customer support" is not.
Step 3: Build 1-3 tools. Start simple. Document search, maybe a database query, that's it. Don't give it 50 tools on day one.
Step 4: Write a clear system prompt. Explain the agent's role, how to use tools, when to give up. Be specific. Test variations.
Step 5: Add guardrails. Max steps (10-20), timeouts (60 seconds), cost limits (see the sketch after these steps). The agent will try to escape eventually.
Step 6: Test the hell out of it. Run the same queries multiple times. Watch for variability. Check logs obsessively. Agents are flaky; embrace it.
Step 7: Add human oversight. Let the agent draft responses but require human approval. Or let it handle simple cases and escalate complex ones. Full autonomy comes later (maybe).
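The guardrails in step 5 can be as simple as a budget object checked before every LLM call. A sketch, with the limits from above as example values rather than recommendations:

```python
import time

class RunBudget:
    """Hard limits for one agent run: steps, wall-clock time, and spend.
    Defaults are illustrative, not tuned numbers."""
    def __init__(self, max_steps=15, max_seconds=60, max_cost_usd=0.50):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.max_cost = max_cost_usd
        self.start = time.monotonic()
        self.steps = 0
        self.cost = 0.0

    def charge(self, cost_usd):
        # Call this after every LLM call with its estimated cost.
        self.steps += 1
        self.cost += cost_usd

    def exceeded(self):
        # Check before every LLM call; end the run instead of looping forever.
        return (self.steps >= self.max_steps
                or time.monotonic() - self.start > self.max_seconds
                or self.cost >= self.max_cost)
```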
The Philosophical Bit
Are these agents "intelligent"? Not really. They're LLMs with scaffolding. The LLM is still just predicting tokens, but now those tokens are "call this function with these parameters" instead of just conversational responses.
The reasoning is still statistical pattern matching, just applied to planning and tool use instead of pure language. The agent hasn't learned to think; it's learned that certain patterns of tool calls tend to solve certain types of problems.
But you know what? It works often enough to be useful. And "useful" is a higher bar than "intelligent" anyway.
The Honest Takeaway
Agents are the next step beyond chatbots and RAG. They're genuinely useful for complex, multi-step workflows. They're also temperamental, expensive, and require significant engineering investment to make reliable.
The hype says agents will automate everything. The reality is they'll automate some things, poorly at first, better over time, and probably not the things you expect.
Should you build one? Maybe. Should you build one now? Probably not unless you have a specific use case, realistic expectations, and resources for iteration.
The future probably involves agents handling routine stuff while humans focus on exceptions and edge cases. We're not there yet. But the building blocks exist.
Final Thoughts
That's Part 6. You now know what agents are, how they work, and why they're both exciting and frustrating.
If this series has taught you anything, it's this: LLMs are powerful tools, not magic. Each technique we've covered—training, fine-tuning, RAG, agents—solves specific problems. Most of the time, the simplest solution that works is the right one.
Start with prompting. Add RAG if you need knowledge. Consider fine-tuning if you need behavior changes. Use agents when the task genuinely requires multi-step reasoning and tool use. Don't jump straight to the complex solution because it sounds cool.
And remember: these are still just prediction machines. Very impressive, occasionally useful, frequently overconfident prediction machines.
Thanks for sticking around for all six parts. Now go build something practical and try not to let the agent book you a flight to the wrong Tokyo.
Next: Part 7 — Prompt Engineering: Or, Why Talking to Robots Is Harder Than You Think