Building better AI agents, fast (and safely)
Here’s why the next decade of AI will be shaped by trust, not just technology, writes Elma O’Sullivan-Greene.
Walk into any tech meet-up in 2025 and you’ll hear the same buzzword: agents. These are the next wave of artificial intelligence – programs designed not just to respond to our questions, but to take initiative and act on our behalf. Depending on who you ask, they’re about to run our businesses and plan our lives. The hype is loud. The reality is more subtle – and more interesting.
If last year was the “year of the agent”, the next 10 will be the decade of the agent: slower, steadier, and ultimately more useful. Success won’t come from flashy demos, but from building systems that real people can trust and use, especially in industries where mistakes have serious consequences, like accounting, health, or law.
Beyond the buzz
So what exactly is an “agent”? Think of it as a spectrum. At one end, you have simple agents that can handle repetitive but useful tasks: scanning receipts, sorting emails, or pulling data from documents. At the other end are more ambitious systems designed to make multistep decisions, like doing your taxes with minimal input. Neither end is inherently “better”. It’s about choosing the right tool for the job.
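To make that spectrum concrete, here is a minimal Python sketch; call_model and execute_tool are hypothetical stand-ins for illustration, not any particular product’s API. The simple end is one well-defined call; the ambitious end is a loop in which the model itself picks the next step.

```python
# Hypothetical stand-in for a real LLM call.
def call_model(prompt: str) -> str: ...

# Hypothetical stand-in for running a named tool.
def execute_tool(action: str) -> str: ...

# Simple end of the spectrum: one repetitive, well-defined task per call.
def extract_receipt_total(receipt_text: str) -> str:
    return call_model(f"Extract the total from this receipt:\n{receipt_text}")

# Ambitious end: the model chooses its own next step, many times over.
def run_tax_agent(goal: str, max_steps: int = 10) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        action = call_model(
            f"Goal: {goal}. Notes so far: {notes}. Next step, or FINISH?"
        )
        if action.strip() == "FINISH":
            break
        notes.append(execute_tool(action))
    return call_model(f"Summarise the result for: {goal}. Notes: {notes}")
```

The first function is predictable and easy to test; the second trades predictability for autonomy, which is exactly the trade-off the rest of this piece is about.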
Why context matters
Here’s a hidden challenge: agents are greedy for context. They don’t just need the question you ask; they often want everything related to it: past conversations, tool descriptions, and decision history. This quickly gets messy. Imagine a friend who insists on rehashing every detail of every past conversation before answering a simple question. That’s what bloated agents can feel like: slow, costly, and confusing.
A practical fix is to keep them on a leash. Break tasks into smaller steps and only save what’s essential at each stage. For example, instead of logging every twist and turn of how an AI decided a bank transfer was suspicious, just keep the verdict (“suspicious”) and the short reason (“unusually large amount”). Clear, simple, and easy to check later. Less noise, more signal.
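Here is one way that leash can look in Python – a sketch only, with the threshold and field names invented for illustration: each step passes forward a compact verdict-and-reason record rather than the full trace that produced it.

```python
from dataclasses import dataclass

@dataclass
class StepRecord:
    """The only state carried forward to the next step."""
    verdict: str  # e.g. "suspicious"
    reason: str   # one short, checkable justification

def review_transfer(amount: float, account_average: float) -> StepRecord:
    # Illustrative rule only: flag transfers far above the account's norm.
    if amount > 10 * account_average:
        return StepRecord("suspicious", "unusually large amount")
    return StepRecord("ok", "within normal range")

# Downstream steps see the compact record, never the full reasoning trace.
print(review_transfer(amount=50_000, account_average=1_200))
# StepRecord(verdict='suspicious', reason='unusually large amount')
```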
The value of quick feedback
One of the biggest lessons from building agents is that you don’t need mountains of data to get started. In fact, small, realistic examples often teach you the most. Run a few real‑world tests, see where the agent goes wrong, and fix it. Repeat. The cycle of small mistakes and quick corrections is far more valuable than waiting months for a “perfect” benchmark suite.
Just as important: measure success the way a customer would. In accounting, no one cares if the software nailed step three of a process. They care whether the final numbers are right. Did the balance match? Did the report make sense? Focusing on end results keeps teams honest and aligned with real value.
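A minimal sketch of that habit in Python, assuming a hypothetical run_agent entry point and invented ledger files and figures: a handful of realistic cases, each judged only on the customer-visible outcome.

```python
# A few realistic cases teach more than a giant benchmark early on.
# run_agent is a hypothetical entry point; files and figures are invented.
CASES = [
    {"ledger": "ledger_march.csv", "expected_balance": 10_450.00},
    {"ledger": "ledger_april.csv", "expected_balance": 9_872.50},
]

def evaluate(run_agent) -> None:
    for case in CASES:
        balance = run_agent(case["ledger"])
        # Judge only what the customer would check: the final number.
        if balance != case["expected_balance"]:
            print(f"MISS {case['ledger']}: got {balance}, "
                  f"expected {case['expected_balance']}")

# Fix whatever failed, then run it again: small mistakes, quick corrections.
```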
Handling mistakes safely
It’s no secret that AI sometimes “hallucinates”, producing confident answers that are completely wrong. We’ve seen stories of chatbots inventing fake legal cases or accounting tools drifting into error over time. The answer isn’t to panic. It’s to design for mistakes.
That means:
- Always keeping a human in the loop for sensitive actions like moving money or filing taxes.
- Being honest in the interface: don’t present guesses as facts, and let the system say, “I’m not sure.”
- Using traditional software for the parts of a process that demand precision, like calculating dollar amounts.
- Limiting what outside systems an agent can access, so it doesn’t run wild (a brief sketch of these guardrails follows this list).
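Those principles can be written straight into the plumbing. The Python sketch below is illustrative, with every name hypothetical: exact Decimal arithmetic handles the money maths instead of the model, sensitive actions pause for a human, and tools outside an explicit allow-list are refused.

```python
from decimal import Decimal

ALLOWED_TOOLS = {"read_ledger", "draft_report"}          # explicit allow-list
SENSITIVE_TOOLS = {"transfer_funds", "file_tax_return"}  # always need a human

# Hypothetical stand-in for whatever actually executes a tool.
def run_tool(tool: str, args: dict): ...

def total(amounts: list[str]) -> Decimal:
    # Precision work stays in traditional code, never in the model.
    return sum(Decimal(a) for a in amounts)

def dispatch(tool: str, args: dict, human_approved: bool = False):
    if tool not in ALLOWED_TOOLS | SENSITIVE_TOOLS:
        raise PermissionError(f"{tool} is not on the allow-list")
    if tool in SENSITIVE_TOOLS and not human_approved:
        # Sensitive actions pause until a person signs off.
        return {"status": "pending_human_review", "tool": tool}
    return run_tool(tool, args)
```

A pattern like this keeps the agent useful while making “running wild” structurally impossible rather than merely discouraged.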
Simple, auditable records
For industries like accounting, transparency is non‑negotiable. Regulators and customers don’t want a rambling transcript of every thought an AI agent has. They want a clear trail: the inputs, the decision, the reason, and any human checks. By saving only those key points, companies can create records that are both trustworthy and easy to audit.
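One simple way to keep that trail, sketched in Python with illustrative field names: persist one structured record per decision – the inputs, the decision, the reason, and who checked it – instead of the whole transcript.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    inputs: dict             # what the agent was given
    decision: str            # what it concluded
    reason: str              # one short justification
    reviewed_by: str | None  # the human checker, if any
    timestamp: str

def log_decision(inputs: dict, decision: str, reason: str,
                 reviewed_by: str | None = None,
                 path: str = "audit.jsonl") -> None:
    record = AuditRecord(inputs, decision, reason, reviewed_by,
                         datetime.now(timezone.utc).isoformat())
    # One structured line per decision: easy to store, search, and audit.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_decision({"amount": 50_000}, "suspicious", "unusually large amount",
             reviewed_by="j.smith")
```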
The internet question
In the 1990s, graduates were asked in job interviews, “Can you do the internet?” Today, the equivalent question is, “Can you do agents?” It’s the wrong question. What matters isn’t whether you can use the latest buzzword. It’s what problems you can solve with it, and whether the results help people in their daily work. In a few years, we might not even say “agents” at all. We’ll just talk about better, more natural software experiences.
Here are six lessons I have found useful:
- Pick meaningful problems. If an agent can work in a highly regulated space like finance, it will work elsewhere.
- Design the human experience first. How people correct or confirm results is half the value.
- Start small. Build step‑by‑step workflows instead of giant “do everything” systems.
- Measure what matters. Focus on outcomes customers care about: accuracy, time saved, and trust earned.
- Stay cautious. Give agents limited freedom until they’ve earned more.
- Keep learning. Review mistakes weekly and use feedback to improve.
The next decade of AI won’t be remembered for endless hype or bigger and bigger models. It will be remembered for the teams who treated autonomy as a design choice, not a given, and who built systems that made work simpler, safer, and more human. The future of agents will arrive not when they can do everything, but when people barely notice they’re there at all. Because the experience simply works.
About the author
Elma O’Sullivan-Greene is the principal machine learning engineer at MYOB.
