From Phone Trees to AI Agents: Building Trust in Natural Language Systems
January 6, 2026
I recently had an experience with an Interactive Voice Response (IVR) system. The IVR told me I could ask it questions and personified itself by sharing its name. I was excited! Perhaps modern natural language conversation had finally replaced phone trees! Unfortunately, its recognition broke down quickly, and disappointment replaced curiosity.
Meanwhile, I have found myself on the other side of the equation: designing and coding an agentic chat interface with my team. We’re on the hook for creating a natural-language experience that delights the people using our app. Compared to IVR systems, the capabilities are dramatically elevated thanks to an LLM, and the fact that our interface is text-based simplifies things further.
Still, we find ourselves facing familiar truths: building software that earns trust requires a mix of product thinking and engineering excellence. We can check the box on everything that matters to us (high test coverage, zero TODOs, shift-left thinking), and our product may still create anxiety and confusion for users if we don’t get the experience right.
Now, with LLMs, we add another dimension of complexity: inputs to the system pass through layers of inference that can compound on one another. What has this meant for our team?
1) We build trust through correctness. We now layer eval systems on top of automated tests to verify useful, context-sensitive outputs. For example, if we’re building a fitness app and a user asks “Can I do squats without weights?”, the agent should recognize that as a request for a bodyweight exercise modification, not a generic answer about squats. An LLM engineering platform like Langfuse captures these distinctions so we can verify that the agent produces relevant guidance. Those outputs can be judged in a few different ways:
a) an LLM judge can score correctness.
b) business teams can review and rate agent responses.
c) end users can rate responses with a thumbs-up or thumbs-down.
Teams can assess response quality based on this feedback.
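To make the first approach concrete, here is a minimal sketch of an LLM-as-judge eval using the squats example above. The judge call is stubbed out with keyword heuristics for illustration; in practice it would be a real model call, with results logged to a platform like Langfuse. All names here are hypothetical, not from our actual codebase.

```python
# Hypothetical sketch: an LLM-as-judge eval over agent responses.
# judge_relevance is a stub; a real judge would prompt a model with the
# question and answer and ask for a relevance verdict.

def judge_relevance(question: str, answer: str) -> bool:
    """Stand-in for an LLM judge: does the answer address the user's intent?"""
    # A real judge prompt: "Given this question and answer, is the answer
    # relevant and context-sensitive? Reply yes or no."
    if "without weights" in question.lower():
        # The user wants a bodyweight modification, not generic squat facts.
        return "bodyweight" in answer.lower()
    return bool(answer.strip())

eval_cases = [
    ("Can I do squats without weights?",
     "Yes! Bodyweight squats are a great modification; keep your chest up."),
    ("Can I do squats without weights?",
     "Squats target the quadriceps, glutes, and hamstrings."),
]

for question, answer in eval_cases:
    verdict = "PASS" if judge_relevance(question, answer) else "FAIL"
    print(f"{verdict}: {answer[:40]}")
```

The second case fails the eval even though it is factually correct about squats, which is exactly the distinction an assertion-based unit test would miss.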
2) We build trust through tone. What our agent says doesn’t just reflect product functionality; it reflects the company’s voice. For example, a healthcare provider might want their agent to communicate reassurance and clarity, while a financial-services app might prioritize authority and precision. This means engineers aren’t just shipping features; we’re partnering with product and design teams to test how responses sound, fine-tune prompt strategies, and ensure the agent embodies the values stakeholders expect.
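One common way to operationalize this, sketched below under the assumption of a system-prompt-driven agent, is to keep tone guidance as per-domain configuration that product and design can review alongside engineering. The table and prompt text are illustrative, not from any real product.

```python
# Hypothetical sketch: brand tone encoded as reviewable configuration
# that gets folded into the agent's system prompt.

TONES = {
    "healthcare": "Be reassuring and clear. Avoid jargon; acknowledge concerns.",
    "finance": "Be authoritative and precise. Lead with numbers; avoid hedging.",
}

def build_system_prompt(domain: str) -> str:
    # Fall back to a neutral tone for domains without explicit guidance.
    guidance = TONES.get(domain, "Be helpful and concise.")
    return f"You are a customer assistant. Tone guidance: {guidance}"
```

Because the tone lives in plain-text configuration rather than scattered across prompts, non-engineering stakeholders can propose and review changes to the agent’s voice directly.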
3) We build trust through collaboration and ownership. Traditional software allowed a clearer separation of responsibilities: QA focused on correctness, product managers on features, and engineers on implementation. At Livefront we’ve always encouraged engineers to elevate their ownership where appropriate, but now they are expected to shape prompts, evaluate edge cases, and refine the overall experience. For example, when testing a travel app’s agent, it’s not enough to confirm that “book a flight” works; we also need to evaluate how the agent handles a vague request like “I want to get away somewhere warm next weekend.” Engineers are central in designing eval frameworks for those scenarios, working with stakeholders to define success criteria, and iterating quickly when outputs don’t align with user expectations. The result is that engineers become true co-creators of the product experience.
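A sketch of what codifying those success criteria can look like, assuming a simple keyword-based check for illustration (a real framework would likely use an LLM judge or richer matching; the field names and helper are hypothetical):

```python
# Hypothetical sketch: turning stakeholder-defined success criteria for a
# vague request into a reusable eval case.

from dataclasses import dataclass

@dataclass
class EvalCase:
    user_message: str
    required_topics: list  # concepts the agent's reply must engage with

def meets_criteria(case: EvalCase, agent_reply: str) -> bool:
    reply = agent_reply.lower()
    return all(topic in reply for topic in case.required_topics)

vague_trip = EvalCase(
    user_message="I want to get away somewhere warm next weekend",
    # The reply should engage both constraints, not just suggest any trip.
    required_topics=["warm", "weekend"],
)
```

The value is less in the check itself than in the conversation it forces: engineers and stakeholders must agree, in writing, on what a good answer to a vague request looks like.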
Each of these strategies plays a role in ensuring that agentic system outputs meet our high bar for quality.
One of my favorite books is Robert Pirsig’s Zen and the Art of Motorcycle Maintenance, which contains this passage:
“The test of the machine is the satisfaction it gives you. There isn’t any other test. If the machine produces tranquility it’s right. If it disturbs you it’s wrong until either the machine or your mind is changed.”
The IVR failed the tranquility test. LLMs give us powerful tools, but success depends on design, testing, and iteration. At Livefront, we bring humility and curiosity to that challenge, helping our partners design agentic systems that earn trust and deliver delight. If you’re exploring how AI can transform your product, we’d love to talk.