Building Multi-Agent AI Systems: Lessons from UTRAX
When we started building the AI layer for UTRAX — a white-label telematics platform — the initial instinct was to build one smart chatbot that could handle everything. That instinct was wrong.
After six months of building, testing, and iterating in production, we shipped a system with six specialized AI agents, each handling a distinct domain. Here’s what we learned and why the multi-agent architecture was the right call.
Why Not Just One Agent?
The single-agent approach sounds simpler: one LLM with access to all your tools and data. In practice, it breaks down quickly:
Context window overload. A telematics platform has vehicle tracking, reporting, diagnostics, user management, alerting, and support workflows. Stuffing all the instructions, tool definitions, and context for all of these into one system prompt creates a model that’s mediocre at everything.
Conflicting instructions. The behavior you want from a support diagnostic agent (systematic, cautious, procedure-following) is fundamentally different from what you want from an analytics agent (exploratory, creative, pattern-finding). One system prompt can’t serve both personalities well.
Unpredictable routing. When a single agent handles everything, users hit edge cases where the model can’t decide if a request is a report query, a monitoring command, or a support issue. These ambiguous cases create the worst user experiences.
Testing and iteration friction. When you improve one capability, you risk regressing another. A monolithic agent makes it nearly impossible to iterate on one function without affecting others.
The UTRAX Architecture: Six Specialists
We built UTRAX with six agents, each with a focused domain, dedicated tools, and tailored system prompts:
1. Monitoring Agent
Domain: Real-time fleet tracking and map control.

What it does: Translates natural language queries into vehicle filters and map commands. A fleet manager can say “Show me all trucks in Ankara that haven’t moved in 2 hours” and the agent translates that into the appropriate API calls.

Key design decision: This agent has read-only access to vehicle data. It can filter and display but never modify vehicle state. Strict tool boundaries prevent accidental side effects.
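A read-only tool boundary can be enforced mechanically rather than by prompt alone. The following is a minimal sketch of that idea; the tool names, filter fields, and data shapes are illustrative, not the UTRAX API:

```python
from dataclasses import dataclass

# Hypothetical allow-list of read-only tools for the monitoring agent.
READ_ONLY_TOOLS = {"filter_vehicles", "focus_map", "show_route_history"}

@dataclass
class ToolCall:
    name: str
    args: dict

def execute_monitoring_tool(call: ToolCall, vehicles: list) -> list:
    """Execute a tool call, rejecting anything outside the read-only set."""
    if call.name not in READ_ONLY_TOOLS:
        raise PermissionError(f"Monitoring agent may not call {call.name!r}")
    if call.name == "filter_vehicles":
        # e.g. args = {"city": "Ankara", "idle_hours_gte": 2}
        result = vehicles
        if "city" in call.args:
            result = [v for v in result if v["city"] == call.args["city"]]
        if "idle_hours_gte" in call.args:
            result = [v for v in result
                      if v["idle_hours"] >= call.args["idle_hours_gte"]]
        return result
    return vehicles

fleet = [
    {"id": "TRK-1", "city": "Ankara", "idle_hours": 3},
    {"id": "TRK-2", "city": "Ankara", "idle_hours": 0.5},
    {"id": "TRK-3", "city": "Izmir", "idle_hours": 5},
]
idle_in_ankara = execute_monitoring_tool(
    ToolCall("filter_vehicles", {"city": "Ankara", "idle_hours_gte": 2}), fleet
)
```

The key property is that a write-style call fails at the execution layer, regardless of what the model attempts.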
2. General Agent
Domain: Platform help and onboarding.

What it does: Answers questions about the platform using RAG over the user manual, FAQ, and feature documentation. Provides interactive walkthroughs for new users.

Key design decision: This agent uses a carefully curated knowledge base with explicit versioning. When the platform updates, the knowledge base updates. No stale answers.
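Version-pinned retrieval can be reduced to filtering entries by the platform version they document. A minimal sketch, with hypothetical entry fields:

```python
# Hypothetical versioned knowledge base: each entry records the platform
# version it documents, so answers for older releases are never surfaced.
KB = [
    {"question": "How do I export a report?",
     "answer": "Use Reports > Export.",
     "platform_version": "2.3"},
    {"question": "How do I export a report?",
     "answer": "Use the Export button on the report page.",
     "platform_version": "3.0"},
]

def lookup(question: str, current_version: str):
    """Return the answer documented for the running platform version, or None."""
    matches = [e for e in KB
               if e["question"] == question
               and e["platform_version"] == current_version]
    return matches[0]["answer"] if matches else None
```

A real RAG system would retrieve by embedding similarity rather than exact question match, but the version filter applies the same way.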
3. Reporting Agent
Domain: Report generation and scheduling.

What it does: Users describe what data they want in plain language. The agent configures report parameters, generates reports, and sets up scheduled deliveries.

Key design decision: The agent shows a preview before generating full reports. This “confirm before execute” pattern prevents wasted computation and gives users a chance to refine their request.
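The confirm-before-execute pattern separates a cheap preview step from the expensive generation step. A minimal sketch, with an assumed `ReportSpec` shape:

```python
from dataclasses import dataclass

@dataclass
class ReportSpec:
    report_type: str
    vehicles: list
    date_range: tuple  # (start, end) as ISO date strings

def preview(spec: ReportSpec) -> str:
    """Cheap summary shown to the user before anything expensive runs."""
    return (f"{spec.report_type} report for {len(spec.vehicles)} vehicles, "
            f"{spec.date_range[0]} to {spec.date_range[1]}. Generate?")

def generate_report(spec: ReportSpec, confirmed: bool) -> dict:
    if not confirmed:
        # First pass: return the preview and wait for user confirmation.
        return {"status": "awaiting_confirmation", "preview": preview(spec)}
    # Expensive generation would happen here; stubbed for illustration.
    return {"status": "generated", "rows": len(spec.vehicles)}
```

The agent only ever calls `generate_report(spec, confirmed=True)` after the user approves the preview, so a misunderstood request costs one round trip instead of a full report run.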
4. Health Check Agent
Domain: Device data quality and diagnostics.

What it does: Monitors data quality from tracking devices in real time. Detects anomalies like missing GPS fixes, battery drain patterns, or communication gaps. Alerts operators before issues become customer-facing.

Key design decision: This agent runs both reactively (responding to queries) and proactively (monitoring streams). The proactive mode required careful rate limiting to avoid alert fatigue.
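One simple form of that rate limiting is a per-(device, issue) cooldown: the same alert is never repeated within a fixed window. A minimal sketch; the real policy would likely also batch and prioritize:

```python
import time

class AlertRateLimiter:
    """Suppress repeat alerts for the same (device, issue) pair within a
    cooldown window. Illustrative only, not the UTRAX implementation."""

    def __init__(self, cooldown_seconds: float):
        self.cooldown = cooldown_seconds
        self._last_sent = {}  # (device_id, issue) -> last-alert timestamp

    def should_alert(self, device_id: str, issue: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        key = (device_id, issue)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown window: suppress
        self._last_sent[key] = now
        return True
```

Passing an explicit `now` makes the limiter deterministic in tests; in production it falls back to the monotonic clock.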
5. Intelligence Agent
Domain: Advanced analytics and alarm queries.

What it does: Handles complex analytical questions — driver behavior scoring, route optimization insights, fuel efficiency trends, and alarm pattern analysis.

Key design decision: This agent has access to aggregated historical data, not raw streams. This keeps query costs manageable and response times reasonable for analytical workloads.
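Aggregated historical data typically means pre-computed rollups that the agent queries instead of raw event streams. A minimal sketch of a daily fuel rollup, with hypothetical event fields:

```python
from collections import defaultdict

def daily_fuel_rollup(events: list) -> dict:
    """Collapse raw fuel events into per-(vehicle, day) totals.
    The Intelligence agent would query tables like this, never the raw stream."""
    totals = defaultdict(float)
    for e in events:
        totals[(e["vehicle"], e["day"])] += e["fuel_liters"]
    return dict(totals)

events = [
    {"vehicle": "TRK-1", "day": "2024-05-01", "fuel_liters": 12.5},
    {"vehicle": "TRK-1", "day": "2024-05-01", "fuel_liters": 7.5},
    {"vehicle": "TRK-1", "day": "2024-05-02", "fuel_liters": 9.0},
]
rollup = daily_fuel_rollup(events)
```

The rollup shrinks thousands of raw events per vehicle per day into one row, which is what keeps analytical queries cheap and fast.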
6. Support Agent
Domain: Data flow troubleshooting and ticket creation.

What it does: When a device isn’t reporting data correctly, this agent walks through a diagnostic flow — checking connectivity, data format issues, server-side processing — and creates structured support tickets with all relevant diagnostic information.

Key design decision: The support agent creates tickets but cannot resolve them autonomously. It escalates to humans with full diagnostic context, making the support team dramatically more efficient.
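A structured ticket makes the escalation contract explicit: the agent records each diagnostic step and always hands off to a human. A minimal sketch with hypothetical field names:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class DiagnosticTicket:
    """Ticket the support agent hands to a human. The agent can create and
    populate it, but resolution is always a human action."""
    device_id: str
    symptom: str
    checks: list = field(default_factory=list)
    status: str = "escalated_to_human"

def run_diagnostic_flow(device_id: str, symptom: str) -> DiagnosticTicket:
    ticket = DiagnosticTicket(device_id, symptom)
    # Each step records its result so the human sees the full context;
    # real checks would hit connectivity, parsing, and server-side pipelines.
    for step in ("connectivity", "data_format", "server_processing"):
        ticket.checks.append({"step": step, "result": "checked"})
    return ticket
```

Because the ticket is a plain dataclass, `asdict(ticket)` serializes it directly into whatever ticketing system sits downstream.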
Architecture Decisions That Mattered
Agent Router
We built a lightweight router that classifies incoming messages and directs them to the appropriate agent. The router uses a combination of intent classification and explicit user context (which page they’re on, what they were last doing) to make routing decisions.
Critical insight: the router should be fast and conservative. If it’s not confident about the routing, it asks the user to clarify rather than guessing. A one-second clarification question is better than a ten-second wrong-agent response.
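The conservative routing policy reduces to one rule: below a confidence threshold, ask instead of guess. A minimal sketch in which keyword scoring stands in for the real intent classifier; the agent names and keywords are illustrative:

```python
# Hypothetical keyword sets per agent; a production router would use a
# trained intent classifier plus page/context signals.
AGENT_KEYWORDS = {
    "monitoring": {"show", "map", "where", "trucks"},
    "reporting": {"report", "schedule", "export"},
    "support": {"broken", "ticket", "offline"},
}

def route(message: str, threshold: float = 0.5) -> str:
    """Return the best-matching agent, or 'clarify' when confidence is low."""
    words = set(message.lower().split())
    scores = {agent: len(words & kw) / max(len(kw), 1)
              for agent, kw in AGENT_KEYWORDS.items()}
    best_agent, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score < threshold:
        return "clarify"  # ask the user rather than guess
    return best_agent
```

The `threshold` parameter is where the speed/accuracy trade-off lives: raise it and the router asks more clarifying questions; lower it and it guesses more.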
Shared Context, Isolated Execution
All agents share a common user context (who the user is, their permissions, their fleet configuration) but have isolated tool access and system prompts. This means an agent can reference what another agent did (“I see you just generated a fuel report — would you like to set up alerts for vehicles exceeding your threshold?”) without having access to the other agent’s tools.
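The split can be expressed directly in code: one immutable context object passed to every agent, with tools bound per agent. A minimal sketch with hypothetical tool stubs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserContext:
    """Read-only context shared by all agents."""
    user_id: str
    permissions: frozenset
    fleet_id: str

class Agent:
    """Each agent receives the shared context plus its OWN tool set;
    it can reference what happened elsewhere but cannot call another
    agent's tools."""

    def __init__(self, name: str, tools: dict):
        self.name = name
        self.tools = tools

    def call_tool(self, ctx: UserContext, tool_name: str, **kwargs):
        if tool_name not in self.tools:
            raise PermissionError(f"{self.name} has no tool {tool_name!r}")
        return self.tools[tool_name](ctx, **kwargs)

# Illustrative stub tools standing in for real platform calls.
monitoring = Agent("monitoring", {"filter_vehicles": lambda ctx, **kw: ["TRK-1"]})
reporting = Agent("reporting", {"generate_report": lambda ctx, **kw: {"rows": 10}})
ctx = UserContext("u-1", frozenset({"fleet:read"}), "fleet-42")
```

Freezing the context dataclass means no agent can mutate shared state as a side channel; cross-agent effects have to go through tools, which are auditable.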
Structured Tool Outputs
Every tool returns structured data, not free text. The agents format the response for the user, but the underlying data is always machine-readable. This makes testing deterministic — you can assert on the structured output regardless of how the model phrases its response.
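The pattern separates the machine-readable payload from the human-readable phrasing. A minimal sketch with a hypothetical fuel-summary tool and fixed illustrative data:

```python
def fuel_summary_tool(vehicle_id: str) -> dict:
    """Tool layer: always returns structured data. Values here are
    illustrative; a real tool would query the platform."""
    return {"vehicle_id": vehicle_id, "period_days": 7, "liters": 142.0}

def render_reply(data: dict) -> str:
    """Presentation layer: the agent phrases the structured data for the user.
    Tests never assert on this string, only on the dict above."""
    return (f"Vehicle {data['vehicle_id']} used {data['liters']} L "
            f"over the last {data['period_days']} days.")

result = fuel_summary_tool("TRK-1")
```

Because assertions target `result` rather than the rendered sentence, the model is free to rephrase its reply without breaking a single test.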
Evaluation Pipeline
We built an automated evaluation pipeline that tests each agent independently with hundreds of test cases. Each test case has an expected tool call sequence and expected output structure. This catches regressions before they reach production.
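One evaluation case can be reduced to two checks: the tool-call sequence matches, and the output has the expected structure. A minimal sketch with a fake agent standing in for a real model-backed one; all names are illustrative:

```python
def run_eval_case(agent_fn, case: dict) -> bool:
    """Check one test case: exact tool-call sequence, then output structure."""
    tool_calls, output = agent_fn(case["input"])
    names = [c["name"] for c in tool_calls]
    if names != case["expected_tool_sequence"]:
        return False
    return all(key in output for key in case["expected_output_keys"])

def fake_reporting_agent(message: str):
    """Stand-in for a model-backed agent; returns (tool_calls, output)."""
    calls = [{"name": "configure_report"}, {"name": "preview_report"}]
    return calls, {"status": "awaiting_confirmation", "preview": "..."}

case = {
    "input": "weekly fuel report for all trucks",
    "expected_tool_sequence": ["configure_report", "preview_report"],
    "expected_output_keys": ["status", "preview"],
}
```

Running hundreds of such cases per agent in CI is what turns "did we regress?" from a manual spot check into a pass/fail signal.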
Lessons Learned
Start with two agents, not six. We didn’t build all six agents at once. We started with Monitoring and General, validated the architecture, then added agents incrementally. Each new agent took less time because the patterns were established.
Agent boundaries should follow permission boundaries. The cleanest way to define agent scope is by what tools and data it can access. If two capabilities require different permission levels, they probably belong in different agents.
Users don’t need to know about agents. The routing is invisible to users. They type a message and get a response. The multi-agent architecture is an implementation detail, not a feature to advertise.
Monitoring is non-negotiable. Each agent has its own dashboard tracking response times, error rates, tool call patterns, and user satisfaction signals. When something goes wrong, you need to know which agent is struggling and why.
Fallback to humans gracefully. Every agent has a clear escalation path. When confidence is low or the task is outside scope, the agent hands off to a human with full context. The worst user experience is an AI that confidently gives the wrong answer.
When to Use Multi-Agent Architecture
Multi-agent systems add complexity. They’re not always the right choice. Use them when:
- Your domain has naturally distinct functional areas with different data access patterns
- Different functions require fundamentally different agent behaviors
- You need to iterate on individual capabilities without risking regressions
- Permission boundaries are important and differ across functions
- The system needs to scale beyond what a single context window can handle
For simpler use cases — a single-purpose chatbot, a document Q&A system, a basic workflow automation — a single well-designed agent is the better choice. Don’t add architectural complexity you don’t need.
Building Your Own Multi-Agent System
If your platform would benefit from a multi-agent architecture, the key is starting with the right boundaries. Map your domain into functional areas, define the tool access and permissions for each, and build the simplest two-agent system that proves the architecture works.
At Owlica AI, this is what we do. UTRAX was our proving ground, and the patterns we developed there apply across industries — from logistics to finance to manufacturing.
Want to explore whether a multi-agent approach fits your platform? Let’s talk.