Multi-Agent Systems
Power, Pitfalls, and the Hidden Costs of Orchestration
A deep dive into how multi-agent LLM systems work, where they break down, and how to build them responsibly on AWS.
By: GeoGizmodo staff
03/20/2026
What Is an Agent?
Before we talk about multi-agent systems, it helps to be precise about what a single agent actually is.
01
Reason
The LLM thinks about the current state
02
Pick a Tool
It selects a function to call
03
Observe
It reads the result
04
Repeat or Stop
It loops until it decides it's done
That loop is the critical difference between a chatbot and an agent. A chatbot answers. An agent acts.
import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

tools = [
    {
        "toolSpec": {
            "name": "query_database",
            "description": "Run a SQL query and return results",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {"sql": {"type": "string"}},
                    "required": ["sql"]
                }
            }
        }
    }
]

messages = [{"role": "user", "content": [{"text": "How many orders came in last week?"}]}]

while True:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        messages=messages,
        toolConfig={"tools": tools}
    )
    output = response["output"]["message"]
    messages.append(output)

    stop_reason = response["stopReason"]
    if stop_reason == "end_turn":
        print(output["content"][0]["text"])
        break
    elif stop_reason == "tool_use":
        # handle tool call, append result, continue loop
        ...
The agent decides when it's done. That autonomy is powerful — and it's also what makes things complicated when you chain agents together.
Enter Multi-Agent Systems
A multi-agent system (MAS) wires several agents together so they can divide labour. One common pattern is an orchestrator that breaks a goal into subtasks and dispatches each to a subagent specialized for that work.
1
User
Submits a goal
2
Orchestrator
Breaks it into subtasks and delegates
3
Subagents
Research, Coder, and Reviewer agents work in parallel
4
Orchestrator
Synthesises results
5
User
Receives the final output
On AWS, a natural implementation is to run each agent as its own Lambda function, with an SQS queue or Step Functions state machine handling the routing. Bedrock Agents also natively supports multi-agent collaboration if you want managed orchestration.
import boto3, json

stepfunctions = boto3.client("stepfunctions", region_name="us-east-1")

# Kick off a multi-agent pipeline defined in Step Functions
response = stepfunctions.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789:stateMachine:MultiAgentPipeline",
    input=json.dumps({
        "goal": "Write and review a market analysis for Q2",
        "context": {"vertical": "SaaS", "region": "EMEA"}
    })
)
The appeal is obvious: parallelism, specialisation, and modularity. But the architecture introduces failure modes that don't exist in single-agent systems, and they're worth understanding before you build.
Problem 1: Context Explosion
Every agent needs context to do its job. In a single-agent system, context grows linearly as the conversation progresses. In a multi-agent system, context fans out.
Imagine an orchestrator that passes its full working state to three parallel subagents. Each subagent accumulates its own reasoning, tool outputs, and intermediate results. When those results are returned and merged, the orchestrator now needs to process all of it — often including verbose tool outputs, redundant reasoning traces, and near-duplicate findings from agents that approached the same data differently.
This is context explosion: the total tokens in flight across the system balloon far beyond what any single agent would have consumed.
Three consequences:
Latency
Larger contexts take longer to process
Cost
Token costs scale fast; an inefficient MAS can be an order of magnitude more expensive
Quality
Model reasoning degrades over very long, noisy contexts
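To make the fan-out concrete, here's a back-of-envelope sketch. All of the token counts are illustrative assumptions, not benchmarks — the point is the shape of the arithmetic, not the specific numbers:

```python
# Rough model of context fan-out. Every number here is an assumption
# for illustration; plug in your own measurements.
ORCH_CONTEXT = 4_000        # tokens the orchestrator holds before delegating
SUBAGENT_OVERHEAD = 6_000   # reasoning + tool output each subagent accumulates
N_SUBAGENTS = 3

# Each subagent receives the orchestrator's context and builds its own on top.
subagent_tokens = N_SUBAGENTS * (ORCH_CONTEXT + SUBAGENT_OVERHEAD)

# The orchestrator then re-reads everything that came back.
merge_tokens = ORCH_CONTEXT + N_SUBAGENTS * SUBAGENT_OVERHEAD

total = subagent_tokens + merge_tokens           # 52,000 tokens in flight
single_agent = ORCH_CONTEXT + SUBAGENT_OVERHEAD  # 10,000 for one agent

print(total, single_agent, total / single_agent)  # 52000 10000 5.2
```

Even this toy model shows a ~5x token multiplier for three subagents, before any retries or validation passes — which is why compression between agents pays off so quickly.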
Mitigation: Compress Before Passing
Instead of forwarding raw agent output, have each subagent produce a structured summary before returning.
import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def compress_agent_output(raw_output: str, task_description: str) -> dict:
    """Ask Claude to distill a subagent's verbose output into a tight summary."""
    prompt = f"""
You are a compression step in a multi-agent pipeline.

Task the subagent was solving: {task_description}

Raw subagent output:
{raw_output}

Return ONLY a JSON object with:
- "summary": 2-3 sentence synthesis of the key findings
- "key_facts": list of up to 5 critical facts the orchestrator needs
- "confidence": high | medium | low
- "flags": any issues or blockers the orchestrator should know about
"""
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}]
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])
The orchestrator receives a compact dict rather than a wall of text. Over many subagents, this keeps the orchestrator's context window manageable and its reasoning sharp.
Problem 2: The Reasoning Bottleneck
In most MAS designs, the orchestrator is where all the intelligence lives. It plans, it delegates, it synthesises. This creates a single point of cognitive load: everything eventually flows back through one model, which then has to hold the entire picture in its head and make high-quality decisions about it.
Planning Drift
The orchestrator makes a plan at step 0, but by step 5 the world has changed. Orchestrators are bad at revising earlier plans buried deep in context.
Lost Thread
After processing five verbose subagents, the orchestrator "forgets" a constraint from the original instruction. Not hallucination — attention failing on a long, noisy context.
Serial Coupling
Even in parallel pipelines, the orchestrator step is serial. If subagents run in 2s each but the orchestrator takes 8s to synthesise, wall-clock time is dominated by the bottleneck.
Over-Orchestration
The orchestrator tries to manage every minute detail, leading to excessive communication overhead and redundant instructions for capable subagents.
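The serial-coupling point above can be quantified with a quick sketch, using the illustrative timings from that example (2s subagents, 8s orchestrator — assumptions, not measurements):

```python
# Wall-clock time of one orchestration cycle:
# parallel subagents, then a serial synthesis step.
subagent_seconds = [2.0, 2.0, 2.0]   # these run concurrently
orchestrator_seconds = 8.0            # this step is always serial

wall_clock = max(subagent_seconds) + orchestrator_seconds
serial_fraction = orchestrator_seconds / wall_clock

print(wall_clock)        # 10.0
print(serial_fraction)   # 0.8 -- 80% of the cycle is the bottleneck
```

Adding more parallel subagents doesn't touch that 8 seconds; only making the orchestrator step cheaper (smaller inputs, structured state) moves the needle.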
Mitigation: Structured State + Explicit Re-Grounding
Give the orchestrator a structured state object it updates on each cycle rather than relying on conversational memory. Re-inject the original goal explicitly at every orchestrator call.
import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def orchestrator_cycle(state: dict, subagent_results: list[dict]) -> dict:
    """
    Orchestrator step: given current state and new subagent results,
    update state and decide next actions.
    """
    system_prompt = """You are an orchestrator in a multi-agent pipeline.
You will receive your current state (including the original goal) and
new results from subagents. Update the state and output your next actions.
ALWAYS re-read the original_goal before deciding anything.
Respond only with valid JSON."""

    prompt = f"""
Current state:
{json.dumps(state, indent=2)}

New subagent results:
{json.dumps(subagent_results, indent=2)}

Respond with a JSON object:
{{
  "updated_state": {{ ...full updated state including original_goal... }},
  "next_actions": [
    {{"agent": "research|coder|reviewer", "task": "...", "context": {{...}} }}
  ],
  "done": false,
  "final_answer": null
}}
"""
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        system=[{"text": system_prompt}],
        messages=[{"role": "user", "content": [{"text": prompt}]}]
    )
    raw = response["output"]["message"]["content"][0]["text"]
    return json.loads(raw)
The key discipline: state["original_goal"] is always present in the payload — the orchestrator can never drift away from it because it's structurally required in every cycle's input.
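One cheap way to enforce that discipline is to validate the state object before every orchestrator call, so a broken pipeline fails loudly instead of drifting. A minimal sketch — the helper name and payload shape are illustrative, assuming the state dict carries an original_goal key:

```python
def build_orchestrator_payload(state: dict, subagent_results: list) -> dict:
    """Fail fast if the state object has lost its grounding fields.

    Raising here is deliberate: a missing original_goal should stop
    the pipeline, not silently produce a drifted plan.
    """
    if not state.get("original_goal"):
        raise ValueError("state is missing original_goal; refusing to call orchestrator")
    return {
        "state": state,
        "subagent_results": subagent_results,
    }
```

Called before each cycle, this turns the structural requirement into a hard invariant rather than a convention the team has to remember.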
Problem 3: Error Propagation
In a single-agent system, a bad tool call produces a bad result that the agent can observe and retry. In a multi-agent pipeline, a bad output from one agent becomes the input of the next. Garbage in, garbage out — but now distributed across a pipeline you can't easily inspect.
If a research agent hallucinates a statistic, a coder agent might build logic around it, and a reviewer agent might miss it because it's evaluating code quality, not factual accuracy. The error has propagated through three agents and the damage is compounded.
Mitigation: Validation Gates
Insert a lightweight validation step between agents. On AWS you can implement this as a Lambda function that sits between SQS queues. Notice the use of Haiku for the validation gate — you don't need a frontier model for a binary check. Running cheaper models on validation steps can significantly reduce the cost overhead of adding safety layers.
import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
sqs = boto3.client("sqs", region_name="us-east-1")

DOWNSTREAM_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789/coder-agent-queue"
DLQ = "https://sqs.us-east-1.amazonaws.com/123456789/failed-outputs-dlq"

def validate_research_output(event, context):
    for record in event["Records"]:
        payload = json.loads(record["body"])
        research_output = payload["output"]
        original_task = payload["task"]

        validation_prompt = f"""
You are a validation gate in a multi-agent pipeline.
A research agent produced the following output for this task: "{original_task}"

Output to validate:
{research_output}

Check for:
1. Does the output actually address the task?
2. Are there any obvious factual contradictions?
3. Are claims that need sources missing citations?

Respond with JSON: {{"valid": true/false, "reason": "...", "corrected_output": null or corrected string}}
"""
        response = bedrock.converse(
            modelId="anthropic.claude-3-haiku-20240307-v1:0",  # cheap model for gates
            messages=[{"role": "user", "content": [{"text": validation_prompt}]}]
        )
        result = json.loads(response["output"]["message"]["content"][0]["text"])

        target_queue = DOWNSTREAM_QUEUE if result["valid"] else DLQ
        payload["output"] = result.get("corrected_output") or research_output
        payload["validation"] = result
        sqs.send_message(QueueUrl=target_queue, MessageBody=json.dumps(payload))
Problem 4: Runaway Loops and Agent Deadlock
Agents can get stuck. An orchestrator waiting on a subagent that's waiting on a tool that's waiting on an external API that's down will just wait. Unless you've built explicit timeouts and circuit breakers, a MAS can hang indefinitely — or worse, loop.
Runaway loops happen when an agent retries a failing action without backing off, or when two agents are each waiting for the other to act first (deadlock). These are distributed systems problems, not AI problems — but because LLM calls are slow and expensive, they hurt more than in a typical microservices architecture.
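Before reaching for managed tooling, the core defence against runaway retries can be sketched in plain Python: exponential backoff plus a hard attempt cap, so a failing call can never loop forever. The helper name and parameters are illustrative:

```python
import time

def call_with_backoff(fn, max_attempts=4, base_delay=1.0, backoff_rate=2.0):
    """Retry fn with exponential backoff; give up after max_attempts.

    The hard cap is the point: without it, a transient failure becomes
    a runaway loop of slow, expensive LLM calls.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts:
                raise RuntimeError(f"gave up after {max_attempts} attempts") from exc
            time.sleep(delay)
            delay *= backoff_rate
```

Wrapping every cross-agent call (Bedrock invocations, tool calls, queue sends) in something like this gives you bounded failure even before you add orchestration-level timeouts.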
Mitigation: Step Functions with Timeouts and Retries
AWS Step Functions is well-suited to this because it gives you timeout and retry configuration as first-class primitives. Hard timeouts at every step. A failure handler that can notify, retry from a checkpoint, or gracefully degrade. Step Functions makes all of this observable too — you get a visual execution trace that shows exactly where a pipeline stalled.
{
  "Comment": "Multi-agent pipeline with guardrails",
  "StartAt": "OrchestratorPlan",
  "States": {
    "OrchestratorPlan": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:orchestrator",
      "TimeoutSeconds": 60,
      "Retry": [
        {
          "ErrorEquals": ["States.Timeout", "Lambda.ServiceException"],
          "IntervalSeconds": 5,
          "MaxAttempts": 2,
          "BackoffRate": 2
        }
      ],
      "Catch": [{ "ErrorEquals": ["States.ALL"], "Next": "HandleFailure" }],
      "Next": "ParallelSubagents"
    },
    "ParallelSubagents": {
      "Type": "Parallel",
      "TimeoutSeconds": 120,
      "Branches": [
        {
          "StartAt": "ResearchAgent",
          "States": {
            "ResearchAgent": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789:function:research-agent",
              "End": true
            }
          }
        },
        {
          "StartAt": "CoderAgent",
          "States": {
            "CoderAgent": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789:function:coder-agent",
              "End": true
            }
          }
        }
      ],
      "Next": "OrchestratorSynthesize",
      "Catch": [{ "ErrorEquals": ["States.ALL"], "Next": "HandleFailure" }]
    },
    "OrchestratorSynthesize": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:orchestrator-synthesize",
      "TimeoutSeconds": 60,
      "End": true
    },
    "HandleFailure": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:handle-failure",
      "End": true
    }
  }
}
When NOT to Use Multi-Agent Systems
Multi-agent systems are genuinely powerful — but they carry real overhead: latency, cost, complexity, and new failure modes. For tasks that a single well-prompted agent with good tools can handle in one pass, the added architecture often hurts more than it helps.
When a Multi-Agent System IS a Good Fit
Parallelisable
Independent subtasks that can run concurrently
Long-horizon
Too many steps to fit comfortably in a single context window
Heterogeneous
Different subtasks need different tools, models, or permissions
Start with the simplest thing that could work. Add agents when you've hit a real ceiling, not because the pattern looks elegant.
Summary
Context Explosion
Raw outputs passed between agents
Compress subagent results before forwarding
Reasoning Bottleneck
Orchestrator carries all cognitive load
Structured state; re-inject goal every cycle
Error Propagation
One agent's bad output becomes the next agent's input
Validation gates between agent steps
Runaway Loops / Deadlock
No timeouts on async coordination
Step Functions with explicit timeouts and retries
Multi-agent systems are not a shortcut to capability — they're a tool for scaling capability that already works. Get the single agent right first. Then split it apart carefully, add guardrails at every seam, and keep a close eye on what your token bills are telling you. They're usually the most honest signal about whether your architecture is actually efficient.
Ready to Collaborate?
Explore how multi-agent systems can transform your projects. Whether for consulting, custom solutions, or partnerships, we're eager to hear from you. hello@geogizmodo.ai