Serverless MCP Agent Architecture: Building Scalable AI Systems on Cloud Functions
Design patterns for deploying MCP agents on serverless platforms, balancing statelessness, cost efficiency, and scalability
Serverless computing has transformed how teams build and deploy applications—but AI agents add a new layer of complexity. Traditional MCP (Model Context Protocol) agents maintain state, make sequential decisions, and coordinate across tools. Cloud functions are ephemeral, isolated, and designed for quick in-and-out request/response workflows. How do you reconcile the two?
The answer isn't to fight the serverless model—it's to embrace it. This guide covers the architectural patterns that let you build MCP agents that scale to millions of invocations while keeping costs predictable and operational overhead minimal.
The Serverless-Agent Mismatch
MCP agents typically flow like this:
1. Accept user input
2. Query tools (APIs, databases, services)
3. Evaluate results
4. Loop back to step 2 (or finish)
This works beautifully on a long-lived server. On AWS Lambda or Google Cloud Functions, each invocation is isolated, RAM is limited, and execution time caps at 15 minutes (AWS Lambda) or 9 minutes (1st-gen Cloud Functions). State vanishes when the function returns.
Agents need context across multiple tool calls. Without careful design, you'll either:
- Bloat every function argument with the full state (slow, large payloads)
- Lose context between invocations (broken agent loops)
- Keep an external state store hot (defeats cost savings)
Serverless MCP architectures solve this by shifting where loops live.
Pattern 1: Orchestration Loops in the Client
The simplest pattern: push the agent loop out to the client or a durable orchestrator.
Flow:
- Client sends a request: `{"goal": "find cheapest flights to NYC", "state": {}}`
- Lambda executes one MCP agent step: "I need to query flight APIs"
- Lambda returns `{"next_action": "search_skyscanner", "reasoning": "..."}` plus new state
- Client (or orchestrator) sees the response, invokes Lambda again with updated state
- Repeat until the agent says `{"done": true}`
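The loop above can be sketched as a small client-side driver. The step-invoker is injected so the same loop works against an HTTP function URL or, as here, a local stand-in; the payload field names (`state`, `done`, `next_action`) follow the example contract above and are illustrative, not a fixed API.

```python
from typing import Callable

def run_agent(goal: str, invoke_step: Callable[[dict], dict],
              max_turns: int = 20) -> dict:
    """Drive the agent loop from the client: one Lambda invocation per step."""
    state: dict = {}
    for _ in range(max_turns):
        step = invoke_step({"goal": goal, "state": state})
        if step.get("done"):
            return step
        state = step["state"]  # carry the updated state into the next turn
    raise RuntimeError("agent did not finish within max_turns")

# Stand-in for the Lambda: finishes after two tool-choosing steps.
def fake_lambda(payload: dict) -> dict:
    turns = payload["state"].get("turns", 0)
    if turns >= 2:
        return {"done": True, "answer": "cheapest flight found"}
    return {"next_action": "search_skyscanner", "state": {"turns": turns + 1}}

result = run_agent("find cheapest flights to NYC", fake_lambda)
```

In production, `invoke_step` would POST to the Lambda's function URL (or API Gateway route) and decode the JSON response; keeping it injectable also makes each step trivially testable as a pure function, which is one of this pattern's main advantages.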
Pros:
- Each Lambda invocation is stateless and isolated
- Easy to add tracing, replay, or audit logs between steps
- Cost: you pay for compute time only
- Simple to test: each step is a pure function
Cons:
- Network overhead: client makes multiple round-trips (each turn adds latency)
- State serialization: every turn sends the full context over HTTP
When to use: Interactive agents (chat UIs, dashboards) where users expect incremental updates. Works great with WebSocket backends for real-time streaming of agent thoughts.
Pattern 2: Durable Task Queues (Temporal, Durable Objects)
If latency isn't critical and you want the agent loop server-side, use a durable task orchestrator.
Architecture:
- Client sends a goal to a Temporal workflow or Durable Objects workflow
- Orchestrator spawns an MCP agent as a subprocess
- Agent yields control when it needs to call a tool: `{step: 1, action: "call_api"}`
- Orchestrator suspends, invokes a Lambda to execute the tool, captures the result
- Orchestrator resumes the agent with the tool result
- Repeat until done, then return final answer to client
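The yield/resume handshake can be illustrated with a plain Python generator: the agent suspends when it needs a tool, and the orchestrator resumes it with the result. This is a local sketch of the control flow only; a real Temporal workflow or Durable Object would persist each suspension point durably instead of holding it in process memory.

```python
from typing import Any, Generator

def agent(goal: str) -> Generator[dict, Any, str]:
    """Agent as a coroutine: yields a tool request, receives the result."""
    prices = yield {"step": 1, "action": "call_api", "params": {"q": goal}}
    return f"cheapest option: ${min(prices)}"

def run_tool(request: dict) -> list:
    # Stand-in for the tool-executor Lambda.
    return [420, 310, 505]

def orchestrate(goal: str) -> str:
    gen = agent(goal)
    request = next(gen)                 # agent yields its first tool request
    while True:
        result = run_tool(request)      # in production: invoke a Lambda here
        try:
            request = gen.send(result)  # resume the agent with the tool result
        except StopIteration as done:
            return done.value           # agent finished; return final answer

answer = orchestrate("flights to NYC")
```

The design point this captures: the agent code reads as one linear "virtual process" with no explicit state passing, because the orchestrator owns suspension and resumption.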
Pros:
- Agent loop runs in a single "virtual process" (no state passed back-and-forth)
- Automatic retry on transient failures
- Built-in observability: see full execution history
- Long-running workflows (hours, days) are native
Cons:
- Adds operational complexity (Temporal/Durable Objects cluster to run)
- State persisted to a database (not free, but cheaper than Lambda in loops)
- Slightly harder to test in isolation
When to use: Long-running background tasks (background job processing, batch analytics). Temporal/Durable Objects handle state persistence so you don't have to.
Pattern 3: Step Functions + Lambda for Tool Calls
AWS Step Functions orchestrate serverless workflows visually. They're good at sequences but overkill for complex agent loops—unless you use them only for tool dispatch.
Flow:
- API Gateway receives agent goal
- Lambda #1 (agent) decides the next action: `{tool: "call_api", params: {url: "..."}}`
- Lambda #1 returns state + action descriptor
- Step Functions routes the action to the right tool executor Lambda
- Tool Lambda #2 calls the API, returns raw result
- Step Functions feeds result back to Lambda #1 (agent loop step)
- Repeat
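A minimal sketch of the agent Lambda (Lambda #1) in this flow: it inspects the accumulated state and either returns an action descriptor for Step Functions to route, or signals completion. The event shape and field names here are assumptions for illustration, not a Step Functions requirement.

```python
def agent_handler(event: dict, context=None) -> dict:
    """One agent step. Step Functions passes accumulated state in and
    routes on the returned action descriptor (e.g. via a Choice state)."""
    state = event.get("state", {})
    results = state.get("tool_results", [])
    if results:  # tool data already gathered: finish the loop
        return {"done": True,
                "answer": f"best of {len(results)} result(s)",
                "state": state}
    # No data yet: ask Step Functions to dispatch the tool-executor Lambda.
    return {"done": False,
            "tool": "call_api",
            "params": {"url": "https://api.example.com/flights"},  # hypothetical
            "state": state}

# Turn 1: the agent requests a tool call.
first = agent_handler({"goal": "flights to NYC", "state": {}})
# Turn 2: Step Functions has appended the tool result to state.
second = agent_handler({"goal": "flights to NYC",
                        "state": {"tool_results": [{"price": 310}]}})
```

In the state machine definition, a Choice state would branch on `done`: false loops back through the tool dispatcher, true exits to a Succeed state.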
Pros:
- Visual workflow editor: easy to understand and modify without code
- Built-in retry/catch for tool failures
- Cost: Step Functions charge per transition, not per second
Cons:
- Transitions are slower than in-process calls (state serialization overhead)
- Limited to 256 KB payloads between steps (can be tight for large tool results)
- Boilerplate: defining every tool as a Lambda + Step Functions state
When to use: When you need Step Functions' error handling and don't mind the overhead. Teams already using Step Functions for other workflows can reuse them for agents.
Pattern 4: Hybrid—Agents on Container Orchestration, Tools on Lambda
For agents with tight tool-call loops, run the agent on a lightweight container (ECS, Cloud Run) and invoke Lambda for individual tool calls.
Why this works:
- The agent runs continuously (not ephemeral), so it keeps state in RAM
- Each tool call is isolated: one tool crashes, agent recovers
- You pay container costs (cheap, $5–20/month for a small instance) + Lambda invocation costs
- Scales when tool calls spike, scales back when idle
Trade-off: You're not "fully serverless," but you're still cost-efficient because the agent itself doesn't scale with load—only tools do.
When to use: Agents with 100+ tool calls per request. The container's always-on cost is offset by avoiding hundreds of Lambda cold starts and state transfers.
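The hybrid shape can be sketched as a long-lived loop whose state lives in local variables, with each tool call dispatched through an injected invoker. With boto3 the invoker would wrap `lambda_client.invoke(FunctionName=..., Payload=...)`; a local stub is used here so the sketch is self-contained, and the two-tool plan is purely illustrative.

```python
from typing import Callable

def run_agent_in_container(goal: str,
                           invoke_tool: Callable[[str, dict], dict]) -> dict:
    """Long-lived agent process: state stays in RAM across the whole loop;
    only the individual tool calls run on (ephemeral, isolated) Lambdas."""
    state = {"goal": goal, "results": []}
    for tool in ("search_flights", "rank_results"):  # illustrative plan
        # A crashed tool Lambda surfaces as an exception here; the agent
        # process itself survives and can retry or re-plan.
        state["results"].append(invoke_tool(tool, {"goal": goal}))
    return {"done": True, "results": state["results"]}

def stub_tool(name: str, payload: dict) -> dict:
    # Stand-in for a tool-executor Lambda invocation.
    return {"tool": name, "ok": True}

out = run_agent_in_container("flights to NYC", stub_tool)
```

Because the loop never serializes state over the wire, the per-turn overhead of Patterns 1–3 disappears, which is exactly why this wins at 100+ tool calls per request.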
Cost Considerations
Serverless MCP agents have four main cost drivers:
- Agent step compute: Lambda cost per invocation (usually negligible—agent steps are fast)
- Tool invocation compute: Each tool call runs on Lambda (or external API)
- State storage: If persisting agent context to DynamoDB or Temporal
- Data transfer: Egress from Lambda (transfers within the same AWS region are generally free; internet egress is billed)
A rule of thumb: if your agent makes <10 tool calls per request, use Pattern 1 (client orchestration). If 10–100 calls, use Pattern 2 (Temporal) or Pattern 4 (hybrid). For 100+, Pattern 4 with container + Lambda is often cheaper than pure Lambda loops.
Best Practices
1. Keep agent steps fast
Use a fast, small model (Haiku, Sonnet) or a non-LLM reasoner. Agent step compute adds up across invocations.
2. Compress state
Don't send the full conversation history on each turn. Summarize context, store full history in a database, and fetch on demand.
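One way to sketch state compression: keep the last few turns verbatim, collapse older turns into a summary, and carry only a pointer to the full history. The truncation here is a crude stand-in for an LLM-generated summary, and the `history_key` database pointer is hypothetical.

```python
def compress_state(history: list, keep_recent: int = 3) -> dict:
    """Shrink the per-turn payload: recent turns verbatim, the rest summarized,
    full history assumed to live in a database keyed by session id."""
    older, recent = history[:-keep_recent], history[-keep_recent:]
    # Placeholder summarizer: truncate each old turn. In practice, a cheap
    # model call would produce a running summary instead.
    summary = " | ".join(turn[:40] for turn in older)
    return {"summary": summary,
            "recent_turns": recent,
            "history_key": "session-1234"}  # hypothetical DB pointer

compact = compress_state([f"turn {i}: ..." for i in range(10)])
```

On each agent step, the Lambda reconstructs what it needs from `summary` plus `recent_turns`, fetching the full history from the database only when the task actually requires it.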
3. Use tool caching
If the same tool output is needed multiple times in one agent loop, cache it (in memory during the loop, or in Lambda's /tmp for brief multi-invocation use).
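For the in-memory case, a memoizing wrapper is often enough. This sketch counts underlying invocations to show the cache working; `expensive_tool` is a stand-in for a real tool call, and arguments must be hashable for `lru_cache`.

```python
import functools

calls = []  # track how many real tool invocations happen

def expensive_tool(tool: str, args: str) -> str:
    calls.append((tool, args))  # stand-in for a slow API or Lambda call
    return f"{tool}({args}) -> result"

@functools.lru_cache(maxsize=128)
def call_tool_cached(tool: str, args: str) -> str:
    """In-memory cache for repeated identical tool calls within one loop.
    For brief reuse across invocations, results could also be written
    to Lambda's /tmp, which survives warm restarts."""
    return expensive_tool(tool, args)

call_tool_cached("search", "NYC")
call_tool_cached("search", "NYC")  # served from cache; no second real call
```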
4. Implement circuit breakers
If a tool fails 5 times in a row, exit the agent loop. Prevent runaway costs from cascading failures.
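A per-tool failure counter is the simplest form of this. The sketch below opens the circuit after `max_failures` consecutive failures and resets on success; thresholds and the reset-on-success policy are design choices, not a fixed prescription.

```python
class CircuitBreaker:
    """Abort the agent loop after N consecutive failures of the same tool."""

    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures
        self.failures: dict = {}

    def call(self, tool_name: str, fn):
        if self.failures.get(tool_name, 0) >= self.max_failures:
            raise RuntimeError(f"circuit open for {tool_name}: aborting loop")
        try:
            result = fn()
            self.failures[tool_name] = 0  # success resets the counter
            return result
        except Exception:
            self.failures[tool_name] = self.failures.get(tool_name, 0) + 1
            raise

breaker = CircuitBreaker(max_failures=5)
```

Raising out of the loop (rather than retrying forever) is what caps the cost of a cascading failure: each retry is a billed Lambda invocation and, usually, a billed model call.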
5. Monitor cold starts
Lambda cold starts add 100–1000ms. Use provisioned concurrency or containers if latency is critical.
Wrapping Up
Serverless doesn't mean stateless agents are impossible—it means rethinking where the loop lives. For most teams, the client-orchestration pattern (Pattern 1) is the sweet spot: simple, cheap, observable, and easy to debug. As agents grow more complex or tool-heavy, move the orchestration to a durable layer (Temporal, Step Functions) or hybrid container.
The key insight: don't fight serverless. Design agents that respect its constraints—isolation, ephemeral execution, fast cold starts—and you'll have a system that scales, costs less, and breaks less often.