Serverless MCP Agent Architecture: Building Scalable AI Systems on Cloud Functions
Design patterns for deploying MCP agents on serverless platforms, balancing statelessness, cost efficiency, and scalability
Serverless computing has transformed how teams build and deploy applications—but AI agents add a new layer of complexity. Traditional MCP (Model Context Protocol) agents maintain state, make sequential decisions, and coordinate across tools. Cloud functions are ephemeral, isolated, and designed for quick in-and-out request/response workflows. How do you reconcile the two?
The answer isn't to fight the serverless model—it's to embrace it. This guide covers the architectural patterns that let you build MCP agents that scale to millions of invocations while keeping costs predictable and operational overhead minimal.
The Serverless-Agent Mismatch
MCP agents typically flow like this:
1. Accept user input
2. Query tools (APIs, databases, services)
3. Evaluate results
4. Loop back to step 2 (or finish)
This works beautifully on a long-lived server. On AWS Lambda or Google Cloud Functions, each invocation is isolated, RAM is limited, and execution time caps at 15 minutes (AWS Lambda) or 9 minutes (1st-gen Cloud Functions). State vanishes when the function returns.
Agents need context across multiple tool calls. Without careful design, you'll either:
- Bloat every function argument with the full state (slow, large payloads)
- Lose context between invocations (broken agent loops)
- Keep an external state store hot (defeats cost savings)
Serverless MCP architectures solve this by shifting where loops live.
Pattern 1: Orchestration Loops in the Client
The simplest pattern: push the agent loop out to the client or a durable orchestrator.
Flow:
- Client sends a request: `{"goal": "find cheapest flights to NYC", "state": {}}`
- Lambda executes one MCP agent step: "I need to query flight APIs"
- Lambda returns `{"next_action": "search_skyscanner", "reasoning": "..."}` plus new state
- Client (or orchestrator) sees the response, invokes Lambda again with updated state
- Repeat until the agent says `{"done": true}`
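The loop above can be sketched as a small client-side driver. The step-invoker is injected so the same loop works against an HTTP function URL or, as here, a local stand-in; the payload field names (`state`, `done`, `next_action`) follow the example contract above and are illustrative, not a fixed API.

```python
from typing import Callable

def run_agent(goal: str, invoke_step: Callable[[dict], dict],
              max_turns: int = 20) -> dict:
    """Drive the agent loop from the client: one Lambda invocation per step."""
    state: dict = {}
    for _ in range(max_turns):
        step = invoke_step({"goal": goal, "state": state})
        if step.get("done"):
            return step
        state = step["state"]  # carry the updated state into the next turn
    raise RuntimeError("agent did not finish within max_turns")

# Stand-in for the Lambda: finishes after two tool-choosing steps.
def fake_lambda(payload: dict) -> dict:
    turns = payload["state"].get("turns", 0)
    if turns >= 2:
        return {"done": True, "answer": "cheapest flight found"}
    return {"next_action": "search_skyscanner", "state": {"turns": turns + 1}}

result = run_agent("find cheapest flights to NYC", fake_lambda)
```

In production, `invoke_step` would POST to the Lambda's function URL (or API Gateway route) and decode the JSON response; keeping it injectable also makes each step trivially testable as a pure function, which is one of this pattern's main advantages.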
Pros:
- Each Lambda invocation is stateless and isolated
- Easy to add tracing, replay, or audit logs between steps
- Cost: you pay for compute time only
- Simple to test: each step is a pure function
Cons:
- Network overhead: client makes multiple round-trips (each turn adds latency)
- State serialization: every turn sends the full context over HTTP
When to use: Interactive agents (chat UIs, dashboards) where users expect incremental updates. Works great with WebSocket backends for real-time streaming of agent thoughts.
Pattern 2: Durable Task Queues (Temporal, Durable Objects)
If latency isn't critical and you want the agent loop server-side, use a durable task orchestrator.
Architecture:
- Client sends a goal to a Temporal workflow or Durable Objects workflow
- Orchestrator spawns an MCP agent as a subprocess
- Agent yields control when it needs to call a tool: `{step: 1, action: "call_api"}`
- Orchestrator suspends, invokes a Lambda to execute the tool, captures the result
- Orchestrator resumes the agent with the tool result
- Repeat until done, then return final answer to client
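The yield/resume handshake can be illustrated with a plain Python generator: the agent suspends when it needs a tool, and the orchestrator resumes it with the result. This is a local sketch of the control flow only; a real Temporal workflow or Durable Object would persist each suspension point durably instead of holding it in process memory.

```python
from typing import Any, Generator

def agent(goal: str) -> Generator[dict, Any, str]:
    """Agent as a coroutine: yields a tool request, receives the result."""
    prices = yield {"step": 1, "action": "call_api", "params": {"q": goal}}
    return f"cheapest option: ${min(prices)}"

def run_tool(request: dict) -> list:
    # Stand-in for the tool-executor Lambda.
    return [420, 310, 505]

def orchestrate(goal: str) -> str:
    gen = agent(goal)
    request = next(gen)                 # agent yields its first tool request
    while True:
        result = run_tool(request)      # in production: invoke a Lambda here
        try:
            request = gen.send(result)  # resume the agent with the tool result
        except StopIteration as done:
            return done.value           # agent finished; return final answer

answer = orchestrate("flights to NYC")
```

The design point this captures: the agent code reads as one linear "virtual process" with no explicit state passing, because the orchestrator owns suspension and resumption.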
Pros:
- Agent loop runs in a single "virtual process" (no state passed back-and-forth)
- Automatic retry on transient failures
- Built-in observability: see full execution history
- Long-running workflows (hours, days) are native
Cons:
- Adds operational complexity (Temporal/Durable Objects cluster to run)
- State persisted to a database (not free, but cheaper than Lambda in loops)
- Slightly harder to test in isolation
When to use: Long-running background tasks (background job processing, batch analytics). Temporal/Durable Objects handle state persistence so you don't have to.
Pattern 3: Step Functions + Lambda for Tool Calls
AWS Step Functions orchestrate serverless workflows visually. They're good at sequences but overkill for complex agent loops—unless you use them only for tool dispatch.
Flow:
- API Gateway receives agent goal
- Lambda #1 (agent) decides the next action: `{tool: "call_api", params: {url: "..."}}`
- Lambda #1 returns state + action descriptor
- Step Functions routes the action to the right tool executor Lambda
- Tool Lambda #2 calls the API, returns raw result
- Step Functions feeds result back to Lambda #1 (agent loop step)
- Repeat
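A minimal sketch of the agent Lambda (Lambda #1) in this flow: it inspects the accumulated state and either returns an action descriptor for Step Functions to route, or signals completion. The event shape and field names here are assumptions for illustration, not a Step Functions requirement.

```python
def agent_handler(event: dict, context=None) -> dict:
    """One agent step. Step Functions passes accumulated state in and
    routes on the returned action descriptor (e.g. via a Choice state)."""
    state = event.get("state", {})
    results = state.get("tool_results", [])
    if results:  # tool data already gathered: finish the loop
        return {"done": True,
                "answer": f"best of {len(results)} result(s)",
                "state": state}
    # No data yet: ask Step Functions to dispatch the tool-executor Lambda.
    return {"done": False,
            "tool": "call_api",
            "params": {"url": "https://api.example.com/flights"},  # hypothetical
            "state": state}

# Turn 1: the agent requests a tool call.
first = agent_handler({"goal": "flights to NYC", "state": {}})
# Turn 2: Step Functions has appended the tool result to state.
second = agent_handler({"goal": "flights to NYC",
                        "state": {"tool_results": [{"price": 310}]}})
```

In the state machine definition, a Choice state would branch on `done`: false loops back through the tool dispatcher, true exits to a Succeed state.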
Pros:
- Visual workflow editor: easy to understand and modify without code
- Built-in retry/catch for tool failures
- Cost: Step Functions charge per transition, not per second
Cons:
- Transitions are slower than in-process calls (state serialization overhead)
- Limited to 256 KB payloads between steps (can be tight for large tool results)
- Boilerplate: defining every tool as a Lambda + Step Functions state
When to use: When you need Step Functions' error handling and don't mind the overhead. Teams already using Step Functions for other workflows can reuse them for agents.
Pattern 4: Hybrid—Agents on Container Orchestration, Tools on Lambda
For agents with tight tool-call loops, run the agent on a lightweight container (ECS, Cloud Run) and invoke Lambda for individual tool calls.
Why this works:
- The agent runs continuously (not ephemeral), so it keeps state in RAM
- Each tool call is isolated: one tool crashes, agent recovers
- You pay container costs (cheap, $5–20/month for a small instance) + Lambda invocation costs
- Scales when tool calls spike, scales back when idle
Trade-off: You're not "fully serverless," but you're still cost-efficient because the agent itself doesn't scale with load—only tools do.
When to use: Agents with 100+ tool calls per request. The container's always-on cost is offset by avoiding hundreds of Lambda cold starts and state transfers.
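The hybrid shape can be sketched as a long-lived loop whose state lives in local variables, with each tool call dispatched through an injected invoker. With boto3 the invoker would wrap `lambda_client.invoke(FunctionName=..., Payload=...)`; a local stub is used here so the sketch is self-contained, and the two-tool plan is purely illustrative.

```python
from typing import Callable

def run_agent_in_container(goal: str,
                           invoke_tool: Callable[[str, dict], dict]) -> dict:
    """Long-lived agent process: state stays in RAM across the whole loop;
    only the individual tool calls run on (ephemeral, isolated) Lambdas."""
    state = {"goal": goal, "results": []}
    for tool in ("search_flights", "rank_results"):  # illustrative plan
        # A crashed tool Lambda surfaces as an exception here; the agent
        # process itself survives and can retry or re-plan.
        state["results"].append(invoke_tool(tool, {"goal": goal}))
    return {"done": True, "results": state["results"]}

def stub_tool(name: str, payload: dict) -> dict:
    # Stand-in for a tool-executor Lambda invocation.
    return {"tool": name, "ok": True}

out = run_agent_in_container("flights to NYC", stub_tool)
```

Because the loop never serializes state over the wire, the per-turn overhead of Patterns 1–3 disappears, which is exactly why this wins at 100+ tool calls per request.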
Cost Considerations
Serverless MCP agents have four main cost drivers:
- Agent step compute: Lambda cost per invocation (usually negligible—agent steps are fast)
- Tool invocation compute: Each tool call runs on Lambda (or external API)
- State storage: If persisting agent context to DynamoDB or Temporal
- Data transfer: Egress from Lambda (transfers within the same AWS region are generally free; internet egress is billed)
A rule of thumb: if your agent makes <10 tool calls per request, use Pattern 1 (client orchestration). If 10–100 calls, use Pattern 2 (Temporal) or Pattern 4 (hybrid). For 100+, Pattern 4 with container + Lambda is often cheaper than pure Lambda loops.
Best Practices
1. Keep agent steps fast
Use a fast, small model (Haiku, Sonnet) or a non-LLM reasoner. Agent step compute adds up across invocations.
2. Compress state
Don't send the full conversation history on each turn. Summarize context, store full history in a database, and fetch on demand.
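One way to sketch state compression: keep the last few turns verbatim, collapse older turns into a summary, and carry only a pointer to the full history. The truncation here is a crude stand-in for an LLM-generated summary, and the `history_key` database pointer is hypothetical.

```python
def compress_state(history: list, keep_recent: int = 3) -> dict:
    """Shrink the per-turn payload: recent turns verbatim, the rest summarized,
    full history assumed to live in a database keyed by session id."""
    older, recent = history[:-keep_recent], history[-keep_recent:]
    # Placeholder summarizer: truncate each old turn. In practice, a cheap
    # model call would produce a running summary instead.
    summary = " | ".join(turn[:40] for turn in older)
    return {"summary": summary,
            "recent_turns": recent,
            "history_key": "session-1234"}  # hypothetical DB pointer

compact = compress_state([f"turn {i}: ..." for i in range(10)])
```

On each agent step, the Lambda reconstructs what it needs from `summary` plus `recent_turns`, fetching the full history from the database only when the task actually requires it.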
3. Use tool caching
If the same tool output is needed multiple times in one agent loop, cache it (in memory during the loop, or in Lambda's /tmp for brief multi-invocation use).
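For the in-memory case, a memoizing wrapper is often enough. This sketch counts underlying invocations to show the cache working; `expensive_tool` is a stand-in for a real tool call, and arguments must be hashable for `lru_cache`.

```python
import functools

calls = []  # track how many real tool invocations happen

def expensive_tool(tool: str, args: str) -> str:
    calls.append((tool, args))  # stand-in for a slow API or Lambda call
    return f"{tool}({args}) -> result"

@functools.lru_cache(maxsize=128)
def call_tool_cached(tool: str, args: str) -> str:
    """In-memory cache for repeated identical tool calls within one loop.
    For brief reuse across invocations, results could also be written
    to Lambda's /tmp, which survives warm restarts."""
    return expensive_tool(tool, args)

call_tool_cached("search", "NYC")
call_tool_cached("search", "NYC")  # served from cache; no second real call
```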
4. Implement circuit breakers
If a tool fails 5 times in a row, exit the agent loop. Prevent runaway costs from cascading failures.
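A per-tool failure counter is the simplest form of this. The sketch below opens the circuit after `max_failures` consecutive failures and resets on success; thresholds and the reset-on-success policy are design choices, not a fixed prescription.

```python
class CircuitBreaker:
    """Abort the agent loop after N consecutive failures of the same tool."""

    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures
        self.failures: dict = {}

    def call(self, tool_name: str, fn):
        if self.failures.get(tool_name, 0) >= self.max_failures:
            raise RuntimeError(f"circuit open for {tool_name}: aborting loop")
        try:
            result = fn()
            self.failures[tool_name] = 0  # success resets the counter
            return result
        except Exception:
            self.failures[tool_name] = self.failures.get(tool_name, 0) + 1
            raise

breaker = CircuitBreaker(max_failures=5)
```

Raising out of the loop (rather than retrying forever) is what caps the cost of a cascading failure: each retry is a billed Lambda invocation and, usually, a billed model call.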
5. Monitor cold starts
Lambda cold starts add 100–1000ms. Use provisioned concurrency or containers if latency is critical.
Wrapping Up
Serverless doesn't mean stateless agents are impossible—it means rethinking where the loop lives. For most teams, the client-orchestration pattern (Pattern 1) is the sweet spot: simple, cheap, observable, and easy to debug. As agents grow more complex or tool-heavy, move the orchestration to a durable layer (Temporal, Step Functions) or hybrid container.
The key insight: don't fight serverless. Design agents that respect its constraints—isolation, ephemeral execution, fast cold starts—and you'll have a system that scales, costs less, and breaks less often.