# Mastra in production: the advanced patterns I actually ship

URL: https://www.thedeepfeed.ai/posts/2026-06-26-mastra-advanced-patterns-in-production/
Category: Tools
Published: 2026-06-26
Author: the-deep-feed
Tags: mastra, agents, typescript, mcp, memory, workflows, infrastructure
Kind: deep

> I have shipped Mastra across six TypeScript codebases over the full 1.x line — a market-intelligence API, an autonomous-CMO product, a crypto trading agent, a personal assistant. This is the comprehensive operator playbook: memory's four layers, tools and workflows, supervisors, MCP on both ends, RAG and evals, the free server and auth layers, observability, the whole 1.x migration history, and the production bugs that taught me the most.

## TL;DR

- I run Mastra in six production TypeScript codebases and have lived the whole upgrade path: v0.10 → v1.5 → v1.21 → v1.32. This is the comprehensive operator's reference, not a feature tour: ~11,000 words on the primitives, the integration surface, the platform you get for free, and the operational scars.
- The primitives at depth: memory's **four composing layers** (history, working-memory scratchpad, semantic recall, and Observational Memory's 3-agent Actor/Observer/Reflector design), **tools** that never throw and cap their own output, **workflows** for deterministic control flow, and **supervisor agents** (the replacement for the now-deprecated `.network()`).
- The surface nobody documents well: **MCP** as client and server with four transports and mid-tool elicitation, **RAG** where chunking is the whole game, **evals** as a thermometer not a thermostat, a **free HTTP server layer** with stream redaction on by default, and **auth** where JWT is free but RBAC is an Enterprise license line. Two production non-negotiables underneath it all: a **`ModelWithRetries[]`** fallback chain and a **`RequestContext`** stamped onto every trace span.
- The bugs that taught me the most are the silent ones: Mastra's Zod-to-JSON-Schema layer emits schemas **OpenAI strict mode rejects** (we patch compiled `node_modules` to fix it), the default `temperature: 0.7` **400s on newer models** that ban sampling params, `Tool.background` is **silently dropped at runtime** in 1.32.1, and `PostgresStore` without `disableInit: true` **runs migrations on boot** and crash-loops under load.
- Honest take: weekly releases mean constant minor bumps and the occasional hard rename. The forty-five days while I finished writing shipped **twelve more minors (v1.33→v1.46)**, proof of the treadmill itself. The big moves: Mastra now runs **Claude Code / Cursor / Codex / OpenAI Agents SDK as subagents** (an "agent meta-harness") and adds a new **Signals** primitive with **Task Lists**, a **built-in pub/sub event system** with disconnect-replay, **system-actor execution**, and a **multi-session Harness** for multi-tenant work. Two of those releases carry hard breaking changes (the v1.46 Harness→Session split and the v1.42 tool-suspension rename). The codemods cover most of it. If your stack is TypeScript, the all-in-one surface is worth the churn. If it is Python, this is not your framework.
- The public sentiment maps cleanly onto these scars. Praise is specific and DX-shaped: one dev logged **18 hours vs 41** switching from LangGraph; another shipped for **three months**, called it "best in class for TypeScript," and independently hit the same non-LLM workflow clunk I did. The sharp criticism is strategic: **platform** lock-in over model lock-in, and the open question of how Mastra survives if **Vercel's AI SDK runs the Next.js playbook** up the stack: "the framework is great, but how are you gonna make real money?"

Most framework write-ups are written by someone who built a to-do app over a weekend. This one is not. I have shipped [Mastra](https://github.com/mastra-ai/mastra) into six production TypeScript codebases over the last year — a crypto market-intelligence API, an autonomous-CMO product, an on-chain trading agent, a personal assistant, a general-purpose API service, and a reference multi-agent build I keep around to test new primitives. Across those repos my `@mastra/core` pins span **v0.10, v1.5, v1.21, v1.24, v1.25, and v1.32**. I have run the codemods, chased the renames, and filed the bug reports.

So this is not "here are the docs, reworded." It is long on purpose — the set of patterns I actually reach for, the primitives examined at the depth you only get from running them in anger, the four production bugs that cost me the most, the free server and auth layers nobody documents well, and an honest read on whether the weekly-release treadmill is worth it. Read it top to bottom or jump to the section you are fighting with right now. If you build agents in TypeScript, it is probably worth your time. If you build them in Python, stop reading — this is not your framework, and pretending otherwise wastes yours.

# Why a framework at all, in 2026

The lazy take is that frameworks are dead weight: models keep getting better, so the scaffolding around them will evaporate. Mastra's CEO Sam Bhagwat has the sharpest rebuttal I have seen, and it matches my experience exactly:

> "Some people argue that as models get better the harnesses will go away. But over the last 18 months, the models have gotten much better. And harnesses have become bigger, not smaller."

That is the whole thesis. The model is one component. Everything else — the loop, tool execution, memory, the routing between sub-agents, the guardrails, the traces — is the harness, and it has been growing, not shrinking. Two years ago "build an agent" meant calling the completions API in a `while` loop and parsing tool calls by hand. I did that. It does not survive contact with a second agent, a flaky provider, or a compliance requirement.

Mastra's framing is "Next.js for AI agents" — TypeScript-first, **not** a port of a Python library. That distinction is not marketing. LangChain.js always felt like a translation; Mastra's Zod-on-everything typing and IDE integration are the reason I stopped writing the loop myself. The company raised a $22M Series A from Spark Capital in April 2026 ($35M total), ships roughly weekly, and is sitting around 22K GitHub stars and ~220K npm downloads a week. LangChain still has 100K+ stars and far more history. I am not telling you Mastra is bigger. I am telling you that for a TypeScript shop, it is the one I would pick again.

# The mental model: one registry, five primitives

Everything wires through a single `Mastra` class — the registry that holds your agents, workflows, tools, storage, memory, logger, observability, and MCP servers. The five primitives that matter day to day:

- **Agents** — an LLM + instructions + tools, optionally memory, sub-agents, workflows, input/output processors, and a model-fallback array.
- **Tools** — typed functions with **Zod schemas on both input and output**, suspend/resume, and `requireApproval` for human-in-the-loop.
- **Workflows** — explicit graph orchestration (`.then`, `.parallel`, `.branch`, `.foreach`, `.dowhile`, `.sleep`, `suspend`/`resume`) for when you do *not* want the agent improvising the control flow.
- **Memory** — thread-based, four complementary types (more below).
- **MCP** — first-class as both client and server.

The rest of this post is the advanced layer on top of those — the patterns that took me a few production incidents to get right. I have grouped it: first the primitives at depth (memory, tools, workflows, multi-agent), then the integration surface (MCP, RAG, evals), then the things that keep you alive in production (guardrails, fallback, tracing, observability), then the platform you get for free (server, auth), and finally the hard-won operational lessons (the bugs, the migrations, the deploy targets).

# Working memory is the feature I underused for too long

Mastra's memory is not one thing. It is four, and they compose:

1. **Message history** — a sliding window (`lastMessages`).
2. **Working memory** — a persistent, structured markdown scratchpad the agent edits itself.
3. **Semantic recall** — vector search over past messages (`topK` + `messageRange`).
4. **Observational memory** — background AI summaries; Mastra reports 95% on LongMemEval, which is state of the art for that benchmark.

For a long while I leaned on semantic recall for everything and wondered why the agent kept "forgetting" the obvious. The fix was working memory. You give the agent a template, and Mastra injects an `updateWorkingMemory` tool so the agent maintains its own structured state between turns — no retrieval, no embeddings, just a block that renders into the system prompt every turn:

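```typescript
import { Memory } from '@mastra/memory'

// `storage` is any Mastra storage adapter, constructed elsewhere.
const memory = new Memory({
  storage,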
  options: {
    lastMessages: 20,
    workingMemory: {
      enabled: true,
      template: `# User Profile
- **Name**:
- **Risk tolerance**:
- **Open positions**:
- **Stated goals**:
- **Lessons learned**:`,
    },
    semanticRecall: { topK: 5, messageRange: 2 },
  },
})
```

This is the same idea MemGPT popularized as "core memory" and that Letta now ships as default-on markdown files. The thing the model cannot afford to miss does not belong in a vector index it might fail to retrieve — it belongs in a block that is *always* in context. Working memory is where the agent's identity and current focus live; semantic recall is for the long tail. Getting that split right fixed more "dumb agent" complaints than any model upgrade did.

# Memory, all four layers, and the numbers that matter

Memory is where most Mastra deployments quietly fall apart, because people treat it as a single switch instead of four distinct layers that compose. Get the composition right and your agent feels like it remembers. Get it wrong and it either forgets the thing the user said two turns ago or it drowns in 40,000 tokens of stale chatter. I run all four layers in production, and the interesting parts are the trade-offs nobody puts in the quickstart.

## The four layers, and how they actually compose

There are four memory mechanisms, and they stack rather than replace each other. **Message history** is the sliding window: `lastMessages: 40` prepends the most recent forty messages verbatim. Dumb, cheap, reliable; your short-term working set. **Working memory** is the persistent markdown scratchpad I covered above, the second layer: a small structured block that survives across threads. **Semantic recall** is vector search over past messages: a new message gets embedded, similarity-searched against the message store, the `topK` best hits pulled, and (crucially) each hit expanded with its surrounding context via `messageRange`, so a single matched line becomes a small conversational excerpt instead of an orphan. **Observational memory** is the new and genuinely clever layer: background agents that compress long conversations into a dense observation log so the window stays small even as the thread runs for hours.

These four are additive. History gives you recency, working memory gives you durable facts, semantic recall gives you "that thing from 300 messages ago," and observational memory keeps the whole thing from blowing the window. You tune each independently.

![Four-tier vertical schematic of Mastra's composing memory system: a bottom sliding-window band of recent message blocks, a persistent scratchpad notebook above it, a vector-similarity retrieval field pulling distant past blocks forward, and a top three-agent reflection loop of writer, observer, and reflector nodes — all threaded by connector lines into a single composed-context column on the right, with one red marker on the always-loaded distilled-profile feed.](/post-images/2026-06-26-mastra-advanced-patterns-in-production/memory-four-layers.jpg)

## Observational memory internals

Observational memory (OM) is a three-agent architecture, and once you see it the design is obvious. The **Actor** is your main agent — it sees the current observations plus the recent unobserved messages, never the full raw history. The **Observer** wakes up when unobserved messages cross a token threshold and extracts them into structured observations. The **Reflector** wakes up when the observations themselves get too big and condenses them.

The defaults tell you everything about the intended shape. The observer model is `google/gemini-2.5-flash`: cheap, fast, good enough for extraction. The observation threshold is **30,000 tokens** (unobserved messages accumulate until 30k, then the observer runs), the reflection threshold is **40,000 tokens** (once the observation log itself crosses 40k, the reflector condenses it), and the observer processes in batches capped by `maxTokensPerBatch` at 10,000 tokens, so a single pass doesn't itself blow up.

The parameter I always tune by hand is `bufferActivation`, default `0.8` — OM retains the most recent ~20% of raw messages un-summarized even after observation runs. That matters more than it sounds: summaries are lossy, and the last few exchanges are exactly where the user expects verbatim fidelity. Keeping 20% raw is the difference between "the agent paraphrased what I just said" and "the agent quoted me."
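To make the threshold arithmetic concrete, here is a toy model of the trigger logic. It is a sketch of the behaviour described above, not Mastra's source: the function names, the order in which the two thresholds are checked, and splitting by message count (rather than by tokens) are all my assumptions.

```typescript
// Toy model of the OM triggers described above. Not Mastra's implementation.
type OmAction = 'none' | 'observe' | 'reflect'

interface OmThresholds {
  observationTokens: number // unobserved raw messages wake the Observer
  reflectionTokens: number // the observation log wakes the Reflector
  bufferActivation: number // fraction of raw messages handed to the Observer
}

const omDefaults: OmThresholds = {
  observationTokens: 30000,
  reflectionTokens: 40000,
  bufferActivation: 0.8,
}

function nextOmAction(
  unobservedTokens: number,
  observationLogTokens: number,
  t: OmThresholds = omDefaults,
): OmAction {
  // Assumption: reflection is checked first, so an oversized log is condensed
  // before more observations are appended to it.
  if (observationLogTokens >= t.reflectionTokens) return 'reflect'
  if (unobservedTokens >= t.observationTokens) return 'observe'
  return 'none'
}

// Split raw messages into the part to summarize and the recent tail kept verbatim.
// Assumption: the split is by message count; a real implementation may count tokens.
function splitForObservation<T>(
  messages: T[],
  t: OmThresholds = omDefaults,
): { toObserve: T[]; keptRaw: T[] } {
  const observeCount = Math.floor(messages.length * t.bufferActivation)
  return {
    toObserve: messages.slice(0, observeCount),
    keptRaw: messages.slice(observeCount),
  }
}
```

With the defaults, ten raw messages split into eight for the Observer and two kept verbatim, which is the "quoted me, not paraphrased me" tail.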

There are three observation strategies, and the choice is real:

- **Sync** — blocking. When the threshold is hit, observation runs inline before the agent responds. Pick it when correctness beats latency and threads are short enough that the pause is rare.
- **AsyncBuffer** — background buffering at intervals with instant activation. Observation happens off the critical path; the agent keeps responding from buffered observations. This is what you want for interactive chat where a half-second stall is unacceptable.
- **ResourceScoped** — handles observation across multiple threads for a single resource (user). Pick this when one user has many parallel conversations and you want memory shared across all of them.

Mastra reports OM hitting **95% on LongMemEval** with GPT-5-mini and **84.23%** with GPT-4o, which they describe as state of the art. I take those at face value as a benchmark result — a reason to turn OM on, not a promise about your workload. Benchmarks measure recall on curated long-context tasks; your production threads have weirder shapes. Use the number to justify enabling OM, then measure your own retention.

## My real production config

Here's the actual memory wiring I ship in my personal assistant. No OM in this particular service yet — moderately long threads, so I lean on history plus semantic recall plus local embeddings:

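```typescript
import { Memory } from '@mastra/memory'
import { LibSQLStore, LibSQLVector } from '@mastra/libsql'
import { fastembed } from '@mastra/fastembed'

// dbUrl and authToken come from the service's own config.
const memory = new Memory({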
  storage: new LibSQLStore({ id: 'assistant-memory', url: dbUrl, authToken }),
  vector: new LibSQLVector({ id: 'assistant-vector', url: dbUrl, authToken }),
  embedder: fastembed,
  options: {
    lastMessages: 40,
    semanticRecall: { topK: 5, messageRange: { before: 2, after: 1 } },
  },
})
```

`topK: 5` with `messageRange: { before: 2, after: 1 }` means each of the five vector hits expands into a four-message excerpt — the hit plus two before and one after. Deliberately wider before than after, because the lead-up to a relevant message usually carries the context you need. The defaults are stingier (`topK: 4`, range `{ before: 1, after: 1 }`); I widened both.
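The window arithmetic is easy to check by hand. The sketch below works on message indices only; `expandHits` is a hypothetical helper of mine, and the clamping and de-duplication of overlapping windows are assumptions about sensible behaviour, not a description of Mastra's internals.

```typescript
interface MessageRange {
  before: number
  after: number
}

// Expand each hit index into [hit - before, hit + after], clamp to the thread
// bounds, and de-duplicate so overlapping windows do not repeat messages.
function expandHits(threadLength: number, hits: number[], range: MessageRange): number[] {
  const picked = new Set<number>()
  for (const hit of hits) {
    const start = Math.max(0, hit - range.before)
    const end = Math.min(threadLength - 1, hit + range.after)
    for (let i = start; i <= end; i++) picked.add(i)
  }
  return Array.from(picked).sort((a, b) => a - b)
}

// One hit at index 10 with { before: 2, after: 1 } covers indices 8 through 11.
const excerpt = expandHits(100, [10], { before: 2, after: 1 })
```

Five well-separated hits therefore cost up to twenty recalled messages per turn, which is the number to keep in mind when you widen `topK` or the range.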

The embedder choice is the real trade-off. `@mastra/fastembed` runs the embedding model locally, in-process — no embedding API calls, so no per-embedding cost and no network latency on the recall path. The cost is local CPU and a model weaker than `text-embedding-3-small`. For conversational recall that's a fine trade; for a precision-critical RAG corpus I reach for a hosted model instead. Know which side of that line you're on.

## The honest bit: I had to build my own compaction

Here's what the docs won't tell you: the built-in memory is good, and you will still hit the context wall on long-lived threads. A thread that runs for days accumulates messages faster than `lastMessages` and semantic recall can gracefully manage, and OM's summaries are tuned for its own retrieval flow, not for the hard ceiling of "this request must fit in the window." So I built an LLM-based compaction layer on top of Mastra's memory:

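```typescript
import { generateText } from 'ai'
import { google } from '@ai-sdk/google'

type ThreadMessage = { role: string; content: string }

// `cheapModel` is the id of a low-cost Google model, set in config elsewhere.
export async function compactThread(messages: ThreadMessage[], keepFraction = 0.4) {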
  const keepCount = Math.ceil(messages.length * keepFraction)
  const toSummarize = messages.slice(0, messages.length - keepCount)
  if (toSummarize.length === 0) return { summary: '', keptCount: messages.length }

  const prompt = `Summarize the conversation below. Preserve ALL numbers, decisions,
action items, configs, names, and dates. Be concise but lossless on facts. Lossy only on chitchat.
---
${toSummarize.map((m) => `[${m.role}]: ${m.content}`).join('\n\n')}
---
Summary:`

  const { text } = await generateText({ model: google(cheapModel), prompt })
  return { summary: text, keptCount: keepCount }
}
```

Two design choices carry it. `keepFraction = 0.4` keeps the most recent 40% of the thread verbatim and summarizes the older 60% — same instinct as OM's `bufferActivation`. And the prompt is explicit: **lossless on facts, lossy on chitchat.** Numbers, decisions, commitments, configs, names, and dates survive; pleasantries get crushed. I run it on a cheap Google model because compaction is high-volume and doesn't need a frontier brain. This isn't a knock on Mastra — production long-lived threads need a hard backstop, and a 30-line function is the right tool.
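Something has to decide when to call the compactor. A minimal sketch of that trigger follows, assuming a crude four-characters-per-token estimate and a hypothetical `triggerAt` fraction of the window; neither comes from Mastra, and a real tokenizer will give different counts.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant' | 'system'
  content: string
}

// Rough heuristic: about 4 characters per token for English text.
// This is an assumption for budgeting, not a tokenizer.
function estimateTokens(messages: ChatMessage[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0)
  return Math.ceil(chars / 4)
}

// Hypothetical trigger: compact once the thread passes a share of the window,
// leaving headroom for the system prompt, tool results, and the reply.
function shouldCompact(messages: ChatMessage[], windowTokens: number, triggerAt = 0.7): boolean {
  return estimateTokens(messages) > windowTokens * triggerAt
}
```

Triggering well below the hard limit matters because the summarization call itself has to fit the older messages into a request.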

## Resilience: degrade, don't crash

One idiom I apply to every memory store: never let memory take down the agent. The store is a dependency, and dependencies fail.

```typescript
let memory: Memory | undefined
try {
  memory = createMemory()
} catch (err) {
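  // `err` is `unknown` in a strict TypeScript catch clause, so narrow before reading it.
  log.warn({ err: err instanceof Error ? err.message : String(err) }, 'Memory init failed; running without memory')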
}

const agent = new Agent({
  name: 'assistant',
  model: [/* fallback chain */],
  ...(memory ? { memory } : {}),
  tools,
})
```

If the memory store is unreachable at init, the agent comes up *without* memory and keeps answering. A degraded agent that forgets is infinitely better than a crashed agent that 500s. The conditional spread `...(memory ? { memory } : {})` is the whole trick — memory is optional wiring, not a hard requirement. Apply the same pattern everywhere a remote store sits on your critical path.
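The same idiom generalizes to any optional dependency. Here is a small sketch of it as a reusable helper; `tryInit` is a hypothetical name of mine, not a Mastra export, and it only covers synchronous construction.

```typescript
// Run an init function; on failure, report and return undefined instead of throwing.
function tryInit<T>(
  name: string,
  init: () => T,
  warn: (message: string) => void = console.warn,
): T | undefined {
  try {
    return init()
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err)
    warn(`${name} init failed, continuing without it: ${reason}`)
    return undefined
  }
}

// A dependency that fails to come up simply drops out of the config.
const warnings: string[] = []
const cache = tryInit<Map<string, string>>(
  'cache',
  () => {
    throw new Error('ECONNREFUSED')
  },
  (message) => warnings.push(message),
)
const agentConfig = { name: 'assistant', ...(cache ? { cache } : {}) }
```

The conditional spread keeps the key absent rather than set to `undefined`, which matters for constructors that distinguish the two.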

# Tools: typed, suspendable, and where the schema pain lives

A tool is the smallest unit of trust you hand an LLM. Get it wrong and the model either hallucinates a result or derails the whole loop. Across six codebases I've converged on a set of conventions for `createTool` that I now treat as non-negotiable, and almost none of them are about the happy path.

Start with the thing Mastra gets right that most frameworks fumble: you type **both** sides of the call.

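```typescript
import { createTool } from '@mastra/core/tools'
import { z } from 'zod'

// MarketSchema and screenMarkets are defined elsewhere in the service.
export const marketScreenerTool = createTool({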
  id: 'market-screener',
  description: 'Screen markets by query, status, and 24h volume',
  inputSchema: z.object({
    query: z.string().optional(),
    status: z.enum(['active', 'closed']).optional(),
    minVolume24h: z.number().optional(),
  }),
  outputSchema: z.object({
    available: z.boolean(),
    count: z.number(),
    items: z.array(MarketSchema),
    error: z.string().optional(),
  }),
  execute: async (input) => screenMarkets(input),
})
```

Most frameworks let you describe the arguments the model passes in and stop there. The return value is just `any` flowing back into the conversation. That's a mistake. The output schema is what the model reads on the next turn — it's not documentation, it's the contract for the loop. When you declare `outputSchema`, you decide exactly which fields the model sees, in what shape, with what names. The model stops guessing whether the result has `pnl` or `totalPnlUsd`, because the shape is fixed and it learns it. Output schemas are the difference between a tool the model uses confidently and a tool it pokes at.

Mastra goes one step further with `toModelOutput`, which lets the app keep the full structured result while the model sees a trimmed, human-readable projection:

```typescript
toModelOutput: (output) => ({
  type: 'content',
  value: [{ type: 'text', text: `${output.count} markets, top by 24h volume` }],
})
```

Your frontend gets the whole object. The model gets a sentence. Two different consumers, one tool.

## Never throw — absorb the error and keep the loop alive

Here is the convention that matters most, and it took a couple of production incidents to internalize: **a tool must never throw.** An exception out of `execute` doesn't just fail the call — it can tear down the agent's turn, and now the model has no result, no error it can reason about, and no way to recover. The loop derails.

So every tool I ship wraps its provider call and absorbs failure into a structured, schema-valid object. The shape is always the same: `available: false`, an `error` string, and the rest of the output filled with its empty form.

```typescript
execute: async (input) => {
  try {
    const rows = await fetchMarketScreener(input)
    return shapeMarkets(rows) // { available: true, count, items }
  } catch (err) {
    return {
      available: false,
      error: err instanceof Error ? err.message : String(err),
      count: 0,
      items: [],
    }
  }
}
```

The unit tests assert this directly. When the provider rejects with `upstream 503`, the tool returns `available: false`, `error` contains `"503"`, and `items` is `[]`. The model reads `available: false`, sees the error string, and decides what to do — retry, tell the user the data source is down, move on. A structured failure is a fact the model can reason about. A thrown exception is a dead end. `null` is a first-class outcome too: when a summary provider returns `null`, the tool reports `available: false` with `summary: null` — not a crash, just an honest "nothing here."
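Writing that try/catch in every tool gets repetitive, so one option is to factor it out. The sketch below is a hypothetical wrapper (`absorbErrors` is my name, not a Mastra API) that takes the provider call and the output's empty form, with a rejecting stub standing in for the `upstream 503` case.

```typescript
type Absorbed<E> = E & { available: false; error: string }

// Wrap a provider call so the caller always gets a schema-shaped object back.
// `emptyForm` is the output's empty shape, for example { count: 0, items: [] }.
function absorbErrors<I, O, E extends object>(
  run: (input: I) => Promise<O>,
  emptyForm: E,
): (input: I) => Promise<O | Absorbed<E>> {
  return async (input: I): Promise<O | Absorbed<E>> => {
    try {
      return await run(input)
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err)
      return { ...emptyForm, available: false as const, error }
    }
  }
}

interface ScreenOk {
  available: true
  count: number
  items: string[]
}

// Stub provider that always rejects, standing in for an upstream outage.
const failingProvider = (_query: string): Promise<ScreenOk> =>
  Promise.reject(new Error('upstream 503'))

const screen = absorbErrors(failingProvider, { count: 0, items: [] as string[] })
```

Calling `screen('btc')` resolves (it does not reject) with `available: false`, the error message, and the empty `items` array, which is the shape the unit tests check.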

![Schematic of an agent tool drawn as a sealed contract box: a typed input-schema funnel on the left feeds a guarded execute chamber whose error path is caught and routed back as a structured result token instead of tearing down the agent turn, an output-cap valve clamps oversized payloads, and a suspended-state branch loops out to a human approval node and resumes — with one red marker on the caught-error return path.](/post-images/2026-06-26-mastra-advanced-patterns-in-production/tool-contract.jpg)

## Context-cost control belongs at the tool layer

Every field a tool returns is tokens the model pays for on every subsequent turn. A provider that hands back 75 rows of 30-field objects with 2,000-character description strings will blow your context window and your bill in a single call. So I enforce hard caps inside the tool, before the result ever reaches the model: **sort, then trim to ≤50 items, then cap strings at 500 characters** with an ellipsis.

```typescript
function shapeMarkets(rows: Market[]) {
  const items = [...rows] // copy first: Array.prototype.sort mutates in place
    .sort((a, b) => b.volume24h - a.volume24h) // sort BEFORE you trim
    .slice(0, 50)
    .map((m) => ({
      ...m,
      question: m.question.length > 500 ? `${m.question.slice(0, 500)}…` : m.question,
    }))
  return { available: true, count: items.length, items }
}
```

The order matters. Sort first so the 50 you keep are the 50 that matter — top by 24h volume, top by position size, highest value. Trimming an unsorted list throws away signal at random. The tests pin this down: feed in 75 rows and you get back exactly 50; a 2,000-character string comes back at ≤501 characters. A balances tool does the same — 60 balances in, sorted by `valueUsd` descending, trimmed to 50, with the aggregate `totalValueUsd` computed from the full set so trimming never lies about the total. This is not premature optimization. It's the difference between a tool that costs 800 tokens and one that costs 40,000.
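The balances variant is worth spelling out because of the aggregate. This is a minimal sketch under assumed field names (`symbol`, `valueUsd`); the real tool's shape is not shown in this post.

```typescript
interface Balance {
  symbol: string
  valueUsd: number
}

// Cap a string at maxChars and mark the cut with a single ellipsis character.
function capString(text: string, maxChars = 500): string {
  return text.length > maxChars ? `${text.slice(0, maxChars)}…` : text
}

function shapeBalances(rows: Balance[], maxItems = 50) {
  // Aggregate over the full set first so trimming never changes the total.
  const totalValueUsd = rows.reduce((sum, b) => sum + b.valueUsd, 0)
  const balances = [...rows] // copy so the caller's array is not reordered
    .sort((a, b) => b.valueUsd - a.valueUsd)
    .slice(0, maxItems)
    .map((b) => ({ ...b, symbol: capString(b.symbol) }))
  return { available: true, count: balances.length, totalValueUsd, balances }
}
```

Sixty balances in gives fifty back, largest first, while `totalValueUsd` still reflects all sixty.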

## Gate on required inputs before you ever call the provider

Some tools need at least one of several inputs. A wallet-balances tool needs an `address` or an `entityName`. If the model calls it with neither, the right answer is not to call the provider and let it 400 — it's to short-circuit with the same structured-failure shape and tell the model what it's missing.

```typescript
execute: async ({ address, entityName, chain }) => {
  if (!address && !entityName) {
    return { available: false, error: 'Provide an address or entityName', balances: [] }
  }
  return shapeBalances(await fetchWalletBalances({ address, entityName, chain }))
}
```

The test asserts that when called with `{}`, the tool returns `available: false`, the error contains `"address or entityName"`, and the provider mock was **never called**. No wasted round-trip, no upstream 400 to absorb, and the model gets an actionable message it can fix on the next turn.
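Here is roughly what that test exercises, sketched without Mastra so it runs on its own. `makeBalancesTool` and the call counter are hypothetical stand-ins for the real tool and its provider mock; the gate and the absorbed-error path are combined so the function never throws.

```typescript
interface BalancesInput {
  address?: string
  entityName?: string
}

interface BalancesOutput {
  available: boolean
  error?: string
  balances: string[]
}

function makeBalancesTool(fetchBalances: (input: BalancesInput) => Promise<string[]>) {
  return async (input: BalancesInput): Promise<BalancesOutput> => {
    // Gate first: no provider round-trip when neither identifier is present.
    if (!input.address && !input.entityName) {
      return { available: false, error: 'Provide an address or entityName', balances: [] }
    }
    try {
      return { available: true, balances: await fetchBalances(input) }
    } catch (err) {
      return {
        available: false,
        error: err instanceof Error ? err.message : String(err),
        balances: [],
      }
    }
  }
}

// Counting stub in place of a mocking library.
let providerCalls = 0
const balancesTool = makeBalancesTool(async () => {
  providerCalls += 1
  return ['ETH']
})
```

Calling `balancesTool({})` resolves with the structured failure and leaves `providerCalls` at zero.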

## Suspend, resume, and human approval

Some tool calls shouldn't fire autonomously. For those, Mastra gives tools the same suspend/resume machinery the workflow engine has. You declare a `suspendSchema` and a `resumeSchema`, flip `requireApproval: true`, and the agent emits a tool-call-approval chunk instead of executing.

```typescript
export const approvalTool = createTool({
  id: 'request-approval',
  inputSchema: z.object({ action: z.string() }),
  outputSchema: z.object({ approved: z.boolean(), result: z.string() }),
  suspendSchema: z.object({ reason: z.string() }),
  resumeSchema: z.object({ approved: z.boolean() }),
  requireApproval: true,
  execute: async (input) => ({ approved: true, result: `Action ${input.action} completed` }),
})
```

For a trading agent, this is the rail between "the model suggested a position" and "the model opened one." The approval happens at the agent level — the tool stays suspended, a human says yes or no, and the loop resumes with typed `resumeData`. Nothing irreversible happens without a typed gate in front of it.

# Workflows: when you do not let the agent improvise

An agent is a loop that decides what to do next. That's its strength and, in production, its liability. When the steps are known, the order is fixed, and a wrong turn costs real money, you do not want a language model improvising control flow. You want a graph. That's what Mastra workflows are: a type-safe, composable engine for the paths you've already decided on.

The mental model I use is simple. Agents are for open-ended reasoning. Workflows are for everything I can draw on a whiteboard. If I can name the steps and the branches, it's a workflow, and the LLM only gets invoked *inside* a step where I explicitly want judgment — never as the thing choosing which step runs.

![Architectural flow schematic contrasting deterministic orchestration with agent improvisation: a fixed directed graph of sequential and parallel step boxes joined by branch gates, a retry-counter loop on one node, and a suspend/resume checkpoint, all running on fixed rails — set against a faint greyed-out cloud of free-form agent wandering it deliberately replaces, with one red marker on the suspend/resume checkpoint.](/post-images/2026-06-26-mastra-advanced-patterns-in-production/workflow-control.jpg)

## The graph primitives

The fluent API is small enough to hold in your head, and that's the point. You compose steps with a handful of methods and call `.commit()` to finalize:

| Method | What it does |
|--------|--------------|
| `.then(step)` | Sequential |
| `.parallel([steps])` | Run steps concurrently |
| `.branch([[cond, step], ...])` | Conditional fan-out |
| `.dowhile(step, cond)` / `.dountil(step, cond)` | Loop while/until |
| `.foreach(step)` | Iterate over an array input |
| `.sleep(ms)` / `.sleepUntil(date)` | Pause |
| `.waitForEvent(...)` | Pause until an external event |

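```typescript
import { createWorkflow } from '@mastra/core/workflows'
import { z } from 'zod'

// The steps (gatherSignals, scoreMomentum, ...) are createStep() definitions from elsewhere.
export const tradeReviewWorkflow = createWorkflow({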
  id: 'trade-review',
  inputSchema: z.object({ ticker: z.string(), notionalUsd: z.number() }),
  outputSchema: z.object({ executed: z.boolean() }),
})
  .then(gatherSignals)
  .parallel([scoreMomentum, scoreRisk])
  .then(combineScore)
  .branch([
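    // Keep the bands mutually exclusive so a score above 80 cannot also match the approval branch.
    [async ({ inputData }) => inputData.score > 80, autoExecuteStep],
    [async ({ inputData }) => inputData.score > 50 && inputData.score <= 80, humanApprovalStep],
    [async ({ inputData }) => inputData.score <= 50, rejectStep],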
  ])
  .commit()
```

That `.parallel` step runs two scorers concurrently and the next step receives `{ 'score-momentum': {...}, 'score-risk': {...} }`. The `.branch` evaluates conditions and the downstream step receives only the branches that actually ran. None of this is the model deciding — it's your code, deterministic, testable, replayable. The engine even persists state across server restarts, so a run that's mid-flight survives a deploy.
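The step that follows `.parallel` has to read that keyed object, and the branch thresholds are easiest to reason about as a pure function. A sketch of both follows; the `score` field on each scorer's output and the equal weighting are my assumptions, since the post does not show how `combineScore` blends the two.

```typescript
// Output of the parallel stage, keyed by step id.
interface ParallelScores {
  'score-momentum': { score: number }
  'score-risk': { score: number }
}

// Hypothetical blend: a weighted average, rounded to an integer score.
function combineScores(input: ParallelScores, momentumWeight = 0.5): { score: number } {
  const momentum = input['score-momentum'].score
  const risk = input['score-risk'].score
  return { score: Math.round(momentum * momentumWeight + risk * (1 - momentumWeight)) }
}

type Route = 'auto-execute' | 'human-approval' | 'reject'

// The same bands as the branch conditions, written so exactly one route matches.
function routeByScore(score: number): Route {
  if (score > 80) return 'auto-execute'
  if (score > 50) return 'human-approval'
  return 'reject'
}
```

Keeping the routing logic in a plain function like this makes the boundary cases (exactly 80, exactly 50) unit-testable without spinning up a workflow run.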

## Steps are typed on every edge, including resume

A step is the unit of work. You give it an `id`, an `inputSchema`, an `outputSchema`, and — when it pauses for a human — a `resumeSchema`. The schemas type every edge of the graph: what flows in, what flows out, and what a human hands back on resume.

```typescript
const humanApprovalStep = createStep({
  id: 'human-approval',
  inputSchema: z.object({ ticker: z.string(), notionalUsd: z.number(), score: z.number() }),
  outputSchema: z.object({ approved: z.boolean() }),
  resumeSchema: z.object({ approved: z.boolean() }),
  suspendSchema: z.object({ reason: z.string() }),
  execute: async ({ inputData, resumeData, suspend }) => {
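    // First entry has no resumeData. Test for its presence, not for `.approved`,
    // so a human rejection (approved: false) completes the step instead of suspending again.
    if (!resumeData) {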
      return await suspend({ reason: `Approve ${inputData.notionalUsd} on ${inputData.ticker}?` })
    }
    return { approved: resumeData.approved }
  },
})
```

The pattern is the same every time: on first entry there's no `resumeData`, so the step calls `suspend(...)` with a payload typed by `suspendSchema`, and the run halts. When a human responds, the step re-executes — same code, top to bottom — but now `resumeData` is populated and it returns. One function, two entry points, both typed. You can also drop an agent or a tool straight into the graph: `createStep(myAgent)` and `createStep(myTool)` both work, so a step that needs judgment delegates to an agent while the surrounding control flow stays deterministic.
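The "one function, two entry points" shape can be shown without the engine at all. This is a toy model, not Mastra's runtime: the result type and the way `resumeData` is passed in are simplifications of mine.

```typescript
type StepResult<O, S> =
  | { status: 'suspended'; payload: S }
  | { status: 'success'; output: O }

interface ApprovalInput {
  ticker: string
  notionalUsd: number
}

interface ApprovalResume {
  approved: boolean
}

// The same body runs on first entry (no resumeData) and again on resume.
function approvalStep(
  inputData: ApprovalInput,
  resumeData?: ApprovalResume,
): StepResult<{ approved: boolean }, { reason: string }> {
  if (resumeData === undefined) {
    return {
      status: 'suspended',
      payload: { reason: `Approve ${inputData.notionalUsd} on ${inputData.ticker}?` },
    }
  }
  // A rejection is still a completed step; it just carries approved: false.
  return { status: 'success', output: { approved: resumeData.approved } }
}

const trade: ApprovalInput = { ticker: 'ETH', notionalUsd: 5000 }
const firstPass = approvalStep(trade) // suspends with a typed reason
const secondPass = approvalStep(trade, { approved: false }) // completes as rejected
```

The sketch branches on whether `resumeData` exists rather than on its `approved` flag, so a "no" from the human ends the step with `approved: false` rather than asking again.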

## The run lifecycle

Three calls. You create a run, you start it with explicit input, and if it suspends you resume it with explicit resume data.

```typescript
const run = await workflow.createRun()
const result = await run.start({ inputData: { ticker: 'ETH', notionalUsd: 5000 } })

if (result.status === 'suspended') {
  // result.suspended === ['human-approval']
  const final = await run.resume({ step: humanApprovalStep, resumeData: { approved: true } })
}
```

`run.start` takes `{ inputData }`. `run.resume` takes `{ step, resumeData }` — and `step` accepts either the step object or its string id. The result `status` is one of `success`, `failed`, `suspended`, `tripwire`, or `paused`, and `result.suspended` tells you exactly which step is waiting. The `runId` is a `crypto.randomUUID()` minted by `createRun()`. Remember that fact — it matters in a second.
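When the lifecycle sits behind HTTP, each of those statuses needs a deliberate response code. The mapping below is a hypothetical one of mine (the post does not say which codes the API returns); the point of the sketch is the exhaustive `switch`, which turns a newly added status into a compile error instead of a silent 200.

```typescript
type RunStatus = 'success' | 'failed' | 'suspended' | 'tripwire' | 'paused'

// Hypothetical status-to-HTTP mapping; adjust to your own API conventions.
function httpStatusFor(status: RunStatus): number {
  switch (status) {
    case 'success':
      return 200
    case 'suspended':
    case 'paused':
      return 202 // accepted, waiting on a human or an external event
    case 'tripwire':
      return 422 // a guardrail stopped the run
    case 'failed':
      return 500
    default: {
      // If a new status is added to RunStatus, this assignment stops compiling.
      const unreachable: never = status
      throw new Error(`Unhandled run status: ${String(unreachable)}`)
    }
  }
}
```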

## War story: the boundary between HTTP and your workflow runtime is a security boundary

In the market-intelligence API, this lifecycle is exposed over HTTP behind admin scope: a `POST /workflows/:id/execute` creates a run and starts it, a `POST /workflows/:id/runs/:runId/resume` rehydrates and resumes it. Clean — until you look at how an early version wired up the input.

The execute route accepted a JSON body and forwarded it to the workflow. The intent was to read `body.inputData`. But an early version had a fallback: if `inputData` was missing, it used the **entire request body** as `inputData`. That fallback is a security hole. Whatever a client puts in the body — arbitrary, unvalidated fields — flows straight into the workflow runtime context, including the chance of colliding with reserved keys the workflow system uses internally. The HTTP boundary was leaking arbitrary input into a place that assumed it was trusted.

The fix has two parts, and both are about treating the edge as a security boundary:

```typescript
// Only ever forward inputData. No fallback to the whole body.
const run = await workflow.createRun()
const result = await run.start({ inputData: body.inputData ?? {} })

// On resume, validate runId — createRun() emits UUIDs, so anything else is a client bug.
const runId = requireUuidParam(c.req.param('runId'), 'runId') // malformed → 400, not 500
```

First, make the contract explicit: the route forwards `body.inputData` and nothing else — if it's absent, the workflow gets `{}`, never a smuggled body, and the workflow's own `inputSchema` does the deep validation from there. Second, validate the `runId` as a UUID at the boundary, so a malformed value is a clean `400` instead of a `500` exploding out of the engine. The lesson generalizes past this one route: anything that crosses from HTTP into your workflow runtime must be explicitly named and explicitly validated. The body is attacker-controlled; the runtime context is trusted. The schema validation, the UUID check, the `?? {}` — that's the wall between the two. Build the wall.
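
The snippet above calls `requireUuidParam` without showing it. A minimal sketch of what such a helper could look like, assuming a hypothetical `HttpError` class that your route layer translates into a response status (neither name is a Mastra API):

```typescript
// Hypothetical error type; assumes the HTTP layer maps `status` onto the response.
class HttpError extends Error {
  readonly status: number
  constructor(status: number, message: string) {
    super(message)
    this.status = status
  }
}

// Canonical 8-4-4-4-12 hex layout, which is what crypto.randomUUID() emits.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

// Returns the value unchanged when it is a well-formed UUID, otherwise throws a 400.
function requireUuidParam(value: string | undefined, name: string): string {
  if (!value || !UUID_RE.test(value)) {
    throw new HttpError(400, `${name} must be a UUID`)
  }
  return value
}
```

Rejecting at the edge means the engine never sees a value it could not have minted itself.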

# Supervisors, not `.network()`

Multi-agent routing is where most teams either over-engineer or footgun themselves. Mastra gives you a coordinator agent that holds other agents as sub-agents. My reference build wires a head analyst over four specialists:

![Hub-and-spoke delegation schematic: a central supervisor agent node ringed by labeled specialist sub-agent boxes, directed delegation arrows routing tasks outward and results back inward, with depth-limiter and hand-off-budget guard controls gating the spokes, and a faint greyed-out older peer-to-peer mesh sitting behind it as the deprecated design it replaces — one red spoke marks the supervisor's routing decision.](/post-images/2026-06-26-mastra-advanced-patterns-in-production/supervisor-delegation.jpg)

```typescript
const coordinator = new Agent({
  id: 'lead-analyst',
  name: 'LeadAnalyst',
  description: 'Head analyst — synthesizes all intelligence into trade signals',
  instructions: personaMd,
  model: 'anthropic/claude-sonnet-4-20250514',
  agents: { macroAgent, fundingAgent, narrativeAgent, whaleAgent },
  tools: { getSignal, executeTrade, generatePost, publishPost },
  workflows: { hourlyCycleWorkflow, monitoringWorkflow },
  memory: networkMemory, // required for .network()
})

const result = await coordinator.network('Analyze the BTC setup and give me a trade signal.', {
  memory: { thread: 'daily-analysis', resource: 'analyst-team' },
})
```

That `.network()` call lets the LLM dynamically route to sub-agents, workflows, or tools. It works. But **`AgentNetwork` was deprecated in favor of supervisor agents (added in `@mastra/core@1.8.0`)**, and if you are starting today you should skip it. The supervisor pattern is just normal delegation: you hold the sub-agents in `agents: {}` and the object keys *become the tool names the LLM calls*. The coordinator decides who to invoke through ordinary tool calls. No special `.network()` method, no second migration waiting for you when the deprecation lands fully.

The supervisor API also gives you control points the old `.network()` did not, and they are the difference between a demo and a production system. Two I rely on:

- **`messageFilter`** — by default, **the entire conversation is forwarded to every sub-agent**. That is a real privacy and token-cost footgun: a specialist that only needs the last question gets the whole thread, including anything confidential earlier in it. I filter it down (`messages.slice(-10)`, or strip flagged content) before delegation.
- **`onDelegationStart` / `onDelegationComplete`** — the start hook can return `{ proceed, modifiedPrompt, modifiedMaxSteps, rejectionReason }`, so you can rewrite the prompt per sub-agent or cap its iteration budget on the fly; the complete hook gives you `ctx.bail()` to kill the loop early and a `feedback` string saved to the supervisor's memory.
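
The filtering step itself can be written as a plain function and handed to `messageFilter`. This is a sketch under assumptions: the `Message` shape and the `confidential` flag are hypothetical stand-ins, and the exact callback signature Mastra expects is not shown here.

```typescript
// Hypothetical message shape; real Mastra message types carry more fields.
interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  confidential?: boolean // illustrative flag set by upstream code
}

// Drop flagged messages first, then keep only the most recent `limit`.
function filterForDelegation(messages: Message[], limit = 10): Message[] {
  if (limit <= 0) return [] // slice(-0) would return everything, so guard it
  return messages.filter((m) => !m.confidential).slice(-limit)
}
```

Filtering before slicing matters: slicing first could spend part of the budget on messages that are then stripped.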

There is also a memory-isolation detail worth knowing: each sub-agent invocation gets a **fresh thread**, and only the delegation prompt and the sub-agent's response are written to its memory — not the supervisor's full history. That keeps specialists from polluting each other's context, but it also means a sub-agent has no memory of its own previous calls unless you wire that yourself.

The lesson generalizes: Mastra moves fast, and "the impressive method from six months ago" is sometimes the deprecated path. Read the changelog before you build your architecture around a named API.

# MCP on both ends

This is the capability I value most and the one LangChain answers with "community integrations." Mastra is a first-class MCP **client and server**.

![Mirrored architectural schematic of one system acting as both MCP client and MCP server at once: on the left it consumes external tool servers through a client adapter with multiple transport channels, on the right it publishes its own tools out to other agents through a server adapter, and a mid-tool elicitation loop pauses to request more input before resuming across the boundary — one red marker on the elicitation pause.](/post-images/2026-06-26-mastra-advanced-patterns-in-production/mcp-both-ends.jpg)

As a client, you point `MCPClient` at remote servers (stdio, SSE, or StreamableHTTP), and `await mcp.getTools()` injects their tools straight into an agent. My personal assistant wires up whatever the environment provides:

```typescript
const servers: Record<string, any> = {}
if (process.env.GITHUB_TOKEN)
  servers.github = {
    command: 'npx',
    args: ['-y', '@modelcontextprotocol/server-github'],
    env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN! },
  }
servers.filesystem = {
  command: 'npx',
  args: ['-y', '@modelcontextprotocol/server-filesystem', '/workspace'],
}
const mcpClient = new MCPClient({ id: `assistant-${Date.now()}`, servers })
```

As a server, you expose your own tools, agents, and workflows to any MCP client, such as Claude Desktop, Cursor, or Windsurf. Tools pass through directly; **agents become `ask_<key>` tools and workflows become `run_<key>` tools**, where the key is the property name in the `agents` or `workflows` object. My trading build publishes its intelligence layer this way:

```typescript
new MCPServer({
  id: 'market-intelligence',
  name: 'Market Intelligence Server',
  version: '1.0.0',
  tools: { getSignal, getMacro, getFunding, getNarratives, getWhaleActivity, getPrice },
  agents: { macroAgent, fundingAgent, narrativeAgent, whaleAgent }, // → ask_macroAgent, etc.
})
```

The payoff: the same agents I orchestrate internally are callable from any MCP host with no extra glue. That bidirectional symmetry — consume the ecosystem and publish into it from one library — is the single biggest reason I keep choosing Mastra.

# MCP, deeper: transports, elicitation, and OAuth

The basics above — standing up an `MCPServer`, pointing an `MCPClient` at remote servers, agents becoming `ask_<key>` tools and workflows becoming `run_<key>` tools — are the easy 80%. The part that decides whether you can actually expose agents to the public is the transport, the human-in-the-loop primitive, and the auth surface.

## Four transports, and when each one is right

`MCPServer` can speak over four transports, and choosing wrong is the most common MCP mistake I see. **`startStdio()`** is the subprocess transport over stdin/stdout — for when an MCP client launches your server as a child process (the Claude Desktop / IDE model). No network, no ports, no auth; useless for a hosted service. **`startSSE(...)`** is Server-Sent Events over a Node HTTP server: a long-lived `GET` streams server→client, the client POSTs for client→server. **`startHonoSSE(...)`** is the same SSE protocol mounted into an existing Hono request context — how you add MCP to your app without standing up a second server. And **`startHTTP(...)`** is Streamable HTTP, the modern transport, and my default for hosted deployments:

```typescript
await server.startHTTP({
  url: new URL(req.url, `https://${req.headers.host}`),
  httpPath: '/mcp',
  req,
  res,
  options: { serverless: true }, // stateless — no session affinity required
})
```

`serverless: true` is the one to reach for on Lambda, Cloudflare Workers, or any platform where you can't pin a client to a warm instance. Stateful sessions assume the same process handles every request in a session; serverless platforms don't guarantee that, so stateless mode drops the per-session server and treats each request independently. Pick stateful for long-lived Node processes where you want session continuity; pick `serverless: true` the moment your runtime is ephemeral.

## Elicitation: asking the user a question mid-tool

This is the feature I wish were on the front page. **`server.elicitation.sendRequest()` lets a tool pause mid-execution and ask the calling user for input** — not at the start, not via a separate approval endpoint, but *during* the tool's own execution, over the MCP connection:

```typescript
execute: async ({ context }, { elicitation }) => {
  const result = await elicitation.sendRequest({
    message: 'This will delete 3,200 records. Confirm the target environment:',
    requestedSchema: {
      type: 'object',
      properties: { environment: { type: 'string', enum: ['staging', 'production'] } },
      required: ['environment'],
    },
  })
  if (result.action !== 'accept') return { cancelled: true }
  // proceed with result.content.environment
}
```

Client-side, the consumer handles it with `mcp.elicitation.onRequest(serverName, handler)`. This is a real human-in-the-loop primitive over a standard protocol — the tool requests structured input, the client surfaces it to the user, the answer flows back, and the tool continues. I use it for destructive operations and for disambiguation ("you said 'the report' — which one?") without inventing my own out-of-band confirmation channel.

## OAuth for public servers

The moment you expose an MCP server beyond your own machine, you need auth, and Mastra implements the modern MCP spec here. The server publishes **OAuth Protected Resource Metadata (RFC 9728)**, so a client that gets bounced with a `401` learns *where* to authenticate rather than just failing. Client-side, `MastraOAuthClientProvider` is a full OAuth 2.0 client — PKCE, token refresh and revocation, custom redirect handling, and a pluggable `OAuthStorage` interface (`InMemoryOAuthStorage` for dev, your own implementation for production). The shape I ship for a public agent server: `startHTTP` with `serverless: true` behind the OAuth protected-resource config, agents and workflows auto-exposed as `ask_*` and `run_*`, elicitation for anything destructive, and a real `OAuthStorage` backing token persistence. That combination gives you a public, authenticated, human-in-the-loop-capable agent server any spec-compliant MCP client can discover and call.

# RAG: chunking is the whole game

Everybody demos RAG by stuffing a PDF into a vector store and asking it a question. Then they ship it and wonder why retrieval is garbage. Retrieval quality is mostly decided before you ever embed anything — it's decided at chunking. Mastra's `MDocument` gives you nine chunking strategies, and picking the right one per document type matters more than any model choice downstream.

![Editorial data illustration of a retrieval pipeline where chunking decides the outcome: a long document ribbon sliced into overlapping segments of varying granularity, each embedded into a vector field of scattered points, a similarity query pulling the nearest cluster into a reranked shortlist that feeds a grounded answer panel, with bars comparing coarse versus fine chunk quality and one red marker on the best-grounded chunk.](/post-images/2026-06-26-mastra-advanced-patterns-in-production/rag-chunking.jpg)

```typescript
import { MDocument } from '@mastra/rag'

const doc = MDocument.fromMarkdown(markdownText)
const chunks = await doc.chunk({ strategy: 'recursive', maxSize: 512, overlap: 50 })
```

The defaults — `recursive`, `maxSize: 512`, `overlap: 50` — are a sane start, and recursive is right for most prose. But the strategy list is there for a reason. Use `markdown` or `semantic-markdown` for docs with heading structure so chunks respect sections instead of slicing mid-thought. Use `html` with header-based splitting for scraped pages. Use `token` when you need chunks sized to a model's exact tokenizer, `json` for structured payloads, `sentence` for clean sentence boundaries. There's even language-aware recursive splitting across 26 programming languages — chunk a TypeScript file and it splits on function and class boundaries, not arbitrary character counts.

The two knobs that move the needle: `maxSize` and `overlap`. Bigger chunks carry more context per hit but dilute relevance and cost more tokens. Overlap stops you cutting a fact in half at a boundary — 50 over 512 is ~10%, which I rarely go below. Tune these against your eval set, not your gut. You can also extract metadata during chunking — title, summary, keywords, or a Zod schema, all LLM-driven — but it runs an LLM per chunk, so it's slow and not free. I use it selectively on high-value corpora.
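
To make the `maxSize` / `overlap` interaction concrete, here is a character-based sliding-window chunker. It is an illustration only: `MDocument`'s `recursive` strategy splits on separators rather than fixed offsets, and `chunkWithOverlap` is a hypothetical helper, not a Mastra function.

```typescript
// Each chunk starts `maxSize - overlap` characters after the previous one,
// so consecutive chunks share exactly `overlap` characters.
function chunkWithOverlap(text: string, maxSize = 512, overlap = 50): string[] {
  if (maxSize <= 0) throw new RangeError('maxSize must be positive')
  if (overlap < 0 || overlap >= maxSize) throw new RangeError('overlap must be in [0, maxSize)')

  const stride = maxSize - overlap
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + maxSize))
    // Stop once a chunk reaches the end, so no redundant tail chunk is emitted.
    if (start + maxSize >= text.length) break
  }
  return chunks
}
```

With the defaults the stride is 462 characters, so raising `overlap` increases the chunk count (and the embedding bill) for the same document.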

## Embedding, stores, and the reranking nobody does

Embedding is a one-liner via magic strings — `'openai/text-embedding-3-small'` or `'google/text-embedding-004'` — and Mastra abstracts over vector stores with a common `MastraVector` interface (`query`, `upsert`, `createIndex`), so your retrieval code doesn't change when you swap LibSQL for PgVector or Pinecone or Chroma. The abstraction still exposes per-store configs where it counts: PgVector's `minScore` and HNSW `ef`, Pinecone's `namespace` and `sparseVector` for hybrid search, Chroma's `where` filters.

Then the part people skip: **reranking is not optional.** Vector similarity gets you candidates; reranking gets you the *right* candidates. The pattern I use is over-fetch then rerank — pull `topK: 20` from the store, rerank down to the 5 I actually feed the model:

```typescript
import { createVectorQueryTool } from '@mastra/rag'

const queryTool = createVectorQueryTool({
  indexName: 'docs',
  model: 'openai/text-embedding-3-small',
  vectorStore,
  reranker: {
    model: cohereReranker,
    options: { topK: 5, weights: { semantic: 0.4, vector: 0.4, position: 0.2 } },
  },
})
```

The score blends three signals — `semantic`, `vector`, and `position`, defaulting to `0.4 / 0.4 / 0.2`. The vector search is recall-optimized and noisy; the reranker is precision-optimized. Two stages, each doing its job. Position weighting is low for a reason — you usually don't want chunk order in the source doc to dominate relevance. There's also `createGraphRAGTool` for graph-based retrieval that builds a semantic graph over chunks and does random-walk reranking, which helps when answers require connecting facts across documents.
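
The blend can be read as a weighted sum over the three signals. A small sketch, assuming each signal is already normalized to the 0 to 1 range; the `Candidate` type and function names are hypothetical, and in a real pipeline the `semantic` value would come from the reranker model rather than being supplied by hand.

```typescript
interface Candidate {
  id: string
  semantic: number // relevance from the reranker model, 0..1
  vector: number // similarity from the vector store, 0..1
  position: number // position-derived score, 0..1
}

interface Weights {
  semantic: number
  vector: number
  position: number
}

const DEFAULT_WEIGHTS: Weights = { semantic: 0.4, vector: 0.4, position: 0.2 }

// Weighted sum of the three signals.
function blend(c: Candidate, w: Weights = DEFAULT_WEIGHTS): number {
  return w.semantic * c.semantic + w.vector * c.vector + w.position * c.position
}

// Over-fetch upstream, then keep only the best `topK` by blended score.
function rerank(candidates: Candidate[], topK = 5, w: Weights = DEFAULT_WEIGHTS): Candidate[] {
  return [...candidates].sort((a, b) => blend(b, w) - blend(a, w)).slice(0, topK)
}
```

A chunk with strong vector similarity but weak semantic relevance lands mid-pack here, which is the point of the second stage.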

# Evals: the part teams skip until they regress

This is the part teams skip entirely, and it's why their agents silently regress. Mastra's eval system is a `createScorer()` builder with a four-stage pipeline — `preprocess → analyze → generateScore → generateReason` — with **7 code-based scorers** (deterministic, no LLM) and **12 LLM-based scorers** (LLM-as-judge) out of the box.

The code-based ones are fast and free: completeness, content similarity, keyword coverage, textual difference, tone, tool-call accuracy, and trajectory accuracy. The LLM-based ones are the ones I care about for RAG: **faithfulness** (is the output grounded in the retrieved context, or fabricated?), **hallucination** (content with no source?), **context precision** and **context relevance** (was the retrieved context any good?), plus answer relevancy, bias, toxicity, prompt alignment, and noise sensitivity.

```typescript
const faithfulness = createFaithfulnessScorer({ model: 'openai/gpt-4o' })
const result = await faithfulness.run(
  createTestRun(
    { inputMessages: [{ role: 'user', content: question }] },
    { text: agentOutput },
    { context: retrievedChunks },
  ),
)
// result.score (0–1) + result.reason (why)
```

`generateReason` is the underrated stage: every score comes with a human-readable explanation of *why*, which turns a failing number into an actionable diagnosis. I run these two ways. **In CI**, code-based scorers and a small LLM-judged suite gate every deploy against a fixed eval set — if faithfulness drops on the golden questions, the build fails. That's the regression net. **Live**, I sample a fraction of real production traffic and score it asynchronously off the critical path, because CI eval sets go stale and real users ask things you never anticipated.
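
The CI side of that reduces to comparing scores against per-scorer floors. A sketch of the gate, assuming the scores have already been produced by the scorers; the `EvalResult` shape, the scorer keys, and the threshold values are all hypothetical.

```typescript
interface EvalResult {
  question: string
  scorer: string
  score: number // 0..1
  reason: string // the explanation produced alongside the score
}

// Hypothetical floors; tune them against your own golden set.
const THRESHOLDS: Record<string, number> = { faithfulness: 0.8, 'answer-relevancy': 0.7 }

// A result regresses when its scorer has a floor and the score is below it.
function findRegressions(results: EvalResult[], thresholds = THRESHOLDS): EvalResult[] {
  return results.filter((r) => {
    const floor = thresholds[r.scorer]
    return floor !== undefined && r.score < floor
  })
}

function gate(results: EvalResult[]): { pass: boolean; report: string[] } {
  const failures = findRegressions(results)
  return {
    pass: failures.length === 0,
    report: failures.map((f) => `${f.scorer} ${f.score.toFixed(2)} on "${f.question}": ${f.reason}`),
  }
}
```

In a pipeline the caller would print `report` and exit non-zero when `pass` is false; carrying `reason` into the report is what makes the failure actionable.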

The honest take: the scorers measure, they don't *fix*. There's no built-in loop where a low faithfulness score automatically re-retrieves or escalates the model — you build that feedback loop yourself on top of the scores. Mastra gives you the measurement primitives and the explanations; turning those into a self-correcting agent is still your code. That's the right division of labor — I'd rather own the policy than inherit someone's opinion of it — but go in knowing the scorer is a thermometer, not a thermostat.

# Guardrails are output processors that can block

Most "guardrail" features are a moderation API you call before you send. Mastra's are a processor pipeline that intercepts at five stages — `processInput`, `processInputStep`, `processOutputStream`, `processOutputStep`, `processOutputResult` — with built-ins for prompt-injection detection, PII, and moderation. The part I actually rely on is custom processors that can **block a tool call mid-stream**. My trading agent enforces position-size and daily-loss limits this way, so the safety rule lives in code, not in a prompt the model can rationalize around:

```typescript
class TradeSafetyProcessor implements Processor {
  async processOutputStep(params) {
    for (const call of params.toolCalls ?? []) {
      if (call.toolName === 'execute-trade') {
        if (call.args.sizeUsd > MAX_POSITION_SIZE)
          return { ...params, action: 'block', reason: 'exceeds max position size' }
        if (dailyLossCount >= DAILY_LOSS_LIMIT)
          return { ...params, action: 'block', reason: 'daily loss limit reached' }
      }
    }
    return params
  }
}
```

A safety limit enforced by a processor is deterministic. A safety limit written into the system prompt is a suggestion. For anything that moves money — or sends an email, or merges a PR — put it in a processor and let the trip-wire fire regardless of what the model decided.

# Model fallback is one line, so there is no excuse

Provider outages are not hypothetical; they are a monthly event. Mastra agents accept a `ModelWithRetries[]` array instead of a single model, and it cascades automatically:

```typescript
return [
  { id: `${row.provider}-primary`, model: primary, maxRetries: 2 },
  { id: `${row.fallbackProvider}-fallback`, model: fallback, maxRetries: 1 },
] satisfies ModelWithRetries[]
```

When the primary exhausts its retries, the call rolls to the fallback provider. In the market-intelligence API this is the difference between a degraded response and a 500 during an Anthropic blip. Combined with the model router (magic strings like `anthropic/claude-sonnet-4` resolving across 600+ models through the Vercel AI SDK), swapping providers is a config change, not a refactor. There is no good reason to ship a single-provider agent to production.
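
For intuition, the cascade can be sketched in a few lines. This is not Mastra's implementation: `ModelEntry` and `generateWithFallback` are hypothetical names, and the sketch assumes `maxRetries` means retries after one initial attempt.

```typescript
interface ModelEntry<T> {
  id: string
  call: () => Promise<T> // stands in for a model invocation
  maxRetries: number
}

async function generateWithFallback<T>(chain: ModelEntry<T>[]): Promise<{ id: string; value: T }> {
  const errors: string[] = []
  for (const entry of chain) {
    // Assumption: one initial attempt plus `maxRetries` retries per entry.
    for (let attempt = 0; attempt <= entry.maxRetries; attempt++) {
      try {
        return { id: entry.id, value: await entry.call() }
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err)
        errors.push(`${entry.id}#${attempt + 1}: ${message}`)
      }
    }
  }
  // Every entry exhausted its attempts; surface the full failure history.
  throw new Error(`all models failed: ${errors.join('; ')}`)
}
```

Returning the winning `id` alongside the value is what lets a trace show which link in the chain actually answered.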

# Multi-tenant tracing with RequestContext

If you serve more than one customer through the same agent, you need per-request isolation that flows all the way down to your traces. Mastra's `RequestContext` (renamed from `RuntimeContext` in v1 — more on that below) carries typed, scoped values into every tool and onto every observability span:

```typescript
const ctx = new RequestContext<{ apiKeyId: string }>([['apiKeyId', apiKeyId]])

// inside a tool: refuse to run without tenant isolation
execute: async (input, { requestContext }) => {
  const apiKeyId = requireApiKeyId(requestContext)
  // …every downstream span is now stamped with apiKeyId
}
```
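
The `requireApiKeyId` helper is not shown above. One plausible shape, assuming the context exposes a `get(key)` reader; `ContextReader` is a hypothetical minimal interface for illustration, not Mastra's `RequestContext` type.

```typescript
// Minimal read-only view of a request context; the real class exposes more.
interface ContextReader {
  get(key: string): unknown
}

// Fail closed: a missing or blank tenant id stops the tool before it does any work.
function requireApiKeyId(requestContext: ContextReader | undefined): string {
  const value = requestContext?.get('apiKeyId')
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error('apiKeyId missing from request context; refusing to run without tenant isolation')
  }
  return value
}
```

Throwing instead of defaulting to a shared tenant means a wiring mistake shows up as a loud failure rather than as cross-tenant data.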

In the market-intelligence API, every agent run, model generation, tool call, and memory operation lands in the trace store tagged with the calling API key. When a customer reports a bad answer, I filter traces by their key and replay the exact run. That observability is automatic — spans for `AGENT_RUN`, `MODEL_GENERATION`, `TOOL_CALL`, `WORKFLOW_STEP`, and `MEMORY_OPERATION` — with 14 exporters (Langfuse, LangSmith, Datadog, Sentry, and friends) plus a `DefaultExporter` into Mastra's own storage and a `SensitiveDataFilter` that redacts secrets from spans.

One footgun worth stating plainly: **observability is silently off if you only register an external exporter and its keys are missing.** Early on, the autonomous-CMO product only wired `LangfuseExporter`, and only when Langfuse keys existed — so preview and local deploys had zero tracing and I did not notice until I went looking for a trace that was never recorded. Always run `DefaultExporter` too.

# Observability, deeper: spans, exporters, and the environment tag

The span taxonomy is worth understanding because each type answers a different production question:

- **AGENT_RUN** is the root span for a single `generate()` / `stream()` call. It is the trace you open when a customer says "the assistant gave me garbage at 14:03," and it holds the input, the resolved system prompt, the final output, total tokens, and every child span.
- **MODEL_GENERATION** is one LLM round-trip with model id, params, tokens, latency, and cost. When you run `ModelWithRetries[]` fallback chains you get one per attempt, so a turn that fell over from the primary to a fallback shows *two* spans and you can see exactly where the first failed and what the second cost.
- **TOOL_CALL** captures one tool invocation's args, output, time, and errors. This is where I catch tools that "succeeded" but returned junk.
- **WORKFLOW_STEP** holds one step's input, output, and `retryCount`, and fires even when a step throws *before* emitting progress, which is precisely when you most need to know where it died.
- **MEMORY_OPERATION** captures recall, save, semantic search, and working/observational memory passes. These are invisible in most setups and are the spans I lean on hardest, because memory is where the weird, hard-to-reproduce behavior hides.

The reason the taxonomy matters is replay. A single AGENT_RUN with its full child tree *is* the customer's exact run — same input, same resolved prompt, same model attempts, same tool outputs, same memory reads. I drop that into a dataset and re-run it against a new prompt or model and compare side by side, instead of reproducing a bug from a screenshot. Flat logs can't do that; a typed span tree can.

On exporters, Mastra fans out — register as many sinks as you want and every span goes to all of them. The lineup is broad: Mastra Cloud, Langfuse, LangSmith, Datadog, Sentry, PostHog, Braintrust, Arize and Phoenix, Laminar, a generic OTLP exporter (plus an `OtelBridge`), a Console exporter for dev, and the DefaultExporter. Two are non-negotiable in my config and they're the ones people skip:

```typescript
const observability = new Observability({
  processors: [new SensitiveDataFilter()], // redacts secrets from spans BEFORE export
  exporters: [
    new DefaultExporter(), // always on — persists into Mastra's own storage
    ...(process.env.LANGFUSE_PUBLIC_KEY ? [new LangfuseExporter({ /* keys */ })] : []),
  ],
})
```

**DefaultExporter always runs** — it writes traces into Mastra's own storage so I can browse every run in Studio with zero external services configured. That's the fix for the silent-tracing footgun: the local store is the floor, external sinks are additive. **SensitiveDataFilter is a processor, not an afterthought** — spans capture tool inputs verbatim, and tool inputs are exactly where API keys and bearer tokens live. Skip the filter and you've built a secret-exfiltration pipeline with a dashboard on it.

## The `environment` field: stamp the deployment, not every call

Here's a small-diff, large-value upgrade. As of Mastra 1.31, the registry accepts a top-level `environment` field, and Mastra stamps that value onto **every span, every score, and every metric**. Before this, the only way to know whether a trace came from prod, staging, or a throwaway preview was to thread `tracingOptions.metadata.environment` through every call site. Miss one, and that trace becomes unattributable noise. One config field replaces all of it.

The only real decision is how to *resolve* the value, and the subtlety that bit me is that empty and whitespace-only env vars must **fall through**, not silently win:

```typescript
export function resolveEnvironment(): string {
  const explicit = process.env.APP_ENV?.trim()
  if (explicit) return explicit // 1. explicit override wins if non-blank

  const appName = process.env.PLATFORM_APP_NAME?.trim()
  if (appName) return appName === 'my-app' ? 'production' : appName // 2. platform app name

  const nodeEnv = process.env.NODE_ENV?.trim()
  if (nodeEnv) return nodeEnv // 3. runtime mode

  return 'development' // 4. fallback
}

const mastra = new Mastra({
  /* … */
  environment: resolveEnvironment(),
  observability,
})
```

Two details I pin with unit tests because they break silently: **blank values don't match** (`APP_ENV=""` must fall through, not stamp every span with an empty string — so I `.trim()` and treat empty as absent), and **source order is load-bearing** (reading `NODE_ENV` before `APP_ENV` would quietly demote my explicit override). The payoff: I filter dashboards by `environment = production` and preview-deploy noise vanishes, while preview machines that get their literal app name (`my-app-pr-42`) stay individually distinguishable. Scores and metrics carry the same tag, so my eval dashboards stop averaging staging experiments into production quality numbers.
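
Those two properties are easy to pin down if the resolver reads from an injected object instead of `process.env`. A sketch of that variant with bare assertions; `resolveEnvironmentFrom` and `expectEqual` are hypothetical names, and the resolution order mirrors the function above.

```typescript
type Env = Record<string, string | undefined>

// Same resolution order as resolveEnvironment(), with the env passed in for testability.
function resolveEnvironmentFrom(env: Env): string {
  const explicit = env.APP_ENV?.trim()
  if (explicit) return explicit
  const appName = env.PLATFORM_APP_NAME?.trim()
  if (appName) return appName === 'my-app' ? 'production' : appName
  const nodeEnv = env.NODE_ENV?.trim()
  if (nodeEnv) return nodeEnv
  return 'development'
}

function expectEqual(actual: string, expected: string): void {
  if (actual !== expected) throw new Error(`expected "${expected}", got "${actual}"`)
}

// Blank and whitespace-only values fall through instead of winning.
expectEqual(resolveEnvironmentFrom({ APP_ENV: '', NODE_ENV: 'production' }), 'production')
expectEqual(resolveEnvironmentFrom({ APP_ENV: '   ', PLATFORM_APP_NAME: 'my-app' }), 'production')

// Source order: the explicit override beats both later sources.
expectEqual(resolveEnvironmentFrom({ APP_ENV: 'staging', PLATFORM_APP_NAME: 'my-app', NODE_ENV: 'production' }), 'staging')

// Preview machines keep their literal app name.
expectEqual(resolveEnvironmentFrom({ PLATFORM_APP_NAME: 'my-app-pr-42' }), 'my-app-pr-42')

// Nothing set at all resolves to the fallback.
expectEqual(resolveEnvironmentFrom({}), 'development')
```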

# The server layer you get for free

The first time I wired up a Mastra deployment, I spent half a day building REST endpoints over my agents before I realized I didn't need to. `@mastra/server` already exposes a full HTTP API over every agent, workflow, and tool you've registered — list, generate, stream, resume, cancel — the same API Studio talks to. You get it for free the moment your `Mastra` instance is handed to a server adapter (`@mastra/hono` or `@mastra/express`). It's a genuinely good deal, and also the part of the stack I'd tell you to read the source on before you trust it in front of real traffic.

Start with the import model, because it will trip you up: **`@mastra/server` has no top-level export.** Everything is subpath imports, organized across 27 route-module groups — `AGENTS_ROUTES`, `WORKFLOWS_ROUTES`, `MCP_ROUTES`, `A2A_ROUTES`, `OBSERVABILITY_ROUTES`, and a long tail. Most do exactly what you'd guess. A handful are worth calling out because they're powerful and almost nobody mentions them:

- **`POST /workflows/:id/time-travel`** (and `-stream`) — replay a workflow run from a prior step with modified state and watch it re-derive. For debugging a multi-step agent that went off the rails three steps in, this is the difference between guessing and reproducing.
- **`POST /workflows/:id/restart-all-active-async`** — restarts every currently-active run, asynchronously. After a bad deploy that wedged a batch of in-flight runs, this is your recovery button.
- **`PUT /agents/:id/model/reorder`** — reorders an agent's model fallback list **live** over HTTP, no redeploy. When a provider starts throwing 529s, I demote it to the bottom of the chain in one request and promote it back when it recovers.
- **`GET /.well-known/agent.json`** — the A2A discovery card, published automatically so other agent systems can discover and call yours.

## The one default you need to know: stream redaction

Here's the detail I want burned into your memory: **stream redaction is on by default.** When an agent streams, the raw chunks contain a `metadata.request` field — your system prompt, your full tool definitions, and depending on provider, request-level credentials. Mastra strips it: `redactStreamChunk()` runs on every chunk before it leaves the server, handling both v1 and v2 stream formats, and it's wired to `streamOptions.redact` which **defaults to `true`**. I'm glad it's default-on — that's the right call. But know that **if you set `redact: false`, you are publishing your system prompts and tool schemas to every streaming client.** Treat it the way you'd treat `NODE_TLS_REJECT_UNAUTHORIZED=0` — a thing that exists for a narrow reason and leaks badly if it escapes into prod.

# Auth, RBAC, and the enterprise license line

Auth is where I see people make architecture decisions on incomplete information, so I'll be blunt about where the line is. Mastra ships a real, usable JWT auth layer in open source. It also ships RBAC — but RBAC is gated behind a commercial Enterprise Edition license. If you architect around role-based permissions assuming they're free, you'll hit that wall late, and late is expensive.

`@mastra/auth` gives you `MastraJwtAuth`, a complete JWT provider. Under the hood there are two verification paths, and which one you use determines your whole auth topology: **`verifyHmac(token, secret)`** is symmetric shared-secret verification — the right choice when your own service mints the tokens — and **`verifyJwks(token, jwksUri)`** is asymmetric verification against a JWKS endpoint, which is how you integrate with an external OAuth/OIDC provider (Auth0, Cognito, Okta, a corporate IdP). If you're doing SSO, the JWKS path is yours.

```typescript
import { MastraJwtAuth } from '@mastra/auth'

const auth = new MastraJwtAuth({
  name: 'jwt',
  secret: process.env.JWT_AUTH_SECRET,
  mapUser: (payload) => ({ id: payload.sub, email: payload.email, name: payload.name }),
})
```

The **default claim mapping** is sensible: `id` from `sub` (falling back to `id`), `email` and `name` from same-named claims, and the avatar from `avatarUrl`, `avatar_url`, **or `picture`** — so standard OIDC `picture` claims just work without a custom mapper. The system is built on pluggable interfaces you implement yourself: `IUserProvider`, `ISSOProvider`, `ICredentialsProvider`, `ISessionProvider`, and `IRBACProvider`. The first four are usable in open source — bring your own user store, SSO, and session strategy.

Now the part I wish someone had told me directly: **`IRBACProvider` and the permission-enforcement machinery require a commercial Enterprise license.** The mechanism is nice when it's on — permissions follow a `{resource}:{action}` convention *derived automatically* from the path segment and HTTP method (`GET /agents/:id` → `agents:read`, `POST /agents/:id/generate` → `agents:execute`, `DELETE /workflows/:id/runs/:runId` → `workflows:delete`). But none of it engages without a `MASTRA_EE_LICENSE` env var. No license, no permission checks — `IRBACProvider` simply isn't part of the enforcement path. Authentication (who you are) is free; authorization at the role level (what you're allowed to do) is the EE line. This isn't a complaint — gated enterprise features are a normal way for an open-source framework to fund itself. It's a planning fact. If your product needs per-role permissions, budget for the license now or plan to enforce authorization in your own application layer. What you must not do is design your multi-tenant permission model assuming `requiresPermission` will just work in the OSS build, ship it, and discover at the eleventh hour that the checks were never running.
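
If you go the application-layer route, the same `{resource}:{action}` convention is easy to reproduce. A sketch under assumptions: only the `GET`, `POST .../generate`, and `DELETE` mappings come from the examples above, the `PUT` and `PATCH` rows are guesses, and `derivePermission` / `isAllowed` are hypothetical helpers rather than Mastra exports.

```typescript
const ACTION_BY_METHOD: Record<string, string> = {
  GET: 'read',
  POST: 'execute', // assumption: matches the generate example; other POST routes may differ
  PUT: 'write', // assumption
  PATCH: 'write', // assumption
  DELETE: 'delete',
}

// Resource is the first path segment; action comes from the HTTP method.
function derivePermission(method: string, path: string): string | null {
  const resource = path.split('/').filter(Boolean)[0]
  const action = ACTION_BY_METHOD[method.toUpperCase()]
  return resource && action ? `${resource}:${action}` : null
}

// Deny by default: an underivable permission is never allowed.
function isAllowed(granted: ReadonlySet<string>, method: string, path: string): boolean {
  const needed = derivePermission(method, path)
  return needed !== null && granted.has(needed)
}
```

Wired in as middleware ahead of the Mastra routes, a check like this runs regardless of whether a license is present.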

# The bugs that taught me the most

Glossy framework posts skip the bugs. The bugs are where the real knowledge is. Here are four that cost me real time, in rough order of how much they taught me.

![Editorial cut-paper collage of four silent production failures hiding under a calm running surface: a schema cog with subtly malformed teeth rejected at a strict gate, a temperature dial nudged to a default that trips a hard stop, a dropped flag falling unnoticed through a crack, and a migration gear grinding in an endless boot loop — one red marker singles out the one bug being patched by hand.](/post-images/2026-06-26-mastra-advanced-patterns-in-production/silent-bugs.jpg)

## The leak that made me patch compiled `node_modules`

This is the one that most changed how I think about Mastra. Mastra converts your Zod tool schemas to JSON Schema for the model. Under OpenAI's **strict function-calling** mode, every sub-schema inside an `anyOf`/`oneOf`/`allOf` must carry a `type`. Mastra's converter does not guarantee that: a `z.any()` becomes `{}` in JSON Schema, and the post-processor wraps it into `anyOf: [{}, { type: "null" }]`. That bare `{}` has no `type`, and OpenAI rejects the entire request.

You cannot fix this from your own code, because the broken conversion happens deep inside `@mastra/schema-compat` and the `@mastra/core` OpenAI provider. So the autonomous-CMO product ships a postinstall script that **patches the compiled dist files in `node_modules`** — six patches across `@mastra/schema-compat` and two hashed `@mastra/core` chunks. It injects a recursive normalizer that, for every schema node, forces a `type` onto typeless `anyOf`/`oneOf`/`allOf` members, sets `additionalProperties: false` on every object, and strips `patternProperties` (also unsupported in strict mode). One of the six is a genuine upstream bug fix: a suspended tool's `resumeData` is typed `z.any()`, producing an untyped `{}`, so the patch rewrites it to `z.record(z.string(), z.any())`.

I am not proud of patching a framework's compiled output on every install. But it is the honest state of things: **Mastra's Zod-to-JSON-Schema layer is not safe for OpenAI strict mode out of the box**, and the hashed chunk filenames mean the patch is fragile and has to be re-verified on every upgrade. If you run agents on OpenAI with non-trivial tool schemas, test this path before you ship — the failure is a hard 400, not a degraded answer.
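
To make the shape of that normalizer concrete, here is a minimal sketch of the three rules, written as a standalone function rather than as the real patch. The name `normalizeForStrictMode`, the `fallbackType` parameter, and the choice of `"string"` as the forced type are all assumptions for illustration; pick whatever forced type is safe for your own tools.

```typescript
// Sketch of a recursive strict-mode normalizer. Illustrative only: this is not
// the patch that ships in the product, and "string" as the forced type is an assumption.
type JsonSchema = { [key: string]: unknown };

const COMBINATORS = ["anyOf", "oneOf", "allOf"] as const;

function isSchema(value: unknown): value is JsonSchema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

export function normalizeForStrictMode(
  schema: JsonSchema,
  fallbackType = "string",
): JsonSchema {
  const out: JsonSchema = { ...schema };

  // Rule 3: patternProperties is stripped outright.
  delete out["patternProperties"];

  // Rule 1: every anyOf/oneOf/allOf member must carry a type.
  for (const key of COMBINATORS) {
    const members = out[key];
    if (!Array.isArray(members)) continue;
    out[key] = members.map((member: unknown) => {
      const normalized = normalizeForStrictMode(isSchema(member) ? member : {}, fallbackType);
      const describesItself =
        "type" in normalized ||
        "$ref" in normalized ||
        COMBINATORS.some((c) => c in normalized);
      return describesItself ? normalized : { ...normalized, type: fallbackType };
    });
  }

  // Recurse into object properties and array items.
  const properties = out["properties"];
  if (isSchema(properties)) {
    const next: JsonSchema = {};
    for (const [name, sub] of Object.entries(properties)) {
      next[name] = isSchema(sub) ? normalizeForStrictMode(sub, fallbackType) : sub;
    }
    out["properties"] = next;
  }
  const items = out["items"];
  if (isSchema(items)) out["items"] = normalizeForStrictMode(items, fallbackType);

  // Rule 2: every object schema is closed.
  if (out["type"] === "object") out["additionalProperties"] = false;

  return out;
}
```

Fed the `anyOf: [{}, { type: "null" }]` case from above, the bare `{}` comes back as `{ type: "string" }` and the enclosing object gains `additionalProperties: false`. A function like this is also a cheap unit-test target: run every tool schema through it in CI and diff the result against what the framework emits.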

## `temperature: 0.7` that silently breaks newer models

During an outage I flipped a model env var to route around a provider blip. Every request started returning HTTP 400: `temperature is deprecated for this model`. I had not set a temperature anywhere — **Mastra defaults `temperature: 0.7`** when an agent definition does not override it, and the newer model I had just failed over to rejects any sampling param at all.

The fix is a small AI-SDK middleware (`specificationVersion: 'v3'`, a `transformParams` hook) that strips `temperature`/`topP`/`topK` for the models that hate them, wired into the model registry. But the lesson is the trap: a framework default that is invisible in your code, combined with a fast-moving model landscape where "send no sampling params" is increasingly the rule, means **your failover path can carry a default that the failover target rejects**. The thing that is supposed to save you during an incident becomes the thing that 400s during the incident. Set sampling params explicitly, or strip them explicitly — never inherit the framework default blind.
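
A minimal sketch of that middleware's shape follows. The `specificationVersion` tag and the `transformParams` hook come from the description above; the local `CallParams` type is a stand-in for the AI SDK's own call-options type so the snippet runs by itself, and the names `withoutSamplingParams` and `stripSamplingParams` are mine.

```typescript
// Local stand-in for the AI SDK call-options type (assumption: only the
// sampling fields matter for this sketch).
type CallParams = {
  temperature?: number;
  topP?: number;
  topK?: number;
  [key: string]: unknown;
};

/** Return a copy of the call params with every sampling knob removed. */
export function withoutSamplingParams(params: CallParams): CallParams {
  const next: CallParams = { ...params };
  delete next.temperature;
  delete next.topP;
  delete next.topK;
  return next;
}

// Middleware object: a version tag plus a transformParams hook.
// In a real registry this would be attached with the AI SDK's wrapLanguageModel,
// and only to the models that reject sampling params.
export const stripSamplingParams = {
  specificationVersion: "v3" as const,
  transformParams: async ({ params }: { params: CallParams }): Promise<CallParams> =>
    withoutSamplingParams(params),
};
```

Keeping the stripping logic in a pure function means the failover path can be tested without a live model: assert that the params leaving the registry for a "no sampling" model contain no `temperature` key.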

One related gotcha in the same family: the cost/runaway guard I built as a processor **silently no-ops unless you populate Mastra's reserved scope keys** (`MASTRA_RESOURCE_ID_KEY` / `MASTRA_THREAD_ID_KEY`) on the request context. Without them, the processor's scope filter resolves to `undefined` and the guard just... skips. A safety control that quietly does nothing is worse than no control, because you think you are covered. I now have a test that asserts those keys are present.
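
The assertion I mean looks roughly like the sketch below: a fail-closed check that throws when the scope keys are absent instead of letting the guard skip. `requireScope` and `ContextReader` are hypothetical names, and the reserved keys are passed in as arguments so the snippet does not guess where Mastra exports them or what their values are.

```typescript
// Hypothetical fail-closed scope check. Anything with a get(key) method works,
// including a plain Map in tests.
interface ContextReader {
  get(key: string): unknown;
}

export function requireScope(
  ctx: ContextReader,
  keys: { resource: string; thread: string },
): { resourceId: string; threadId: string } {
  const resourceId = ctx.get(keys.resource);
  const threadId = ctx.get(keys.thread);
  if (typeof resourceId !== "string" || resourceId === "") {
    throw new Error(`Guard misconfigured: "${keys.resource}" is missing from the request context`);
  }
  if (typeof threadId !== "string" || threadId === "") {
    throw new Error(`Guard misconfigured: "${keys.thread}" is missing from the request context`);
  }
  return { resourceId, threadId };
}
```

Calling this at the top of the processor turns a silent skip into a loud failure, which is the behavior you want from a cost control.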

## The `Tool.background` silent drop (1.32.1)

I declared a couple of long-running tools — a web extractor and a paginated X search — as background tasks so they would not block the response stream. They blocked anyway. SSE clients stalled 20 to 60 seconds before the next token, and **no error fired**. No exception, no Sentry breadcrumb, nothing.

The cause: `createTool({...})` accepts `background` in its **type signature but silently drops it at runtime**. The field never lands on the `Tool` instance, so the dispatch loop reads `tool.background`, gets `undefined`, and falls back to foreground execution. A typed option that compiles cleanly and then does nothing is the worst kind of bug, because the types tell you it is fine.

The workaround that does persist: declare background execution at the **agent** level, not the tool level.

```typescript
const agent = new Agent({
  // …
  backgroundTasks: { tools: ['web_extract', 'x_search'] }, // this one sticks
})
```

I also added a regression tripwire — a test that probes the private `#backgroundTasks` field — so I find out the moment Mastra fixes the tool-level path and I can delete the workaround. When you ship around a framework bug, leave yourself a sensor that tells you when the workaround is obsolete.

## The v281 crash loop: never call `init()` at runtime

A deploy started crash-looping with `Connection terminated due to connection timeout` inside `@mastra/pg`'s `init`. The cause was the `PostgresStore` constructor **eagerly running `CREATE TABLE` migrations across every composite storage domain on boot** — datasets, experiments, scorers, evals, traces — which falls over under connection pressure exactly when you are scaling up and pool contention is highest.

The fix is a discipline, not a patch. Construct the runtime store with migrations disabled, and run them once from a dedicated script:

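```typescript
import { PostgresStore } from '@mastra/pg'

// runtime: never migrates (id added for parity with the migrator below)
const store = new PostgresStore({ id: 'runtime', connectionString: url, disableInit: true })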

// migrate.ts: the ONLY place init() runs
const migrator = new PostgresStore({ id: 'migrate', connectionString: url })
await migrator.init()
```

The rule I now apply everywhere: **storage construction and storage migration are different lifecycle events.** Boot should never run DDL. If your framework makes that the default, override it.

# The whole 1.x line: a version-history field guide

I've run Mastra continuously from `v0.10`, through the `1.0` stable cut, up past `1.31`. My pins have walked `v0.10 → v1.5 → v1.21 → v1.24 → v1.25 → v1.31+`. That means I've eaten every breaking change in the 1.x line in production, usually on a deploy I thought was a routine minor bump. This is the structured companion to "Mastra moves fast": the actual renames, the actual signature changes, and the actual order in which they bite you.

First the floor: **Mastra 1.x requires Node.js 22.13.0+.** If you're on older Node, nothing else here matters until you fix that — the install won't even resolve cleanly. Get the runtime sorted first, then chase the API changes.

![Editorial timeline of a fast-moving version line: a long horizontal spine of release ticks accelerating left to right, several marked as breaking-change notches with rename arrows redirecting old call sites to new ones, a runtime-floor threshold gate near the start, and a descending bar tally of custom code deleted as primitives absorb it — one red notch flags the hardest manual migration.](/post-images/2026-06-26-mastra-advanced-patterns-in-production/version-timeline.jpg)

## The renames and signature changes, in one table

These are the changes that actually broke my builds. Most have a codemod; the one that doesn't — the tool/step `execute` signature — is the one that costs you the most hours.

| Change | Before | After | Codemod | Pain |
|---|---|---|---|---|
| Tool/step `execute` signature | `async ({ asset }) => …` | `async (inputData, ctx) => { const { asset } = inputData }` | ❌ Largely manual | 🔴 ~30 tools in one build |
| RuntimeContext → RequestContext | `RuntimeContext` | `RequestContext` from `@mastra/core/di` | ✅ `v1/runtime-context` | 🟡 Medium |
| Context key config | `runtimeContextKeys` | `requestContextKeys` | ✅ `v1/runtime-context` | 🟡 Medium |
| Workflow registry getter | `getWorkflows()` | `listWorkflows()` | ✅ `v1/mastra-plural-apis` | 🟢 Low |
| Run creation | `createRunAsync()` | `createRun()` | ✅ `v1/workflow-create-run-async` | 🟢 Low |
| Step retry counter | `runCount` | `retryCount` | ✅ (rename rules) | 🟢 Low |
| `getInitData()` return type | `any` | `unknown` | ❌ type-only | 🟡 Medium |

The big one is the **`execute` signature**. In `v0.10` your tools and steps destructured a single argument; as of `v1.5` the contract is `execute: async (inputData, ctx) => { … }` — the input payload and the framework context are now two separate positional arguments. There's no clean codemod, because the rewrite depends on what each tool was pulling out of that first argument. In the reference multi-agent build that was ~30 tools rewritten by hand. It's mechanical, but the kind of mechanical that produces a typo on tool #24.
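
One way to stage the hand rewrite is a temporary shim, sketched below with local types. Everything named here (`adaptLegacyExecute`, `ToolContext`, the `quote` example) is hypothetical, and it assumes the old body only read input fields from its single argument, as in the `({ asset })` row of the table. A tool that pulled framework context out of that argument still needs a real rewrite.

```typescript
// Illustrative types only; the framework's real context type is richer.
type ToolContext = Record<string, unknown>;
type LegacyExecute<I, O> = (args: I) => Promise<O>;
type V1Execute<I, O> = (inputData: I, ctx: ToolContext) => Promise<O>;

/** Temporary shim: forwards the input payload and ignores ctx. */
export function adaptLegacyExecute<I, O>(legacy: LegacyExecute<I, O>): V1Execute<I, O> {
  return (inputData, _ctx) => legacy(inputData);
}

// Before: one destructured argument.
const legacyQuote: LegacyExecute<{ asset: string }, string> = async ({ asset }) =>
  `quote:${asset}`;

// After: input payload and framework context as two positional arguments.
const quote: V1Execute<{ asset: string }, string> = async (inputData, _ctx) => {
  const { asset } = inputData;
  return `quote:${asset}`;
};

// During the migration both forms can sit behind the same two-argument contract.
export const executors: V1Execute<{ asset: string }, string>[] = [
  adaptLegacyExecute(legacyQuote),
  quote,
];
```

The shim buys you a green build while you convert tools one at a time; delete it once the last legacy body is gone so it does not become permanent.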

Everything else has a codemod. Run the lot in one shot, or stage it rule by rule:

```bash
npx @mastra/codemod@latest v1                          # everything
npx @mastra/codemod@latest v1/runtime-context          # RuntimeContext → RequestContext
npx @mastra/codemod@latest v1/mastra-plural-apis        # getWorkflows → listWorkflows
npx @mastra/codemod@latest v1/workflow-create-run-async # createRunAsync → createRun
```

The `RuntimeContext → RequestContext` rename fans out widest, because the context object threads through agents, tools, workflows, and processors; the codemod catches the imports and obvious call sites. `getInitData()` going from `any` to `unknown` won't break at runtime — it lights up your type-checker, which is the good kind of breakage. And remember **AgentNetwork was deprecated at v1.8.0 in favor of supervisors** (covered above): there's no codemod for that architectural shift, so anyone whose system is *built around* `.network()` is facing a second migration on top of the v1 renames. Move to supervisors before you're forced to.

## The ~1,100 lines I got to delete — the upside of the treadmill

The treadmill is real, but here's why I keep climbing it: every version that broke something also shipped a primitive that let me delete code I never wanted to own. The market-intelligence build tallied roughly **1,100 lines of custom code** that Mastra primitives replaced outright. The four entries below account for 967 of them:

| What I deleted | LOC | What replaced it |
|---|---|---|
| Hand-rolled file memory store (JSON on disk) | 435 | Observational + working memory + semantic recall |
| Custom multi-agent router (`Promise.all()` + LLM routing) | 304 | Supervisor agent with `agents: {}` delegation |
| Custom in-memory trace spans | 102 | `@mastra/observability` auto-tracing, 14 exporters |
| Lazy-loaded eval glue (3 scorers) | 126 | 17+ built-in scorers + live sampling + datasets |

Every one of those lines was code I'd have to test, debug, and carry forever. Trading it for a framework primitive is the deal the treadmill is actually offering: you pay in occasional migrations, you get back the entire surface area you'd otherwise maintain alone. I'll take that trade every release.

## Installed-but-bypassed, and the dead-deps churn

The version history also teaches that *installed* and *used* are different numbers. One audit found **15 Mastra packages installed, 11 actually used** — four were dead weight: a voice package (we called the provider's HTTP API with raw `fetch`), a local-embeddings package (we embed with OpenAI in that service), an agent-filesystem package (no imports anywhere), and the JS client (the web app talks to our own API). This matters on the treadmill because every installed `@mastra/*` package is another version pin to bump, another changelog to read, another potential breaking change to chase across a package you aren't even using. My rule now: if a package isn't imported, it gets removed in the next cleanup pass. Keep the installed set equal to the used set and your upgrade days get materially shorter.
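
That audit is simple enough to script. The helper below is a hypothetical sketch, not a tool Mastra ships: it takes the `dependencies` map from `package.json` plus the text of your source files and returns the `@mastra/*` packages that nothing imports. The regex covers static imports, side-effect imports, dynamic `import()`, and `require()`; anything loaded by a computed string will be missed.

```typescript
// Hypothetical audit helper: which installed @mastra/* packages are never imported?
const IMPORT_PATTERN =
  /(?:from\s+|import\s+|import\s*\(\s*|require\s*\(\s*)["'](@mastra\/[^"'/]+)/g;

export function findUnusedMastraPackages(
  dependencies: Record<string, string>,
  sources: string[],
): string[] {
  const used = new Set<string>();
  for (const source of sources) {
    for (const match of source.matchAll(IMPORT_PATTERN)) {
      // Subpath imports such as "@mastra/core/agent" count as "@mastra/core".
      const pkg = match[1];
      if (pkg) used.add(pkg);
    }
  }
  return Object.keys(dependencies)
    .filter((name) => name.startsWith("@mastra/") && !used.has(name))
    .sort();
}
```

Wire it into CI with a file walker of your choice and fail the build when the returned list is non-empty; that keeps the installed set equal to the used set without relying on a periodic cleanup pass.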

One more easy mistake in the same spirit: **register every agent in the `Mastra` instance.** In an early audit of the market-intelligence API, only 1 of 16 agents was actually registered. The unregistered ones still ran — but they lost Studio visibility, auto-routing, and consistent tracing. If an agent is not in the registry, it does not exist to the parts of Mastra you most want during an incident.

# What shipped while I was writing this: v1.33 → v1.46

I wrote everything above against pins that topped out around `v1.31/v1.32`. By the time this was ready to publish, Mastra had cut **fourteen more minor releases** (`v1.33` through `v1.46`, plus alphas into `v1.47`) in roughly forty-five days. That is not hyperbole about a fast framework; it is a release every three or four days, and it is the single best proof of the treadmill thesis I have been making. Rather than pretend the post stops at `v1.32`, here is the operator's read on what actually landed, what it changes about the architecture, and which of my own claims above it dents.

The headline is a repositioning, not a feature: Mastra now calls itself an **agent meta-harness**. It was always "model-agnostic"; the bet now is "harness-agnostic" too. Three throughlines carry that bet.

![Architectural schematic of a harness that runs other harnesses: a central outer harness shell wrapping several interchangeable inner coding-agent engines slotted into a common socket, each swappable through the same uniform interface ring exposing generate, stream, resume, trace, and cost ports, with a signals side-rail and an event pub/sub bus threading along the bottom to feed every slot — one red marker on the uniform swap socket.](/post-images/2026-06-26-mastra-advanced-patterns-in-production/meta-harness.jpg)

## 1. You can now run other people's coding agents inside Mastra

As of `v1.38`, Mastra can run **Claude Code, Cursor, Codex, and the OpenAI Agents SDK** as first-class subagents. You install the harness package and wrap it:

```bash
npm install @mastra/claude @anthropic-ai/claude-agent-sdk   # requires @mastra/core@1.38.0+
```

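```ts
import { ClaudeSDKAgent } from "@mastra/claude"; // import path assumed from the install line above

export const claudeSDKAgent = new ClaudeSDKAgent({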
  id: "claude-sdk-agent",
  name: "Claude SDK Agent",
  description: "Anthropic's Claude Agent SDK with file editing, shell, and codebase reasoning.",
  sdkOptions: { model: "claude-opus-4-8", cwd: process.cwd() },
});
```

The reason this matters past the novelty: those subagents **inherit Mastra's entire agent surface**. Same `.generate()` / `.stream()` / `.resumeGenerate()` / `.resumeStream()` calls, same per-run `requestContext`, `structuredOutput`, and `abortSignal`, same composition into workflow steps and supervisor delegation, same Studio traces with token usage and model cost. You can hand off from a Claude Code subagent to a deterministic workflow and back, write evals against it, and swap Cursor for Codex without touching the calling code. The harness you bring keeps its own file handling and permission controls; Mastra wraps a uniform skin around it. This is the part of the post I'd most flag as *changed since I drafted it*: when I wrote "Mastra is the harness" in the closing section, that was true. It is now more accurate to say Mastra is becoming a harness *of* harnesses.

## 2. Signals: a new cross-cutting primitive I did not have when I started

Between `v1.39` and `v1.42` Mastra grew a whole **Signals** framework, and it is the most consequential *architectural* addition in this window — more than any single integration. The mental model: signals are prompt-cacheable, memory-backed context that you wire onto an agent declaratively, delivered via webhooks, polling, or subscriptions through a `SignalProvider`.

The headline `SignalProvider` is **Task Lists** (`v1.42`), and it is exactly the planning primitive I used to hand-roll. You add it to an agent's `signals` array and it hands the model a checkable plan with four tools — `task_write`, `task_update`, `task_complete`, `task_check` — and states of `pending` / `in_progress` / `completed`:

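```ts
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
// TaskSignalProvider and webResearchTool also need imports; their paths are not given here.

export const chefAgent = new Agent({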
  id: "chef-michael",
  name: "Chef Michael",
  instructions: "You are a research-driven recipe assistant…",
  model: "anthropic/claude-opus-4-8",
  tools: { webResearchTool },
  memory: new Memory(),
  signals: [new TaskSignalProvider()],   // requires @mastra/core@1.42.0+
});
```

Because the updates ride on **prompt-cacheable state signals** persisted by the agent's `Memory`, they compose with all three memory modes I covered earlier — message history, working memory, and Observational Memory — instead of being a bolted-on side channel. You read `task_` chunks off the stream to render progress as the agent checks items off. If you've been building agent to-do tracking by stuffing JSON into working memory by hand (I was), this deletes that code the same way the primitives in my "1,100 lines I got to delete" table did.

The same framework also added **notification signals with a persisted, thread-scoped inbox** (`v1.39`, still experimental) — durable notifications that survive across storage backends, which is the kind of thing I'd otherwise glue together with a side table.

## 3. A built-in event system — which dents my "no chat gateway" claim

This is the update that most directly revises something I wrote. In the deploy section below I say Mastra "is a runtime for agents, not a scheduler or a chat gateway… it has no cron and no Telegram," and that you bolt external pieces onto it over MCP. The scheduler half still stands. But as of `v1.34` Mastra ships a **built-in pub/sub event system**, and the gap I described is narrower than it was.

```bash
npm install @mastra/redis-streams      # requires @mastra/core@1.34.0+
```

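```ts
import { Mastra } from "@mastra/core";
import { RedisStreamsPubSub } from "@mastra/redis-streams"; // import path assumed from the install line above

export const mastra = new Mastra({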
  pubsub: new RedisStreamsPubSub({ url: process.env.REDIS_URL! }),
});

await mastra.pubsub.publish("acme.user.signup", {
  type: "signup.created",
  runId: signupFormData.id,
  data: { /* … */ },
});
```

You can publish and subscribe to events (workflow step completion, workflow pause, and workflow completion), and **clients that disconnect replay what they missed**, which is the durability property that makes this usable for real delivery rather than fire-and-forget. It is bidirectional: external systems can publish *into* Mastra, and Mastra agents and workflows can subscribe to *external* events. Transports are `UnixSocketPubSub` (single-machine default, one process elected broker), `@mastra/redis-streams`, or Google Cloud Pub/Sub. It is not a Telegram bridge and it is not a cron, so the architecture I describe, with Mastra as brain and an external body over MCP, is still a sane default. But "Mastra has no event plumbing of its own" is no longer true, and if I were starting the autonomous-CMO build today I'd evaluate this before reaching for an external bus.

## 4. The multi-tenant and enterprise surface got serious

If you run Mastra for more than one customer, this window is the one that matters most. The throughline is **fine-grained authorization (FGA) everywhere** — route-policy coverage and resolver hooks (`v1.35`), any-of permission checks and safer context keys (`v1.36`), WorkOS integration — layered on top of the JWT-is-free / RBAC-is-Enterprise split I describe in the auth section. Three additions stood out to me as an operator:

- **Trusted "system actor" execution** (`v1.42`) solves the problem I've cursed at most: a cron job or queue worker has no JWT, so it can't satisfy tenant-scoped authz. You can now run background work as a trusted `actor` across `workflow.execute()`, `tool.execute()`, and `agent.generate()` while *preserving* the per-tenant authorization checks. That's the missing piece for any scheduled multi-tenant workload.
- **Per-request Workspace sandboxes** (`v1.41`) give you real isolation between tenants, with a `sandboxCacheKey` so a background process can keep its continuity across requests.
- **A multi-session Harness** (`v1.46`) — and this is a **hard breaking change** if you're on the Harness API. The singleton `harness.session` is gone; you now `await harness.createSession()` (get-or-create by `resourceId`) and operate on the returned `Session`. Run control, event subscription (`session.subscribe()`), thread lifecycle, model/mode switching, permissions, OM accessors, and state all moved off `Harness` and onto `Session`. The `getState()` / `setState()` compatibility wrappers are removed. There's no codemod for this; it's an architectural move toward session-scoped isolation, and if your code drives the Harness directly, budget for it the way you'd budget for the `.network()` → supervisor migration.

A related `v1.42` rename in the same Harness family: `respondToQuestion(...)` became `respondToToolSuspension(...)`, the `ask_question` event became `tool_suspended` (read `event.suspendPayload`), and the plan-approval APIs collapsed into the same suspension path. Interactive tools are now agent-agnostic via native tool suspension — `ask_user` and `submit_plan` are the canonical examples.

## 5. The boring-but-important: storage, providers, streaming

The rest is the steady drip that makes the treadmill what it is. New first-party storage adapters: **MySQL** (`@mastra/mysql`, `v1.38`), **Google Cloud Spanner** (`@mastra/spanner@1.0.0`, `v1.37`), and **Aurora DSQL** (`v1.33`) — which is worth knowing if my earlier "pick storage with the deploy target in mind" rule sent you hunting. New providers and integrations: the **OpenAI Agents SDK** and **VoyageAI** embeddings/reranking (`v1.42`), **ACP coding agents** as tools or subagents (`@mastra/acp`, `v1.34`), realtime **xAI voice** (`v1.34`), and Bright Data tools (`v1.33`).

On the DX side, the one I'd actually use: **`untilIdle`** (`v1.41`) unifies "stream until the agent goes idle" across core, server, and client, and **deprecates the dedicated `*UntilIdle` methods** — a small rename now, a removed method later, so adopt the option form when you touch that code. Observability kept filling in too: `MODEL_INFERENCE` spans, eval-score unification, and OTEL logs (`v1.33`), a lightweight trace-listing API (`v1.34`), and end-to-end client-side tool observability (`v1.37`).

## What this window does to my conclusions

Nothing above changes the core verdict; if anything it sharpens it. The release cadence I described as "weekly" undersold it; fourteen minors in forty-five days is the treadmill at full tilt, with two genuine architectural migrations (`v1.46` Harness→Session, `v1.42` suspension rename) on top of the v1 renames. But the same forty-five days shipped a planning primitive, an event bus, a meta-harness, and a system-actor model that I would otherwise be building and maintaining myself. That is the exact trade I argued for: you pay in migrations, you get back surface area you'd own alone. The two claims I'd now write differently are the "harness" framing (Mastra is no longer just *the* harness, it's a harness that runs other harnesses) and the "no event plumbing" implication in the deploy section, which the built-in pub/sub has quietly closed.

# The deploy targets are not interchangeable

"Built-in deployers" reads like portability. It is not, quite. The deployers make real, divergent assumptions, and the one that bit me is storage: **the Vercel and Netlify deployers actively error if they detect `@mastra/libsql`** — it is incompatible with their serverless runtimes, so a stack that works locally on LibSQL will refuse to build for Vercel. The Cloudflare deployer goes further and stubs out TypeScript, `execa`, and `readable-stream`, because Workers cannot run them. Only the Mastra Cloud deployer auto-configures storage and log transport for you.

The practical consequence: **pick your storage backend with the deploy target in mind, not the other way around.** LibSQL for local dev is lovely; if production is Vercel, you are on Postgres or a hosted Turso endpoint, and you want to know that on day one, not the first time CI fails. This is also why the Postgres `disableInit` discipline from earlier matters — on serverless targets, a constructor that runs DDL on every cold start is not a one-time boot cost, it is a recurring one.
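
If you want that day-one signal without waiting for a deployer to complain, a pre-build check is a few lines. This is a hypothetical sketch of my own guardrail, not deployer code: `checkStorageForTarget` and the `DeployTarget` labels are made-up names, and the only rule encoded is the one stated above (Vercel and Netlify reject `@mastra/libsql`).

```typescript
// Hypothetical pre-build check. The target labels are illustrative, not Mastra identifiers.
type DeployTarget = "vercel" | "netlify" | "cloudflare" | "mastra-cloud" | "node";

const REJECTS_LIBSQL: ReadonlySet<DeployTarget> = new Set<DeployTarget>(["vercel", "netlify"]);

export function checkStorageForTarget(
  target: DeployTarget,
  dependencies: Record<string, string>,
): string[] {
  const problems: string[] = [];
  if (REJECTS_LIBSQL.has(target) && "@mastra/libsql" in dependencies) {
    problems.push(
      `@mastra/libsql is installed, but the ${target} deployer will refuse to build with it; ` +
        "choose a storage backend that target supports",
    );
  }
  return problems;
}
```

Run it against `package.json` in the same CI job that sets the deploy target, and exit non-zero when the returned list is non-empty.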

And know what Mastra is *not*: it is a runtime for agents, not a scheduler or a chat gateway. It has no cron and no Telegram. (One caveat I added after drafting this: as of `v1.34` it *does* ship a built-in pub/sub event system with disconnect-replay — covered in the v1.33→v1.46 section above — so the "no event plumbing" half of this is no longer true, even though the scheduler and chat-delivery gaps remain.) In the autonomous-CMO product, Mastra is the brain and we bolt on external pieces for the body — a workflow/cron engine for scheduling, a thin bridge for chat delivery, connected back to Mastra over MCP. That is a perfectly good architecture, but if you expected "agent framework" to include "runs my agent every morning and messages the user," budget for the parts Mastra deliberately leaves out.

# What everyone else is saying — the sentiment, not the spec

Everything above is one operator's view from six codebases. To pressure-test it, I went back through the public record around the `1.0` launch — the [Show HN thread](https://news.ycombinator.com/item?id=46693959), where the team (Sam, Shane, and Abhi) fielded questions in the open, plus the reviews and migration write-ups that have piled up since. The striking thing is how cleanly the outside chatter maps onto the exact seams I hit. The praise is real, the criticism is sharper than the marketing copy admits, and the two together are a better buy-or-skip signal than either alone.

**The praise is specific, and it is about DX, not magic.** The recurring word is "framework," used as a compliment by people who have built the alternative. [@esperent's mental model](https://news.ycombinator.com/item?id=46699323) is the one I'd hand a newcomer: "Vercel AI SDK = library, low level / Mastra = framework," with AI Elements as the optional UI on top — a clean three-layer picture that matches how the registry actually composes. The migration crowd is even blunter about the math: a developer who [switched from LangGraph](https://dev.to/jim_l_efc70c3a738e9f4baa7/i-switched-from-langgraph-to-mastra-for-my-typescript-agents-18-hours-vs-41-nah) logged "18 hours total" against "41 hours" of Python-to-TypeScript bridge code for the same agent, and the independent reviews land in the same place — [Developers Digest](https://www.developersdigest.tech/blog/mastra-review-setup-2026) and [MakerStack](https://makerstack.co/reviews/mastra-review/) both rate it in the 8-plus range and both single out the local playground as the thing that makes it stick. That tracks with my own experience: the studio is the feature that turns "I'll try it this weekend" into "this is in production now."

**The most useful praise is the kind that comes with a wart attached.** The comment I trust most in the whole thread is [@dataviz1000's](https://news.ycombinator.com/item?id=46699990): "I worked with Mastra for three months and it is awesome... Otherwise, Mastra is best in class for working with TypeScript" — but sandwiched in the middle, "it felt clunky working with workflows and branching logic with non-LLM agents," to the point that "after a couple weeks of frustration, I started using my own custom branching workflows." That is the same friction I documented in the workflows section, surfaced independently by someone who shipped for a quarter. The deterministic-first instinct behind it is a theme of its own: [@brap put the principle plainly](https://news.ycombinator.com/item?id=46702391) — "every step that can be solved reasonably without an LLM, should be solved without an LLM. Reliability, cost, performance" — which is exactly why I gate on required inputs and absorb tool errors instead of letting the model improvise around them.

**The criticism worth taking seriously is strategic, not cosmetic.** Two threads recur. The first is lock-in, reframed: [@mrcwinn's](https://news.ycombinator.com/item?id=46697582) "you're not locked into a model, but you likely are locked in to a platform. This DX and convenience just shifts within the stack where the lock-in occurs." He is right, and the deploy-target section above is the receipt: your storage backend and your deployer are coupled, and "built-in deployers" is not the portability it sounds like. The second is the existential one, and it came as a question nobody fully answered: since Vercel's AI SDK is "moving towards being a framework as well," [@esperent pressed](https://news.ycombinator.com/item?id=46701999) on how Mastra differentiates "if Vercel makes moves to eat your lunch... following their highly successful Next.js playbook." When the founder's reply read as boilerplate, esperent called it "corporate and wishy washy... I guess you just came here to do marketing," and the friction in that exchange is itself a data point about how young and unsettled this category is. [@orliesaurus](https://news.ycombinator.com/item?id=46701008) compressed the same worry to a single line: "the framework is great, but how are you gonna make real money?"

**And then the small, human note that says more than any benchmark.** A self-described happy user, [@avaer](https://news.ycombinator.com/item?id=46701640), offered "heartfelt advice for the Mastra devrel team": "shut up about Gatsby... linking it to an unrelated project is only going to matter to non-technical CXOs who choose technology based on names not merits." It's a throwaway line, but it captures the actual adoption dynamic better than the press: good dev tools "trickle from the bottom up in engineering organizations," and the names on the homepage — Brex's CTO [endorsed the stack on Latent Space](https://www.latent.space/p/brex), and the team cites Workday, PayPal, and Sanity shipping on it — matter less than whether the n-th engineer on your team can get an agent running in an afternoon. On that test, the sentiment is near-unanimous: they can. The open questions are all downstream — platform gravity, the Vercel overhang, and the revenue model under a framework you're about to bet a product on. None of that shows up in the API surface. All of it should be in your decision.

# When I reach for it — and when I do not

I reach for Mastra when the stack is TypeScript and the thing is a real product: multiple agents, memory that has to survive restarts, MCP in or out, guardrails on actions that matter, traces I can replay per customer. That is most of what I build, which is why it is in six of my repos.

I do **not** reach for it when the stack is Python — LangGraph and CrewAI live there and the comparison is pointless — or when the task is a single model call with no memory and no tools, where the Vercel AI SDK that Mastra sits on top of is all you need. Mastra is the harness; if you do not need a harness, do not install one.

The honest summary: this is a young, fast-moving framework with a wide surface and a real upgrade tax, run by a team (ex-Gatsby) that ships relentlessly and has the funding to keep doing it. The bugs I hit were real and several were silent — a schema layer that 400s on OpenAI strict mode, a default temperature that breaks newer models, a dropped `background` flag, a migration that crash-loops on boot. I patched compiled `node_modules` to ship one of them. None of that is comfortable, and I am not going to pretend it is.

But the patterns above — working memory, supervisors with real delegation controls, MCP both ways, processor guardrails, fallback chains, per-tenant tracing — are the ones I would otherwise have to build and maintain myself, and I have now watched Mastra ship past my hand-rolled versions of every one of them. The bugs are the cost of living on a framework that is still being written underneath you; the leverage is that you stop writing the harness by hand. For a TypeScript team building agents that have to stay up, with eyes open about the rough edges, that trade is worth making.

## Sources

- [GitHub — mastra-ai/mastra](https://github.com/mastra-ai/mastra)
- [Mastra — Documentation](https://mastra.ai/docs)
- [Mastra — Memory overview](https://mastra.ai/docs/memory/overview)
- [Mastra — Agent Networks & supervisors](https://mastra.ai/docs/agents/agent-networks)
- [Mastra — MCP overview](https://mastra.ai/docs/mcp/overview)
- [Mastra — Processors (guardrails)](https://mastra.ai/docs/agents/processors)
- [Mastra — Observability](https://mastra.ai/docs/observability/overview)
- [Mastra — Workflows overview](https://mastra.ai/docs/workflows/overview)
- [Mastra — RAG overview](https://mastra.ai/docs/rag/overview)
- [Mastra — Scorers & evals](https://mastra.ai/docs/evals/overview)
- [Mastra — Server & deployment](https://mastra.ai/docs/deployment/server)
- [Mastra — Auth](https://mastra.ai/docs/auth/overview)
- [Mastra — v1 codemods](https://mastra.ai/docs/getting-started/upgrading)
- [Model Context Protocol — specification](https://modelcontextprotocol.io)
- [LongMemEval (ICLR 2025)](https://arxiv.org/abs/2410.10813)
- [Viv Trivedy — The Anatomy of an Agent Harness (LangChain)](https://blog.langchain.com/the-anatomy-of-an-agent-harness)
- [Mastra — Claude Code, Cursor, and Codex agents (SDK subagents)](https://mastra.ai/blog/introducing-sdk-subagents)
- [Mastra — Task Lists for Agents](https://mastra.ai/blog/introducing-task-lists)
- [Mastra — Built-In Event System (pub/sub)](https://mastra.ai/blog/introducing-mastras-event-system)
- [Mastra — Signals & state signals](https://mastra.ai/docs/agents/signals)
- [Mastra — pub/sub server reference](https://mastra.ai/docs/server/pubsub)
- [Show HN: Mastra 1.0 — launch thread (Sam, Shane, Abhi + community Q&A)](https://news.ycombinator.com/item?id=46693959)
- [@dataviz1000 on HN — 'three months and it is awesome' + the non-LLM workflow clunk](https://news.ycombinator.com/item?id=46699990)
- [@mrcwinn on HN — 'not locked into a model, but locked into a platform'](https://news.ycombinator.com/item?id=46697582)
- [@esperent on HN — the Vercel AI SDK 'eat your lunch' challenge](https://news.ycombinator.com/item?id=46701999)
- [@orliesaurus on HN — 'the framework is great, but how are you gonna make real money?'](https://news.ycombinator.com/item?id=46701008)
- [@avaer on HN — 'shut up about Gatsby' (bottom-up adoption)](https://news.ycombinator.com/item?id=46701640)
- [Latent Space — Brex CTO on Mastra in their AI engineering stack](https://www.latent.space/p/brex)
- [DEV — I switched from LangGraph to Mastra: 18 hours vs 41](https://dev.to/jim_l_efc70c3a738e9f4baa7/i-switched-from-langgraph-to-mastra-for-my-typescript-agents-18-hours-vs-41-nah)
- [Developers Digest — Mastra review & setup guide (2026)](https://www.developersdigest.tech/blog/mastra-review-setup-2026)
- [MakerStack — Mastra review (8.4/10)](https://makerstack.co/reviews/mastra-review/)
