Error handling, what to surface, what to swallow
isError responses that let the LLM recover, never leak stack traces, never 500 on user-controllable input. The error-shape contract that keeps your support inbox empty.
Three error categories matter in an MCP server: input the LLM got wrong, the world's broken (DB down, third-party API rate-limited), and your bug. Each needs a different response. Get this wrong and the LLM loops, your logs fill with stack traces, or your support inbox fills with "it just doesn't work." Get it right and most "errors" become recoverable mid-conversation.
Step 1: The MCP error response shape
The MCP protocol gives you isError: true on the tool response. Use it. Don't throw; the LLM sees thrown errors as "the tool exploded" and stops trying.
// Anti-pattern: throw
async function handleGetNote(args, ctx) {
  if (!args.id) throw new Error('id required'); // LLM sees a protocol error
}

// Pattern: structured error response
async function handleGetNote(args, ctx) {
  if (!args.id) {
    return {
      isError: true,
      content: [{
        type: 'text',
        text: JSON.stringify({
          error: 'INVALID_INPUT',
          message: 'id is required',
          hint: 'Pass { id: "<uuid>" } as the tool arguments.',
        }),
      }],
    };
  }
  // ...
}
Three fields, every error response:
- error, a machine-readable code (INVALID_INPUT, NOT_FOUND, RATE_LIMITED, INTERNAL). Stable across versions.
- message, a human-ish sentence the LLM can paraphrase to the user.
- hint, what the LLM should do next. This is the field that turns "fail" into "recover".
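As a sketch, that contract can live in one shared helper so every handler returns the same shape (the names here are illustrative, not part of the MCP SDK):
// Illustrative shared error factory; codes mirror the list above plus TEMPORARY, used later in Step 4.
type ErrorCode = 'INVALID_INPUT' | 'NOT_FOUND' | 'RATE_LIMITED' | 'TEMPORARY' | 'INTERNAL';

function toolError(error: ErrorCode, message: string, hint?: string) {
  return {
    isError: true as const,
    content: [{ type: 'text' as const, text: JSON.stringify({ error, message, hint }) }],
  };
}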
Step 2: Category 1. User input the LLM got wrong
Most errors. The LLM passed a bad email format, a non-existent ID, an unsupported enum value. Recoverable in one extra turn.
function inputError(message: string, hint?: string) {
  return {
    isError: true,
    content: [{
      type: 'text',
      text: JSON.stringify({ error: 'INVALID_INPUT', message, hint }),
    }],
  };
}

// Usage:
const parsed = NoteInput.safeParse(args);
if (!parsed.success) {
  return inputError(
    'Input validation failed',
    parsed.error.issues.map((i) => `field "${i.path.join('.')}", ${i.message}`).join('; '),
  );
}
The LLM reads hint, fixes the call, retries. Done in one extra turn. No human ticket.
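If you're not following along from the earlier validation step, NoteInput above is assumed to be a zod schema along these lines (field names are illustrative; match them to your tool):
import { z } from 'zod';

// Illustrative input schema for the get_note handler; adjust fields to your tool.
const NoteInput = z.object({
  id: z.string().uuid(),
});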
Step 3: Category 2. Resource not found
The user asked about something that doesn't exist. Don't 500 it; that signals "your tool is broken" to both the LLM and the user.
const r = await db.query(`SELECT * FROM notes WHERE id = $1 AND tenant_id = $2`, [id, ctx.tenantId]);
if (r.rows.length === 0) {
  return {
    isError: true,
    content: [{ type: 'text', text: JSON.stringify({
      error: 'NOT_FOUND',
      message: `No note with id "${id}"`,
      hint: 'Did you mean to call list_notes to find existing notes?',
    }) }],
  };
}
The hint points at a sibling tool. The LLM either calls list_notes itself or asks the user "I don't see that note, want me to list yours?". Smooth recovery.
Step 4: Category 3. World is broken
DB connection refused, third-party API down, rate-limited. Do log this with full context. Do not leak the stack trace to the LLM; it will paraphrase it to the user as "the system threw a connection error at line 42 of pg-pool.ts". Bad UX.
import { logger } from '../lib/logger.js';

try {
  const r = await db.query(/* ... */);
  return success(r.rows);
} catch (err) {
  logger.error('list_notes db error', {
    tenantId: ctx.tenantId,
    err: err instanceof Error ? err.message : String(err),
    stack: err instanceof Error ? err.stack : undefined,
  });
  return {
    isError: true,
    content: [{ type: 'text', text: JSON.stringify({
      error: 'TEMPORARY',
      message: 'A backend service is temporarily unavailable. Please try again in a moment.',
      // No hint: the LLM should not retry immediately, it should tell the user.
    }) }],
  };
}
The user sees a clean "try again in a moment" message. Your logs have the full context to debug. Stack trace stays on your side of the wire.
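The success() helper isn't defined in this section; a minimal version, assuming the standard MCP text-content response shape, might look like:
// Minimal success wrapper: serialize the payload into a standard MCP text content block.
function success(data: unknown) {
  return {
    content: [{ type: 'text' as const, text: JSON.stringify(data) }],
  };
}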
Step 5: Category 4. Rate limits
Special case of "world is broken", but recoverable with a delay. Tell the LLM how long.
if (!rateLimitOk(ctx.tenantId)) {
  return {
    isError: true,
    content: [{ type: 'text', text: JSON.stringify({
      error: 'RATE_LIMITED',
      message: 'Too many requests. Limit is 60/minute per tenant.',
      hint: 'Retry after 30 seconds, or upgrade to remove the limit.',
      retryAfterSec: 30,
    }) }],
  };
}
Some clients respect retryAfterSec programmatically. Even if not, the LLM reads the hint and waits or escalates to upgrade.
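rateLimitOk isn't shown here either; a minimal in-memory fixed-window sketch (single process only; use Redis or similar if you run multiple instances) could be:
// Hypothetical fixed-window limiter: 60 calls per tenant per minute, in-process state only.
const windows = new Map<string, { start: number; count: number }>();

function rateLimitOk(tenantId: string, limit = 60, windowMs = 60_000): boolean {
  const now = Date.now();
  const w = windows.get(tenantId);
  if (!w || now - w.start >= windowMs) {
    // New window for this tenant: reset the counter.
    windows.set(tenantId, { start: now, count: 1 });
    return true;
  }
  w.count += 1;
  return w.count <= limit;
}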
Step 6: Never leak
Three things never go in an MCP error response:
- Stack traces. They leak file paths, function names, and dependency versions: useful to attackers, useless to the LLM.
- Database error strings. Postgres errors include schema and column names ('ERROR: column "secret_field" does not exist'). That's schema disclosure.
- Internal IDs. Stripe customer IDs, internal user UUIDs from other tenants. Use opaque error codes instead.
Filter at the boundary:
function safeError(err: unknown): string {
  // Only return generic categories to the LLM
  if (err instanceof Error) {
    if (err.message.includes('ECONNREFUSED')) return 'A backend service is unavailable.';
    if (err.message.includes('timeout')) return 'The request timed out.';
  }
  return 'An unexpected error occurred.';
}
The full err.message + err.stack go to your logger only.
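Putting the two together, the catch block from Step 4 can route through safeError() so only the generic string crosses the wire (a sketch reusing the names from this section, inside a tool handler):
try {
  const r = await db.query(/* ... */);
  return success(r.rows);
} catch (err) {
  // Full detail to the logger, only the generic category string to the LLM.
  logger.error('tool error', { err: err instanceof Error ? err.message : String(err) });
  return {
    isError: true,
    content: [{ type: 'text', text: JSON.stringify({ error: 'TEMPORARY', message: safeError(err) }) }],
  };
}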
A quick manual check that your package.json is wired (SDK dependency, bin, main):
cat package.json 2>/dev/null | python3 -c "import json,sys; p=json.load(sys.stdin); deps=list((p.get(\"dependencies\") or {}).keys()); print(\"sdk:\", \"@modelcontextprotocol/sdk\" in deps); print(\"bin:\", bool(p.get(\"bin\"))); print(\"main:\", bool(p.get(\"main\")))" 2>/dev/null || echo "no package.json in cwd"
Step 7: Verify
Run academy_validate_step. The validator checks that package.json is wired up; the actual error-handling shape is something you verify in tests:
it('returns INVALID_INPUT for missing field', async () => {
  const r = await handleGetNote({}, { tenantId: 't1' });
  expect(r.isError).toBe(true);
  const body = JSON.parse(r.content[0].text);
  expect(body.error).toBe('INVALID_INPUT');
  expect(body.hint).toBeDefined();
});

it('does not leak stack traces on db error', async () => {
  // Mock a db that throws
  const r = await handleGetNote(
    { id: 'x' },
    { tenantId: 't1', db: { query: () => { throw new Error('PG: schema "secrets" not found'); } } },
  );
  expect(r.isError).toBe(true);
  const body = JSON.parse(r.content[0].text);
  expect(body.message).not.toContain('schema');
  expect(body.message).not.toContain('PG:');
});
Both should pass. Add similar tests for every error path.
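One more in the same style, for the NOT_FOUND path (assuming the same ctx.db injection as the test above):
it('returns NOT_FOUND for a missing note', async () => {
  // Mock a db whose query returns no rows.
  const r = await handleGetNote(
    { id: 'does-not-exist' },
    { tenantId: 't1', db: { query: async () => ({ rows: [] }) } },
  );
  expect(r.isError).toBe(true);
  const body = JSON.parse(r.content[0].text);
  expect(body.error).toBe('NOT_FOUND');
});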
A separate trap: outputSchema and Claude Code
The MCP 2025-11-25 spec adds an outputSchema field on tool definitions to declare the shape of successful responses. As of Q2 2026, the Claude Code client has a bug (tracked as anthropics/claude-code#25081, confirm against the latest issue list before relying on it) where tools that declare outputSchema crash the client on first call.
Until the bug is fixed in your installed Claude Code version: don't add outputSchema to your tool definitions. Stick to the input-schema-only shape from 6.2. Cursor and Codex handle outputSchema fine; the issue is Claude Code-specific.
When you do enable it, also start emitting structuredContent in your responses; that's the runtime side of the same feature.
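When that day comes, a successful response carries both fields, roughly like this (a sketch; fetchNote is a placeholder for your own lookup, and the shape of note is whatever your outputSchema declares):
// Sketch: once outputSchema is declared, return the human-readable text block
// plus structuredContent, the machine-readable twin that matches the schema.
async function handleGetNote(args, ctx) {
  const note = await fetchNote(args.id, ctx); // placeholder for your lookup
  return {
    content: [{ type: 'text', text: JSON.stringify(note) }],
    structuredContent: note,
  };
}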
Common traps
- Throwing instead of returning isError: kills LLM recovery and logs as an unhandled rejection.
- Leaking stack traces in message: schema disclosure plus bad UX.
- error: "Error" with no code: useless for branching logic on the client side.
- No hint: the LLM has nothing to act on, so it just apologizes and stops.
- Returning isError: true AND a 500 HTTP status: pick one. Tool errors are app-level (isError: true, HTTP 200). HTTP 5xx is for transport-level failures (Bearer auth missing, malformed JSON-RPC).
- Catching err then re-throwing without logging: you lose all forensic info.
What good looks like
Every tool handler has a typed error return. Four error categories: INVALID_INPUT (LLM fixable), NOT_FOUND (user asked for missing thing), TEMPORARY (world is broken), RATE_LIMITED (recoverable with delay). Stack traces never reach the wire. Tests cover at least one error path per tool.
When the LLM hits an error, it either fixes the input and retries (INVALID_INPUT), or tells the user something specific and useful (NOT_FOUND, TEMPORARY, RATE_LIMITED). Never "the tool exploded, sorry."