Phase 6 · Build Your Own MCP Server · 6 steps

Testing MCP tools, vitest + in-memory transport

How to write tests for MCP tools without spawning subprocesses. Unit tests for handlers, integration tests via in-memory transport, smoke tests for stdio mode.



Tests for MCP servers fall into three layers. Layer 1 is unit tests on your handlers: fast, easy, and the bulk of your suite. Layer 2 is integration tests through an in-memory MCP transport: real enough to exercise the protocol, fast enough to run in CI. Layer 3 is a single smoke test that actually spawns the binary. This recipe covers all three.

Step 1: Make your handlers exportable + pure

Keep each handler a plain exported function that does not depend on SDK types:

// src/tools/create-contact.ts
import { z } from 'zod';

export const CreateContactInput = z.object({
  email: z.string().email().toLowerCase(),
  name: z.string().min(1),
});

export interface ToolResponse {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

export async function handleCreateContact(
  args: unknown,
  ctx: { db: Db },
): Promise<ToolResponse> {
  const parsed = CreateContactInput.safeParse(args);
  if (!parsed.success) {
    return {
      isError: true,
      content: [{ type: 'text', text: JSON.stringify({ error: 'INVALID_INPUT' }) }],
    };
  }
  // ... do the work
  return { content: [{ type: 'text', text: JSON.stringify({ id: '...' }) }] };
}

The router just dispatches:

// src/server.ts
case 'crm_create_contact':
  return handleCreateContact(req.params.arguments, { db });

This separation makes the handler trivially testable. No SDK mocks needed.
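The elided "... do the work" step is where the idempotency tested below comes from. A hypothetical sketch, assuming a Postgres-style database behind the Db interface; the table name and SQL are illustrative, not the recipe's actual schema:

```typescript
// Hypothetical happy path for handleCreateContact: an idempotent upsert
// keyed on the (already lowercased) email. Db matches the mockDb shape below.
interface Db {
  query(sql: string, params: unknown[]): Promise<{ rows: Array<{ id: string }> }>;
}

interface ToolResponse {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

async function upsertContact(
  input: { email: string; name: string },
  ctx: { db: Db },
): Promise<ToolResponse> {
  // ON CONFLICT means repeat calls return the same row -> idempotent on email.
  const { rows } = await ctx.db.query(
    `INSERT INTO contacts (email, name) VALUES ($1, $2)
     ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name
     RETURNING id`,
    [input.email, input.name],
  );
  return { content: [{ type: 'text', text: JSON.stringify({ id: rows[0].id }) }] };
}
```

With that shape, the idempotency test in Step 2 is asserting real behavior, not a mock artifact.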

Step 2: Layer 1, unit tests on handlers

// tests/create-contact.test.ts
import { describe, it, expect, vi } from 'vitest';
import { handleCreateContact } from '../src/tools/create-contact.js';

describe('handleCreateContact', () => {
  it('rejects missing email', async () => {
    const r = await handleCreateContact({}, { db: mockDb() });
    expect(r.isError).toBe(true);
    expect(r.content[0].text).toContain('INVALID_INPUT');
  });

  it('normalizes email to lowercase', async () => {
    const db = mockDb();
    await handleCreateContact({ email: 'Foo@Example.COM', name: 'Foo' }, { db });
    expect(db.lastCall.params[0]).toBe('foo@example.com'); // .toLowerCase() applied
  });

  it('is idempotent on email', async () => {
    const db = mockDb();
    const r1 = await handleCreateContact({ email: 'x@example.com', name: 'X' }, { db });
    const r2 = await handleCreateContact({ email: 'x@example.com', name: 'X' }, { db });
    expect(JSON.parse(r1.content[0].text).id).toBe(JSON.parse(r2.content[0].text).id);
  });
});

function mockDb() {
  const calls: Array<{ sql: string; params: unknown[] }> = [];
  return {
    query: vi.fn(async (sql: string, params: unknown[]) => {
      calls.push({ sql, params });
      return { rows: [{ id: 'mock-id', email: params[0], created: true }] };
    }),
    get lastCall() { return calls[calls.length - 1]; },
  };
}

Run with npx vitest run. These tests are pure, no subprocess, no MCP wire protocol, just function calls.

Schritt 3: Layer 2, integration via in-memory transport

For end-to-end coverage of the MCP protocol (capabilities, tool listing, call dispatch), the SDK ships in-memory transports:

// tests/server.integration.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import { buildServer } from '../src/server.js';

describe('MCP server integration', () => {
  let client: Client;

  beforeAll(async () => {
    const server = buildServer({ db: mockDb() });
    const [serverTransport, clientTransport] = InMemoryTransport.createLinkedPair();
    await server.connect(serverTransport);

    client = new Client({ name: 'test-client', version: '0.0.1' }, { capabilities: {} });
    await client.connect(clientTransport);
  });

  afterAll(async () => {
    await client.close();
  });

  it('lists tools', async () => {
    const r = await client.listTools();
    const names = r.tools.map((t) => t.name);
    expect(names).toContain('crm_create_contact');
  });

  it('calls a tool end-to-end', async () => {
    const r = await client.callTool({
      name: 'crm_create_contact',
      arguments: { email: 'x@example.com', name: 'X' },
    });
    expect(r.isError).toBeFalsy();
    const body = JSON.parse(r.content[0].text);
    expect(body.id).toBeDefined();
  });
});

The buildServer factory is your full server minus the transport:

// src/server.ts
export function buildServer(deps: { db: Db }) {
  const server = new Server({ name: 'my-mcp', version: '0.1.0' }, { capabilities: { tools: {} } });
  // ... register tools
  return server;
}

// stdio entry point at the bottom of the file:
if (import.meta.url === `file://${process.argv[1]}`) {
  const server = buildServer({ db: realDb() });
  await server.connect(new StdioServerTransport());
}

Factory + entry guard, same pattern Anthropic uses internally. Integration tests get the factory, production gets the entry.

Step 4: Layer 3, one smoke test that spawns

You want one test that actually runs your dist/server.js to catch packaging bugs (missing shebang, wrong bin entry, runtime imports that fail under Node ESM):

// tests/smoke.test.ts
import { describe, it, expect } from 'vitest';
import { spawn } from 'node:child_process';

describe('stdio smoke', () => {
  it('spawns + responds to initialize within 2s', async () => {
    const proc = spawn('node', ['dist/server.js'], { stdio: ['pipe', 'pipe', 'pipe'] });
    const initRequest = JSON.stringify({
      jsonrpc: '2.0', id: 1, method: 'initialize',
      // protocolVersion: use the spec date your installed @modelcontextprotocol/sdk
      // ships with, current as of mid-2026 is "2025-11-25". Older clients still
      // accept "2024-11-05"; the SDK negotiates whichever is supported on both ends.
      params: { protocolVersion: '2025-11-25', capabilities: {}, clientInfo: { name: 't', version: '0' } },
    }) + '\n';
    proc.stdin.write(initRequest);

    const response = await new Promise<string>((resolve, reject) => {
      const t = setTimeout(() => reject(new Error('timeout')), 2000);
      proc.stdout.once('data', (chunk) => { clearTimeout(t); resolve(chunk.toString()); });
    });
    proc.kill();

    expect(JSON.parse(response.split('\n')[0]).result).toBeDefined();
  });
});

This catches:

  • Missing #!/usr/bin/env node shebang
  • Imports that fail at runtime (often missing .js extension under Node ESM)
  • Server hanging instead of responding to initialize
  • Anything you log to stdout that corrupts the wire (the most common bug, see 6.5)

Step 5: package.json scripts

{
  "scripts": {
    "build": "tsc",
    "test": "vitest run",
    "test:watch": "vitest",
    "test:smoke": "npm run build && vitest run tests/smoke.test.ts"
  }
}

Smoke after build, regular tests on every change. CI runs npm test && npm run test:smoke.
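One wrinkle: npm test as defined above also picks up tests/smoke.test.ts, which fails whenever dist/ is stale or missing. Assuming Vitest's --exclude CLI flag (available in current releases), one option is to keep the smoke test out of the default run:

```json
{
  "scripts": {
    "test": "vitest run --exclude tests/smoke.test.ts"
  }
}
```

test:smoke still names the file explicitly and runs after the build, so the smoke test only executes against a fresh dist/.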

Step 6: Verify

Run academy_validate_step. The validator checks package.json has @modelcontextprotocol/sdk plus a bin or main entry. If you also added a scripts.test field, you're production-ready.

What to test, what to skip

Test: input validation paths (Layer 1), idempotency (Layer 1), tool listing (Layer 2), one happy path per tool (Layer 2), the smoke (Layer 3).

Skip: mocking every Stripe/Supabase response (test against staging instead), perfect coverage chasing (60-70% on critical paths beats 100% on getters), tests that just re-implement the type checker.

The point of MCP tests is to catch regressions before users do. Six well-chosen tests beat sixty trivial ones.
