Phase 6 · Build Your Own MCP Server · 6 steps

Testing MCP tools, vitest + in-memory transport

How to write tests for MCP tools without spawning subprocesses. Unit tests for handlers, integration tests via in-memory transport, smoke tests for stdio mode.



Tests for MCP servers fall into three layers. Layer 1 is unit tests on your handlers: fast, easy, and the bulk of your suite. Layer 2 is integration tests through an in-memory MCP transport: real enough to exercise the protocol, fast enough to run in CI. Layer 3 is a single smoke test that actually spawns the binary. This recipe covers all three.

Step 1: Make your handlers exportable + pure

Keep each handler a plain exported function that does not depend on SDK types:

// src/tools/create-contact.ts
import { z } from 'zod';

export const CreateContactInput = z.object({
  email: z.string().email().toLowerCase(),
  name: z.string().min(1),
});

export interface ToolResponse {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

export async function handleCreateContact(
  args: unknown,
  ctx: { db: Db },
): Promise<ToolResponse> {
  const parsed = CreateContactInput.safeParse(args);
  if (!parsed.success) {
    return {
      isError: true,
      content: [{ type: 'text', text: JSON.stringify({ error: 'INVALID_INPUT' }) }],
    };
  }
  // ... do the work
  return { content: [{ type: 'text', text: JSON.stringify({ id: '...' }) }] };
}

The router just dispatches:

// src/server.ts
case 'crm_create_contact':
  return handleCreateContact(req.params.arguments, { db });

This separation makes the handler trivially testable. No SDK mocks needed.
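The elided "... do the work" step is where the idempotency tested below comes from. A hypothetical sketch, assuming a Postgres-style database behind the Db interface; the table name and SQL are illustrative, not the recipe's actual schema:

```typescript
// Hypothetical happy path for handleCreateContact: an idempotent upsert
// keyed on the (already lowercased) email. Db matches the mockDb shape below.
interface Db {
  query(sql: string, params: unknown[]): Promise<{ rows: Array<{ id: string }> }>;
}

interface ToolResponse {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

async function upsertContact(
  input: { email: string; name: string },
  ctx: { db: Db },
): Promise<ToolResponse> {
  // ON CONFLICT means repeat calls return the same row -> idempotent on email.
  const { rows } = await ctx.db.query(
    `INSERT INTO contacts (email, name) VALUES ($1, $2)
     ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name
     RETURNING id`,
    [input.email, input.name],
  );
  return { content: [{ type: 'text', text: JSON.stringify({ id: rows[0].id }) }] };
}
```

With that shape, the idempotency test in Step 2 is asserting real behavior, not a mock artifact.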

Step 2: Layer 1, unit tests on handlers

// tests/create-contact.test.ts
import { describe, it, expect, vi } from 'vitest';
import { handleCreateContact } from '../src/tools/create-contact.js';

describe('handleCreateContact', () => {
  it('rejects missing email', async () => {
    const r = await handleCreateContact({}, { db: mockDb() });
    expect(r.isError).toBe(true);
    expect(r.content[0].text).toContain('INVALID_INPUT');
  });

  it('normalizes email to lowercase', async () => {
    const db = mockDb();
    await handleCreateContact({ email: 'Foo@Example.COM', name: 'Foo' }, { db });
    expect(db.lastCall.params[0]).toBe('foo@example.com'); // .toLowerCase() applied
  });

  it('is idempotent on email', async () => {
    const db = mockDb();
    const r1 = await handleCreateContact({ email: 'x@example.com', name: 'X' }, { db });
    const r2 = await handleCreateContact({ email: 'x@example.com', name: 'X' }, { db });
    expect(JSON.parse(r1.content[0].text).id).toBe(JSON.parse(r2.content[0].text).id);
  });
});

function mockDb() {
  const calls: Array<{ sql: string; params: unknown[] }> = [];
  return {
    query: vi.fn(async (sql: string, params: unknown[]) => {
      calls.push({ sql, params });
      return { rows: [{ id: 'mock-id', email: params[0], created: true }] };
    }),
    get lastCall() { return calls[calls.length - 1]; },
  };
}

Run with npx vitest run. These tests are pure, no subprocess, no MCP wire protocol, just function calls.

Schritt 3: Layer 2, integration via in-memory transport

For end-to-end coverage of the MCP protocol (capabilities, tool listing, call dispatch), the SDK ships in-memory transports:

// tests/server.integration.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import { buildServer } from '../src/server.js';

describe('MCP server integration', () => {
  let client: Client;

  beforeAll(async () => {
    const server = buildServer({ db: mockDb() });
    const [serverTransport, clientTransport] = InMemoryTransport.createLinkedPair();
    await server.connect(serverTransport);

    client = new Client({ name: 'test-client', version: '0.0.1' }, { capabilities: {} });
    await client.connect(clientTransport);
  });

  afterAll(async () => {
    await client.close();
  });

  it('lists tools', async () => {
    const r = await client.listTools();
    const names = r.tools.map((t) => t.name);
    expect(names).toContain('crm_create_contact');
  });

  it('calls a tool end-to-end', async () => {
    const r = await client.callTool({
      name: 'crm_create_contact',
      arguments: { email: 'x@example.com', name: 'X' },
    });
    expect(r.isError).toBeFalsy();
    const body = JSON.parse(r.content[0].text);
    expect(body.id).toBeDefined();
  });
});

The buildServer factory is your full server minus the transport:

// src/server.ts
export function buildServer(deps: { db: Db }) {
  const server = new Server({ name: 'my-mcp', version: '0.1.0' }, { capabilities: { tools: {} } });
  // ... register tools
  return server;
}

// stdio entry point at the bottom of the file:
if (import.meta.url === `file://${process.argv[1]}`) {
  const server = buildServer({ db: realDb() });
  await server.connect(new StdioServerTransport());
}

Factory + entry guard, same pattern Anthropic uses internally. Integration tests get the factory, production gets the entry.

Step 4: Layer 3, one smoke test that spawns

You want one test that actually runs your dist/server.js to catch packaging bugs (missing shebang, wrong bin entry, runtime imports that fail under Node ESM):

// tests/smoke.test.ts
import { describe, it, expect } from 'vitest';
import { spawn } from 'node:child_process';

describe('stdio smoke', () => {
  it('spawns + responds to initialize within 2s', async () => {
    const proc = spawn('node', ['dist/server.js'], { stdio: ['pipe', 'pipe', 'pipe'] });
    const initRequest = JSON.stringify({
      jsonrpc: '2.0', id: 1, method: 'initialize',
      // protocolVersion: use the spec date your installed @modelcontextprotocol/sdk
      // ships with, current as of mid-2026 is "2025-11-25". Older clients still
      // accept "2024-11-05"; the SDK negotiates whichever is supported on both ends.
      params: { protocolVersion: '2025-11-25', capabilities: {}, clientInfo: { name: 't', version: '0' } },
    }) + '\n';
    proc.stdin.write(initRequest);

    const response = await new Promise<string>((resolve, reject) => {
      const t = setTimeout(() => reject(new Error('timeout')), 2000);
      proc.stdout.once('data', (chunk) => { clearTimeout(t); resolve(chunk.toString()); });
    });
    proc.kill();

    expect(JSON.parse(response.split('\n')[0]).result).toBeDefined();
  });
});

This catches:

  • Missing #!/usr/bin/env node shebang
  • Imports that fail at runtime (often missing .js extension under Node ESM)
  • Server hanging instead of responding to initialize
  • Anything you log to stdout that corrupts the wire (the most common bug, see 6.5)

Step 5: package.json scripts

{
  "scripts": {
    "build": "tsc",
    "test": "vitest run",
    "test:watch": "vitest",
    "test:smoke": "npm run build && vitest run tests/smoke.test.ts"
  }
}

Smoke after build, regular tests on every change. CI runs npm test && npm run test:smoke.
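One wrinkle: npm test as defined above also picks up tests/smoke.test.ts, which fails whenever dist/ is stale or missing. Assuming Vitest's --exclude CLI flag (available in current releases), one option is to keep the smoke test out of the default run:

```json
{
  "scripts": {
    "test": "vitest run --exclude tests/smoke.test.ts"
  }
}
```

test:smoke still names the file explicitly and runs after the build, so the smoke test only executes against a fresh dist/.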

Step 6: Verify

Run academy_validate_step. The validator checks package.json has @modelcontextprotocol/sdk plus a bin or main entry. If you also added a scripts.test field, you're production-ready.

What to test, what to skip

Test: input validation paths (Layer 1), idempotency (Layer 1), tool listing (Layer 2), one happy path per tool (Layer 2), the smoke (Layer 3).

Skip: mocking every Stripe/Supabase response (test against staging instead), perfect coverage chasing (60-70% on critical paths beats 100% on getters), tests that just re-implement the type checker.

The point of MCP tests is to catch regressions before users do. Six well-chosen tests beat sixty trivial ones.
