How MCP actually works—and why FastMCP is the easiest way to use it
Breaking down how the Model Context Protocol works, why it's structured the way it is, and why FastMCP is the best way to implement it in practice.

I've been spending a lot of time lately building agentic workflows, and a big part of that involves letting LLMs interact cleanly with external code. That's exactly what the Model Context Protocol (MCP) is designed for.
Disclosure: I love the em dash (—) and refuse to give it up just because AI also spits it up.
But before we dive deeper, let's start at the beginning:
First, What Exactly is MCP?
MCP is just a standardized protocol for letting Large Language Model (LLM) agents discover, describe, and invoke external tools (functions, validators, APIs, etc.) in a structured, reliable way.
Think of it like a handshake or formal agreement between your LLM and the external functions it can call. Under the hood, MCP is just using JSON-RPC 2.0, meaning each request looks like:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "validate_script",
    "arguments": {"code": "print('hello')"}
  }
}
Wait, wait, wait.
This just seems exactly like an API call, but with extra steps.
Exactly. That's the point.
MCP is fundamentally a very simple idea—just an API call wrapped inside a standardized JSON-RPC structure, plus a couple extra steps to negotiate capabilities, maintain context (statelessly), and discover available functions dynamically.
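To round out the envelope picture, here's a sketch of what a reply looks like (field values are illustrative, not from a real server):

```python
import json

# Hypothetical reply to a tools/call request. JSON-RPC 2.0 responses
# echo the request's "id" so the client can pair them up; MCP tool
# results arrive as a "content" array of typed blocks.
raw = """
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [{"type": "text", "text": "ok"}]
  }
}
"""
response = json.loads(raw)

assert response["id"] == 1       # matches the request id
assert "error" not in response   # success carries "result" instead
```

Same shape in both directions: an envelope, an id, and a payload.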
It really boils down to solving two problems that traditional APIs don't handle well:
- Discoverability: Traditional APIs require upfront documentation. MCP allows your LLM to ask, "Hey, what tools are actually available right now?" at runtime. You don't have to write and maintain separate docs or schemas.
- Stateless multi-step communication: HTTP itself is stateless. But interactions between LLMs and your backend tools often span multiple steps: initialize, discover tools, call tools, handle streaming outputs. MCP neatly solves this by providing a simple header-based session system (Mcp-Session-Id) that works anywhere—even behind serverless setups, load balancers, or proxies.
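The discoverability point is easy to make concrete: the client sends a tools/list call, and the server answers with names, descriptions, and JSON Schemas. A sketch (the particular tool shown is illustrative):

```python
# Runtime discovery: the client asks instead of reading docs.
list_request = {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}}

# Sketch of a reply: each tool is self-describing, with a JSON Schema
# for its arguments.
list_reply = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "tools": [{
            "name": "validate_script",
            "description": "Validate a Python snippet",
            "inputSchema": {
                "type": "object",
                "properties": {"code": {"type": "string"}},
                "required": ["code"],
            },
        }]
    },
}

# The agent now knows what exists without any out-of-band docs.
names = [t["name"] for t in list_reply["result"]["tools"]]
assert names == ["validate_script"]
```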
All the hype around MCP has focused on what it could unlock—things like agents dynamically discovering and chaining tools. But underneath that hype, MCP itself is actually really simple: it's literally just JSON-RPC calls, explicit session negotiation, and automatic discoverability.
FastMCP: the simplest way to implement MCP
FastMCP is a Python framework created by Jeremiah Lowin that makes exposing MCP endpoints effortless. You literally just decorate your existing Python functions:
@mcp.tool
def add(a: int, b: int) -> int:
    return a + b
Boom. That's now discoverable, callable, and fully schema-described via MCP. No extra YAML, no OpenAPI specs, no manual JSON Schema boilerplate.
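To see where that schema comes from, here's a rough stdlib sketch of the kind of derivation FastMCP automates for you (FastMCP's real implementation uses Pydantic and handles far more types; this is illustrative only):

```python
import inspect
from typing import get_type_hints

def add(a: int, b: int) -> int:
    return a + b

# Python type hints map naturally onto JSON Schema types.
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schema(fn):
    """Derive an MCP-style tool description from a plain function."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "inputSchema": {
            "type": "object",
            "properties": {p: {"type": PY_TO_JSON[hints[p]]} for p in params},
            "required": list(params),
        },
    }

schema = tool_schema(add)
assert schema["inputSchema"]["properties"]["a"]["type"] == "integer"
```

The decorator does exactly this kind of introspection once, so the function signature stays the single source of truth.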
But to understand why FastMCP is good, let's go a bit deeper on how MCP itself works.
The MCP handshake explained
Every MCP session starts with a handshake. It goes like this:
- initialize message: The client (e.g., an LLM agent) sends an initialization call to the MCP server. It declares its protocol version and capabilities.
- Server replies with capabilities and a session ID: The server acknowledges initialization and provides a unique Mcp-Session-Id. This ID will tie future calls to this specific conversation.
- notifications/initialized: The client tells the server, "Yep, got it, I'm ready." Without this step, the server won't respond to further requests.
This three-step dance might seem redundant, but there's a good reason: it explicitly negotiates capabilities and versions upfront. This means if a client and server are incompatible, they'll fail gracefully early instead of mid-operation.
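Written out as JSON-RPC envelopes, the dance looks roughly like this (the protocolVersion and client name are illustrative):

```python
# Step 1: the client declares its protocol version and capabilities.
initialize = {
    "jsonrpc": "2.0", "id": 0, "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"tools": {}},
        "clientInfo": {"name": "demo", "version": "1"},
    },
}

# Step 2: the server's reply carries its own capabilities, and the HTTP
# response includes an Mcp-Session-Id header the client must echo on
# every subsequent request.

# Step 3: the client confirms readiness. Note there is no "id" field:
# this is a JSON-RPC *notification*, so no response is expected.
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}

assert "id" not in initialized
```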
Why a session ID (and why not cookies?)
HTTP is stateless by nature, but MCP needs context—when you list tools and invoke them, you want the server to remember which session you're talking about.
Cookies could do this—but they're brittle. They often break with serverless architectures, API Gateways, or behind reverse proxies.
Instead, MCP uses a simple header:
Mcp-Session-Id: <your-session-id>
It's clean, explicit, and stateless. Easy to forward through Lambdas, proxies, or even cached layers. Miss this header, and the server rejects your request immediately (Bad Request: Missing session ID).
Why JSON-RPC over REST?
RESTful APIs work well when you're managing resources. But tools feel more like RPC (Remote Procedure Calls)—they take structured inputs, perform an action, and return structured results.
JSON-RPC was designed for exactly this use case. It allows for small, clearly structured messages and easy batching or streaming.
This choice lets MCP keep its messages compact, readable, and directly focused on tool invocation.
Why Server-Sent Events (SSE)?
MCP supports streaming responses. For example, you might have a long-running validation tool or an LLM generating tokens in real-time. WebSockets could work for this—but they're heavy-handed, requiring additional handshakes, dedicated infra, and are often blocked by corporate firewalls or API Gateways.
SSE is simpler:
- One-directional server push
- No handshake overhead
- Native browser and curl support
- Easy to debug (curl -N ...)
MCP leverages SSE as the default streaming transport, making it easy to implement token-by-token streaming of results. If your client sends Accept: application/json, text/event-stream, you'll automatically get streamed results.
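On the wire, a streamed reply is just event: and data: lines separated by blank lines, which is why a few lines of stdlib code can parse it (the payloads below are illustrative):

```python
import json

# What an SSE-framed MCP response looks like on the wire.
raw = (
    'event: message\n'
    'data: {"jsonrpc": "2.0", "id": 1, "result": {"partial": "hel"}}\n'
    '\n'
    'event: message\n'
    'data: {"jsonrpc": "2.0", "id": 1, "result": {"partial": "lo"}}\n'
    '\n'
)

def parse_sse(stream: str):
    """Yield the JSON payload of each data: line, one event at a time."""
    for line in stream.splitlines():
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])

events = list(parse_sse(raw))
assert len(events) == 2
```

Compare that to a WebSocket client, which needs an upgrade handshake and a framing library before you see a single byte.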
Why the Accept header matters
FastMCP checks for the Accept header explicitly:
Accept: application/json, text/event-stream
If you miss this header, you get Not Acceptable: Client must accept both application/json and text/event-stream. Annoying at first glance—but intentional.
This check prevents subtle bugs where your client hangs indefinitely because it didn't realize the server was streaming responses instead of returning static JSON.
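The check itself is trivial. Here's a sketch of the logic (FastMCP's actual parsing is more thorough, e.g. around wildcards and quality parameters):

```python
def accept_ok(accept_header: str) -> bool:
    # Both media types must be present: JSON for single replies,
    # SSE for streamed ones.
    accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
    return {"application/json", "text/event-stream"} <= accepted

assert accept_ok("application/json, text/event-stream")
assert not accept_ok("application/json")  # would hang on a streamed reply
```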
Why you can't send multiple JSON objects in one HTTP request (by default)
This is something that tripped me up initially.
Content-Type: application/json explicitly means exactly one JSON object per request. If you send multiple JSON envelopes without explicit delimiters, MCP returns a parse error (Parse error: Extra data).
The correct way to stream multiple calls over a single connection is to explicitly switch to ND-JSON format:
Content-Type: application/x-ndjson
Then each JSON object is separated by newlines, and MCP treats them as distinct messages.
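The difference is easy to demonstrate with the stdlib: a single json.loads call chokes on two envelopes, while ND-JSON splits cleanly per line (the message bodies are illustrative):

```python
import json

two_envelopes = (
    '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}\n'
    '{"jsonrpc": "2.0", "id": 2, "method": "tools/call", '
    '"params": {"name": "add", "arguments": {"a": 1, "b": 2}}}\n'
)

# application/json semantics: exactly one object -> this fails.
try:
    json.loads(two_envelopes)
    parsed_as_one = True
except json.JSONDecodeError:  # "Extra data" - the error MCP surfaces
    parsed_as_one = False
assert not parsed_as_one

# application/x-ndjson semantics: one object per line -> this works.
messages = [json.loads(ln) for ln in two_envelopes.splitlines() if ln.strip()]
assert [m["id"] for m in messages] == [1, 2]
```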
Minimal example (using FastMCP)
Here's a super-simple implementation that shows how clean FastMCP is:
Server (server.py):
from fastmcp import FastMCP

mcp = FastMCP("DemoAgent")

@mcp.tool
def multiply(a: int, b: int) -> int:
    return a * b

mcp.run(transport="http", host="0.0.0.0", port=8000, path="/mcp/")
Client (using curl):
# initialize and capture session ID
SESSION=$(curl -sD - \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":0,"method":"initialize",
"params":{"protocolVersion":"2025-06-18",
"capabilities":{"tools":{}},
"clientInfo":{"name":"curl","version":"1"}}}' \
http://localhost:8000/mcp/ |
grep -i mcp-session-id | awk '{print $2}' | tr -d '\r')
# send initialized notification
curl -s \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-H "Mcp-Session-Id: $SESSION" \
-d '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
http://localhost:8000/mcp/
# call the multiply tool
curl -s \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-H "Mcp-Session-Id: $SESSION" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
"params":{"name":"multiply","arguments":{"a":4,"b":5}}}' \
http://localhost:8000/mcp/
Result:
event: message
data: {"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"20"}],"isError":false}}
Clean and simple.
Using the FastMCP Client (deterministic, code‑first access)
While curl is great for demonstrating how MCP works at a bare-bones level, FastMCP includes a Client class that lets you talk to any MCP server from Python without writing JSON-RPC plumbing. It handles protocol details, transport selection, and connection lifecycle.
When to use it
- Testing servers locally or in CI
- Writing deterministic scripts or services that call MCP tools/resources/prompts directly
- Building the base layer for a higher‑level agent or UI
This client is not agentic. You call functions explicitly and stay in full control.
Minimal example
import asyncio
from fastmcp import Client, FastMCP

# 1. Spin up an in-memory server (great for tests)
server = FastMCP("TestServer")

@server.tool
def multiply(a: int, b: int) -> int:
    return a * b

# 2. Point a client at it
client = Client(server)

async def main():
    async with client:  # manages connect/close
        await client.ping()  # sanity check
        tools = await client.list_tools()  # discover ops
        result = await client.call_tool("multiply", {"a": 5, "b": 3})
        print(result.data)  # -> 15

asyncio.run(main())
All client ops must run inside async with client: so the connection opens and closes correctly.
Pointing the client at different servers
Client(FastMCP("TestServer")) # in‑memory (fastest for tests)
Client("./server.py") # local Python stdio server
Client("https://example.com/mcp") # HTTP/SSE server
Client({
    "mcpServers": {
        "weather": {"url": "https://weather-api.example.com/mcp"},
        "assistant": {"command": "python", "args": ["./assistant_server.py"]}
    }
})
The client infers the transport automatically from what you pass in (FastMCP instance, file path, URL, or config dict).
Core operations available
Inside the context you can:
await client.list_tools()
await client.call_tool("tool_name", {"arg": "value"})
await client.list_resources()
await client.read_resource("file:///config/settings.json")
await client.list_prompts()
msgs = await client.get_prompt("analyze_data", {"data": [1, 2, 3]})
These methods cover tools, resources, and prompts, plus basics like ping().
Multi‑server prefixing
When you pass a multi‑server config, tool names and resource URIs are prefixed with the server name (for example, weather_get_forecast).
Final takeaways
MCP isn't magic. It's just good, practical engineering:
- Explicit capability negotiation
- Stateless session management via headers
- JSON-RPC to structure tool calls
- SSE for streaming responses
But MCP alone is verbose to implement. FastMCP makes it effortless, which is why it's my go-to recommendation.
If you're building LLM-driven workflows or exposing agentic tooling, MCP + FastMCP will save you hours of engineering and debugging pain. Check out the FastMCP documentation and Jeremiah Lowin's work to get started. Let me know if you're implementing this—I've spent plenty of time in the trenches and can probably save you some headaches.