How MCP actually works—and why FastMCP is the easiest way to use it
Breaking down how the Model Context Protocol works, why it's structured the way it is, and why FastMCP is the best way to implement it in practice.

I've been spending a lot of time lately building agentic workflows, and a big part of that involves letting LLMs interact cleanly with external code. That's exactly what the Model Context Protocol (MCP) is designed for.
Disclosure: I love the em dash (—) and refuse to give it up just because AI also spits it up.
But before we dive deeper, let's start at the beginning:
First, What Exactly is MCP?
MCP is just a standardized protocol for letting Large Language Model (LLM) agents discover, describe, and invoke external tools (functions, validators, APIs, etc.) in a structured, reliable way.
Think of it like a handshake or formal agreement between your LLM and the external functions it can call. Under the hood, MCP is just using JSON-RPC 2.0, meaning each request looks like:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "validate_script",
    "arguments": {"code": "print('hello')"}
  }
}
Wait, wait, wait.
This just seems exactly like an API call, but with extra steps.
Exactly. That's the point.
MCP is fundamentally a very simple idea—just an API call wrapped inside a standardized JSON-RPC structure, plus a couple extra steps to negotiate capabilities, maintain context (statelessly), and discover available functions dynamically.
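To round out the envelope picture, here's a sketch of what a reply looks like (field values are illustrative, not from a real server):

```python
import json

# Hypothetical reply to a tools/call request. JSON-RPC 2.0 responses
# echo the request's "id" so the client can pair them up; MCP tool
# results arrive as a "content" array of typed blocks.
raw = """
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [{"type": "text", "text": "ok"}]
  }
}
"""
response = json.loads(raw)

assert response["id"] == 1       # matches the request id
assert "error" not in response   # success carries "result" instead
```

Same shape in both directions: an envelope, an id, and a payload.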
It really boils down to solving two problems that traditional APIs don't handle well:
- Discoverability: Traditional APIs require upfront documentation. MCP allows your LLM to ask, "Hey, what tools are actually available right now?" at runtime. You don't have to write and maintain separate docs or schemas.
- Stateless multi-step communication: HTTP itself is stateless. But interactions between LLMs and your backend tools often span multiple steps: initialize, discover tools, call tools, handle streaming outputs. MCP neatly solves this by providing a simple header-based session system (Mcp-Session-Id) that works anywhere—even behind serverless setups, load balancers, or proxies.
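The discoverability point is easy to make concrete: the client sends a tools/list call, and the server answers with names, descriptions, and JSON Schemas. A sketch (the particular tool shown is illustrative):

```python
# Runtime discovery: the client asks instead of reading docs.
list_request = {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}}

# Sketch of a reply: each tool is self-describing, with a JSON Schema
# for its arguments.
list_reply = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "tools": [{
            "name": "validate_script",
            "description": "Validate a Python snippet",
            "inputSchema": {
                "type": "object",
                "properties": {"code": {"type": "string"}},
                "required": ["code"],
            },
        }]
    },
}

# The agent now knows what exists without any out-of-band docs.
names = [t["name"] for t in list_reply["result"]["tools"]]
assert names == ["validate_script"]
```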
All the hype around MCP has focused on what it could unlock—things like agents dynamically discovering and chaining tools. But underneath that hype, MCP itself is actually really simple: it's literally just JSON-RPC calls, explicit session negotiation, and automatic discoverability.
FastMCP: the simplest way to implement MCP
FastMCP is a Python framework created by Jeremiah Lowin that makes exposing MCP endpoints effortless. You literally just decorate your existing Python functions:
@mcp.tool
def add(a: int, b: int) -> int:
    return a + b
Boom. That's now discoverable, callable, and fully schema-described via MCP. No extra YAML, no OpenAPI specs, no manual JSON Schema boilerplate.
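To see where that schema comes from, here's a rough stdlib sketch of the kind of derivation FastMCP automates for you (FastMCP's real implementation uses Pydantic and handles far more types; this is illustrative only):

```python
import inspect
from typing import get_type_hints

def add(a: int, b: int) -> int:
    return a + b

# Python type hints map naturally onto JSON Schema types.
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schema(fn):
    """Derive an MCP-style tool description from a plain function."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "inputSchema": {
            "type": "object",
            "properties": {p: {"type": PY_TO_JSON[hints[p]]} for p in params},
            "required": list(params),
        },
    }

schema = tool_schema(add)
assert schema["inputSchema"]["properties"]["a"]["type"] == "integer"
```

The decorator does exactly this kind of introspection once, so the function signature stays the single source of truth.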
But to understand why FastMCP is good, let's go a bit deeper on how MCP itself works.
The MCP handshake explained
Every MCP session starts with a handshake. It goes like this:
- initialize message: The client (e.g., an LLM agent) sends an initialization call to the MCP server. It declares its protocol version and capabilities.
- Server replies with capabilities and a session ID: The server acknowledges initialization and provides a unique Mcp-Session-Id. This ID will tie future calls to this specific conversation.
- notifications/initialized: The client tells the server, "Yep, got it, I'm ready." Without this step, the server won't respond to further requests.
This three-step dance might seem redundant, but there's a good reason: it explicitly negotiates capabilities and versions upfront. This means if a client and server are incompatible, they'll fail gracefully early instead of mid-operation.
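Written out as JSON-RPC envelopes, the dance looks roughly like this (the protocolVersion and client name are illustrative):

```python
# Step 1: the client declares its protocol version and capabilities.
initialize = {
    "jsonrpc": "2.0", "id": 0, "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"tools": {}},
        "clientInfo": {"name": "demo", "version": "1"},
    },
}

# Step 2: the server's reply carries its own capabilities, and the HTTP
# response includes an Mcp-Session-Id header the client must echo on
# every subsequent request.

# Step 3: the client confirms readiness. Note there is no "id" field:
# this is a JSON-RPC *notification*, so no response is expected.
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}

assert "id" not in initialized
```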
Why a session ID (and why not cookies?)
HTTP is stateless by nature, but MCP needs context—when you list tools and invoke them, you want the server to remember which session you're talking about.
Cookies could do this—but they're brittle. They often break with serverless architectures, API Gateways, or behind reverse proxies.
Instead, MCP uses a simple header:
Mcp-Session-Id: <your-session-id>
It's clean, explicit, and stateless. Easy to forward through Lambdas, proxies, or even cached layers. Miss this header, and the server rejects your request immediately (Bad Request: Missing session ID).
Why JSON-RPC over REST?
RESTful APIs work well when you're managing resources. But tools feel more like RPC (Remote Procedure Calls)—they take structured inputs, perform an action, and return structured results.
JSON-RPC was designed for exactly this use case. It allows for small, clearly structured messages and easy batching or streaming.
This choice lets MCP keep its messages compact, readable, and directly focused on tool invocation.
Why Server-Sent Events (SSE)?
MCP supports streaming responses. For example, you might have a long-running validation tool or an LLM generating tokens in real-time. WebSockets could work for this—but they're heavy-handed, requiring additional handshakes, dedicated infra, and are often blocked by corporate firewalls or API Gateways.
SSE is simpler:
- One-directional server push
- No handshake overhead
- Native browser and curl support
- Easy to debug (curl -N ...)
MCP leverages SSE as the default streaming transport, making it easy to implement token-by-token streaming of results. If your client sends Accept: application/json, text/event-stream, you'll automatically get streamed results.
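On the wire, a streamed reply is just event: and data: lines separated by blank lines, which is why a few lines of stdlib code can parse it (the payloads below are illustrative):

```python
import json

# What an SSE-framed MCP response looks like on the wire.
raw = (
    'event: message\n'
    'data: {"jsonrpc": "2.0", "id": 1, "result": {"partial": "hel"}}\n'
    '\n'
    'event: message\n'
    'data: {"jsonrpc": "2.0", "id": 1, "result": {"partial": "lo"}}\n'
    '\n'
)

def parse_sse(stream: str):
    """Yield the JSON payload of each data: line, one event at a time."""
    for line in stream.splitlines():
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])

events = list(parse_sse(raw))
assert len(events) == 2
```

Compare that to a WebSocket client, which needs an upgrade handshake and a framing library before you see a single byte.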
Why the Accept header matters
FastMCP checks for the Accept header explicitly:
Accept: application/json, text/event-stream
If you miss this header, you get Not Acceptable: Client must accept both application/json and text/event-stream. Annoying at first glance—but intentional.
This check prevents subtle bugs where your client hangs indefinitely because it didn't realize the server was streaming responses instead of returning static JSON.
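The check itself is trivial. Here's a sketch of the logic (FastMCP's actual parsing is more thorough, e.g. around wildcards and quality parameters):

```python
def accept_ok(accept_header: str) -> bool:
    # Both media types must be present: JSON for single replies,
    # SSE for streamed ones.
    accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
    return {"application/json", "text/event-stream"} <= accepted

assert accept_ok("application/json, text/event-stream")
assert not accept_ok("application/json")  # would hang on a streamed reply
```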
Why you can't send multiple JSON objects in one HTTP request (by default)
This is something that tripped me up initially.
Content-Type: application/json explicitly means exactly one JSON object per request. If you send multiple JSON envelopes without explicit delimiters, MCP returns a parse error (Parse error: Extra data).
The correct way to stream multiple calls over a single connection is to explicitly switch to ND-JSON format:
Content-Type: application/x-ndjson
Then each JSON object is separated by newlines, and MCP treats them as distinct messages.
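The difference is easy to demonstrate with the stdlib: a single json.loads call chokes on two envelopes, while ND-JSON splits cleanly per line (the message bodies are illustrative):

```python
import json

two_envelopes = (
    '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}\n'
    '{"jsonrpc": "2.0", "id": 2, "method": "tools/call", '
    '"params": {"name": "add", "arguments": {"a": 1, "b": 2}}}\n'
)

# application/json semantics: exactly one object -> this fails.
try:
    json.loads(two_envelopes)
    parsed_as_one = True
except json.JSONDecodeError:  # "Extra data" - the error MCP surfaces
    parsed_as_one = False
assert not parsed_as_one

# application/x-ndjson semantics: one object per line -> this works.
messages = [json.loads(ln) for ln in two_envelopes.splitlines() if ln.strip()]
assert [m["id"] for m in messages] == [1, 2]
```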
Minimal example (using FastMCP)
Here's a super-simple implementation that shows how clean FastMCP is:
Server (server.py):
from fastmcp import FastMCP

mcp = FastMCP("DemoAgent")

@mcp.tool
def multiply(a: int, b: int) -> int:
    return a * b

mcp.run(transport="http", host="0.0.0.0", port=8000, path="/mcp/")
Client (using curl):
# initialize and capture session ID
SESSION=$(curl -sD - \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":0,"method":"initialize",
"params":{"protocolVersion":"2025-06-18",
"capabilities":{"tools":{}},
"clientInfo":{"name":"curl","version":"1"}}}' \
http://localhost:8000/mcp/ |
grep -i mcp-session-id | awk '{print $2}' | tr -d '\r')
# send initialized notification
curl -s \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-H "Mcp-Session-Id: $SESSION" \
-d '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
http://localhost:8000/mcp/
# call the multiply tool
curl -s \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-H "Mcp-Session-Id: $SESSION" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
"params":{"name":"multiply","arguments":{"a":4,"b":5}}}' \
http://localhost:8000/mcp/
Result:
event: message
data: {"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"20"}],"isError":false}}
Clean and simple.
Using the FastMCP Client (deterministic, code‑first access)
While curl is great for demonstrating how MCP works at a bare-bones level, FastMCP includes a Client class that lets you talk to any MCP server from Python without writing JSON-RPC plumbing. It handles protocol details, transport selection, and connection lifecycle.
When to use it
- Testing servers locally or in CI
- Writing deterministic scripts or services that call MCP tools/resources/prompts directly
- Building the base layer for a higher‑level agent or UI
This client is not agentic. You call functions explicitly and stay in full control.
Minimal example
import asyncio
from fastmcp import Client, FastMCP

# 1. Spin up an in-memory server (great for tests)
server = FastMCP("TestServer")

@server.tool
def multiply(a: int, b: int) -> int:
    return a * b

# 2. Point a client at it
client = Client(server)

async def main():
    async with client:  # manages connect/close
        await client.ping()  # sanity check
        tools = await client.list_tools()  # discover ops
        result = await client.call_tool("multiply", {"a": 5, "b": 3})
        print(result.data)  # -> 15

asyncio.run(main())
All client ops must run inside async with client: so the connection opens and closes correctly.
Pointing the client at different servers
Client(FastMCP("TestServer")) # in‑memory (fastest for tests)
Client("./server.py") # local Python stdio server
Client("https://example.com/mcp") # HTTP/SSE server
Client({
    "mcpServers": {
        "weather": {"url": "https://weather-api.example.com/mcp"},
        "assistant": {"command": "python", "args": ["./assistant_server.py"]}
    }
})
The client infers the transport automatically from what you pass in (FastMCP instance, file path, URL, or config dict).
Core operations available
Inside the context you can:
await client.list_tools()
await client.call_tool("tool_name", {"arg": "value"})
await client.list_resources()
await client.read_resource("file:///config/settings.json")
await client.list_prompts()
msgs = await client.get_prompt("analyze_data", {"data": [1, 2, 3]})
These methods cover tools, resources, and prompts, plus basics like ping().
Multi‑server prefixing
When you pass a multi‑server config, tool names and resource URIs are prefixed with the server name (for example, weather_get_forecast).
Final takeaways
MCP isn't magic. It's just good, practical engineering:
- Explicit capability negotiation
- Stateless session management via headers
- JSON-RPC to structure tool calls
- SSE for streaming responses
But MCP alone is verbose to implement. FastMCP makes it effortless, which is why it's my go-to recommendation.
If you're building LLM-driven workflows or exposing agentic tooling, MCP + FastMCP will save you hours of engineering and debugging pain. Check out the FastMCP documentation and Jeremiah Lowin's work to get started. Let me know if you're implementing this—I've spent plenty of time in the trenches and can probably save you some headaches.