Figure 1. Boundary Protocol Versus Native Runtime
The architecture is intentionally split. MCP remains the outer compatibility contract; ZCP changes the native execution contract inside the same backend.
1. Problem Statement
MCP is an interoperability protocol. Its job is to let tools, resources, prompts, and transports be described in a shared way across hosts and clients. The official Python SDK reflects that goal directly: it builds tool contracts from Python function signatures and serializes them as JSON Schema.
That solves the boundary problem, but it does not solve the model execution problem. The model still pays for every visible tool, every repeated schema field, every prompt-visible result replay, and every loop where runtime state is simulated inside natural language or repeated tool polling.
The right comparison is therefore not 'which protocol can express a tool call?' Both can. The right comparison is 'what does the model need to reason over per turn?' ZCP wins only when the answer to that question becomes smaller.
2. Why ZCP Uses Fewer Tokens
Token cost comes from four recurring sources. First, the model is shown too many tools. Second, each tool is described with too much schema detail relative to the task at hand. Third, large results are replayed into later turns. Fourth, background or long-running state is not held by the runtime, so the model keeps reconstructing it through repeated calls and explanations.
The MCP default path amplifies those four costs because the public tool contract is also the default model-facing contract. `Tool.from_function(...)` creates `parameters = arg_model.model_json_schema(by_alias=True)`, `list_tools()` returns every registered tool with `input_schema` and `output_schema`, and `_handle_call_tool()` turns outputs back into `CallToolResult(content=..., structured_content=...)`.
ZCP reduces those costs by moving policy into the runtime. Tool discovery can be cut down before the first turn. Result values can be represented as `scalar` or as `handle + summary` rather than replaying full payloads. Task state can live in `TaskManager` instead of being re-encoded into prompt-visible loops. Once that chain is visible in code, the benchmark results stop being mysterious.
- Fewer visible tools means lower branch factor.
- Smaller registry subsets mean less repeated schema payload.
- Handles keep large artifacts out of subsequent turns.
- Tasks keep long-running state out of the prompt.
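The first two bullets can be made concrete with a few lines of arithmetic. This is an illustrative sketch, not ZCP code: the registry contents are invented, and serialized length stands in as a rough proxy for token count.

```python
import json

# Hypothetical registry: 40 tools, each carrying a verbose JSON Schema.
registry = {
    f"sheet.tool_{i}": {
        "type": "object",
        "properties": {
            "range": {"type": "string", "description": "A1-style range"},
            "values": {"type": "array", "description": "cell values"},
        },
        "required": ["range"],
    }
    for i in range(40)
}

def contract_chars(tools: dict) -> int:
    # Serialized length as a rough proxy for per-turn prompt payload.
    return len(json.dumps(tools))

full = contract_chars(registry)                            # every tool, every schema
subset = contract_chars(dict(list(registry.items())[:5]))  # profile-filtered view
assert subset * 4 < full  # a 5-of-40 subset carries far less repeated schema text
```

The exact numbers do not matter; the point is that the schema payload scales with the number of visible tools, and discovery filtering attacks that term directly.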
3. Canonical Runtime And Context Contract
The decisive ZCP move is architectural: the public MCP-compatible surface is not the native runtime. The native runtime is defined around canonical objects such as `ToolDefinition`, `SessionState`, `CallRequest`, `CallResult`, and `HandleRef` in `src/zcp/canonical_protocol.py`.
Those types store information that matters for model execution but is not central in a schema-first design: `output_mode`, `handle_kind`, `defaults`, `flags`, registry hashes, current tool subset, and live handle references. This is not a naming change. It is a different execution contract.
Because the runtime is canonical first, the same backend can be projected outward in two directions. `/mcp` preserves compatibility. `/zcp` preserves the same business logic but changes discovery, calling discipline, result shape, and state handling. That is why ZCP can keep compatibility without forcing native clients to inherit all of the compatibility surface cost.
4. JSON Schema At The Edge, Not At The Center
ZCP does not literally delete JSON Schema. It still validates arguments and can still compile strict schemas for providers such as OpenAI. The key change is that JSON Schema stops being the primary native planning artifact.
In MCP, schema generation is upstream and central. `func_metadata(...)` builds Pydantic models, `model_json_schema()` becomes the tool contract, and the default `list_tools()` response exposes those schemas directly. In other words, the same rich schema object acts as registration metadata, transport payload, and the model-facing description.
In ZCP, schema becomes one field inside a richer canonical object. `ToolDefinition` still keeps `input_schema`, but the native runtime can reason in terms of tool ids, subsets, handles, and output modes. `OpenAIStrictSchemaCompiler` is then used at the adapter boundary to compile the currently selected `RegistryView` into provider-specific strict function tools only when that provider needs it.
That is the precise meaning of 'de-centering JSON Schema'. The schema is retained for validation and adapters, but it is no longer the sole object around which the whole runtime is organized.
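A minimal sketch of what edge compilation means in practice. All names here (`compile_for_openai`, the canonical record shape, the cache) are invented stand-ins, not ZCP's API: the schema rides along as one field, and a provider-shaped tool list is built and cached only when a provider asks for it.

```python
canonical = [
    {
        "tool_id": "sheet.read",
        "alias": "read_range",
        "output_mode": "handle",
        "input_schema": {
            "type": "object",
            "properties": {"range": {"type": "string"}},
            "required": ["range"],
        },
    },
]

_cache: dict[tuple, list] = {}

def compile_for_openai(view: list[dict], strict: bool = True) -> list[dict]:
    # Compile (and cache) provider tools only for the current registry view.
    key = (tuple(t["tool_id"] for t in view), strict)
    if key not in _cache:
        _cache[key] = [
            {
                "type": "function",
                "name": t["alias"],
                "parameters": (
                    {**t["input_schema"], "additionalProperties": False}
                    if strict else t["input_schema"]
                ),
            }
            for t in view
        ]
    return _cache[key]

tools = compile_for_openai(canonical)
assert tools[0]["parameters"]["additionalProperties"] is False
assert compile_for_openai(canonical) is tools  # second call hits the cache
```

The design choice this illustrates: the schema is derived output at the boundary, not the object the native runtime plans over.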
5. How To Read The Figures And Tables
Figure 1 is an architecture boundary diagram. It shows where compatibility lives and where optimization lives. The point of that figure is to make clear that ZCP does not fork business logic; it forks the model-facing execution contract.
Figure 2 is a causal token diagram. It traces where token cost is created: full schema exposure, broad planning, and result replay on the MCP-compatible path; filtered discovery, staged planning, and compact result propagation on the native path.
Table 1 is a token-cost source map. It is not a benchmark table. It tells you which mechanism removes which cost. The code-level table in Section 7 maps official MCP implementation files to ZCP implementation files. Tables 2 and 3 are empirical: they show the overall benchmark and the tier breakdown that follow from those architectural choices.
6. Causal Mechanism
The benchmark only makes sense if the following mechanism chain is true. Each step removes one class of prompt-visible waste.
Step 1. Discovery is narrowed before planning starts
MCP-style servers usually expose a flat tool inventory. ZCP lets the native client request `profile="semantic-workflow"` and also filter by `groups` and `stages`. The model therefore begins planning inside a smaller action space.
Step 2. Call policy matches discovery policy
A filtered `tools/list` is meaningless if `tools/call` can still invoke the whole registry. ZCP keeps `enforce_tool_visibility_on_call`, so the model cannot silently escape the current exposure policy.
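The enforcement rule is simple enough to sketch in full. The names below (`VisibilityError`, `call_tool`, `enforce`) are illustrative stand-ins for ZCP's `enforce_tool_visibility_on_call`, not its actual API.

```python
class VisibilityError(Exception):
    pass

def call_tool(name: str, exposed: set[str], registry: dict, enforce: bool = True):
    if enforce and name not in exposed:
        # Filtered discovery is only meaningful if execution honors the
        # same subset; otherwise the model can quietly widen its surface.
        raise VisibilityError(f"{name} is registered but not exposed")
    return registry[name]()

registry = {"read_cell": lambda: "A1=42", "delete_sheet": lambda: "deleted"}
exposed = {"read_cell"}

assert call_tool("read_cell", exposed, registry) == "A1=42"
blocked = False
try:
    call_tool("delete_sheet", exposed, registry)
except VisibilityError:
    blocked = True
assert blocked
```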
Step 3. Schema compilation is delayed and scoped
In MCP, JSON Schema is generated at registration time and then travels with every tool definition. In ZCP, the OpenAI adapter compiles strict schemas only for the selected `RegistryView` and only when the provider requires them.
Step 4. Results stop replaying whole artifacts
The canonical runtime checks `output_mode`, `inline_ok`, and value size. Small values remain `scalar`; larger values become `HandleRef + summary`. That changes the next prompt turn from 'repeat the full object' to 'continue from a compact reference'.
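A toy version of that decision rule, with an assumed size threshold; the real `HandleRef` and `HandleStore` carry more fields than this sketch.

```python
import uuid

HANDLE_THRESHOLD = 200  # chars; an assumed cutoff for inlining
store: dict[str, object] = {}  # toy stand-in for ZCP's HandleStore

def build_result(value):
    text = str(value)
    if len(text) <= HANDLE_THRESHOLD:
        return {"scalar": value}                 # small: stays prompt-visible
    handle_id = f"h-{uuid.uuid4().hex[:8]}"
    store[handle_id] = value                     # full artifact stays runtime-side
    return {"handle": handle_id, "summary": text[:80] + "..."}

small = build_result(42)
big = build_result([["row", i] for i in range(500)])
assert "scalar" in small
assert "handle" in big and big["handle"] in store
```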
Step 5. Long-running state leaves the prompt loop
Tasks, handles, progress, and status updates become runtime state. The model no longer needs to keep reconstructing partially completed work by re-reading large tool outputs or repeatedly polling generic tools.
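A toy stand-in for the idea (not ZCP's actual `TaskManager` API): detail accumulates runtime-side, while the model only ever sees a short status line.

```python
class TaskManager:
    def __init__(self):
        self.tasks: dict[str, dict] = {}

    def start(self, task_id: str, total: int):
        self.tasks[task_id] = {"done": 0, "total": total, "log": []}

    def advance(self, task_id: str, note: str):
        t = self.tasks[task_id]
        t["done"] += 1
        t["log"].append(note)  # detail stays in the runtime, not the prompt

    def status_line(self, task_id: str) -> str:
        t = self.tasks[task_id]
        return f"{task_id}: {t['done']}/{t['total']}"

tm = TaskManager()
tm.start("import-q3", total=3)
tm.advance("import-q3", "parsed 10_000 rows")
assert tm.status_line("import-q3") == "import-q3: 1/3"
```

The token consequence is the compact status line: each turn carries a reference to progress, not a transcript of it.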
Step 6. Semantic tools compress primitive plans
Once a server also offers workflow-level tools, the model no longer has to plan at the lowest possible mutation granularity. That is why the biggest gains appear in Tier B, C, and D rather than in one-shot Tier A calls.
7. Code-Level Comparison
This table compares the official MCP Python SDK implementation style with the local ZCP runtime implementation. The point is not rhetorical; it is to show where each design places state, schemas, and planning constraints.
| Concern | MCP implementation | ZCP implementation | Token consequence |
|---|---|---|---|
| Primary contract object | `src/mcp/types/_types.py::Tool` centers the public contract on `input_schema`, `output_schema`, and `CallToolResult(content, structured_content)`. | `src/zcp/canonical_protocol.py::ToolDefinition` and `SessionState` center the runtime on tool ids, subset hashes, output modes, handles, defaults, flags, and metadata. | More state is held by the runtime instead of being reconstructed by the model every turn. |
| Schema generation | `src/mcp/server/mcpserver/tools/base.py::Tool.from_function` calls `arg_model.model_json_schema(by_alias=True)` at registration time. | `src/zcp/adapters/openai.py::compile_openai_tools` compiles strict schemas only for the selected `RegistryView`, and only when the adapter needs them. | The model is not forced to see the whole schema-bearing registry on every native turn. |
| Discovery | `src/mcp/server/mcpserver/tool_manager.py::list_tools()` returns the whole tool map; `src/mcp/server/mcpserver/server.py::list_tools()` serializes all tools with schemas. | `src/zcp/server.py::_select_tools(...)` filters by profile, groups, excludeGroups, and stages before returning the list. | Branch factor falls before planning begins. |
| Call discipline | `ToolManager.call_tool(...)` checks only that the name exists and then runs it. | `src/zcp/server.py::_tool_is_exposed(...)` plus `enforce_tool_visibility_on_call` keeps calls inside the active subset. | Filtered discovery does not widen back into a broad execution surface. |
| Result shape | `src/mcp/server/mcpserver/server.py::_handle_call_tool()` wraps outputs into `CallToolResult(content, structured_content)` and keeps those payloads prompt-visible. | `src/zcp/canonical_runtime.py::_build_result()` chooses `scalar` or `handle + summary` via `HandleStore`. | Later turns replay less payload. |
| Native model grammar | The public contract is schema-bearing JSON objects and content blocks. | `src/zcp/profiles/native.py::format_registry()` emits compact `TOOL @id alias(param:type) -> output_mode` lines. | Native planners can operate over compact signatures instead of full JSON Schema trees. |
| Long-running state | Tasks exist, but the generic tool surface still naturally gravitates toward prompt-visible `CallToolResult` loops. | `TaskManager`, `TaskExecutionContext`, progress notifications, and handle refs keep state durable and out of the prompt by default. | Repair loops and polling loops become smaller and less repetitive. |
Table 1. Principle-Level Comparison
| Token cost source | MCP default shape | ZCP countermeasure | Why it matters |
|---|---|---|---|
| Repeated tool-schema exposure | A broad `tools/list` returns full JSON Schema-bearing tool definitions. | Native discovery can return only the active profile/stage subset. | Fewer visible schemas means fewer prompt tokens and less planning entropy. |
| Schema as the planning surface | JSON Schema stays central from registration through transport. | JSON Schema is compiled only at the adapter edge from a selected registry view. | The runtime stops forcing the model to reason over the whole schema object graph. |
| Large result replay | Tool results commonly re-enter the next turn as content or structured content. | Large values become handles plus short summaries. | The next turn carries references instead of full artifacts. |
| Prompt-visible background state | Intermediate state tends to leak back into tool loops and explanations. | Tasks, handles, progress, and session state live in the runtime. | Long-running workflows stay smaller and more stable. |
| Discovery / execution mismatch | A model may list one surface and still wander to any registered tool. | Call visibility is checked against the active exposure policy. | The action space remains narrow after the first decision. |
8. Key Code Snippets
These snippets are the shortest path to the real argument. They compare the official MCP code path to the local ZCP code path without relying on the Excel benchmark implementation itself.
MCP tool registration is schema-first
modelcontextprotocol/python-sdk/src/mcp/server/mcpserver/tools/base.py
The official MCP server path converts Python function metadata into a Pydantic model and immediately serializes it to JSON Schema. That schema becomes the tool contract.
```python
class Tool(BaseModel):
    fn: Callable[..., Any] = Field(exclude=True)
    name: str = Field(description="Name of the tool")
    parameters: dict[str, Any] = Field(description="JSON schema for tool parameters")
    fn_metadata: FuncMetadata = Field(...)

    @classmethod
    def from_function(cls, fn: Callable[..., Any], ...):
        func_arg_metadata = func_metadata(fn, ...)
        parameters = func_arg_metadata.arg_model.model_json_schema(by_alias=True)
        return cls(
            fn=fn,
            name=func_name,
            parameters=parameters,
            fn_metadata=func_arg_metadata,
        )
```
MCP returns prompt-visible content objects
modelcontextprotocol/python-sdk/src/mcp/server/mcpserver/server.py
The default call path converts results into `CallToolResult(content, structured_content)`. This is correct for compatibility, but it keeps large results close to the prompt loop.
```python
async def _handle_call_tool(self, ctx, params) -> CallToolResult:
    result = await self.call_tool(params.name, params.arguments or {}, ctx)
    if isinstance(result, CallToolResult):
        return result
    if isinstance(result, tuple) and len(result) == 2:
        unstructured_content, structured_content = result
        return CallToolResult(
            content=list(unstructured_content),
            structured_content=structured_content,
        )
    return CallToolResult(content=list(result))
```
ZCP canonical contract carries runtime state explicitly
zero-context-protocol-python/src/zcp/canonical_protocol.py
ZCP does not make schema disappear. It makes schema one field inside a richer runtime contract that also tracks subsets, handles, defaults, and output modes.
```python
@dataclass
class ToolDefinition:
    tool_id: str
    alias: str
    description_short: str
    input_schema: dict[str, Any]
    output_schema: dict[str, Any] | None = None
    output_mode: Literal["handle", "scalar"] = "handle"
    handle_kind: str = "generic"
    defaults: dict[str, Any] = field(default_factory=dict)
    flags: frozenset[str] = field(default_factory=frozenset)
    metadata: dict[str, Any] = field(default_factory=dict)

@dataclass
class SessionState:
    session_id: str
    registry_hash: str = ""
    tool_subset: tuple[str, ...] = ()
    handles: dict[str, HandleRef] = field(default_factory=dict)
```
ZCP narrows discovery before the first turn
zero-context-protocol-python/src/zcp/server.py
Profile and stage filtering are runtime rules, not prompt conventions. The subset is enforced before the model plans.
```python
def _select_tools(app: FastZCP, params: dict[str, Any]) -> list[Any]:
    tools = app.tool_registry.subset().tools
    profile = _effective_tool_profile(app, params)
    include_groups = _normalize_filter_values(params.get("groups"))
    stages = _normalize_filter_values(params.get("stages"))
    if profile == app.semantic_workflow_profile:
        workflow_tools = [tool for tool in tools if app.semantic_group in _tool_groups(tool)]
        if workflow_tools:
            tools = workflow_tools
    if include_groups:
        tools = [tool for tool in tools if _tool_groups(tool) & include_groups]
    if stages:
        tools = [tool for tool in tools if _tool_stages(tool) & stages]
    return tools
```
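The same filtering rules can be demonstrated standalone with toy records. Field names mirror `_select_tools`, but everything below is illustrative rather than ZCP's data model.

```python
tools = [
    {"id": "wb.create", "groups": {"semantic-workflow"}, "stages": {"setup"}},
    {"id": "cell.write", "groups": {"primitive"}, "stages": {"edit"}},
    {"id": "report.build", "groups": {"semantic-workflow"}, "stages": {"edit"}},
]

def select(tools, profile=None, groups=None, stages=None):
    out = tools
    if profile == "semantic-workflow":
        narrowed = [t for t in out if "semantic-workflow" in t["groups"]]
        out = narrowed or out  # fall back if the profile matches nothing
    if groups:
        out = [t for t in out if t["groups"] & set(groups)]
    if stages:
        out = [t for t in out if t["stages"] & set(stages)]
    return out

ids = [t["id"] for t in select(tools, profile="semantic-workflow", stages=["edit"])]
assert ids == ["report.build"]
```

Two rules compose here: the profile shrinks the pool first, then stage filters cut it further, so the model starts planning over one tool instead of three.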
ZCP keeps JSON Schema at the adapter boundary
zero-context-protocol-python/src/zcp/adapters/openai.py
Strict JSON Schema is still available, but it is compiled from the current `RegistryView`, not treated as the permanent native planning surface.
```python
def compile_openai_tools(self, session: SessionState, *, tool_subset=None, strict_mode=True):
    subset_tuple = tuple(tool_subset or ())
    registry_view = self.registry.subset(list(subset_tuple) if subset_tuple else None, limit=self.tool_limit)
    session.registry_hash = registry_view.hash
    session.tool_subset = subset_tuple
    key = (registry_view.hash, strict_mode)  # cache key; its construction is abbreviated in this excerpt
    if key not in self._tool_cache:
        tools = self.compiler.compile_registry(registry_view)
        self._tool_cache[key] = tools
    return self._tool_cache[key]
```
ZCP can present a compact native registry grammar
zero-context-protocol-python/src/zcp/profiles/native.py
The native profile compresses each tool to `id + alias + compact param types + output mode`. This is the clearest expression of schema de-centering.
```python
def format_registry(tools: list[ToolDefinition]) -> str:
    entries = []
    for tool in tools:
        params = ",".join(
            f"{name}:{_compact_type(schema)}"
            for name, schema in tool.input_schema.get("properties", {}).items()
        )
        entries.append(f"TOOL @{tool.tool_id} {tool.alias}({params}) -> {tool.output_mode}")
    return "\n".join(entries)
```
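Under assumed type mappings (the real `_compact_type` may differ), the grammar is easy to reproduce end to end and to compare against the schema it replaces:

```python
import json

def _compact_type(schema: dict) -> str:
    # Assumed mapping from JSON Schema types to compact tokens.
    return {"integer": "int", "number": "num", "string": "str",
            "boolean": "bool", "array": "list", "object": "obj"}.get(
                schema.get("type", ""), "any")

tool = {
    "tool_id": "sheet.write",
    "alias": "write_range",
    "output_mode": "handle",
    "input_schema": {"type": "object", "properties": {
        "range": {"type": "string"}, "values": {"type": "array"}}},
}

params = ",".join(
    f"{n}:{_compact_type(s)}" for n, s in tool["input_schema"]["properties"].items()
)
line = f"TOOL @{tool['tool_id']} {tool['alias']}({params}) -> {tool['output_mode']}"
assert line == "TOOL @sheet.write write_range(range:str,values:list) -> handle"
assert len(line) < len(json.dumps(tool))  # one line vs. the full schema object
```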
ZCP compacts results into scalar or handle
zero-context-protocol-python/src/zcp/canonical_runtime.py
This is the second major token-saving mechanism after filtered discovery. Big results stop re-entering every subsequent turn.
```python
if tool.output_mode == "scalar" and (tool.inline_ok or is_scalar_value(value)):
    return CallResult(
        cid=request.cid,
        status="ok",
        scalar=value,
        summary=summary,
        meta=meta,
    )
handle = self.handle_store.create(
    kind=handle_kind,
    data=value,
    summary=summary,
    meta=meta,
)
return CallResult(
    cid=request.cid,
    status="ok",
    handle=handle,
    summary=handle.summary,
    meta=meta,
)
```
Figure 2. Where The Token Savings Come From
The token gain is causal: smaller registry subset, tighter calling discipline, compact result propagation, and runtime-held state.
- MCP schema-first surface: full tool list + full JSON Schema -> broad planning over many branches -> content / structured_content replay into later turns.
- ZCP canonical surface: profile-filtered subset + compact contract -> planning inside a constrained subset -> scalar inline, large values behind handles and task state -> smaller next-turn context and fewer repair loops.
Table 2. Overall Benchmark
| Path | Answer accuracy | Workbook accuracy | Tool accuracy | Avg total tokens | Avg turns |
|---|---|---|---|---|---|
| `zcp_client_to_native_zcp` | 100.0% | 97.3% | 100.0% | 8027.9 | 2.8 |
| `mcp_client_to_zcp_mcp_surface` | 97.3% | 91.9% | 73.0% | 30723.7 | 4.1 |
Table 3. Tier Breakdown
| Tier | What changed structurally | Native ZCP (avg tokens) | MCP surface (avg tokens) | Advantage |
|---|---|---|---|---|
| A | Little room for planning policy to help | 15979.4 | 17613.2 | 1.10x |
| B | Short chains collapse into semantic chain tools | 1826.6 | 29239.4 | 16.01x |
| C | Workflow tools remove long primitive plans | 2091.1 | 72113.9 | 34.49x |
| D | Autonomous planning gets the smallest search space | 2018.3 | 19375.7 | 9.60x |
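The Advantage column and the `3.83x` headline both follow directly from the averages in Tables 2 and 3, which a few lines can verify:

```python
# Per-tier averages from Table 3: {tier: (native_zcp_tokens, mcp_surface_tokens)}.
tiers = {
    "A": (15979.4, 17613.2),
    "B": (1826.6, 29239.4),
    "C": (2091.1, 72113.9),
    "D": (2018.3, 19375.7),
}
advantage = {t: round(mcp / zcp, 2) for t, (zcp, mcp) in tiers.items()}
assert advantage == {"A": 1.10, "B": 16.01, "C": 34.49, "D": 9.60}

# Overall headline from Table 2's avg total tokens.
assert round(30723.7 / 8027.9, 2) == 3.83
```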
9. Why The Tier Results Look Like This
Tier A. Small gain is expected
One-shot tool calls do not contain much planning waste. They are useful as a sanity check, but they should not be the headline for a runtime-efficiency claim.
Tier B. Semantic chains begin to matter
The first large jump appears when the model would otherwise need to plan across several tightly coupled primitive calls. Narrower discovery plus semantic chain tools reduce internal branching sharply.
Tier C. Workflow compression dominates
This tier proves the gain is not mostly wire-format trivia. The model is no longer planning every low-level mutation, so the savings become structural rather than incremental.
Tier D. Autonomous planning is the real stress test
Tier D is where broad surfaces typically explode into repair loops, repeated reads, and status churn. ZCP wins because the runtime constrains the search space and keeps state outside the prompt before those loops expand.
10. Limits And Scope
- The `3.83x` headline is a published result on the current Excel workflow benchmark, not a universal theorem for every domain or model.
- ZCP's largest gains depend on using the native runtime features that make schemas peripheral rather than central: profile-based discovery, handles, tasks, and semantic tools.
- This report argues that ZCP has a stronger architectural position for model execution. It does not argue that MCP becomes useless for ecosystem interoperability.
- The fairest formulation is therefore: MCP remains the compatibility contract; ZCP becomes the more efficient execution contract.
11. Conclusion
ZCP is stronger than MCP on planning-heavy workloads because it changes the model-facing execution contract, not because it changed the transport or rewrote the backend business logic.
The official MCP code path is schema-first and compatibility-first. The ZCP code path is canonical-runtime-first: schemas remain available, but they are compiled at the edge, while the native runtime is organized around subsets, handles, output modes, and task state.
That design directly explains the benchmark. Fewer tools are visible, less schema text is repeated, large payloads stop replaying into later turns, and long-running state stops leaking back into prompt-visible loops. The result is lower token use and lower planning entropy for the same backend logic.
Read next
Semantic Workflow Profile, Benchmark Methodology, Capability Matrix.