# Benchmark reports

This page publishes the curated benchmark artifacts that back the docs claims, so deployed builds can render the same evidence that was generated in the SDK repository. The benchmark runner itself still lives in the Python SDK repository.
## Latest snapshot

Model: `deepseek-chat`. Repeats: 1.
## Semantic Workflow Benchmark v5

### Headline

`zcp_client_to_native_zcp` vs `mcp_client_to_zcp_mcp_surface`: the native ZCP backend averages 3.83x fewer total tokens per task (token delta: 22695.8).

Artifacts: `benchmark_reports/full_semantic_compare_v5/semantic_benchmark_summary.json`
### Overall comparison
| Backend | Answer | Workbook | Tool | Avg Total Tokens | Avg Turns | Avg Tool Calls |
|---|---|---|---|---|---|---|
| zcp_client_to_native_zcp | 100.0% | 97.3% | 100.0% | 8027.9 | 2.1 | 1.1 |
| mcp_client_to_zcp_mcp_surface | 97.3% | 91.9% | 73.0% | 30723.7 | 3.9 | 3.0 |
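The headline figures follow directly from the Avg Total column in the table above; a minimal sketch of the arithmetic:

```python
# Average total tokens per task, taken from the overall-comparison table.
native_avg_total = 8027.9   # zcp_client_to_native_zcp
mcp_avg_total = 30723.7     # mcp_client_to_zcp_mcp_surface

advantage = mcp_avg_total / native_avg_total    # 3.83x
token_delta = mcp_avg_total - native_avg_total  # 22695.8

print(f"Advantage: {advantage:.2f}x")
print(f"Token delta: {token_delta:.1f}")
```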
### Tier comparison

| Tier | Native ZCP Avg Total Tokens | MCP Surface Avg Total Tokens | MCP / Native | Native Quality (Answer / Workbook / Tool) |
|---|---|---|---|---|
| A | 15979.4 | 17613.2 | 1.10x | 100.0% / 93.8% / 100.0% |
| B | 1826.6 | 29239.4 | 16.01x | 100.0% / 100.0% / 100.0% |
| C | 2091.1 | 72113.9 | 34.49x | 100.0% / 100.0% / 100.0% |
| D | 2018.3 | 19375.7 | 9.60x | 100.0% / 100.0% / 100.0% |
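Each tier's ratio is the MCP-surface average divided by the native-ZCP average; a quick check against the table values:

```python
# (native ZCP avg total, MCP surface avg total) per tier, from the table above.
tiers = {
    "A": (15979.4, 17613.2),
    "B": (1826.6, 29239.4),
    "C": (2091.1, 72113.9),
    "D": (2018.3, 19375.7),
}
for tier, (native, mcp) in tiers.items():
    print(f"Tier {tier}: {mcp / native:.2f}x")
```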
## Compact Tool Benchmark

Artifacts: `benchmark_reports/zcp_mcp_tool_call_benchmark.json`
### Compact summary

| Protocol | Runs | Answer Accuracy | Tool Compliance | Avg Prompt Tokens | Avg Completion Tokens | Avg Total Tokens |
|---|---|---|---|---|---|---|
| mcp | 8 | 100.0% | 100.0% | 4136.1 | 367.8 | 4503.9 |
| zcp | 8 | 100.0% | 100.0% | 2577.5 | 255.5 | 2833.0 |
### Compact case breakdown

| Case | ZCP Avg Total Tokens | MCP Avg Total Tokens | MCP / ZCP | Token Delta |
|---|---|---|---|---|
| warmer_city_delta | 2821.0 | 4579.5 | 1.62x | 1758.5 |
| shanghai_temp_f_and_humidity | 2565.0 | 3834.5 | 1.49x | 1269.5 |
| average_three_city_temperature | 3116.0 | 5237.5 | 1.68x | 2121.5 |
| more_humid_city_delta | 2830.0 | 4364.0 | 1.54x | 1534.0 |
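The per-case ratio and token delta columns are derived from the two average-total columns; reproducing them from the table above:

```python
# (ZCP avg total, MCP avg total) per case, from the compact breakdown table.
cases = {
    "warmer_city_delta": (2821.0, 4579.5),
    "shanghai_temp_f_and_humidity": (2565.0, 3834.5),
    "average_three_city_temperature": (3116.0, 5237.5),
    "more_humid_city_delta": (2830.0, 4364.0),
}
for name, (zcp, mcp) in cases.items():
    print(f"{name}: {mcp / zcp:.2f}x, delta {mcp - zcp:.1f}")
```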