Benchmark reports

This page publishes the benchmark artifacts that back the claims made in the docs. The runner itself still lives in the Python SDK repository; the docs site ships a curated copy of its output so that deployed builds render the same evidence that was generated there.

Latest snapshot

Model: deepseek-chat. Repeats: 1.

Semantic Workflow Benchmark v5

Headline

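zcp_client_to_native_zcp vs mcp_client_to_zcp_mcp_surface: Advantage 3.83x (average total tokens, MCP surface divided by native ZCP). Token delta: 22695.8 (difference between the two average totals).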

Artifacts: benchmark_reports/full_semantic_compare_v5/semantic_benchmark_summary.json

Overall comparison

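| Backend | Answer | Workbook | Tool | Avg Total | Avg Turns | Avg Tool Calls |
| --- | --- | --- | --- | --- | --- | --- |
| zcp_client_to_native_zcp | 100.0% | 97.3% | 100.0% | 8027.9 | 2.1 | 1.1 |
| mcp_client_to_zcp_mcp_surface | 97.3% | 91.9% | 73.0% | 30723.7 | 3.9 | 3.0 |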
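
The headline figures appear to follow directly from the Avg Total column of the overall comparison. The sketch below recomputes them from the two published averages; the `compare` helper is hypothetical and is not part of the SDK runner.

```python
# Average total tokens, copied from the overall comparison table.
avg_total = {
    "zcp_client_to_native_zcp": 8027.9,
    "mcp_client_to_zcp_mcp_surface": 30723.7,
}


def compare(baseline: float, candidate: float) -> tuple[float, float]:
    """Return (ratio, delta) of baseline relative to candidate.

    Hypothetical helper for illustration only.
    """
    return baseline / candidate, baseline - candidate


ratio, delta = compare(
    avg_total["mcp_client_to_zcp_mcp_surface"],
    avg_total["zcp_client_to_native_zcp"],
)
print(f"Advantage: {ratio:.2f}x, token delta: {delta:.1f}")
```

Rounded to two and one decimal places respectively, the results match the published 3.83x and 22695.8.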

Tier comparison

| Tier | Native ZCP Avg Total | MCP Surface Avg Total | Ratio | Native Quality |
| --- | --- | --- | --- | --- |
| A | 15979.4 | 17613.2 | 1.10x | 100.0% / 93.8% / 100.0% |
| B | 1826.6 | 29239.4 | 16.01x | 100.0% / 100.0% / 100.0% |
| C | 2091.1 | 72113.9 | 34.49x | 100.0% / 100.0% / 100.0% |
| D | 2018.3 | 19375.7 | 9.60x | 100.0% / 100.0% / 100.0% |

Compact Tool Benchmark

Artifacts: benchmark_reports/zcp_mcp_tool_call_benchmark.json

Compact summary

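| Protocol | Runs | Answer Accuracy | Tool Compliance | Avg Prompt | Avg Completion | Avg Total |
| --- | --- | --- | --- | --- | --- | --- |
| mcp | 8 | 100.0% | 100.0% | 4136.1 | 367.8 | 4503.9 |
| zcp | 8 | 100.0% | 100.0% | 2577.5 | 255.5 | 2833.0 |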
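
As a quick consistency check on the compact summary, the sketch below confirms that Avg Prompt plus Avg Completion equals Avg Total for each protocol, then derives the overall MCP / ZCP ratio. That ratio is not stated on this page; it is computed here from the published averages. The tuple layout is an illustrative assumption, not the schema of the JSON artifact.

```python
# (avg prompt, avg completion, avg total) copied from the compact summary table.
compact = {
    "mcp": (4136.1, 367.8, 4503.9),
    "zcp": (2577.5, 255.5, 2833.0),
}

for protocol, (prompt, completion, total) in compact.items():
    # Tolerance allows for one-decimal rounding in the published figures.
    assert abs((prompt + completion) - total) < 0.11, protocol

# Derived figure: ratio of average total tokens, MCP over ZCP.
overall_ratio = compact["mcp"][2] / compact["zcp"][2]
print(f"MCP / ZCP on average total tokens: {overall_ratio:.2f}x")
```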

Compact case breakdown

| Case | ZCP Avg Total | MCP Avg Total | MCP / ZCP | Token Delta |
| --- | --- | --- | --- | --- |
| warmer_city_delta | 2821.0 | 4579.5 | 1.62x | 1758.5 |
| shanghai_temp_f_and_humidity | 2565.0 | 3834.5 | 1.49x | 1269.5 |
| average_three_city_temperature | 3116.0 | 5237.5 | 1.68x | 2121.5 |
| more_humid_city_delta | 2830.0 | 4364.0 | 1.54x | 1534.0 |
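
The per-case columns can be recomputed from the two Avg Total values. The sketch below does so, and also checks that the unweighted mean of the four case averages reproduces the Avg Total figures in the compact summary. That match is consistent with each case contributing the same number of runs, which is an inference rather than something the report states. The dictionary layout is illustrative only.

```python
from statistics import mean

# case -> (ZCP avg total, MCP avg total), copied from the case breakdown table.
cases = {
    "warmer_city_delta": (2821.0, 4579.5),
    "shanghai_temp_f_and_humidity": (2565.0, 3834.5),
    "average_three_city_temperature": (3116.0, 5237.5),
    "more_humid_city_delta": (2830.0, 4364.0),
}

# Recompute the MCP / ZCP ratio and the token delta for every case.
rows = {name: (mcp / zcp, mcp - zcp) for name, (zcp, mcp) in cases.items()}
for name, (ratio, delta) in rows.items():
    print(f"{name}: {ratio:.2f}x, delta {delta:.1f}")

# Unweighted means across cases, for comparison with the compact summary.
zcp_mean = mean(zcp for zcp, _ in cases.values())
mcp_mean = mean(mcp for _, mcp in cases.values())
print(f"ZCP mean: {zcp_mean:.1f}, MCP mean: {mcp_mean:.1f}")
```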