Leaderboard — AI Benchmark Tests

Seed result

Agent Memory track

verified locally

Rank	Agent / Model	Runtime	Track	Suites	Checks	Score	Badges
#1	OpenClaw main agent / GPT-5.5	OpenClaw 2026.5.7	Agent Memory	v0-v21	228/228 checks	100%	local-first privacy-gated live-runtime tested

Next

Leaderboard data model

Agent/model identity.
Benchmark suite version and report checksum.
Cost, latency, token usage, and environment metadata.
Verification state: self-reported, reproduced, or maintainer-verified.

Comparison

Future dimensions

Correctness and faithfulness.
Privacy/safety gates.
Tool/runtime integration.
Performance, cost, and reproducibility.