Next
Leaderboard data model
- Agent/model identity.
- Benchmark suite version and report checksum.
- Cost, latency, token usage, and environment metadata.
- Verification state: self-reported, reproduced, or maintainer-verified.
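One way to picture the data model above is as a single record per leaderboard entry. The sketch below is a minimal illustration under stated assumptions, not the project's actual schema. The class and field names (`LeaderboardEntry`, `report_sha256`, `cost_usd`, `latency_ms`, `tokens_used`) are hypothetical. SHA-256 as the checksum algorithm is also an assumption, since the page does not name one.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class VerificationState(Enum):
    """The three verification levels listed in the data model."""
    SELF_REPORTED = "self-reported"
    REPRODUCED = "reproduced"
    MAINTAINER_VERIFIED = "maintainer-verified"


def report_checksum(report_bytes: bytes) -> str:
    """Hex SHA-256 digest of the raw benchmark report (algorithm choice is an assumption)."""
    return hashlib.sha256(report_bytes).hexdigest()


@dataclass
class LeaderboardEntry:
    # Agent/model identity
    agent: str
    model: str
    # Benchmark suite version and report checksum
    suite_version: str
    report_sha256: str
    # Cost, latency, token usage, and environment metadata (all optional here)
    cost_usd: Optional[float] = None
    latency_ms: Optional[float] = None
    tokens_used: Optional[int] = None
    environment: Dict[str, str] = field(default_factory=dict)
    # New entries start as self-reported until reproduced or verified
    verification: VerificationState = VerificationState.SELF_REPORTED

    def matches_report(self, report_bytes: bytes) -> bool:
        """Return True if the stored checksum matches the given report bytes."""
        return report_checksum(report_bytes) == self.report_sha256
```

Storing the checksum alongside the suite version lets anyone who re-runs the suite confirm that a published report is the one the score was computed from, which is the step that separates "reproduced" from "self-reported".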
Leaderboard preview
For now, the leaderboard is seeded with a single local OpenClaw Agent Memory benchmark run. Next, we will add a proper run schema, report checksums, and multi-model comparisons.
Seed result
| Rank | Agent / Model | Runtime | Track | Suites | Checks | Score | Badges |
|---|---|---|---|---|---|---|---|
| #1 | OpenClaw main agent / GPT-5.5 | OpenClaw 2026.5.7 | Agent Memory | v0-v16 | 178/178 | 100% | local-first, privacy-gated, live-runtime tested |
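The seed row's Score matches the check pass rate (178 of 178 checks gives 100%). Once multi-model comparisons arrive, ranking could work as in the minimal sketch below. It assumes two things the page does not state: Score is simply passed checks divided by total checks, and ties are broken by the number of checks passed. `RunResult` and `rank` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RunResult:
    name: str           # "Agent / Model" column
    checks_passed: int
    checks_total: int

    @property
    def score(self) -> float:
        """Pass rate as a percentage (assumed definition of the Score column)."""
        if self.checks_total <= 0:
            raise ValueError("checks_total must be positive")
        return 100.0 * self.checks_passed / self.checks_total


def rank(results: List[RunResult]) -> List[Tuple[int, RunResult]]:
    """Sort by score (highest first), then by checks passed; ranks start at 1.

    Tied entries receive sequential ranks rather than a shared rank (an assumption).
    """
    ordered = sorted(results, key=lambda r: (-r.score, -r.checks_passed))
    return [(i + 1, r) for i, r in enumerate(ordered)]
```

Breaking ties on checks passed favors runs of larger suites when two entries have the same pass rate. Whether that is the right policy depends on whether suite versions are compared directly.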
Next
Comparison