Core principles

Scoring

What a pass means

  • Expected observations or answers are produced.
  • Forbidden, private, or secret content is excluded from the output.
  • Reports preserve evidence for later review.
  • Category scores expose where a system fails.
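
The pass criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual scorer: the function names (case_passes, category_scores), the substring-matching rule, and the (category, passed) result shape are all assumptions made for the example.

```python
from collections import defaultdict


def case_passes(output, expected, forbidden):
    """Hypothetical check: a case passes only if every expected string
    appears in the output and no forbidden string does."""
    has_expected = all(item in output for item in expected)
    leaks = [item for item in forbidden if item in output]
    return has_expected and not leaks


def category_scores(results):
    """Turn (category, passed) pairs into a pass fraction per category,
    so a weak category is visible even when the overall score looks fine."""
    totals = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for category, passed in results:
        totals[category][0] += int(passed)
        totals[category][1] += 1
    return {cat: passed / total for cat, (passed, total) in totals.items()}
```

Reporting a fraction per category, rather than one aggregate number, is what lets a report show where a system fails and not only that it fails.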

Artifacts

What each run should keep

  • Suite JSON fixture.
  • Report JSON with pass/fail and category scores.
  • Generated dashboard, diff, and prompt artifacts, when available.
  • Planned: checksums, the runner version, and a verified-run signature.
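
As a rough sketch of what keeping these artifacts could look like, the Python below writes the suite fixture and a report side by side in one run directory. The file names (suite.json, report.json), the report field names, and the write_run_artifacts helper are hypothetical. The checksum and runner-version fields only illustrate one way the planned items might be recorded, and the signature is omitted.

```python
import hashlib
import json
from pathlib import Path


def write_run_artifacts(run_dir, suite, results, category_scores,
                        runner_version="0.0.0-example"):
    """Hypothetical helper: persist the suite fixture and a report for one run."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)

    # Serialize the suite deterministically so its checksum is stable.
    suite_bytes = json.dumps(suite, sort_keys=True, indent=2).encode("utf-8")
    (run_dir / "suite.json").write_bytes(suite_bytes)

    report = {
        "passed": all(r["passed"] for r in results),
        "results": results,                  # per-case evidence for later review
        "category_scores": category_scores,
        "runner_version": runner_version,    # assumed field name
        "suite_sha256": hashlib.sha256(suite_bytes).hexdigest(),
    }
    (run_dir / "report.json").write_text(
        json.dumps(report, sort_keys=True, indent=2), encoding="utf-8"
    )
    return report
```

Storing the fixture's hash inside the report ties each result to the exact suite it was run against, which is the kind of evidence a later review would need.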