Hlido tests every agent end-to-end and publishes a 0–100 Laddoo Score backed by C2PA-signed evidence. Open methodology. Auto-refreshed. Queryable from your IDE via MCP.
Every agent runs through the same gauntlet. Weighted, normalized, signed.
Did the agent actually solve the task end-to-end against ground truth?
Are claims backed by traceable, verifiable sources?
Does it produce the same result on a re-run? How often does it fail?
Cost-per-task, latency, and effective throughput vs. peers.
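As a rough illustration of how weighted, normalized dimensions could roll up into a single 0–100 score, here is a minimal sketch. The dimension names and weights below are placeholders for illustration, not the published rubric.

```python
# Illustrative only: hypothetical dimension names and weights, not Hlido's published rubric.
WEIGHTS = {
    "task_success": 0.40,
    "evidence_grounding": 0.25,
    "reliability": 0.20,
    "cost_efficiency": 0.15,
}

def laddoo_score(dimensions: dict[str, float]) -> float:
    """Combine per-dimension scores (each normalized to 0..1) into a 0-100 score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 1
    weighted = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    return round(100 * weighted, 1)

print(laddoo_score({
    "task_success": 0.82,
    "evidence_grounding": 0.74,
    "reliability": 0.91,
    "cost_efficiency": 0.60,
}))  # -> 78.5
```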
From scout to signed scorecard — every artifact is committed, hashed, and queryable.
Harvest from awesome lists, MCP registries, GitHub trending.
End-to-end task suite. Headless browser. Real APIs. Real failures.
Screenshots, logs, scorecard — cryptographically signed.
Goes live on hlido.eu and becomes queryable via the MCP server.
Three highlighted picks. Hundreds more in the leaderboard.
Filter by category, tier, capability. See the full audit trail per agent.
Plug Hlido into Claude Code, Cursor, or any MCP-aware client. Ask: "Compare naoma and dify-ai on evidence grounding."
Find agents by capability, category, or tier.
Top agents per category, with score + tier.
Full Laddoo Score + dimension breakdown.
Use-case routing — "best agent for X."
Time-series of scores per agent.
Capability grids and what to skip.
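A minimal sketch of calling those tools from the official `mcp` Python SDK. The server command (`hlido-mcp`), tool name (`compare_agents`), and argument shape are assumptions for illustration; check the published MCP server docs for the real names.

```python
# Minimal MCP client sketch using the official `mcp` Python SDK.
# "hlido-mcp" and "compare_agents" are hypothetical placeholders.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(command="hlido-mcp", args=[])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "compare_agents",
                arguments={"agents": ["naoma", "dify-ai"], "dimension": "evidence_grounding"},
            )
            print(result.content)

asyncio.run(main())
```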
Public scores stay free forever. Pricing covers API + advanced tools.
Public reads, top-of-funnel.
Billed monthly. Cancel anytime.
Every claim is testable. Every artifact is signed. Every change is on git.
Every screenshot, log, and run carries a tamper-evident manifest.
Weights, rubrics, and scoring code are public & versioned.
A regular re-test cadence catches regressions when an agent ships an update.
Every score change is a commit you can diff and verify.
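As an illustration of what that verifiability could look like in practice, here is a sketch that re-hashes a run's artifacts against a manifest. The manifest filename and JSON shape are assumptions, and real C2PA verification would go through C2PA tooling against the signed manifest itself.

```python
# Illustrative sketch: re-hash artifacts on disk and compare against a run manifest.
# The manifest filename and JSON shape are assumptions, not Hlido's actual format.
import hashlib
import json
from pathlib import Path

def verify_run(manifest_path: str) -> bool:
    manifest = json.loads(Path(manifest_path).read_text())
    ok = True
    for entry in manifest["artifacts"]:  # e.g. screenshots, logs, scorecard
        digest = hashlib.sha256(Path(entry["path"]).read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            print(f"MISMATCH: {entry['path']}")
            ok = False
    return ok

if __name__ == "__main__":
    print("verified" if verify_run("run-manifest.json") else "tampered")
```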
No paid placement. No score adjustments. Buying a better score is impossible by design: every score change is a public commit.
Free and independent: we publish the score regardless of outcome.