weval.org — Collective Intelligence Project

weval is an open, public-domain platform for evaluating AI model behaviour: anyone can write an eval, share it, and run it against many models. It is the eval-side counterpart to OFL’s mission. Where OFL makes facilitation methods an open commons, weval does the same for the evals that judge AI behaviour. CIP run it and have been proposed as a validation partner for the OFL eval suite.

How it works

A weval eval is a blueprint — a portable, CC0 YAML file pairing prompts with a rubric:

Prompts: single- or multi-turn conversations.
Rubric: should / should_not criteria, each scored by an LLM judge or checked deterministically with functions (contains/regex). Criteria carry weights and can define alternative (OR-logic) paths.
Scoring: each criterion is graded on a 5-point scale from unmet to fully met (0.0–1.0); should_not criteria are inverted, and the weighted scores are combined into a coverage score.
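
To make the scoring rules concrete, here is a minimal Python sketch of how a rubric like this could be scored. It is not weval’s implementation: the helper names (`contains`, `matches`, `stub_judge`, `Criterion`, `coverage`) are hypothetical, and the even spacing of the five grades, snapping raw scores to the nearest grade, taking the best-scoring path for OR-logic, inverting should_not as 1 − score, and treating coverage as a weighted average are all assumptions. The LLM judge is replaced by a stub that returns a fixed grade.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Assumption: the five grades are evenly spaced from "unmet" (0.0) to "fully met" (1.0).
GRADE_LEVELS = (0.0, 0.25, 0.5, 0.75, 1.0)

# A check takes the model's response and returns how far a criterion is met (0.0 to 1.0).
Check = Callable[[str], float]


def contains(needle: str) -> Check:
    """Deterministic check (hypothetical name): 1.0 if the substring appears, else 0.0."""
    return lambda text: 1.0 if needle in text else 0.0


def matches(pattern: str) -> Check:
    """Deterministic check (hypothetical name): 1.0 if the regex matches anywhere, else 0.0."""
    rx = re.compile(pattern)
    return lambda text: 1.0 if rx.search(text) else 0.0


def stub_judge(grade: float) -> Check:
    """Placeholder for an LLM judge call; always returns the given grade."""
    return lambda text: grade


@dataclass
class Criterion:
    paths: List[Check]          # alternative paths (OR-logic): the best-scoring one counts
    weight: float = 1.0
    should_not: bool = False    # True if the criterion describes behaviour to avoid


def snap(score: float) -> float:
    """Round a raw score to the nearest point on the 5-point scale."""
    return min(GRADE_LEVELS, key=lambda g: abs(g - score))


def score_criterion(c: Criterion, response: str) -> float:
    met = max(snap(check(response)) for check in c.paths)
    # For should_not criteria, fully meeting the described behaviour is the worst outcome.
    return 1.0 - met if c.should_not else met


def coverage(criteria: List[Criterion], response: str) -> float:
    """Weighted average of the criterion scores (assumed aggregation)."""
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("criteria need a positive total weight")
    return sum(c.weight * score_criterion(c, response) for c in criteria) / total


# Usage with hypothetical facilitation criteria.
response = "Let's hear from everyone before we decide. What do the quieter members think?"
criteria = [
    Criterion(paths=[contains("everyone"), matches(r"quiet(er)? members?")], weight=2.0),
    Criterion(paths=[stub_judge(0.75)]),  # e.g. "summarises positions neutrally", judged by an LLM
    Criterion(paths=[contains("you're wrong")], should_not=True),
]
print(coverage(criteria, response))  # → 0.9375
```

With the sample response, the first criterion is met through its `contains` path (weight 2.0), the stubbed judge gives 0.75, and the should_not criterion scores 1.0 because the phrase is absent, so coverage is (2.0 + 0.75 + 1.0) / 4.0 = 0.9375.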

Blueprints live in a public repo and spread by being forked and adapted — the same “open, forkable spec” pattern OFL uses for methods.

Why it matters for OFL

Two layers connect weval to OFL’s evaluation framework:

Format. weval’s blueprint rubric grammar is a ready-made model for the eval half of an OFL method spec: a Why-How-Who facilitation eval written as a blueprint can run on any infrastructure that reads the blueprint format, not just on one platform.
Judge calibration. weval does not trust a single judge. It runs several judges (different models and framings), averages their scores into a consensus, and measures their agreement with Krippendorff’s α, flagging evals where the judges disagree too much for the result to be trusted. This rigour, grounded in CIP’s research (LLM Judges Are Unreliable), is what most LLM-as-judge setups are missing.
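
The judge-consensus step can be sketched as follows, assuming each judge returns a 0.0 to 1.0 score per item, that consensus is a plain per-item mean, and that agreement uses Krippendorff’s α with the interval (squared-difference) metric, which suits graded scores. This is not weval’s code: the function names and the 0.667 flag threshold are hypothetical; 0.667 is the lower bound Krippendorff commonly recommends for tentative conclusions, not a value taken from weval.

```python
from itertools import permutations
from statistics import mean
from typing import Optional, Sequence

# Hypothetical threshold: Krippendorff's commonly cited minimum for tentative conclusions.
# weval's own cut-off may differ.
ALPHA_FLAG_THRESHOLD = 0.667


def krippendorff_alpha_interval(items: Sequence[Sequence[Optional[float]]]) -> float:
    """Krippendorff's alpha with the interval (squared-difference) metric.

    `items` has one row per evaluated item; each row holds one score per judge,
    with None where a judge gave no score.
    """
    # Keep only existing scores, and only items with >= 2 scores ("pairable" units).
    units = [[s for s in row if s is not None] for row in items]
    units = [u for u in units if len(u) >= 2]
    values = [s for u in units for s in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one item scored by two or more judges")

    # Observed disagreement: squared differences between judges on the same item,
    # each item weighted by 1 / (m_u - 1), where m_u is its number of scores.
    observed = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    )
    # Expected disagreement: squared differences between all pairable scores,
    # regardless of which item they came from (O(n^2), fine for a sketch).
    expected = sum((a - b) ** 2 for a, b in permutations(values, 2))
    if expected == 0:
        # Every score is identical: alpha is formally undefined; treat it as agreement.
        return 1.0
    return 1.0 - (n - 1) * observed / expected


def consensus(items, threshold: float = ALPHA_FLAG_THRESHOLD):
    """Average the judges per item and flag the eval if agreement is too low."""
    per_item = [mean(s for s in row if s is not None) for row in items]
    alpha = krippendorff_alpha_interval(items)
    return per_item, alpha, alpha < threshold


# Usage: three items, each scored by three judges (e.g. different models or framings).
scores = [
    [1.0, 0.75, 1.0],
    [0.25, 0.25, 0.5],
    [0.0, 0.0, 0.25],
]
per_item, alpha, flagged = consensus(scores)
print(per_item, round(alpha, 3), flagged)
```

With these scores the judges broadly agree (α = 19/22, about 0.86), so the eval is not flagged; two judges who split 0.0 and 1.0 on every item would give a negative α and be flagged.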

Related

Why-How-Who Framework: OFL’s facilitation eval dimensions, and the natural content for weval blueprints
Facilitation in the LLM Era: the survey that maps what to measure; weval is how to measure it, in the open
WHoW Framework: an academic framework for analysing moderation
