The Equivalence Principle: How GenLayer Validators Agree on AI Outputs
A visual breakdown of the three types of equivalence checks — and how to pick the right one for your contract.
The core challenge
Imagine two validators independently ask an LLM to classify a product review as positive or negative. One gets "POSITIVE". The other gets "This review expresses satisfaction." Both are correct answers to the same question — but they're completely different strings.
On a traditional blockchain, consensus requires identical outputs. These two results would never match, and the transaction would fail.
GenLayer's solution to this is the Equivalence Principle — a developer-defined rule that tells validators what "equivalent" means for a given contract.
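To make the idea concrete, here is a minimal plain-Python simulation. It is not GenLayer code. An equivalence rule is modeled as a function that compares the Leader's output with one validator's output, and the result is accepted if a majority of validators agree. The names `exact_match`, `same_polarity`, and `accepted` are hypothetical. So are the keyword check that stands in for an LLM and the simple majority vote.

```python
from typing import Callable, Sequence

# An equivalence rule: (leader_output, validator_output) -> counts as the same?
EquivalenceRule = Callable[[str, str], bool]


def exact_match(leader: str, validator: str) -> bool:
    # Byte-for-byte equality, as a traditional blockchain would require.
    return leader == validator


def same_polarity(leader: str, validator: str) -> bool:
    # Toy developer-defined rule: both outputs agree on whether the review
    # is positive. A keyword check stands in for an LLM judgment.
    markers = ("positive", "satisf")

    def is_positive(text: str) -> bool:
        return any(m in text.lower() for m in markers)

    return is_positive(leader) == is_positive(validator)


def accepted(leader: str, validators: Sequence[str], rule: EquivalenceRule) -> bool:
    # Simplified vote: accept if a strict majority of validators agree.
    agreeing = sum(rule(leader, v) for v in validators)
    return agreeing * 2 > len(validators)


validator_outputs = ["This review expresses satisfaction.", "POSITIVE", "positive"]
print(accepted("POSITIVE", validator_outputs, exact_match))    # False
print(accepted("POSITIVE", validator_outputs, same_polarity))  # True
```

Under exact matching, the reworded answers sink the vote. A rule that captures what the developer actually cares about lets the same outputs reach agreement.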
The three types at a glance
(Figure: three example output pairs.)
Leader: {"sentiment":"POSITIVE"} · Validator: {"sentiment":"POSITIVE"}
Leader: {"sentiment":"POSITIVE"} · Validator: {"sentiment":"positive"}
Leader: {"sentiment":"POSITIVE"} · Validator: {"Sentiment":"POSITIVE"}
What the Equivalence Principle actually is
It's not a fixed rule baked into the protocol. It's a mechanism that lets you, the developer, define the acceptance criteria for your contract's non-deterministic outputs.
Every time your contract calls gl.nondet.exec_prompt() or gl.nondet.web.*, the result goes through an equivalence check before validators can agree. You choose which type of check fits your use case.
Option 1 — strict_eq
The strictest and fastest option. All validators must produce byte-identical results.
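import json
from genlayer import *

# Runs inside a contract method; `text` is the review being classified.
def run():
    result = gl.nondet.exec_prompt(
        f'Classify as POSITIVE, NEGATIVE, or NEUTRAL. '
        f'Text: "{text}". '
        f'Respond ONLY with JSON: {{"sentiment": "..."}}',
        response_format="json",
    )
    # Serialize with sorted keys so equal objects produce equal strings.
    return json.dumps(result, sort_keys=True)  # sort_keys is critical!

raw = gl.eq_principle.strict_eq(run)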
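The sort_keys=True part matters. Python dicts keep keys in insertion order, which follows whatever order the LLM happened to emit them in. Without sorting, two validators could serialize the same JSON object with keys in different orders, breaking consensus even though the content is identical.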
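You can reproduce the effect with Python's standard `json` module alone. This sketch adds a hypothetical second field, `confidence`, so that key order can actually differ. A single-key object like the one above cannot show the problem.

```python
import json

# The same answer parsed twice, with keys in a different order.
# `confidence` is a hypothetical extra field.
leader_result = {"sentiment": "POSITIVE", "confidence": "high"}
validator_result = {"confidence": "high", "sentiment": "POSITIVE"}

# As Python objects they are equal...
print(leader_result == validator_result)  # True

# ...but default serialization preserves insertion order, so the strings differ.
print(json.dumps(leader_result) == json.dumps(validator_result))  # False

# sort_keys=True yields one canonical string for both.
print(json.dumps(leader_result, sort_keys=True) ==
      json.dumps(validator_result, sort_keys=True))  # True
```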
When it works: The output space is small and well-defined. Sentiment classification with three possible values, yes/no decisions, numeric calculations, fixed-schema JSON responses.
When it breaks: Anything open-ended. Prices that fluctuate between the Leader's fetch and a validator's fetch. Summaries where two validators phrase things differently.
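One way to keep the output space small enough for strict_eq is to normalize the LLM's answer inside run() before returning it. Harmless variations, such as a lowercase value (`"positive"`) or a capitalized key (`"Sentiment"`), then collapse to one canonical string. The `canonicalize` helper below is a hypothetical plain-Python sketch, not part of the GenLayer SDK.

```python
import json

ALLOWED_LABELS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


def canonicalize(result: dict) -> str:
    """Map a parsed LLM answer onto a fixed schema and serialize it.

    Hypothetical helper: lowercases keys, trims and uppercases the label,
    rejects anything outside the allowed set, and drops extra fields.
    """
    lowered = {str(key).lower(): value for key, value in result.items()}
    label = str(lowered.get("sentiment", "")).strip().upper()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return json.dumps({"sentiment": label}, sort_keys=True)


# Three raw variants of the same answer collapse to one string.
variants = [{"sentiment": "POSITIVE"}, {"sentiment": "positive"}, {"Sentiment": "POSITIVE"}]
print({canonicalize(v) for v in variants})  # {'{"sentiment": "POSITIVE"}'}
```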
In our benchmarks, strict_eq added about 6 seconds of overhead over a pure-Python baseline, making it the most efficient option whenever it applies.
Option 2 — prompt_comparative
A softer check. Each validator runs the contract independently, then uses a second LLM call to judge whether their output and the Leader's output are semantically equivalent.
from genlayer import *

# Runs inside a contract method; `text` is the input to summarize.
def run():
    return gl.nondet.exec_prompt(
        'Summarize this in one sentence: ' + text
    )

result = gl.eq_principle.prompt_comparative(
    run,
    principle="The summaries are equivalent if they convey the same main idea."
)
The principle parameter is your semantic equivalence rule — the LLM uses it to decide if the outputs are close enough.
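The comparative flow can be sketched in plain Python. The validator re-runs the same function, then passes both outputs and the principle to a judge. In GenLayer that judge is an LLM. Here `word_overlap_judge` is a hypothetical deterministic stand-in that uses word-set overlap with an arbitrary 0.3 threshold. `comparative_vote` is an illustrative name, not an SDK function.

```python
import re
from typing import Callable

# Judge signature: (leader_output, own_output, principle) -> equivalent?
Judge = Callable[[str, str, str], bool]


def word_overlap_judge(a: str, b: str, principle: str) -> bool:
    # Hypothetical deterministic stand-in for the LLM judge. It ignores
    # `principle` and uses Jaccard overlap of word sets instead.
    words_a = set(re.findall(r"[a-z']+", a.lower()))
    words_b = set(re.findall(r"[a-z']+", b.lower()))
    if not words_a or not words_b:
        return False
    overlap = len(words_a & words_b) / len(words_a | words_b)
    return overlap >= 0.3


def comparative_vote(leader_output: str, run: Callable[[], str],
                     principle: str, judge: Judge) -> bool:
    # Step 1: the validator re-runs the same function on its own.
    own_output = run()
    # Step 2: a second call decides equivalence under the principle.
    return judge(leader_output, own_output, principle)


principle = "The summaries are equivalent if they convey the same main idea."
leader = "The phone's battery lasts two days."
print(comparative_vote(leader, lambda: "Battery life on the phone is about two days.",
                       principle, word_overlap_judge))  # True
print(comparative_vote(leader, lambda: "The screen cracked after a week.",
                       principle, word_overlap_judge))  # False
```

A real LLM judge is far better at recognizing paraphrase than word overlap is. The structure is the point here: re-run, then compare under an explicit principle.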
When it works: Open-ended text where you expect variation in wording but consistency in meaning. Summaries, analyses, explanations.
A real-world gotcha: We tried using strict_eq for a price feed contract that fetched live crypto prices. It failed constantly — validators would get slightly different prices depending on when exactly they made the API call. The right approach for volatile data is prompt_comparative with a tolerance rule.
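State the tolerance in the principle as a number rather than as "close enough." The sketch below pairs a hypothetical principle string with a plain-Python function that encodes the same rule. Testing that function against sample prices offline can help you pick a threshold. The 1% figure, the sample prices, and the names `PRICE_PRINCIPLE` and `prices_equivalent` are all assumptions.

```python
import math

# Hypothetical principle text for a price feed; the 1% tolerance is an assumption.
PRICE_PRINCIPLE = (
    "The two results are equivalent if the prices differ by no more than 1%."
)


def prices_equivalent(leader_price: float, validator_price: float,
                      rel_tol: float = 0.01) -> bool:
    # Same rule in code: |a - b| <= rel_tol * max(|a|, |b|).
    return math.isclose(leader_price, validator_price, rel_tol=rel_tol)


# Two fetches a few seconds apart (made-up values).
print(64000.0 == 64250.0)                   # False: strict_eq would fail
print(prices_equivalent(64000.0, 64250.0))  # True: within 1%
print(prices_equivalent(64000.0, 65000.0))  # False: about 1.5% apart
```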
In the same benchmarks, prompt_comparative added about 12 seconds of overhead over the baseline, double that of strict_eq.
Option 3 — prompt_non_comparative
The most flexible option. Validators don't re-run the contract at all — they evaluate whether the Leader's output meets a set of predefined criteria.
from genlayer import *

# Runs inside a contract method; `article` is the text to summarize.
def run():
    return gl.nondet.exec_prompt(
        'Write a one-paragraph summary of this article: ' + article
    )

result = gl.eq_principle.prompt_non_comparative(
    run,
    principle="A valid summary: (1) accurately reflects main points, "
              "(2) is under 100 words, (3) uses neutral tone."
)
When it works: Subjective or creative output where validators judging quality makes more sense than validators replicating the work.
The tradeoff: Your criteria need to be specific enough that different validators applying them will consistently reach the same verdict. Vague criteria lead to inconsistent votes.
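Criteria that can be checked mechanically give every validator the same verdict. The plain-Python sketch below models a validator that scores the Leader's summary against two such checks. `validator_verdict` and the check functions are hypothetical names. The "accurately reflects main points" and "neutral tone" criteria are deliberately left out, because they still require an LLM's judgment.

```python
from typing import Callable, Dict

# Each criterion is a named check applied to the Leader's output only.
Criteria = Dict[str, Callable[[str], bool]]


def under_100_words(summary: str) -> bool:
    return len(summary.split()) < 100


def single_paragraph(summary: str) -> bool:
    # No blank line separating paragraphs.
    return "\n\n" not in summary.strip()


CHECKABLE_CRITERIA: Criteria = {
    "is under 100 words": under_100_words,
    "is one paragraph": single_paragraph,
}


def validator_verdict(leader_output: str, criteria: Criteria) -> bool:
    # Validators do not re-run the task; they only evaluate the Leader's output.
    return all(check(leader_output) for check in criteria.values())


short_summary = "The article argues that rising rates slowed housing sales."
long_summary = " ".join(["word"] * 150)
print(validator_verdict(short_summary, CHECKABLE_CRITERIA))  # True
print(validator_verdict(long_summary, CHECKABLE_CRITERIA))   # False
```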
Choosing the right one
Can your output be constrained to a fixed schema with a small set of possible values? Use strict_eq. It's the fastest and most predictable.
Is your output open-ended text where meaning matters more than exact wording? Use prompt_comparative. Write a clear, specific principle.
Is your output something where re-running doesn't make sense, and validators should just judge the Leader's result? Use prompt_non_comparative. Be very precise with your criteria.
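The three questions work as a short decision procedure, checked in order. The hypothetical helper `choose_eq_principle` below simply encodes that order. It is a reading aid, not part of any SDK.

```python
def choose_eq_principle(*, fixed_schema: bool, meaning_over_wording: bool,
                        rerun_makes_sense: bool = True) -> str:
    # Question 1: small, fixed output space? Exact comparison is enough.
    if fixed_schema:
        return "strict_eq"
    # Question 2: open-ended text that validators can regenerate and compare.
    if meaning_over_wording and rerun_makes_sense:
        return "prompt_comparative"
    # Question 3: otherwise, validators judge the Leader's result directly.
    return "prompt_non_comparative"


print(choose_eq_principle(fixed_schema=True, meaning_over_wording=False))
# strict_eq
print(choose_eq_principle(fixed_schema=False, meaning_over_wording=True))
# prompt_comparative
print(choose_eq_principle(fixed_schema=False, meaning_over_wording=True,
                          rerun_makes_sense=False))
# prompt_non_comparative
```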
The developer responsibility
The Equivalence Principle is your responsibility to get right. The protocol gives you the tools, but the correctness of your consensus logic depends entirely on how well you define equivalence for your specific use case.
A poorly defined equivalence principle leads to one of two failure modes. Too strict: validators disagree constantly and transactions keep failing. Too loose: validators accept outputs that are actually wrong.
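You can measure both failure modes on a handful of labeled examples before deploying. The toy sketch below scores three hypothetical rules on hand-written Leader/validator pairs. For each rule it counts correct answers that were rejected and wrong answers that were accepted. The cases and rule names are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[str, str], bool]

# (leader_output, validator_output, should_be_accepted): invented examples.
CASES: List[Tuple[str, str, bool]] = [
    ("POSITIVE", "POSITIVE", True),
    ("POSITIVE", "positive", True),    # same answer, different case
    ("POSITIVE", " Positive ", True),  # same answer, stray whitespace
    ("POSITIVE", "NEGATIVE", False),   # genuinely different answer
]

RULES: Dict[str, Rule] = {
    "too strict": lambda a, b: a == b,
    # Any two non-empty answers pass.
    "too loose": lambda a, b: bool(a.strip()) and bool(b.strip()),
    "balanced": lambda a, b: a.strip().upper() == b.strip().upper(),
}


def score(rule: Rule) -> Tuple[int, int]:
    # Returns (false_rejects, false_accepts) over CASES.
    false_rejects = sum(1 for a, b, ok in CASES if ok and not rule(a, b))
    false_accepts = sum(1 for a, b, ok in CASES if not ok and rule(a, b))
    return false_rejects, false_accepts


for name, rule in RULES.items():
    print(name, score(rule))
# too strict (2, 0)
# too loose (0, 1)
# balanced (0, 0)
```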
The good news is that GenLayer Studio's validator logs make it reasonably easy to debug. When validators disagree, you can see exactly what each one produced and why the comparison failed.
This is part of a series of visual guides to GenLayer's core concepts. Previous: How Optimistic Democracy Works. Next: GenLayer Architecture explained layer by layer.