The Equivalence Principle: How GenLayer Validators Agree on AI Outputs

Posted on April 7, 2026 • Tags: GenLayer, Equivalence Principle, Consensus

A visual breakdown of the three types of equivalence checks — and how to pick the right one for your contract.


The core challenge

Imagine two validators independently ask an LLM to classify a product review as positive or negative. One gets "POSITIVE". The other gets "This review expresses satisfaction." Both are correct answers to the same question — but they're completely different strings.

On a traditional blockchain, consensus requires identical outputs. These two results would never match, and the transaction would fail.

GenLayer's solution to this is the Equivalence Principle — a developer-defined rule that tells validators what "equivalent" means for a given contract.


The three types at a glance

Equivalence Principle: How Validators Agree on Non-Deterministic Outputs

strict_eq: byte-identical results (fastest)
prompt_comparative: semantic similarity check (medium)
prompt_non_comparative: criteria-based evaluation (flexible)

strict_eq

Leader validator:
1. Executes the contract fully: LLM call, web fetch, or computation.
2. Returns a tightly constrained output: {"sentiment":"POSITIVE"}
3. Always uses json.dumps(result, sort_keys=True) to ensure consistent key ordering.

Other validators:
1. Re-execute the same contract independently.
2. Compare their output with the Leader's byte for byte.
3. Must match exactly; even a one-character difference counts as a disagreement.

Consensus examples:
Leader: {"sentiment":"POSITIVE"} · Validator: {"sentiment":"POSITIVE"} → AGREE
Leader: {"sentiment":"POSITIVE"} · Validator: {"sentiment":"positive"} → DISAGREE
Leader: {"sentiment":"POSITIVE"} · Validator: {"Sentiment":"POSITIVE"} → DISAGREE (key case)

Use when: output is constrained (yes/no, fixed enum, numeric value, sorted JSON).
Avoid when: output is open-ended text, prices that drift, or creative content.

prompt_comparative

Leader validator:
1. Generates an open-ended output: a summary, classification, or analysis.
2. Output doesn't need to be constrained; natural language is fine.

Other validators:
1. Run the same contract and get their own output.
2. Run a second LLM call to compare their output with the Leader's output.
3. The LLM decides: are these semantically equivalent based on the principle?

Consensus examples:
Leader: "Bitcoin is a decentralized currency" · Validator: "BTC operates without a central bank" → AGREE (same idea)
Leader: "Bitcoin is rising" · Validator: "Bitcoin is falling rapidly" → DISAGREE

Note: Each validator makes 2 LLM calls (execute + compare), adding ~6s latency vs strict_eq.

Use when: output is text that can vary in wording but should convey the same meaning.
Avoid when: output can be constrained to a fixed schema; use strict_eq instead (faster).

prompt_non_comparative

Leader validator:
1. Generates output: could be a summary, analysis, or creative content.
2. Output is evaluated against predefined criteria, not compared to others.

Other validators:
1. Do NOT re-execute the contract; no second LLM call is needed.
2. Evaluate whether the Leader's output meets the criteria in the principle.
3. Vote: does it satisfy accuracy, relevance, and format requirements?

Evaluation criteria check:
Accurately reflects main points of the article? YES
Under 100 words? YES (87 words)
Uses neutral tone? YES
→ Leader's output ACCEPTED

Use when: output is subjective/creative and validators should judge quality, not compare wording.
Avoid when: criteria are vague; validators need clear, specific rules to evaluate against.

What the Equivalence Principle actually is

It's not a fixed rule baked into the protocol. It's a mechanism that lets you, the developer, define the acceptance criteria for your contract's non-deterministic outputs.

Every time your contract calls gl.nondet.exec_prompt() or gl.nondet.web.*, the result goes through an equivalence check before validators can agree. You choose which type of check fits your use case.


Option 1 — strict_eq

The strictest and fastest option. All validators must produce byte-identical results.

import json

def run():
    # `gl` and `text` (the review being classified) come from the enclosing contract.
    result = gl.nondet.exec_prompt(
        f'Classify as POSITIVE, NEGATIVE, or NEUTRAL. '
        f'Text: "{text}". '
        f'Respond ONLY with JSON: {{"sentiment": "..."}}',
        response_format="json"
    )
    return json.dumps(result, sort_keys=True)  # sort_keys is critical!

raw = gl.eq_principle.strict_eq(run)

The sort_keys=True part is important — without it, different validators might serialize the same JSON object with keys in different orders, breaking consensus even when the content is identical.
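The key-ordering problem is easy to reproduce with the standard library alone. The sketch below uses two hand-written dictionaries in place of real validator results; the "confidence" field is a made-up extra key, added only so there is more than one key to reorder.

```python
import json

# Two results with identical content but different key insertion order.
# The "confidence" key is hypothetical, used only to have two keys.
leader_result = {"sentiment": "POSITIVE", "confidence": "high"}
validator_result = {"confidence": "high", "sentiment": "POSITIVE"}

# As Python dicts they are equal...
print(leader_result == validator_result)  # True

# ...but the default serialization preserves insertion order, so the strings differ.
print(json.dumps(leader_result) == json.dumps(validator_result))  # False

# Sorting keys gives both validators the same canonical string.
leader_raw = json.dumps(leader_result, sort_keys=True)
validator_raw = json.dumps(validator_result, sort_keys=True)
print(leader_raw == validator_raw)  # True
```

Since strict_eq compares the returned strings, it is the serialized form that has to match, not the parsed object.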

When it works: The output space is small and well-defined. Sentiment classification with three possible values, yes/no decisions, numeric calculations, fixed-schema JSON responses.

When it breaks: Anything open-ended. Prices that fluctuate between the Leader's fetch and a validator's fetch. Summaries where two validators phrase things differently.
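One way to keep the output space small is to canonicalize the parsed result before serializing it, so harmless differences in letter case never reach the byte comparison. The helper below is a hypothetical sketch, not part of the GenLayer SDK; it assumes the result arrives as a plain dict, and whether to normalize or to let such mismatches fail is a design choice for each contract.

```python
import json

ALLOWED_LABELS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}

def canonical_sentiment(result: dict) -> str:
    """Hypothetical helper: reduce a parsed LLM result to one canonical JSON string."""
    # Lowercase the keys so "Sentiment" and "sentiment" are treated alike.
    lowered = {str(key).lower(): value for key, value in result.items()}
    # Uppercase the label so "positive" and "POSITIVE" are treated alike.
    label = str(lowered.get("sentiment", "")).strip().upper()
    if label not in ALLOWED_LABELS:
        # Refuse anything outside the fixed enum rather than guessing.
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return json.dumps({"sentiment": label}, sort_keys=True)

print(canonical_sentiment({"sentiment": "positive"}))
print(canonical_sentiment({"Sentiment": "POSITIVE"}))
```

Both calls produce the same string, so two validators whose models differ only in capitalization would return identical output.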

From benchmarks, strict_eq adds about 6 seconds of overhead over a pure-Python baseline. It's the most efficient option when applicable.


Option 2 — prompt_comparative

A softer check. Each validator runs the contract independently, then uses a second LLM call to judge whether their output and the Leader's output are semantically equivalent.

def run():
    return gl.nondet.exec_prompt(
        'Summarize this in one sentence: ' + text
    )

result = gl.eq_principle.prompt_comparative(
    run,
    principle="The summaries are equivalent if they convey the same main idea."
)

The principle parameter is your semantic equivalence rule — the LLM uses it to decide if the outputs are close enough.

When it works: Open-ended text where you expect variation in wording but consistency in meaning. Summaries, analyses, explanations.

A real-world gotcha: We tried using strict_eq for a price feed contract that fetched live crypto prices. It failed constantly — validators would get slightly different prices depending on when exactly they made the API call. The right approach for volatile data is prompt_comparative with a tolerance rule.
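To make "a tolerance rule" concrete, here is a minimal sketch of what such a rule means numerically. The 1% threshold and the principle wording are illustrative assumptions, not values from the original contract. In prompt_comparative the validators' LLM applies the principle text; the function only spells out the arithmetic that the sentence describes.

```python
import math

# Hypothetical principle text that could be passed as `principle=...`.
PRICE_PRINCIPLE = (
    "Two prices are equivalent if they differ by no more than 1% "
    "of the larger price."
)

def within_tolerance(leader_price: float, validator_price: float,
                     rel_tol: float = 0.01) -> bool:
    """True if the two prices differ by at most rel_tol of the larger one."""
    return math.isclose(leader_price, validator_price, rel_tol=rel_tol)

print(within_tolerance(64000.0, 64300.0))  # True: differs by 300, limit is 643
print(within_tolerance(64000.0, 66000.0))  # False: differs by 2000, limit is 660
```

Stating the threshold as a number in the principle gives every validator the same line to judge against, instead of leaving "close enough" to each model's interpretation.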

Since validators each make two LLM calls (execute + compare), prompt_comparative adds about 12 seconds of overhead vs baseline, double the cost of strict_eq.


Option 3 — prompt_non_comparative

The most flexible option. Validators don't re-run the contract at all — they evaluate whether the Leader's output meets a set of predefined criteria.

def run():
    return gl.nondet.exec_prompt(
        'Write a one-paragraph summary of this article: ' + article
    )

result = gl.eq_principle.prompt_non_comparative(
    run,
    principle="A valid summary: (1) accurately reflects main points, "
              "(2) is under 100 words, (3) uses neutral tone."
)

When it works: Subjective or creative output where validators judging quality makes more sense than validators replicating the work.

The tradeoff: Your criteria need to be specific enough that different validators applying them will consistently reach the same verdict. Vague criteria lead to inconsistent votes.
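A quick test of whether a criterion is specific enough is to ask whether it could be checked mechanically. The word limit in the principle above can; "neutral tone" cannot, and needs more careful wording. The sketch below is a plain-Python illustration of the mechanical case, not something the protocol runs; counting words by splitting on whitespace is an assumption about what "word" means.

```python
def under_word_limit(summary: str, limit: int = 100) -> bool:
    """True if the summary has fewer than `limit` whitespace-separated words."""
    return len(summary.split()) < limit

short_summary = " ".join(["word"] * 87)
long_summary = " ".join(["word"] * 100)

print(under_word_limit(short_summary))  # True: 87 words
print(under_word_limit(long_summary))   # False: exactly 100 is not "under 100"
```

The boundary case is the point: "under 100 words" and "at most 100 words" give different verdicts at exactly 100, which is the kind of ambiguity that can split validator votes.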


Choosing the right one

Can your output be constrained to a fixed schema with a small set of possible values? Use strict_eq. It's the fastest and most predictable.

Is your output open-ended text where meaning matters more than exact wording? Use prompt_comparative. Write a clear, specific principle.

Is your output something where re-running doesn't make sense, and validators should just judge the Leader's result? Use prompt_non_comparative. Be very precise with your criteria.
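The three questions above can be read as a small decision procedure. The function below is only a restatement of that checklist in code; the parameter names are made up for illustration and nothing like it exists in the SDK.

```python
def choose_equivalence_check(fixed_schema: bool, rerun_makes_sense: bool) -> str:
    """Return the name of the check suggested by the checklist above."""
    if fixed_schema:
        # Small, well-defined output space: compare bytes.
        return "strict_eq"
    if rerun_makes_sense:
        # Open-ended text that each validator can regenerate and compare.
        return "prompt_comparative"
    # Validators judge the Leader's result against criteria instead.
    return "prompt_non_comparative"

print(choose_equivalence_check(fixed_schema=True, rerun_makes_sense=True))
print(choose_equivalence_check(fixed_schema=False, rerun_makes_sense=True))
print(choose_equivalence_check(fixed_schema=False, rerun_makes_sense=False))
```

The fixed-schema question comes first on purpose: if the output can be constrained, strict_eq wins regardless of the other answer because it is the cheapest check.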


The developer responsibility

The Equivalence Principle is your responsibility to get right. The protocol gives you the tools, but the correctness of your consensus logic depends entirely on how well you define equivalence for your specific use case.

A poorly defined equivalence principle fails in one of two ways. If it is too strict, validators disagree constantly and transactions keep failing. If it is too loose, validators accept outputs that are actually wrong.
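Both failure modes show up in a toy model where an equivalence rule is just a function of two strings. The three rules below are hypothetical stand-ins written for this illustration; real checks run inside validators, not as local functions.

```python
from typing import Callable, Dict

Rule = Callable[[str, str], bool]

def byte_identical(a: str, b: str) -> bool:
    """Too strict for free-form labels: any difference is a disagreement."""
    return a == b

def same_label(a: str, b: str) -> bool:
    """Ignores case and surrounding whitespace, nothing else."""
    return a.strip().upper() == b.strip().upper()

def anything_goes(a: str, b: str) -> bool:
    """Too loose: every pair of outputs counts as equivalent."""
    return True

def verdicts(rule: Rule) -> Dict[str, bool]:
    """Apply a rule to one harmless variant and one wrong answer."""
    leader = "POSITIVE"
    return {
        "accepts_harmless_variant": rule(leader, "positive"),
        "accepts_wrong_answer": rule(leader, "NEGATIVE"),
    }

for rule in (byte_identical, same_label, anything_goes):
    print(rule.__name__, verdicts(rule))
```

A well-chosen rule accepts the harmless variant and rejects the wrong answer; the strict rule fails the first test and the loose rule fails the second.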

The good news is that GenLayer Studio's validator logs make it reasonably easy to debug. When validators disagree, you can see exactly what each one produced and why the comparison failed.


This is part of a series of visual guides to GenLayer's core concepts. Previous: How Optimistic Democracy Works. Next: GenLayer Architecture explained layer by layer.