The Equivalence Principle: How GenLayer Validators Agree on AI Outputs

Posted on April 7, 2026 • Tags: GenLayer, Equivalence Principle, Consensus

A visual breakdown of the three types of equivalence checks — and how to pick the right one for your contract.

The core challenge

Imagine two validators independently ask an LLM to classify a product review as positive or negative. One gets "POSITIVE". The other gets "This review expresses satisfaction." Both are correct answers to the same question — but they're completely different strings.

On a traditional blockchain, consensus requires identical outputs. These two results would never match, and the transaction would fail.

GenLayer's solution to this is the Equivalence Principle — a developer-defined rule that tells validators what "equivalent" means for a given contract.

The three types at a glance

Equivalence Principle: how validators agree on non-deterministic outputs

strict_eq: byte-identical results (⚡ fastest)

prompt_comparative: semantic similarity check (⏱ medium)

prompt_non_comparative: criteria-based evaluation (🔍 flexible)

strict_eq

👑 Leader validator

Executes the contract fully — LLM call, web fetch, or computation

Returns a tightly-constrained output: {"sentiment":"POSITIVE"}

Always uses json.dumps(result, sort_keys=True) to ensure consistent key ordering

🔵 Other validators

Re-execute the same contract independently

Compare their output with the Leader's byte-for-byte

Must match exactly — even one character difference = disagree

def run():
    result = gl.nondet.exec_prompt(
        'Classify as POSITIVE, NEGATIVE, or NEUTRAL. '
        f'Text: "{text}". Respond ONLY with JSON: {{"sentiment": "..."}}',
        response_format="json"
    )
    return json.dumps(result, sort_keys=True)  # sort_keys critical!

raw = gl.eq_principle.strict_eq(run)

Consensus examples

✓ AGREE · Leader: {"sentiment":"POSITIVE"} · Validator: {"sentiment":"POSITIVE"}

✗ DISAGREE · Leader: {"sentiment":"POSITIVE"} · Validator: {"sentiment":"positive"}

✗ DISAGREE (key case) · Leader: {"sentiment":"POSITIVE"} · Validator: {"Sentiment":"POSITIVE"}

✓ Use when: output is constrained (yes/no, fixed enum, numeric value, sorted JSON)

✗ Avoid when: output is open-ended text, prices that drift, or creative content

prompt_comparative

👑 Leader validator

Generates an open-ended output: a summary, classification, or analysis

Output doesn't need to be constrained — natural language is fine

🔵 Other validators

Run the same contract and get their own output

Run a second LLM call to compare their output vs Leader's output

LLM decides: are these semantically equivalent based on the principle?

def run():
    return gl.nondet.exec_prompt(
        'Summarize this in one sentence: ' + text
    )

result = gl.eq_principle.prompt_comparative(
    run,
    principle="Summaries are equivalent if they convey the same main idea."
)

Consensus examples

✓ AGREE (same idea) · Leader: "Bitcoin is a decentralized currency" · Validator: "BTC operates without a central bank"

✗ DISAGREE · Leader: "Bitcoin is rising" · Validator: "Bitcoin is falling rapidly"

⏱ Note: Each validator makes 2 LLM calls (execute + compare), adding ~6s latency vs strict_eq.

✓ Use when: output is text that can vary in wording but should convey the same meaning

✗ Avoid when: output can be constrained to a fixed schema; use strict_eq instead (faster)

prompt_non_comparative

👑 Leader validator

Generates output — could be a summary, analysis, creative content

Output is evaluated against predefined criteria, not compared to others

🔵 Other validators

Do NOT re-execute the contract — no second LLM call needed

Evaluate whether the Leader's output meets the criteria in the principle

Vote: does it satisfy accuracy, relevance, format requirements?

def run():
    return gl.nondet.exec_prompt(
        'Write a one-paragraph summary of this article: ' + article
    )

result = gl.eq_principle.prompt_non_comparative(
    run,
    principle="A valid summary: (1) accurately reflects main points, "
              "(2) is under 100 words, (3) uses neutral tone."
)

Evaluation criteria check

✓ Accurately reflects main points of the article? YES

✓ Under 100 words? YES (87 words)

✓ Uses neutral tone? YES

→ Leader's output ACCEPTED

✓ Use when: output is subjective or creative and validators should judge quality, not compare wording

✗ Avoid when: criteria are vague; validators need clear, specific rules to evaluate against


What the Equivalence Principle actually is

It's not a fixed rule baked into the protocol. It's a mechanism that lets you, the developer, define the acceptance criteria for your contract's non-deterministic outputs.

Every time your contract calls gl.nondet.exec_prompt() or gl.nondet.web.*, the result goes through an equivalence check before validators can agree. You choose which type of check fits your use case.
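To make the idea of a pluggable check concrete, here is a toy model in plain Python. It is not how GenLayer implements consensus internally; the function names (collect_votes, exact_match, same_label) are made up for illustration, and the sketch only shows that the same validator outputs can produce different votes depending on which equivalence rule you plug in.

```python
from typing import Callable, List


def collect_votes(
    leader_output: str,
    validator_outputs: List[str],
    is_equivalent: Callable[[str, str], bool],
) -> List[bool]:
    """Return one agree/disagree vote per validator (toy model only)."""
    return [is_equivalent(leader_output, out) for out in validator_outputs]


def exact_match(a: str, b: str) -> bool:
    # Byte-for-byte comparison, in the spirit of a strict check.
    return a == b


def same_label(a: str, b: str) -> bool:
    # Hypothetical looser rule: ignore case and surrounding whitespace.
    return a.strip().lower() == b.strip().lower()


leader = "POSITIVE"
validators = ["POSITIVE", "positive", "NEGATIVE"]

strict_votes = collect_votes(leader, validators, exact_match)
loose_votes = collect_votes(leader, validators, same_label)

print(strict_votes)  # [True, False, False]
print(loose_votes)   # [True, True, False]
```

The second validator flips from disagree to agree purely because the rule changed, which is the decision the Equivalence Principle hands to you.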

Option 1 — strict_eq

The strictest and fastest option. All validators must produce byte-identical results.

import json

def run():
    # Ask for a tightly constrained JSON reply so validators can match it exactly.
    result = gl.nondet.exec_prompt(
        f'Classify as POSITIVE, NEGATIVE, or NEUTRAL. '
        f'Text: "{text}". '
        f'Respond ONLY with JSON: {{"sentiment": "..."}}',
        response_format="json"
    )
    return json.dumps(result, sort_keys=True)  # sort_keys is critical!

raw = gl.eq_principle.strict_eq(run)

The sort_keys=True part is important — without it, different validators might serialize the same JSON object with keys in different orders, breaking consensus even when the content is identical.
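You can see the key-ordering problem with nothing but the standard library. The sketch below uses a made-up second field ("confidence") that is not part of the contract above; it is only there so the two dicts can differ in insertion order.

```python
import json

# Same content, different insertion order ("confidence" is a hypothetical field).
a = {"sentiment": "POSITIVE", "confidence": "high"}
b = {"confidence": "high", "sentiment": "POSITIVE"}

unsorted_a = json.dumps(a)
unsorted_b = json.dumps(b)
sorted_a = json.dumps(a, sort_keys=True)
sorted_b = json.dumps(b, sort_keys=True)

print(a == b)                    # True: equal as Python dicts
print(unsorted_a == unsorted_b)  # False: the serialized strings differ
print(sorted_a == sorted_b)      # True: sorting keys makes them identical
```

Two objects that are equal as data still fail a byte-for-byte comparison once serialized, unless the key order is pinned.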

When it works: The output space is small and well-defined. Sentiment classification with three possible values, yes/no decisions, numeric calculations, fixed-schema JSON responses.

When it breaks: Anything open-ended. Prices that fluctuate between the Leader's fetch and a validator's fetch. Summaries where two validators phrase things differently.
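One way to stay on the "works" side is to canonicalize the model's reply before returning it, so harmless differences in casing never reach the byte comparison. This is a plain-Python sketch, not part of the GenLayer SDK: canonical_sentiment and ALLOWED are hypothetical names, and it assumes your contract is free to post-process the parsed reply inside run().

```python
import json

# The three labels the prompt asks for.
ALLOWED = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


def canonical_sentiment(raw: dict) -> str:
    """Map a parsed LLM reply to one canonical JSON string, or raise."""
    # Accept any key casing ("Sentiment", "SENTIMENT") by lowercasing keys.
    lowered = {str(k).lower(): v for k, v in raw.items()}
    label = str(lowered.get("sentiment", "")).strip().upper()
    if label not in ALLOWED:
        raise ValueError(f"unexpected sentiment: {label!r}")
    return json.dumps({"sentiment": label}, sort_keys=True)


print(canonical_sentiment({"sentiment": "positive"}))
print(canonical_sentiment({"Sentiment": " POSITIVE "}))
```

Both calls print the same string, so replies that differ only in letter case or key case would no longer cause a strict mismatch. Anything outside the three allowed labels raises instead of silently passing through.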

From benchmarks, strict_eq adds about 6 seconds of overhead over a pure-Python baseline. It's the most efficient option when applicable.

Option 2 — prompt_comparative

A softer check. Each validator runs the contract independently, then uses a second LLM call to judge whether their output and the Leader's output are semantically equivalent.

def run():
    return gl.nondet.exec_prompt(
        'Summarize this in one sentence: ' + text
    )

result = gl.eq_principle.prompt_comparative(
    run,
    principle="The summaries are equivalent if they convey the same main idea."
)

The principle parameter is your semantic equivalence rule — the LLM uses it to decide if the outputs are close enough.

When it works: Open-ended text where you expect variation in wording but consistency in meaning. Summaries, analyses, explanations.

A real-world gotcha: We tried using strict_eq for a price feed contract that fetched live crypto prices. It failed constantly — validators would get slightly different prices depending on when exactly they made the API call. The right approach for volatile data is prompt_comparative with a tolerance rule.
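A tolerance rule lives in the natural-language principle, but it helps to spell out the arithmetic you are asking the LLM to apply. The helper below is an illustration only: within_tolerance is a hypothetical name, and the 1% default is an arbitrary example value, not a recommendation.

```python
def within_tolerance(leader_price: float, validator_price: float, pct: float = 1.0) -> bool:
    """True if the prices differ by at most `pct` percent of the leader's price."""
    if leader_price <= 0:
        raise ValueError("leader_price must be positive")
    diff_pct = abs(leader_price - validator_price) / leader_price * 100.0
    return diff_pct <= pct


print(within_tolerance(64000.0, 64320.0))  # True: 0.5% apart
print(within_tolerance(64000.0, 65500.0))  # False: about 2.3% apart
```

A principle along the lines of "prices are equivalent if they differ by no more than 1%" describes this same comparison in words.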

⏱ Since each validator makes two LLM calls (execute + compare), prompt_comparative adds about 12 seconds of overhead over baseline, roughly double the overhead of strict_eq.

Option 3 — prompt_non_comparative

The most flexible option. Validators don't re-run the contract at all — they evaluate whether the Leader's output meets a set of predefined criteria.

def run():
    return gl.nondet.exec_prompt(
        'Write a one-paragraph summary of this article: ' + article
    )

result = gl.eq_principle.prompt_non_comparative(
    run,
    principle="A valid summary: (1) accurately reflects main points, "
              "(2) is under 100 words, (3) uses neutral tone."
)

When it works: Subjective or creative output where validators judging quality makes more sense than validators replicating the work.

The tradeoff: Your criteria need to be specific enough that different validators applying them will consistently reach the same verdict. Vague criteria lead to inconsistent votes.
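A quick test for whether a criterion is specific enough: could you check it mechanically? "Under 100 words" passes that test, while "neutral tone" does not and so deserves extra care in wording. The sketch below is plain Python with a hypothetical helper name (under_word_limit); it is not something validators run, just a way to see what an unambiguous criterion looks like.

```python
def under_word_limit(summary: str, limit: int = 100) -> bool:
    """True if the summary has fewer than `limit` whitespace-separated words."""
    return len(summary.split()) < limit


short_summary = "GenLayer validators agree on AI outputs using developer-defined rules."
long_summary = " ".join(["word"] * 120)

print(under_word_limit(short_summary))  # True
print(under_word_limit(long_summary))   # False
```

Every validator applying this rule reaches the same verdict, which is the consistency the criteria in your principle should aim for.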

Choosing the right one

Can your output be constrained to a fixed schema with a small set of possible values? Use strict_eq. It's the fastest and most predictable.

Is your output open-ended text where meaning matters more than exact wording? Use prompt_comparative. Write a clear, specific principle.

Is your output something where re-running doesn't make sense, and validators should just judge the Leader's result? Use prompt_non_comparative. Be very precise with your criteria.
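The three questions above collapse into a small decision function. This is only a restatement of the guidance in this post; choose_eq_type and its parameter names are invented for the sketch.

```python
def choose_eq_type(small_fixed_output: bool, rerun_makes_sense: bool) -> str:
    """Pick an equivalence type following the three questions in this post."""
    if small_fixed_output:
        # Fixed schema with a small set of possible values.
        return "strict_eq"
    if rerun_makes_sense:
        # Open-ended text where meaning matters more than wording.
        return "prompt_comparative"
    # Validators judge the Leader's result instead of reproducing it.
    return "prompt_non_comparative"


print(choose_eq_type(small_fixed_output=True, rerun_makes_sense=True))
print(choose_eq_type(small_fixed_output=False, rerun_makes_sense=True))
print(choose_eq_type(small_fixed_output=False, rerun_makes_sense=False))
```

The calls print strict_eq, prompt_comparative, and prompt_non_comparative in that order. Keep the price-feed gotcha in mind: a value that drifts between fetches does not count as a small fixed output, even if it is numeric.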

The developer responsibility

The Equivalence Principle is your responsibility to get right. The protocol gives you the tools, but the correctness of your consensus logic depends entirely on how well you define equivalence for your specific use case.

A poorly defined equivalence principle leads to one of two failure modes. Too strict: validators disagree constantly and transactions keep failing. Too loose: validators accept outputs that are actually wrong.
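Both failure modes show up in a small simulation. The numbers below are made up (including one obviously wrong value of 140.0), and agree_count is a hypothetical helper that counts how many validator values fall within a percentage tolerance of the Leader's value.

```python
leader = 100.0
# Made-up validator readings; 140.0 stands in for a clearly wrong result.
validators = [99.8, 100.1, 100.4, 103.0, 140.0]


def agree_count(tolerance_pct: float) -> int:
    """Count validators whose value is within tolerance_pct percent of the leader's."""
    return sum(
        abs(v - leader) / leader * 100.0 <= tolerance_pct for v in validators
    )


for tol in (0.0, 0.5, 50.0):
    print(tol, agree_count(tol))
# 0.0 0
# 0.5 3
# 50.0 5
```

At 0% nobody agrees, which is the too-strict case. At 50% everybody agrees, including the validator holding the wrong value, which is the too-loose case. A sensible rule sits somewhere in between, and where exactly depends on your data.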

The good news is that GenLayer Studio's validator logs make it reasonably easy to debug. When validators disagree, you can see exactly what each one produced and why the comparison failed.

This is part of a series of visual guides to GenLayer's core concepts. Previous: How Optimistic Democracy Works. Next: GenLayer Architecture explained layer by layer.

dhozil
