Introduction

Frontend and backend

The POD2 system consists of a frontend and a backend, connected by a middleware. This page outlines some design principles for deciding which components go where.

user -- frontend -- middleware -- backend -- ZK circuit

The frontend is what we want the user to see; the backend is what we want the circuit to see.

Circuit and proving system

The first implementation of POD2 uses Plonky2 as its proving system. In principle, a future implementation could use some other proving system. The frontend and middleware should not be aware of what proving system is in use: anything specific to the proving system belongs to the backend.

User-facing types versus in-circuit types

The frontend type system exposes human-readable types to POD developers: strings, ints, bools, and so forth. On the backend, all types are built out of field elements. The middleware handles the conversion.

POD value types

From the frontend perspective, POD values may have one of the following types¹: four atomic types

  • Integer
  • Bool
  • String
  • Raw

and three compound types

  • Dictionary
  • Array
  • Set.

From the backend perspective, however, these types will all be encoded as a fixed number of field elements, the number being chosen so as to accommodate the Integer type as well as hashes to represent the String and compound types with the appropriate level of security.

In the case of the Plonky2 backend with 100 bits of security, all of these types are represented as 4 field elements, matching the output size of the Poseidon hash function used there.

Integer

In the frontend, this type is none other than u64.² In the backend, it will be appropriately embedded into the codomain of the canonical hash function.

Bool

In the frontend, this is a simple bool. In the backend, it has the same encoding as an Integer: 0 (for false) or 1 (for true).

String

In the frontend, this type corresponds to the usual String. In the backend, the string will be mapped to a sequence of field elements and hashed with the hash function employed there, thus being represented by its hash.

Raw

"Raw" is short for "raw value". A Raw exposes a backend Value on the frontend.

With the plonky2 backend, a Raw is a tuple of 4 elements of the Goldilocks field.

Dictionary, array, set

The array, set and dictionary types are similar. While all of them use a Merkle tree under the hood, each uses it in a specific way:

  • dictionary: the user's original keys and values are hashed for use in the leaf.
    • leaf.key=hash(original_key)
    • leaf.value=hash(original_value)
  • array: the elements are placed at the value field of each leaf, and the key field is just the array index (integer)
    • leaf.key=i
    • leaf.value=original_value
  • set: both the key and the value are set to the hash of the value.
    • leaf.key=hash(original_value)
    • leaf.value=hash(original_value)

For all three types, the underlying Merkle tree makes it possible to prove inclusion and non-inclusion of a particular entry of the dictionary/array/set.

A concrete implementation of dictionary, array, set can be found at pod2/src/middleware/containers.rs.
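The leaf-encoding rules above can be sketched as follows. This is only an illustration: the stand-in hash (`DefaultHasher`) replaces the Poseidon hash used by the actual backend, and the function names are hypothetical, not those of `containers.rs`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the backend hash (the real backend uses Poseidon).
fn h(data: &impl Hash) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

// Leaf (key, value) derivation for each container type, per the rules above.
fn dict_leaf(key: &str, value: &str) -> (u64, u64) {
    (h(&key), h(&value)) // both key and value are hashed
}
fn array_leaf(index: u64, value: u64) -> (u64, u64) {
    (index, value) // the key is just the array index
}
fn set_leaf(value: &str) -> (u64, u64) {
    (h(&value), h(&value)) // key and value are both the hashed value
}

fn main() {
    let (k, v) = set_leaf("alice");
    assert_eq!(k, v); // in a set, key and value coincide
    assert_eq!(array_leaf(3, 99), (3, 99)); // array key is just the index
    println!("{:?}", dict_leaf("name", "alice"));
}
```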




1

TODO In POD 1, there is the cryptographic type, which has the same type of the output of the hash function employed there. It is useful for representing arbitrary hashes. Do we want to expand our type list to include a similar type, which would correspond to the HashOut type in the case of Plonky2? This would not have a uniform representation in the frontend if we continue to be backend agnostic unless we fix the number of bits to e.g. 256, in which case we would actually need one more field element in the case of Plonky2.

2

TODO Replace this with i64 once operational details have been worked out.


Anchored keys

Rather than dealing with just keys, we introduce the notion of an anchored key, which is a pair consisting of an origin specifier and a key, i.e.

type AnchoredKey = (Origin, Key)
type Key = String

An origin is a triple consisting of a numeric identifier called the origin ID, a string called the origin name (omitted in the backend) and another numeric identifier called the gadget ID, which identifies the means by which the value corresponding to a given key is produced.

The origin ID is defined to be 0 for 'no origin' and 1 for 'self origin'; otherwise it is the content ID¹ of the POD to which it refers. The origin name is not cryptographically significant and is merely a convenience for the frontend.

The gadget ID takes on the values in the following table:

| Gadget ID | Meaning |
|-----------|---------|
| 0 | no gadget |
| 1 | SignedPOD gadget: the key-value pair was produced in the construction of a SignedPOD. |
| 2 | MainPOD gadget: the key-value pair was produced in the construction of a MainPOD. |

For example, a gadget ID of 1 implies that the key-value pair in question was produced in the process of constructing a SignedPOD.
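As a sketch, the origin and anchored-key shapes described above might look as follows in Rust. The field names are illustrative, not the actual middleware definitions:

```rust
// Illustrative shapes for Origin and AnchoredKey, following the prose
// above; the real crate's definitions may differ.
struct Origin {
    origin_id: u64,      // 0 = no origin, 1 = self origin, else a POD content ID
    origin_name: String, // frontend convenience only; omitted in the backend
    gadget_id: u64,      // 0 = no gadget, 1 = SignedPOD, 2 = MainPOD
}

type Key = String;
type AnchoredKey = (Origin, Key);

fn main() {
    let ak: AnchoredKey = (
        Origin { origin_id: 1, origin_name: "self".into(), gadget_id: 1 },
        "name".into(),
    );
    assert_eq!(ak.0.origin_id, 1); // 'self origin'
    assert_eq!(ak.1, "name");
}
```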

1

TODO Refer to this when it is documented.

Backend types

On the backend, there is only a single type: Value.

A Value is simply a tuple of field elements. With the plonky2 backend, a Value is a tuple of 4 field elements. In general, the backend will expose a constant VALUE_SIZE, and a Value will be a tuple of VALUE_SIZE field elements.

Integers and booleans

The backend encoding stores integers in such a way that arithmetic operations (addition, multiplication, comparison) are inexpensive to verify in-circuit.

In the case of the Plonky2 backend, an integer x is decomposed as x = x_lo + 2^32 · x_hi with 0 <= x_lo, x_hi < 2^32, and represented by the pair (ι(x_lo), ι(x_hi)), padded with zeros to VALUE_SIZE field elements, where ι is the canonical projection into GoldilocksField.

On the backend, a boolean is stored as an integer, either 0 or 1; so logical operations on booleans are also inexpensive.
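A minimal sketch of this limb decomposition, assuming the (lo, hi, 0, 0) layout described above (the exact in-circuit ordering is backend-defined):

```rust
// Decompose a u64 into two 32-bit limbs; each limb fits in the
// Goldilocks field, since p = 2^64 - 2^32 + 1 > 2^32.
fn to_limbs(x: u64) -> [u64; 4] {
    let lo = x & 0xFFFF_FFFF; // least-significant 32 bits
    let hi = x >> 32;         // most-significant 32 bits
    [lo, hi, 0, 0]            // padded to 4 field elements (assumed layout)
}

// Recombine the limbs into the original integer.
fn from_limbs(v: [u64; 4]) -> u64 {
    v[0] | (v[1] << 32)
}

fn main() {
    let x = 0x0123_4567_89AB_CDEFu64;
    let v = to_limbs(x);
    assert_eq!(v[0], 0x89AB_CDEF);
    assert_eq!(v[1], 0x0123_4567);
    assert_eq!(from_limbs(v), x); // round-trip is lossless
}
```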

Strings

The backend encoding stores strings as hashes, using a hash function that might not be zk-friendly. For this reason, string operations (substrings, accessing individual characters) are hard to verify in-circuit. The POD2 system does not provide methods for manipulating strings.

In other words: As POD2 sees it, two strings are either equal or not equal. There are no other relationships between strings.

In the case of the Plonky2 backend, a string is converted to its UTF-8 byte sequence with the byte 0x01 appended as padding; the result is split into 7-byte chunks starting from the left, and each chunk is interpreted as a little-endian integer, which (being less than 2^56) is naturally an element of GoldilocksField. The resulting sequence of field elements is hashed via the Poseidon hash function. Symbolically, given a string s, its hash is defined by

hash(s) = poseidon(chunks(utf8(s) || 0x01)),

where poseidon is the Poseidon instance used by Plonky2, || denotes concatenation of byte sequences, utf8 is the canonical mapping of a string to its UTF-8 representation, and chunks splits a byte sequence into 7-byte pieces, mapping each to the field element given by its little-endian value.
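The packing step described above can be sketched as follows; the sketch stops before the Poseidon hash, which belongs to the backend:

```rust
// Pack a string for hashing: append 0x01 as padding, split into 7-byte
// chunks from the left, read each chunk as a little-endian integer
// (each < 2^56, so it fits in GoldilocksField).
fn pack_string(s: &str) -> Vec<u64> {
    let mut bytes = s.as_bytes().to_vec();
    bytes.push(0x01); // padding byte
    bytes
        .chunks(7)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf[..chunk.len()].copy_from_slice(chunk); // low bytes first
            u64::from_le_bytes(buf)
        })
        .collect()
}

fn main() {
    // "abcdefg" is 7 bytes; with the 0x01 pad it becomes two chunks.
    let packed = pack_string("abcdefg");
    assert_eq!(packed.len(), 2);
    assert_eq!(packed[1], 0x01); // the pad byte alone forms the last chunk
    println!("{:?}", packed);
}
```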

Compound types

The three front-end compound types (Dictionary, Array, Set) are all represented as Merkle roots on the backend. The details of the representation are explained on a separate Merkle tree page.

MerkleTree

In the POD2 backend, a MerkleTree is used to store an unordered set of key-value pairs. The frontend compound types Array, Set, and Dictionary are all represented as MerkleTrees on the backend.

At a high level, we can think of a MerkleTree as 'hashmap' storage that allows us to generate proofs of inclusion and non-inclusion of the key-value pairs stored in it.

A MerkleTree is represented in-circuit as its Merkle root; in the Plonky2 backend, this root is a tuple of four field elements. This makes a MerkleTree the same size in-circuit as the atomic types Integer and String. (In general, regardless of the proof system used on the backend, all three types are represented in-circuit by the same number of field elements; this number is determined by the security requirement of the hash function.)

The encoding of the MerkleTree is a recursive process:

  • Encode all keys and values in the MerkleTree.
  • Put all keys and values into a sparse Merkle tree.
  • The MerkleTree is encoded in-circuit as the root of this sparse Merkle tree.

This document explains the construction of the sparse Merkle tree.

The branching rule

A sparse Merkle tree is implemented as a binary tree. The insertion path of any key is given by a deterministic rule: given a key key and a nonnegative integer depth, the rule determines whether key belongs to the left or the right branch at depth depth.

The precise rule is as follows. In-circuit, compute a Poseidon hash of key to obtain a 4-tuple of field elements

Poseidon(key) = (k_0, k_1, k_2, k_3).

Write the field elements in binary (in little-endian order):

k_0 = b_0 b_1 ... b_63
k_1 = b_64 b_65 ... b_127
....

At the root, key goes to the left subtree if b_0 = 0, otherwise the right subtree. At depth 1, key goes to the left subtree if b_1 = 0, otherwise the right subtree, and similarly for higher depth.
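The branching rule can be sketched as a single bit lookup, modeling the hash output as four u64 limbs (k_0 .. k_3) as in the text:

```rust
// The i-th bit (little-endian) of the hashed key decides the branch at
// depth i: 0 = left, 1 = right.
fn goes_right(hashed_key: [u64; 4], depth: usize) -> bool {
    let limb = hashed_key[depth / 64]; // 64 bits per limb
    (limb >> (depth % 64)) & 1 == 1
}

fn main() {
    // hypothetical hash with k_0 = 0b10: b_0 = 0, b_1 = 1
    let k = [0b10u64, 0, 0, 0];
    assert!(!goes_right(k, 0)); // depth 0: left
    assert!(goes_right(k, 1));  // depth 1: right
    assert!(!goes_right(k, 67)); // depth 67 reads bit 3 of k_1, which is 0
}
```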

The tree structure

A Merkle tree with no entry at all is represented by the hash value root = hash(0). (With the Plonky2 backend, the hash function hash will output a 4-tuple of field elements.)

A Merkle tree with a single entry (key, value) is called a "leaf". It is represented by the hash value root = hash((key, value, 1)).

A Merkle tree with more than one entry is required to have two subtrees, left and right. It is then represented by the hash value root = hash((left_root, right_root, 2)).

(The role of the constants 1 and 2 is to prevent collisions between leaves and non-leaf Merkle roots. If the constants were omitted, a large Merkle tree could be dishonestly interpreted as a leaf, leading to security vulnerabilities.)
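The three hashing rules above can be sketched outside the circuit with a stand-in hash; `DefaultHasher` here is only for illustration, the real backend uses Poseidon:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the backend hash function.
fn h(data: &impl Hash) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

// The constants 1 and 2 domain-separate leaves from interior nodes,
// as explained above.
fn empty_root() -> u64 { h(&0u64) }                               // hash(0)
fn leaf_root(key: u64, value: u64) -> u64 { h(&(key, value, 1u64)) }
fn node_root(left: u64, right: u64) -> u64 { h(&(left, right, 2u64)) }

fn main() {
    let leaf = leaf_root(4, 7);
    let root = node_root(leaf, empty_root());
    // The same pair hashed as a leaf and as a node never collides.
    assert_ne!(leaf_root(1, 2), node_root(1, 2));
    println!("{root}");
}
```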

Example 1

Suppose we want to build a Merkle tree from the following Dictionary with three key-value pairs:

{
    4: "even",
    5: "odd",
    6: "even"
}

The keys are integers, so they are represented in-circuit by themselves. Let's suppose that in little-endian order, the first three bits of the hashes of the keys are:

hash(4) = 010...
hash(5) = 001...
hash(6) = 000...

The resulting tree looks like:

                        root
                         /\   
                        /  \  
                       /    \ 
                      /      \
                   L_root   R_root = hash(0)
                     /\       
                    /  \                             
                   /    \     
                  /      \    
               LL_root  LR_root = hash((4, "even", 1))
                 /\           
                /  \          
               /    \         
              /      \        
        LLL_root   LLR_root = hash((5, "odd", 1))
          ||          
  hash((6, "even", 1))

The intermediate roots are computed as hashes of their subroots:

LL_root = hash((LLL_root, LLR_root, 2))
L_root = hash((LL_root, LR_root, 2))
root = hash((L_root, R_root, 2)).

The full Dictionary is then represented in the backend as root (four field elements in the Plonky2 backend).

Example 2

So, for example, imagine we have 8 key-value pairs where the keys are just an enumeration from 0 to 7; the tree leaf positions are then determined by the bits of the hashed keys.

Now let's change the key of the leaf key=1 to key=13. The two leaf paths then coincide until they diverge at the fourth bit.

Example 3

Suppose we have 4 key-value pairs, where the first four bits of the hashes of the keys are 0000, 0100, 1010 and 1011; each leaf sits at the position determined by those bits.

To extend this example, suppose we have the following data in a POD:

{
	id: "11000...",
	kvs : {
		idNumber: "424242",
		dateOfBirth: 1169909384,
		userPk: 9876543210, // target user of this POD
		_signerPk: 1234567890, // signer of the POD
	},
	// ...
}

The Merkle tree will contain the key-value pairs from the kvs field.

Suppose that the binary representation of the hash of the key userPk is 1011.... This uniquely defines the leaf position that contains the public key of the authenticated user. Similarly for the other key-value pairs.

Proofs of inclusion and non-inclusion

Merkle proofs contain the siblings along the path from the leaf to the root, where the leaf position is determined by the binary representation of the hashed key.

Since leaf positions are deterministic in the key, the same approach supports non-inclusion proofs: it can be proven that a key is not in the tree, and furthermore that a value is not in the tree (although the key exists):

  1. Proving that a key does not exist in the tree is achieved by generating the Merkle proof for that key and showing that the (virtual) leaf is empty; that is, going down the path of the non-existing key ends at an empty slot or at a leaf with a different key, meaning that the non-existing key was never inserted into the tree.
  2. Proving that a value is not in the tree (although the key exists) is achieved by generating the Merkle proof for the key and showing that the leaf exists but holds a different value than the one being disproved.

For the current use cases we don't need to prove that the key exists with a different value at that leaf, so only option 1 is used.

There are two cases to take into account when dealing with non-inclusion proofs:

  • case i) the expected leaf does not exist.
  • case ii) the expected leaf does exist in the tree, but it has a different key.

Encoding

TODO: how key-values, nodes, merkle-proofs, ... are encoded.

Interface

#![allow(unused)]
fn main() {
impl MerkleTree {
    /// builds a new `MerkleTree` where the leaves contain the given key-values
    fn new(max_depth: usize, kvs: &HashMap<Value, Value>) -> Result<Self>;

    /// returns the root of the tree
    fn root(&self) -> Hash;
    
    /// returns the max_depth parameter from the tree
    fn max_depth(&self) -> usize;

    /// returns the value at the given key
    fn get(&self, key: &Value) -> Result<Value>;
    
    /// returns a boolean indicating whether the key exists in the tree
    fn contains(&self, key: &Value) -> Result<bool>;

    /// returns a proof of existence, which proves that the given key exists in
    /// the tree. It returns the `value` of the leaf at the given `key`, and the
    /// `MerkleProof`.
    fn prove(&self, key: &Value) -> Result<(Value, MerkleProof)>;

    /// returns a proof of non-existence, which proves that the given
    /// `key` does not exist in the tree. The returned `MerkleProof`
    /// includes the key-value pair of the leaf reached as a result of
    /// resolving `key`.
    fn prove_nonexistence(&self, key: &Value) -> Result<MerkleProof>;

    /// verifies an inclusion proof for the given `key` and `value`
    fn verify(max_depth: usize, root: Hash, proof: &MerkleProof,
              key: &Value, value: &Value) -> Result<()>;

    /// verifies a non-inclusion proof for the given `key`, that is, the given
    /// `key` does not exist in the tree
    fn verify_nonexistence(max_depth: usize, root: Hash,
                           proof: &MerkleProof, key: &Value) -> Result<()>;

    /// returns an iterator over the leaves of the tree
    fn iter(&self) -> Iter;
}
}

Development plan

  • short term: Merkle tree as a 'precompile' in POD operations, which allows proofs to be verified directly
  • long term exploration:

Resources

Signature

The current signature scheme is proof-based: signatures are Plonky2 proofs, following https://eprint.iacr.org/2024/1553 and https://jdodinh.io/assets/files/m-thesis.pdf. This approach derives from Polygon Miden's RPO STARK-based signatures.

In future iterations we may replace it with another signature scheme (either an elliptic-curve scheme over a Goldilocks-prime-friendly curve, or a lattice-based scheme).

generate_params()

  • pp: plonky2 circuit prover params
  • vp: plonky2 circuit verifier params
  • return (pp, vp)

keygen()

  • secret key: sk
  • public key: pk¹
  • return (sk, pk)

sign(pp, sk, m)

  • return sig (a plonky2 proof)

verify(vp, sig, pk, m)

  • return accept/reject

Plonky2 circuit

private inputs:
public inputs:



1

The 2024/1553 paper takes 8 field elements as input to the hash, so as to reuse the same instance of the RPO hash as the one used later in the signature (where it hashes 8 field elements).

Deductions

Statements

A statement is any sort of claim about the values of entries: for example, that two values are equal, or that one entry is contained in another.

Statements come in two types: built-in and custom. There is a short list of built-in statements (see below).¹ In addition, users can freely define custom statements.

From the user (front-end) perspective, a statement represents a claim about the values of some number of entries -- the statement can only be proved if the claim is true. On the front end, a statement is identified by its name (ValueOf, Equal, etc.).

From the circuit (back-end) perspective, a statement can be proved either:

  • by direct in-circuit verification, or
  • by an operation (aka deduction rule). On the back end, a statement is identified by a unique numerical identifier.

Built-in statements

The POD system has several builtin statements. These statements are associated to a reserved set of statement IDs.

Backend statements

A statement is a code (or, in the frontend, a string identifier) followed by zero or more arguments. These arguments may consist of up to three anchored keys and up to one POD value.

The following table summarises the natively-supported statements, where we write value_of(ak) for 'the value anchored key ak maps to', which is of type PODValue, and key_of(ak) for the key part of ak:

| Code | Identifier | Args | Meaning |
|------|------------|------|---------|
| 0 | None | | no statement, always true (useful for padding) |
| 1 | False | | always false (useful for padding disjunctions) |
| 2 | ValueOf | ak, value | value_of(ak) = value |
| 3 | Equal | ak1, ak2 | value_of(ak1) = value_of(ak2) |
| 4 | NotEqual | ak1, ak2 | value_of(ak1) != value_of(ak2) |
| 5 | LtEq | ak1, ak2 | value_of(ak1) <= value_of(ak2) |
| 6 | Lt | ak1, ak2 | value_of(ak1) < value_of(ak2) |
| 7 | Contains | ak1, ak2 | (key_of(ak2), value_of(ak2)) ∈ value_of(ak1) (Merkle inclusion) |
| 8 | NotContains | ak1, ak2 | (key_of(ak2), value_of(ak2)) ∉ value_of(ak1) (Merkle exclusion) |
| 9 | SumOf | ak1, ak2, ak3 | value_of(ak1) = value_of(ak2) + value_of(ak3) |
| 10 | ProductOf | ak1, ak2, ak3 | value_of(ak1) = value_of(ak2) * value_of(ak3) |
| 11 | MaxOf | ak1, ak2, ak3 | value_of(ak1) = max(value_of(ak2), value_of(ak3)) |
| 12 | HashOf | ak1, ak2, ak3 | value_of(ak1) = hash(value_of(ak2), value_of(ak3)) |

Frontend statements

The frontend also exposes the following syntactic sugar predicates. These predicates are not supported by the backend. The frontend compiler is responsible for translating these predicates into the predicates above.

| Code | Identifier | Args and desugaring |
|------|------------|---------------------|
| 1000 | DictContains | DictContains(root, key, val) -> Contains(root, key, val) |
| 1001 | DictNotContains | DictNotContains(root, key, val) -> NotContains(root, key, val) |
| 1002 | SetContains | SetContains(root, val) -> Contains(root, val, val) |
| 1003 | SetNotContains | SetNotContains(root, val) -> NotContains(root, val, val) |
| 1004 | ArrayContains | ArrayContains(root, idx, val) -> Contains(root, idx, val) |
| 1005 | GtEq | GtEq(a, b) -> LtEq(b, a) |
| 1006 | Gt | Gt(a, b) -> Lt(b, a) |
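The frontend compiler's rewrite step can be sketched as follows. The AST shapes and names here are hypothetical; only the rewrite rules themselves come from the table above:

```rust
// Backend statements (simplified: arguments are plain strings).
#[derive(Debug, PartialEq)]
enum Stmt {
    Contains(String, String, String),
    NotContains(String, String, String),
    LtEq(String, String),
    Lt(String, String),
}

// Frontend sugar predicates (a subset of the table).
enum Sugar {
    SetContains(String, String),
    SetNotContains(String, String),
    GtEq(String, String),
    Gt(String, String),
}

fn desugar(s: Sugar) -> Stmt {
    match s {
        // a set entry stores the value in both the key and value slots
        Sugar::SetContains(root, val) => Stmt::Contains(root, val.clone(), val),
        Sugar::SetNotContains(root, val) => Stmt::NotContains(root, val.clone(), val),
        // GtEq/Gt flip the argument order of LtEq/Lt
        Sugar::GtEq(a, b) => Stmt::LtEq(b, a),
        Sugar::Gt(a, b) => Stmt::Lt(b, a),
    }
}

fn main() {
    assert_eq!(
        desugar(Sugar::Gt("a".into(), "b".into())),
        Stmt::Lt("b".into(), "a".into())
    );
}
```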

In the future, we may also reserve statement IDs for "precompiles" such as:

EcdsaPrivToPubOf(A.pubkey, B.privkey),

as well as for low-level operations on Merkle trees and compound types. NOTE: Merkle trees and compound types were explained in a separate markdown file, ./merklestatements.md, which is no longer part of these docs but is kept in the github repo in case we need to restore it in the future.

Built-in statements for entries of any type

A ValueOf statement asserts that an entry has a certain value.

ValueOf(A.name, "Arthur") 

An Equal statement asserts that two entries have the same value. (Technical note: the circuit only proves equality of field elements; no type checking is performed. For strings or Merkle roots, collision resistance of the hash gives a cryptographic guarantee of equality. However, note that both Arrays and Sets are implemented as dictionaries in the backend; since the backend cannot type-check, it is possible to prove an equality between an Array or Set and a Dictionary.)

Equal(A.name, B.name)

A NotEqual statement asserts that two entries have different values.

NotEqual   (for arbitrary types)
Built-in statements for numerical types

A Gt(x, y) statement asserts that x is an entry of type Integer, y is an entry or constant of type Integer, and x > y.

Gt    (for numerical types only)
Gt(A.price, 100)
Gt(A.price, B.balance)

The statements Lt, GtEq, LtEq are defined analogously.

SumOf(x, y, z) asserts that x, y, z are entries of type Integer, and x = y + z.²

ProductOf and MaxOf are defined analogously.

The two items below may be added in the future:

poseidon_hash_of(A.hash, B.preimage) // perhaps a hash_of predicate can be parametrized by an enum representing the hash scheme; rather than having a bunch of specific things like SHA256_hash_of and poseidon_hash_of etc.
ecdsa_priv_to_pub_of(A.pubkey, B.privkey)
1

TODO List of built-in statements is not yet complete.

2

TODO Does sum mean x+y = z or x = y+z?

Operations

The mechanism by which statements are derived is furnished by operations. Roughly speaking, with a few exceptions, an operation deduces a statement from one or more existing statements according to some relation that must hold between them. For example, if Equal(ak1, ak2) holds, then the operation SymmetricEq applied to this statement yields Equal(ak2, ak1).

More precisely, an operation is a code (or, in the frontend, a string identifier) followed by zero or more arguments. These arguments may consist of up to three statements, up to one key-value pair, and up to one Merkle proof.

The following table summarises the natively-supported operations:

| Code | Identifier | Args | Condition | Output |
|------|------------|------|-----------|--------|
| 0 | None | | | None |
| 1 | NewEntry¹ | (key, value) | | ValueOf(ak, value), where ak has key key and origin ID 1 |
| 2 | CopyStatement | s | | s |
| 3 | EqualFromEntries | s1, s2 | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), value1 = value2 | Equal(ak1, ak2) |
| 4 | NotEqualFromEntries | s1, s2 | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), value1 != value2 | NotEqual(ak1, ak2) |
| 5 | LtEqFromEntries | s1, s2 | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), value1 <= value2 | LtEq(ak1, ak2) |
| 6 | LtFromEntries | s1, s2 | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), value1 < value2 | Lt(ak1, ak2) |
| 7 | TransitiveEqualFromStatements | s1, s2 | s1 = Equal(ak1, ak2), s2 = Equal(ak3, ak4), ak2 = ak3 | Equal(ak1, ak4) |
| 8 | LtToNotEqual | s | s = Lt(ak1, ak2) | NotEqual(ak1, ak2) |
| 9 | ContainsFromEntries | s1, s2, s3, proof | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), s3 = ValueOf(ak3, value3), merkle_includes(value1, value2, value3, proof) = true | Contains(ak1, ak2, ak3) |
| 10 | NotContainsFromEntries | s1, s2, proof | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), merkle_excludes(value1, value2, proof) = true | NotContains(ak1, ak2) |
| 11 | SumOf | s1, s2, s3 | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), s3 = ValueOf(ak3, value3), value1 = value2 + value3 | SumOf(ak1, ak2, ak3) |
| 12 | ProductOf | s1, s2, s3 | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), s3 = ValueOf(ak3, value3), value1 = value2 * value3 | ProductOf(ak1, ak2, ak3) |
| 13 | MaxOf | s1, s2, s3 | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), s3 = ValueOf(ak3, value3), value1 = max(value2, value3) | MaxOf(ak1, ak2, ak3) |
| 14 | HashOf | s1, s2, s3 | s1 = ValueOf(ak1, value1), s2 = ValueOf(ak2, value2), s3 = ValueOf(ak3, value3), value1 = hash(value2, value3) | HashOf(ak1, ak2, ak3) |
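An out-of-circuit sketch of one of these checks, EqualFromEntries: given two ValueOf statements over the same value, it may emit Equal(ak1, ak2). The types are deliberately simplified (an anchored key is just an (origin_id, key) pair):

```rust
// Simplified anchored key: (origin_id, key).
type AnchoredKey = (u64, String);

#[derive(Debug, PartialEq, Clone)]
enum Statement {
    ValueOf(AnchoredKey, u64),
    Equal(AnchoredKey, AnchoredKey),
}

// EqualFromEntries: both inputs must be ValueOf with equal values;
// the output is the Equal statement over their anchored keys.
fn equal_from_entries(s1: &Statement, s2: &Statement) -> Option<Statement> {
    match (s1, s2) {
        (Statement::ValueOf(ak1, v1), Statement::ValueOf(ak2, v2)) if v1 == v2 => {
            Some(Statement::Equal(ak1.clone(), ak2.clone()))
        }
        _ => None, // condition not met: no statement is produced
    }
}

fn main() {
    let s1 = Statement::ValueOf((2, "age".into()), 42);
    let s2 = Statement::ValueOf((3, "limit".into()), 42);
    let out = equal_from_entries(&s1, &s2).unwrap();
    assert_eq!(out, Statement::Equal((2, "age".into()), (3, "limit".into())));
    // mismatched values yield no statement
    let s3 = Statement::ValueOf((3, "limit".into()), 7);
    assert!(equal_from_entries(&s1, &s3).is_none());
}
```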



The following table summarizes "syntactic sugar" operations. These operations are not supported by the backend. The frontend compiler is responsible for translating these operations into the operations above.

| Code | Identifier | Args and desugaring |
|------|------------|---------------------|
| 1001 | DictContainsFromEntries | DictContainsFromEntries(dict_st, key_st, value_st) -> ContainsFromEntries(dict_st, key_st, value_st) |
| 1002 | DictNotContainsFromEntries | DictNotContainsFromEntries(dict_st, key_st, value_st) -> NotContainsFromEntries(dict_st, key_st, value_st) |
| 1003 | SetContainsFromEntries | SetContainsFromEntries(set_st, value_st) -> ContainsFromEntries(set_st, value_st, value_st) |
| 1004 | SetNotContainsFromEntries | SetNotContainsFromEntries(set_st, value_st) -> NotContainsFromEntries(set_st, value_st, value_st) |
| 1005 | ArrayContainsFromEntries | ArrayContainsFromEntries(array_st, index_st, value_st) -> ContainsFromEntries(array_st, index_st, value_st) |
| 1006 | GtEqFromEntries | GtEqFromEntries(s1, s2) -> LtEqFromEntries(s2, s1) |
| 1007 | GtFromEntries | GtFromEntries(s1, s2) -> LtFromEntries(s2, s1) |
| 1008 | GtToNotEqual | GtToNotEqual(s1, s2) -> LtToNotEqual(s1, s2) |



WIP. The following table defines more operations that are not yet implemented.
Issue keeping track of the operations: #108.

| Code | Identifier | Args | Condition | Output |
|------|------------|------|-----------|--------|
| | SymmetricEq | s | s = Equal(ak1, ak2) | Eq(ak2, ak1) |
| | SymmetricNEq | s | s = NotEqual(ak1, ak2) | NEq(ak2, ak1) |
| | RenameSintains | s1, s2 | s1 = Sintains(ak1, ak2), s2 = Equal(ak3, ak4), ak1 = ak3 | Sintains(ak4, ak2) |
| | TransitiveEq | s1, s2 | s1 = Equal(ak1, ak2), s2 = Equal(ak3, ak4), ak2 = ak3 | Eq(ak1, ak4) |
| | TransitiveGt | s1, s2 | s1 = Gt(ak1, ak2), s2 = Gt(ak3, ak4), ak2 = ak3 | Gt(ak1, ak4) |
| | TransitiveLEq | s1, s2 | s1 = LEq(ak1, ak2), s2 = LEq(ak3, ak4), ak2 = ak3 | LEq(ak1, ak4) |
| | LEqToNEq | s | s = LEq(ak1, ak2) | NEq(ak1, ak2) |
1

Since new key-value pairs are not constrained, this operation will have no arguments in-circuit.

Simple example

Circuit structure, two-column proof

A "proof" is a table that looks like

| STATEMENT | REASON |
|-----------|--------|
| STATEMENT1 | REASON1 |
| STATEMENT2 | REASON2 |
| ... | ... |

In other words:

A "proof" is an ordered list of 100 proof-rows.

Each "row" is a pair (statement, reason).

The statement is the claim being proved.

The reason is everything the circuit needs to verify that the statement is true.

Example:

STATEMENT1 = Equals(oldpod, "name", otherpod, "field")

STATEMENT2 = Equals(otherpod, "field", newpod, "result")

STATEMENT3 = Equals(oldpod, "name", newpod, "result")

The reasons in human-readable simplified format:

REASON1 -- "came from previous pod"

REASON2 -- "came from previous pod"

REASON3 -- "use transitive property on STATEMENT1 and STATEMENT2"

What does the reason look like in circuit?

It won't be so simple. I'll just explain what REASON3 has to look like.

First, the operation (deduction rule).

A simple example of a deduction rule

Here is the transitive property of equality, in human-readable form.

if
Equals(a, b) and Equals(b, c)
then
Equals(a, c)

First, we need to decompose all the anchored keys into (origin, key) pairs. This is the frontend description of the deduction rule.

IF
Equals(a_or, a_key, b_or, b_key)
AND
Equals(b_or, b_key, c_or, c_key)
THEN
Equals(a_or, a_key, c_or, c_key)

In-circuit, all these identifiers are replaced with wildcards, which come in numerical order (because they will be used as array indices). So the backend representation is:

IF
Equals( *1, *2, *3, *4 ) and Equals ( *3, *4, *5, *6 )
THEN
Equals( *1, *2, *5, *6 )

What does REASON3 need to look like in-circuit?

  • Repeat deduction rule
IF
Equals( *1, *2, *3, *4 ) and Equals ( *3, *4, *5, *6 )
THEN
Equals( *1, *2, *5, *6 )
  • Say what the wildcards are
*1 -- oldpod
*2 -- "name"
*3 -- otherpod
...
  • Substitute the wildcards into the deduction rule
IF
Equals( oldpod, "name", ... ) ...
Equals( otherpod, "value")
THEN
Equals( oldpod, "name", newpod, ... )
...
  • Say where to find the previous statements (indices in the list), and check that they are above this one.
Statement1
Statement2
  • Check that the input statements match. Check that the output statement matches.
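The wildcard-substitution step in the list above can be sketched as follows. The representation (a template as a list of 1-based wildcard indices, a binding as a list of strings) is hypothetical, chosen only to show the mechanics:

```rust
// Substitute a wildcard binding into a statement template.
// `template` holds wildcard indices (1-based, as in *1, *2, ...);
// `binding` holds the prover-supplied values (origin IDs and keys).
fn substitute(template: &[usize], binding: &[&str]) -> Vec<String> {
    template.iter().map(|&w| binding[w - 1].to_string()).collect()
}

fn main() {
    // Output template Equals( *1, *2, *5, *6 ) with the bindings from
    // the example: *1=oldpod, *2="name", *3=otherpod, *4="field",
    // *5=newpod, *6="result".
    let binding = ["oldpod", "name", "otherpod", "field", "newpod", "result"];
    let output = substitute(&[1, 2, 5, 6], &binding);
    assert_eq!(output, ["oldpod", "name", "newpod", "result"]);
}
```

The verifier performs this substitution for every statement template in the deduction rule and then compares the results against the claimed input and output statements.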

Decomposing anchored keys

Sometimes a deduction rule requires different anchored keys to come from the same POD. Here's an example from Ethdos.

The wildcard system handles this very naturally, since origin ID and key are two separate wildcards.

eth_friend(src_or, src_key, dst_or, dst_key) = and<
    // there is an attestation pod that's a SIGNATURE POD
    ValueOf(attestation_pod, "type", SIGNATURE)     
    
    // the attestation pod is signed by (src_or, src_key)
    Equal((attestation_pod, "signer"), (src_or, src_key))  

    // that same attestation pod has an "attestation"
    Equal((attestation_pod, "attestation"), (dst_or, dst_key))
>

In terms of anchored keys, it would be a little more complicated: five anchored keys show up in this deduction rule:

AK1 = src
AK2 = dst
AK3 = (attestation_pod, "type")
AK4 = (attestation_pod, "signer")
AK5 = (attestation_pod, "attestation")

and we need to force AK3, AK4, AK5 to come from the same origin.

Wildcard matching takes care of it.

eth_friend(*1, *2, *3, *4) = and<
    // there is an attestation pod that's a SIGNATURE POD
    ValueOf(*5, "type", SIGNATURE)     
    
    // the attestation pod is signed by (src_or, src_key)
    Equal((*5, "signer"), (*1, *2))  

    // that same attestation pod has an "attestation"
    Equal((*5, "attestation"), (*3, *4))
>

Custom statements and custom operations

Users of the POD system can introduce custom predicates (previously called custom statements) to express complex logical relations not available in the built-in predicates. Every custom predicate is defined as the conjunction (AND) or disjunction (OR) of a small number of other statements.

When a custom predicate is introduced in a MainPod, it becomes available for use in that POD and all PODs that inherit¹ from it.

On the frontend, a custom predicate is defined as a collection of conjunctions and disjunctions of statements. The definition can be recursive: the definition of a predicate can involve the predicate itself, or the definitions of several predicates can depend on each other.

At the backend level, every definition of a predicate is either a conjunction or a disjunction of statements. To convert a frontend custom predicate to the backend, the middleware may need to introduce sub-predicates.

On the backend, custom predicates are defined in groups. A group can contain one or more custom predicates and their associated sub-predicates. Recursive definition is only possible within a group: the definition of a predicate in a group can only depend on previously existing predicates, itself, and other predicates in the same group.

Arguments of custom predicates

The definition of a custom predicate might also be called an operation or deduction rule. It includes some number of statement templates (two, or potentially up to, say, five) as conditions. The arguments of the statement templates are decomposed as (origin, key) pairs: if statements are allowed to have arity at most 4, then the statement templates in a deduction rule have at most 8 arguments (4 origins and 4 keys). The same holds for the output statement.

Each argument (origin or key) of a statement template is either a wildcard or a literal. In the backend, the wildcard arguments are identified as *1, *2, *3, ....

Examples

See examples

Hashing and predicate IDs

Each custom predicate is defined as part of a group of predicates. The definitions of all predicates in the group are laid out consecutively (see examples) and hashed. For more details, see the pages on hashing custom statements and custom predicates.

How to prove an application of an operation

The POD proof format is inspired by "two-column proofs" (for an example, see Wikipedia). A POD contains a "tabular proof", in which each row includes a "statement" and an "operation". The "operation" is the "reason" that justifies the statement: it is everything the circuit needs as a witness to verify the statement.

For a custom statement, the "reason" includes the following witnesses and verifications:

  • the definition of the statement, serialized (see examples)
    • if the statement is part of a group, the definition of the full group, serialized
  • verify that the hash of the definition is the statement ID
  • the definition will have some number of "wildcards" (*1, *2, ...) as arguments to statement templates; a value for each wildcard must be provided as a witness (each will be either an origin ID or key)
  • the circuit must substitute the claimed values for the wildcards, and the resulting statements (true statements with origins and keys) will appear as witnesses
  • the circuit must verify that all the input statement templates (with origins and keys) appear in the previous statements (in higher rows of the table)
  • the circuit also substitutes the claimed values for the wildcards in the output statement, and verifies that it matches the claimed output statement
1. What to call this? One POD "inherits" from another?
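The witness checks listed above can be sketched in ordinary code. This is a hypothetical illustration using plain tuples and strings; the real circuit operates on field elements and hashes:

```python
# Sketch of verifying one application of a custom deduction rule.
# A statement is a (predicate, args) pair; templates may contain wildcards
# ("*1", "*2", ...) that the prover binds to origin IDs or keys.

def substitute(template, bindings):
    """Replace each wildcard argument in a statement template."""
    name, args = template
    return (name, tuple(bindings.get(a, a) for a in args))

def verify_application(condition, output, bindings, previous, claimed_output):
    # Every substituted condition statement must appear among the
    # previously proven statements (higher rows of the proof table).
    for template in condition:
        if substitute(template, bindings) not in previous:
            return False
    # The substituted output template must match the claimed statement.
    return substitute(output, bindings) == claimed_output

# Toy data: one previously proven statement, one condition template.
previous = {("Equals", ("pod1", "friend", "pod2", "name"))}
condition = [("Equals", ("*1", "friend", "*2", "name"))]
output = ("GoodBoy", ("*2",))
bindings = {"*1": "pod1", "*2": "pod2"}
assert verify_application(condition, output, bindings, previous, ("GoodBoy", ("pod2",)))
```

A wrong binding (say *1 = "podX") fails the membership check, so the claimed output statement is rejected.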

Defining custom predicates

(Note: At the moment, we consider a "custom operation" to be exactly the same thing as the "definition of a custom predicate.")

A custom operation1 is a rule that allows one to deduce a custom statement from one or more existing statements, as described below.

Note: Unlike built-in operations, it is not possible to perform arbitrary calculations inside a custom operation.

The syntax of a custom operation is best explained with an example.

Original example with anchored keys, origins, and keys.

Args:
   pod: Origin,
   good_boy_issuers: AnchoredKey::MerkleRoot,
   receiver: AnchoredKey

Condition:
   ValueOf(AK(pod, "_type"), SIGNATURE),
   Contains(good_boy_issuers, AK(pod, "_signer")),
   Equals(AK(pod, "friend"), receiver)

Output:
   GoodBoy(receiver, good_boy_issuers)

Compiled example with only origins and keys.

Args:
   pod: Origin,
   good_boy_issuers_origin: Origin,
   good_boy_issuers_key: Key::MerkleRoot,
   receiver_origin: Origin,
   receiver_key: Key

Condition:
   ValueOf(AK(pod, "_type"), SIGNATURE),
   Contains(AK(good_boy_issuers_origin, good_boy_issuers_key), AK(pod, "_signer")),
   Equals(AK(pod, "friend"), AK(receiver_origin, receiver_key))

Output:
   GoodBoy(AK(receiver_origin, receiver_key), AK(good_boy_issuers_origin, good_boy_issuers_key))

A custom operation accepts as input a number of statements (the Condition); each statement has a number of arguments, which may be constants or anchored keys; and an anchored key in turn can optionally be decomposed as a pair of an Origin and a Key.

In the "original example" above, the anchored keys good_boy_issuers and receiver are not broken down, but AK(pod, "_type"), AK(pod, "_signer"), AK(pod, "friend") are. The purpose of breaking them down, in this case, is to force the three anchored keys to come from the same pod.

In the "compiled example", all the anchored keys have been broken down into origins and keys.

In general, in the front-end language, the "arguments" to an operation define a list of identifiers with types. Every statement in the "condition" must have valid arguments of the correct types: either constants, or identifiers defined in the "arguments".

In order to apply the operation, the user who wants to create a POD must give acceptable values for all the arguments. The POD prover will substitute those values for all the statements in the "Condition" and check that all substituted statements previously appear in the POD. If this check passes, the output statement is then a valid statement.
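A toy version of the argument check described above (a hypothetical representation, not the actual implementation; here quoted strings stand in for literal constants such as "_type"):

```python
# Sketch: every argument appearing in a rule's Condition must be either a
# literal constant or an identifier declared in the rule's argument list.

def check_condition_args(declared_args, condition):
    declared = set(declared_args)  # identifiers listed under "Args"
    for _stmt_name, args in condition:
        for a in args:
            is_literal = isinstance(a, int) or (isinstance(a, str) and a.startswith('"'))
            if not is_literal and a not in declared:
                return False  # undeclared identifier: reject the rule
    return True

# The GoodBoy rule's condition, with quoted strings treated as literals:
decl = ["pod", "good_boy_issuers", "receiver"]
cond = [("ValueOf", ("pod", '"_type"', '"SIGNATURE"')),
        ("Contains", ("good_boy_issuers", "pod", '"_signer"')),
        ("Equals", ("pod", '"friend"', "receiver"))]
assert check_condition_args(decl, cond)
```

An argument like "unknown_id" that is neither a literal nor declared under Args would make the check fail.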

What applying the operation looks like on the back end

On the back end the "compiled example" deduction rule is converted to a sort of "template":

Args:
   *1 (pod),
   *2 (good_boy_issuers_origin),
   *3 (good_boy_issuers_key),
   *4 (receiver_origin),
   *5 (receiver_key)

Condition:
   ValueOf(AK(*1, "_type"), SIGNATURE),
   Contains(AK(*2, *3), AK(*1, "_signer")),
   Equals(AK(*1, "friend"), AK(*4, *5))

Output:
   GoodBoy(AK(*4, *5), AK(*2, *3))

If you want to apply this deduction rule to prove a GoodBoy statement, you have to provide the following witnesses in-circuit.

  • Copy of the deduction rule
  • Values for *1, *2, *3, *4, *5.
  • Copy of the three statements in the deduction rule with *1, *2, *3, *4, *5 filled in
  • Indices of the three statements ValueOf, Contains, Equals in the list of previous statements.

And the circuit will verify:

  • *1, *2, *3, *4, *5 were correctly substituted into the statements
  • The three statements ValueOf, Contains, Equals do indeed appear at the claimed indices.
1. In previous versions of these docs, "operations" were called "deduction rules".

Ethdos custom predicate, using binary AND and OR: example of a recursive group

eth_dos_distance(src_or, src_key, dst_or, dst_key, distance_or, distance_key) = or<
   eth_dos_distance_ind_0(src_or, src_key, dst_or, dst_key, distance_or, distance_key),
   eth_dos_distance_base(src_or, src_key, dst_or, dst_key, distance_or, distance_key)
   >

eth_dos_distance_base(src_or, src_key, dst_or, dst_key, distance_or, distance_key) = and<
   eq(src_or, src_key, dst_or, dst_key),
   ValueOf(distance_or, distance_key, 0)
   >


eth_dos_distance_ind_0(src_or, src_key, dst_or, dst_key, distance_or, distance_key) = and<
   eth_dos_distance(src_or, src_key, intermed_or, intermed_key, shorter_distance_or, shorter_distance_key)

   // distance == shorter_distance + 1
   ValueOf(one_or, one_key, 1)
   SumOf(distance_or, distance_key, shorter_distance_or, shorter_distance_key, one_or, one_key)

   // intermed is a friend of dst
   eth_friend(intermed_or, intermed_key, dst_or, dst_key)
   >

This group includes three custom predicates.

When the definition is serialized for hashing, the predicates are renamed to SELF.1, SELF.2, SELF.3.

With this renaming and the wildcards, the first of the three definitions becomes:

SELF.1( *1, *2, *3, *4, *5, *6 ) = or<
   SELF.2( *1, *2, *3, *4, *5, *6 ) ,
   SELF.3( *1, *2, *3, *4, *5, *6 ) 
   >

and similarly for the other two definitions.

The above definition is serialized in-circuit and hashed with a zk-friendly hash to generate the "group hash", a unique cryptographic identifier for the group.

Then the individual statements in the group are identified as:

eth_dos_distance = groupHASH.1
eth_dos_distance_ind_0 = groupHASH.2
eth_dos_distance_base = groupHASH.3

How to hash a custom predicate

Every predicate, native or custom, is identified on the backend by a predicate ID.

The native predicates are numbered sequentially with small integers. The ID of a custom predicate is a hash of its definition; this guarantees that two different predicates cannot have the same ID (aside from the minuscule probability of a hash collision).

This document explains in some detail how the definition of a custom predicate is serialized and hashed.

Custom predicates are defined in groups (also known as batches); see an example. The definition of a custom predicate in a group involves other predicates, which may include:

  • native predicates
  • previously-defined custom predicates
  • other predicates in the same group.

Predicate hashing is recursive: in order to hash a group of custom predicates, we need to know IDs for all the previously-defined custom predicates it depends on.

The definition of the whole group of custom predicates is serialized (as explained below), and that serialization is hashed (using a zk-friendly hash; in the case of the Plonky2 backend, Poseidon) to give a group ID. Each predicate in the group is then referenced by

predicate_ID = (group_ID, idx)

(here idx is simply the index of the predicate in the group).
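A minimal sketch of this scheme, with SHA-256 standing in for the zk-friendly hash (the Plonky2 backend uses Poseidon over field elements) and made-up serialized definitions:

```python
import hashlib

def group_id(serialized_definitions):
    # Hash the serialized definitions of the whole group, laid out in order.
    h = hashlib.sha256()
    for d in serialized_definitions:
        h.update(d.encode())
    return h.hexdigest()

def predicate_id(gid, idx):
    # Each predicate in the group is referenced by (group_ID, index).
    return (gid, idx)

# Hypothetical serialized group of three mutually recursive predicates:
defs = [
    "SELF.1(*1,*2,*3,*4,*5,*6) = or<SELF.2(...), SELF.3(...)>",
    "SELF.2(*1,*2,*3,*4,*5,*6) = and<...>",
    "SELF.3(*1,*2,*3,*4,*5,*6) = and<...>",
]
gid = group_id(defs)
eth_dos_distance = predicate_id(gid, 1)  # i.e. groupHASH.1
```

Because the whole group is hashed at once, changing any one definition changes the IDs of every predicate in the group.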

POD types

  • SignedPod
  • MainPod

SignedPod

A SignedPod consists of the following fields:

  • kvs: key-value storage; internally it is a Merkle tree, so that we can generate proofs about the key-values
    • it can only contain ValueOf statements.
    • the signer's public key is one of the key-values in the kvs.
  • id: the root of the kvs MerkleTree
  • signature: a signature over the id
  • signer: the public key that produced the signature
  • type: the constant SIGNATURE

Example of SignedPod:

{
    id: "05f7a6de...",
    kvs : {
        _signerKey: "5e1c0b3c...",
        idNumber: "94e328a7...",
        dateOfBirth: 1169909384,
        socialSecurityNum: "1111"
    },
    signature: "68b615f7..."
}
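The construction can be sketched as follows. SHA-256 and HMAC are hypothetical stand-ins here; the real backend uses a Merkle tree over field elements and a proper signature scheme:

```python
import hashlib
import hmac

def merkle_root(kvs):
    # Stand-in for the kvs Merkle tree: hash the sorted key-value pairs.
    h = hashlib.sha256()
    for k in sorted(kvs):
        h.update(f"{k}={kvs[k]};".encode())
    return h.hexdigest()

def sign(secret_key, message):
    # Stand-in for the backend's signature scheme.
    return hmac.new(secret_key, message.encode(), hashlib.sha256).hexdigest()

kvs = {
    "_signerKey": "5e1c0b3c",
    "idNumber": "94e328a7",
    "dateOfBirth": 1169909384,
}
pod_id = merkle_root(kvs)                # id: the root of the kvs Merkle tree
signature = sign(b"secret-key", pod_id)  # signature: over the id
signed_pod = {"id": pod_id, "kvs": kvs, "signature": signature}
```

Note that the id commits to all key-values: changing any entry in kvs changes the root, and hence invalidates the signature.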

MainPod

Examples

Examples of POD2 use cases

EthDos

Check also the custom statement example section.

Original in prolog https://gist.github.com/ludns/f84b379ec8c53c97b7f95630e16cc39c#file-eth_dos-pl

An EthDos Pod exposes a single custom statement with two custom deduction rules, the inductive case and the base case.

statement eth_dos_distance(src: PubKey, dst: PubKey, distance: Int):
   - OR():
      - AND(attestation_pod: Pod, intermediate: PubKey, n: int):
         - eq(attestation_pod.attestation, dst)
         - eq(attestation_pod.type, SIGNATURE)
         - sum_of(distance, 1, n)
         - eth_dos_distance(src, attestation_pod.signer, n)
      - AND():
         - eq(src, dst)
         - eq(distance, 0)

ZuKYC (classic)

Original using GPC https://github.com/proofcarryingdata/zukyc

Authority public keys:

  • ZOO_GOV: PubKey, issues IDs
  • ZOO_DEEL: PubKey, issues bank statements

Authority lists:

  • SANCTION_LIST: Hash, Merkle Tree Root of the set of sanctioned public keys
    • values: ["G2345678", "G1987654", "G1657678"]

Date values:

  • NOW_MINUS_18Y: Int, 18 years ago
  • NOW_MINUS_1Y: Int, 1 year ago
  • NOW_MINUS_7D: Int, 7 days ago

A ZuKYC Pod exposes a single custom statement with one custom deduction rule.

statement loan_check(receiver: PubKey):
   - OR():
      - AND(gov_id: Pod, paystub: Pod):
         - eq(gov_id.pk, receiver)
         # Not in the sanction list
         - does_not_contain(SANCTION_LIST, gov_id.pk)
         # Valid government-issued ID
         - eq(gov_id.signer, ZOO_GOV)
         - eq(gov_id.type, SIGNATURE)
         # At least 18 years old
         - lt(gov_id.date_of_birth, NOW_MINUS_18Y) # date_of_birth is more than 18y old
         - eq(paystub.signer, ZOO_DEEL)
         - eq(paystub.type, SIGNATURE)
         - eq(paystub.ssn, gov_id.ssn)
         # At least one year of consistent employment with your current employer
         - lt(paystub.start_date, NOW_MINUS_1Y) # start_date is more than 1y old
         - gt(paystub.issue_date, NOW_MINUS_7D) # issue_date is less than 7d old
         # Annual salary is at least $20,000
         - gt(paystub.annual_salary, 20000)
         # Private key knowledge
         - hash(0, sk, gov_id.pk)
         # Nullifier
         - hash("ZooKyc", sk, nullifier)

ZuKYC (simplified for P1)

This simplified version uses fewer statements but requires a very similar set of features.

Authority lists:

  • SANCTION_LIST: Hash, Merkle Tree Root of the set of sanctioned public keys
    • values: ["G2345678", "G1987654", "G1657678"]

Date values:

  • NOW_MINUS_18Y: Int, 18 years ago
  • NOW_MINUS_1Y: Int, 1 year ago

A ZuKYC Pod exposes a single custom statement with one custom deduction rule.

statement loan_check(receiver: string):
   - OR():
      - AND(gov_id: Pod, paystub: Pod):
         - eq(gov_id.id_number, receiver)
         # Not in the sanction list
         - does_not_contain(SANCTION_LIST, gov_id.id_number)
         # Valid government-issued ID
         - reveal(gov_id.signer)
         - eq(gov_id.type, SIGNATURE)
         # At least 18 years old
         - lt(gov_id.date_of_birth, NOW_MINUS_18Y) # date_of_birth is more than 18y old
         - reveal(paystub.signer)
         - eq(paystub.type, SIGNATURE)
         - eq(paystub.ssn, gov_id.ssn)
         # At least one year of consistent employment with your current employer
         - lt(paystub.start_date, NOW_MINUS_1Y) # start_date is more than 1y old

GreatBoy

A Good Boy Pod exposes one custom statement with one custom deduction rule.

statement is_good_boy(user: PubKey, good_boy_issuers: MerkleTree):
   - OR():
      - AND(pod: Pod, age: Int):
         - eq(pod.type, SIGNATURE)
         - contains(good_boy_issuers, pod.signer)
         # A good boy issuer says this user is a good boy
         - eq(pod.user, user)
         - eq(pod.age, age)

A Friend Pod exposes one custom statement with one custom deduction rule.

statement is_friend(good_boy: PubKey, friend: PubKey, good_boy_issuers: MerkleTree):
   - OR():
      - AND(friend_pod: Pod):
         - eq(friend_pod.type, SIGNATURE)
         # The issuer is a good boy
         - is_good_boy(good_boy, good_boy_issuers)
         # A good boy says this is their friend
         - eq(friend_pod.signer, good_boy)
         - eq(friend_pod.friend, friend)

A Great Boy Pod exposes (in addition to the above) one new custom statement with one custom deduction rule.

statement is_great_boy(great_boy: PubKey, good_boy_issuers: MerkleTree):
   - OR():
      - AND(friend_pod_0: Pod, friend_pod_1: Pod):
         # Two good boys consider this user their friend
         - is_friend(friend_pod_0.signer, great_boy)
         - is_friend(friend_pod_1.signer, great_boy)
         # good boy 0 != good boy 1
         - neq(friend_pod_0.signer, friend_pod_1.signer)

Attested GreatBoy

An Attested Great Boy Pod is like a Great Boy Pod, but the names of the signers are revealed.

statement is_great_boy(great_boy: PubKey, friend0: String, friend1: String, good_boy_issuers: MerkleTree):
   - OR():
      - AND(friend_pod_0: Pod, friend_pod_1: Pod):
         # Two good boys consider this user their friend
         - is_friend(friend_pod_0.signer, great_boy)
         - is_friend(friend_pod_1.signer, great_boy)
         # good boy 0 != good boy 1
         - neq(friend_pod_0.signer, friend_pod_1.signer)
         # publicize signer names
         - value_of(friend_pod_0.name, friend0)
         - value_of(friend_pod_1.name, friend1)

To produce a Great Boy Pod, you need two Friend Pods, friend_pod_0 and friend_pod_1, each of which reveals its signer.

Tracking PodIDs: Posts and comments

The goal of this example is to model a social network, where posts and comments are pods.

A Post is a signature pod with the following fields:

content: String
poster: String
signer: PubKey
timestamp: Int

A Comment is a signature pod with the following fields:

content: String
referenced_post: PodID
signer: PubKey
timestamp: Int

A post is popular if it has at least two comments from different signers.

statement is_popular(post: PodID):
   - AND(comment1: Pod, comment2: Pod):
      - IsEqual(comment1.referenced_post, post)
      - IsEqual(comment2.referenced_post, post)
      - NotEqual(comment1.signer, comment2.signer)

Multiple people over 18

Suppose I want to prove that two different people are over 18, and a third person is under 18, using the custom predicates over_18 and under_18.

statement over_18(age):
   - AND():
      - ValueOf(eighteen, 18)
      - GEq(age, eighteen)
statement under_18(age):
   - AND():
      - ValueOf(eighteen, 18)
      - Lt(age, eighteen)

With wildcards:

statement over_18(*1, *2):
   - AND():
      - ValueOf(*3, *4, 18)
      - GEq(*1, *2, *3, *4)

Maybe I have two input pods gov_id1 and gov_id2, and I want to prove that these pods refer to two different people, both of whom are over 18; and a third pod gov_id3 refers to someone under 18. So in my public output statements, I want to have:

IsUnequal(gov_id1.name, gov_id2.name)
over_18(gov_id1.age)
over_18(gov_id2.age)
under_18(gov_id3.age).

I would prove this with the following sequence of deductions:

Statement: ValueOf(local_eighteen, 18)
Reason: (new entry)

Statement: over_18(gov_id1.age)
Reason: over_18, with *1 = _SELF, *2 = "local_eighteen", *3 = gov_id1, *4 = "age"

Statement: over_18(gov_id2.age)
Reason: over_18, with *1 = _SELF, *2 = "local_eighteen", *3 = gov_id2, *4 = "age"

Statement: under_18(gov_id3.age)
Reason: under_18, with *1 = _SELF, *2 = "local_eighteen", *3 = gov_id3, *4 = "age"

Statement: IsUnequal(gov_id1.name, gov_id2.name)
Reason: (is unequal from entries)
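The same proof, written out as data in a hypothetical representation (each row pairs a statement with the operation that justifies it, mirroring the tabular proof format):

```python
# Hypothetical encoding of the deduction table above. A "reason" is either a
# built-in operation name or a (rule, wildcard-bindings) pair.
binding = {"*1": "_SELF", "*2": "local_eighteen", "*4": "age"}

proof_table = [
    {"statement": "ValueOf(local_eighteen, 18)", "reason": "new entry"},
    {"statement": "over_18(gov_id1.age)",
     "reason": ("over_18", {**binding, "*3": "gov_id1"})},
    {"statement": "over_18(gov_id2.age)",
     "reason": ("over_18", {**binding, "*3": "gov_id2"})},
    {"statement": "under_18(gov_id3.age)",
     "reason": ("under_18", {**binding, "*3": "gov_id3"})},
    {"statement": "IsUnequal(gov_id1.name, gov_id2.name)",
     "reason": "is unequal from entries"},
]
```

Each custom-predicate row carries its own wildcard bindings; only *3 (the source pod) differs between the three applications.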

Architecture

This document explains the architecture of the current implementation.

The main logic of the POD2 implementation is divided into three modules:

  • frontend
    • compiles user-friendly pod declarations into intermediate representations to be consumed by the backend
    • internally connects to the backend to get pods built (signed / proved).
    • presents pods to the user
  • middleware
    • defines the intermediate representation of Statements, Operations and interfaces of PODs
    • Statements and Operations are strongly typed here
    • Both frontend and backend use types defined in the middleware
    • Does not import types from either the frontend or the backend
  • backend
    • takes a middleware POD request representation, signs/proves it and returns a generic POD object

If this were the Rust compiler:

  • frontend: takes Rust code and compiles it to LLVM-IR
  • middleware: defines LLVM-IR instructions and blocks
  • backend: takes LLVM-IR instructions and emits assembly code for a particular CPU

The following diagram shows visually how the components interact with each other:

In this organization, the middleware could be defined at arbitrary points:

  • closer to the user would be more high level
  • closer to the target would be more low level

All these positions are OK. We just need to choose one, and we can try to choose a point that simplifies the implementation.

For example, in the middleware we could define Value = 4 x Goldilocks (making it slightly lower level); or Value = BigUint, letting the backend choose the maximum representable value, the field encoding, etc. (making it slightly higher level).

In the current iteration we choose Value = 4 x Goldilocks, but we can revisit this in a future iteration (e.g. if we want to support Plonky3) by either moving the middleware to a higher level, or by keeping it the same and replacing the Value definition.
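To make the trade-off concrete, here is an illustrative (not actual) encoding of a u64 into 4 Goldilocks field elements. Since the Goldilocks prime p = 2^64 - 2^32 + 1 is slightly smaller than 2^64, a full u64 does not fit in a single field element and spills into a second limb:

```python
# Goldilocks prime used by Plonky2.
GOLDILOCKS_P = 2**64 - 2**32 + 1

def u64_to_value(n):
    # Hypothetical encoding of a u64 into 4 field elements (two limbs used).
    assert 0 <= n < 2**64
    return (n % GOLDILOCKS_P, n // GOLDILOCKS_P, 0, 0)

def value_to_u64(v):
    # Inverse of the encoding above.
    return v[0] + v[1] * GOLDILOCKS_P
```

A BigUint-based middleware would hide this limb arithmetic entirely, at the cost of each backend re-deriving its own encoding.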

The diagram above includes an arrow showing the typical flow a user follows when making a POD. What follows is a simplified description of the process.

  1. The user interacts with the frontend API and passes a list of Operations. The frontend takes those operations and generates the corresponding Statements. The list of Operations and Statements are transformed into middleware types. This process can be seen as a compilation step. The frontend sends this middleware data as a request to the Backend.
  2. The backend receives a request to build a POD from a list of Statements and Operations. It takes that bundle of data and lays it out in the appropriate format to be proved by a circuit, padding unused slots, etc. Then it calls a proof system API to generate a proof.
  3. The target (proof system) generates a proof from some circuit description and witness data and gives it back to the backend.
  4. The backend receives the proof, encapsulates it in an object that adheres to the Pod trait, and passes it to the frontend.
  5. The frontend receives a "blackbox" Pod object and wraps it in a presentation layer in order to show it to the user.
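The five steps can be caricatured in a few lines (all names hypothetical; the real interfaces are Rust traits and the "proof" here is just a placeholder string):

```python
def frontend_compile(operations):
    # Step 1: compile user operations into middleware statements.
    return [{"op": op, "statement": f"stmt({op})"} for op in operations]

def backend_prove(request):
    # Steps 2-3: lay out the request, call the proof system, wrap the proof.
    proof = f"proof-over-{len(request)}-rows"  # stand-in for a real proof
    return {"statements": [row["statement"] for row in request], "proof": proof}

def frontend_present(pod):
    # Step 5: wrap the opaque pod object in a presentation layer for the user.
    return f"Pod with {len(pod['statements'])} statements ({pod['proof']})"

pod = backend_prove(frontend_compile(["NewEntry(age, 42)", "Gt(age, 18)"]))
print(frontend_present(pod))
```

The point of the split is that frontend_present never inspects the proof: the backend's output is a black box to the frontend, exactly as in step 5.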