Arithmetic circuits in Rust

15-12-2024

11 minute read

Acknowledgments

This post and the arithmetic-circuits crate were developed at NP-Labs as part of our implementation of the Ligero Polynomial Interactive Oracle Proof (PIOP) system. This work was a collaborative effort between Antonio Mejías Gil, Marcin Górny, and the author.

Introduction

Arithmetic circuits are a powerful way to represent polynomial expressions using a sequence of addition and multiplication operations, conceptualized as gates. Each gate takes two previously computed expressions as input and outputs their sum or product. Formally, given a field $\mathbb{F}$, an arithmetic circuit $C$ is a directed acyclic graph (DAG) with a set of nodes $V$ and a set of edges $E$. Each node $v \in V$ is either a constant, a variable, an addition gate or a multiplication gate. Constants and variables are nodes with no incoming edges and represent assigned or unknown values in $\mathbb{F}$, and addition and multiplication gates are nodes with two incoming edges.

Note that in the canonical definition of an arithmetic circuit, the addition and multiplication gates may have an arbitrary number of incoming edges, but this is not the case in our implementation.

We do not require nodes to have fan-out 1, so we can have multiple edges leaving the same node. Furthermore, we allow two edges leaving a node to be directed to the same node. For instance, consider the arithmetic expression $2x + y^2$. We can represent this expression using the following arithmetic circuit:

2x + y^2

Implementation

For our implementation, we represent an arithmetic circuit as a list of nodes. Addition and multiplication gates are represented as structs (i.e., Add and Mul structs) with two fields: left and right, which are the indices of the left and right input nodes of the gate in the circuit list, respectively. Variables are represented as empty structs. Constants are represented as elements of the prime field F. The enum Node serves as a tagged union type for the different kinds of nodes in the circuit.

The ArithmeticCircuit struct contains the list of nodes, as well as two HashMaps. The first one maps each constant value to its index in the node list to avoid duplicating constants. The second one maps each variable name to its indices in the node list, which allows for easily looking up the index of a variable in the circuit.

Construction

In order to directly construct an arithmetic circuit, the user must instantiate an empty ArithmeticCircuit struct and mutate it via its public methods (e.g. new_variable, constant, add, mul), each of which returns the index of the new node in the array. The following code constructs the arithmetic circuit in the figure above:

1
2
3
4
5
6
7
let mut circuit = ArithmeticCircuit::new();
let two = circuit.constant(F::from(2));
let x = circuit.new_variable();
let two_x = circuit.mul(two, x);
let y = circuit.new_variable();
let y2 = circuit.mul(y, y);
let result = circuit.add(two_x, y2);

The construction logic yields a list of nodes in a post-order depth-first search (DFS) traversal of the circuit DAG. That is, we visit the left and right children of a node before visiting the node itself. Note that the post-order DFS traversal representation is not unique, so there may be several DFS trees enumerating the same circuit. For instance, the arithmetic circuit above admits the following two traversals $(a)$ and $(b)$, traversal $(a)$ corresponds to the code above, and traversal $(b)$ is the product of swapping lines 4 and 5:

Traversals

Several utility methods are provided to ease the process of circuit construction, such as new_variables, add_nodes and mul_nodes to operate on multiple variables/gates at once or pow and pow_bigint to efficiently computing large powers in $\mathbb{F}$ in a square-and-multiply fashion, to name a few.

Evaluation

The arithmetic circuit can be evaluated in several ways, depending on how the user wants to reference variables and what information is needed from the evaluation process. All evaluation methods receive either an indexation or a labeling of the desired variable assignment $x \in \mathbb{F}^n$, a list of length $n$ containing tuples of the form $(k, v)$ where $k$ is either the index of the variable in the circuit or a label given by the user and $v \in \mathbb{F}$ is the value assigned to the variable.

For basic evaluation of the circuit output, which defaults to the value of the final node, the user can either reference variables by their index or label:

// Evaluate using variable indices
let by_index = circuit.evaluate(vec![(1, F::from(2)), (3, F::from(3))]);

// Evaluate using variable labels
let by_label = circuit.evaluate_with_labels(vec![("x", F::from(2)), ("y", F::from(3))]);

When the user wants to evaluate a specific node in the circuit rather than the final output:

// Evaluate intermediate node by index
let by_index = circuit.evaluate_node(
    vec![(1, F::from(2)), (3, F::from(3))], 
    intermediate_node_idx
);

// Same operation using labels
let by_label = circuit.evaluate_node_with_labels(
    vec![("x", F::from(2)), ("y", F::from(3))], 
    intermediate_node_idx
);

For debugging or analysis purposes, the user can obtain the complete evaluation trace up to a specific node. The trace is a list of Options, where each Some(v) corresponds to the value of a node in the circuit and None corresponds to a node that was not needed in the recursive evaluation process.

// Get evaluation trace showing intermediate values
let by_index = circuit.evaluation_trace(
    vec![(1, F::from(2)), (3, F::from(3))],
    target_node
);

let by_label = circuit.evaluation_trace_with_labels(
    vec![("x", F::from(2)), ("y", F::from(3))],
    target_node
);

The evaluation process follows a recursive pattern, computing each node’s value only after its dependencies (left and right children) have been evaluated.

R1CS and Multiple outputs

From this point on we assume $\mathbb{F}$ is a prime field of order $p$ with multiplicative group $\mathbb{F}^{\times}$.

Our implementation provides functionality to construct arithmetic circuits from Rank-1 Constraint Systems (R1CS). The conversion is facilitated through integration with the ConstraintSystem trait from the arkworks relations crate, enabling compatibility with R1CS instances generated from Circom circuits.

Let $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ be matrices in $\mathbb{F}^{m \times n}$. An R1CS instance consists of these three matrices and defines a system of constraints where, for each row $i \in [m]$: $$ \begin{equation} \langle \mathbf{A}_i, \mathbf{w} \rangle \cdot \langle \mathbf{B}_i, \mathbf{w} \rangle = \langle \mathbf{C}_i, \mathbf{w} \rangle \end{equation} $$ where $\mathbf{w} = (\mathbf{x}, \mathbf{s})$ is the concatenation of the instance vector $\mathbf{x}$ and the witness vector $\mathbf{s}$, and $\langle \cdot, \cdot \rangle$ denotes the inner product.

The computational expressiveness of arithmetic circuits and R1CS is equivalent. While the transformation from an arithmetic circuit to its corresponding R1CS formulation follows a deterministic and well-defined procedure, the inverse transformation — converting an R1CS instance to a single-output arithmetic circuit — presents significant theoretical and implementation challenges if we restrict ourselves to the single-output case (more on this below). Formally, given an R1CS instance $(\mathbf{A}, \mathbf{B}, \mathbf{C})$, we want to construct an arithmetic circuit $C$ such that for any assignment $\mathbf{w} = (\mathbf{x}, \mathbf{s})$, $C(\mathbf{w}) = 1$ if and only if relation $(1)$ is satisfied, otherwise $C(\mathbf{w}) = 0$.

To illustrate why the transformation is challenging, consider a trivial equality constraint $x = 0$, that is, any assignment $\mathbf{w}$ contains a single variable $x$ and $\mathbf{A}, \mathbf{B}, \mathbf{C}$ contain a single entry with value $1$. Its corresponding arithmetic circuit must be equivalent to a polynomial $P(x) \in \mathbb{F}[x]$ such that $P(x) = 1$ if and only if $x = 0$. Note that for every element $a \in \mathbb{F}^{\times}$, $a^{p-1} = 1$, and thus one possible solution is $P(x) = 1 - x^{p-1}$, furthermore any other solution must be of degree at least $p-1$.

Exponent tower

We can extend the previous example to $m > 1$ constraints in the following manner: for each constraint $i \in [m]$, compute the value $$ y_i := \langle \mathbf{A}_i, \mathbf{w} \rangle \cdot \langle \mathbf{B}_i, \mathbf{w} \rangle - \langle \mathbf{C}_i, \mathbf{w} \rangle $$

That is, $y_i = 0$ if and only if the $i$-th constraint is satisfied. Then, compute $\tilde{y_i} := 1 - y_i^{p-1}$ and let $\sigma = \prod_{i=1}^m \tilde{y_i}$. One can easily verify that $\sigma = 1$ if and only if all constraints are satisfied.

This approach has significant practical implications. Note that any arithmetic circuit implementing this R1CS must compute $y_i^{p-1}$ for each equality check, which requires $\mathcal{O}(\log(p-1))$ multiplication gates using an optimal square-and-multiply strategy. For the elliptic curves commonly used in zero-knowledge proof systems, such as BN254 or BLS12-381, the field characteristic $p$ is a prime number of 254 and 381 bits, respectively. Therefore, even a single equality constraint would require hundreds of multiplication gates.

This computational overhead becomes prohibitive when dealing with real-world R1CS instances that typically contain thousands or millions of constraints. For instance, a modest R1CS system to prove a poseidon hash on BN-254 has approximately 300 constraints, which would require approximately 100000 additional multiplication gates using the single-output formulation. This limitation motivates the use of multi-output arithmetic circuits, where each constraint can be represented as a separate output without the need for reducing the verification to a single output. We also relax the requirement that the circuit must output a boolean value, and simply require that the circuit outputs $1$ if and only if the constraint is satisfied. In this formulation, the circuit $C$ produces $m$ outputs, one for each constraint, where the $i$-th output corresponds to $1 + y_i$.

Arithmetic Expressions

While the ArithmeticCircuit interface provides precise control over circuit construction, its method-based API can become cumbersome when working with complex mathematical expressions. Consider constructing a polynomial like $(x + y)^3 + 2xy$. Using the direct circuit interface would require carefully tracking node indices and making multiple method calls:

let mut circuit = ArithmeticCircuit::new();
let x = circuit.new_variable("x");
let y = circuit.new_variable("y");
let x_plus_y = circuit.add(x, y);
let squared = circuit.mul(x_plus_y, x_plus_y);
let cubed = circuit.mul(squared, x_plus_y);
let xy = circuit.mul(x, y);
let two = circuit.constant(F::from(2));
let two_xy = circuit.mul(two, xy);
let result = circuit.add(cubed, two_xy);

To provide a more natural and intuitive experience, we implemented an Expression type that allows users to write arithmetic expressions using familiar mathematical notation. The same polynomial can be expressed as:

let x = Expression::variable("x");
let y = Expression::variable("y");
let result = (x.clone() + y.clone()).pow(3) + 2 * x * y;

This high-level interface not only makes the code more readable and maintainable but also reduces the likelihood of errors in complex circuit construction. Under the hood, the Expression type handles all the complexity of managing circuit nodes and their relationships, while ensuring efficient circuit generation through careful handling of shared subexpressions. Similarly, the Expression type provides a rich set of operations equivalent to those of the ArithmeticCircuit interface, such as addition, multiplication, subtraction, and exponentiation. When the expression is finally converted to a circuit with to_arithmetic_circuit(), it automatically optimizes the structure by eliminating redundant computations and sharing common subexpressions. We expand on the inner workings of the conversion process in the following paragraphs.

The implementation consists of two key components: an inner enum that represents the expression tree, and a wrapper struct that enables arithmetic operations overloading (Add, Mul…):

enum ExpressionInner<F: PrimeField> {
    Variable(String),
    Constant(F),
    Add(Expression<F>, Expression<F>),
    Mul(Expression<F>, Expression<F>),
}

pub struct Expression<F: PrimeField>(Rc<ExpressionInner<F>>);

The wrapper struct Expression uses the New Type pattern, which is necessary to implement operator traits that we can’t directly implement on the foreign type Rc<ExpressionInner<F>>. The interface supports various ways to construct and combine expressions:

// Creating variables and constants
let x = Expression::variable("x");  // Variable with string identifier
let c = Expression::constant(F::from(5));  // Constant from field element

// Basic arithmetic operations
let sum = x.clone() + y.clone(); 
let product = x.clone() * y.clone();  
let diff = x - y;  

// Integer constants can be used on the left side
let scaled = 3 * x;  // Equivalent to F::from(3) * x
let offset = 1 + x;  // Equivalent to F::from(1) + x

// Field elements can be used on either side
let scaled_f_left = F::from(3) * x;  
let scaled_f_right = x * F::from(3);

The implementation includes careful operator overloading to provide a natural syntax while maintaining type safety. For example, the Add trait implementation looks like this:

impl<F: PrimeField> Add for Expression<F> {
    type Output = Expression<F>;

    fn add(self, rhs: Expression<F>) -> Self::Output {
        Expression(Rc::new(ExpressionInner::Add(self.clone(), rhs.clone())))
    }
}

// Support for adding field elements on the right
impl<F: PrimeField> Add<F> for Expression<F> {
    type Output = Expression<F>;
    
    fn add(self, rhs: F) -> Self::Output {
        self + Expression::constant(rhs)
    }
}

// Support for adding integers on the left
impl<F: PrimeField> Add<Expression<F>> for i32 {
    type Output = Expression<F>;
    
    fn add(self, rhs: Expression<F>) -> Self::Output {
        self + Expression::constant(F::from(self))
    }
}

To efficiently manage the underlying circuit structure, the implementation leverages Rust’s Rc (Reference Counter) smart pointer. This ensures that when sub-expressions are reused in multiple places, we don’t duplicate the corresponding gates in the final circuit. For example, in the expression (x + 1) * (x + 1), the sub-expression (x + 1) is referenced twice but only generates one set of circuit gates.

let x = Expression::variable("x");
let term = x.clone() + F::from(1);
let expr = term.clone() * term;  // (x + 1)² shares the subexpression

In this case, term is reference counted, so both uses in the multiplication point to the same expression tree, ensuring that when converted to a circuit, the computation of x + 1 is only performed once.

In order to convert an Expression to an ArithmeticCircuit, the to_arithmetic_circuit method performs a post-order traversal of the expression tree, recursively converting each sub-expression to its corresponding circuit representation. The method performs the following steps:

Creates a topologically sorted list of nodes from the expression tree.
Converts the sorted nodes into circuit nodes while maintaining references.
Organizes variables and constants into HashMaps for efficient lookup.

Future work

The code is available here, but it is currently in the process of being made available as an independent arkworks crate following the merge of this PR. This implementation is a work-in-progress, and several potential improvements and extensions could be explored, for example:

Parallel Evaluation: The current evaluation process follows a sequential pattern. Given that arithmetic circuits are inherently DAGs, it is reasonable to expect that parallel evaluation strategies could be implemented to improve performance on multi-core systems.
Circuit Optimization and Gate Reduction: While our implementation handles basic optimizations like constant folding and subexpression sharing, there is no current support for gate reduction through algebraic simplification, that is, an expression like $(x + 1) - (x + 1)$ should be equivalent to the constant zero. While this is a simple example, the following improvements would be non-trivial:
- Common subexpression elimination across different expression branches
- Strength reduction of exponentiation gates
- Detection and removal of redundant gates

We welcome contributions to the implementation once the crate is standardized and merged into the snark repository as part of the arkworks project.

Author: César Descalzo