feat: added detailed readme for compiler

This commit is contained in:
shreyas-londhe
2025-04-28 11:11:40 +05:30
parent cc9172089e
commit ebc6a2f2c5
4 changed files with 214 additions and 14 deletions

70
README.md Normal file

@@ -0,0 +1,70 @@
# ZK-Regex: Verifiable Regular Expressions in Arithmetic Circuits
`zk-regex` enables proving regular expression matching within zero-knowledge circuits. It compiles standard regex patterns into circuit-friendly Non-deterministic Finite Automata (NFAs) and generates corresponding circuit code for **[Circom](https://docs.circom.io/)** and **[Noir](https://noir-lang.org/)** proving systems.
This allows developers to build ZK applications that can verifiably process or validate text based on complex patterns without revealing the text itself.
## Key Features
- **Regex Compilation:** Converts standard regular expression syntax into NFAs optimized for ZK circuits.
- **Circuit Generation:** Automatically generates verifiable circuit code for:
  - [Circom](https://docs.circom.io/)
  - [Noir](https://noir-lang.org/)
- **Helper Libraries:** Provides supporting libraries and circuit templates for easier integration into Circom and Noir projects.
- **Underlying Tech:** Leverages the robust Thompson NFA construction via the Rust [`regex-automata`](https://github.com/rust-lang/regex/tree/master/regex-automata) crate.
## Project Structure
The project is organized into the following packages:
- **`compiler/`**: The core Rust library responsible for parsing regex patterns, building NFAs, and generating circuit code. See [compiler/README.md](./compiler/README.md) for API details and usage.
- **`circom/`**: Contains Circom templates and helper circuits required to use the generated regex verification circuits within a Circom project. See [circom/README.md](./circom/README.md) for integration details.
- **`noir/`**: Contains Noir contracts/libraries required to use the generated regex verification logic within a Noir project. See [noir/README.md](./noir/README.md) for integration details.
## High-Level Workflow
1. **Define Regex:** Start with a standard regular expression, either as a single pattern string or decomposed into parts as in the configuration below.
```json
{
  "parts": [
    { "Pattern": "(?:\r\n|^)subject:" },
    { "PublicPattern": ["[a-z]+", 128] },
    { "Pattern": "\r\n" }
  ]
}
```
2. **Compile & Generate Circuit:** Use the `zk-regex-compiler` library to compile the pattern and generate circuit code for your chosen framework (Circom or Noir).
```rust
// Simplified example - see compiler/README.md for full usage.
// Assumes these items are exported at the crate root.
use zk_regex_compiler::{gen_from_decomposed, DecomposedRegexConfig, ProvingFramework, RegexPart};

// `parts` must be mutable so that entries can be pushed onto it.
let mut parts = Vec::new();
parts.push(RegexPart::Pattern("(?:\\r\\n|^)subject:".to_string()));
parts.push(RegexPart::PublicPattern(("([a-z]+)".to_string(), 128)));
parts.push(RegexPart::Pattern("\r\n".to_string()));
let decomposed_config = DecomposedRegexConfig { parts };

// Pass the config, not `parts`, which has been moved into it.
let (nfa, circom_code) = gen_from_decomposed(decomposed_config, "MyRegex", ProvingFramework::Circom)?;
// Save or use circom_code
```
3. **Integrate Circuit:** Include the generated code and the corresponding helper library ([`zk-regex-circom`](./circom/README.md) or [`zk-regex-noir`](./noir/README.md)) in your ZK project.
4. **Generate Inputs:** Use the `zk-regex-compiler`'s [`gen_circuit_inputs`](./compiler/README.md#gen_circuit_inputsnfa-nfagraph-input-str-max_haystack_len-usize-max_match_len-usize-proving_framework-provingframework---resultproverinputs-compilererror) function to prepare the private and public inputs for your prover based on the text you want to match.
5. **Prove & Verify:** Run your ZK proving system using the generated inputs and circuit. The proof demonstrates that the (private) text matches the (public) regex pattern.
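To make the decomposed configuration from step 1 more concrete, the following is a minimal sketch of one plausible way such parts could be joined into a single pattern. The `Part` enum, `combine`, and `example_parts` are hypothetical stand-ins written for this illustration, and wrapping each public part in a capture group is an assumption, not a description of what the compiler actually does.

```rust
/// Hypothetical stand-in for the parts shown in the JSON configuration.
/// This is NOT the crate's `RegexPart` type.
enum Part {
    /// A sub-pattern that stays private.
    Pattern(String),
    /// A sub-pattern paired with a maximum length.
    PublicPattern(String, usize),
}

/// Joins the parts into one regex string and collects the maximum
/// lengths of the public parts, in order.
/// Assumption: each public part becomes one capture group.
fn combine(parts: &[Part]) -> (String, Vec<usize>) {
    let mut pattern = String::new();
    let mut max_bytes = Vec::new();
    for part in parts {
        match part {
            Part::Pattern(p) => pattern.push_str(p),
            Part::PublicPattern(p, max_len) => {
                pattern.push('(');
                pattern.push_str(p);
                pattern.push(')');
                max_bytes.push(*max_len);
            }
        }
    }
    (pattern, max_bytes)
}

/// The three parts from the subject-line example.
fn example_parts() -> Vec<Part> {
    vec![
        Part::Pattern(r"(?:\r\n|^)subject:".to_string()),
        Part::PublicPattern("[a-z]+".to_string(), 128),
        Part::Pattern(r"\r\n".to_string()),
    ]
}

fn main() {
    let (pattern, max_bytes) = combine(&example_parts());
    println!("combined pattern: {pattern}");
    println!("max bytes per public part: {max_bytes:?}");
}
```

Under this assumption, the collected lengths play the same role as the `max_bytes` argument that `gen_from_raw` accepts for capture groups.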
## Installation
Installation details depend on which part of the project you need:
- **Compiler:** If using the compiler directly in a Rust project, add it to your `Cargo.toml`. See [compiler/README.md](./compiler/README.md).
- **Circom Helpers:** See [circom/README.md](./circom/README.md) for instructions on integrating the Circom templates.
- **Noir Helpers:** See [noir/README.md](./noir/README.md) for instructions on adding the Noir library dependency.
## Contributing
Contributions are welcome! Please follow standard Rust development practices. Open an issue to discuss major changes before submitting a pull request.
## License
This project is licensed under the [Specify License Here - e.g., MIT License or Apache 2.0].


@@ -1,15 +1,3 @@
# circom
# zk-Regex Circom
To install dependencies:
```bash
bun install
```
To run:
```bash
bun run index.ts
```
This project was created using `bun init` in bun v1.2.5. [Bun](https://bun.sh) is a fast all-in-one JavaScript runtime.
This package provides the necessary Circom templates to integrate regex verification logic generated by the `zk-regex-compiler` into your Circom projects.

139
compiler/README.md Normal file

@@ -0,0 +1,139 @@
# ZK-Regex Compiler
This package contains the core Rust library for compiling regular expressions into circuit-friendly Non-deterministic Finite Automata (NFAs) and generating circuit code for Circom and Noir.
It uses the [`regex-automata`](https://github.com/rust-lang/regex/tree/master/regex-automata) crate to parse regex patterns and construct Thompson NFAs, which are then processed to create structures suitable for arithmetic circuits.
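For readers new to NFAs, here is a minimal sketch of how an NFA decides a match by tracking the set of active states one byte at a time. The `Nfa` and `Transition` types and the hand-built automaton for `a+b*` are hypothetical and written only for this illustration; they are not the crate's `NFAGraph`, and the sketch performs an anchored, whole-input match.

```rust
use std::collections::BTreeSet;

/// Hypothetical byte-labelled transition: `from` --byte--> `to`.
struct Transition {
    from: usize,
    byte: u8,
    to: usize,
}

/// Hypothetical epsilon-free NFA. Not the crate's `NFAGraph`.
struct Nfa {
    start: usize,
    accept: BTreeSet<usize>,
    transitions: Vec<Transition>,
}

impl Nfa {
    /// Returns true if the whole input drives the NFA from the start
    /// state into at least one accepting state.
    fn matches(&self, input: &[u8]) -> bool {
        let mut active: BTreeSet<usize> = BTreeSet::from([self.start]);
        for &b in input {
            // Follow every transition that leaves an active state on byte `b`.
            let next: BTreeSet<usize> = self
                .transitions
                .iter()
                .filter(|t| t.byte == b && active.contains(&t.from))
                .map(|t| t.to)
                .collect();
            if next.is_empty() {
                return false;
            }
            active = next;
        }
        active.iter().any(|s| self.accept.contains(s))
    }
}

/// Hand-built NFA for `a+b*`: 0 -a-> 1, 1 -a-> 1, 1 -b-> 2, 2 -b-> 2.
fn a_plus_b_star() -> Nfa {
    Nfa {
        start: 0,
        accept: BTreeSet::from([1, 2]),
        transitions: vec![
            Transition { from: 0, byte: b'a', to: 1 },
            Transition { from: 1, byte: b'a', to: 1 },
            Transition { from: 1, byte: b'b', to: 2 },
            Transition { from: 2, byte: b'b', to: 2 },
        ],
    }
}

fn main() {
    let nfa = a_plus_b_star();
    for s in ["aab", "a", "b", "aba"] {
        println!("{s:?} -> {}", nfa.matches(s.as_bytes()));
    }
}
```

A circuit cannot branch on data the way this loop does, which is why the compiler processes the NFA into a form suited to arithmetic constraints rather than simulating it directly.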
## Core API
The main functionalities are exposed through the [`lib.rs`](./src/lib.rs) file:
- **`compile(pattern: &str) -> Result<NFAGraph, CompilerError>`**
  - Parses the input regex `pattern` string.
  - Builds an internal NFA representation ([`NFAGraph`](./src/types.rs)).
  - Returns the `NFAGraph` or a [`CompilerError::RegexCompilation`](./src/error.rs) if the pattern is invalid.
- **`gen_from_raw(pattern: &str, max_bytes: Option<Vec<usize>>, template_name: &str, proving_framework: ProvingFramework) -> Result<(NFAGraph, String), CompilerError>`**
  - Compiles a raw regex `pattern` string directly into circuit code.
  - `max_bytes`: Optional vector specifying maximum byte lengths for each capture group. If `None`, defaults might be used or capture groups might not be specifically handled (verify this behavior).
  - `template_name`: A name used for the main template/contract in the generated code (e.g., Circom template name).
  - `proving_framework`: Specifies the target output ([`ProvingFramework::Circom`](./src/types.rs#L23) or [`ProvingFramework::Noir`](./src/types.rs#L23)).
  - Returns a tuple containing the compiled [`NFAGraph`](./src/nfa/mod.rs#L32) and the generated circuit code as a `String`, or a [`CompilerError`](./src/error.rs#L5).
- **`gen_from_decomposed(config: DecomposedRegexConfig, template_name: &str, proving_framework: ProvingFramework) -> Result<(NFAGraph, String), CompilerError>`**
  - Constructs a regex pattern by combining parts defined in the `config` (of type [`DecomposedRegexConfig`](./src/types.rs#L15)).
  - Generates circuit code similarly to `gen_from_raw`.
  - Useful for building complex regex patterns programmatically.
  - Returns a tuple containing the compiled [`NFAGraph`](./src/nfa/mod.rs#L32) and the generated circuit code as a `String`, or a [`CompilerError`](./src/error.rs#L5).
  - _(Note: Requires understanding the structure of [`DecomposedRegexConfig`](./src/types.rs#L15))_
- **`gen_circuit_inputs(nfa: &NFAGraph, input: &str, max_haystack_len: usize, max_match_len: usize, proving_framework: ProvingFramework) -> Result<ProverInputs, CompilerError>`**
  - Generates the necessary inputs for the prover based on the compiled [`nfa`](./src/nfa/mod.rs#L32), the `input` string to match against, and circuit constraints.
  - `max_haystack_len`: The maximum length of the input string allowed by the circuit.
  - `max_match_len`: The maximum length of the regex match allowed by the circuit.
  - `proving_framework`: Specifies for which framework ([`Circom`](./src/types.rs#L23) or [`Noir`](./src/types.rs#L23)) the inputs should be formatted.
  - Returns a [`ProverInputs`](./src/types.rs#L33) struct (containing formatted public and private inputs) or a [`CompilerError::CircuitInputsGeneration`](./src/error.rs).
  - _(Note: Requires understanding the structure of [`ProverInputs`](./src/types.rs#L33) for the specific framework)_
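Because a circuit has a fixed size, `max_haystack_len` acts as a hard upper bound on the input. The sketch below illustrates that idea with a hypothetical helper, `pad_haystack`, which rejects over-long input and pads shorter input to the fixed length. Padding with zero bytes is an assumption made for illustration; the actual layout produced by `gen_circuit_inputs` may differ.

```rust
/// Hypothetical helper, not part of the crate: turns `input` into a
/// fixed-length byte vector of exactly `max_haystack_len` bytes.
/// Assumption: unused trailing positions are filled with zero bytes.
fn pad_haystack(input: &str, max_haystack_len: usize) -> Result<Vec<u8>, String> {
    let bytes = input.as_bytes();
    if bytes.len() > max_haystack_len {
        return Err(format!(
            "input is {} bytes, but the circuit allows at most {}",
            bytes.len(),
            max_haystack_len
        ));
    }
    let mut padded = bytes.to_vec();
    // Extend to the fixed circuit size with zero bytes.
    padded.resize(max_haystack_len, 0);
    Ok(padded)
}

fn main() {
    match pad_haystack("test abc test", 64) {
        Ok(padded) => println!("padded haystack has {} bytes", padded.len()),
        Err(e) => eprintln!("cannot build inputs: {e}"),
    }
    // An input longer than the limit is rejected instead of truncated.
    if let Err(e) = pad_haystack("abcdef", 4) {
        eprintln!("cannot build inputs: {e}");
    }
}
```

Rejecting over-long input rather than truncating it avoids silently proving a statement about different text than the caller intended.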
## Usage Examples (Rust)
Add this crate to your `Cargo.toml`:
```toml
[dependencies]
zk-regex-compiler = { git = "https://github.com/zkemail/zk-regex", package = "compiler" }
```
**Example 1: Compile a simple regex to NFA**
```rust
use zk_regex_compiler::{compile, CompilerError};

fn main() -> Result<(), CompilerError> {
    let pattern = r"^a+b*$";
    let nfa = compile(pattern)?;
    println!("Successfully compiled regex to NFA with {} states.", nfa.states().len());
    // You can now inspect the nfa graph structure
    Ok(())
}
```
**Example 2: Generate Circom Code**
```rust
use zk_regex_compiler::{gen_from_raw, ProvingFramework, CompilerError};

fn main() -> Result<(), CompilerError> {
    let pattern = r"(a|b){2,3}";
    let template_name = "ABRegex";
    let (nfa, circom_code) = gen_from_raw(pattern, None, template_name, ProvingFramework::Circom)?;
    println!("Generated Circom Code:\n{}", circom_code);
    // Save circom_code to a .circom file or use it directly
    Ok(())
}
```
**Example 3: Generate Noir Code**
```rust
use zk_regex_compiler::{gen_from_raw, ProvingFramework, CompilerError};

fn main() -> Result<(), CompilerError> {
    let pattern = r"\d{3}-\d{3}-\d{4}"; // Example: Phone number
    let template_name = "PhoneRegex";
    let (nfa, noir_code) = gen_from_raw(pattern, None, template_name, ProvingFramework::Noir)?;
    println!("Generated Noir Code:\n{}", noir_code);
    // Save noir_code to a .nr file or integrate into a Noir project
    Ok(())
}
```
**Example 4: Generate Circuit Inputs**
```rust
use zk_regex_compiler::{compile, gen_circuit_inputs, ProvingFramework, CompilerError};

fn main() -> Result<(), CompilerError> {
    let pattern = r"abc";
    let nfa = compile(pattern)?;
    let input_str = "test abc test";
    let max_haystack_len = 64; // Must match circuit parameter
    let max_match_len = 16; // Must match circuit parameter

    // Generate inputs for Circom
    let circom_inputs = gen_circuit_inputs(&nfa, input_str, max_haystack_len, max_match_len, ProvingFramework::Circom)?;
    println!("Circom Inputs: {:?}", circom_inputs); // Need to format/serialize ProverInputs

    // Generate inputs for Noir
    let noir_inputs = gen_circuit_inputs(&nfa, input_str, max_haystack_len, max_match_len, ProvingFramework::Noir)?;
    println!("Noir Inputs: {:?}", noir_inputs); // Need to format/serialize ProverInputs
    Ok(())
}
```
## Error Handling
The library uses the [`CompilerError`](./src/error.rs) enum to report issues:
- `RegexCompilation(String)`: An error occurred during regex parsing or NFA construction (from [`regex-automata`](https://github.com/rust-lang/regex/tree/master/regex-automata)).
- `CircuitGeneration(String)`: An error occurred during the generation of Circom or Noir code.
- `CircuitInputsGeneration(String)`: An error occurred while generating prover inputs for a given string.
Match on the enum variants to handle errors appropriately.
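As a rough illustration of matching on the variants, the sketch below uses a local stand-in enum that mirrors only the three variants listed above, so that it compiles without the crate. The `describe` function is hypothetical. The real `CompilerError` may carry different payloads or additional variants, in which case a wildcard arm would be needed.

```rust
/// Local stand-in mirroring the three variants listed above.
/// In real code, import `CompilerError` from the crate instead.
#[allow(dead_code)]
#[derive(Debug)]
enum CompilerError {
    RegexCompilation(String),
    CircuitGeneration(String),
    CircuitInputsGeneration(String),
}

/// Hypothetical helper: maps each variant to a user-facing message.
fn describe(err: &CompilerError) -> String {
    match err {
        CompilerError::RegexCompilation(msg) => {
            format!("invalid regex pattern: {msg}")
        }
        CompilerError::CircuitGeneration(msg) => {
            format!("could not generate circuit code: {msg}")
        }
        CompilerError::CircuitInputsGeneration(msg) => {
            format!("could not generate prover inputs: {msg}")
        }
    }
}

fn main() {
    let err = CompilerError::RegexCompilation("unclosed group".to_string());
    eprintln!("{}", describe(&err));
}
```

Handling each variant separately lets a caller distinguish a bad pattern, which the user can fix, from a failure while generating inputs for one particular string.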
## Building & Testing
Navigate to the `compiler/` directory and use standard Cargo commands:
```bash
cargo build --release
cargo test
```

3
noir/README.md Normal file

@@ -0,0 +1,3 @@
# zk-Regex Noir Library
This package provides the necessary Noir libraries/contracts and helper functions to integrate regex verification logic generated by the `zk-regex-compiler` into your Noir projects.