- Add result saving to JSON files in results/ directory
- Add formatted metrics display in console output
- Reduce default run counts for faster iteration during development
- Fix Circom v2 placeholder stats to include all TimingStats fields
- Refactor Noir provider to parse bb gates JSON output directly
- Fix capture group types to use number[][] for multi-group patterns
- Implement collect-results.ts to aggregate individual benchmark files into comparison.json with hardware specs and tool versions
- Add Table 3 (Noir Backend Performance) to LaTeX output with ACIR opcodes, gates, prove/verify times, and proof size
- Add Table 4 (Scaling Analysis) showing metrics across input lengths
- Add helper functions for formatting and pattern extraction
- Generate both markdown and LaTeX outputs with siunitx formatting
Display a formatted table summary when collecting benchmark results, showing the exact regex pattern and test input string used for each benchmark. This improves visibility into benchmark runs without needing to open JSON/Markdown output files.

Changes:
- Add PatternMetadata interface for pattern display data
- Load regex patterns from graph JSON files
- Add sample input strings for all benchmark patterns
- Display results grouped by pattern with console.table()
- Escape control characters (\r\n) for terminal readability

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
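The control-character escaping mentioned in the last bullet can be sketched as a small helper. The function name is hypothetical, and escaping backslashes first is an added assumption for unambiguous output; the commit only states that \r and \n are escaped:

```typescript
// Hypothetical helper (name assumed): make control characters visible in
// console.table() output instead of letting them break table rows.
function escapeControlChars(s: string): string {
  return s
    .replace(/\\/g, "\\\\") // escape backslashes first so output is unambiguous
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
}
```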
- Move ptau URL/filename from hardcoded constants to benchmark.json
- Auto-calculate maxConstraints from filename (e.g., pot18 = 2^18)
- Upgrade default from pot16 (65K) to pot18 (262K constraints)
- Add PtauConfig type definition
- Update providers to use async getMaxConstraints()
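The filename-based calculation can be sketched as follows. This is a synchronous sketch (the commit says the provider method is async), and the `potN` naming convention is taken from the commit's own example:

```typescript
// Sketch (assumed filename convention): a ptau file named "potN..." supports
// up to 2^N constraints, e.g. pot18 -> 262144.
function getMaxConstraints(ptauFilename: string): number {
  const match = ptauFilename.match(/pot(\d+)/);
  if (!match) {
    throw new Error(`cannot derive constraint limit from ptau name: ${ptauFilename}`);
  }
  return 2 ** Number(match[1]);
}
```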
- Rename ScalingDataPoint fields to distinguish v1/v2 Circom data
- Add circomV1Constraints and circomV1ProveMs optional fields
- Update console summary to show all three providers with clear labels
- Update Markdown tables with "Circom v1 (DFA)", "Circom v2 (NFA)", "Noir v2 (UltraHonk)" headers
- Update LaTeX tables with explicit provider names
- Include v1 Circom data in scaling analysis table
- Separate circuit size and proving time into distinct tables for clarity
- Add pattern definitions table with regex and sample inputs
- Add escaping helpers for markdown and LaTeX special characters
- Upgrade PTAU from pot18 to pot21 for larger circuits
- Fix unnecessary field deletion in circom-v2 provider
Measure actual peak RSS for each benchmark phase using /usr/bin/time, replacing file-size-based estimation with accurate runtime measurements.

- Add memory.ts utility with macOS/Linux platform detection
- Add MemoryStats and PhaseMemory interfaces to types
- Update all providers with memory tracking per phase
- Rename executeMs to witnessGenMs in Noir for consistency
- Add --no-memory CLI flag to disable profiling
- Update result collection and output generation
- Add memory tables to Markdown and LaTeX outputs
- Add unit tests for memory utilities
- Document memory profiling in README
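The platform detection matters because GNU time on Linux and BSD time on macOS report peak RSS in different formats and units. A minimal parsing sketch (function name assumed; the format strings match the standard `time -v` / `time -l` output):

```typescript
// Sketch of peak-RSS parsing from /usr/bin/time stderr. Known output formats:
//   GNU time -v (Linux): "Maximum resident set size (kbytes): 102400"
//   BSD time -l (macOS): "104857600  maximum resident set size"  (bytes)
function parsePeakRssKb(timeOutput: string): number | null {
  const linux = timeOutput.match(/Maximum resident set size \(kbytes\):\s*(\d+)/);
  if (linux) return Number(linux[1]);
  const mac = timeOutput.match(/(\d+)\s+maximum resident set size/);
  if (mac) return Math.round(Number(mac[1]) / 1024); // normalize bytes -> KB
  return null;
}
```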
… circuits

Add complexity-based (simple/medium/complex/v2-only) and feature-based classification to pattern definitions. Create decomposed regex configs and generate Circom v2 + Noir v2 circuits for all new patterns.
…nfig

Test all 15 patterns against the v1 DFA compiler. Only fixed_range_quantifier fails (Accept Nodes Error). Update worktree.ts to use the commit hash from config instead of hardcoded main. Add v1 provider sample inputs for all new patterns.
Rewrite generate-outputs.ts to support complexity-grouped tables, V1 vs V2 comparison, V1 compatibility analysis, V2-only patterns, summary statistics by complexity level, and separate Noir details. Also adds 512-byte input length to benchmark config.
…atterns

Add sample_haystacks for 11 new benchmark patterns and regenerate Noir circuit test functions from the sample data.
… and fix input fallback

- collect-results.ts was loading v1-compatibility.json as a benchmark result, causing an undefined pattern name error
- circom-v2 provider now falls back to pattern.sampleInput instead of hardcoded 'b' for new patterns
- Same fallback fix applied to circom-v1 provider
The 11 new benchmark pattern circuits were generated but not added to noir/src/templates/circuits/mod.nr, causing nargo to fail with "Could not resolve" errors during Noir benchmarks.
…shutdown

Replace per-process SIGINT/SIGTERM handlers with a centralized AbortController. All Bun.spawn calls now receive the abort signal, benchmark loops check isAborted() to break early, and the first Ctrl+C triggers graceful unwinding through finally blocks.
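The centralized wiring can be sketched as a small module. `isAborted` matches the commit; `installSignalHandlers` is an assumed name, and the signal handling uses the Node-compatible `process` API that Bun also exposes:

```typescript
// Module-level shared controller: one signal fans out to every consumer.
const controller = new AbortController();

function isAborted(): boolean {
  return controller.signal.aborted;
}

function installSignalHandlers(): void {
  // First Ctrl+C / kill aborts every spawned process sharing the signal;
  // benchmark loops observe isAborted() and unwind through finally blocks.
  for (const sig of ["SIGINT", "SIGTERM"] as const) {
    process.once(sig, () => controller.abort());
  }
}

// Spawn sites would pass the shared signal, e.g.:
//   Bun.spawn(cmd, { signal: controller.signal })
```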
… at all sizes

Replace hardcoded sample inputs with a configurable scaling system that fills each target input length with regex-matching content. Three strategies: repeat (tile template), extend (grow match portion), and pad-with-match (anchored template + filler). Each pattern in patterns.json now declares inputTemplate and scalingStrategy. All three providers use generateScaledInput() instead of static lookup tables. Output reports include scaling methodology, actual content lengths, and per-pattern strategy tables.

Also improves memory measurement: separates command stderr from /usr/bin/time output for reliable RSS parsing, records data from non-zero exits (common with nested shell wrapping), and adds --no-memory flag support via BENCH_NO_MEMORY env var.
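The three strategies can be sketched as one function. The exact semantics here are assumptions read off the strategy names (the real implementation may grow the match portion differently):

```typescript
type ScalingStrategy = "repeat" | "extend" | "pad-with-match";

// Sketch of generateScaledInput (semantics assumed from the strategy names):
//   repeat: tile the template and truncate to the target length
//   extend: grow the match portion by repeating the template's last character
//   pad-with-match: keep the (anchored) template, fill the rest with filler
function generateScaledInput(
  template: string,
  strategy: ScalingStrategy,
  targetLen: number,
  filler = "x",
): string {
  switch (strategy) {
    case "repeat":
      return template
        .repeat(Math.ceil(targetLen / template.length))
        .slice(0, targetLen);
    case "extend":
      return (
        template +
        template[template.length - 1].repeat(Math.max(0, targetLen - template.length))
      );
    case "pad-with-match":
      return template + filler.repeat(Math.max(0, targetLen - template.length));
  }
}
```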
Use accurate compiler backend names (DFA/NFA) instead of version numbers (v1/v2) throughout benchmark configs, output generation, and type definitions for clarity in academic context.
Increase minRuns to 5 and maxRuns to 10 for better statistical significance. Add missing generateScaledInput imports to circom and noir providers. Add docs/ to gitignore.
Add compilerCommitHash and toolVersions fields to individual benchmark result files for reproducibility. Rename all provider identifiers from v1/v2 to DFA/NFA to better describe the algorithm difference.

- Extend BenchmarkProvider interface with getCommitHash() and getToolVersions()
- Move BenchmarkResult type from bench.ts to shared types.ts
- Rename providers: circom-v1→circom-dfa, circom-v2→circom-nfa, noir-v2→noir-nfa
- Update all type fields, config keys, CLI args, scripts, and README
- Add compilerVersions aggregation to collect-results.ts
Remove ~120 lines of dead code: isOk, isErr, unwrap, unwrapOr, map, flatMap from errors.ts; runHyperfineBatch, measureSync, hashFile, proveAndVerify from utils. None had any callers in the codebase.
…code

Create two new shared modules (exec.ts, project.ts) and centralize duplicated functions across 7 files (~19 duplicate instances removed):

- execAsync/execCommand: unified in utils/exec.ts with configurable options (useNvm, errorFactory) to preserve each call site's behavior
- wrapCommandWithNvm/getNvmWrappedCommand: unified with bashWrap option
- getProjectRoot/getGitCommitHash: extracted to utils/project.ts
- defaultMemoryStats: removed local copy from collect-results.ts
- getToolVersions: Circom sync version centralized in hardware.ts
- measureWitnessGeneration: promoted to shared utility in snarkjs.ts

Net reduction: ~300 lines. No behavioral changes.
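The shape of the unified exec helper can be sketched as below. The `useNvm` and `errorFactory` options come from the commit; the synchronous signature and the nvm sourcing path are assumptions for illustration:

```typescript
import { execSync } from "node:child_process";

interface ExecOptions {
  useNvm?: boolean; // wrap in bash and source nvm first (sourcing path assumed)
  errorFactory?: (message: string) => Error; // preserve per-call-site error types
}

// Sketch of the centralized helper: one implementation, with options that
// let each original call site keep its previous behavior.
function execCommand(cmd: string, opts: ExecOptions = {}): string {
  const wrapped = opts.useNvm
    ? `bash -lc '. "$NVM_DIR/nvm.sh" && ${cmd}'`
    : cmd;
  try {
    return execSync(wrapped, { encoding: "utf8" });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    throw opts.errorFactory ? opts.errorFactory(message) : (err as Error);
  }
}
```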
…teStats

Variance was dividing by n (population variance) instead of n-1 (sample variance). With minRuns=5 this underestimates stddev by ~12%. Added n>1 guard to avoid division by zero on single measurements.
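The corrected computation looks like this (function and parameter names are assumptions; the n-1 divisor and the n>1 guard are what the commit describes):

```typescript
// Sample (Bessel-corrected) standard deviation: divide by n-1, not n.
// The n > 1 guard avoids division by zero for a single measurement.
function sampleStdDev(runsMs: number[]): number {
  const n = runsMs.length;
  if (n <= 1) return 0;
  const mean = runsMs.reduce((sum, x) => sum + x, 0) / n;
  const sumSq = runsMs.reduce((sum, x) => sum + (x - mean) ** 2, 0);
  return Math.sqrt(sumSq / (n - 1)); // was sumSq / n (population variance)
}
```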
Validate git ref against a strict pattern and quote it in the shell command to prevent injection via a malicious benchmark.json value.
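A validation sketch under assumed rules (the actual allowlist pattern and function name may differ): accept branch names, tags, and commit hashes; reject anything containing shell metacharacters, and still quote the value when interpolating it:

```typescript
// Assumed allowlist: alphanumeric start, then a bounded run of characters
// legal in common git refs. Rejects spaces, ;, $(), backticks, etc.
const SAFE_GIT_REF = /^[A-Za-z0-9][A-Za-z0-9._\/-]{0,127}$/;

function assertSafeGitRef(ref: string): string {
  if (!SAFE_GIT_REF.test(ref)) {
    throw new Error(
      `refusing to use suspicious git ref from benchmark.json: ${JSON.stringify(ref)}`,
    );
  }
  return ref; // still quote it when building the shell command
}
```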
Rename remaining v1/v2 references in generate-outputs.ts comments and reductionPct parameter names to use DFA/NFA terminology, and update the module docstring in src/index.ts.
The PTAU file download performed no integrity verification after fetching a ~2GB file. Add streaming SHA256 verification for both fresh downloads and cached files, with automatic re-download on mismatch.
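The hashing half of this can be sketched with Node's streaming crypto API, so the ~2GB file is never held in memory. The function name and the surrounding verify/re-download flow are assumptions:

```typescript
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Streaming SHA-256: hash the file chunk by chunk instead of reading it whole.
// Callers would compare the result against a known digest and re-download on
// mismatch (that flow is not shown here).
function sha256File(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("sha256");
    createReadStream(path)
      .on("data", (chunk) => hash.update(chunk))
      .on("error", reject)
      .on("end", () => resolve(hash.digest("hex")));
  });
}
```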
Replace the abstract class with direct interface implementation since all methods were abstract except an empty cleanup() that every provider overrides. Also consolidate NoirCircuitInput and MemoryStats into types.ts.
…benchmarks-package

feat(benchmarks): add benchmarking suite for DFA vs NFA comparison
Note
Replaces the old Rust compiler and extensive Circom tests with a new Bun/TypeScript scripts suite (gen-regex, gen-inputs), shared utilities, and Jest-based tests.
- New scripts: circom/scripts/gen-regex.ts, noir/scripts/gen-regex.ts, noir/scripts/gen-inputs.ts
- Shared utilities (utils/*) for logging, file ops, subprocess, and types
- Config files: tsconfig.json, jest.config.ts, jest.setup.ts, bun.lock, and package.json
- Tests in scripts/__tests__ for script workflows and utilities
- Removed the old Rust compiler (packages/compiler/*) and Circom tests (packages/circom/tests/*)

Written by Cursor Bugbot for commit 3d37c31. This will update automatically on new commits.