From e8984918bed84df02ddcede8e037d51e4bbb094d Mon Sep 17 00:00:00 2001
From: JernKunpittaya <61564542+JernKunpittaya@users.noreply.github.com>
Date: Mon, 17 Jul 2023 13:33:38 +0700
Subject: [PATCH] clean readme, comment in circom

---
 README.md         | 57 +++++++++++++++++++++++++++++++++++------------
 src/gen_circom.js |  3 ++-
 2 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 7d4f58b..4e77987 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,36 @@
 # full_zk_regex
 
-Presentation: https://drive.google.com/file/d/1MFT7BZmB7wgMqhr_AgT_60dukdG_0v9P/view
+Presentation [just 10 Min!]: https://drive.google.com/file/d/1MFT7BZmB7wgMqhr_AgT_60dukdG_0v9P/view
 
 Slide: https://docs.google.com/presentation/d/1nSZdmwDKXjEM6bP6WBYyAWbCgK4cjpm-SXqDAA-MOjE/edit?usp=sharing
-
 
 ## Summary
 
-We allow users to easily create circom circuit to reveal submatch. After a few steps on frontend, we can deal with our newly baked circom circuit by
+We allow developers to instantly create circom circuit that can both match regex and reveal the submatch they are interested in, without needing to manually mark the states from regex state machine.(See more issues our approach has solved in presentation or slide above!) After a few steps on frontend as shown in presentation, developers can use their newly baked circom circuit by
 
-component main { public [in, match_idx] } = Regex(max_msg_byte,max_reveal_byte,group_idx);
+component main { public [in, match_idx] } = Regex(max_msg_byte, max_reveal_byte, group_idx);
 
-where in is the whole text, match_idx is to tell which occurance of a certain submatch we are interested in, and group_idx is to tell which submatch we are interested in.
+where "in" is the whole text, "match_idx" is to tell which occurance of a certain submatch we are interested in, "max_msg_byte" = maximum byte we allow on input text, "max_reveal_bytes" = maximum byte we allow on revealing the submatch, and "group_idx" = to tell which submatch we are interested in.";
 
-## Overview
+## How to Use
+
+1. Fill the text field with the original text format you want to extract subgroup from (have multiple lines and tabs are ok, so just copy your interested text.)
+
+2. Fill the regex field with the regex you want to match but with explicit syntax like \n to represent new line instead of using original format like the text field. (same for \r, \t, \v,\f)
+
+Escape chars are escaped with \ e.g. \”, \*, \+, ...
+
+3. When defining regex with \* and + for subgroup match, write () over that subgroup we are interested in e.g. ((a|b|c)+)
+
+4. Click Match RegEx! to see where in the text that are matched by our regex
+
+5. Highlight "Regex to be Highlighted" by clicking "Begin Regex Highlight", then choose two points as subgroup inclusive boundary we want to match, then click "End Regex Highlight" to name the subgroup we are extracting.
+
+6. Repeat Step 5, If done, just "Download Circom" and DONE!
+
+7. We also have msg generator at the bottom, in case you want to generate msg for testing with zkrepl.dev
+
+## How it works (overview)
 
 Input: regex, submatches, text
 Output: Return circom that allows us to reveal a specific submatch we defined through frontend.
@@ -51,7 +68,7 @@ Data flow and related functions [All for frontend, except the last one for gener
 
 Note that we can see more tests of calling function in test.js file
 
-## Details
+## How it works (Details)
 
 How to creat circom for extracting submatch in regex.
 
@@ -118,14 +135,12 @@ However, in this project, we needs the naive minimized DFA without tag to get th
 - This project assumes that our regex is well-defined that there is only ONE string that match our regex. 
 (But there can be multiple submatches, and multiple substrings for each submatches in that ONE regex match)
 - This project doesn't allow users to decide the algorithm for ambiguous submatch. For example, the text a = b = c, but with submatch [submatch1]=[submatch2], it can be either (a)=(b=c) or (a=b)=c, resulting in ambiguity. In this project, we just choose the first one that we found, but in reality there are tons of ways to define how to break ambiguity. (In paper, they handle ambiguity in submatch by using +/-)
-
-## Future Work
-Algorithm:
-- We are already at linear with msg_byte*state number (same complexity as naive zk regex), and we know that we need to run at least 2 rounds of state machine, one to run reversed version and store state change, while the other is to use that stored state to run through the forward state machine. However, currently to help write circom, we run the other round of naive DFA forward first to find the last alphabet to help keep state change of reversed DFA. We should try to cut this round out to reduce to just 2 rounds of state machine run.
-
-- As Aayush suggests, we should optimize Circom via LessThan gate by changing from 47