update space in readme

jernkun
2023-06-19 22:49:53 +07:00
parent a4e77e8acd
commit b96607ce49


@@ -1,6 +1,6 @@
# full_zk_regex
Presentation: [WIP]
Slide: [WIP] https://docs.google.com/presentation/d/1nSZdmwDKXjEM6bP6WBYyAWbCgK4cjpm-SXqDAA-MOjE/edit?usp=sharing
We allow users to easily create a circom circuit that reveals a submatch. After a few steps on the frontend, we can deal with our newly baked circom circuit by
@@ -11,7 +11,7 @@ where in is the whole text, match_idx is to tell which occurance of a certain su
## Overview
Input: regex, submatches, text
Output: a circom circuit that lets us reveal a specific submatch defined through the frontend.
Data flow and related functions [all run on the frontend, except the last one, which generates the circom]
@@ -56,7 +56,9 @@ Part 1: we create Tagged DFA. Given Regex, we want to be able to specify a certa
Parameters:
- Regex: the whole regex we want to match, including the submatches we want to extract from it. For this tool, put parentheses around each submatch you want to extract; again, there can be multiple submatches! For example, to extract a person's name and age, instead of writing the regex as "I am [a-z]+, and [0-9]+ yrs old", write it as "I am ([a-z]+), and ([0-9]+) yrs old".
[This ensures that every submatch is well defined: users cannot extract the submatch "b)c" from the regex "d(a|b)c". Instead, they can redefine the regex as "d((a|b)c)" and extract the submatch "((a|b)c)".]
- Submatch: assumed to be already processed by simplifyRegex and simplifyPlus. With multiple submatches, we order them in ascending order of the position of each submatch's leftmost parenthesis.
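To make the ordering rule concrete, here is a minimal JavaScript sketch. `listSubmatches` is a hypothetical helper for illustration, not a function from this repo, and it assumes every parenthesis pair is a submatch and that the regex contains no escaped or bracketed parentheses such as `\(` or `[(]`. It pairs parentheses with a stack, then sorts the groups by the position of their opening parenthesis.

```javascript
// Hypothetical helper (not part of this repo): lists every parenthesized
// submatch in ascending order of its leftmost "(" position.
// Assumes no escaped or bracketed parentheses such as "\(" or "[(]".
function listSubmatches(regex) {
  const stack = []; // indices of currently open "("
  const groups = []; // { start, end, text } for each matched pair
  for (let i = 0; i < regex.length; i++) {
    if (regex[i] === "(") {
      stack.push(i);
    } else if (regex[i] === ")") {
      if (stack.length === 0) throw new Error(`unbalanced ")" at index ${i}`);
      const start = stack.pop();
      groups.push({ start, end: i, text: regex.slice(start, i + 1) });
    }
  }
  if (stack.length > 0) throw new Error('unbalanced "("');
  // Nested groups close inside-out, so re-sort by the opening parenthesis.
  groups.sort((a, b) => a.start - b.start);
  return groups.map((g) => g.text);
}

// returns ["([a-z]+)", "([0-9]+)"]
console.log(listSubmatches("I am ([a-z]+), and ([0-9]+) yrs old"));
// returns ["((a|b)c)", "(a|b)"]
console.log(listSubmatches("d((a|b)c)"));
```

The sort matters for nested groups: in "d((a|b)c)" the inner "(a|b)" closes first, but "((a|b)c)" opens first and is therefore listed first.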
Steps
@@ -66,21 +68,33 @@ Steps
3. To handle tagging, we use the method explained in https://www.labs.hpe.com/techreports/2012/HPL-2012-41R1.pdf. Basically, we ignore the notion of +/- and epsilon because we start with M1 as a DFA that already has the Si, Ei transitions attached. (Note that we do not start the paper's process from an unminimized NFA, because doing so would result in an exponential number of states. Instead, we first minimize the state machine into a DFA so that the complexity stays linear in the number of DFA states, which is critical for the number of circom circuit constraints.)
We run the following functions (all defined in gen_tagged_dfa.js):
```js
const tagged_simp_graph = tagged_simplifyGraph(regex, submatches);
let m2_graph = M1ToM2(tagged_simp_graph);
let m3_graph = M2ToM3(m2_graph);
let m4_graph = createM4(tagged_simp_graph);
```
Brief explanation of the algorithm:
- tagged_simp_graph is the tagged minimized DFA with Si, Ei attached (used as M1 in the paper).
- m2 graph is similar to m1 but keeps only the states that have outgoing transitions.
- m3 graph is the reversed version of m2, run on the reversed text. For each character of the reversed text fed to the m3 machine, we record the m3 state that the character transitions into. These states are the input to the m4 machine.
- m4 graph is similar to m2, but its transition alphabet is m3 states instead of characters, and each transition has registers that store the start and end indices of the string that falls under a given submatch.
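The reversal behind m3 can be sketched as follows. This is a minimal JavaScript sketch under assumed formats, not the actual `M2ToM3`: the graph shape `{ start, accept, edges }` and the helpers `reverseGraph` and `runBackward` are hypothetical. Reversing a DFA's edges generally yields an NFA, so the sketch simulates it with sets of states; presumably `M2ToM3` determinizes the reversed graph so that each set becomes a single m3 state, but that is an assumption. Because the run starts from the accepting states at the end of the text, this sketch only finds matches that end at the last character.

```javascript
// Assumed graph format for this sketch (not necessarily the one used in
// gen_tagged_dfa.js): { start, accept: [...], edges: [[from, char, to], ...] }.

// Reverse every edge and swap the roles of the start and accepting states.
// In general the result is an NFA (several start states, or several edges
// on the same character), so runBackward tracks a set of states.
function reverseGraph(g) {
  return {
    starts: [...g.accept],
    accept: [g.start],
    edges: g.edges.map(([from, ch, to]) => [to, ch, from]),
  };
}

// Feed the text to the reversed machine from right to left and record, for
// each character, the set of states it leads into as a sorted string label
// ("" means the machine is dead at that position).
function runBackward(rev, text) {
  let current = new Set(rev.starts);
  const trace = [];
  for (let i = text.length - 1; i >= 0; i--) {
    const next = new Set();
    for (const [from, ch, to] of rev.edges) {
      if (ch === text[i] && current.has(from)) next.add(to);
    }
    current = next;
    trace.push([...next].sort((a, b) => a - b).join(","));
  }
  // trace was built right to left; flip it so trace[i] lines up with text[i].
  return trace.reverse();
}

// Example: a DFA for "(a|b)c" (state 0 = start, state 2 = accepting).
const dfa = { start: 0, accept: [2], edges: [[0, "a", 1], [0, "b", 1], [1, "c", 2]] };
const m3Sketch = reverseGraph(dfa);
console.log(runBackward(m3Sketch, "ac"));
console.log(runBackward(m3Sketch, "ad"));
```

For "(a|b)c", the text "ac" gives the labels ["0", "1"] while "ad" gives ["", ""]: by the time a forward pass reads `a`, the backward label already says whether a `c` follows.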
4. After getting m4 as in the paper, we transform the graph, which extracts subgroups through registers, into one that specifies which state transitions are included in each tag (this also lets us naturally detect multiple occurrences of strings in the same submatch). Then we reassign the state numbers of m3 and m4 to plain consecutive numbers.
```js
let tagged_m4_graph = registerToState(m4_graph);
let final_m3_m4 = reassignM3M4(m3_graph, tagged_m4_graph);
console.log("final m3: ", final_m3_m4["final_m3_graph"]);
console.log("final m4: ", final_m3_m4["final_m4_graph"]);
```
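A minimal JavaScript sketch of why tagging transitions (instead of keeping registers) detects repeated occurrences: each maximal run of consecutive tagged transitions is one occurrence of the submatch. It is a simplification under stated assumptions, not `registerToState`: `revealSubmatches` and the `tagEdges` format are hypothetical, the machine reads characters directly (the real m4 reads m3 states), acceptance is not checked, and two occurrences with no untagged transition between them would merge into one.

```javascript
// Assumed formats for this sketch (hypothetical, not the repo's):
//   dfa      = { start, edges: [[from, char, to], ...] }
//   tagEdges = { tagId: Set of "from,to" transitions belonging to that submatch }
function revealSubmatches(dfa, tagEdges, text) {
  // Lookup table: state -> (char -> next state).
  const delta = new Map();
  for (const [from, ch, to] of dfa.edges) {
    if (!delta.has(from)) delta.set(from, new Map());
    delta.get(from).set(ch, to);
  }
  // Forward run: remember which transition was taken at each position.
  const steps = [];
  let state = dfa.start;
  for (let i = 0; i < text.length; i++) {
    const next = delta.get(state)?.get(text[i]);
    if (next === undefined) throw new Error(`no transition from ${state} on "${text[i]}"`);
    steps.push(`${state},${next}`);
    state = next;
  }
  // Per tag, each maximal run of consecutive tagged transitions is one occurrence.
  const result = {};
  for (const [tag, edges] of Object.entries(tagEdges)) {
    const occurrences = [];
    let runStart = -1;
    for (let i = 0; i <= steps.length; i++) {
      const tagged = i < steps.length && edges.has(steps[i]);
      if (tagged && runStart < 0) runStart = i;
      if (!tagged && runStart >= 0) {
        occurrences.push(text.slice(runStart, i));
        runStart = -1;
      }
    }
    result[tag] = occurrences;
  }
  return result;
}

// Example: DFA for "(a(b+)c)+", with tag 1 marking the (b+) transitions.
const loopDfa = { start: 0, edges: [[0, "a", 1], [1, "b", 2], [2, "b", 2], [2, "c", 3], [3, "a", 1]] };
console.log(revealSubmatches(loopDfa, { 1: new Set(["1,2", "2,2"]) }, "abbcabc"));
```

For "abbcabc", tag 1 is reported with two occurrences, "bb" and "b", without any extra machinery for repetition.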
Part 2: Create Circom. We run a DFA three times. [The high-level steps are as follows.]
@@ -92,6 +106,7 @@ Steps
3. Use each state obtained from m3 as the transition alphabet of the m4 graph. [Start running only from the last character that ends in the m3 graph: since m3 is reversed, its last character is the first character of the string that matches our regex.]
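Step 3 can be sketched in JavaScript as follows, assuming the backward pass has already produced one m3 state label per text position. `runM4OnM3States`, the label format, and the toy m4 are hypothetical. The choice of start position follows one reading of the bracketed rule above (the last character m3 consumed, which is the first character of the match) and is an assumption, as is stopping the forward run once m3 is dead.

```javascript
// Hypothetical formats for this sketch (not the repo's):
//   m3Trace[i]  = label of the m3 state the backward pass reached at text[i]
//                 ("" once m3 is dead, i.e. outside any match)
//   m4          = { start, edges: [[from, m3Label, to], ...] }
//   taggedEdges = Set of "from,to" m4 transitions that belong to the submatch
function runM4OnM3States(m4, taggedEdges, m3Trace, text) {
  // m3 consumed the text right to left and stays dead once dead, so the last
  // character it consumed while alive is the smallest index with a live label.
  const startPos = m3Trace.findIndex((label) => label !== "");
  if (startPos < 0) return null; // the backward pass never matched
  let state = m4.start;
  let revealed = "";
  for (let i = startPos; i < m3Trace.length && m3Trace[i] !== ""; i++) {
    // m4's alphabet is m3 state labels, not characters.
    const edge = m4.edges.find(([from, label]) => from === state && label === m3Trace[i]);
    if (edge === undefined) break; // no m4 transition on this m3 state
    const next = edge[2];
    // Reveal the character if this transition belongs to the submatch's tag.
    if (taggedEdges.has(`${state},${next}`)) revealed += text[i];
    state = next;
  }
  return revealed;
}

// Example for "(a|b)c" on the text "xac": the backward labels are assumed to be
// ["", "0", "1"], and the toy m4's 0 -> 1 transition is tagged as (a|b).
const toyM4 = { start: 0, edges: [[0, "0", 1], [1, "1", 2]] };
console.log(runM4OnM3States(toyM4, new Set(["0,1"]), ["", "0", "1"], "xac"));
```

The forward run never looks at raw characters when choosing transitions; it only uses them to build the revealed string, which is what lets the backward labels carry the lookahead.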
Note: M3 and M4 are both needed to construct the circom circuit because we need both forward- and backward-running state machines; otherwise we cannot distinguish extracting (a|b) from the regex (a|b)c versus (a|b)d.
However, in this project we also need the naive minimized DFA without tags to find the first character for m3, so that the circom circuit can compute the m3 state that each character leads to.
[This is likely to be optimized later, so that running this naive non-tagged DFA can be removed.]