49 Commits

Author SHA1 Message Date
Max Brunsfeld
ecca53d796 Don't push to the parse stack for rules without while or end patterns 2019-10-07 11:44:25 +02:00
Allan Odgaard
afa7e4cdc1 Limit syntax highlight parser to first 4096 bytes of line
Ideally this would be a time limit rather than a byte count, but limiting the number of bytes is much easier.
2016-10-02 22:51:24 +02:00
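A minimal sketch of what such a byte cap could look like, assuming the parser is handed each line as a std::string. The names kMaxParsedBytesPerLine and parse_limit are hypothetical and do not come from the commit; only the 4096 figure does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical constant: the commit message only states the 4096-byte figure.
constexpr std::size_t kMaxParsedBytesPerLine = 4096;

// Returns how many bytes of the line the highlighter would look at.
// Anything past this offset would simply keep the scope that is active there.
std::size_t parse_limit (std::string const& line)
{
	return std::min(line.size(), kMaxParsedBytesPerLine);
}
```

A caller would match rules only against the range [0, parse_limit(line)). Note that this sketch may cut a multi-byte UTF-8 sequence in half; whether the real change guards against that is not stated in the commit message.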
Allan Odgaard
ed29cf9374 Ensure a capture begin/end rule does not leak its scope
Previously, if a capture contained a begin/end rule with an unsatisfied end match, its scope would be applied beyond the end of the capture.
2014-09-16 20:38:09 +02:00
Allan Odgaard
39b94e6ac3 Harmonize whitespace and add trailing newline 2014-04-14 14:26:52 +07:00
Allan Odgaard
c2397484b8 Use C++11 for loop
The majority of the edits were done using the following Ruby script:

    def update_loops(src)
      dst, cnt = '', 0

      block_indent, variable = nil, nil
      src.each_line do |line|
        if block_indent
          if line =~ /^#{block_indent}([{}\t])|^\t*$/
            block_indent = nil if $1 == '}'
            line = line.gsub(%r{ ([^a-z>]) \(\*#{variable}\) | \*#{variable}\b | \b#{variable}(->) }x) do
              $1.to_s + variable + ($2 == "->" ? "." : "")
            end
          else
            block_indent = nil
          end
        elsif line =~ /^(\t*)c?iterate\((\w+), (?!diacritics::make_range)(.*\))$/
          block_indent, variable = $1, $2
          line = "#$1for(auto const& #$2 : #$3\n"
          cnt += 1
        end
        dst << line
      end
      return dst, cnt
    end

    paths.each do |path|
      src = IO.read(path)

      cnt = 1
      while cnt != 0
        src, cnt = update_loops(src)
        STDERR << "#{path}: #{cnt}\n"
      end

      File.open(path, "w") { |io| io << src }
    end
2014-03-03 10:34:13 +07:00
Allan Odgaard
2fe3b95585 Add debug output 2013-10-04 16:51:27 +02:00
Allan Odgaard
f87724f406 Rules are (unfortunately) not const during parsing
This is because of the “included” boolean we use to mark rules to avoid collecting them twice.
2013-10-04 16:51:27 +02:00
Allan Odgaard
1c308c810d Use map::emplace instead of inserting std::pair (C++11) 2013-09-05 20:59:11 +02:00
Allan Odgaard
5c4573873f Protect grammar_t with mutex
Presently buffer_t::wait_for_repair will use the grammar from the main thread (rather than wait for the parse thread to finish) which does cause a race condition since the parser will now mutate the grammar (graph coloring).

Ideally buffer_t::wait_for_repair would simply wait for the parser to finish, but I’d prefer to first switch the parser to use GCD.
2013-09-05 17:26:47 +02:00
Allan Odgaard
e4e80a946c Use std::make_shared 2013-09-03 12:27:20 +02:00
Allan Odgaard
ba29e90762 Remove redundant assert 2013-08-31 21:24:39 +02:00
Allan Odgaard
c01e181b49 Use name convention to identify non-content scopes 2013-08-31 16:09:49 +02:00
Allan Odgaard
33ed6b6637 Watch leaks for grammar_t 2013-08-28 00:23:08 +02:00
Allan Odgaard
d93b20d571 Use new scope_t API 2013-08-28 00:23:08 +02:00
Allan Odgaard
057096af5b Rename API to make searching easier
Using generic names like ‘append’ is not good when analyzing code for potential refactoring.
2013-08-27 15:30:09 +02:00
Allan Odgaard
275273d39b Add missing “is content scope” argument 2013-08-27 15:30:09 +02:00
Allan Odgaard
df7cbc8da0 Add debug code 2013-08-26 15:54:05 +02:00
Allan Odgaard
e27f9f4071 Fix scopes when re-running parser on captures
If we have multiple overlapping captures and define rules for one of these, the scopes would not be stacked correctly.

Unfortunately the fix (for this rare edge case) does degrade parser speed, as we need to switch to an alternative way of keeping track of scopes. For injection grammars we also need to keep the old system (to have the “current” scope available as a scope_t instance), so while the new system is almost as fast as the old, using both is not.

There should be a few ways to optimize scope_t construction, so that this part will add less overhead in the future (I think roughly 5-10% of parsing time is spent on scope_t-related work).
2013-08-25 17:58:08 +02:00
Allan Odgaard
7115b80051 Skip processing empty captures 2013-08-25 17:58:07 +02:00
Allan Odgaard
cab42a83c5 Use injection patterns from “group rules”
For example if one grammar includes another, the included grammar works as a group rule and previously had its injections ignored.
2013-08-20 18:43:18 +02:00
Allan Odgaard
5752aa2dd7 Make it explicit, if a match is from a rule’s end pattern 2013-08-20 18:43:18 +02:00
Allan Odgaard
c7be71b41d Code shuffle 2013-08-20 18:43:18 +02:00
Allan Odgaard
35efecd4d8 Avoid using std::shared_ptr in parser
We can safely work with pointers since grammar_t now retains all rules involved in parsing the document.
2013-08-20 18:43:17 +02:00
Allan Odgaard
cf2637a69b Make injection grammars part of grammar_t 2013-08-19 23:36:01 +02:00
Allan Odgaard
aef742b142 Refactor grammar_t implementation 2013-08-19 23:36:00 +02:00
Allan Odgaard
b0fc120e7e Parser no longer needs to handle include of ‘$base’ 2013-08-19 23:36:00 +02:00
Allan Odgaard
2db287f128 Refactor grammar setup
A grammar_t instance now deep-copies potential grammars it includes and each call to parse_grammar() returns a new unique instance.

The latter allows mutating the grammar (by the parser) and the former ensures that grammars are not left with expired pointers (to other grammars) when bundle items are updated.
2013-08-19 23:36:00 +02:00
Allan Odgaard
adc0a0a4a7 Code style changes 2013-08-19 23:36:00 +02:00
Allan Odgaard
8c1dd5fc06 Don’t use GCD for regexp matching
Using GCD actually makes the code slower; it might have to do with locking overhead from std::shared_ptr and onig_region_new/region_free.

Worth trying again once use of std::shared_ptr has been removed from the parser, and oniguruma regions are preallocated.
2013-08-18 20:36:24 +02:00
Allan Odgaard
8402f713cf Do not rewrite regexps during parsing
Previously we had to test if the patterns contained \A, \G, or \z, and if so, rewrite those anchors based on whether or not the current line/match position could match them.
2013-08-18 17:29:30 +02:00
Allan Odgaard
e9fb8aa9ae Add constructor to ranked_match_t 2013-08-18 17:29:29 +02:00
Allan Odgaard
2bd7b877e6 Tag rules in grammar with whether or not they have been seen
This is instead of keeping a std::set with rule identifiers. Keeping the information in the grammar is a lot faster (about 25%) as we can update the status in O(1) without any memory allocation.

The downside is that the grammar is now being mutated by the parser. This is currently safe because only a single thread is used for parsing. When we switch to allowing multiple threads to perform parsing, we should make a copy of the grammar for each instance.

Another downside is that we only tag rules that have begin/match patterns. Rules that are wrappers for a set of rules, or rules that include another rule, are never rejected, even if already visited, although the target rules they resolve to will be. If an include (indirectly) includes itself, we will no longer break such a cycle (though it is clearly a bug in the grammar if this happens, and we could preprocess the grammar to catch it).
2013-08-18 17:29:29 +02:00
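The tagging approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual parser code: rule_t is reduced to a hypothetical three-member struct, and collect and reset are made-up names. Only the idea of an “included” flag stored in the rule itself comes from the commit messages.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified rule node.
struct rule_t
{
	bool has_pattern = false;       // rule has a begin/match pattern
	bool included    = false;       // tag: already collected in this pass
	std::vector<rule_t*> children;  // only used by wrapper/include rules here
};

// Collect each pattern rule at most once, using the flag stored in the rule
// instead of a std::set of rule identifiers. Checking and setting the flag is
// O(1) and allocates nothing.
void collect (rule_t* rule, std::vector<rule_t*>& out)
{
	if(rule->has_pattern)
	{
		if(rule->included)
			return; // already seen via another path
		rule->included = true;
		out.push_back(rule);
		return;
	}

	// Wrapper rules are not tagged, so a wrapper that (indirectly)
	// includes itself would recurse forever in this sketch.
	for(rule_t* child : rule->children)
		collect(child, out);
}

// Clear the tags so that the next pass starts from a clean grammar.
void reset (std::vector<rule_t*> const& collected)
{
	for(rule_t* rule : collected)
		rule->included = false;
}
```

Because collect writes to the rules, two threads sharing one grammar would race on the flag, which is the trade-off the commit message points out.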
Allan Odgaard
ee43777c3a Use GCD to perform concurrent rule matching 2013-08-18 17:29:29 +02:00
Allan Odgaard
3ae9bfe7b8 Collect active rules before performing any matching 2013-08-18 17:29:29 +02:00
Allan Odgaard
4677a91fff Factor out resolving of included rules 2013-08-18 17:29:29 +02:00
Allan Odgaard
4edca13ca1 Add assertion for grammar construction 2013-08-16 22:40:09 +02:00
Allan Odgaard
bf1e92b865 Do not use global constructors for fixtures 2013-08-16 22:40:08 +02:00
Allan Odgaard
612d8735ee Rename test header extension: .cc → .h 2013-08-01 19:08:15 +02:00
Allan Odgaard
688d3d4a9c Update testing system for parse framework 2013-07-26 13:53:57 +02:00
Allan Odgaard
ce395fa46a Make bundle item queries thread safe
Note though that mutating the bundle item index is not allowed if other threads are querying it.
2013-07-26 13:53:57 +02:00
Allan Odgaard
49a424438c Revert "Apply injections from child rules without match pattern"
This reverts commit fc419f5332.
2013-06-15 23:08:11 +07:00
Joachim Mårtensson
fc419f5332 Apply injections from child rules without match pattern
This is useful when including a grammar and its injections need to be applied.
2013-06-15 16:06:43 +07:00
Joachim Mårtensson
35227b48ea Injected rules rank higher if they match against the left scope
This allows overriding “native” rules via injecting.

This commit drops support for using ‘.’ as (injection) scope selector to match everywhere. Instead use ‘*’.
2013-05-02 15:07:10 +07:00
Joachim Mårtensson
eaf2d97141 Using ‘$’ in scope selector will anchor to end of content scope
The content scope is the portion of the scope created while parsing the document content, unlike scope attributes, document, project, SCM, and dynamic scopes (appended to the content scope).
2013-03-25 10:22:27 +01:00
Allan Odgaard
a9ba549cda Use strchr instead of std::find 2012-09-20 12:22:20 +02:00
Allan Odgaard
8849899007 Fix crash for missing grammars
In case our index is out-of-date and we try to load and parse a grammar that does not exist on disk, we would get a grammar_t with a “null” root rule, which would later crash when trying to use the grammar.

As a simple fix we now ensure a dummy rule always exists.
2012-09-05 15:23:40 +02:00
Jacob Bandes-Storch
d4ce498f60 Use 64-bit: numeric type fixes
Unfortunately the argument for a printf precision specifier (‘%.*s’) cannot take a length modifier, so we have to cast to int. The length modifier ‘t’ is used for ptrdiff_t.
The int → NSInteger change fixed a bug with popup menu positioning, but there was no associated warning or error. It's possible there are more such bugs that we haven't found yet!
2012-08-28 21:32:47 +02:00
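To illustrate the two printf points above, here is a small sketch. The helper format_range is hypothetical and not taken from the commit. The ‘*’ precision argument must be an int, hence the cast, while ‘%td’ prints a ptrdiff_t directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats the byte range [first, last) and its length.
std::string format_range (char const* first, char const* last)
{
	std::ptrdiff_t len = last - first; // 64 bits wide on a 64-bit target

	char buf[128];
	// '%td' takes the ptrdiff_t as-is; '%.*s' reads its precision as an int,
	// so the length has to be cast (and must fit in an int).
	std::snprintf(buf, sizeof(buf), "%td bytes: %.*s", len, (int)len, first);
	return buf;
}
```

For example, calling format_range(s, s + 5) with s pointing at "hello world" yields "5 bytes: hello".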
Jacob Bandes-Storch
e3aa997b06 Use libc++: replace std::tr1 with std 2012-08-28 13:30:20 +02:00
Allan Odgaard
9894969e67 Initial commit 2012-08-09 16:25:56 +02:00