This provided value during early development, but has been unused for years, and it would generate too much noise if converted to os_log.
So better to just remove it all and add os_log statements as needed.
This was required when we linked each framework as its own thing, which we do not do anymore, and if we do go back to this system, we can simply have symbols public by default.
Presently buffer_t::wait_for_repair will use the grammar from the main thread (rather than wait for the parse thread to finish) which does cause a race condition since the parser will now mutate the grammar (graph coloring).
Ideally buffer_t::wait_for_repeat would simply wait for the parser to finish, but I’d prefer to first switch the parser to use GCD.
If we have multiple overlapping captures and define rules for one of these, the scopes would not be stacked correctly.
Unfortunately the fix (for this rare edge case) does degrade parser speed, as we need to switch to an alternative way of keeping track of scopes, yet for injection grammars, we need to also keep the old system (to have the “current” scope available as a scope_t instance), so while the new system is almost as fast as the old, using both is not.
There should be a few ways to optimize scope_t construction, so that this part will add less overhead in the future (I think it’s roughly 5-10% of parsing time spent to scope_t related stuff).
A grammar_t instance now deep-copies potential grammars it includes and each call to parse_grammar() returns a new unique instance.
The latter allows mutating the grammar (by the parser) and the former ensures that grammars are not left with expired pointers (to other grammars) when bundle items are updated.
Using GCD actually makes the code slower — it might have to do with locking overhead from std::shared_ptr and onig_region_new/region_free.
Worth trying again once use of std::shared_ptr has been removed from the parser, and oniguruma regions are preallocated.
Previously we had to test if the patterns contained \A, \G, or \z, and if so, rewrite those anchors based on wether or not the current line/match position could match them.
This is instead of keeping a std::set with rule identifiers. Keeping the information in the grammar is a lot faster (about 25%) as we can update the status in O(1) without any memory allocation.
The downside is that the grammar is now being mutated by the parser. This is currently safe because only a single thread is used for parsing. When we switch to allowing multiple threads to perform parsing, we should make a copy of the grammar for each instance.
Another downside is that we only tag rules that have begin/match patterns, so rules that are wrappers for a set of rules, or rules that are including another rule, are never rejected, even if already visited, but the target rules they resolve to will be, though if an include (indirectly) include itself, we will no longer break such cycle (though it is clearly a bug in the grammar, if this happens, and we could preprocess the grammar to catch it).
This allows overriding “native” rules via injecting.
This commit drops support for using ‘.’ as (injection) scope selector to match everywhere. Instead use ‘*’.
The content scope is the portion of the scope created while parsing the document content, unlike scope attributes, document, project, SCM, and dynamic scopes (appended to the content scope).
Incase our index is out-of-date and we try to load and parse a grammar that does not exist on disk, we would get a grammar_t with a “null” root rule, this would later crash when trying to use the grammar.
As a simple fix we now ensure a dummy rule always exist.