This provided value during early development, but has been unused for years, and it would generate too much noise if converted to os_log.
So it is better to just remove it all and add os_log statements as needed.
This was required when we linked each framework separately, which we no longer do; if we ever go back to that setup, we can simply make symbols public by default.
The algorithm to calculate the exact rank requires n × m storage, which is stack allocated, so for large strings we could blow the stack.
Rather than switch to dynamic allocation, we're just forgoing the exact rank, since for large strings it is unlikely to be useful to calculate such a thing.
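A minimal sketch of the trade-off, assuming a fixed-size table on the stack: the names `exact_rank` and `kMaxCells` and the 64 × 64 cap are hypothetical, and the longest common subsequence merely stands in for the real ranking recurrence, which is not described here. When the (n + 1) × (m + 1) table would exceed the cap, the function gives up instead of allocating.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical cap on the table size; the real code's limit is not stated.
constexpr std::size_t kMaxCells = 64 * 64;

// Stand-in for an "exact rank": the length of the longest common subsequence
// of filter and candidate, computed with an (n + 1) x (m + 1) table that lives
// on the stack. Returns std::nullopt instead of computing when the table
// would not fit in the fixed-size buffer.
std::optional<std::size_t> exact_rank (std::string const& filter, std::string const& candidate)
{
    std::size_t const n = filter.size(), m = candidate.size();
    if((n + 1) * (m + 1) > kMaxCells)
        return std::nullopt; // forgo the exact rank for large strings

    std::size_t table[kMaxCells]; // stack allocated, bounded size
    auto at = [&](std::size_t i, std::size_t j) -> std::size_t& { return table[i * (m + 1) + j]; };

    for(std::size_t i = 0; i <= n; ++i)
    {
        for(std::size_t j = 0; j <= m; ++j)
        {
            if(i == 0 || j == 0)
                at(i, j) = 0;
            else if(filter[i-1] == candidate[j-1])
                at(i, j) = at(i-1, j-1) + 1;
            else
                at(i, j) = std::max(at(i-1, j), at(i, j-1));
        }
    }
    return at(n, m);
}
```

Callers then need a fallback (for example a cheaper approximate rank) whenever the result is empty.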
Since creating a new untitled document goes through the same “open” code, such documents would have their newlines set to LF. This is no longer the case: the global (or targeted) lineEndings setting now decides what to use when saving the document.
Currently, creating an untitled document from a buffer (e.g. `echo foo|mate`) performs newline detection and will thus ignore the user's settings when saving, if the buffer contained any newlines when it was initialized.
This may or may not be desired. Probably it should do newline detection when the data is provided by the user, but not when it is based on “internal” data, for example a command with “New Document” as output location.
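The policy suggested above could look roughly like this hypothetical sketch. `content_origin`, `detect_line_ending`, and `line_ending_for_save` are illustrative names, not TextMate's API; the `setting` argument stands in for the global (or targeted) lineEndings value, and detecting from the first newline found is an assumption. Detected newlines only win for user-provided data; everything else falls back to the setting.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical origin of an untitled document's initial content.
enum class content_origin { user, internal };

// Returns the first line ending found in text ("\n", "\r\n", or "\r"), if any.
std::optional<std::string> detect_line_ending (std::string const& text)
{
    for(std::size_t i = 0; i < text.size(); ++i)
    {
        if(text[i] == '\n')
            return std::string("\n");
        if(text[i] == '\r')
            return std::string(i + 1 < text.size() && text[i+1] == '\n' ? "\r\n" : "\r");
    }
    return std::nullopt;
}

// Line ending to use when saving: detected from the initial content only when
// that content came from the user, otherwise the lineEndings setting.
std::string line_ending_for_save (std::string const& initial, content_origin origin, std::string const& setting)
{
    if(origin == content_origin::user)
    {
        if(auto detected = detect_line_ending(initial))
            return *detected;
    }
    return setting;
}
```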
This is an optimization, but it also fixes a crash when CFCharacterSetIsLongCharacterMember is called with extremely large values.
One crash report shows the input as 0x1001DEBC, which is not valid UTF-32. It could perhaps be the result of loading a garbage file, so it might make sense to perform some range checks when the user chooses to load a file using a UTF encoding.
It could also be command output or possibly copy/paste.
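A range check of the kind suggested above might look like the following sketch. A valid Unicode scalar value is at most U+10FFFF and outside the surrogate range U+D800 to U+DFFF, so 0x1001DEBC is rejected. The function names are hypothetical, and a plain callback stands in for `CFCharacterSetIsLongCharacterMember` so the sketch does not depend on CoreFoundation.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// True for a valid Unicode scalar value: at most U+10FFFF and not a surrogate.
constexpr bool is_unicode_scalar (std::uint32_t ch)
{
    return ch <= 0x10FFFF && !(0xD800 <= ch && ch <= 0xDFFF);
}

// Hypothetical guard: reject out-of-range values before they reach a
// character-set lookup (the callback stands in for the real CF call).
bool guarded_is_member (std::uint32_t ch, std::function<bool(std::uint32_t)> const& is_member)
{
    return is_unicode_scalar(ch) && is_member(ch);
}

// Index of the first invalid UTF-32 code unit, or units.size() if all are valid.
std::size_t find_invalid_utf32 (std::vector<std::uint32_t> const& units)
{
    for(std::size_t i = 0; i < units.size(); ++i)
    {
        if(!is_unicode_scalar(units[i]))
            return i;
    }
    return units.size();
}
```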
Apart from being simpler to use, this wrapper supports adding ‘//BOM’ to the charset name to either consume or produce a byte order mark.
It also converts invalid byte sequences to (ASCII) escape codes, e.g. \x8F.
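To illustrate the two features, here is a hypothetical sketch that does not use the actual wrapper: `split_bom_suffix` strips a trailing ‘//BOM’ from a charset name and reports whether it was present, and `escape_invalid_utf8` copies well-formed UTF-8 (as defined by the Unicode Standard) through unchanged while writing each byte that is not part of a valid sequence as a `\xNN` escape. The real wrapper presumably handles other encodings as well; only UTF-8 is shown here.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>

// Splits e.g. "UTF-8//BOM" into ("UTF-8", true); names without the suffix
// are returned unchanged together with false.
std::pair<std::string, bool> split_bom_suffix (std::string const& charset)
{
    static std::string const suffix = "//BOM";
    if(charset.size() >= suffix.size() && charset.compare(charset.size() - suffix.size(), suffix.size(), suffix) == 0)
        return { charset.substr(0, charset.size() - suffix.size()), true };
    return { charset, false };
}

// Length of the well-formed UTF-8 sequence starting at s[i], or 0 if the
// bytes there are not well-formed (overlong, surrogate, truncated, ...).
static std::size_t utf8_sequence_length (std::string const& s, std::size_t i)
{
    // Bytes past the end read as 0x100, which fails every range test below.
    auto at = [&](std::size_t j) -> unsigned { return j < s.size() ? static_cast<unsigned char>(s[j]) : 0x100u; };
    auto in = [](unsigned b, unsigned lo, unsigned hi) { return lo <= b && b <= hi; };

    unsigned const b0 = at(i);
    if(b0 <= 0x7F)
        return 1;
    if(in(b0, 0xC2, 0xDF))
        return in(at(i + 1), 0x80, 0xBF) ? 2 : 0;

    std::size_t len = 0;
    unsigned lo = 0x80, hi = 0xBF; // allowed range for the second byte
    if(in(b0, 0xE0, 0xEF))
    {
        len = 3;
        if(b0 == 0xE0) lo = 0xA0; // reject overlong forms
        if(b0 == 0xED) hi = 0x9F; // reject surrogates
    }
    else if(in(b0, 0xF0, 0xF4))
    {
        len = 4;
        if(b0 == 0xF0) lo = 0x90; // reject overlong forms
        if(b0 == 0xF4) hi = 0x8F; // reject values above U+10FFFF
    }
    else
    {
        return 0; // 0x80-0xC1 and 0xF5-0xFF never start a sequence
    }

    if(!in(at(i + 1), lo, hi))
        return 0;
    for(std::size_t k = 2; k < len; ++k)
    {
        if(!in(at(i + k), 0x80, 0xBF))
            return 0;
    }
    return len;
}

// Copies valid UTF-8 as-is and replaces each invalid byte with "\xNN".
std::string escape_invalid_utf8 (std::string const& s)
{
    std::string res;
    for(std::size_t i = 0; i < s.size(); )
    {
        if(std::size_t len = utf8_sequence_length(s, i))
        {
            res.append(s, i, len);
            i += len;
        }
        else
        {
            char buf[5]; // backslash, 'x', two hex digits, NUL
            std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned char>(s[i]));
            res += buf;
            ++i;
        }
    }
    return res;
}
```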
Our previous method of converting CR/CRLF files to LF representation was somewhat heuristic in that we would convert all CRs to LF and skip any LF following a CR.
While files should generally not mix and match line endings, it does occasionally happen in practice, and in that case, we do not want to “lose information” by converting too many newline characters to LF.
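The following sketch contrasts the previous heuristic, implemented as described above, with one way to avoid losing information: convert only the file's detected line ending and leave stray newline characters alone. The function names are hypothetical, and whether the new code works exactly like `normalize_exact` is an assumption.

```cpp
#include <cstddef>
#include <string>

// The previous heuristic: every CR becomes LF, and an LF directly following
// a CR is skipped (so CRLF collapses to a single LF).
std::string normalize_heuristic (std::string const& text)
{
    std::string res;
    for(std::size_t i = 0; i < text.size(); ++i)
    {
        if(text[i] == '\r')
        {
            res += '\n';
            if(i + 1 < text.size() && text[i+1] == '\n')
                ++i; // skip the LF of a CRLF pair
        }
        else
        {
            res += text[i];
        }
    }
    return res;
}

// Assumed alternative: replace only occurrences of the file's own newline
// ("\r\n" or "\r") with LF and keep every other character untouched.
std::string normalize_exact (std::string const& text, std::string const& newline)
{
    if(newline == "\n")
        return text;

    std::string res;
    std::size_t pos = 0;
    for(std::size_t found; (found = text.find(newline, pos)) != std::string::npos; pos = found + newline.size())
    {
        res.append(text, pos, found - pos);
        res += '\n';
    }
    res.append(text, pos, std::string::npos);
    return res;
}
```

For the mixed input "a\r\nb\rc", the heuristic produces "a\nb\nc", so the lone CR becomes indistinguishable from a real line break, while `normalize_exact` with "\r\n" produces "a\nb\rc", and converting LF back to CRLF on save restores the original bytes.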
This function is (indirectly) used by a lot of code, and not all of that code provides valid indexes. Since this seems like an issue that can be fixed locally, and since it was the main reason for crashes, I have decided to allow it.
It is, however, still not allowed when building in debug mode: running a debug build and getting an assertion failure should provide enough information to fix the issue.
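A minimal sketch of the policy, assuming the function can clamp an out-of-range index to the end of its input. `checked_index` is a hypothetical name, and the standard `assert` stands in for whatever assertion macro the code base uses. `assert` is compiled out when `NDEBUG` is defined, so release builds clamp while debug builds still stop at the faulty call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Debug builds: assert on an invalid index so the caller can be fixed.
// Release builds (NDEBUG): clamp the index to text.size() instead of crashing.
std::size_t checked_index (std::string const& text, std::size_t index)
{
    assert(index <= text.size() && "caller passed an invalid index");
    return std::min(index, text.size());
}
```

Built with `-DNDEBUG`, `checked_index("hello", 42)` returns 5 instead of letting the caller read past the end.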
Previously, calling the function with invalid UTF-8 could cause it to iterate over all the data and, if built in debug mode, could cause an assertion failure.
Now we return the sequence’s end when the data appears to be malformed, and we never look at more than the last 6 bytes of the sequence.
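The following sketch shows that behavior under the assumption that the function finds the end of the last complete UTF-8 sequence in a buffer (the name `find_safe_end` is hypothetical). It walks backwards over at most the last 6 bytes, the longest sequence the original UTF-8 design allowed, and falls back to the buffer's end whenever the bytes do not form a plausible sequence.

```cpp
#include <cstddef>
#include <string>

// Returns p such that s[0, p) does not end in a truncated multi-byte
// sequence. Malformed data yields s.size(), and at most 6 bytes are examined.
std::size_t find_safe_end (std::string const& s)
{
    std::size_t const end = s.size();
    std::size_t const limit = end < 6 ? end : 6;
    for(std::size_t back = 1; back <= limit; ++back)
    {
        unsigned char const ch = s[end - back];
        if((ch & 0xC0) == 0x80)
            continue; // continuation byte, keep looking for the lead byte

        std::size_t len; // sequence length announced by the lead byte
        if(ch < 0x80)                len = 1;
        else if((ch & 0xE0) == 0xC0) len = 2;
        else if((ch & 0xF0) == 0xE0) len = 3;
        else if((ch & 0xF8) == 0xF0) len = 4;
        else if((ch & 0xFC) == 0xF8) len = 5;
        else if((ch & 0xFE) == 0xFC) len = 6;
        else return end; // 0xFE/0xFF never appear in UTF-8: malformed

        if(len < back)
            return end; // more continuation bytes than announced: malformed
        return len == back ? end : end - back; // complete, or cut before it
    }
    return end; // no lead byte within the last 6 bytes: malformed
}
```

A decoder fed data in chunks could then process `s[0, p)` and carry the remaining bytes over to the next chunk.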
Both because of performance and because the latter can throw an exception (although we check the input, so it should not happen with our use of the API).
I initially wanted to make this change globally, but std::stoX will throw an exception if it fails to parse something, and we use strtoX in a few places where parsing nothing (and getting back zero) is fine.
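The difference that made a global change impractical can be demonstrated with the standard library alone: `std::strtol` returns 0 when it cannot parse anything, whereas `std::stoi` throws `std::invalid_argument`. The helper names below are hypothetical.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// strtol-style parsing: an empty or non-numeric string simply yields 0.
long parse_or_zero (std::string const& str)
{
    return std::strtol(str.c_str(), nullptr, 10);
}

// std::stoi-style parsing: the same input throws std::invalid_argument.
// (Out-of-range input would throw std::out_of_range, which is not caught here.)
bool stoi_throws (std::string const& str)
{
    try {
        (void)std::stoi(str);
        return false;
    }
    catch(std::invalid_argument const&) {
        return true;
    }
}
```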