55 Commits

Author SHA1 Message Date
Allan Odgaard
94d3b9b670 Remove old build files 2021-02-15 16:01:50 +01:00
Allan Odgaard
f7d765ba0e Add build files (for new build system) 2021-02-15 16:01:50 +01:00
Allan Odgaard
c93030b385 Remove all debug output from custom log macros
This provided value during early development, but has been unused for years, and it would generate too much noise if converted to os_log.

So better to just remove it all and add os_log statements as needed.
2020-06-05 21:22:50 +07:00
Allan Odgaard
4ec10c0923 Don’t annotate types and classes with PUBLIC
This was required when we linked each framework as its own thing, which we do not do anymore, and if we do go back to this system, we can simply have symbols public by default.
2020-06-05 21:22:50 +07:00
Allan Odgaard
71df4611ff text: We no longer need to link with the ‘crash framework’
This is since commit 82c4e272a4.
2020-04-29 08:02:02 +07:00
Allan Odgaard
9f339bfcf6 Do not hardcode the type of utf8::iterator_t
This is to allow it to be used with both const and non-const character buffers (the latter being returned from std::string’s data member function).
2019-10-27 15:25:35 +01:00
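A minimal sketch of what not hardcoding the iterator type can look like. The names `utf8_iterator_t`, `count_code_points`, `count_const` and `count_mutable` are hypothetical and not the framework's actual declarations; the sketch also assumes well-formed UTF-8 in a NUL-terminated buffer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for utf8::iterator_t: the underlying iterator type is
// a template parameter instead of being fixed to `char const*`.
template <typename BaseIter>
struct utf8_iterator_t
{
    explicit utf8_iterator_t (BaseIter it) : _it(it) { }

    // Step to the next code point by skipping continuation bytes (10xxxxxx).
    // Assumes well-formed UTF-8 in a NUL-terminated buffer, so the scan stops
    // at the terminator at the latest.
    utf8_iterator_t& operator++ ()
    {
        do { ++_it; } while((static_cast<unsigned char>(*_it) & 0xC0) == 0x80);
        return *this;
    }

    bool operator!= (utf8_iterator_t const& rhs) const { return _it != rhs._it; }

private:
    BaseIter _it;
};

template <typename BaseIter>
std::size_t count_code_points (BaseIter first, BaseIter last)
{
    std::size_t n = 0;
    for(utf8_iterator_t<BaseIter> it(first), end(last); it != end; ++it)
        ++n;
    return n;
}

// BaseIter is deduced as `char const*`.
inline std::size_t count_const (std::string const& str)
{
    return count_code_points(str.data(), str.data() + str.size());
}

// BaseIter is deduced as `char*` (the string is taken by value so it is mutable).
inline std::size_t count_mutable (std::string str)
{
    char* first = &str[0];
    return count_code_points(first, first + str.size());
}
```

Both helpers instantiate the same template, once per pointer type, which is the property the commit message describes.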
Allan Odgaard
82c4e272a4 Remove unnecessary crash reporting diagnostics 2019-10-27 15:18:20 +01:00
Allan Odgaard
cf3fe7c575 Fix buffer overflow bug
Without the boundary check we would write one byte beyond the end of the stack-allocated ‘first’ array.
2019-08-25 13:16:10 +02:00
Allan Odgaard
da996b6542 Use “quote” include statements for framework’s own headers 2019-06-26 13:21:11 +02:00
Allan Odgaard
f3a16e513b Remove assertion that does not hold for malformed content 2018-10-28 10:15:56 +07:00
Allan Odgaard
c9cc0c266d Disable exact rank calculation for large strings
The algorithm to calculate exact rank requires n × m storage which is stack allocated, so for large strings, we could blow the stack.

Rather than switch to dynamic allocation, we’re just forgoing the exact rank, since for large strings, it’s unlikely to be useful to try to calculate such a thing.
2018-03-08 16:05:52 +07:00
Allan Odgaard
783d073098 Opening a document with no newlines no longer defaults to LF
Since creating new untitled documents goes through the same “open” code, they would previously have their newlines set to LF. This is no longer the case, so the global (or targeted) lineEndings setting now decides what to use (when saving the document).

Currently creating an untitled document from a buffer (e.g. `echo foo|mate`) will do newline detection and thus will ignore user settings during save, if the buffer had any newlines during initialization.

This may or may not be desired. Probably it should do newline detection when the data is provided by the user, but not when it is based on “internal” data, for example a command with “New Document” as output location.
2016-11-02 23:02:18 +07:00
mathbunnyru
440414f96c Use nullptr in all C++ files instead of NULL
This brings us a bit of extra type safety: for example, where an integer is expected, nullptr should be disallowed by the compiler (unlike NULL).
2016-10-22 21:40:14 +07:00
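The type-safety point can be illustrated with standard C++ alone. The overload pair `describe` and the helper `describe_null` below are hypothetical and exist only to show which overload `nullptr` selects.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical overload pair, used only to show which one nullptr selects.
inline std::string describe (int)         { return "integer"; }
inline std::string describe (char const*) { return "pointer"; }

// nullptr has its own type (std::nullptr_t), which converts to any pointer
// type but never to an integer, so passing it where an int is expected is a
// compile-time error rather than a silent zero.
static_assert(!std::is_convertible<std::nullptr_t, int>::value, "nullptr is not an integer");
static_assert( std::is_convertible<std::nullptr_t, char const*>::value, "nullptr is a pointer");

// Always resolves to the pointer overload. With NULL the result would depend
// on how the platform defines the macro (it may pick the int overload or be
// ambiguous).
inline std::string describe_null () { return describe(nullptr); }
```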
Allan Odgaard
f66d48d4d5 Limit input to CFCharacterSetIsLongCharacterMember
This is an optimization but it also fixes a crash when CFCharacterSetIsLongCharacterMember is called with extremely large values.

One crash report shows the input being 0x1001DEBC, which is not valid UTF-32. This could perhaps be the result of loading a garbage file, so it might make sense to perform some range checks when the user chooses to load a file using a UTF encoding.

It could also be command output or possibly copy/paste.
2016-10-10 22:38:06 +02:00
Allan Odgaard
2feb4a497a Use crash_reporter_info_t’s convenience constructor
Also change most ‘crashInfo’ variable names to just ‘info’ to be consistent.
2016-10-07 22:14:33 +02:00
Allan Odgaard
2ef853d4ff Add crash report info for CFCharacterSetIsLongCharacterMember 2016-09-11 10:55:57 +02:00
Allan Odgaard
46fb745bbe Use perrorf when printing errors with dynamic strings
Also revise error messages to be more consistent.
2016-08-28 17:25:26 +02:00
Allan Odgaard
dbdfa3c6af Add identifying information to perror output 2016-08-21 12:09:30 +02:00
Allan Odgaard
d18d524037 Use CFCharacterSet for “East Asian Width” and update tables
See 45f847d01e for code used to update the tables.
2016-07-01 13:49:44 +02:00
Allan Odgaard
f78bfd306d Remove unused constant for “mixed newlines” and callback related to this 2016-06-29 11:37:28 +02:00
Allan Odgaard
0520e4fe88 Add text::transcode_t which is a wrapper for iconv
Apart from being simpler to use this wrapper supports adding ‘//BOM’ to the charset name to either consume or produce a byte order marker.

It also converts invalid byte sequences to (ASCII) escape codes, e.g. \x8F.
2016-06-21 18:31:29 +02:00
Allan Odgaard
4758061719 Do not add final newline to hex dump 2016-06-21 10:51:55 +02:00
Allan Odgaard
e83fee564c Refactor: use emplace_back(…) instead of push_back(make_pair(…)) 2016-05-28 22:12:46 +02:00
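For readers unfamiliar with the refactoring, the two forms compare roughly as follows. The function `make_ranked` and its contents are made up for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical function; only the two insertion styles matter here.
inline std::vector<std::pair<std::string, int>> make_ranked ()
{
    std::vector<std::pair<std::string, int>> res;

    // Old style: build a temporary pair, then move it into the vector.
    res.push_back(std::make_pair(std::string("before"), 1));

    // New style: forward the arguments and construct the pair in place.
    res.emplace_back("after", 2);

    return res;
}
```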
Allan Odgaard
91a7fa0ad2 Change heuristic to detect line endings
Only if there is consistent use of CR or CRLF will we treat the file as such, otherwise it is treated as LF delimited.
2016-05-24 17:11:24 +02:00
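One way to read this heuristic, as a sketch rather than the framework's actual code: count each kind of line break and only report CR or CRLF when no other kind occurs. The names `line_ending` and `detect_line_ending` are assumptions.

```cpp
#include <cstddef>
#include <string>

enum class line_ending { lf, cr, crlf };

// Hypothetical detector: CR or CRLF is reported only when it is the sole kind
// of line break in the text; anything else (including no breaks) yields LF.
inline line_ending detect_line_ending (std::string const& text)
{
    std::size_t lf = 0, cr = 0, crlf = 0;
    for(std::size_t i = 0; i < text.size(); ++i)
    {
        if(text[i] == '\r')
        {
            if(i + 1 < text.size() && text[i + 1] == '\n')
            {
                ++crlf;
                ++i; // consume the LF of the pair
            }
            else
            {
                ++cr;
            }
        }
        else if(text[i] == '\n')
        {
            ++lf;
        }
    }

    if(crlf != 0 && cr == 0 && lf == 0) return line_ending::crlf;
    if(cr != 0 && crlf == 0 && lf == 0) return line_ending::cr;
    return line_ending::lf;
}
```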
Allan Odgaard
0e7a04cede Don’t treat extraneous CRs when converting from CRLF
Our previous method of converting CR/CRLF files to LF representation was somewhat heuristic in that we would convert all CRs to LF and skip any LF following a CR.

While files should generally not mix and match line endings, it does occasionally happen in practice, and in that case, we do not want to “lose information” by converting too many newline characters to LF.
2016-05-24 16:51:36 +02:00
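The commit message does not spell out the new conversion, so the following is only one plausible reading: turn exact CRLF pairs into LF and leave any stray CR untouched, so that no extra line breaks are folded away. The function name `crlf_to_lf` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical converter: only a CR immediately followed by LF is dropped.
// A lone CR is copied through unchanged instead of being rewritten to LF.
inline std::string crlf_to_lf (std::string const& src)
{
    std::string dst;
    dst.reserve(src.size());
    for(std::size_t i = 0; i < src.size(); ++i)
    {
        if(src[i] == '\r' && i + 1 < src.size() && src[i + 1] == '\n')
            continue; // skip the CR; the LF is copied on the next iteration
        dst += src[i];
    }
    return dst;
}
```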
Allan Odgaard
6a8ce140e2 Construct undefined text::range_t/pos_t if created from NULL_STR 2014-10-14 17:35:42 +02:00
Allan Odgaard
7675aeb4ec Changing case would truncate the result if it grew in size 2014-06-28 17:42:22 +02:00
Allan Odgaard
c272afaff2 Cleanup/harmonize whitespace
Leading indent should consist only of tabs; beyond that, only spaces should be used.
2014-04-25 16:55:31 +07:00
Allan Odgaard
7bedb531ef Let utf8::multibyte<T>::length return 1 for non-multibyte chars
Previously we asserted that the API was always called with multibyte start characters.
2014-04-18 06:39:27 +07:00
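A sketch of the relaxed behavior, assuming a free function `utf8_length` in place of the real `utf8::multibyte<T>::length` (whose signature is not shown in this log). It uses C++14 binary literals, which the log shows the code base adopting in f3f4efd062.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes in the sequence that starts with `ch`.
// Anything that is not a multibyte start byte simply counts as one byte
// instead of tripping an assertion.
inline std::size_t utf8_length (unsigned char ch)
{
    if((ch & 0b1110'0000) == 0b1100'0000) return 2; // 110xxxxx
    if((ch & 0b1111'0000) == 0b1110'0000) return 3; // 1110xxxx
    if((ch & 0b1111'1000) == 0b1111'0000) return 4; // 11110xxx
    return 1; // ASCII, continuation bytes and invalid lead bytes
}
```

Returning 1 for the fallback case lets callers always advance by the returned length without special-casing malformed input.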
Allan Odgaard
39b94e6ac3 Harmonize whitespace and add trailing newline 2014-04-14 14:26:52 +07:00
Allan Odgaard
7d0100fa2b utf16::distance: End iterator may point into multi-byte character
This function is (indirectly) used by a lot of code, and not all of it provides valid indexes, though it seems like an issue that can be fixed locally, hence why I have decided to allow it (coupled with this being the main reason for crashes).

It is however still not allowed when building in debug mode (rationale being that running it in debug mode and getting an assertion failure should provide enough info to fix the issue).
2014-04-05 14:13:39 +07:00
Allan Odgaard
0daa6d0ec2 Tighter code for removing malformed UTF-8 sequences 2014-04-01 16:01:19 +07:00
Allan Odgaard
cf452cdcee Increase the number of tests for sanitizing UTF-8
Also harmonize the formatting of the existing tests.
2014-04-01 16:01:19 +07:00
Allan Odgaard
471fbe45c2 Do not stack allocate potentially large buffer
Also test that each system function used actually succeeds.
2014-04-01 16:01:19 +07:00
Allan Odgaard
d7660bd89e Detect first loop iteration using std::exchange “idiom” 2014-03-23 22:47:15 +07:00
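The idiom referred to here is presumably the following use of `std::exchange` (C++14, `<utility>`). The `join` function is a hypothetical example, not code from the framework.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical example: join items with a separator between them.
inline std::string join (std::vector<std::string> const& items, std::string const& separator)
{
    std::string res;
    bool first = true;
    for(auto const& item : items)
    {
        // std::exchange stores false in `first` and returns the old value,
        // so the separator is skipped on the first iteration only.
        if(!std::exchange(first, false))
            res += separator;
        res += item;
    }
    return res;
}
```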
Allan Odgaard
f3f4efd062 Use binary literals in code (C++14) 2014-03-16 18:06:03 +07:00
Allan Odgaard
1840f5b7fa Improve utf8::find_safe_end implementation
Previously calling the function with invalid UTF-8 could cause it to iterate over all the data and, if built in debug mode, could cause an assertion failure.

Now we return the sequence’s end when the data appears to be malformed and we never look at more than the last 6 bytes in the sequence.
2014-03-03 13:48:12 +07:00
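The described behavior can be sketched as below. This is an illustration under assumptions, not the framework's implementation: the signature (a `std::string` in, an index out) and the helper `leading_ones` are made up, and lead bytes announcing up to 6 bytes are accepted to match the 6 byte window mentioned above.

```cpp
#include <cstddef>
#include <string>

// Number of leading 1-bits in a byte, which for a UTF-8 lead byte is the
// length of the sequence it starts.
inline std::size_t leading_ones (unsigned char ch)
{
    std::size_t n = 0;
    while(ch & 0x80)
    {
        ++n;
        ch = static_cast<unsigned char>(ch << 1);
    }
    return n;
}

// Hypothetical signature: returns the largest index <= buf.size() at which
// the buffer can be cut without splitting a multibyte sequence.
inline std::size_t find_safe_end (std::string const& buf)
{
    std::size_t const last  = buf.size();
    std::size_t const limit = last < 6 ? 0 : last - 6;

    for(std::size_t i = last; i != limit; )
    {
        unsigned char ch = static_cast<unsigned char>(buf[--i]);
        if((ch & 0xC0) == 0x80)
            continue;     // continuation byte: keep walking back
        if(ch < 0x80)
            return last;  // ASCII: nothing to split (trailing bytes, if any, are stray)

        // Lead byte: cut before it only if its sequence is incomplete.
        std::size_t n = leading_ones(ch);
        return (n <= 6 && last - i < n) ? i : last;
    }
    return last; // no lead byte within the last 6 bytes: treat as malformed
}
```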
Allan Odgaard
c2397484b8 Use C++11 for loop
Majority of the edits done using the following ruby script:

    def update_loops(src)
      dst, cnt = '', 0

      block_indent, variable = nil, nil
      src.each_line do |line|
        if block_indent
          if line =~ /^#{block_indent}([{}\t])|^\t*$/
            block_indent = nil if $1 == '}'
            line = line.gsub(%r{ ([^a-z>]) \(\*#{variable}\) | \*#{variable}\b | \b#{variable}(->) }x) do
              $1.to_s + variable + ($2 == "->" ? "." : "")
            end
          else
            block_indent = nil
          end
        elsif line =~ /^(\t*)c?iterate\((\w+), (?!diacritics::make_range)(.*\))$/
          block_indent, variable = $1, $2
          line = "#$1for(auto const& #$2 : #$3\n"
          cnt += 1
        end
        dst << line
      end
      return dst, cnt
    end

    paths.each do |path|
      src = IO.read(path)

      cnt = 1
      while cnt != 0
        src, cnt = update_loops(src)
        STDERR << "#{path}: #{cnt}\n"
      end

      File.open(path, "w") { |io| io << src }
    end
2014-03-03 10:34:13 +07:00
Allan Odgaard
bc4650f2b0 Move line ending support to text framework 2013-10-31 18:32:16 +01:00
Allan Odgaard
2fa5d7ddb2 Add UTF-8 sanitization function
This can be used to remove malformed multibyte sequences.
2013-10-08 21:59:54 +02:00
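A simplified sketch of such a sanitizer, under stated assumptions: the name `sanitize_utf8` is hypothetical, and the check is structural only (lead byte followed by the right number of continuation bytes), so overlong encodings and surrogates are not rejected here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sanitizer: copies structurally valid sequences and drops
// bytes that do not form one.
inline std::string sanitize_utf8 (std::string const& src)
{
    std::string dst;
    std::size_t i = 0;
    while(i < src.size())
    {
        unsigned char ch = static_cast<unsigned char>(src[i]);
        std::size_t len =
            ch < 0x80           ? 1 :
            (ch & 0xE0) == 0xC0 ? 2 :
            (ch & 0xF0) == 0xE0 ? 3 :
            (ch & 0xF8) == 0xF0 ? 4 : 0; // 0: not a valid start byte

        bool ok = len != 0 && i + len <= src.size();
        for(std::size_t j = 1; ok && j < len; ++j)
            ok = (static_cast<unsigned char>(src[i + j]) & 0xC0) == 0x80;

        if(ok)
        {
            dst.append(src, i, len);
            i += len;
        }
        else
        {
            ++i; // drop one offending byte and resynchronize
        }
    }
    return dst;
}
```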
Allan Odgaard
1c308c810d Use map::emplace instead of inserting std::pair (C++11) 2013-09-05 20:59:11 +02:00
Allan Odgaard
b7bc35ed9d Let decode::url_part convert plus to space 2013-08-29 13:26:16 +02:00
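A sketch of percent-decoding that also maps plus to space, as is conventional for form-encoded query strings. The names `hex_value` and `decode_url_part` are assumptions; the real `decode::url_part` may differ in signature and error handling.

```cpp
#include <cstddef>
#include <string>

// Value of a hexadecimal digit, or -1 if `ch` is not one.
inline int hex_value (char ch)
{
    if('0' <= ch && ch <= '9') return ch - '0';
    if('a' <= ch && ch <= 'f') return ch - 'a' + 10;
    if('A' <= ch && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Hypothetical decoder: '+' becomes a space, %XX becomes the byte 0xXX, and
// malformed escapes are copied through unchanged.
inline std::string decode_url_part (std::string const& src)
{
    std::string dst;
    for(std::size_t i = 0; i < src.size(); ++i)
    {
        if(src[i] == '+')
        {
            dst += ' ';
        }
        else if(src[i] == '%' && i + 2 < src.size() && hex_value(src[i + 1]) >= 0 && hex_value(src[i + 2]) >= 0)
        {
            dst += static_cast<char>(hex_value(src[i + 1]) * 16 + hex_value(src[i + 2]));
            i += 2;
        }
        else
        {
            dst += src[i];
        }
    }
    return dst;
}
```

A literal plus sign has to be sent as `%2B`, which the decoder turns back into `+`.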
Allan Odgaard
7ccd7add60 Use digittoint() instead of std::stoi()
Both because of performance and because the latter can throw an exception (although we check the input, so it should not happen with our use of the API).
2013-08-27 15:30:09 +02:00
Allan Odgaard
585a32344a Allow comparison of text::indent_t 2013-07-29 10:03:25 +02:00
Allan Odgaard
f05426378c Update testing system for text framework 2013-07-26 13:53:58 +02:00
Allan Odgaard
ea2cf8d875 Add CR to default trim character set 2013-06-22 21:02:45 +07:00
Allan Odgaard
fd60fd25c7 Change strtol → std::stol (C++11)
I initially wanted to do this change globally, but std::stoX will throw an exception if it fails to parse something and we use strtoX in a few places where parsing nothing (and getting back zero) is fine.
2013-02-08 11:20:35 +01:00
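The difference in failure behavior is easy to demonstrate with the standard library. The wrapper names `parse_lenient` and `stol_throws` are hypothetical.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// strtol reports "nothing parsed" by returning zero, which some callers
// rely on.
inline long parse_lenient (std::string const& str)
{
    return std::strtol(str.c_str(), nullptr, 10);
}

// std::stol throws std::invalid_argument when nothing can be parsed and
// std::out_of_range on overflow; both derive from std::logic_error.
inline bool stol_throws (std::string const& str)
{
    try
    {
        (void)std::stol(str);
        return false;
    }
    catch(std::logic_error const&)
    {
        return true;
    }
}
```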
Allan Odgaard
e75e7ec8e5 Change text::format → std::to_string (C++11) 2013-02-08 11:20:34 +01:00
Allan Odgaard
20378c426e A full match should rank highest 2013-01-18 13:34:57 +01:00
Allan Odgaard
ebab500ba3 Use std::map/set instead of C arrays
These types come with a find() method and avoid having to use helper functions to get the begin/end of the array (for linear search).
2012-09-20 12:22:20 +02:00
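As an illustration of the change, with a made-up word list and function name (`is_keyword`): the container's own `find()` replaces a linear search over a C array, so no begin/end helpers are needed.

```cpp
#include <set>
#include <string>

// Hypothetical lookup table; the framework's actual tables are not shown in
// this log.
inline bool is_keyword (std::string const& word)
{
    static std::set<std::string> const keywords = { "if", "else", "while" };
    return keywords.find(word) != keywords.end();
}
```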