This function is (indirectly) used by a lot of code, and not all of it provides valid indexes. It seems like an issue that can be fixed locally, so I have decided to allow it, especially since this is the main reason for crashes.
It is, however, still not allowed in debug builds (the rationale being that an assertion failure in a debug build should provide enough information to fix the issue).
Previously, calling the function with invalid UTF-8 could cause it to iterate over all the data and, in a debug build, trigger an assertion failure.
Now we return the sequence’s end when the data appears to be malformed, and we never look at more than the last 6 bytes of the sequence.
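
The bounded backward scan described above might look roughly like the sketch below. The function name `last_code_point` and its signature are hypothetical, not the actual function from this change. It assumes the goal is to locate the first byte of the final UTF-8 sequence in `[first, last)`, and it uses the historical 6-byte maximum for a UTF-8 sequence as the limit.

```cpp
#include <cstddef>

// Hypothetical helper (not the real function touched by this commit).
// Returns a pointer to the first byte of the last UTF-8 sequence in
// [first, last), or `last` when the trailing bytes look malformed.
static char const* last_code_point (char const* first, char const* last)
{
	std::size_t const max_len = 6; // never inspect more than 6 bytes
	std::size_t len = 0;

	for(char const* it = last; it != first && len < max_len; )
	{
		unsigned char byte = static_cast<unsigned char>(*--it);
		++len;

		if((byte & 0xC0) == 0x80) // continuation byte (10xxxxxx): keep scanning
			continue;

		// Lead byte or ASCII: the count of leading one-bits gives the sequence length.
		std::size_t expected = 1;
		if(byte & 0x80)
		{
			expected = 0;
			while(byte & (0x80 >> expected))
				++expected;
		}
		return expected == len ? it : last;
	}
	return last; // empty input, or only continuation bytes within the limit
}
```

Returning `last` for malformed input means the caller gets a valid (if empty) position instead of an assertion failure or a scan over the whole buffer.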
Both because of performance and because the latter can throw an exception (although we check the input, so it should not happen with our use of the API).
I initially wanted to make this change globally, but std::stoX will throw an exception if it fails to parse something, and we use strtoX in a few places where parsing nothing (and getting back zero) is fine.
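
To illustrate the difference in failure behavior, here is a small sketch using `std::strtol` and `std::stol` as stand-ins for the strtoX and std::stoX families. The helper names `parse_or_zero` and `stol_throws` are made up for this example.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: strtol returns 0 when no digits could be parsed.
static long parse_or_zero (std::string const& str)
{
	return std::strtol(str.c_str(), nullptr, 10);
}

// Hypothetical helper: reports whether std::stol rejects the input by throwing.
static bool stol_throws (std::string const& str)
{
	try
	{
		std::stol(str);
		return false;
	}
	catch(std::invalid_argument const&) // no conversion could be performed
	{
		return true;
	}
	catch(std::out_of_range const&) // value does not fit in a long
	{
		return true;
	}
}
```

With an empty string, `parse_or_zero` quietly yields zero while `std::stol` throws `std::invalid_argument`, which is why the call sites that accept "nothing parsed" cannot simply be switched over.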
Several functions deal with the logical column count, and these now all count code points whose “East Asian Width” (Unicode) property marks them as wide as two columns.
This closes issue #206.
This checks if the character needs to be counted as double-width (for soft wrap and similar).
I used the following script to generate the tables; it should be improved to collapse adjacent ranges:
#!/usr/bin/ruby
# Collect wide (W) code points: single values go in `fixed`, ranges in `start`/`stop`.
fixed, start, stop = [ ], [ ], [ ]
open('|curl -Ls http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt') do |io|
  # $1 is the (first) code point; $2 is the end of the range, if the line is a range.
  io.grep(/^([0-9A-F]+)(?:\.\.([0-9A-F]+))?;[A-Za-z]*W/) do
    if $2
      start << "0x#$1"
      stop << "0x#$2"
    else
      fixed << "0x#$1"
    end
  end
end
# Emit the three tables as C arrays.
puts "static uint32_t Fixed[] = { #{fixed.join(', ')} };\n"
puts "static uint32_t RangeBegin[] = { #{start.join(', ')} };\n"
puts "static uint32_t RangeEnd[] = { #{stop.join(', ')} };\n"
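
For reference, a lookup against tables of this shape could be sketched as follows. The helper name `is_double_width` is hypothetical, and the table contents are a tiny hand-picked excerpt for illustration, not the generated data. The sketch assumes the tables are sorted in ascending order (as they are if the script emits entries in code point order), so a binary search can be used.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>

// Illustrative excerpt only; the real tables come from the generator script.
static uint32_t const Fixed[]      = { 0x2329, 0x232A };
static uint32_t const RangeBegin[] = { 0x1100, 0x4E00, 0xAC00 };
static uint32_t const RangeEnd[]   = { 0x115F, 0x9FFF, 0xD7A3 };

// Hypothetical helper: true if `ch` should occupy two columns.
static bool is_double_width (uint32_t ch)
{
	// Single code points: plain binary search.
	if(std::binary_search(std::begin(Fixed), std::end(Fixed), ch))
		return true;

	// Ranges: find the first range starting after `ch`; the only
	// candidate that can contain `ch` is the range just before it.
	auto it = std::upper_bound(std::begin(RangeBegin), std::end(RangeBegin), ch);
	if(it == std::begin(RangeBegin))
		return false;

	std::ptrdiff_t i = (it - std::begin(RangeBegin)) - 1;
	return ch <= RangeEnd[i];
}
```

Collapsing adjacent ranges in the generator would shrink `RangeBegin` and `RangeEnd` and could fold `Fixed` into them, leaving a single search.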