Avoid wcwidth(), wcrtomb() and mbrtowc() on ASCII/ISO-8859-1 characters.
ASCII <-> UTF-8 has a trivial mapping, so wcrtomb() and mbrtowc() can be skipped for ASCII input. ISO-8859-1 is all narrow characters and cheap to test for, so wcwidth() can be skipped there as well. It might be possible to cheaply test other popular Unicode blocks and/or planes too.

These two changes make input processing 2-3x faster on Linux and FreeBSD. The performance improvement in actual usage is more modest but still significant.
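
The wcwidth() shortcut described above might look roughly like the sketch below. The helper name `fast_wcwidth` and the exact range bounds are assumptions for illustration, not code taken from this commit; the sketch relies on the message's claim that printable ISO-8859-1 characters are single-column, and it leaves control characters to the regular wcwidth() call.

```cpp
#include <wchar.h>  // wcwidth() (POSIX)

// Hypothetical helper, not from the commit: returns the column width of ch,
// skipping the wcwidth() table lookup for printable ASCII and Latin-1.
inline int fast_wcwidth( wchar_t ch )
{
  // Assumption: printable ASCII (0x20..0x7e) and printable ISO-8859-1
  // (0xa0..0xff) always occupy exactly one column.
  if ( ( ch >= 0x20 && ch <= 0x7e ) || ( ch >= 0xa0 && ch <= 0xff ) ) {
    return 1;
  }

  // Everything else (NUL, control characters, wider Unicode ranges)
  // still goes through the locale-aware library call.
  return wcwidth( ch );
}
```

Two range comparisons are cheap relative to a library lookup, which is presumably why the test pays off when most terminal traffic is ASCII or Latin-1.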
@@ -80,10 +80,15 @@ void Parser::UTF8Parser::input( char c, Actions &ret )
{
  assert( buf_len < BUF_SIZE );

  /* 1-byte UTF-8 character, aka ASCII? Cheat. */
  if ( buf_len == 0 && static_cast<unsigned char>(c) <= 0x7f ) {
    parser.input( static_cast<wchar_t>(c), ret );
    return;
  }

  buf[ buf_len++ ] = c;

  /* This function will only work in a UTF-8 locale. */

  wchar_t pwc;
  mbstate_t ps = mbstate_t();
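
The hunk above shows only the start of the function, so here is a self-contained sketch of the same ASCII fast path in front of an mbrtowc() fallback. Everything in it is an assumption for illustration: `Utf8Feeder`, `kBufSize`, the `out` vector and the U+FFFD error handling are hypothetical stand-ins, not the project's actual parser, and the slow path only decodes correctly when the process runs in a UTF-8 locale.

```cpp
#include <cstddef>
#include <cwchar>
#include <vector>

// Hypothetical buffer size, chosen only for this sketch.
constexpr std::size_t kBufSize = 8;

// Hypothetical stand-in for a UTF-8 front end; not the project's class.
struct Utf8Feeder {
  char buf[ kBufSize ];
  std::size_t buf_len = 0;
  std::vector<wchar_t> out;  // decoded characters, in arrival order

  void input( char c )
  {
    // Fast path: with no partial sequence pending, a byte <= 0x7f is a
    // complete character whose code point equals the byte value.
    if ( buf_len == 0 && static_cast<unsigned char>( c ) <= 0x7f ) {
      out.push_back( static_cast<wchar_t>( c ) );
      return;
    }

    // Slow path: buffer the byte and re-decode from the start of the
    // pending sequence. Requires a UTF-8 locale to be in effect.
    buf[ buf_len++ ] = c;
    wchar_t wc;
    std::mbstate_t ps = std::mbstate_t();
    std::size_t n = std::mbrtowc( &wc, buf, buf_len, &ps );

    if ( n == static_cast<std::size_t>( -2 ) && buf_len < kBufSize ) {
      return;  // incomplete sequence: wait for more bytes
    }

    if ( n == static_cast<std::size_t>( -1 ) || n == static_cast<std::size_t>( -2 ) ) {
      // Invalid or over-long input: emit U+FFFD and drop the buffered
      // bytes (a simplification for this sketch).
      wc = static_cast<wchar_t>( 0xFFFD );
    }

    out.push_back( wc );
    buf_len = 0;
  }
};
```

The `buf_len == 0` check matters: a byte at or below 0x7f that arrives in the middle of a multibyte sequence must still reach the decoder so the broken sequence is reported, rather than being passed through as if nothing were pending.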