Manually unroll the loop to accelerate UTF-8 byte sequence encoding.
This achieves a 30% to 50% performance gain on x86 and Arm servers.
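For illustration only, here is a minimal sketch of what "unrolling the
loop" can mean for a UTF-8 encoder. It is not the code from this change.
The function names utf8_encode_loop and utf8_encode_unrolled are
hypothetical, and the sketch assumes code points up to 0x1FFFFF (the
largest value the 4-byte form can hold). The loop version fills
continuation bytes 6 bits at a time. The unrolled version writes each
byte with straight-line stores for every sequence length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop-based encoder, shown only for comparison.
   Assumes cp <= 0x1FFFFF; writes 1 to 4 bytes to out. */
static size_t utf8_encode_loop(uint32_t cp, unsigned char *out) {
    /* Lead-byte prefix indexed by sequence length. */
    static const unsigned char lead[5] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};
    size_t len;
    assert(cp <= 0x1FFFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    len = cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    /* Fill continuation bytes from last to first, 6 bits at a time. */
    for (size_t i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (unsigned char)(lead[len] | cp);
    return len;
}

/* Hypothetical manually unrolled encoder: one straight-line branch per
   sequence length, no inner loop. Same assumption: cp <= 0x1FFFFF. */
static size_t utf8_encode_unrolled(uint32_t cp, unsigned char *out) {
    assert(cp <= 0x1FFFFF);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx + 2 continuation */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx + 3 continuation; also covers values above U+10FFFF. */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Both functions produce identical bytes. The unrolled form removes the
data-dependent inner loop, which plausibly helps the compiler schedule
the stores. Whether that explains the measured gain depends on the
actual implementation.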
NOTE: Per https://en.wikipedia.org/wiki/UTF-8#Invalid_code_points,
since RFC 3629 (November 2003), code points above U+10FFFF must be
treated as invalid, and byte sequences encoding them are invalid
UTF-8. However, to stay compatible with the current code, this
implementation still accepts such sequences.