[Libre-soc-isa] [Bug 794] SVP64 REMAP for utf8

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Wed Mar 30 16:16:30 BST 2022


https://bugs.libre-soc.org/show_bug.cgi?id=794

--- Comment #6 from Jacob Lifshay <programmerjake at gmail.com> ---
Additional links:
(WTF-8 is UTF-8, modified to also represent unpaired surrogates such as those
found in ill-formed UTF-16. This is useful for Windows file names, Java/JS
strings, etc.)
https://simonsapin.github.io/wtf-8/
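
To make the unpaired-surrogate point concrete, here is a small sketch using
Python's standard "surrogatepass" error handler. This is an illustration only,
not a WTF-8 implementation: surrogatepass also encodes *paired* surrogates as
two 3-byte sequences, which WTF-8 does not allow. The variable names are made
up for this example.

```python
# A lone high surrogate, U+D800. It is not a Unicode scalar value.
lone = "\ud800"

# Strict UTF-8 refuses to encode it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" emits the generalized 3-byte form instead. WTF-8 uses the
# same byte sequence for an unpaired surrogate.
encoded = lone.encode("utf-8", "surrogatepass")

# The same handler is needed to decode the bytes again.
decoded = encoded.decode("utf-8", "surrogatepass")

print(strict_ok, encoded.hex(), decoded == lone)
```

The encoded bytes are ED A0 80. The second byte A0 is outside the 80..9F range
that Table 3-7 allows after ED, which is how well-formed UTF-8 excludes
surrogates.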

https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf
Table 3-7 (modified to put a star next to where the original used bold text)
Well-Formed UTF-8 Byte Sequences
Code Points         First Byte  Second Byte  Third Byte  Fourth Byte
U+0000..U+007F      00..7F
U+0080..U+07FF      C2..DF      80..BF
U+0800..U+0FFF      E0          *A0..BF      80..BF
U+1000..U+CFFF      E1..EC      80..BF       80..BF
U+D000..U+D7FF      ED          80..*9F      80..BF
U+E000..U+FFFF      EE..EF      80..BF       80..BF
U+10000..U+3FFFF    F0          *90..BF      80..BF      80..BF
U+40000..U+FFFFF    F1..F3      80..BF       80..BF      80..BF
U+100000..U+10FFFF  F4          80..*8F      80..BF      80..BF
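
As a rough sketch of how the table can drive a validator, the Python below
transcribes each row as a first-byte range plus the allowed ranges for the
following bytes. The names TABLE_3_7 and is_well_formed_utf8 are made up for
this example. It is a plain software check, not a proposal for how SVP64 REMAP
should do it.

```python
# Each row: (first-byte range, ranges for each following byte),
# transcribed from Table 3-7. The starred bounds are A0, 9F, 90 and 8F.
TABLE_3_7 = [
    ((0x00, 0x7F), []),
    ((0xC2, 0xDF), [(0x80, 0xBF)]),
    ((0xE0, 0xE0), [(0xA0, 0xBF), (0x80, 0xBF)]),
    ((0xE1, 0xEC), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xED, 0xED), [(0x80, 0x9F), (0x80, 0xBF)]),
    ((0xEE, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF0, 0xF0), [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF1, 0xF3), [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF4, 0xF4), [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if every byte sequence in data matches a table row."""
    i = 0
    while i < len(data):
        # Find the row whose first-byte range contains data[i].
        for (lo, hi), tails in TABLE_3_7:
            if lo <= data[i] <= hi:
                break
        else:
            # No row matched: C0, C1, F5..FF, or a stray 80..BF byte.
            return False
        seq = data[i + 1 : i + 1 + len(tails)]
        if len(seq) != len(tails):
            return False  # input ends in the middle of a sequence
        for byte, (tlo, thi) in zip(seq, tails):
            if not tlo <= byte <= thi:
                return False
        i += 1 + len(tails)
    return True


print(is_well_formed_utf8("h\u00e9llo \u20ac \U0001F600".encode("utf-8")))
print(is_well_formed_utf8(b"\xed\xa0\x80"))  # encoded surrogate U+D800
```

The starred entries are the only places where the second byte is narrower than
80..BF: they rule out overlong 3- and 4-byte forms (E0, F0), surrogates (ED),
and code points above U+10FFFF (F4).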
