Ollj Oh

Resident


Posts: 3
Joined: October 6, 2009
Last visited: February 28, 2017

Everything posted by Ollj Oh

Key2UTF() and UTF2Key()

Ollj Oh replied to irihapeti's topic in LSL Library

I tested: you can store 20 bits per character by using only the 21-bit code point range of UTF-8. Sadly, LSL is bound to "UTF-8 code points", because that is how LSL converts Unicode to base64.

21 bits per character won't work, because you skip the first 2^16 code points to start at the 21-bit code point range. Skipping 2^16 also skips all the invalid UTF-8 characters, including U+FFFE: simply add 2^16 before you convert an integer to a character, and subtract 2^16 after you convert the character back.

Your highest possible base is 2^21 - 2^16 (a higher base simply isn't worth the hassle of checking valid character ranges in more detail), and since we only care about whole bits, that means base 2^20, encoding 20 bits per character. (No higher base short of 2^21 will change that.)

And no, there is no support whatsoever for the 26-bit or 31-bit code point ranges of UTF-8 in SL. Someone once wrote an "RFC" to cut these code point ranges (which barely anyone was using anyway) off UTF-8 to ensure fast compatibility with UTF-16.

---

When I wrote my integer-to-UTF-8 encoder back in January 2008 (almost pre-Mono era), it was smart to encode only up to the 16-bit code point range, because LSL still allowed UTF-8 characters in object descriptions and names. A character from the 21-bit code point range was always written as two characters, which made it a waste of space when you wanted to store many bits in as few characters as possible. Back then you could store a key in 9 characters in an object description. This is no longer possible.

You wanted to use the same function for storing in an object description as for storing in a string, so 15 bits per character was a reasonable limit. Now object descriptions only allow the 2^7 different characters of the small ASCII range, and a 21-bit code point character is only 1 character for the Mono compiler, stored as a 2-byte character after a 16-byte string header.

Now you can happily store 20 bits per character in a string within LSL code, or get data just as compact via an HTTP request, because there is no more compatibility with storing base 2^15 in a prim's object description. Base 2^7 is just too small to bother with. Use Unicode to store in base 2^20, all the way!
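To make the arithmetic concrete, here is a rough sketch in Python rather than LSL, so it only illustrates the "add 2^16, subtract 2^16" idea and is not the code from this thread. The names `int_to_char`, `char_to_int`, `key_to_utf` and `utf_to_key` are made up for the example, and it assumes a key is a standard 128-bit UUID string.

```python
import uuid

OFFSET = 1 << 16          # skip the first 2^16 code points
BITS = 20                 # whole bits carried by each character
MASK = (1 << BITS) - 1    # largest 20-bit value, 0xFFFFF


def int_to_char(value: int) -> str:
    """Map a 20-bit integer to one character in U+10000..U+10FFFF."""
    if not 0 <= value <= MASK:
        raise ValueError("value must fit in 20 bits")
    return chr(value + OFFSET)


def char_to_int(ch: str) -> int:
    """Inverse of int_to_char: subtract the offset again."""
    return ord(ch) - OFFSET


def key_to_utf(key: str) -> str:
    """Pack a 128-bit key into 7 characters of 20 bits each (hypothetical helper)."""
    n = uuid.UUID(key).int
    # Shifts 120, 100, ..., 0: seven groups, the first one holds only 8 bits.
    return "".join(int_to_char((n >> shift) & MASK)
                   for shift in range(120, -1, -BITS))


def utf_to_key(text: str) -> str:
    """Rebuild the key string from the 7 characters (hypothetical helper)."""
    n = 0
    for ch in text:
        n = (n << BITS) | char_to_int(ch)
    return str(uuid.UUID(int=n))


packed = key_to_utf("a822ff2b-ff02-461d-b45d-dcd10a2de0c2")
restored = utf_to_key(packed)
```

At 20 bits per character a 128-bit key needs ceil(128 / 20) = 7 characters, compared with the 9 characters needed at 15 bits per character. Each of those 7 characters takes 4 bytes once the string is encoded as UTF-8.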
- October 3, 2014
- 24 replies
