llGetDisplayName() Returns '?' for (I think) Non-Ascii Characters


GManB

OK. First of all, here's Furware's default font.

[Image: furwarefonts.png]

Furware's default font texture, which is open source. The GitHub code will show you the UUID of this texture. That's all you can get on most SL signs: the Basic Latin block of Unicode, plus an additional block of symbols.

Each letter is drawn by showing a tiny piece of that texture in its letter position. All of Unicode won't fit on one texture.
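To make the "tiny piece of that texture" idea concrete, here is a minimal Python sketch of an atlas lookup. The 10 x 10 grid, the `CHARS` ordering, and the function names are all assumptions for illustration; they are not FURWARE's actual layout or code.

```python
# Hypothetical atlas layout: a 10 x 10 grid of equal glyph cells on one texture,
# filled left to right, top to bottom, with the printable Basic Latin characters.
COLUMNS = 10
ROWS = 10
CHARS = "".join(chr(c) for c in range(0x20, 0x7F))  # 95 characters


def glyph_cell(ch):
    """Return the (column, row) of ch in the grid, or None if it has no glyph."""
    index = CHARS.find(ch)
    if index == -1:
        return None
    return (index % COLUMNS, index // COLUMNS)


def cell_rect(column, row):
    """Return (u, v, width, height) of a cell in 0..1 texture space, v from the top."""
    width = 1.0 / COLUMNS
    height = 1.0 / ROWS
    return (column * width, row * height, width, height)


if __name__ == "__main__":
    for ch in ["A", "~", "\u00e9"]:
        print(repr(ch), glyph_cell(ch))
```

A character outside `CHARS` has no cell at all, which is why a sign built this way has to substitute something for it.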

LSL/Mono uses UTF-16, which is what Microsoft and Java adopted back when Unicode was expected to fit in 16 bits. The original idea (UCS-2) was a fixed-length encoding: one character takes 2 bytes. Characters above U+FFFF, way up in the supplementary planes (informally the "astral planes"), are represented with a surrogate pair, that is, two 16-bit code units, but I don't know if SL supports that. A single 16-bit unit covers the Basic Multilingual Plane, which holds nearly everything in languages people actually use much. Up in the astral planes are archaic and obscure scripts (Cretan Linear B, Egyptian hieroglyphs, etc.), the rarer CJK ideographs, and emoji.

UTF-8 is a clever variable length system; one character can occupy 1 to 4 bytes. So all that astral plane stuff can be represented without making common characters longer.
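A quick way to see the size difference is to encode a few characters both ways. This is a small Python sketch, not LSL; the sample characters and the function name `encoded_sizes` are just illustrative choices.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    return (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))


if __name__ == "__main__":
    samples = {
        "A": "Basic Latin",
        "\u00e9": "Latin-1 Supplement (e with acute)",
        "\u4e2d": "CJK ideograph in the Basic Multilingual Plane",
        "\U0001F600": "emoji in a supplementary plane",
    }
    for ch, description in samples.items():
        print(f"U+{ord(ch):04X} {description}: {encoded_sizes(ch)}")
    # "A" is (1, 2), U+00E9 is (2, 2), U+4E2D is (3, 2), U+1F600 is (4, 4)
```

Only the last sample needs a surrogate pair in UTF-16, which is why it takes 4 bytes in both encodings.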

In general, the Web is now UTF-8 everywhere, Linux is UTF-8 everywhere, Go and Rust are UTF-8 everywhere, JavaScript still uses UTF-16 strings internally but talks UTF-8 to the outside world, the major databases all support UTF-8 but also support legacy encodings, Python talks UTF-8 but internally stores strings as Latin-1, UCS-2, or UCS-4 depending on content, and Java and Microsoft still carry UTF-16 internally but mostly speak UTF-8 at the edges now. After a decade of misery for everyone who understands this stuff, we're mostly done with it.

(Then there are emoji modifiers, the gimmick that allows changing the skin tone on emoji and other special effects. Those are extra code points layered on top of the base character. Another nightmare for anyone who has to put in line breaks. SL seems to have very little emoji support. On mobile, though, everybody has all that and people keep demanding new emoji, or complaining about old ones. You can put a "not" modifier, the "do not enter" sign, on anything, and this upsets some people.)


On 10/2/2020 at 9:12 AM, Mollymews said:

it may be that LSL string type doesn't handle extended 4 byte chars at all. Dunno exactly without testing   

i had a play with this. The LSL string type does handle 4 byte chars quite well. The only issue is when we don't have the font glyphs installed on our own computer, in which case they will display as an outline box char

under LSL (mono) llStringLength counts characters, not bytes. in my tests 4 byte characters count as 1, as do the 2 byte characters. In UTF-16 there are no 1 byte characters as there are in UTF-8
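The difference between counting characters and counting UTF-16 code units can be sketched in Python. This only illustrates the two ways of counting; it says nothing about what llStringLength itself does, which has to be checked in-world. The function names are made up for the example.

```python
def code_points(text):
    """Count Unicode code points (a 4 byte character counts as 1)."""
    return len(text)


def utf16_units(text):
    """Count 16-bit UTF-16 code units (a 4 byte character counts as 2)."""
    return len(text.encode("utf-16-le")) // 2


def has_supplementary(text):
    """True if any character lies above U+FFFF and so needs a surrogate pair."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    name = "G\U0001F600"  # one ASCII letter followed by one emoji
    print(code_points(name), utf16_units(name), has_supplementary(name))
    # prints: 2 3 True
```

If a length function reports 2 for a string like this, it is counting code points; if it reports 3, it is counting code units.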


4 hours ago, DoteDote Edison said:

You could minimize testing by only doing so when the matching CHARS index is not found (equals -1).  The FURWARE script already performs that test but assigns index 68 (a "?") on NOT FOUND. Instead, use that opportunity to jump out of the loop and send a link message back to the referring script requesting  a safe name.

DoteDote,

I think the only way to accomplish what you suggest would be to modify the FURWARE code. I'd prefer not to have to maintain a separate version of that.

G
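One way to get a similar effect without touching the FURWARE code would be to run the "not found" test in the calling script before the name is ever sent to the sign. A minimal Python sketch of that check follows; the `SUPPORTED` set, the function names, and the fallback name are assumptions for illustration, not FURWARE's actual character table or API.

```python
# Assumed glyph set: printable Basic Latin only. The real sign font may cover more.
SUPPORTED = frozenset(chr(c) for c in range(0x20, 0x7F))


def first_unsupported(text, supported=SUPPORTED):
    """Return the index of the first character with no glyph, or -1 if all are found."""
    for index, ch in enumerate(text):
        if ch not in supported:
            return index
    return -1


def choose_label(display_name, fallback_name):
    """Use the display name only if every character can be drawn."""
    if first_unsupported(display_name) == -1:
        return display_name
    return fallback_name


if __name__ == "__main__":
    print(choose_label("GManB", "fallback.name"))         # GManB
    print(choose_label("GMan\u2605B", "fallback.name"))   # fallback.name
```

The trade-off is that every name gets scanned once up front, instead of only on a failed lookup inside the text script.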


5 hours ago, Hexadeci Mole said:

Getting there.... took forever to figure out the FURWARE face offsets. Gotta throw on the MIT license n stuff.

Here is an example video of HexText rendering Unicode:

https://gyazo.com/f34f7951a2f79fd0363a3539e7fdbbf2

(Gyazo capture is blurry compared to in-world.)

How does that work? There must be some huge texture with the font.


3 hours ago, animats said:

How does that work? There must be some huge texture with the font.

114 textures, 512 x 256 each covering about 28,000 characters.

Sounds like a huge GPU memory hog, but it takes 8 of these to equal a single 1024 x 1024 texture. Unless it is displaying a full range of Kanji, it will rarely load more than about 16 of them. People have been pretty predictable about which Unicode characters they use for enhancing display names. Displaying pure ASCII, it uses far less GPU memory.

Edited by Hexadeci Mole

47 minutes ago, Hexadeci Mole said:

Sounds like a huge GPU memory hog, but it takes 8 of these to equal a single 1024 * 1024 texture. Unless it is displaying a full range of Kanji, it will rarely go over about 16 or so.  People have been pretty predictable  about which unicode characters they use for enhancing display names. Displaying pure ASCII, it uses far less GPU memory.

I did think about that too, but a single 1024 texture on an avatar with materials (3 x 1024) is equal to 24 character page textures. So an average avatar with 10 textures (that's pretty few) equals 240 texture pages. So I came to the conclusion that the texture memory of such a display is nothing to worry about. It's practically nothing compared to the avatars that you need to trigger the usage of those texture pages.
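The numbers in this exchange can be checked with a few lines of Python. This compares raw pixel counts only and ignores mipmaps and compression, so it is a rough sketch of relative cost rather than a measurement of real GPU memory; the variable and function names are made up for the example.

```python
PAGE_WIDTH, PAGE_HEIGHT = 512, 256      # one character page texture
FULL_SIZE = 1024                        # one side of a 1024 x 1024 texture

page_pixels = PAGE_WIDTH * PAGE_HEIGHT              # 131072
pages_per_full = (FULL_SIZE * FULL_SIZE) // page_pixels  # 8


def full_texture_equivalent(pages):
    """Convert a number of character pages into 1024 x 1024 texture equivalents."""
    return pages / pages_per_full


if __name__ == "__main__":
    print(pages_per_full)                   # 8 pages per 1024 x 1024 texture
    print(full_texture_equivalent(16))      # 2.0, the typical upper bound mentioned
    print(full_texture_equivalent(114))     # 14.25, every page loaded at once
    print(3 * pages_per_full)               # 24 pages per 1024 texture with materials
    print(10 * 3 * pages_per_full)          # 240 pages for an avatar with 10 such textures
    print(round(28000 / 114, 1))            # 245.6 characters per page on average
```

Even with all 114 pages resident, the display costs about as much as 14 full-size textures, which is fewer than the 30 that the 10-texture avatar in the example carries.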
