llGetDisplayName() Returns '?' for (I think) Non-ASCII Characters


GManB


When I call llGetDisplayName() with a valid key for an av in the region, I get back, for this one particular av, 'V?????? T??'. He says there are non-ASCII characters in his display name. Reading the doc for llGetDisplayName(), I see it can return '?' if it cannot return the name. But given that I get back the correct first character of both parts of his display name, I wonder what is going on. I am outputting the name to prim faces using FURWARE. He says he has not seen his name displayed with the '?'s before.

Do I need to think about UTF-8? Or use a dataserver request via llRequestDisplayName()?

 

Thanks as always,

G


17 minutes ago, GManB said:

When I call llGetDisplayName() with a valid key for an av in the region, I get back, for this one particular av, 'V?????? T??'. He says there are non-ASCII characters in his display name

if he can see his name correctly on the scripted device display on his screen and you do not, then it is most likely that you don't have the font installed on your computer


8 minutes ago, Mollymews said:

if he can see his name correctly on the scripted device display on his screen and you do not, then it is most likely that you don't have the font installed on your computer

I can see his display name correctly in his profile when I bring it up and the characters don't look too odd.

 

1 minute ago, animats said:

I think SL's internal Unicode representation is UTF-16, in which all the major languages should work, but maybe not emoji.

Seems like I should be looking for a code bug then. (I have had a few of those on this project lol).

 

G


19 minutes ago, GManB said:

I can see his display name correctly in his profile when I bring it up and the characters don't look too odd.

i haven't done any testing myself, but if you can see it in his profile then I think it has to do with how the LSL string type handles UTF-16 characters above U+00010000 (which are 4 bytes) under Mono. Emojis and some human language symbols, as animats mentions

it may be that the LSL string type doesn't handle extended 4-byte chars at all. Dunno exactly without testing
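
A quick test along those lines might look like this sketch (the 😀 literal, U+1F600, is an assumed example, and results under Mono vs. LSO may differ):

default
{
    state_entry ()
    {
        // U+1F600 lies above U+FFFF, so it needs a UTF-16 surrogate pair
        // internally and four bytes in UTF-8
        string s = "😀";
        llOwnerSay ("characters: " + (string) llStringLength (s));
        llOwnerSay ("base64 of the encoding: " + llStringToBase64 (s));
    }
}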


hhmmm looks like I need to do a quick test or two.

G

the av in question isn't online atm so llGetDisplayName() silently does nothing, as documented. So I tried llRequestDisplayName() and displayed the results from the dataserver event in chat and in the FURWARE box. Looked correct in chat but had the '?'s in the FURWARE box. So, seems the issue is with FURWARE...

I then tried llRequestUsername() and, as one would expect, that looked fine in both places.

I guess I will write a function that takes the av key and tests the return from llGetDisplayName() to see if any character is non-ASCII; if none is, it returns that result, and otherwise it returns the result of llGetUsername(). Although this seems less than desirable.

Anyone have any ideas about how to test? Or maybe just use username always (ugh). Or any ideas/suggestions at all.

 

Thanks,
G

 


Like Wulfie says, the problem isn't LSL-related, it's just that the Furware Text font texture only includes certain characters, not a full set of every possible fancy character.  You can see which characters are supported by viewing the entire font texture, or by outputting the CHARS string to chat like below.  If the character doesn't exist in the set, you get a question mark.  You can create your own font texture and replace the lesser-used characters with characters more commonly used in names, then update the CHARS string with the same new characters.  Typically, the script is given a string to display, looks up each character in the CHARS variable to get its index, and calculates the texture scale/offset to frame that single character.

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,:;!?"'´`^~+-*/\|()[]{}<>=@$%&#_
àáâãäåæªçðèéêëìíîïñòóôõöøœºÞšßùúûüýÿžÀÁÂÃÄÅÆÇÐÈÉÊËÌÍÎÏÑÒÓÔÕÖØŒþŠÙÚÛÜÝŸŽ¢£€¥§µ¡¿©®±×÷·°¹²³«»¬…‹›–
¼½¾™•┌┬┐┏┳┓╔╦╗─╴╷╻ ┯▲◀▼▶○◔◑◕●├┼┤┣╋┫╠╬╣╺━╸│┃║┿↑←↓→↺↻☐☑☒└┴┘┗┻┛╚╩╝ ═ ╵╹ ┷┠╂┨↕↔♀♂⚠ℹ⌖∡♪♫
♠♣♥♦⚀⚁⚂⚃⚄⚅✔✘☺☹■▬▮█
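
For illustration, a minimal sketch of that lookup-and-frame step; the grid size, the TEXTURE_BLANK placeholder, and the truncated CHARS string are assumptions, not FURWARE's actual code:

integer GRID_X = 16;    // assumed columns in the font texture
integer GRID_Y = 24;    // assumed rows in the font texture
string CHARS = "abcdefghijklmnopqrstuvwxyz?";    // truncated; use the full set above

ShowChar (integer face, string ch)
{
    integer idx = llSubStringIndex (CHARS, ch);
    if (idx == -1) idx = llSubStringIndex (CHARS, "?");    // missing glyph -> "?"
    integer col = idx % GRID_X;
    integer row = idx / GRID_X;
    // scale the texture down to a single cell, then offset to that cell
    llSetLinkPrimitiveParamsFast (LINK_THIS, [
        PRIM_TEXTURE, face, TEXTURE_BLANK,    // the real font texture goes here
        <1.0 / GRID_X, 1.0 / GRID_Y, 0.0>,
        <(col + 0.5) / GRID_X - 0.5, 0.5 - (row + 0.5) / GRID_Y, 0.0>,
        0.0]);
}

default
{
    state_entry ()
    {
        ShowChar (0, "A");    // frame the glyph for "A" on face 0
    }
}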

 


1 hour ago, DoteDote Edison said:

  If the character doesn't exist in the set, you get a question mark.  You can create your own font texture and replace the lesser-used characters with characters more commonly used in names, then update the CHARS string with the same new characters.

DoteDote, I might be missing something. Let's say the string that is a display name and returned from llGetDisplayName(), the one that we see in most places, looks like 'Vincent Ten' (actually the 'incent' and 'en' are small upper-case characters). When I output that to FURWARE I get 'V?????? T??'. Now, in the string those characters have numerical values of (probably) two bytes each. Let's say the first '?' after the 'V' has the value 0xBEEF. If I understand what FURWARE is doing, it is mapping that value to some index in CHARS to get the character. Now, no matter what we do, the binary values returned by llGetDisplayName() will include 0xBEEF as the second character. So the FURWARE code, I would imagine, checks the value of the characters and, if it is larger than the largest index in CHARS, returns a '?' to be output to the prim face.

So, to make this work wouldn't I have to both change CHARS AND manipulate the string I give FURWARE to be within the largest index of CHARS? Or, change the code of FURWARE to accommodate a larger CHARS array so that 0xBEEF is less than the max.

IDK, seems like a lot to do to get the display name on a set of prim faces and fraught with potential potholes. Or maybe I am missing something.

Maybe the answer is to just convert the returned string to ASCII. There are some conversion routines I found on the web. Any such conversion, though, would have to be subjective, e.g., replace any character that *looks* like an 'e' with 'e'.
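
For what it's worth, a sketch of that subjective mapping using paired lookup strings (the particular character pairs below are just examples to extend):

// parallel strings: a character at index i in FANCY maps to PLAIN at i
string FANCY = "ĶķŦŧĪīŇňĢģŞşŁłĘęŶŷ";
string PLAIN = "KkTtIiNnGgSsLlEeYy";

string ToAsciiLookalikes (string text)
{
    string result = "";
    integer length = llStringLength (text);
    integer i;
    for (i = 0; i < length; ++i)
    {
        string ch = llGetSubString (text, i, i);
        integer idx = llSubStringIndex (FANCY, ch);
        if (idx != -1) ch = llGetSubString (PLAIN, idx, idx);    // swap in the lookalike
        result += ch;
    }
    return result;
}

default
{
    state_entry ()
    {
        llOwnerSay (ToAsciiLookalikes ("ĶŦ Ķīňģşłęŷ"));    // -> "KT Kingsley"
    }
}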

G

 


This whole discussion highlights the reason why we routinely advise people who complain in the Answers forum NOT to use oddball fonts and high-bit codes in their Display Names.  Those non-standard characters are almost always undecipherable -- even hard for a human to read, in many cases.  Personally, I'm not eager to find some way to encourage more people to use them, but that's just me.


1 hour ago, GManB said:

Maybe the answer is to just convert the returned string to ascii. 

A big reason display names exist is to support non-western text, and ASCII can't help much with that.

Is it necessary that the text be shown on a surface texture? Media is one alternative that can display a much wider variety of text with a little embedded JavaScript, subject to the fonts available on the viewer's machine.
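
As a rough sketch of that alternative (whether a given viewer renders a data: URL in shared media is an assumption to verify, and the inline styling is arbitrary):

// show a name on one face using media-on-a-prim instead of a font texture
ShowNameOnFace (integer face, string name)
{
    string html = "<html><body style=\"font-size:48px\">" + name + "</body></html>";
    llSetPrimMediaParams (face, [
        PRIM_MEDIA_CURRENT_URL, "data:text/html," + llEscapeURL (html),
        PRIM_MEDIA_AUTO_SCALE, TRUE,
        PRIM_MEDIA_PERMS_INTERACT, PRIM_MEDIA_PERM_NONE]);
}

default
{
    touch_start (integer num_detected)
    {
        // display the toucher's display name, non-western characters included
        ShowNameOnFace (0, llGetDisplayName (llDetectedKey (0)));
    }
}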

 


1 hour ago, Rolig Loon said:

This whole discussion highlights the reason why we routinely advise people who complain in the Answers forum NOT to use oddball fonts and high-bit codes in their Display Names.  Those non-standard characters are almost always undecipherable -- even hard for a human to read, in many cases.  Personally, I'm not eager to find some way to encourage more people to use them, but that's just me.

I agree. The option to just test the return <<somehow>> and, if it's not displayable, use the username seems, at this point, a decent choice. But I will look into Media as Que suggests.

G


2 hours ago, GManB said:

<<somehow>>

I took what I suggested in my earlier posts and did this:

// Counts the bytes in the base64-encoded form of the string: strip the
// "=" padding, then every four base64 characters stand for three bytes
integer GetStringBytes (string text)
{
    return (llStringLength ((string) llParseString2List (llStringToBase64 (text), ["="], [])) * 3) >> 2;
}

default
{
    state_entry ()
    {
        // listen on open chat for strings to analyse
        llListen (PUBLIC_CHANNEL, "", "", "");
    }

    listen (integer channel, string name, key id, string message)
    {
        // compare the character count with the encoded byte count
        integer length = llStringLength (message);
        integer bytes = GetStringBytes (message);
        llOwnerSay ("\"" + message + "\" contains " + (string) length + " characters and uses " + (string) bytes + " bytes.");
        // then dump each character with its base64 and integer forms
        integer index;
        while (index < length)
        {
            string character = llGetSubString (message, index, index);
            string character_base64 = llStringToBase64 (character);
            integer character_base64_integer = llBase64ToInteger (character_base64);
            llOwnerSay (character + ": " + character_base64 + ", " + (string) character_base64_integer);
            ++index;
        }
    }
}

//KT Kingsley
//ĶŦ Ķīňģşłęŷ

It produces this output:

KT Kingsley: KT Kingsley
Object: "KT Kingsley" contains 11 characters and uses 11 bytes.
Object: K: Sw==, 1258291200
Object: T: VA==, 1409286144
Object:  : IA==, 536870912
Object: K: Sw==, 1258291200
Object: i: aQ==, 1761607680
Object: n: bg==, 1845493760
Object: g: Zw==, 1728053248
Object: s: cw==, 1929379840
Object: l: bA==, 1811939328
Object: e: ZQ==, 1694498816
Object: y: eQ==, 2030043136
KT Kingsley: ĶŦ Ķīňģşłęŷ
Object: "ĶŦ Ķīňģşłęŷ" contains 11 characters and uses 21 bytes.
Object: Ķ: xLY=, -994705408
Object: Ŧ: xaY=, -978976768
Object:  : IA==, 536870912
Object: Ķ: xLY=, -994705408
Object: ī: xKs=, -995426304
Object: ň: xYg=, -980942848
Object: ģ: xKM=, -995950592
Object: ş: xZ8=, -979435520
Object: ł: xYI=, -981336064
Object: ę: xJk=, -996605952
Object: ŷ: xbc=, -977862656

It looks like my idea of comparing the character length of a string with its byte length will flag it as problematical for FURWARE when the byte length is greater.

The testing of individual characters didn't throw up the values I'd expected (I'd expected to see ASCII numerical values for ASCII characters), but it did suggest the intriguing possibility that ASCII characters evaluate as positive, while the extended characters evaluate as negative. Like I said, this isn't an area I'm familiar with.

I'm not going to test this any further (at least until I actually need it myself), but this does suggest a way of testing strings to see if they'll display properly in FURWARE.
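
Wrapped up as a helper (a minimal sketch, assuming GetStringBytes() from the script above and that llStringToBase64 works on the UTF-8 form, in which ASCII characters are exactly one byte each):

// TRUE when every character in the string is plain ASCII
integer IsPlainAscii (string text)
{
    return GetStringBytes (text) == llStringLength (text);
}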


16 minutes ago, KT Kingsley said:

but it did suggest the intriguing possibility that ASCII characters evaluate as positive, while the extended characters evaluate as negative. Like I said, this isn't an area I'm familiar with.

This is, most likely, just a coincidence with integer overflow. And/or primarily: "If str contains fewer than 6 characters the return is unpredictable."


Just now, Wulfie Reanimator said:

This is, most likely, just a coincidence with integer overflow. And/or primarily: "If str contains fewer than 6 characters the return is unpredictable."

Ah, I missed that. Maybe using some padding might help, somehow, if someone wanted to pursue this.

I bet there's already plenty of LSL wishlists that include a function that returns a numerical value for a string character.
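
Combining the padding idea with the sign pattern in the output above gives this sketch (an assumption-laden one: padding with "A" supplies all-zero bits, sidestepping the short-input warning, and the test then reduces to the sign bit, which mirrors the high bit of the character's first encoded byte):

// TRUE when the character's first encoded byte has its high bit set,
// i.e. the character is outside the ASCII range
integer IsExtendedChar (string ch)
{
    // strip the "=" padding, then pad with "A" so the input is long enough
    string b64 = (string) llParseString2List (llStringToBase64 (ch), ["="], []);
    return llBase64ToInteger (llGetSubString (b64 + "AAAAAA", 0, 7)) < 0;
}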


Just now, KT Kingsley said:

Ah, I missed that. Maybe using some padding might help, somehow, if someone wanted to pursue this.

I bet there's already plenty of LSL wishlists that include a function that returns a numerical value for a string character.

Imagine being able to access the individual characters of a string through an array index without having to explicitly create a new string with a separate function call.


I'm working on getting HexText on the wiki. Might be a day or two. HexText can display Unicode characters on 8-face prims. It handles Japanese/Chinese and many other languages, as well as many symbols. I'm not a fan of Unicode characters in display names, but it can deal with that. It was built for in-world displays in a variety of languages.

https://wiki.secondlife.com/wiki/HexText

Of particular note: Strife long ago contributed a function to obtain the numerical value of a string containing Unicode characters (up to 0xFFFF).

http://wiki.secondlife.com/wiki/UTF-8


2 hours ago, Wulfie Reanimator said:

Imagine being able to access the individual characters of a string through an array index without having to explicitly create a new string with a separate function call.

Oh, what a wish!

KT, Thanks for the test code. Looks perfect.

Hexadeci, Looks like you are creating a really useful tool. For this project I am pretty far down the road with FURWARE and use non-8-face prims and touch queries extensively so I will stay with it and keep your work in mind for the next project!

 

Thanks everyone!!

G


10 hours ago, Mollymews said:

a way to keep things predictably tidy could be to use the displayname when all the chars in the displayname have a corresponding glyph image. When one or more of the displayname chars do not, then use the username

Yes, I believe this is a good solution.

G
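
As a sketch of that rule, assuming a CHARS global copied from the font's character set (as in DoteDote's post above) and treating the space as displayable:

string CHARS = "abcdefghijklmnopqrstuvwxyz?";    // truncated; use the full set

// use the display name when every character has a glyph, else the username
string SafeName (key id)
{
    string name = llGetDisplayName (id);
    integer length = llStringLength (name);
    integer i;
    for (i = 0; i < length; ++i)
    {
        string ch = llGetSubString (name, i, i);
        if (ch != " " && llSubStringIndex (CHARS, ch) == -1)
            return llGetUsername (id);
    }
    return name;
}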

 


20 hours ago, KT Kingsley said:

It looks like my idea of comparing the character length of a string with its byte length will flag it as problematical for FURWARE when the byte length is greater.

 

I am more confused. 

From the LSL String page: http://wiki.secondlife.com/wiki/String

'But note that in Mono, strings use UTF-16 and occupy two memory bytes per character.'

------------------------------------------------

From the UTF-16 Wikipedia page: https://en.wikipedia.org/wiki/UTF-16

The encoding is variable-length, as code points are encoded with one or two 16-bit code units.

Number of bytes | First code point | Last code point | 16-bit code 1    | 16-bit code 2
2               | U+00000000       | U+0000D7FF      | xxxxxxxxxxxxxxxx |
2               | U+0000E000       | U+0000FFFF      | xxxxxxxxxxxxxxxx |
4               | U+00010000       | U+0010FFFF      | 110110xxxxxxxxxx | 110111xxxxxxxxxx

--------------------------------------------------------------------------

So, if LSL/Mono follows the UTF-16 specification, a character needs either two or four bytes in the raw binary values of the string. Thus,

for any string S,

GetStringBytes(S) should be at least 2 * llStringLength(S) (that is, if GetStringBytes is attempting to return the number of bytes in the binary memory representation of S).

That llStringLength("KT Kingsley") returns 11 makes perfect sense, since the doc says it returns an integer whose value is the number of characters in the string. No dispute or confusion with this.

That GetStringBytes("KT Kingsley") also returns 11 seems not to square with the documentation for String in LSL and UTF-16.

It could be, however, that LSL is either optimizing memory usage and violating the UTF-16 spec OR, more likely, I think, is really using UTF-8 (which allows for 1, 2, 3, or 4 bytes to represent a single character). Or, of course, llStringToBase64 may simply convert the string to UTF-8 before encoding it, whatever Mono stores internally.

In KT's code I changed the while loop's body to:

            string character = llGetSubString (message, index, index);
            string character_base64 = llStringToBase64 (character);
            integer ordValue = Ord (character);    // Ord() from the wiki's UTF-8 page linked above
            integer character_base64_integer = llBase64ToInteger (character_base64);
            llOwnerSay (character + ": " + character_base64 + ", " + (string) character_base64_integer + " : OrdOutput : " + (string) ordValue);
            ++index;

 

I added the call to Ord() for each character and output its value. The output is what one would expect, e.g., Ord("A") returns 65.

 

I think the test I will do is to loop through the characters of the display name (as shown by KT) and check that the output of Ord() is between 32 and 126 (the printable ASCII characters). If any character fails the test, then use the username.
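
As a sketch, where Ord() stands for the codepoint function from the wiki's UTF-8 page that Hexadeci linked (not reproduced here), the per-character test is just:

// TRUE for the printable ASCII range; plug this into the loop above and
// fall back to llGetUsername() the first time it returns FALSE
integer IsPrintableAscii (string ch)
{
    integer code = Ord (ch);
    return (code >= 32) && (code <= 126);
}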

 

I don't mean to belabor this point, really, I am just trying to learn as much about the underlying LSL execution engine, memory model, etc. as I can. So this thread has been a great learning experience.

 

Comments welcome!

 

G

 

BTW, in writing this I seem to have, at least slightly, un-confused myself :)

 


51 minutes ago, GManB said:

I think the test I will do is to loop through the characters of the display name (as shown by KT) and check that the output of Ord() is between 32 and 126 (the printable ASCII characters). If any character fails the test, then use the username.

You could minimize testing by only doing so when the matching CHARS index is not found (equals -1).  The FURWARE script already performs that test but assigns index 68 (a "?") on NOT FOUND. Instead, use that opportunity to jump out of the loop and send a link message back to the referring script requesting a safe name.
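
Sketched as a hypothetical patch (the link-message number and string are invented, and the real FURWARE lookup code differs):

// inside the display script: resolve a character to its CHARS index,
// signalling the driving script instead of silently substituting "?"
integer GlyphIndex (string ch)
{
    integer idx = llSubStringIndex (CHARS, ch);
    if (idx == -1)
    {
        // hypothetical protocol: ask the referring script for a safe name
        llMessageLinked (LINK_SET, -404040, "name_unsafe", NULL_KEY);
    }
    return idx;
}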
