Jump to content

Easy Scripting question - Handle Key Uniqueness


Love Zhaoying
 Share

You are about to reply to a thread that has been inactive for 667 days.

Please take a moment to consider if this thread is worth bumping.

Recommended Posts

I seem to remember that for many, possibly most purposes, storing just the last 4 characters of a "key" for comparison is "close enough".  

Can you please confirm this is correct?

My use-case is: I am getting handle(s) for dataserver() calls - in this case for Experience key-value data - and don't want to keep the entire key if necessary.

Part of the reason is: In addition to "storing" the handle-key temporarily in memory, I plan to use the JSON functions to look it up later.

Example:  

- hKey is the handle I want to use.

- OtherData is other data that I want to store with the key.

Storing OtherData with the handle-key:

      string sJSONVars = llSetJsonValue(sJONVars, ["Handles", (string)hKey], OtherData]);

Retrieving OtherData with the handle-key:

      OtherData = llGetJasonValue(sJONVars, ["Handles", (string)hKey]); // ==> Retrieves OtherData

So the question becomes: Is it "close enough" to just use the last 4 digits of the key in this case?

The dataserver() event would also just check for an entry with the last 4 digits of the key in the "Handles" JSON variable.

Other benefits include:

- Can store an Object as "OtherData" not just a value

- Can delete the associated JSON variable from "Handles" just knowing the last 4 digits of the key.

Thanks,

Love

 

Link to comment
Share on other sites

IMO, the answer is: Maybe, but probably not.

It's worth noting that functions like llGenerateKey (likely) doesn't generate unique keys, so if you're using a generated key, then it's 100% possible that the last 4 digits could match, while being a different key.

I'd probably look into using a key compression algorithm to leverage more UTF characters to compress a full key, which can be decompressed later if needed.

(This thread may be of use, although you may want to replace Ord with llOrd

)

  • Thanks 1
Link to comment
Share on other sites

IMO it would be a bit more 'Sane' to do something like

key k = ...; // get a key from somewhere.
string test = llGetSubString(llIntegerToBase64(llHash(k)),0,5); // 6 ascii characters that stand for our key.

which has 32 bits of entropy.

just taking the last 4 digits of a key only has 16 bits of entropy.

The key compression algorithm by Molly in the LSL library is more "exact" but can be a bit much depending.

  • Thanks 2
Link to comment
Share on other sites

1 minute ago, Jenna Huntsman said:

IMO, the answer is: Maybe, but probably not.

It's worth noting that functions like llGenerateKey (likely) doesn't generate unique keys, so if you're using a generated key, then it's 100% possible that the last 4 digits could match, while being a different key.

I'd probably look into using a key compression algorithm to leverage more UTF characters to compress a full key, which can be decompressed later if needed.

(This thread may be of use, although you may want to replace Ord with llOrd

)

I am only storing these for "a little while" - a key compression algorithm is a good solution but, overkill  for my use-case. 

If keys aren't "quite unique enough", I can certainly store the entire key since this specific scenario is short-term.

 

Link to comment
Share on other sites

2 minutes ago, Quistess Alpha said:

IMO it would be a bit more 'Sane' to do something like

key k = ...; // get a key from somewhere.
string test = llGetSubString(llIntegerToBase64(llHash(k)),0,5); // 6 ascii characters that stand for our key.

which has 32 bits of entropy.

just taking the last 4 digits of a key only has 16 bits of entropy.

The key compression algorithm by Molly in the LSL library is more "exact" but can be a bit much depending.

Thanks!

Link to comment
Share on other sites

Without getting into the details about how the keys are generated (I don't even remember), if you used the last 4 characters you were cutting the space of keys into 16 bits: only 2^16 = 65536 different ones, instead of 2^128. 65536 may sound big, but as per the Birthday problem and the square approximation formula there, it would take only ~36 entries to start getting into entire percents of collision chance. I wouldn't want a 1% collision chance.

  • Like 2
  • Thanks 1
Link to comment
Share on other sites

4 minutes ago, Frionil Fang said:

Without getting into the details about how the keys are generated (I don't even remember), if you used the last 4 characters you were cutting the space of keys into 16 bits: only 2^16 = 65536 different ones, instead of 2^128. 65536 may sound big, but as per the Birthday problem and the square approximation formula there, it would take only ~36 entries to start getting into entire percents of collision chance. I wouldn't want a 1% collision chance.

Thanks! While I'm only storing maybe 3-4 at a time for this case (and only 1 initially), you never know.  

 

Link to comment
Share on other sites

The closest to what Love is contemplating I am aware of is a library function key2chan, which takes an avatar key and generates a -ve channel number from 8 bits of the key, but I can't recall if it's the top 8 or bottom 8.

 

eta 

it uses 0x80000000 | (integer) ("0x" + (string) key)

 

Edited by Profaitchikenz Haiku
  • Thanks 1
Link to comment
Share on other sites

2 minutes ago, Profaitchikenz Haiku said:

The closest to what Love is contemplating I am aware of is a library function key2chan, which takes an avatar key and generates a -ve channel number from 8 bits of the key, but I can't recall if it's the top 8 or bottom 8.

You beat me, while I was posting something similar!

Link to comment
Share on other sites

Math time! Skimming Wikipedia the relevant equation here is:

{\displaystyle n(p;d)\approx {\sqrt {2d\cdot \ln \left({\frac {1}{1-p}}\right)}}.}

That is, if as Frionil Fang suggests, you want a less than 1% chance of collision, and you're using the 'last 4 digits' method, then you need to keep your list of keys below

sqrt(2*(2^16)*l(1/(1-0.01))) = 36 entries;  for less than a 0.1% chance, 11 entries.

Compare that to a 32 bit entropy which would allow for ~2931 entries with less than a 0.1% chance of collision.

 

  • Thanks 2
Link to comment
Share on other sites

12 minutes ago, Love Zhaoying said:

If keys aren't "quite unique enough", I can certainly store the entire key since this specific scenario is short-term.

Given that any other method is going to require processing AND as has been pointed out, still runs a risk of getting it wrong, I'd personally go for the whole key.

 

That said, I've used that key2chan method and never (yet) had it talk to the wrong person. 

  • Thanks 1
Link to comment
Share on other sites

5 minutes ago, Quistess Alpha said:

Math time! Skimming Wikipedia the relevant equation here is:

{\displaystyle n(p;d)\approx {\sqrt {2d\cdot \ln \left({\frac {1}{1-p}}\right)}}.}

That is, if as Frionil Fang suggests, you want a less than 1% chance of collision, and you're using the 'last 4 digits' method, then you need to keep your list of keys below

sqrt(2*(2^16)*l(1/(1-0.01))) = 36 entries;  for less than a 0.1% chance, 11 entries.

Compare that to a 32 bit entropy which would allow for ~2931 entries with less than a 0.1% chance of collision.

 

Oh dear, and I'm over here with Linden Lab, sitting on the "bad at maths" bench! 

* pats bench invitingly * We have cookies!

  • Sad 1
Link to comment
Share on other sites

2 minutes ago, Love Zhaoying said:

"bad at maths" bench!

Breaking it down in simple terms (hopefully) there are 3 different variables that are all related to eachother:

  • D : How many different compressed keys exist? In the "last 4 characters" case, there are 4 letters, and because of how keys work, each character can be one of 16 different letters. 16*16*16*16 = 16^4 = 65536. the more possibilities, the less likely that 2 of them will be the same.
  • N : How many compressed keys are you keeping in a list? The more keys you store at once, the more probable that two of them will be the same.
  • P : The probability that 2 or more of the things in the list are the same. You want this to be low. How low? if you dump your list and fill it again 1/p times, you could expect about 1 collision.

If you only have 5~10 things in your list, taking the last 4 characters isn't as awful as I would have expected, but you're going to get collisions every 5000~1000 keys. depends on how important avoiding collisions is for you. the llHash method isn't too bad computationally and has a ~Much lower expectation for collisions.

  • Thanks 1
Link to comment
Share on other sites

I got interested into the way key2chan works, especially since the wiki says this Keys cannot be converted to integers,

So what seems to happen is

(integer) ("0x" + (string) key) generates a 32-bit integer by grabbing as much of the key as will not overflow 32 bits before then turning on the top (-ve sign) bit just in case it wasn't already on.

Which then fits into

28 minutes ago, Quistess Alpha said:

Compare that to a 32 bit entropy which would allow for ~2931 entries with less than a 0.1% chance of collision.

 

Suggesting that one day (out of how many?) one of my scripts is going to say something to the wrong person

 

  • Thanks 1
Link to comment
Share on other sites

13 minutes ago, Quistess Alpha said:

the llHash method isn't too bad computationally and has a ~Much lower expectation for collisions.

..Referencing your previous suggestion, correct? 

53 minutes ago, Quistess Alpha said:
key k = ...; // get a key from somewhere.
string test = llGetSubString(llIntegerToBase64(llHash(k)),0,5); // 6 ascii characters that stand for our key.

 

  • Thanks 1
Link to comment
Share on other sites

16 minutes ago, Profaitchikenz Haiku said:

Suggesting that one day (out of how many?) one of my scripts is going to say something to the wrong person

Assuming there are always 64 people in any region (unrealistic, but just go with it.) (N=64) and given (integer)("0x"+(string)key) has 32 bits entropy, (D=2^32) we can expect your script to get it wrong once every: 1/p(N,D) ~= 1/(1-(((2^32)-1)/2^32)^((64*63)/2)) ~= 2,130,441 utterances.

 

Edited by Quistess Alpha
  • Thanks 1
Link to comment
Share on other sites

9 hours ago, Love Zhaoying said:

In addition to "storing" the handle-key temporarily in memory, I plan to use the JSON functions to look it up later.

For this reason, storing partial keys (especially only 4 chars) doesn't seem like a good idea. How much later? How many keys do you have expect to search through?

For small domains, like avatars in a single sim, I will sometimes pass avatar keys as the integer for on_rez events, for example. For this you can use the first part of a key, which is conveniently 8 hex chars, which is enough to define a full 32-bit integer (though it won't have wide distribution). You can literally just typecast the key as integer.

Edited by Wulfie Reanimator
  • Thanks 1
Link to comment
Share on other sites

1 hour ago, Wulfie Reanimator said:
10 hours ago, Love Zhaoying said:

In addition to "storing" the handle-key temporarily in memory, I plan to use the JSON functions to look it up later.

For this reason, storing partial keys (especially only 4 chars) doesn't seem like a good idea. How much later? How many keys do you have expect to search through?

This is for short-lifetime transactions. Later in this case means, just during the lifetime of the "handle=llFunction()" call, then the dataserver() response handler. Not very long at all. For this specific case, I'd probably expect no more than 4 or 5.

For some other later case, I'd not depend on the keys to still be in my JSON variable memory, unless I store the data for later (for example in an Experience key-value store). 

Link to comment
Share on other sites

48 minutes ago, Love Zhaoying said:

This is for short-lifetime transactions. Later in this case means, just during the lifetime of the "handle=llFunction()" call, then the dataserver() response handler. Not very long at all. For this specific case, I'd probably expect no more than 4 or 5.

For some other later case, I'd not depend on the keys to still be in my JSON variable memory, unless I store the data for later (for example in an Experience key-value store). 

In that case it should be fine, though personally (and this is totally a gut feeling), I would probably use more than 4 characters, just in case, especially with how convenient it is to use the first 8 instead. Comma.

Edited by Wulfie Reanimator
  • Thanks 1
Link to comment
Share on other sites

when we want to keep the JSON handles short and unique then is best to maintain the handle ident method ourselves, encoding this as the 1st element of the data returned in the dataserver event. Example:
 

dataserver(key id, string data)
{
   // JSON handle is the first two chars of data in the range ["10".."99"] which rolls over. Range managed by data source
   string handle = llGetSubString(data, 0, 1);


   ... write json ... handle ... data  
}

i would leave the handle in the data, so that should we want to extract the data from JSON as a chunk, the handle can readily accompany the data. Parsing the handle from the data chunk as necessary

i would also set the unique ident counter to 10 rather than 0, increment to 99 then rollover to start again at 10. So to not have to left pad a leading "0" for the numbers less than 10

 

 

 

Edited by Mollymews
_
  • Thanks 1
Link to comment
Share on other sites

1 hour ago, Mollymews said:

when we want to keep the JSON handles short and unique then is best to maintain the handle ident method ourselves, encoding this as the 1st element of the data returned in the dataserver event. Example:
 

dataserver(key id, string data)
{
   // JSON handle is the first two chars of data in the range ["10".."99"] which rolls over. Range managed by data source
   string handle = llGetSubString(data, 0, 1);


   ... write json ... handle ... data  
}

i would leave the handle in the data, so that should we want to extract the data from JSON as a chunk, the handle can readily accompany the data. Parsing the handle from the data chunk as necessary

i would also set the unique ident counter to 10 rather than 0, increment to 99 then rollover to start again at 10. So to not have to left pad a leading "0" for the numbers less than 10

 

 

 

Hi Molly, 

This is certainly an interesting  solution, but I don't see a way to make it fit my "use-case".

Here are the specifics of my current "use-case" (others will be different):

1) I will be calling the following LSL Experience functions:

llReadKeyValue()

llUpdateKeyValue()

llCreateKeyValue()

llDeleteKeyValue()

Also, potentially but not relevant to the use-case: llDataSizeKeyValue(), llKeysKeyValue(), llKeyCountKeyValue().

2) All of the above functions return GUID's. (Unless I've gone mad and forgotten, again.)

When the dataserver() event fires, the "key id" parameter will contain a GUID.  This is the same GUID value that was returned to the original caller of the llFunctions() above. (The same llFunctions() that then ultimately, asynchronously result in the dataserver() event being fired.)

3) The dataserver() event returns the "string data" parameter in the format shown for example in https://wiki.secondlife.com/wiki/LlReadKeyValue:

Poorly explained by the page:

        The dataserver callback parameters are:

        A key containing the handle returned from llReadKeyValue

        A string containing a comma-delimited list (cdl). llDumpList2String([ integer success ] + components);

        components vary depending upon success or failure of request.

        Failure: cdl = llDumpList2String([ 0, integer error],",")

        Success: cdl = llDumpList2String([ 1, string value ],",")

Anyway, the "string data" parameter will be:

    Success/Error flag  (1 character), + "," (Comma) + Error Code (if failure) / Experience call result Data (if success).

So, for this "use-case" there is no way to "maintain a handle identification method" as you say, unless I want to maintain a separate GUID-to-other-handle relationship, then have the dataserver() event add the not-GUID handle to the data instead of using the original GUID. 

My current solution is that the dataserver() will call a custom "callback" function; I have to stick with this solution due to the overall architecture of my script.  

Thanks,

Love

Link to comment
Share on other sites

2 hours ago, Love Zhaoying said:

Anyway, the "string data" parameter will be:

    Success/Error flag  (1 character), + "," (Comma) + Error Code (if failure) / Experience call result Data (if success).

 

i see what you mean

you don't have a lot of choices

1) use the uuid as is (36 chars)

2) losslessly encode the uuid down to 9 chars

3) use a trim or hash of the uuid and gamble against a collision

 

if you can find a way to afford the 36 chars then go with 1)

i personally would not go with 3). I would not want to make gambling a component of this kind of application. A bet that will lose. A loss from which you will have to work out how to recover or settle for ignoring the loss as if it never happened

we can ignore losses in some situations. Like a visitor greeter that greets a visitor twice and ignores the other visitor. But am pretty sure you are not making a visitor greeter

 

 

Edited by Mollymews
typd
  • Thanks 1
Link to comment
Share on other sites

29 minutes ago, Mollymews said:

i see what you mean

you don't have a lot of choices

1) use the uuid as is (36 chars)

2) losslessly encode the uuid down to 9 chars

3) use a trim or hash of the uuid and gamble against a collision

 

if you can find a way to afford the 36 chars then go with 1)

i personally would not go with 3). I would not want to make gambling a component of this kind of application. A bet that will lose. A loss from which you will have to work out how to recover or settle for ignoring the loss as if it never happened

we can ignore losses in some situations. Like a visitor greeter that greets a visitor twice and ignores the other visitor. But am pretty sure you are not making a visitor greeter

 

 

I think that in this case, I will go with "1" as the lifetime of the needed information is short: The specific object using the script will only need to get Experience key/value data for 1 years, the attached owner.

If I were coding a "server" object, on the Experienced-enabled land, and it needed to remember the GUID keyed data for many users at once, then I would shorten the key GUID's using one of the recommended methods. 

Here's a potential sequence, where I would probably need to do something like you are suggesting - a "roll your own" handle:

- The User object can't call llExperience() functions because it is not on Experience-enabled land.  The User object knows the GUID of the Server object.

- The Server object is on the Experienced-enabled land, so can call llExperience() functions for the User object.

1) User object sends request to Server object: In this case, the user object would not know the GUID key of the actual llExperience() request (read/create/update/delete key), as the Server is doing the actual llExperience() request.

The User object would need to generate and store it's own "handle" to match against the Server object's response - this looks like the use-case for what you were suggesting.

This "handle" could be the user's GUID + an incrementing number.  

We'll call that a "roll your own" handle in the remaining steps.

2) Server gets request from "user object" (identifying the objec't's GUID) with the actual user's GUI and request information. (Is it a Read/Create/Update/Delete, what's the data, etc.)

3) Server calls llExperience() function - server must save the GUID key returned by the llExperience() function, and associate it with the original User object's "roll your own" handle.

4) Server's dataserver() event fires with the reply from the llExperience() function.  

Server must lookup the "roll your own" handle information given the key passed in the dataserver() event.

5) Server returns response to the user object with the original "roll your own" handle.

Server object can immediately forget this "roll your own" handle information.

6) User object gets the server response, identified with the "roll your own" handle.

User object can immediately forget the "roll your own" handle information (after invoking a callback function with the response data).

 

Link to comment
Share on other sites

You are about to reply to a thread that has been inactive for 667 days.

Please take a moment to consider if this thread is worth bumping.

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
 Share

×
×
  • Create New...