
"Linkset Data" is coming.


Lucia Nightfire

18 minutes ago, primerib1 said:

Hmmm... Come to think of it... based on the previous discussions, and depending on the architecture, the size of the LSD store (that is, the number of key-value pairs, or KVPs, within it) might not matter.

*) If implemented as, say, a microservice, then size has no impact

*) If implemented as a blob, size might be static: 64 KiB preallocated regardless of how many KVPs are inside

*) If implemented as an actual hashtable/dict that gets composed into the item object, then the number of KVPs does matter

Scripts are max 64K too (Mono at least), so capping LSD at 64K makes sense from that perspective: footprint-wise, it's like another script.


I spent most of the morning "fixing" my complex Profiling JSON, which I'm writing to LSD, so that I can put it in "table" format and sort it. Making big progress!

When I've made "real" progress using this Profiling LSD data to improve my code, I'll create a stand-alone post about it, working subject: "A Linkset Data Success Story - Profiling Scripts".

Now that I've got the Profiling data (what was called, how often, how long did it take) at the "atomic" level, I'll be able to start!

Lay awake last night coming up with the code improvements that are needed... other people count sheep, I hear.

ETA: I finally used the FindKeys function, to only get back the profiling data I want from LSD. I despise RegEx!
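For anyone else who would rather avoid fighting regex by hand: as I understand it, llLinksetDataFindKeys takes a regular expression plus a start index and a count, and returns only the matching keys. The Python below is a rough model of that filtering, not LSL. The "prof:<function>:<field>" key scheme is hypothetical, and returning matches in sorted key order is my assumption, so check the wiki before relying on it.

```python
import re


def find_keys(store: dict[str, str], pattern: str, start: int, count: int) -> list[str]:
    """Rough Python model of regex key lookup in a key-value store (not LSL)."""
    rx = re.compile(pattern)
    # Assumption: keys are scanned in sorted order, then the matches are paged.
    matches = [k for k in sorted(store) if rx.search(k)]
    return matches[start:start + count]


# Hypothetical profiling keys: "prof:<function>:<field>"
store = {
    "prof:parse:calls": "120",
    "prof:parse:ms": "840",
    "prof:render:calls": "35",
    "cfg:owner": "someone",
}
print(find_keys(store, r"^prof:parse:", 0, 10))  # ['prof:parse:calls', 'prof:parse:ms']
```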

Edited by Love Zhaoying

On 1/13/2023 at 9:01 PM, Quistess Alpha said:

PS. What key-packing method are you thinking of? Because UTF-8 is different from (and more efficient than) UTF-16, @Mollymews's key compression compresses a 36-character key to 25~27 bytes (varying by key); llIntegerToBase64 on each of the 4 encoded integers compresses it to 24 bytes. The theoretical minimum is around 20~21 bytes.

 

Actually, the theoretical minimum for packing a UUID into UTF-8 is 16 bytes, because a UUID has 32 hex digits = 32 nibbles = 16 bytes.

Because we have to avoid the first 32 characters (U+0000 to U+001F), every byte needs to have 32 added.

As long as the result falls into the range [0x20, 0x7F] (that is, an original value in the range [0x00, 0x5F]), the minimum can be achieved. Original values in the range [0x60, 0xFF] will be mapped to [0x80, 0x11F], and thus need 2 bytes each in UTF-8.

So the packed size ranges from 16* to 32 bytes, with an expected average of 26 bytes.

* Actually 17... because there's one byte that, due to how the UUID variant is encoded (in standard RFC 4122 UUIDs), is guaranteed to have a value of at least 0x80, which means it already takes 2 bytes.
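A quick Python check of the arithmetic above (a model, not LSL): each byte is shifted by 32, turned into one code point, and the UTF-8 length is measured. It assumes standard RFC 4122 version-4 UUIDs, where byte 6 is 0x4X (stays 1 byte after the shift) and byte 8 is 0b10xxxxxx (always 2 bytes after the shift); that SL keys follow this layout is my assumption.

```python
import uuid


def shifted_utf8_len(u: uuid.UUID) -> int:
    """UTF-8 size of a UUID packed as one code point per byte, each shifted by +32."""
    packed = "".join(chr(b + 32) for b in u.bytes)  # skips U+0000..U+001F
    return len(packed.encode("utf-8"))


# A shifted byte stays 1 UTF-8 byte only if b + 32 < 0x80, i.e. b < 0x60
# (96 of 256 values); everything else becomes a 2-byte sequence.
p_one = 96 / 256
expected = 16 * (p_one * 1 + (1 - p_one) * 2)
print(expected)  # 26.0

# Smallest v4 UUID: only the variant byte (0x80 -> U+00A0) needs 2 bytes.
smallest = uuid.UUID("00000000-0000-4000-8000-000000000000")
print(shifted_utf8_len(smallest))  # 17

# Largest v4 UUID: every byte except the version byte (0x4f) needs 2 bytes.
largest = uuid.UUID("ffffffff-ffff-4fff-bfff-ffffffffffff")
print(shifted_utf8_len(largest))  # 31
```

So for v4 keys the practical range works out to 17 to 31 bytes; the 32-byte ceiling only applies to arbitrary 16-byte values.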

Edited by primerib1

20 minutes ago, primerib1 said:

Actually, the theoretical minimum for packing a UUID into UTF-8 is 16 bytes.

I'm presuming we're not using the 8th bit, because that's reserved for designating multi-byte characters.

7 bits per character -> 16*8/7 ≈ 18.29, so 19 characters.

If you convert the encoded value of a UUID into base 95 (128 possibilities in 7 bits, less 32 = 96, one fewer for safety), 20 digits give you a range of 3584859224085422343574104404449462890625 values, which is comfortably (about 10.5 times) bigger than the 2^128 possible values of a UUID.
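Here is a minimal Python sketch of the base-95 idea (a model of the arithmetic, not Quistess's LSL code): the 128-bit value is written as 20 base-95 digits, each stored as a character in U+0020..U+007E, which skips the control characters and DEL. The constants and function names are my own.

```python
import uuid

BASE = 95
DIGITS = 20    # 95**19 < 2**128 <= 95**20, so 20 digits are needed and enough
OFFSET = 0x20  # digit 0 -> U+0020 (space), digit 94 -> U+007E ('~')


def pack95(u: uuid.UUID) -> str:
    """Encode a UUID as exactly 20 printable ASCII characters."""
    n = u.int
    digits = []
    for _ in range(DIGITS):
        n, d = divmod(n, BASE)
        digits.append(chr(OFFSET + d))
    return "".join(reversed(digits))  # most significant digit first


def unpack95(s: str) -> uuid.UUID:
    """Inverse of pack95."""
    n = 0
    for ch in s:
        n = n * BASE + (ord(ch) - OFFSET)
    return uuid.UUID(int=n)


k = uuid.UUID("a2e76fcd-9360-4f6d-a924-7c12f1b0e3d8")
packed = pack95(k)
print(len(packed), len(packed.encode("utf-8")))  # 20 20
```

Python's arbitrary-precision integers hide the hard part here; with LSL's 32-bit integers the division has to be carried across the four words by hand.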


On 1/13/2023 at 4:37 PM, Anna Salyx said:

The key difference, in my own understanding, is that the script is a "living" process. When a script moves from sim to sim, it must be registered on the new host as a running process: its bytecode is loaded (if needed), its current stack and event queue are applied, and finally it is given a place in the queue to do its thing. Whereas a mesh/prim object by itself is just a static thing, and all that needs to be done is to provide the receiving sim a packed (I assume) copy of its current properties alongside its asset ID. The client/viewer renders it and Bob's your uncle. Yes, there is going to be some overhead in the LSD store, but that store is not an action item that requires VM registration and time slices.

And as LZ pointed out (below), in the scheme of things it's not *that* much really. If you're carrying 38 attachments, each chock-full to the brim with LSD keys, maybe then it'll have an impact, but if all you've got is a small set of objects, 1 to 3 maybe, each with only a handful of keys moving around with you, that might not even be noticed.

 

If I'm wrong in my admittedly limited assessment of how things move from region to region, I'll be happy to be corrected so I can learn :)

From my understanding, the LSD store is a standard data type on the sim object. It probably adds only a fairly trivial amount of extra time to the (de)serialization of the object on region change.

We could likely test this if the viewer already has a teleport-time metric we can access (or if we add one), and just do a bunch of TPs with and without a bunch of full LSD stores.


11 hours ago, Quistess Alpha said:

I'm presuming we're not using the 8th bit, because that's reserved for designating multi-byte characters.

7 bits per character -> 16*8/7 ≈ 18.29, so 19 characters.

If you convert the encoded value of a UUID into base 95 (128 possibilities in 7 bits, less 32 = 96, one fewer for safety), 20 digits give you a range of 3584859224085422343574104404449462890625 values, which is comfortably (about 10.5 times) bigger than the 2^128 possible values of a UUID.

Well, we currently have to avoid the first 32 characters of Unicode (\u0000 to \u001F) due to BUG-233015, so we can't simply use 7 bits for encoding.

So yeah, Base95 seems good. But the implementation probably will get hairy, and likely a bit slow.

Or use one of the higher-efficiency encodings in this list here, which already has implementations we might be able to adapt to LSL: https://en.wikipedia.org/wiki/Binary-to-text_encoding#Encoding_standards

EDIT: OMG, I see you have actually implemented the Base95 algo! Ahaha, well done!

EDIT 2: All being said and done, standard usage of LSD should not need a packed_uuid; it's only when you really need to eke out every last byte of LSD that you should consider using a packed_uuid. Explore other ways of adding more LSD space, like having a separate prim (MUST be separate/unlinked so it has its own LSD store) and talking to that prim using standard messaging. Shape it like a server hard disk, make it phantom, and plug it into your main object. You can then identify the object with something like "/dev/sda", "/dev/sdb" and so on... 😁

Edited by primerib1

A Question about Persistence

To the best of my knowledge, it seems that LSD Store is made persistent (that is, actually stored in the asset server) whenever an attachment is detached (either via Detach action or clean Quit).

However, when do the LSD Stores of in-world objects become persistent?

And somewhat related: What will happen to them during weekly sim reset?


18 minutes ago, primerib1 said:

A Question about Persistence

To the best of my knowledge, it seems that LSD Store is made persistent (that is, actually stored in the asset server) whenever an attachment is detached (either via Detach action or clean Quit).

However, when do the LSD Stores of in-world objects become persistent?

And somewhat related: What will happen to them during weekly sim reset?

They behave exactly the same way as all other prim properties. 🙂 Asset instances on a sim are saved when they're de-rezzed, regardless of whether that's due to detaching, a sim restart, etc.

Edited by Wulfie Reanimator

7 hours ago, primerib1 said:

standard usage of LSD should not need a packed_uuid;

Yeah, that's my opinion as well. Script-memory UUIDs are grossly inefficient because UTF-16 takes up a constant 2 bytes per character, and I've heard rumors that keys also store the numeric value of the UUID. UTF-8 has neither of those issues, so you're already more than halving the memory footprint while the key is stored as a string in LinksetData memory. I don't really see a good use case for saving an extra 16 bytes on top of that, but since I already had a base 92 encoder for positive integers lying around (landmark coordinates are conveniently always positive), it wasn't too much of a stretch to work out once I wrapped my head around negative numbers. (I kinda wish I had figured out how to overflow just right so decoding didn't have to check the sign of the number, oh well.)


2 hours ago, Quistess Alpha said:

Yeah, that's my opinion as well. Script-memory UUIDs are grossly inefficient because UTF-16 takes up a constant 2 bytes per character, and I've heard rumors that keys also store the numeric value of the UUID. UTF-8 has neither of those issues, so you're already more than halving the memory footprint while the key is stored as a string in LinksetData memory.

When you have to pull it into script space, though, the UUIDs balloon into at least 90 bytes each, or 102 bytes if stored as the "key" type...

The 90 comes from 18 bytes of string overhead + 2 bytes for each character (and there are 36 chars in a "key"): 18 + 2 × 36 = 90.

Not sure what the other 12 bytes are for in the "key" type...
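Using the numbers from this post (18 bytes of string overhead and 2 bytes per UTF-16 character, leaving out the unexplained extra 12 bytes of the "key" type), a tiny Python helper makes it easy to compare the script-space cost of the packed lengths discussed in this thread. The 18-byte figure is the thread's estimate, not an official one.

```python
STRING_OVERHEAD = 18  # bytes, per the estimate quoted above (not an official figure)


def script_bytes(chars: int) -> int:
    """Approximate script-memory cost of a string: overhead + 2 bytes per UTF-16 char."""
    return STRING_OVERHEAD + 2 * chars


for label, chars in [("plain key string", 36), ("Base64, 4 ints", 24),
                     ("Base64 15+1 bytes", 21), ("base 95", 20)]:
    print(f"{label:18} {chars:2} chars -> {script_bytes(chars)} bytes")
```

This assumes every character sits in the Basic Multilingual Plane (one UTF-16 unit each), which holds for all the encodings listed.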

Edited by primerib1

On 1/13/2023 at 12:48 AM, primerib1 said:

Btw, I've created a ticket requesting the addition of yet another LSD function:

llLinksetDataDeleteKeys()

Ticket: https://jira.secondlife.com/projects/BUG/issues/BUG-233195

Yay! The issue is Accepted!!

(Yes, I know that just means it gets copied into LL's internal JIRA with no promise of being implemented, but at least it wasn't rejected outright!)


On 1/16/2023 at 9:57 PM, primerib1 said:

Well, we currently have to avoid the first 32 characters of Unicode (\u0000 to \u001F) due to BUG-233015, so we can't simply use 7 bits for encoding.

So yeah, Base95 seems good. But the implementation probably will get hairy, and likely a bit slow.

Or use one of the higher-efficiency encodings in this list here, which already has implementations we might be able to adapt to LSL: https://en.wikipedia.org/wiki/Binary-to-text_encoding#Encoding_standards

EDIT: OMG, I see you have actually implemented the Base95 algo! Ahaha, well done!

EDIT 2: All being said and done, standard usage of LSD should not need a packed_uuid; it's only when you really need to eke out every last byte of LSD that you should consider using a packed_uuid. Explore other ways of adding more LSD space, like having a separate prim (MUST be separate/unlinked so it has its own LSD store) and talking to that prim using standard messaging. Shape it like a server hard disk, make it phantom, and plug it into your main object. You can then identify the object with something like "/dev/sda", "/dev/sdb" and so on... 😁

I have a few I've been working on. I was planning to get them out last month, but I've been sick so much this winter that everything is a mess.

The ones I have so far:

Base64
Just a bunch of utility functions that wrap the built-in Base64 support. By far the fastest executing method.

BasE91
The capital E is significant. This one is compatible with the existing basE91 implementations out there. It requires a lookup table due to the non-ordered dictionary used by this standard.

Base91
Base127 (which is bugged due to BUG-233015)
Any base from 91 through 127 can be done with the same method by just swapping out a couple of constants and a magic number. I'd have to go back through my workbook, but I think <base91 wasn't worth the trade-offs over using the built-in Base64, and >129 is getting into "bad for UTF-8" territory. If staying within UTF-16, larger bases might make sense (I might have already figured that out; I do need to check that workbook again...).

Base32k/Base32768
Not as space-efficient as the others, but from what I remember of the benchmarking, it was faster due to being a power of 2, which makes being unable to use Base128 even sadder.

Base1T/Base1099511627776
Kinda similar to the last one, with one of the ideas from the non-square bases. This was mostly an experiment to see if it was possible. It stores 40 bits of data per two Unicode chars (2^40 = 1099511627776), so the number of actual bits used varies depending on whether it's UTF-8 or UTF-16, and per char depending on which code block it's in. The code was pretty fast due to how simple it is; the most complicated thing is that it uses 3 different blocks of Unicode.

BaseN
I've had a generic BaseN implementation for years. It takes a list of integers in one arbitrary base and converts it to another arbitrary base. It can get some good packing for binary data, but it's like O(n^2), lol. With small lists it's not bad, but it becomes insane pretty quickly.
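To show where the O(n^2) comes from, here is a generic Python sketch of digit-list base conversion by repeated long division (my own illustration, not Kadah's implementation): every output digit needs a full pass over the remaining input, so both the number of passes and the work per pass grow with the length.

```python
def convert_base(digits: list[int], from_base: int, to_base: int) -> list[int]:
    """Convert a most-significant-first digit list between arbitrary bases.

    Repeated long division: each pass walks the whole remaining number and
    peels off one output digit, so the total cost grows roughly quadratically.
    """
    if any(not 0 <= d < from_base for d in digits):
        raise ValueError("digit out of range for from_base")
    num = list(digits)
    out: list[int] = []
    while num:
        remainder = 0
        quotient: list[int] = []
        for d in num:
            q, remainder = divmod(remainder * from_base + d, to_base)
            if quotient or q:  # skip leading zeros of the quotient
                quotient.append(q)
        out.append(remainder)  # collected least significant digit first
        num = quotient
    return list(reversed(out)) or [0]


print(convert_base([1, 0, 0], 10, 2))  # [1, 1, 0, 0, 1, 0, 0]
```

As written it drops leading zeros (they carry no numeric value), so a real packer for binary data would also need to record the original length.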

If speed is more important, stick with llBase64ToInteger(str) / llGetSubString(llIntegerToBase64(num),0, 5).
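The one-liner above works because llIntegerToBase64 turns a 32-bit integer into 8 characters whose last two are "==" padding, so the first 6 carry all 32 bits. Here is a Python model of that round trip; the big-endian byte order is my understanding of the LSL functions, and the helper names are hypothetical.

```python
import base64
import struct


def int_to_b64_6(n: int) -> str:
    """Model of llGetSubString(llIntegerToBase64(n), 0, 5): signed 32-bit int -> 6 chars."""
    full = base64.b64encode(struct.pack(">i", n)).decode("ascii")  # 8 chars, ends "=="
    return full[:6]


def b64_6_to_int(s: str) -> int:
    """Model of llBase64ToInteger(s): restore the padding, decode 4 bytes, read a signed int."""
    return struct.unpack(">i", base64.b64decode(s + "=="))[0]


print(int_to_b64_6(0))                      # AAAAAA
print(b64_6_to_int(int_to_b64_6(-123456)))  # -123456
```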

I'll try to finish and publish this library with benchmarks SOON-ish.


Come to think of it, I think using Base64 is enough.

It's a 3B:4B encoding, so the first 15 bytes of a UUID can be encoded into 20 characters, while the last byte can simply have 32 added and be encoded directly as a Unicode character (1 or 2 bytes in UTF-8, 2 bytes in UTF-16).

So we end up with at most 22 bytes in LSD (a saving of 14 bytes), or 21 UTF-16 characters = 42 bytes in script space (a saving of 60 bytes versus a 102-byte key, or 42 bytes if you also count the 18-byte string overhead on the packed string).

If we don't want to do anything special with the last byte of the UUID, we'll end up with 24 characters, which means 24 bytes in UTF-8, or 48 bytes in UTF-16. Still a significant saving in the big picture of things (12 bytes and 54 bytes, respectively; 36 bytes in script space once the 18-byte string overhead is counted).
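A minimal Python sketch of the 15+1 layout described above, assuming standard Base64 for the head (LSL would need some byte-juggling to produce the same 20 characters from its integer-based Base64 functions). The function names are hypothetical.

```python
import base64
import uuid


def pack_b64_21(u: uuid.UUID) -> str:
    """First 15 bytes -> 20 Base64 chars (no padding), last byte -> one code point (+32)."""
    raw = u.bytes
    head = base64.b64encode(raw[:15]).decode("ascii")  # 15 bytes = 5 groups = 20 chars
    tail = chr(raw[15] + 32)                            # U+0020..U+011F
    return head + tail


def unpack_b64_21(s: str) -> uuid.UUID:
    """Inverse of pack_b64_21."""
    raw = base64.b64decode(s[:20]) + bytes([ord(s[20]) - 32])
    return uuid.UUID(bytes=raw)


k = uuid.UUID("a2e76fcd-9360-4f6d-a924-7c12f1b0e3d8")
p = pack_b64_21(k)
print(len(p), len(p.encode("utf-8")))  # 21 characters, 21 or 22 UTF-8 bytes
```

The tail character can be a space, DEL, or a C1 control character (U+0080..U+009F); the thread only flags U+0000..U+001F as a problem, so whether those others survive LSD and your transport is worth testing.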

 


On 1/16/2023 at 11:57 PM, primerib1 said:

All being said and done, standard usage of LSD should not need a packed_uuid;

There are some edge cases where it really does come in handy, though. I can store* ~1,771 unpacked keys in the data store. Using Mollymews' llOrd/llChar packing method I can up that to ~2,383 (roughly 600 more). In everyday use scenarios you're right, and it probably wouldn't be worth the extra overhead to pack and unpack. But it's worth noting that its use in some cases can be significant, especially when offloading to a DB prim is not feasible.

Added note: Watching the discussion is interesting.  I'm not sure I'd need any method more advanced than what I'm using here, but it's always good to have options.

 

(* llLinksetDataWrite(key, "0"); )


46 minutes ago, Anna Salyx said:

I can store* ~1,771 unpacked keys into the data store. Using Mollymews llOrd/llChar packing method I can up that to ~2,383, (roughly ~600 more)

I haven't tried it, but my back-of-the-envelope calculation says you could fit about 3120 keys if they were compressed using base 95 (20 bytes per key + 1 for an unused value), or ~700 more than Molly's method. If you could get rid of the value, that'd only give you another ~150 keys.
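The capacity figures in this exchange are easy to reproduce, assuming a 65536-byte store where each entry costs exactly its key bytes plus its value bytes (no per-entry overhead; that accounting is my assumption). Back-solving Anna's ~2,383 gives an average of about 26.5 bytes per Molly-packed key.

```python
LSD_BYTES = 64 * 1024  # 65536, the 64 KiB store discussed in this thread


def max_entries(key_bytes: float, value_bytes: int = 1) -> int:
    """Entries that fit if each costs key + value bytes (assumed: no per-entry overhead)."""
    return int(LSD_BYTES // (key_bytes + value_bytes))


print(max_entries(36))     # plain 36-char UUID keys with value "0" -> 1771
print(max_entries(26.5))   # ~26.5-byte packed keys (back-solved) -> 2383
print(max_entries(20))     # base-95 packed keys -> 3120
print(max_entries(20, 0))  # hypothetical value-less entries -> 3276
```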

1 hour ago, primerib1 said:

[Base64 is] a 3B:4B encoding, so the first 15 bytes of UUID can be encoded into 20 characters, while the last byte can simply have 32 added, and encoded directly to Unicode (1 or 2 bytes on UTF-8, 2 bytes on UTF-16)

Theoretically true, but in order to leverage LSL's existing Base64 functions in the most obvious way, you have to encode in 32-bit -> 6-character blocks (*4 ints in a key = 24 characters). I might see if I can get the byte-shifting to work it down to 20+1.5 next time I have some free time, though.

ETA: Actually I think that's basically what you said in your last sentence after I parsed it a few times...

Edited by Quistess Alpha

6 hours ago, Quistess Alpha said:

Theoretically true, but in order to leverage LSL's existing Base64 functions in the most obvious way, you have to encode in 32-bit -> 6-character blocks (*4 ints in a key = 24 characters). I might see if I can get the byte-shifting to work it down to 20+1.5 next time I have some free time, though.

The key is to grab only 3 bytes at a time (6 hex digits) rather than 4 bytes, so the bytes mesh nicely with how Base64 works (3x8 bits => 4x6 bits). No need for bit twiddling 😉

 

EDIT: Just because I'm bored out of my mind at the office, I wrote some code. Down to 22 chars, and URL-safe.
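primerib1's code isn't reproduced in this thread, so the Python below is only my guess at one way to land on 22 URL-safe characters: five 6-hex-digit chunks (3 bytes each) become 4 URL-safe Base64 characters apiece, and the final 2 hex digits are kept verbatim. The function names are hypothetical.

```python
import base64
import uuid


def pack_hex_chunks(key: str) -> str:
    """Pack a hyphenated UUID string into 22 URL-safe characters."""
    hexdigits = key.replace("-", "").lower()
    if len(hexdigits) != 32:
        raise ValueError("expected 32 hex digits")
    out = []
    for i in range(0, 30, 6):
        chunk = bytes.fromhex(hexdigits[i:i + 6])  # 6 hex digits = 3 bytes
        out.append(base64.urlsafe_b64encode(chunk).decode("ascii"))  # 4 chars, no padding
    return "".join(out) + hexdigits[30:]  # 20 Base64 chars + 2 hex digits


def unpack_hex_chunks(packed: str) -> str:
    """Inverse of pack_hex_chunks; returns the canonical hyphenated form."""
    hexdigits = base64.urlsafe_b64decode(packed[:20]).hex() + packed[20:]
    return str(uuid.UUID(hexdigits))


k = "a2e76fcd-9360-4f6d-a924-7c12f1b0e3d8"
print(pack_hex_chunks(k), len(pack_hex_chunks(k)))  # 22 characters
```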

Edited by primerib1

I am so deeply invested in LSD now that I am about to change my "source" (Schema processing) script to process everything directly in LinksetData, instead of processing everything in memory and writing to LinksetData at the end for the client script to use.

Because: this one "source" Schema processing script still has random stack/heap errors! It has been bizarre, as it happens randomly. Since the script starts out with 40K memory free, starting out with LSD instead of memory should work out fine.

The client script that "consumes" the LSD data has been fine... so far. (I still need to refactor its code.)


20 hours ago, Anna Salyx said:

There are some edge cases where it really does come in handy, though. I can store* ~1,771 unpacked keys in the data store. Using Mollymews' llOrd/llChar packing method I can up that to ~2,383 (roughly 600 more). In everyday use scenarios you're right, and it probably wouldn't be worth the extra overhead to pack and unpack. But it's worth noting that its use in some cases can be significant, especially when offloading to a DB prim is not feasible.

Added note: Watching the discussion is interesting.  I'm not sure I'd need any method more advanced than what I'm using here, but it's always good to have options.

 

(* llLinksetDataWrite(key, "0"); )

7 hours ago, Coffee Pancake said:

I think the issue at this point isn't how many keys you can store, but what can be meaningfully done with more than a thousand of them within the processing and output conditions of LSL.

I don't think there would ever be much call for needing that many keys at once in live memory. It's more about storing a lot of data for lookup when needed, or stepping through it for larger processes where size matters more than raw speed.

The most obvious common use case I could see would be caching modified poses in furniture. With limits on the floating-point values (stored as either fixed point or split decimal), it should be possible to store poses for quite a lot of residents.

The use-case we've had is storing level/map data. While it may not have many UUIDs in the structure, a UUID is essentially just 4 integers with a minorly inconvenient periodic hyphenation.

Those have been the two targets I've been working on supporting in a generic/template-able library.

 

19 hours ago, Quistess Alpha said:

I haven't tried it, but my back-of-the-envelope calculation says you could fit about 3120 keys if they were compressed using base 95 (20 bytes per key + 1 for an unused value), or ~700 more than Molly's method. If you could get rid of the value, that'd only give you another ~150 keys.

Theoretically true, but in order to leverage LSL's existing Base64 functions in the most obvious way, you have to encode in 32-bit -> 6-character blocks (*4 ints in a key = 24 characters). I might see if I can get the byte-shifting to work it down to 20+1.5 next time I have some free time, though.

ETA: Actually I think that's basically what you said in your last sentence after I parsed it a few times...

17 hours ago, primerib1 said:

The key is to grab only 3 bytes at a time (6 hex digits) rather than 4 bytes, so the bytes mesh nicely with how Base64 works (3x8 bits => 4x6 bits). No need for bit twiddling 😉

 

EDIT: Just because I'm bored out of my mind at the office, I wrote some code. Down to 22 chars, and URL-safe.

I'll test this later if I remember, but a higher base might cost only slightly more execution time and could maybe still be URL-safe too. Either Base85 or BasE91 seems like a decent option, as they are semi-standards that'll be easier to support on a remote server.

 


6 hours ago, Kadah Coba said:

Either Base85 or BasE91 seem like decent options as they are somewhat standards that'll be easier to support on a remote server.

Neither Base85 nor BasE91 is URL-safe, though, as they use the symbols %, +, ? and &.

Plus, you might want to check whether they're JSON-safe.
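One way to do that check, sketched in Python: compare each alphabet against the RFC 3986 unreserved set (usable in a URL without percent-encoding) and the characters a JSON string must escape. "Base85" is ambiguous, so both Ascii85 and the RFC 1924 variant are listed; the basE91 alphabet is typed in from memory of the reference implementation, so double-check it before relying on the result.

```python
import string

# RFC 3986 "unreserved" characters never need percent-encoding in a URL.
URL_SAFE = set(string.ascii_letters + string.digits + "-._~")
# Characters that must be escaped inside a JSON string (RFC 8259).
JSON_ESCAPED = {'"', "\\"} | {chr(c) for c in range(0x20)}

ALPHABETS = {
    "base64url": string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_",
    "base85 (RFC 1924)": string.digits + string.ascii_uppercase + string.ascii_lowercase
                         + "!#$%&()*+-;<=>?@^_`{|}~",
    "ascii85": "".join(chr(c) for c in range(33, 118)),  # '!' .. 'u'
    "basE91": string.ascii_uppercase + string.ascii_lowercase + string.digits
              + "!#$%&()*+,./:;<=>?@[]^_`{|}~\"",
}


def url_unsafe(alphabet: str) -> str:
    """Characters of the alphabet that would need percent-encoding in a URL."""
    return "".join(sorted(set(alphabet) - URL_SAFE))


def json_escaped(alphabet: str) -> str:
    """Characters of the alphabet that must be escaped inside a JSON string."""
    return "".join(sorted(set(alphabet) & JSON_ESCAPED))


for name, alphabet in ALPHABETS.items():
    print(f"{name:18} URL-unsafe: {url_unsafe(alphabet) or '-'}  "
          f"JSON-escaped: {json_escaped(alphabet) or '-'}")
```

On these alphabets, RFC 1924 Base85 and basE91 both fail the URL test, and basE91 (like Ascii85) also contains the double quote, so it would need escaping inside JSON.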

As for support, if I can make a proper encoder/decoder in LSL with all its limitations, then surely it will be implementable in other languages 😄

Edited by primerib1

7 hours ago, Kadah Coba said:

a UUID is essentially just 4 integers with a minorly inconvenient periodic hyphenation.

Indeed. In script space you can represent them as 4 integers and then use a strided list.

But since we're in the LSD thread, there's an additional limitation: you have to convert that into a string, as LSD can only store strings.

So all the "packing" discussion in this thread is just us discussing the most efficient, reversible way of storing 4 integers in LSD.

If you simply stringify the integers, you can end up with 47 characters per encoded UUID when you're unlucky... (4 * 11 chars [if your integers are *very negative*] + 3 separators)

(Example of 'very negative' integer: -1564371818 = 11 characters)
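A small Python model of the 4-integer representation and its worst-case string form. LSL integers are signed 32-bit, so words with the top bit set come out negative; the function names are my own.

```python
import uuid


def uuid_to_ints(u: uuid.UUID) -> list[int]:
    """Split a UUID into four signed 32-bit integers, most significant word first."""
    words = [(u.int >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]
    return [w - (1 << 32) if w >= (1 << 31) else w for w in words]


def ints_to_uuid(ints: list[int]) -> uuid.UUID:
    """Inverse of uuid_to_ints."""
    n = 0
    for i in ints:
        n = (n << 32) | (i & 0xFFFFFFFF)
    return uuid.UUID(int=n)


def stringify(ints: list[int]) -> str:
    """Naive comma-separated form, as described in the post."""
    return ",".join(str(i) for i in ints)


worst = uuid.UUID(int=0x80000000_80000000_80000000_80000000)
print(stringify(uuid_to_ints(worst)))       # four copies of -2147483648
print(len(stringify(uuid_to_ints(worst))))  # 47
```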


15 hours ago, Love Zhaoying said:

Because: This one "source" Schema processing script still has random stack/heap errors!  It has been bizarre, as it happens randomly. 

I think it's because the garbage collector triggers late, so the temporary lists etc. you create within an event handler linger for some time before they get freed.

And remember: because all values are practically immutable, adding something to a list means you actually end up with 2 lists, the original list and the list with the added item, at least for a while, until GC gets triggered and the original list is destroyed.

And if you're quickly adding lots of items to a list (e.g. in a loop), you will quickly consume memory in a triangular way (1, 3, 6, 10, 15, 21, and so on... f(n) = sum(1..n) = n(n+1)/2).
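A toy Python model of the triangular growth described above. It only captures the post's worst case, where each append copies the whole list and no old copy is freed in between; it says nothing about when LSL's garbage collector actually runs, which the thread leaves open.

```python
def peak_list_elements(n: int) -> int:
    """Element slots alive after n appends if no old copy is ever freed.

    Append k produces a new list of k elements while the older copies
    linger, so the total is 1 + 2 + ... + n.
    """
    return sum(range(1, n + 1))


print([peak_list_elements(n) for n in range(1, 7)])  # [1, 3, 6, 10, 15, 21]
```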

And this raises the questions:

When does the GC get triggered?

Is it possible for us to trigger the GC? Maybe by switching to a different state then switching back?

Edited by primerib1
