
LSL String compression


Jenna Huntsman


Hey!

I've got a script which needs to pass a long list to another script via llRegionSayTo.

However, I noticed on the LSL wiki that scripts can't receive messages longer than 256 characters, so I wanted to look into compressing the content of the list and decompressing it on the other side.

How could I do this? I looked at llMD5String, but that only seems useful for authenticating the content of a message, not for compressing it.

Thanks!

EDIT: Would llXorBase64 be suitable for this kind of thing? I don't really need the security aspect but if it can compress a longer string into a shorter one that'd be really useful!

Edited by Jenna Huntsman

From the wiki entry for the listen event

  • Chat on public channel and positive channels is truncated to 1023 bytes. Chat on negative channels is truncated to 254 bytes.

So if you send on a positive channel you shouldn't have a need to compress the messages, unless you're streaming a novel :)

 


The wiki entry for the listen event at http://wiki.secondlife.com/wiki/Listen makes no mention of the length of messages: none of "1023", "256" or "254" appears on the page.

The wiki entry for llRegionSayTo at http://wiki.secondlife.com/wiki/LlRegionSayTo does specify a maximum message length of 1024 bytes; how many characters this can accommodate will depend on whether multi-byte Unicode characters are involved.

Edit: I did just check in-world, and I can confirm it's possible to send a 1024 byte message using llRegionSayTo to a script in another object which does receive it in full in its listen event, using a positive-numbered channel.
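
The byte-versus-character distinction can be checked in a script. A minimal sketch, assuming the usual Base64 trick for counting UTF-8 bytes; the helper name getStringBytes is made up here and is not something from the posts above:

```lsl
// Hypothetical helper: number of UTF-8 bytes in a string.
// Base64 turns every 3 bytes into 4 characters, so strip the "=" padding
// and scale the remaining length by 3/4.
integer getStringBytes(string msg)
{
    string b64 = (string)llParseString2List(llStringToBase64(msg), ["="], []);
    return (llStringLength(b64) * 3) >> 2;
}

default
{
    state_entry()
    {
        string plain = "hello";                       // ASCII only
        string accented = "h" + llChar(0xE9) + "llo"; // U+00E9 needs 2 bytes in UTF-8

        llOwnerSay((string)llStringLength(plain) + " chars, "
            + (string)getStringBytes(plain) + " bytes");
        llOwnerSay((string)llStringLength(accented) + " chars, "
            + (string)getStringBytes(accented) + " bytes");
    }
}
```

Both strings are 5 characters long, but the second should come out one byte larger, so it is the byte count that has to stay within the 1024 limit.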

Edited by KT Kingsley

adding to the conversation

short string compression is best achieved with a dictionary similar to how Google Brotli does it

the fixed dictionary contains short codes for words that are used frequently in the app

the first instance of a word not found in the fixed dictionary is encoded verbatim and then appended to the dictionary with an assigned short code. When a second instance of the word is found in the string then this is encoded with its assigned short code

Brotli links:

https://github.com/google/brotli

https://en.wikipedia.org/wiki/Brotli
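
The dictionary idea described above could look roughly like this in LSL. This is only a sketch under stated assumptions: the words in DICT are invented, the function names dictEncode and dictDecode are hypothetical, words are separated by single spaces, and the source text contains no private-use characters (U+E000 to U+F8FF), which are used here as the short codes. It is not how Brotli itself is implemented.

```lsl
// Hypothetical fixed dictionary of words the app uses often.
list DICT = ["position", "rotation", "texture"];

string dictEncode(string text)
{
    list dict = DICT;   // working copy, grows as new words are seen
    list words = llParseStringKeepNulls(text, [" "], []);
    list out;
    integer n = llGetListLength(words);
    integer i;
    for (i = 0; i < n; ++i)
    {
        string w = llList2String(words, i);
        integer idx = llListFindList(dict, [w]);
        if (idx != -1)
            out += [llChar(0xE000 + idx)]; // known word: emit its one-character code
        else
        {
            out += [w];   // first instance: emit verbatim...
            dict += [w];  // ...and assign it the next code
        }
    }
    return llDumpList2String(out, " ");
}

string dictDecode(string text)
{
    list dict = DICT;   // must start from the same fixed dictionary
    list tokens = llParseStringKeepNulls(text, [" "], []);
    list out;
    integer n = llGetListLength(tokens);
    integer i;
    for (i = 0; i < n; ++i)
    {
        string t = llList2String(tokens, i);
        integer c = llOrd(t, 0);
        if (llStringLength(t) == 1 && c >= 0xE000 && c <= 0xF8FF)
            out += [llList2String(dict, c - 0xE000)]; // code: look the word up
        else
        {
            out += [t];   // verbatim word: keep it...
            dict += [t];  // ...and mirror the encoder's dictionary growth
        }
    }
    return llDumpList2String(out, " ");
}

default
{
    state_entry()
    {
        string msg = "set texture then set rotation then reset texture";
        string packed = dictEncode(msg);
        llOwnerSay((string)llStringLength(msg) + " -> "
            + (string)llStringLength(packed) + " characters");
        if (dictDecode(packed) == msg) llOwnerSay("round trip ok");
    }
}
```

The saving is in characters. Each private-use code is 3 bytes in UTF-8, so whether this helps with a byte-limited chat message depends on how long the replaced words are.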


after being run thru dictionary encoding then the string can potentially be compressed/packed further using a general compression algorithm. There are a few LSL implementations of general compression algorithms on the wiki. To find search on: LSL compression


when our source string only contains ASCII chars then a simple post-dictionary packing technique is to use Pedro Oval's Ord and Chr functions documented here: http://wiki.secondlife.com/wiki/Ord

with ASCII-only source strings, these functions let us effectively halve the length of the string in characters. Example:

// Ord() and Chr() are the user-defined functions from the wiki page linked above
string source = "~ Some string ~";

// encode : pack 2 ASCII chars into 1 UTF-16 char
string encoded;
integer strlen = llStringLength(source);
integer i;
for (i = 0; i < strlen; i += 2)
{
  integer c1 = Ord(llGetSubString(source, i, i));
  integer c2 = Ord(llGetSubString(source, i + 1, i + 1));
  integer n = (c1 << 8) | c2;
  string e = Chr(n);
  encoded += e;
}   

// decode
strlen = llStringLength(encoded);
string decoded;
for (i = 0; i < strlen; i++)
{
  integer c = Ord(llGetSubString(encoded, i, i));
  integer n1 = (c >> 8) & 255;
  integer n2 = c & 255;
  string d = Chr(n1) + Chr(n2);
  decoded += d;
}

// check
if (decoded == source)
  llOwnerSay("we are good");

 


1 hour ago, KT Kingsley said:

The wiki entry for the listen event at http://wiki.secondlife.com/wiki/Listen makes no mention of the length of messages: none of "1023", "256" or "254" appears on the page.

True, it was the function page I had read that on; it's in the notes for llListen().

Colour me fatigued


29 minutes ago, Profaitchikenz Haiku said:

True, it was the function page I had read that on; it's in the notes for llListen().

Colour me fatigued

I can confirm that, on both positive and negative channels and with both llSay and llRegionSayTo, a listen event will receive a full 1024 byte message.

Perhaps someone with wiki editing rights will see this and correct the wiki entry for llListen at https://wiki.secondlife.com/wiki/LlListen.


52 minutes ago, KT Kingsley said:

I can confirm that, on both positive and negative channels and with both llSay and llRegionSayTo, a listen event will receive a full 1024 byte message.

Perhaps someone with wiki editing rights will see this and correct the wiki entry for llListen at https://wiki.secondlife.com/wiki/LlListen.

I verified that script to script allows 1024 bytes for both positive and negative channels. Public channel 0 chat was cropped at 1023 bytes. Tested both LSL and Mono.

Edited the Wiki.

Edited by Phate Shepherd

3 hours ago, Jenna Huntsman said:

Hey!

I've got a script which needs to pass a long list to another script via RegionSayTo

However, I noticed on the LSL wiki that scripts can't receive messages longer than 256 characters, so I wanted to look into compressing the content of the list and decompressing it on the other side.

 

If you are trying to send more than 1024 bytes, I'd be looking at something simpler... splitting the string into smaller chunks of, say, 512 bytes, and putting a header on the front of each with "part x of y" and timestamp info. Then reassemble on the other end once all parts have arrived and the timestamps match. (Messages aren't guaranteed to arrive in order.)
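
A rough sketch of that chunk-and-reassemble idea, for illustration only. The channel number, the chunk size, the function name sendChunked and the "stamp|part|total|payload" header layout are all assumptions made up for this example. The sender function and the receiving listen event are shown in one script so it compiles on its own; in practice they would live in two different scripts.

```lsl
integer CHANNEL = -76543;  // hypothetical channel
integer CHUNK_SIZE = 512;  // characters; equals bytes only for ASCII text

// --- sender side ---
sendChunked(key target, string data)
{
    integer len = llStringLength(data);
    integer total = (len + CHUNK_SIZE - 1) / CHUNK_SIZE; // round up
    integer stamp = llGetUnixTime();                      // identifies this transfer
    integer part;
    for (part = 0; part < total; ++part)
    {
        string payload = llGetSubString(data, part * CHUNK_SIZE,
                                        (part + 1) * CHUNK_SIZE - 1);
        llRegionSayTo(target, CHANNEL, (string)stamp + "|" + (string)part
            + "|" + (string)total + "|" + payload);
    }
}

// --- receiver side ---
integer gStamp = -1;  // transfer currently being assembled
list gParts;          // one slot per chunk, "" until that chunk arrives
integer gMissing;     // chunks still outstanding

default
{
    state_entry()
    {
        llListen(CHANNEL, "", NULL_KEY, "");
    }

    listen(integer channel, string name, key id, string message)
    {
        list fields = llParseStringKeepNulls(message, ["|"], []);
        if (llGetListLength(fields) < 4) return;   // not one of our messages

        integer stamp = (integer)llList2String(fields, 0);
        integer part  = (integer)llList2String(fields, 1);
        integer total = (integer)llList2String(fields, 2);
        // The payload may itself contain "|", so glue the remainder back together.
        string payload = llDumpList2String(llList2List(fields, 3, -1), "|");
        if (part < 0 || part >= total) return;

        if (stamp != gStamp)
        {   // first chunk seen of a new transfer: reset the buffer
            gStamp = stamp;
            gParts = [];
            gMissing = total;
            integer i;
            for (i = 0; i < total; ++i) gParts += [""];
        }
        if (llList2String(gParts, part) == "")
        {   // store by part number, so arrival order does not matter
            gParts = llListReplaceList(gParts, [payload], part, part);
            --gMissing;
        }
        if (gMissing == 0)
        {
            string data = llDumpList2String(gParts, "");
            llOwnerSay("received " + (string)llStringLength(data) + " characters");
            gStamp = -1;
        }
    }
}
```

Storing each chunk in its numbered slot is what copes with out-of-order arrival. A real version would probably also want a timeout for transfers that never complete.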


8 hours ago, Phate Shepherd said:

If you are trying to send more than 1024 bytes, I'd be looking at something simpler... splitting the string into smaller chunks of, say, 512 bytes, and putting a header on the front of each with "part x of y" and timestamp info. Then reassemble on the other end once all parts have arrived and the timestamps match. (Messages aren't guaranteed to arrive in order.)

agree

this would be a lot more efficient than compression ever will be


Just building on this actually:

Did some modification to my scripts so now the strings are sent in blocks to solve the memory issue.

I've actually now run into another (mostly unrelated) issue: I have a long list of strings, and I'm running out of memory when that list gets too long. Each string is around 50 characters or so.

That Ord function looks like it might do the job of compressing that list well, would that be a good idea or should I look at another solution?

Thanks guys!


One thing I'd suggest is to have a second script that only deals with the stored data and communicates with the main script using link messages, perhaps serving up just one list at a time. How useful that'd be probably depends on how much of the main script's memory is used by the code, and how difficult it'd be to implement depends on how you're using the data in the main script.
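
As a very rough sketch of such a storage script: the link-message numbers REQ_ADD, REQ_GET and RSP_GET are hypothetical values picked for this example, and the main script is assumed to live in the same prim.

```lsl
// Storage script: holds the data and serves single entries on request.
integer REQ_ADD = 1001; // main script -> storage: append str to the list
integer REQ_GET = 1002; // main script -> storage: str holds the index wanted
integer RSP_GET = 1003; // storage -> main script: str holds the entry

list gData;

default
{
    link_message(integer sender, integer num, string str, key id)
    {
        if (num == REQ_ADD)
        {
            gData += [str];
        }
        else if (num == REQ_GET)
        {
            // Echo the caller's id back so it can match reply to request.
            llMessageLinked(LINK_THIS, RSP_GET,
                llList2String(gData, (integer)str), id);
        }
    }
}
```

The main script would then call something like llMessageLinked(LINK_THIS, 1002, "5", NULL_KEY) and pick up the answer in its own link_message event, so it only ever holds one entry at a time.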


6 minutes ago, Jenna Huntsman said:

Probably going on around 600. Script is running in Mono with no set memory limit

Ah okay, that's gonna be more memory than any single script can hold. Each string in a list is going to need 18 bytes + 2 per character, so a 50-character string takes 118 bytes, or 118 * 600 = 70,800 bytes in your case. That's ~69 KB (plus all the other code required to make use of that data), while Mono is limited to 64 KB.
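
To redo that arithmetic with other numbers, here is a small sketch. The function name estimateListBytes is made up for illustration, and the 18 bytes plus 2 per character figure is just the rule of thumb quoted above, not a measured value.

```lsl
// Rule-of-thumb estimate: 18 bytes overhead per list entry + 2 bytes per character.
integer estimateListBytes(integer entries, integer charsPerEntry)
{
    return entries * (18 + 2 * charsPerEntry);
}

default
{
    state_entry()
    {
        integer plain  = estimateListBytes(600, 50); // 600 * 118 = 70800
        integer packed = estimateListBytes(600, 25); // 600 * 68  = 40800
        llOwnerSay("unpacked estimate: " + (string)plain + " bytes");
        llOwnerSay("packed estimate: " + (string)packed + " bytes");
        llOwnerSay("free right now: " + (string)llGetFreeMemory() + " bytes");
    }
}
```

The second estimate assumes the 2-into-1 packing from earlier in the thread, which would leave about 25 characters per entry. By the same rule of thumb that brings the list itself down to roughly 40 KB, before counting the code.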

You'll either have to start compressing things (the Ord suggestion seems pretty good), or add more scripts whose only job is to hold data and pass it back to the main scripts when needed. Both have their pros and cons. Compression allows you to work with fewer scripts, but adding more scripts is easier to scale with your data. You might even consider an external server if you're expecting the amount to keep growing over time.


  • 2 years later...

Hey all!

Had to come back here for a project that I'm working on. Did some updates to @Mollymews' Ord-based compression, to make use of the native LSL functions, and tighten up the function to use as little memory as possible.

string MemComp(string sInput, integer iDir)
{ //Compress 2 UTF-8 chars into UTF-16, or decompress UTF-16 to UTF-8. Credit: Jenna Huntsman, Mollymews
    string sOut;
    integer i;
    for (i = 0; i < llStringLength(sInput); ++i)
    {
        if(!iDir) //If iDir is 0, we're encoding - any other value will decode.
        { //Encode
            sOut += llChar((llOrd(sInput,i) << 8) | llOrd(sInput,i+1));
            ++i; //Iterate i by 2 and not 1 on encode.
        }
        else
        { //Decode
            integer c = llOrd(sInput,i);
            sOut += llChar((c >> 8) & 255) + llChar(c & 255);
        }
    }
    return sOut;
}

 


42 minutes ago, Jenna Huntsman said:

tighten up the function to use as little memory as possible

Good job! Some really minor suggestions, but wouldn't flipping the conditional save doing a negation operation for every character? if(iDir) instead of if(!iDir). Come to think of it, you could just place the conditional outside the loop, and if I'm being over-optimizationalist, overload the input parameter as the loop variable:

string MemComp(string sInput, integer iDir)
{ //Compress 2 UTF-8 chars into UTF-16, or decompress UTF-16 to UTF-8. Credit: Jenna Huntsman, Mollymews
    string sOut;
    if(iDir) // decode for every non-0 value.
    { //Decode
      for (iDir = 0; iDir < llStringLength(sInput); ++iDir) // overload iDir to be a loop increment.
      {   integer c = llOrd(sInput,iDir);
          sOut += llChar((c >> 8) & 255) + llChar(c & 255);
      }
    }else // idir==0
    { //Encode
      for (iDir = 0; iDir < llStringLength(sInput); ++iDir) // overload iDir to be a loop increment.
      {   sOut += llChar((llOrd(sInput,iDir) << 8) | llOrd(sInput,iDir+1));
          ++iDir; //Iterate i by 2 and not 1 on encode. 
          //(if you wanted to be fancy could embed ++ in the += line, but that would be harder to read and debug given lsl execution order, and no more efficient.)
          // something like llChar(llOrd(sInput,++iDir) | (llOrd(sInput,++iDir) << 8) ); with the increment removed from the for construction and initialize to -1.
      }
    }
    return sOut;
}

again though, really minor nitpicks!


1 hour ago, Quistess Alpha said:

Good job! Some really minor suggestions, but wouldn't flipping the conditional save doing a negation operation for every character? if(iDir) instead of if(!iDir). Come to think of it, you could just place the conditional outside the loop, and if I'm being over-optimizationalist, overload the input parameter as the loop variable.

I actually started off in a similar manner, but figured I could tighten up the code into a single loop which I thought would save on memory a bit. Neat idea about using iDir though, I just put my own spin on it to compress it back into using a single loop again. See what you think!

string MemComp(string sInput, integer iDir) //iDir is a bool, so should be FALSE (0) for encode or TRUE (1) for decode
{ //Compress 2 UTF-8 chars into UTF-16, or decompress UTF-16 to UTF-8. Credit: Jenna Huntsman, Mollymews
    string sOut;
    for (; iDir < llStringLength(sInput)*2; iDir = iDir + 2)
    {
        if(((iDir % 2) == 0)) //If iDir is an even number (including 0), we're encoding - any other value will decode.
        { //Encode
            sOut += llChar((llOrd(sInput,iDir/2*2) << 8) | llOrd(sInput,(iDir/2*2)+1));
        }
        else
        { //Decode
            integer c = llOrd(sInput,iDir/2);
            sOut += llChar((c >> 8) & 255) + llChar(c & 255);
        }
    }
    return sOut;
}

 

Edited by Jenna Huntsman

26 minutes ago, Jenna Huntsman said:

figured I could tighten up the code into a single loop which I thought would save on memory a bit.

I guess it depends on what you're trying to optimize. A single loop means the function itself will take up, maybe 10~50 bytes less script space? but will run incalculably slower. (because you're checking something you already know, for every character)

ETA: Also, I liked the first one better. % and / are both slow. Prefer &1 and >>1 (or *0.5 for floats) respectively.

For non-negative n: (n/2)*2 == n&(~1) == n&(integer)(-2)
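
A quick way to convince yourself of those equivalences is to brute-force them over a range of non-negative values. This is only a sanity-check sketch and says nothing about speed. The extra parentheses are there because == binds tighter than & in C-style precedence, which I am assuming LSL follows.

```lsl
default
{
    state_entry()
    {
        integer bad = 0;
        integer n;
        for (n = 0; n < 1000; ++n)
        {
            if ((n % 2) != (n & 1)) ++bad;          // remainder vs lowest bit
            if ((n / 2) != (n >> 1)) ++bad;         // halving vs shift right
            if (((n / 2) * 2) != (n & ~1)) ++bad;   // round down to even
        }
        llOwnerSay("mismatches for 0..999: " + (string)bad);
    }
}
```

The range deliberately starts at 0. For negative n the division-based and bit-based forms can disagree when division truncates toward zero, so the swap is only safe for values like string indices that are never negative.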

Edited by Quistess Alpha

1 minute ago, Wulfie Reanimator said:

If the goal is absolutely minimal script memory usage, you could run it through @Sei Lisa's LSL PyOptimizer.

I'm not sure if there are any benefits in doing that since this is a user function, which should use a minimum 512 byte block of memory.

Thanks, I had heard of this but never saw it until now.
