Jump to content

Key2LSD packer/unpacker (optimal size 19 UTF-8 chars)


elleevelyn
 Share

You are about to reply to a thread that has been inactive for 108 days.

Please take a moment to consider if this thread is worth bumping.

Recommended Posts

I noticed (March 2023) that Linden fixed the LSD UTF-8 to UTF-16 bug. So I rewrote Key2LSD and place it here in a post of its own. Is a bit more efficient than the previous bug-workround version

/* pack/unpack a key into 19 UTF-8 symbols

   as wrote it allows char(0x1) to be optionally appended for 20 symbols
   appending 0x1 prevents cross-boundary collisions when searching a string of packed keys
   
   credit: Mollymews/irihapeti for the carry and fast int2hex methods      

   public domain (March 2023)
*/ 
            
string Key2LSD(string k, integer delimit)
{   // encode high to low
    string result;
    integer carry;
    integer sign;

    // strip dash chars
    k = llDumpList2String(llParseString2List(k, ["-"], []), "");
    
    // encode symbols. loop 4 times: 128 bits / 32 bits = 4
    integer i;
    for (i; i < 25; i += 8)
    {
        // read 8 hex symbols into 32-bit buffer
        // strip off the sign bit and save (negative division coding is unnecessarily complex)
        integer buffer = (integer)("0x" + llGetSubString(k, i, i + 7));
        sign = (sign << 1) + ((buffer >> 31) & 1);
        buffer = buffer & 0x7FFFFFFF;    

        // encode remainder 31 bits        
        integer j;
        for (j; j < 4; ++j)
        {
            // undecodable value 0x0
            // allow 0x1 as a optional delimiter 
            // leaves 126 encodable values
            // shift base126 into base128 symbol             
            result += llChar(buffer % 126 + 2);
            buffer /= 126;
        }
        // carry is max. base9
        carry = carry * 9 + buffer;
    }
    
    // encode carry
    // 4 * log2(9) + 4 (high bits) = 16.68 bits
    carry = (carry << 4) + sign;
    for (i = 0; i < 3; ++i)
    {
        result += llChar(carry % 126 + 2);
        carry /= 126;
    }
    
    // append delimiter when requested
    if (delimit) result += llChar(0x1);
    return result;
}

key LSD2Key(string s)
{   // decode low to high
    // note: decoder ignores the 20th symbol (0x1) when appended

    integer dash = 0x3C;
    // note: list lookup is the fastest way to do Int2Hex
    list hex = ["0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f"];
    
    string result;
    integer carry;
    integer sign;
        
    // decode carry and sign
    integer i;
    for (i = 18; i > 15; --i)
        carry = carry * 126 + (llOrd(s, i) - 2); 
    sign = carry & 15;
    carry = carry >> 4;

    // decode symbols
    for (i = 12; i >= 0; i -= 4)
    {
        integer buffer = carry % 9;
        carry /= 9;
        integer j;  
        for (j = i + 3; j >= i; --j)
            buffer = buffer * 126 + (llOrd(s, j) - 2); 

        // restore sign bit to buffer
        buffer = buffer | ((sign & 1) << 31);
        sign = sign >> 1;
      
        // buffer to hex symbols
        string h;
        if (dash & 1) h = "-";
        dash = dash >> 1;
        h += llList2String(hex, (buffer >> 28) & 15) +
            llList2String(hex, (buffer >> 24) & 15) +
            llList2String(hex, (buffer >> 20) & 15) +
            llList2String(hex, (buffer >> 16) & 15);
        if (dash & 1) h += "-";
        dash = dash >> 1;
        h += llList2String(hex, (buffer >> 12) & 15) +
            llList2String(hex, (buffer >> 8) & 15) +
            llList2String(hex, (buffer >> 4) & 15) +
            llList2String(hex, buffer & 15);
            
        result = h + result; 
    }
    return (key)result; 
}

default
{
    state_entry()
    {
        key k = llGenerateKey();
        string e = Key2LSD(k, FALSE);  // no delimiter
        string t = Key2LSD(k, TRUE); // delimited with 0x1
        string d = LSD2Key(t);                   
        
        llOwnerSay("\nk: " + (string)k +
            "\nd: " + (string)d + 
            "\ne: " + e + "  len: " + (string)llStringLength(e) +
            "\nt: " + t + "  len: " + (string)llStringLength(t));       
    }   
}

 

Edited by elleevelyn
typo
  • Thanks 1
Link to comment
Share on other sites

a variable length key packer where max. length is 19 and min. length is 16. Is not compatible with the above fixed length packer

/* pack/unpack a key into a variable length string of UTF-8 symbols

   Max length 19. Min length 16

   as wrote it uses base127 so best used in the single key case
   the carry is encoded such that any trailing 'zero' symbols are removed
   
   credit: Mollymews/irihapeti for the carry and fast int2hex methods      

   public domain (March 2023)
*/ 
            
string Key2Lsd(string k)
{   // encode high to low
    string result;
    integer carry;
    integer sign;

    // strip dash chars
    k = llDumpList2String(llParseString2List(k, ["-"], []), "");
    
    // encode symbols. loop 4 times: 128 bits / 32 bits = 4
    integer i;
    for (i; i < 25; i += 8)
    {
        // read 8 hex symbols into 32-bit buffer
        // strip off the sign bit and save (negative division coding is unnecessarily complex)
        integer buffer = (integer)("0x" + llGetSubString(k, i, i + 7));
        sign = (sign << 1) + ((buffer >> 31) & 1);
        buffer = buffer & 0x7FFFFFFF;    

        // encode remainder 31 bits        
        integer j;
        for (j; j < 4; ++j)
        {
            // undecodable value 0x0
           // leaves 127 encodable values
            // shift base127 into base128 symbol             
            result += llChar(buffer % 127 + 1);
            buffer /= 127;
        }
        // carry is max. base9
        carry = carry * 9 + buffer;
    }
    
    // encode carry
    carry = (carry << 4) + sign;
    while (carry)
    {
        result += llChar(carry % 127 + 1);
        carry /= 127;
    }
    return result;
}

key Lsd2Key(string s)
{   // decode low to high
    // note: decoder ignores the 20th symbol (0x1) when appended

    integer dash = 0x3C;
    // note: list lookup is the fastest way to do Int2Hex
    list hex = ["0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f"];
    
    string result;
    integer carry;
    integer sign;
        
    // decode carry and sign
    integer i = llStringLength(s) - 1;
    for (i; i > 15; --i)
        carry = carry * 127 + (llOrd(s, i) - 1); 
    sign = carry & 15;
    carry = carry >> 4;

    // decode symbols
    for (i = 12; i >= 0; i -= 4)
    {
        integer buffer = carry % 9;
        carry /= 9;
        integer j;  
        for (j = i + 3; j >= i; --j)
            buffer = buffer * 127 + (llOrd(s, j) - 1); 

        // restore sign bit to buffer
        buffer = buffer | ((sign & 1) << 31);
        sign = sign >> 1;
      
        // buffer to hex symbols
        string h;
        if (dash & 1) h = "-";
        dash = dash >> 1;
        h += llList2String(hex, (buffer >> 28) & 15) +
            llList2String(hex, (buffer >> 24) & 15) +
            llList2String(hex, (buffer >> 20) & 15) +
            llList2String(hex, (buffer >> 16) & 15);
        if (dash & 1) h += "-";
        dash = dash >> 1;
        h += llList2String(hex, (buffer >> 12) & 15) +
            llList2String(hex, (buffer >> 8) & 15) +
            llList2String(hex, (buffer >> 4) & 15) +
            llList2String(hex, buffer & 15);
            
        result = h + result; 
    }
    return (key)result; 
}

default
{
    state_entry()
    { 
        //key k = NULL_KEY;
        key k = llGenerateKey();
        string e = Key2Lsd(k);
        string d = Lsd2Key(e);                   
        
        llOwnerSay("\nk: " + (string)k +
            "\nd: " + (string)d + 
            "\ne: " + e + "  len: " + (string)llStringLength(e));  
    }   
}

 

Edited by elleevelyn
typo
Link to comment
Share on other sites

i just add on here that is possible to write a variable length packer such that max. length is 19 and min. length is 1.  It uses a different more  complex algorithm than above.

i am not sure tho the extra time to execute would be worth it, given that the large majority of the set of all keys would still be in the 19 and 18 length category

i might have a go at it if I get a time, but I think would be more about curious than anything

Edited by elleevelyn
Link to comment
Share on other sites

4 hours ago, elleevelyn said:
    // note: list lookup is the fastest way to do Int2Hex

I felt like I needed to add a caveat to this! The lookup list should be declared as a global "constant" variable. Using a local stack-based assignment has lots of overhead and a global list is over 2x faster.

Context: I'm working on an emulator that is both memory and performance constrained, and to show the CPU internal state it needs to call dec2hex dozens of times per second. I can't use the list lookup due to the memory hit -- 500-1000 bytes -- compared to a llChar loop function which is only slightly slower than a local list lookup; unrolling the loop costs considerable memory for negligible performance boost, too. 

Edited by Frionil Fang
  • Thanks 1
Link to comment
Share on other sites

7 hours ago, Frionil Fang said:

I felt like I needed to add a caveat to this! The lookup list should be declared as a global "constant" variable. Using a local stack-based assignment has lots of overhead and a global list is over 2x faster

back in the day when Key2UTF was being developed there was discussion over the street about this, and from what you report is still the case. So is good to know when we get into performance optimisation

about llChar vs llList2String. llChar is a relatively slow function generally

from what little testing I have done llChar has about 40-50% the performance of llList2String in the inline Int2Hex case
 
 llChar((48 + ((buffer >> 28) & 15) + ((buffer > 9) * 39));

 llList2String(hex, (buffer >> 28) & 15);

I may be wrong about the exact performance ratio tho, so am happy to be corrected

 

 

Link to comment
Share on other sites

38 minutes ago, elleevelyn said:

rom what little testing I have done llChar has about 40-50% the performance of llList2String in the inline Int2Hex case

I'm not super professional about my testing, but this seems to match what I got -- if you're using a global list and not a local one. After compensating for the general function call overhead with a dummy conversion function, global list lookup was 2-3x faster than llChar, and local list lookup was only modestly faster, maybe 20%. I, uh, didn't save the exact numbers and forgot already.

For memory use, the looped-llChar was by far the best by at least 500 bytes (and 1000 bytes when I tested it in my project and not just a test harness, so all these numbers are ballpark), so if you're really pushed for memory, that's something to look out for.

Edited to add: also in my actual project, using a local list lookup was 40% *slower* than llChar and the global list was barely noticeably faster, the increased memory strain may have had something to do with it with the script hanging on at ~4k free memory in the best case.

Edited by Frionil Fang
Link to comment
Share on other sites

yes agree. When we talking about efficiency/optimisation then it depends on what we are optimising for. In general there are tradeoffs in memory used vs execution time. And as you experiencing there can be low memory used execution time vs high memory used execution time

Edited by elleevelyn
-
Link to comment
Share on other sites

i put this here as when we start getting into packing keys then we sometimes start thinking about packing other types like integer, float, vector, rotation; given that when we cast these to string then the result can at times be verbose

which can be problematic when we wanting to stuff as much info into the LSD as possible. The merits of stuffing the LSD or otherwise in this way notwithstanding

 

// Base126 Integer to UTF-8 packer
// allows char(0x1) as delimiter appended external to this function
// return string can be stuffed into LSD and retrieved 
// variable length: min = 1 max = 5
//
// public domain (March 2023) 

string Int2LSD(integer n)
{   // encode low to high
    integer sign = n < 0;
    // when negative flip bits so -1 -2 .. use same space as +1 +2 ..
    if (sign) n = n ^ 0xFFFFFFFF;
    // encode sign and log2(63) value bits into 1st char
    // log2(63) so 1st char cannot be char(0 or 1) 
    string result = llChar(((n % 63 + 1) << 1) + sign);
    n /= 63;
    // encode remainder log2(126) + 2 so chars cannot be char(0 or 1);  
    while (n)
    {
        result += llChar(n % 126 + 2);
        n /= 126;  
    }
    return result;
}

integer LSD2Int(string s)
{   // decode high to low
    integer result;
    integer i = llStringLength(s) - 1; 
    for (i; i; --i) 
        result = result * 126 + (llOrd(s, i) - 2);
    i = llOrd(s, 0);
    result = result * 63 + ((i >> 1) - 1);
    if (i & 1) result = result ^ 0xFFFFFFFF;
    return result;    
}

/*
    note: at this time (March 2023) optimal size packing of the 
    set of all floats uses the functions: fui() and iuf()
    https://wiki.secondlife.com/wiki/Combined_Library
    
    string packed = Int2LSD(fui(afloat));
    float unpacked = iuf(LSD2Int(packed));
    
    from here we can build up to packing vectors and rotations
*/    
    
default
{
    state_entry()
    {
        llOwnerSay("...");
        integer v = -3; 
        //integer v = 0x80000000; 
        //integer v = 0x7FFFFFF8;
        integer n;
        for (n = v; n < v + 7; ++n)
        {
            string e = Int2LSD(n);
            integer d = LSD2Int(e);
            llOwnerSay("n: " + (string)n + " d: " + (string)d + " e: " + e + " len: " + (string)llStringLength(e));
        }
    }
}

 

Edited by elleevelyn
(126 + 2) not (126 + 1)
Link to comment
Share on other sites

  • 9 months later...
On 3/23/2023 at 3:42 AM, Quistess Alpha said:

It's possible but very hacky to convert floats to hexadecimal losslessly, and then convert that to an integer which can be used in the standard way:

https://lslwiki.digiworldz.com/lslwiki/wakka.php?wakka=LibraryFloat2Hex (not my work)

i been meaning to comment on this, so I do now

this method makes decoding a whole lot faster. So when decoding speed is important (and in pretty much every use case it is) then go with this combined method as Tessa suggests

Link to comment
Share on other sites

You are about to reply to a thread that has been inactive for 108 days.

Please take a moment to consider if this thread is worth bumping.

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
 Share

×
×
  • Create New...