Jump to content

(yet another) Key Packing Method (22 chars, URL safe, JSON safe)


primerib1
 Share

Recommended Posts

Sooo after much discussion about UUID packing in the LSD thread, I discovered a reliable method to downpack UUIDs to just 22 charactersAlways 22 characters, never more, never less

  • A saving of 14 bytes in LSD (38% saving)
  • A saving of 58 bytes in script-space (57% saving)

In addition, my method produces URL-safe characters, i.e., characters that won't be decoded as something else when used in a URL.

And it is also JSON-safe, meaning that the packed result can be immediately put in a JSON structure without needing any escaping.

The principle:

  1. Grab 6 nybbles (hex digits) from the UUID = 3 bytes = 24 bits
  2. Convert these 24 bits to 4 base64url symbols <= this is the 'secret sauce'
  3. Repeat from 1 until we have only 2 nybbles left
  4. Append the leftover 2 nybbles without further processing to the result (besides, 8 bits => 2 base64 symbols anyways, so no saving at all there)

The decoding is of course the reverse:

  1. Grab 4 symbols
  2. Convert the 4 symbols to 6 nybbles 
  3. Repeat from 1 until we have only 2 symbols left
  4. Append the leftover 2 symbols to the result

And without further ado:

// These are the "base64url" character set from RFC 4648 §5
string B64U = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
string XDIGITS = "0123456789abcdef";

string PackUuid(key id) {
    string hex = llDumpList2String(llParseString2List((string)id, ["-"], []), "");
    string result;
    integer len = llStringLength(hex) - 2;
    integer offs;
    integer portion;
    integer j;
    integer i;
    for (;i<len;i+=6) {
        portion = (integer)("0x" + llGetSubString(hex, i, i+5));
        for (j=0;j<4;++j) {
            offs = portion & 0x3F;
            portion = portion >> 6;
            result += llGetSubString(B64U, offs, offs);
        }
    }
    return result + llGetSubString(hex, -2, -1);
}

string UnPackUuid(string packed) {
    string result;
    integer idx;
    integer threebytes;
    integer j;
    integer i;
    for (;i<20;i+=4) {
        threebytes = 0;
        for (j=3;j>=0;--j) {
            idx = i + j;
            threebytes = (threebytes << 6) | llSubStringIndex(B64U, llGetSubString(packed, idx, idx));
        }
        for (j=0;j<6;++j) {
            idx = (threebytes & 0xF00000) >> 20;
            result += llGetSubString(XDIGITS, idx, idx);
            threebytes = threebytes << 4;
        }
    }
    return (
        llGetSubString(result, 0, 7) + "-" + llGetSubString(result, 8, 11) + "-" + llGetSubString(result, 12, 15) + "-"
        + llGetSubString(result, 16, 19) + "-" + llGetSubString(result, 20, -1) + llGetSubString(packed, -2, -1)
        );
}
Edited by primerib1
  • Like 1
  • Thanks 1
Link to comment
Share on other sites

probably more performant one, using built-in funcs llIntegerToBase64() and llBase64ToInteger(), but the result of the packing is different from the above.

Also this method is not URL-safe; you might end up with the characters + and / , and those characters have special meanings when in a URL.

string XDIGITS = "0123456789abcdef";

string PackUuid2(key id) {
    string hex = llDumpList2String(llParseString2List((string)id, ["-"], []), "");
    string result;
    integer len = llStringLength(hex) - 2;
    integer portion;
    integer i;
    for (;i<len;i+=6) {
        // Must lshift by 2 to align the big-endian zeroes to multiples of 6
        portion = ((integer)("0x" + llGetSubString(hex, i, i+5))) << 2;
        result += llGetSubString(llIntegerToBase64(portion), 1, 4);
    }
    return result + llGetSubString(hex, -2, -1);
}

string UnPackUuid2(string packed) {
    string result;
    integer decoded;
    integer len = llStringLength(packed) - 2;
    integer idx;
    integer j;
    integer i;
    for(;i<len;i+=4) {
        // This is actually lshift by 8, then rshift by 2
        // The rshift is to neutralize the lshift performed during packing
        decoded = llBase64ToInteger("A" + llGetSubString(packed, i, i+3) + "A==") << 6;
        for (j=0; j<6; ++j) {
            idx = (decoded & 0xF0000000) >> 28;
            result += llGetSubString(XDIGITS, idx, idx);
            decoded = decoded << 4;
        }
    }
    return (
        llGetSubString(result, 0, 7) + "-" + llGetSubString(result, 8, 11) + "-" + llGetSubString(result, 12, 15) + "-"
        + llGetSubString(result, 16, 19) + "-" + llGetSubString(result, 20, -1) + llGetSubString(packed, -2, -1)
        );
}
Edited by primerib1
  • Like 1
Link to comment
Share on other sites

A note on these lines:

portion = ((integer)("0x" + llGetSubString(hex, i, i+5))) << 2;
result += llGetSubString(llIntegerToBase64(portion), 1, 4);

The left-shift by 2 makes the binary representation of the integer to be 6-bit aligned in such a way that when Base64-encoded, they look like this:

// [03:00] Object: : APHAMA==
// [03:00] Object: : AxA0yA==
// [03:00] Object: : ATkeeA==
// [03:00] Object: : Ak6OIA==
// [03:00] Object: : A/sO7A==

Note how every encoded value starts with "A" and ends with "A=="

Because of that, we can throw away the 1st char and the trailing 3 chars, getting only characters [1, 4].
(The thrown away characters are restored statically in the unpacking process)

Edited by primerib1
  • Like 1
Link to comment
Share on other sites

A possible further size optimization is to turn the last byte of UUID into a Unicode character, after adding 32 to it (to avoid the bug with characters < 32 ).

However, in UTF-8 this will save an additional byte only 37% of the time; 63% of the time you'll end up with 2 bytes.

(In UTF-16, though, you'll save 2 bytes, guaranteed)

Plus, you will end up with characters that might get corrupted / mangled if put in a URL.

Not to mention the additional script-space memory used to implement the encoding/decoding logic.

Because of the marginal improvement, but potential drawbacks, I decided to not process the last byte further and just append the hex digits "as is".

 

EDIT:

In case you want to know how to do it:

In the Packer side you need to use this:

    llChar(((integer)("0x" + llGetSubString(hex, -2, -1))) + 32);

Not too bad ... but on the UnPacker side you need to use this:

    integer itail = llOrd(llGetSubString(packed, -1, -1)) - 32;
    string stail = llGetSubString(XDIGITS, itail & 0xF, itail & 0xF);
    itail >>= 4;
    stail = llGetSubString(XDIGITS, itail & 0xF, itail & 0xF) + stail;

I'd say the bytes used to implement this last-byte encoding will likely exceed the bytes you save. Plus it makes the functions slightly slower. So, not worth it.

Edited by primerib1
  • Like 1
Link to comment
Share on other sites

  • primerib1 changed the title to (yet another) Key Packing Method (22 chars, URL safe, JSON safe)
  • 1 year later...

I just realized that the difference between "URL safe" Base64 and the "normal, non-URL-safe" Base64 is just replacing "+" and "/" of the latter, with "-" and "_", respectively.

So if you need a fast but URL-safe Base64, you can use the one here

 and do a global replace of "+" and "/" with "-" and "_", respectively, at the end of the PackUuid2 function, just before the 'return' statement.

Then, in the UnPackUuid2 function, perform the reverse at the very beginning of the function.

 

In code:

    }
    result = llReplaceSubString(result, "+", "-", 0);
    result = llReplaceSubString(result, "/", "_", 0);
    return result + llGetSubString(hex, -2, -1);
}

and

string UnPackUuid2(string packed) {
    packed = llReplaceSubString(packed, "-", "_", 0);
    packed = llReplaceSubString(packed, "_", "/", 0);
    string result;

 

And there you have it! Fast, yet URL-safe.

Link to comment
Share on other sites

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
 Share

×
×
  • Create New...