(yet another) Key Packing Method (22 chars, URL safe, JSON safe)

primerib1 · January 20, 2023

Sooo after much discussion about UUID packing in the LSD thread, I discovered a reliable method to downpack UUIDs to just 22 characters. Always 22 characters, never more, never less

A saving of 14 bytes in LSD (38% saving)
A saving of 58 bytes in script-space (57% saving)

In addition, my method produces URL-safe characters, i.e., characters that won't be decoded as something else when used in a URL.

And it is also JSON-safe, meaning that the packed result can be immediately put in a JSON structure without needing any escaping.

The principle:

Grab 6 nybbles (hex digits) from the UUID = 3 bytes = 24 bits
Convert these 24 bits to 4 base64url symbols <= this is the 'secret sauce'
Repeat from 1 until we have only 2 nybbles left
Append the leftover 2 nybbles without further processing to the result (besides, 8 bits => 2 base64 symbols anyways, so no saving at all there)

The decoding is of course the reverse:

Grab 4 symbols
Convert the 4 symbols to 6 nybbles
Repeat from 1 until we have only 2 symbols left
Append the leftover 2 symbols to the result

And without further ado:

// These are the "base64url" character set from RFC 4648 §5
string B64U = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
string XDIGITS = "0123456789abcdef";

string PackUuid(key id) {
    string hex = llDumpList2String(llParseString2List((string)id, ["-"], []), "");
    string result;
    integer len = llStringLength(hex) - 2;
    integer offs;
    integer portion;
    integer j;
    integer i;
    for (;i<len;i+=6) {
        portion = (integer)("0x" + llGetSubString(hex, i, i+5));
        for (j=0;j<4;++j) {
            offs = portion & 0x3F;
            portion = portion >> 6;
            result += llGetSubString(B64U, offs, offs);
        }
    }
    return result + llGetSubString(hex, -2, -1);
}

string UnPackUuid(string packed) {
    string result;
    integer idx;
    integer threebytes;
    integer j;
    integer i;
    for (;i<20;i+=4) {
        threebytes = 0;
        for (j=3;j>=0;--j) {
            idx = i + j;
            threebytes = (threebytes << 6) | llSubStringIndex(B64U, llGetSubString(packed, idx, idx));
        }
        for (j=0;j<6;++j) {
            idx = (threebytes & 0xF00000) >> 20;
            result += llGetSubString(XDIGITS, idx, idx);
            threebytes = threebytes << 4;
        }
    }
    return (
        llGetSubString(result, 0, 7) + "-" + llGetSubString(result, 8, 11) + "-" + llGetSubString(result, 12, 15) + "-"
        + llGetSubString(result, 16, 19) + "-" + llGetSubString(result, 20, -1) + llGetSubString(packed, -2, -1)
        );
}

Edited January 21, 2023 by primerib1

primerib1 · January 20, 2023

A probably more performant one, using built-in funcs llIntegerToBase64() and llBase64ToInteger(), but the result of the packing is different from the above.

Also this method is not URL-safe; you might end up with the characters + and / , and those characters have special meanings when in a URL.

string XDIGITS = "0123456789abcdef";

string PackUuid2(key id) {
    string hex = llDumpList2String(llParseString2List((string)id, ["-"], []), "");
    string result;
    integer len = llStringLength(hex) - 2;
    integer portion;
    integer i;
    for (;i<len;i+=6) {
        // Must lshift by 2 to align the big-endian zeroes to multiples of 6
        portion = ((integer)("0x" + llGetSubString(hex, i, i+5))) << 2;
        result += llGetSubString(llIntegerToBase64(portion), 1, 4);
    }
    return result + llGetSubString(hex, -2, -1);
}

string UnPackUuid2(string packed) {
    string result;
    integer decoded;
    integer len = llStringLength(packed) - 2;
    integer idx;
    integer j;
    integer i;
    for(;i<len;i+=4) {
        // This is actually lshift by 8, then rshift by 2
        // The rshift is to neutralize the lshift performed during packing
        decoded = llBase64ToInteger("A" + llGetSubString(packed, i, i+3) + "A==") << 6;
        for (j=0; j<6; ++j) {
            idx = (decoded & 0xF0000000) >> 28;
            result += llGetSubString(XDIGITS, idx, idx);
            decoded = decoded << 4;
        }
    }
    return (
        llGetSubString(result, 0, 7) + "-" + llGetSubString(result, 8, 11) + "-" + llGetSubString(result, 12, 15) + "-"
        + llGetSubString(result, 16, 19) + "-" + llGetSubString(result, 20, -1) + llGetSubString(packed, -2, -1)
        );
}

Edited January 20, 2023 by primerib1

primerib1 · January 20, 2023

A note on these lines:

portion = ((integer)("0x" + llGetSubString(hex, i, i+5))) << 2;
result += llGetSubString(llIntegerToBase64(portion), 1, 4);

The left-shift by 2 makes the binary representation of the integer to be 6-bit aligned in such a way that when Base64-encoded, they look like this:

// [03:00] Object: : APHAMA==
// [03:00] Object: : AxA0yA==
// [03:00] Object: : ATkeeA==
// [03:00] Object: : Ak6OIA==
// [03:00] Object: : A/sO7A==

Note how every encoded value starts with "A" and ends with "A=="

Because of that, we can throw away the 1st char and the trailing 3 chars, getting only characters [1, 4].
(The thrown away characters are restored statically in the unpacking process)

Edited January 21, 2023 by primerib1

primerib1 · January 20, 2023

A possible further size optimization is to turn the last byte of UUID into a Unicode character, after adding 32 to it (to avoid the bug with characters < 32 ).

However, in UTF-8 this will save an additional byte only 37% of the time; 63% of the time you'll end up with 2 bytes.

(In UTF-16, though, you'll save 2 bytes, guaranteed)

Plus, you will end up with characters that might get corrupted / mangled if put in a URL.

Not to mention the additional script-space memory used to implement the encoding/decoding logic.

Because of the marginal improvement, but potential drawbacks, I decided to not process the last byte further and just append the hex digits "as is".

EDIT:

In case you want to know how to do it:

In the Packer side you need to use this:

    llChar(((integer)("0x" + llGetSubString(hex, -2, -1))) + 32);

Not too bad ... but on the UnPacker side you need to use this:

    integer itail = llOrd(llGetSubString(packed, -1, -1)) - 32;
    string stail = llGetSubString(XDIGITS, itail & 0xF, itail & 0xF);
    itail >>= 4;
    stail = llGetSubString(XDIGITS, itail & 0xF, itail & 0xF) + stail;

I'd say the bytes used to implement this last-byte encoding will likely exceed the bytes you save. Plus it makes the functions slightly slower. So, not worth it.

Edited January 20, 2023 by primerib1

primerib1 · March 4

I just realized that the difference between "URL safe" Base64 and the "normal, non-URL-safe" Base64 is just replacing "+" and "/" of the latter, with "-" and "_", respectively.

So if you need a fast but URL-safe Base64, you can use the one here

and do a global replace of "+" and "/" with "-" and "_", respectively, at the end of the PackUuid2 function, just before the 'return' statement.

Then, in the UnPackUuid2 function, perform the reverse at the very beginning of the function.

In code:

    }
    result = llReplaceSubString(result, "+", "-", 0);
    result = llReplaceSubString(result, "/", "_", 0);
    return result + llGetSubString(hex, -2, -1);
}

and

string UnPackUuid2(string packed) {
    packed = llReplaceSubString(packed, "-", "_", 0);
    packed = llReplaceSubString(packed, "_", "/", 0);
    string result;

And there you have it! Fast, yet URL-safe.

(yet another) Key Packing Method (22 chars, URL safe, JSON safe)

Recommended Posts

primerib1

Link to comment

Share on other sites

primerib1

Link to comment

Share on other sites

primerib1

Link to comment

Share on other sites

primerib1

Link to comment

Share on other sites

primerib1

Link to comment

Share on other sites

Please sign in to comment

Linden Lab

Tilia

Second Life

Connect With Us

Partner With Us

Forums

Blogs

Activity