primerib1 Posted January 20, 2023 Share Posted January 20, 2023 (edited) Sooo after much discussion about UUID packing in the LSD thread, I discovered a reliable method to downpack UUIDs to just 22 characters. Always 22 characters, never more, never less A saving of 14 bytes in LSD (38% saving) A saving of 58 bytes in script-space (57% saving) In addition, my method produces URL-safe characters, i.e., characters that won't be decoded as something else when used in a URL. And it is also JSON-safe, meaning that the packed result can be immediately put in a JSON structure without needing any escaping. The principle: Grab 6 nybbles (hex digits) from the UUID = 3 bytes = 24 bits Convert these 24 bits to 4 base64url symbols <= this is the 'secret sauce' Repeat from 1 until we have only 2 nybbles left Append the leftover 2 nybbles without further processing to the result (besides, 8 bits => 2 base64 symbols anyways, so no saving at all there) The decoding is of course the reverse: Grab 4 symbols Convert the 4 symbols to 6 nybbles Repeat from 1 until we have only 2 symbols left Append the leftover 2 symbols to the result And without further ado: // These are the "base64url" character set from RFC 4648 §5 string B64U = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"; string XDIGITS = "0123456789abcdef"; string PackUuid(key id) { string hex = llDumpList2String(llParseString2List((string)id, ["-"], []), ""); string result; integer len = llStringLength(hex) - 2; integer offs; integer portion; integer j; integer i; for (;i<len;i+=6) { portion = (integer)("0x" + llGetSubString(hex, i, i+5)); for (j=0;j<4;++j) { offs = portion & 0x3F; portion = portion >> 6; result += llGetSubString(B64U, offs, offs); } } return result + llGetSubString(hex, -2, -1); } string UnPackUuid(string packed) { string result; integer idx; integer threebytes; integer j; integer i; for (;i<20;i+=4) { threebytes = 0; for (j=3;j>=0;--j) { idx = i + j; threebytes = (threebytes << 6) | llSubStringIndex(B64U, llGetSubString(packed, idx, idx)); } for (j=0;j<6;++j) { idx = (threebytes & 0xF00000) >> 20; result += llGetSubString(XDIGITS, idx, idx); threebytes = threebytes << 4; } } return ( llGetSubString(result, 0, 7) + "-" + llGetSubString(result, 8, 11) + "-" + llGetSubString(result, 12, 15) + "-" + llGetSubString(result, 16, 19) + "-" + llGetSubString(result, 20, -1) + llGetSubString(packed, -2, -1) ); } Edited January 21, 2023 by primerib1 1 1 Link to comment Share on other sites More sharing options...
primerib1 Posted January 20, 2023 Author Share Posted January 20, 2023 (edited) A probably more performant one, using built-in funcs llIntegerToBase64() and llBase64ToInteger(), but the result of the packing is different from the above. Also this method is not URL-safe; you might end up with the characters + and / , and those characters have special meanings when in a URL. string XDIGITS = "0123456789abcdef"; string PackUuid2(key id) { string hex = llDumpList2String(llParseString2List((string)id, ["-"], []), ""); string result; integer len = llStringLength(hex) - 2; integer portion; integer i; for (;i<len;i+=6) { // Must lshift by 2 to align the big-endian zeroes to multiples of 6 portion = ((integer)("0x" + llGetSubString(hex, i, i+5))) << 2; result += llGetSubString(llIntegerToBase64(portion), 1, 4); } return result + llGetSubString(hex, -2, -1); } string UnPackUuid2(string packed) { string result; integer decoded; integer len = llStringLength(packed) - 2; integer idx; integer j; integer i; for(;i<len;i+=4) { // This is actually lshift by 8, then rshift by 2 // The rshift is to neutralize the lshift performed during packing decoded = llBase64ToInteger("A" + llGetSubString(packed, i, i+3) + "A==") << 6; for (j=0; j<6; ++j) { idx = (decoded & 0xF0000000) >> 28; result += llGetSubString(XDIGITS, idx, idx); decoded = decoded << 4; } } return ( llGetSubString(result, 0, 7) + "-" + llGetSubString(result, 8, 11) + "-" + llGetSubString(result, 12, 15) + "-" + llGetSubString(result, 16, 19) + "-" + llGetSubString(result, 20, -1) + llGetSubString(packed, -2, -1) ); } Edited January 20, 2023 by primerib1 1 Link to comment Share on other sites More sharing options...
primerib1 Posted January 20, 2023 Author Share Posted January 20, 2023 (edited) A note on these lines: portion = ((integer)("0x" + llGetSubString(hex, i, i+5))) << 2; result += llGetSubString(llIntegerToBase64(portion), 1, 4); The left-shift by 2 makes the binary representation of the integer to be 6-bit aligned in such a way that when Base64-encoded, they look like this: // [03:00] Object: : APHAMA== // [03:00] Object: : AxA0yA== // [03:00] Object: : ATkeeA== // [03:00] Object: : Ak6OIA== // [03:00] Object: : A/sO7A== Note how every encoded value starts with "A" and ends with "A==" Because of that, we can throw away the 1st char and the trailing 3 chars, getting only characters [1, 4]. (The thrown away characters are restored statically in the unpacking process) Edited January 21, 2023 by primerib1 1 Link to comment Share on other sites More sharing options...
primerib1 Posted January 20, 2023 Author Share Posted January 20, 2023 (edited) A possible further size optimization is to turn the last byte of UUID into a Unicode character, after adding 32 to it (to avoid the bug with characters < 32 ). However, in UTF-8 this will save an additional byte only 37% of the time; 63% of the time you'll end up with 2 bytes. (In UTF-16, though, you'll save 2 bytes, guaranteed) Plus, you will end up with characters that might get corrupted / mangled if put in a URL. Not to mention the additional script-space memory used to implement the encoding/decoding logic. Because of the marginal improvement, but potential drawbacks, I decided to not process the last byte further and just append the hex digits "as is". EDIT: In case you want to know how to do it: In the Packer side you need to use this: llChar(((integer)("0x" + llGetSubString(hex, -2, -1))) + 32); Not too bad ... but on the UnPacker side you need to use this: integer itail = llOrd(llGetSubString(packed, -1, -1)) - 32; string stail = llGetSubString(XDIGITS, itail & 0xF, itail & 0xF); itail >>= 4; stail = llGetSubString(XDIGITS, itail & 0xF, itail & 0xF) + stail; I'd say the bytes used to implement this last-byte encoding will likely exceed the bytes you save. Plus it makes the functions slightly slower. So, not worth it. Edited January 20, 2023 by primerib1 1 Link to comment Share on other sites More sharing options...
primerib1 Posted March 4 Author Share Posted March 4 I just realized that the difference between "URL safe" Base64 and the "normal, non-URL-safe" Base64 is just replacing "+" and "/" of the latter, with "-" and "_", respectively. So if you need a fast but URL-safe Base64, you can use the one here and do a global replace of "+" and "/" with "-" and "_", respectively, at the end of the PackUuid2 function, just before the 'return' statement. Then, in the UnPackUuid2 function, perform the reverse at the very beginning of the function. In code: } result = llReplaceSubString(result, "+", "-", 0); result = llReplaceSubString(result, "/", "_", 0); return result + llGetSubString(hex, -2, -1); } and string UnPackUuid2(string packed) { packed = llReplaceSubString(packed, "-", "_", 0); packed = llReplaceSubString(packed, "_", "/", 0); string result; And there you have it! Fast, yet URL-safe. Link to comment Share on other sites More sharing options...
Recommended Posts
Please sign in to comment
You will be able to leave a comment after signing in
Sign In Now