Base 256 Compression

Fillo Farber · June 21, 2011

I'm looking for a way to compress 7 digit floats that range from 0.000001 to 9.999999 . I ran across the following Base 256 compression script in the WIKI. I thought I could adapt it to my use but I can't seem to get it to work correctly as shown in the WIKI. The return values I get all have a leading 0 (zero) in front of each character. *EDIT* Also any ideas as to how to convert this to use with a base 10 input would be greatly appreciated. I don't need those hyphens inserted but I need the decimal point. *END EDIT* Thanks in advance!

string lslBase256Chars = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+™-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";
string lslHexChars = "0123456789abcdef";
 
string compressKey(key _key) {
    string k = llToLower((string)llParseString2List((string)_key, ["-"], []));
    integer l = llStringLength(k);
    integer j;
    integer v;
 
    string out = "";
    integer i = 0;
    do {
        j = i + 1;
        v = (llSubStringIndex(lslHexChars, llGetSubString(k, i, i)) << 4) | 
            llSubStringIndex(lslHexChars, llGetSubString(k, j, j));
        out = (out = "") + out + llGetSubString(lslBase256Chars, v, v);
    } while ((i += 2) < l);
 
    return (out = k = "") + out;
}
 
key decompressKey(string k) {
    integer l = llStringLength(k);
    string out = "";
    integer v; integer j; integer x;
    integer i = 0;
 
    do {
        v = llSubStringIndex(lslBase256Chars, llGetSubString(k, i, i));
        j = v >> 4; x = v & 0xF;
        out = (out = "") + out + 
            llGetSubString(lslHexChars, j, j) + 
            llGetSubString(lslHexChars, x, x);
    } while ((++i) < l);
 
    return (key)llInsertString(
        llInsertString(
            llInsertString(
                llInsertString((out = "") + out, 8,"-"),
                13,
                "-"
            ),
            18,
            "-"
        ),
        23,
        "-"
    );
}

Fillo Farber · June 21, 2011

I think I found the issue with the leading zeros. I changed the one line in the output function from this:

out = (out = "") + out + llGetSubString(lslHexChars, j, j) + llGetSubString(lslHexChars, x, x);

to this:

out = (out = "") + out + llGetSubString(lslHexChars, x, x);

Void Singer · June 21, 2011

when you say compress, what exactly do you mean? floats store that much precision in only 6 bytes, so you aren't going to get much better than that. if you mean as text, then a key crusher isn't what you want.

Fillo Farber · June 21, 2011

Yes I want to compress the text and the only example I could find was this base 256 function. Thought I had it working but I found an error. What did you have in mind Void?

Kaluura Boa · June 21, 2011

Altho I fail to see the point, you can compress a float with a base 256 algorithm.

Here are 2 functions to convert a float to a 4-char string and back:

This should work (assuming there are no typo).

string Base256Chars = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+™-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";string uuCompress(float f){    integer k = (integer)f;    string result = llGetSubString(Base256Chars, k, k);    f -= (float)k;    string flt = (string)f;    integer i = 2;    for (; i <= 6; i += 2)    {        k = (integer)llGetSubString(flt, i, i + 1);        result += llGetSubString(Base256Chars, k, k);    }    return result;}float uuDecompress(string zz){    string result;    integer i = 1;    for (; i <= 3; ++i)    {        integer k = llSubStringIndex(Base256Chars, llGetSubString(zz, i, i));        result += llGetSubString("0" + (string)k, -2, -1);    }    return (float)llSubStringIndex(Base256Chars, llGetSubString(zz, 0, 0))             + (float)("0." + result);}default{    touch_start(integer total_number)    {        float ff = llFrand(256.0);        string comp = uuCompress(ff);        llOwnerSay((string)ff + " --> " + comp + " --> " + (string)uuDecompress(comp));    }}

Note that these functions handle only positive numbers from 0 to 255.999999. (Handling negative numbers is possible with a reduction of the range of the supported numbers.)

Fillo Farber · June 21, 2011

Kaluura, I can not get that script to work properly. I fixed one line to read "f -= (float)k;" as I assume that's what you intended. I put the base 256 char set in a notecard because it can't be read properly after compiled within the script. With these changes it appears to work with inputs up to 99.000000 then something goes wrong. Except for the integer portion of the input number, it appears that your script compression is limited to an output that represents a range of up to 99 per position - using only the first 99 chars of the 256 char set. (there's that number 99 again - hmm) Just guessing, but might it be possible to represent a float as large as 10.000000 with just 3 base 256 chars by using the full set throughout?

Void Singer · June 21, 2011

it isn't pretty but this should work for ranges 0.0-15.99999

string gStr256 = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+™-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";string uFlt2Str256( float vFltNum ){	integer vIntTmp;	return	  llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 16), vIntTmp ) +	  llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 4096) & 255, vIntTmp ) +	  llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 1048576) & 255, vIntTmp );}float u256Str2Flt( string vStrFlt ){	return	  ((llSubStringIndex( gStr256, llGetSubString( vStrFlt, 0, 0 )) << 16) +	   (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 1, 1 )) << 8) +	   (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 2, 2 )))) / 1048576.0;}

Fillo Farber · June 22, 2011

Void Singer wrote:

it isn't pretty but this should work for ranges 0.0-15.99999

string gStr256 = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+™-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ"; string uFlt2Str256( float vIntFlt ){ integer vIntTmp; return  llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 16), vIntTmp ) +   llGetSubString( gStr256, vIntTmp = (integer)(vIntNum * 4096) & 255, vIntTmp ) +   llGetSubString( gStr256, vIntTmp = (integer)(vIntNum * 1048576) & 255, vIntTmp ); } float u256Str2Flt( string vStrFlt ){ return((llSubStringIndex( gStr256, llGetSubString( vStrFlt, 0, 0 )) << 24) + (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 1, 1 )) << 16) + (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 2, 2 )))) / 1048576.0; }

This script seems to have the potential to do what I'm looking for

The variable vIntNum I assume should have been vIntFlt? I made that fix but it doesn't seem to work. I'm getting good results on inputs from 0.000001 through 0.000010 but after that it goes wrong. Here's some examples:

Input = : 0.000010

Flt2Str Result = : 00a

u256Str2Flt Result = : 0.000010

Input = : 0.000011

Flt2Str Result = : 00b

u256Str2Flt Result = : 0.000010

Input = : 0.000012

Flt2Str Result = : 00c

u256Str2Flt Result = : 0.000011

Input = : 1.000000

Flt2Str Result = : g00

u256Str2Flt Result = : 256.000000

The Flt2Str function might be working fine but the u256Str2Flt is returning the same value for 00a and 00b as shown above. I don't have an understanding of the bitwise operators so I'm not sure what's going on. Any ideas?

Void Singer · June 22, 2011

I overshot on the second function, corrected above (I think)

Fillo Farber · June 22, 2011

* Update * Not sure if you read before this edit but here's an update: I'm finding errors still. The first two examples show that the 2nd function returns the same results for 00w & 00x, then in the next two examples the 1st function seems to put out just two chars at times. Hope this is enough info for you to spot the problem. I really appreciate your assistance... Thank you!

Input = : 0.000031
Flt2Str Result = : 00w
u256Str2Flt Result = : 0.000031

Input = : 0.000032
Flt2Str Result = : 00x
u256Str2Flt Result = : 0.000031

Input = : 9.913982
Flt2Str Result = : ÜÝê
u256Str2Flt Result = : 9.913982

Input = : 9.913983
Flt2Str Result = : ÜÝ
u256Str2Flt Result = : 9.913818

Void Singer · June 22, 2011

just tested by copy paste, your original results (the 9.x range) are showing a different string than the copied function...

the lower inputs, are showing truncation errors, that llCeil seems to fix at first blush. if we could bit hack floats directly it'd be a lot easier (instead of going through some madness like strife's FUI-IUF functions)

try this instead...

string gStr256 = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+™-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";string uFlt2Str256( float vFltNum ){    integer vIntTmp;    return      llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 16), vIntTmp ) +      llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 4096) & 255, vIntTmp ) +      llGetSubString( gStr256, vIntTmp = llCeil(vFltNum * 1048576) & 255, vIntTmp );}float u256Str2Flt( string vStrFlt ){    return      ((llSubStringIndex( gStr256, llGetSubString( vStrFlt, 0, 0 )) << 16) +       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 1, 1 )) << 8) +       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 2, 2 )))) / 1048576.0;}

ETA:
that doesn't actually match the full precision (7.3) of a float of that range, but the tail was always iffy... if it fails or needs perfect precision then it'll need to be 4 characters, and multiply/shift the ranges another 256/8bits so that the tail gets properly converted.

Kaluura Boa · June 22, 2011

I said "assuming there is no typo".

Anyway, I fixed the function and made a real script around my functions. It works, even for numbers greater than 99.

[05:42]  Object: 160.565100 --> ã?:9 --> 160.565100[05:43]  Object: 150.431700 --> Ù(h™ --> 150.431700[05:43]  Object: 120.982200 --> »¥l~ --> 120.982200[05:43]  Object: 149.425500 --> Ø'=S --> 149.425500[05:43]  Object: 115.898900 --> ¶``& --> 115.898900[05:43]  Object: 238.102100 --> ıal' --> 238.102100[05:43]  Object: 117.986400 --> ¸¥F: --> 117.986400

(Those numbers are weird. I do not know why they all end up with "00". llFrand() should not do that.)

It is however true that if you do not use numbers greater than 99.999999, my functions will use only the 100 first chars of 'Base256Char'. We can do better with 3 chars only but the range of the compressable numbers falls to 0 to 7.999999.

Here is the script (Garantied without typo and working):

string Base256Chars = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+™-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";string uuCompress(float f){    integer k = (integer)f;    f -= (float)k;    k = (k & 0x7) << 21; // from 0 to 7    string flt = (string)f;    integer b = 14;    integer i = 2;    for (; i <= 6; i += 2)    {        k = k | ((integer)llGetSubString(flt, i, i + 1) << b);        b -= 7;    }    string result;    for (b = 16; b >= 0; b -= 8)    {        i = (k & (0xFF << b)) >> b;        result += llGetSubString(Base256Chars, i, i);    }    return result;}float uuDecompress(string zz){    integer k = 0;    integer b = 16;    integer i = 0;    for (; i <= 2; ++i)    {        k = k | (llSubStringIndex(Base256Chars, llGetSubString(zz, i, i)) << b);        b -= 8;    }    float flt = (float)((k & 0xE00000) >> 21);    string result = "0.";    for (b = 14; b >= 0; b -= 7)    {        result += llGetSubString("0" + (string)((k & (0x7F << b)) >> b), -2, -1);    }    return flt + (float)result;}default{    touch_start(integer total_number)    {        float ff = llFrand(8.0);        string comp = uuCompress(ff);        llOwnerSay((string)ff + " --> " + comp + " --> " + (string)uuDecompress(comp));    }}

Some testing in-world:

[07:14]  Object: 5.265731 --> éßâ --> 5.265731[07:14]  Object: 4.209079 --> È*V --> 4.209079[07:15]  Object: 7.435775 --> ĭğĎ --> 7.435775[07:15]  Object: 0.858818 --> l¯i --> 0.858818[07:15]  Object: 3.041920 --> ¤9× --> 3.041920[07:15]  Object: 6.091103 --> ąLÆ --> 6.091103

It works!

Some binary babble (if you understand that kind of language): To encode a number from 0 to 99 in binary, we need only 7 bits. So we split the "tail" behind the decimal point in 3 parts and we shift them to fill the 21 lowest bits of an integer (32 bits). The integer part of the number is shifted into the 23-21 bits. So we have an integer with 24 significant bits that we split/shift in 3 numbers of 8 bits (from 0 to 255) which give us indices to concatenate 3 chars from 'Base256Chars'.

Decompression simply works the other way around. From the indices of the 3 chars, we re-build the integer with 24 significant bits that we can then split/shift to re-write the float.

Fillo Farber · June 22, 2011

Well I'm glad I found a challenge for you folks! I've come up with my own method which "appears" to work with floats as high as10 using 3 base256 chars with no loss. I was getting losses in the least significant digit so I manipulated that one as a string before converting to an integer. The weird thing is that when I added a comparison of the input vs output variables it reports that they are different although they seem to be identical. i.e. it is telling me that 0.000010 does not equal 0.000010 . Here's the script that includes the test routine. Excuse my kindergarten methodology but I find it easier to deal with small lines of code at a time. I tend to get lost in the parenthesis. (I should add that I plagiarized part of the code from Void's work)

string gStr256 = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+™-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";string uFlt2Str256( float vFltNum ){        string vStr = (string) vFltNum;    integer index = llSubStringIndex(vStr,".");    vStr=(string) llDeleteSubString(vStr, index,index);     integer vIntTmp=(integer)vStr;    integer vIntTmp1 = llFloor(vIntTmp/65536);    integer vIntTmp2 = llFloor((vIntTmp-(vIntTmp1*65536))/256);    integer vIntTmp3 = vIntTmp-((vIntTmp1*65536)+(vIntTmp2*256));    return      llGetSubString(gStr256, vIntTmp1, vIntTmp1) +      llGetSubString(gStr256, vIntTmp2, vIntTmp2) +      llGetSubString(gStr256, vIntTmp3, vIntTmp3);      }float u256Str2Flt( string vStrFlt ){    return       ((llSubStringIndex( gStr256, llGetSubString( vStrFlt, 0, 0 ))*65536) +       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 1, 1 )) *256) +       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 2, 2 ))))/1000000.0;}default{    state_entry()    {         float i;         for (i = 0; i< 10000; i=i+.000001)        {            string testIn = uFlt2Str256(i);            float testOut = u256Str2Flt(testIn);            if (testOut!=i)            {                llOwnerSay((string)testOut+" "+(string)i);            }        }    }     }

Void Singer · June 22, 2011

Fillo Farber wrote:

The weird thing is that when I added a comparison of the input vs output variables it reports that they are different although they seem to be identical. i.e. it is telling me that 0.000010 does not equal 0.000010

this is because of how floats are stored... all those leading zeros are ignored, so you can get fractional parts when using straight multiplication to raise the value. which is what you saw in the original 31/32 comparison using mine, and what you are seeing in the comparison values in your recent test... test display truncates the trailing digits past 7 places, so you don't see that they don't match.

a pure float union integer and then a conversion to base 256 would perfectly encode a float of any size and perfectly decode it in 4 charaters, but the code is a bit bulky in lsl... in other most languages it's built in because they directly convert the bits from one type to the other.

please note that for storage purposes, any string representation in LSL of a float or integer is actually going to take more memory than the actual numerical value... in fact, even for most transport purposes.

Fillo Farber · June 22, 2011

You said they have leading zeros and trailing digits. There are both? Can we strip those to format the reconstituted float to match original input value? I understand that this does not save memory. The purpose of this function is to store up to 85 floats in an object description - which is pretty cool

Void Singer · June 22, 2011

the native format of floats strips leading zeros, until it gets to the first set bit which it then peels off and assumes it's there, and the number of binary places is encoded as a small bit field. at 4 characters it's possible to duplicate.

but there may be a small problem with your plans... the description field is limited by bytes, not characters.... and not all the characters in that series are 1 byte (optimally they could be, but that would mean using the ascii set, and not all of those will print to the description)

kaluura's code is more in that direction, and should have the room to encode the shifts for sub-normal numbers (below 1) for perfect conversion of printed values (although you may still end up with 0.0 != 0.0). alternatively, if you throw in a cast to and from string, your values will always match up

Fillo Farber · June 22, 2011

"and not all the characters in that series are 1 byte " ewww, I didn't realize that... I was just working on casting to string as you mentioned which did work. I found another issue though - when we reach the number 98 we seem to hit a problem with that char in the char set. It doesn't pick it up for some reason.

Yes, Kaluura's is workable but since that script is using the full 256 char set as well I'll have to reserve at least 2 bytes memory for each char. (also the issue as with the 99th char)

Do you know of any issues using llIntegerToBase64?

Fillo Farber · June 22, 2011

Doing some tests with the base 64 function. An output range of base 64 chars that is AAAAAA== to AA///w== represents integers 0 to 1048575 . Within that range I can strip the leading AA and trailing == to reduce them to 4 char representations of the inputs. Above 104875 we need 5 chars and so on. I can work within the range of 4 chars for my purpose. Unless there are some other issues, I should be able to store 254/4 floats in an object name and description combined, correct? Too bad about the base 256 issues because that would have been very compact.

I really appreciate the help from both of you thus far. I'm a bitwise'er about this topic now.

Zauber Exonar · July 6, 2011

I tried using these functions for compressing some floating point data, but unfortunately the functions aren't able to handle negative numbers. Trying to compress the float -0.707107, for example, results in 16.662240. Can these functions be adapted to compress negative floats as well as positives?

Void Singer · July 7, 2011

at that level of resolution, you'd only get one or the other... however strife two functions FUI and IUF which convert floats to and from integer, which can easily be converted to hex at 8 characters or base64 at 6 characters and still have exact 32bit floats. compressing farther requires assumption and limitng the range

Base 256 Compression

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Please sign in to comment

Linden Lab

Tilia

Second Life

Connect With Us

Partner With Us