You are currently in the Forum Archive. All content within this area is Read-Only and cannot be modified. Active Forums can be found here.
Reply
Honored Resident
Haravikk Mistral
Posts: 95

Converting list of integers into a base64 string

I've been revisiting the problem of converting base64 into "byte" data and vice-versa recently as part of my cryptography libraries. At present I have a function that works which takes each character of input, compares it to an alphabet, and adds it to a list of integers. This is fine, and works for hex as well, but being per-character it's not as fast as it could be.

However, it occurred to me that I can do this more quickly using llBase64ToInteger() and llIntegerToBase64(), as I can retrieve or produce 30-bits of data at a time with these without having to do any overly fancy modification, and without having to store a copy of the base64 alphabet at all!

To this end I've come up with the following function that takes a list of integers, and a bit-count (to address anything that doesn't fit properly into 32-bit words) and produces base64, it works, but I have a feeling I've managed to overcomplicate it, as I'm tracking loads of different variables in order to account for how many bits I need to make up a batch of 30, and how many I've used from the current word, as well masking the result for the final chunk of data to avoid producing too many characters.

string Base64FromBytes(list bytes, integer bits) {
     string base64 = "";
     {
          integer remaining = bits;
          integer words = bytes != []; integer word; integer x = 0;
          integer shift = 32; integer bitbuf = 0; integer buf = 0;
          integer width = 4; integer mask = 0xFFFFFFFC; integer s; integer t;
          integer m;
          while (remaining > 0) {
               while (buf < 30) { // Bits required
                    if (shift < 32) { // Bits left in current word
                         s = 32 - buf;
                         // Add chunk of word to bit-buffer, horrible masking required to
                         // simulate right-shift without repeating upper bit
                         bitbuf = bitbuf |
                              (((word << shift) >> buf) &
                                   ~(-1 * (buf != 0) << (32 - buf)));
                         t = 32 - shift;
                         if (t > s) t = s;
                         buf += t;
                         if (buf > 32) buf = 32;
                         shift += s;
                    } else { // No bits left in word
                         if (x < words) { // Load next word
                              if (x >= 8) { // Keep the list-length sane
                                   bytes = llList2List((bytes = []) + bytes, x, -1);
                                   words -= x;
                                   x = 0;
                              }
                              word = llList2Integer(bytes, x++);
                              shift = 0;
                         } else { // No more bits to load, restrict what we have
                              mask = -1 << (32 - remaining);
                              width -= (30 - remaining) / 6;
                              jump break;
                         }
                    }
               }
               @break;
               base64 += llGetSubString(llIntegerToBase64(bitbuf & mask), 0, width);
               buf -= 30;
               remaining -= 30;
               bitbuf = bitbuf << 30;
          }
     }
     return base64;
}

I've been staring at it for far too long now in any event, so I'm hoping someone can see the simplifications I'm just plain missing, as I suspect that there should be a more elegant way to put this together, probably without the inner loop. The complication arises in that the list will contain 32-bit integers, while only the first five characters of base64 produced by llIntegerToBase64() are usable as the 6th character would require modification.

I don't have the reverse function (convert base64 into integers) yet, but I expect it will be largely the same but in reverse.

Void Singer
Posts: 7,475
Solutions: 64
Registered: ‎05-28-2009

Re: Converting list of integers into a base64 string

Reply to Haravikk Mistral - view message

perhaps I'm just being dense.... but is all that tracking even worth the 3bits youd' lose over not fully packing them?

- Farewell to your ports and good luck to you all
Honored Resident
Haravikk Mistral
Posts: 95

Re: Converting list of integers into a base64 string

Reply to Haravikk Mistral - view message

In my specific case I'm interested in this for use with AES encryption code, which works on lists of integers, treating every 4 words as a single 128-bit block of 4x4 bytes. It's also supposed to be able to handle hexadecimal input (since that's better for storing the key and input vector) which of course fits 32-bit words perfectly.

So unfortunately I can't just fudge it and ignore the empty 2 bits of each word, I have to line them up properly. Like I say as well, I'm trying to replace an existing function that works per-character with one that should be able to work on 5 in a single go, as it eliminates 4 function calls for every 5 characters of input, which is a huge benefit to performance!

Void Singer
Posts: 7,475
Solutions: 64
Registered: ‎05-28-2009

Re: Converting list of integers into a base64 string

Reply to Haravikk Mistral - view message

my thought was encode each integer separately, and string the results together... then instead of reading it back with one b64 conversion, chop the string into 4 6 character pieces, and decode those.... it's not quite as elegent, but it's probably faster than tracking your shifts, and certainly faster than doing it per character.

- Farewell to your ports and good luck to you all
Honored Resident
Haravikk Mistral
Posts: 95

Re: Converting list of integers into a base64 string

Reply to Haravikk Mistral - view message

You mean take the result of each unmodified llIntegerToBase64() call and just join them together (minus the equals signs)? The problem is that the last (6th) character will always contain 4 trailing, unused bits, so the resulting string will only work correctly when processed in the same way to recreate the integer list (every 6 chars as one integer).

The purpose of converting to base64 though is to communicate with external sources, whose standard methods for base64 conversion will typically convert directly into byte arrays or similar, so I'd end up giving them the redundant data to filter out which puts the burden on them, not to mention increases the size of the base64 string produced by ~10% due to all the null data. And like I say, I'm trying to replace a function that produces a correct result without trailing bits on each word, but it's doing much the same thing but on a per-character level. Annoyingly the shifting for the per-character version is simpler, but it has so many unnecessary function calls that it really does need replacing as it's less than half the speed.

In any event, since I need to do the algorithm myself on the LSL end anyway, I think it's better to have it done correctly as my list of integers are meant to represent byte-data, and by leaving trailing bits in I'd be breaking the base64 standard =)

Handling the shifts isn't that slow as such, not compared to the overhead of the function calls to fetch the integer, produce the base64, and extract the part that I need, I just have the feeling that there must be an easier way to handle the shifting as the difference is only ever 2 bits at a time.

Void Singer
Posts: 7,475
Solutions: 64
Registered: ‎05-28-2009

Re: Converting list of integers into a base64 string

Reply to Haravikk Mistral - view message

well if you're going to require straight compliance with the outside sounce, there's no helping it...

b = bits, c = characters (6bits / character)

0b + 5c + 2b (4b remainder to the next) 6c read

4b + 4c +4b (2b remainder to the next) 5c read

2b + 5c + 0b  (no remainders ) 5c read

so your frequency is 3 looking at it that way...
read 5,
if (loop % 3)
_read  >> 2 * (loop % 3 & 1)
_drop remainder in next list value
else
_read << 2
_read += read 1 >> 4
_drop remainder in next list value

looking at it as only bits, then rounding up to your value to whole characters you get llCeil( 0 / 6.0 ) start + llCeil( 32 / 6.0 ) end, which may be able to be broken down more easily... check where you are in the cycle to get your shifts... honestly the first logic would seem to make more sense, since you prefill the next variable with the remainder instead of re reading it.

- Farewell to your ports and good luck to you all