Jump to content

Understanding bitfields


Nikiee
 Share

You are about to reply to a thread that has been inactive for 3258 days.

Please take a moment to consider if this thread is worth bumping.

Recommended Posts

Recently I've decided that I want to optimize my code further by collapsing several integers to a single bitfield (signed-32).

I keep wanting to store properties or "states" as integers and check those when needed, for example:

integer gender = 1;
integer class = 5;
integer outfit = 2; default { state_entry() {
... } }

The globals and values are arbitrary, but this is what I do. This ends up using 48 bytes of memory for simple checks that are done sparingly. What I've tried doing to compress the memory usage is this:

 

integer specs;

default
{
   state_entry()
   {
      integer gender = 1;
      integer class = 5 <<4;
      integer outfit = 2 <<8;
      specs = gender|class|outfit; // result: 519, binary: 0010 0000 0111
      ...
   }
}

Since a 1-digit integer can be stored in 4 bits, I could compress up to 8 (128 bytes) separate integers into one (16 bytes).

But now when I'm trying to figure out a way to check for those individual values, I feel like I'm doing an unnecessary amount of math and/or checks to get there. This is where I'm stumped, and hoping that someone more familiar with bitfields could explain a good way to do this.

Link to comment
Share on other sites

seems you know quite a bit already but I try to explain all for anyone else who reads as well ok

+

1 bit can store 2 values: 0..1
2 bits can store 4 values: 0..3
3 bits 8 values: 0..7
4 bits 16 values: 0..15

and so on

the number of values doubles with each bit added

+

so if for example

gender is 2 values in range: 0 = Female, 1 = Male.

class is say 8 values in range: 0 = Class A, 1 = Class B, 2 = Class C, ... , 7 = Class H

outfit is 4 values in range: 0 = Outfit 1, 1 = Outfit 2, 2 = Outfit 3, 3 = Outfit 4

then there is a 6 bit record from the left

bits-  6        5 4 3   2    1  
data- [gender] [class] [outfit]

some example functions

 

// save/store record in a integerinteger dbRecord(integer gender, integer class, integer outfit){    // shorthand method    return ((gender & 0x1) << 5) | ((class & 0x7) << 2) | (outfit & 0x3);         // longhand method so can see more easy what is happening    // & mask to make sure data value is in range. then shift right into position    gender = (gender & 0x1) << 5;  // 0x1 = binary 1. 5 = bit width of class and outfit         class  = (class  & 0x7) << 2;  // 0.7 = binary 111. 2 = bitwidth of outfit    outfit = (outfit & 0x3);       // 0x3 = binary 11.    return gender | class | outfit; // OR (add) them together and return as integer dbRecord} // get/fetch data from recordinteger dbGetGender(integer record){   return (record >> 5) & 0x1;  // shift left by 5 and mask with binary 1}integer dbGetClass(integer record){  return (record >> 2) & 0x7;  // shift left by 2 and mask with binary 111}integer dbGetOutfit(integer record){  return record & 0x3;  // mask with binary 11}

 

 eta: t[yp

  • Like 1
Link to comment
Share on other sites

integer dbRecord(integer gender, integer class, integer outfit){    // longhand method so can see more easy what is happening    // & mask to make sure data value is in range. then shift right into position    gender = (gender & 0x1) << 5;  // 0x1 = binary 1. 5 = bit width of class and outfit         class  = (class  & 0x7) << 2;  // 0x7 = binary 111. 2 = bitwidth of outfit    outfit = (outfit & 0x3);       // 0x3 = binary 11.}

The way you explained how many values each bit can store, and the way you ordered dbRecord() had my head stuck for quite a while. But I think I understand what you're doing with the values now. Neat!

Just as a note for myself (and clarification for others?), the function would be "better" (in this situation) formatted like this:

integer dbRecord(integer class, integer outfit, integer gender){	class =	 (class & 0x7)  << 3;   // 0x7 = binary 111xxx  (3 = width of GENDER and OUTFIT)	outfit = (outfit & 0x3) << 1;   // 0x3 = binary 11x     (1 = width of GENDER)	gender = (gender & 0x1);        // 0x1 = binary 1       (No shifting because it's the first)	return gender | class | outfit;	// OR them together and return as integer dbRecord}

There would never be a need for more than 2 values for gender, clothing could be 3 layers (as an example), and classes we could have 7 or more.

That aside, this is a very nice way of dealing with any amount of properties, especially since you could store more than 8 values using this method.

Link to comment
Share on other sites

yes is quite interesting is bitwise

+

also can level up the codes

if a record is say 6 bits then can store 5 records in a 32 bit integer (with 2 bits spare). So it can act like a little table of 5 records

i just make a longhand example mod of this

 

// recnum is 0-based in the range [0..4]. 6 = width of recordinteger dbTable(integer table, integer recnum, integer gender, integer class, integer outfit){   gender = (gender & 0x1);   class  = (class & 0x3) << 1;   outfit = (outfit & 0x7) << 3;      integer record = outfit | class | gender;   integer index = 6 * recnum;   // remove old record   table = table ^ (((table >> index) & 0x3F) << index); // 0x3F is binary mask 111111   // insert new record   table = table | (record << index);      return table;}integer dbGetGender(integer table, integer recnum){   integer index = 6 * recnum;    return (table >> index) & 0x1;}integer dbGetClass(integer table, integer recnum){   integer index = (6 * recnum) + 1;    return (table >> index) & 0x3;}integer dbGetOutfit(integer table, integer recnum){   integer index = (6 * recnum) + 3;    return (table >> index) & 0x7;}

 

 

+

or can store 4 records in the 32bit integer for 24 bits (4 * 6) and use the other 8 bits as control flags. Like 4 of the control bits could mark each record as live or deleted

but i leave that part as a exercise for who ever might want to do that

Link to comment
Share on other sites

integer set_bitfield(integer s, integer w, integer p, integer d){

d = d & ~((1<<w)-1<<p); // clear the old bitfield

d = d | (s & (1<<w)-1) << p; // insert the new bitfield

return(d);

}

 

integer get_bitfield(integer w, integer p, integer d){

return((d>>p) & ((1<<w)-1));

}

 

default

{

touch_start(integer total_number)

{

 

integer d;

d=set_bitfield(7,3,0,d);

d=set_bitfield(123,7,6,d);

llSay(0,(string)d);

llSay(0,(string)get_bitfield(7,6,d));

llSay(0,(string)get_bitfield(3,0,d));

}

}

 I'd penned up a long treatise on bitwise manipulations in response to your last post in this thread, but it got vaporized when you deleted that post. I'm off to run errands, so I'll present only the code example I worked up to allow bitfield insertions and leave it to you to tear it apart.

;-).

And I do wonder how large a list of integers you must have before the space savings  justifies the added code space and increased execution time.

Link to comment
Share on other sites

Bah, the first time I delete a comment anywhere because I felt dumb and then this happens. :< Sorry!

Anyway, if this was all done in a single small script, it wouldn't really be that much more optimal for just a couple integers, but the point of the exercise (for me) is to get an understanding of bitfields and learning to operate them.

That said, having stared at the functions for a while, I'm having trouble making sense of them because I don't understand what the variables are for. (I get that it's on purpose, but!)

Link to comment
Share on other sites

Meaning:  s, w, p and d ?

in:

set_bitfield(integer s, integer w, integer p, integer d)

 Neither do I and I would have to analyze the script to find out, which I am not intending

I merely do bit logic from case to case when it is convenient

:smileysurprised::):smileyvery-happy:

Link to comment
Share on other sites

integer set_bitfield(integer s, integer w, integer p, integer d){

d = d & ~((1<<w)-1<<p); // clear the old bitfield

d = d | (s & (1<<w)-1) << p; // insert the new bitfield

return(d);

}

 

integer get_bitfield(integer w, integer p, integer d){

return((d>>p) & ((1<<w)-1));

}

 

default

{

touch_start(integer total_number)

{

 

integer d;

d=set_bitfield(7,3,0,d);

d=set_bitfield(123,7,6,d);

llSay(0,(string)d);

llSay(0,(string)get_bitfield(7,6,d));

llSay(0,(string)get_bitfield(3,0,d));

}

}

s = source integer

w = bitfield width

p = bitfield position

d = destination integer

To insert a bitfield into something, you must replace whatever bits are there with the new ones. That means bits that were zero might become one, bits that were one might become zero, and bits might remain unchanged. There is no single logical operator that can do all those things.

You've already worked out the clearing of bits via "and"(&) and the setting of them via "or"(|). There's a third operator that inverts a bit, which is handy. That's the "exclusive or" or "not"(~) operator. We also need a way to specify which bits to set/clear/invert, and you've worked out how to do that with the shift operators ">>" and "<<".

The function set_bitfield() takes "width" bits of a "source" integer nd inserts them into a "destination" integer, starting "position" bits from the left. The function doesn't check to make sure you've specified enough width to contain source. If you insert only 3 bits of the number 15 (binary 1111) you'll get back only seven (111). If you specify a nonsense position, you'll get a nonsense result.

The function get_bitfield() returns an integer extracted from "width" bits of the "destination" (which I should probably have called "source" but didn't, to maintain consistency with set_bitfield(). Again, no checking is made on the arguments, so you can try to do silly things like extract a 500 bit field from a 32-bit integer, which ain't gonna work.

Okay, line by line...

d = d & ~((1<<w)-1<<p); // clear the old bitfield

We need space in "d" for the bitfield. We can't just insert bits, we can only clear/set/invert them. So the way I do it is to clear out a field of the destination w bits wide. For that we need a mask value that's got zeros in the field and ones elsewhere. And we need to place that field in the correct position. In your own code, you've already used & to clear away all the bits that are not in your intended bitfield. Here's an example line from your code...

class =	 (class & 0x7)  << 3;   // 0x7 = binary 111xxx  (3 = width of GENDER and OUTFIT)

It would be handy to have a way to create that mask of 0x7, which has a width of 3. That's actually easy to do. If we shift a solitary one bit left by 3 places, we get 8. One less than that is 7. See the magic there? The expression (1<<w)-1 produces a field of ones w bits wide. Now we just need to shift it into position via <<p. There's our mask, but it's the inverse of what we want, which is a field of zeroes for clearing via the & operator. The ~ operator does the inversion for us, giving us ~((1<<w)-1<<p). All that's left is to & that with d to clear out our desired field in preparation for inserting the bit field from s.

Next line...

d = d | (s & (1<<w)-1) << p; // insert the new bitfield

Once again, we need a mask to select only w bits from the source, clearing out all the others. But this time the mask is like those you used in your code, a field of ones with zero elsewhere (1<<w)-1. We & that with s to keep only the desired bits (s & (1<<w)-1). Now we've got to shift that into position via << p. And finally, we want to | that bitfield into the space we previously cleared in d.

That's set_bitfield.

Now let's do the single line of get_bitfield()...

return((d>>p) & ((1<<w)-1));

You'll recognize that (1<<w)-1 again. It's going to create a mask that will (via &) clear away all but the lower w bits of whatever its &'d with. For that to work for us, we need to move the desired bits of our intended bitfield to the right so they become the low bits. d>>p does just that.

The code in touch_start() just exercises the two functions, to prove (I hope) that they work.

It was not my intent to give you a double vision headache, but you can see why I didn't take the time to recraft my explanation before running my errands. And if my explanation makes things worse, that's either because I just got out of bed briefly for a sip of root beer and I'm half asleep, or because I'm a terrible coder and a worse explainer.

;-).

  • Like 4
Link to comment
Share on other sites

That is incredibly helpful and I'm so thankful that you took the time to go over all of that twice. <3

And I ended up surprising myself by having guessed w, p, and d right, and half-guessed the purpose of s.

Once I get home, I'm going to play around with the function and apply it into some made-up code of my own to see what breaks.

Link to comment
Share on other sites

Hokay. After a lot of trial and error, and heavy use of decimal2binary(), I think I finally have an alright grasp of how the operators work!

For anybody else who might stumble accross this thread, I'll leave the practice script I ended up with by this point. Personally I learn the fastest when I have a consistent base that I use as an example, so with slight modifications, this is how I ended up using the examples by all of you.

Some extra formatting for readability, too. (But the comments don't look quite as good without color-coding.)

 

// CREATE INITIAL STRUCTUREinteger db_record(integer class, integer faction, integer outfit, integer gender){   class =   (class   & 0x7) <<6; // 0x7 = binary 111xxxxxx   (6 = width of GENDER and OUTFIT and FACTION)   faction = (faction & 0x7) <<3; // 0x7 = binary 111xxx      (3 = width of GENDER and OUTFIT)   outfit =  (outfit  & 0x3) <<1; // 0x3 = binary 11x         (1 = width of GENDER)   gender =  (gender  & 0x1);     // 0x1 = binary 1           (No shifting because it's the right-most)   //                             // x's replaced by 0 when shifted, then replaced by next value   return class | faction | outfit | gender;}// GET VALUEinteger get_flag(integer w, integer x){ // x = Amount of bits to shift in the x "axis." check db_record for x.   return ( (db >>x) & ((1<<w)-1) );}// SET VALUEinteger set_flag(integer i, integer w, integer x){ // i = integer to set   db =   db &     ~(    (1<<w)-1  <<x); // Mask   return db = db | (i & (1<<w)-1) <<x;}integer db;default{   state_entry()   {      //class 5, faction 7, fully clothed, male      db = db_record(5, 7, 3, 1);      set_flag(1, 3, 6);  // 0x7 = 111      set_flag(1, 3, 3);  // 0x7 = 111      set_flag(1, 2, 1);  // 0x3 = 11      set_flag(0, 1, 0);  // 0x1 = 1      llOwnerSay("current flags: "         +(string)get_flag(3, 6)+" "         +(string)get_flag(3, 3)+" "         +(string)get_flag(2, 1)+" "         +(string)get_flag(1, 0));   }}

 

 I also wrote this down because it was by far the most helpful set of sentences in the whole thread for me:

//It would be handy to have a way to create that mask of 0x7.//If we shift a solitary one bit left by 3 places, we get 8.//The expression (1<<w)-1 produces a field of ones w bits wide.// 1 = 0001// 0001 <<3   = 1000 (8)// 8-1 = 7    = 0111 (0x7)

 

Link to comment
Share on other sites

i just add a whatelse to the mix (:

+

the bitwidth method is only size-optimal when the ranges to be stuffed/encoded are each a power of two

a alternative that can be used is a arithmetic method

example comparison

                 BITWIDTH ARITHMETIC
rangeA: [0..1]   1 bit.   log2(2) = 1 bit.     
rangeB: [0..2]   2 bits.  log2(3) = 1.59 bits.
rangeC: [0..3]   2 bits.  log2(4) = 2 bits
rangeD: [0..4]   3 bits.  log2(5) = 2.33 bits.

BITWIDTH requires 8 bits to encode these ranges
ARITHMETIC only requires 7 bits (6.907)

rangesA..D can be encoded in the set [0..119]. 120 elements. log2(120) = 6.907 bits

+

instead of bit position and width, ari uses high and low bounds for each range
 
bounds for the above ranges to be encoded:
     
        LO  HI
rangeA: 1    2
rangeB: 2    6
rangeC: 6   24 
rangeD: 24 120

like bitwidth stuffing the ranges can be in any order. I just do by size so is consistent with the rest of the posts here

+

lsl example codes

 

integer encode(integer src, integer hi, integer lo, integer dst){    return (dst / hi * hi) + (src * lo) + (dst % lo);}integer decode(integer dst, integer hi, integer lo){   return dst % hi / lo;}default{   touch_start(integer total_number)   {      integer dst;      dst = encode(0, 2, 1, dst);      dst = encode(1, 6, 2, dst);      dst = encode(2, 24, 6, dst);      dst = encode(3, 120, 24, dst);      integer src;		      src=decode(dst, 2, 1);      llOwnerSay((string)src);      src=decode(dst, 6, 2);      llOwnerSay((string)src);      src=decode(dst, 24, 6);      llOwnerSay((string)src);      src=decode(dst, 120, 24);      llOwnerSay((string)src);			   }}

 

eta: tpyo

Link to comment
Share on other sites

That's actually the first method I learned for squishing more stuff into a word. It's most useful in extremely tight memory situations where execution time is relatively less important. Encoding/decoding requires multiplication and division. Big computers do that in hardware, tiny ones (the kind I play with) do it in software. BItfields require only shifts and logical operations.

I first learned that method on Dad's old PDP-11, the first computer I ever programmed. The full ASCII character set occupies eight bits, and so you can stuff only two characters in a 16-bit word. Memory space is precious when you have only 64KBytes. To save space, DEC created a character set that contained only the uppercase alphabet, 0-9, space, period, dollar sign and percent sign. That's 40 characters in total (50 in octal, the character set was called  "Radix-50"). 40*40*40 = 64000 and 2^16=65536, so you can pack three Rad-50 characters into a 16-bit word.

To reduce the chances somebody might call me on the phone for an explanation in three years, and to make it easier to change range sizes, I'd make the following tweaks to your code...

 

RangeA = 2; // 0-1RangeB = 3; // 0-2RangeC = 4; // 0-3RangeD = 5; // 0-4
LoA = 1;HiA = RangeA*LoA;LoB = HiA;HiB = RangeB *LoB;LoC = HiB;HiC = RangeC*LoC;LoD = HiC;HiD = RangeD*LoD;
integer encode(integer src, integer hi, integer lo, integer dst) { return (dst / hi * hi) + (src * lo) + (dst % lo);
// (dst / hi * hi) remove our range and all ranges below
// + (src * lo) multiply our range into position and add it in
// + (dst % lo) restore ranges below}
integer decode(integer dst, integer hi, integer lo) { return dst % hi / lo;
// % removes ranges above ours
// / normalize our range and remove ranges below}
default { touch_start(integer total_number) { integer dst; dst = encode(0, HiA, LoA, dst); dst = encode(1, HiB, LoB, dst); dst = encode(2, HiC, LoC, dst); dst = encode(3, HiD, LoD, dst); integer src; src=decode(dst, HiA, LoA); llOwnerSay((string)src); src=decode(dst, HiB, LoB); llOwnerSay((string)src); src=decode(dst, HiC, LoC); llOwnerSay((string)src); src=decode(dst, HiD, LoD); llOwnerSay((string)src); }}

And I changed your brace usage because I like bearded men (Kernigan, Ritchie, Thompson, Dad), but not enough to completely agree with them (hence my putting a function's opening brace on the same line as the function name).

When I use bitfields in memory, it's usually to implement single bit flags, where the space savings is substantial. Otherwise I use them to parse the fields of control registers for peripheral interfaces. You might have a serial port control register with a one bit field to specify the number of stop bits (1/2), two bits to specify character length (5/6/7/8 bits), a bit to specify whether there should be a parity bit for error detection, another bit to specify whether to invert the data stream, etc. It's a lot easier to understand StopBits = 3 than ControlRegister |= 0b00110000.

All this code talk has me itching to put my lab back together and do some noodling.

;-).

  • Like 1
Link to comment
Share on other sites

yes. is a good idea to explain stuff more fully. which you do quite well. so big ups for you (:

+

is pretty good I think when start having chats about general programming. For sure most people want to know more specific SL stuff but sometimes get a post by someone like Nikiee wanting to chat about more general things. advance their understandings


+

just about bit stuffing (packing, compression, etc)

the most well-known use of bitwidth is Huffman coding. Is used in ZIP, JPEG, MP3, and quite a few others as well

http://en.wikipedia.org/wiki/Huffman_coding

arithmetic coding is quite a bit slower.  For 2 reasons

Like you say was bc of use of mul/div but in more recent times the hardware has got better at this. which you also mention

the second reason is that when ari is used at the bit level (rather than the multi-byte or byte level) it more often allows us to model closer to the Shannon entropy of a message

Shannon: http://en.wikipedia.org/wiki/Entropy_(information_theory)

so quite a lot of implementations go with bit level. A notable exception is range encoding which uses byte level

http://en.wikipedia.org/wiki/Range_encoding

is shown tho that huffman and range coding are subsets of arithmetic coding. Trading speed for space

bitwidth shifting (which we been doing here also) is also a subset of arithmetic. With same trade off. speed for space

+

a cool thing is the Hutter Prize. (the best efforts so far all use arithmetic coding)

Marcus Hutter proposes that when defining AI (artificial intelligence), the shortest MDL (minimum distance length) of a message is the most useful 

http://prize.hutter1.net/

that while is no way to compute the MDL for the general case bc: http://en.wikipedia.org/wiki/Kolmogorov_complexity

that the matter is worth researching as it (the work) gives us a better understanding of how we might advance AI. and Mr Hutter put his money up to help with this work

Link to comment
Share on other sites


Madelaine McMasters wrote:

 

And I changed your brace usage because I like bearded men (Kernigan, Ritchie, Thompson, Dad), but not enough to completely agree with them (hence my putting a function's opening brace on the same line as the function name).

;-).

you are so a sinner

q; (:

Link to comment
Share on other sites

You are about to reply to a thread that has been inactive for 3258 days.

Please take a moment to consider if this thread is worth bumping.

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
 Share

×
×
  • Create New...