
Data Types, “density”, and Size


Love Zhaoying

You are about to reply to a thread that has been inactive for 1675 days.

Please take a moment to consider if this thread is worth bumping.


I am working on a script (a long-term project) that will use data to represent commands. As always, size is a concern, so I want to represent the commands with a data type that can “pack” a lot of data into the number of bytes the type requires.

Which do you think stores the most for the least cost: integer, float, vector, rotation, or string? The data will be stored in a list, which is also a consideration.

Here are some thoughts: integers could store data bitwise to represent different command types and modes. Floats could too, but of course any precision issues would be a bad thing. If vectors or rotations give you a “bonus factor” of data per byte used, that would be a good choice (provided the precision issues are avoided, since their components are floats as well). I’m trying to get away from strings for this use-case.
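On the precision point: assuming LSL floats are 32-bit IEEE 754 singles, a float has a 24-bit significand, so an integer bit pattern wider than about 24 bits may not survive being stored as a float. A minimal sketch, written in Python rather than LSL only so it can be run outside Second Life; `struct` round-trips a value through a 32-bit float, and the helper names are made up for illustration:

```python
import struct


def as_float32(value):
    """Round-trip a number through a 32-bit IEEE 754 float (assumed to match LSL's float)."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def survives_float(value):
    """True if the integer comes back unchanged after being stored as a 32-bit float."""
    return as_float32(value) == value


# A 32-bit float has a 24-bit significand, so integers up to 2**24 are exact...
print(survives_float(2**24))      # True
# ...but 2**24 + 1 rounds to a neighbouring value, so its lowest bit is lost.
print(survives_float(2**24 + 1))  # False
```

So bit-packing into floats is only safe for small fields, which is one more point in favour of integers.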


My initial (and uneducated) guess is that integers are going to be the best choice due to size and simplicity. While you might be able to use floats (or sets of floats in the case of vector/rotation), I don't think there are any "extra combinations" you can get out of them compared to integers. (Unless, maybe, you're some kind of demigod who can design vector/rotation-based commands so that you can do cross-math on them and still get coherent results.) But of course, if you don't use all 32 bits of an integer, you've wasted the ones you didn't use. It really depends on what your "command" is. Is it just "if X, do Y"? Or does your command include variable data as well?

Assuming command + data, the simplest solution is to devote X bits of the integer to each value and lay out the command and parameters in segments.

For example, here are the lowest 8 bits of an integer, where the lowest (rightmost) 3 bits hold the command, which determines how the other values should be used:

 ┏━━━━━━━━ parameter 2 (bits 4-7)
 ┃   ┏━━━━ parameter 1 (bit 3)
 ┃   ┃  ┏━ command (bits 0-2)
━┻━━ ┃ ━┻━
0000 0 000

You would access the values by doing something like:

integer input = 151;                  // binary 1001 0 111
integer command = input & 7;          // 7 (111), bits 0-2
integer param1 = (input >> 3) & 1;    // 0 (0), bit 3
integer param2 = (input >> 4) & 15;   // 9 (1001), bits 4-7

(If you don't understand what's happening there, look up "bit-masking", the "bitwise AND" (&) operator, and the right-shift (>>) operator.)
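Going the other way, building the packed integer uses the same shifts in reverse, combined with bitwise OR. A minimal sketch in Python (chosen only so it runs outside Second Life; `pack` and `unpack` are hypothetical names) using the same 8-bit layout as the example above:

```python
def pack(command, param1, param2):
    """Build the 8-bit layout: command in bits 0-2, param1 in bit 3, param2 in bits 4-7."""
    if not (0 <= command < 8 and 0 <= param1 < 2 and 0 <= param2 < 16):
        raise ValueError("field out of range for its bit width")
    return command | (param1 << 3) | (param2 << 4)


def unpack(value):
    """Same masks and shifts as the LSL snippet: shift right, then AND with a mask."""
    return value & 7, (value >> 3) & 1, (value >> 4) & 15


packed = pack(7, 0, 9)
print(packed)          # 151
print(unpack(packed))  # (7, 0, 9)
```

The range check matters: a parameter that is too wide would silently overwrite the neighbouring field.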

And from there you can use that information however you need. The problem with this is that you're extremely limited in what exactly you can pass as a parameter, even if you have only one or two parameters sharing the 29 bits left over after a 3-bit command. You'll have to almost hard-code the values that the parameters represent in the script that receives the commands. The good thing is that, based on the command, you can check for parameters of different lengths: command 1 could have no parameters, command 2 could have three parameters, and command 7 could have one parameter. It's up to you.
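To illustrate parameters whose count and width depend on the command, here is a Python sketch that keeps a per-command table of bit widths for the bits left after the 3-bit command. The `LAYOUTS` table and its widths are hypothetical, made up to match the command 1 / 2 / 7 example:

```python
# Hypothetical layouts: bit widths of the parameters that follow the 3-bit
# command field. Each command's widths must add up to 29 bits or fewer.
LAYOUTS = {
    1: [],           # command 1: no parameters
    2: [10, 10, 9],  # command 2: three parameters
    7: [29],         # command 7: one parameter using all remaining bits
}


def encode_command(command, params):
    widths = LAYOUTS[command]
    if len(params) != len(widths):
        raise ValueError("wrong number of parameters for this command")
    value, shift = command, 3
    for width, param in zip(widths, params):
        if not 0 <= param < (1 << width):
            raise ValueError(f"parameter {param} does not fit in {width} bits")
        value |= param << shift
        shift += width
    return value


def decode_command(value):
    command = value & 7
    params, shift = [], 3
    for width in LAYOUTS[command]:  # the command decides how to read the rest
        params.append((value >> shift) & ((1 << width) - 1))
        shift += width
    return command, params


word = encode_command(2, [5, 1023, 300])
print(decode_command(word))  # (2, [5, 1023, 300])
```

One caveat, assuming LSL's signed 32-bit integer: filling all 32 bits (as command 7 can) sets the sign bit, so the packed value prints as negative, although the masks still recover the right fields.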

P.S. I usually default to the "general you," since there are more than two people reading along with us!

Edited by Wulfie Reanimator

I understand perfectly; I've been a programmer for 32 years. I plan to use the “numeric” command representation only for the command itself. The parameters will be secondary fields of the appropriate types (string, int, float, etc.).

The idea for vectors/rotations came from this: if certain commands WOULD have 2 or 3 numeric parameters or a subcommand, packing them into one vector or rotation would probably save space compared with additional list entries. ‘Cause a list is all we got. Darn it!

37 minutes ago, Love Zhaoying said:

I understand perfectly; I've been a programmer for 32 years. I plan to use the “numeric” command representation only for the command itself. The parameters will be secondary fields of the appropriate types (string, int, float, etc.).

The idea for vectors/rotations came from this: if certain commands WOULD have 2 or 3 numeric parameters or a subcommand, packing them into one vector or rotation would probably save space compared with additional list entries. ‘Cause a list is all we got. Darn it!

Then I have to ask a question: How many commands do you expect there to potentially be?

Besides that, I don't think there's any benefit to using vectors in a list over floats or integers. (Going by the wiki's memory table, I assume all the values are in bytes, but they seem quite high... they can't be bits either, though?)

For one, ints and floats are both listed at 15 bytes, and a vector at 31 bytes. Putting ints/floats in a list has an additional cost of 7 bytes each, while a vector gets 46 bytes (1 + 3 * 15). So a list of three floats is 66 bytes (3 * 15 + 3 * 7) and a list of one vector is 77 bytes (31 + 46). That doesn't sound good.
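The same arithmetic as a small Python sketch. The per-item figures are the ones quoted in this post from the wiki's memory table, not independent measurements, and the names are illustrative:

```python
# Per-item figures as quoted in this post (from the wiki's memory table),
# not independent measurements.
VALUE_BYTES = {"integer": 15, "float": 15, "vector": 31}
LIST_ENTRY_BYTES = {"integer": 7, "float": 7, "vector": 1 + 3 * 15}


def list_bytes(types):
    """Total bytes for a list holding one value of each listed type."""
    return sum(VALUE_BYTES[t] + LIST_ENTRY_BYTES[t] for t in types)


print(list_bytes(["float", "float", "float"]))  # 66
print(list_bytes(["vector"]))                   # 77
```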

I'm a bit unclear on what "list memory usage" means exactly, but based on what I know about pure C, my interpretation is that:

string   4 + 1 per character

That means 4 bytes (an int) hold a pointer to the first character in the string, and each character (char) takes 1 byte. And by that logic...

Edit: Actually, I don't know anymore. Why would an integer take up 7 bytes, then? *big head-scratch*

But I digress.

Edited by Wulfie Reanimator

30 minutes ago, Love Zhaoying said:

Let's say... 50 commands.

Aren’t Unicode strings at least 2 bytes per character?

I don't wanna think about it; the wiki page is too vague, and who knows how much of it is accurate (or what the specific implementations are).

Back on topic: in that case, even the simplest integer from 1 to 50 (without any bitwise wizardry) would give you the most memory savings, since it seems to have the smallest memory footprint and the least overhead in a list.
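For a sense of scale: codes 1 to 50 need only 6 bits (2^6 = 64), so a plain integer leaves most of its 32 bits free if you ever want to pack more into it later. A quick Python check (the helper name is hypothetical):

```python
def bits_needed(highest_code):
    """Bits needed to hold command codes 1..highest_code in one integer field."""
    return highest_code.bit_length()


print(bits_needed(50))       # 6, because 2**6 = 64 > 50
print(32 - bits_needed(50))  # 26 bits left over in a 32-bit integer
```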

Edited by Wulfie Reanimator

Example use-case:

First four bits: Command class

0001=Load

0010=Store

0011=Move

Next 3 bits: Register

000=n/a

001=Accumulator

010=X-reg

011=Y-reg

100=Mem location (heap list)

etc.

Next 3 bits: Mode

001=Value

010=index X

011=index Y

100=indirect indexed

101=indexed indirect 

Next 3 bits: Type

001=int

010=float

011=string

etc.

The next integer following the command is the value or location to load.
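A minimal Python sketch of encoding and decoding this layout (4 + 3 + 3 + 3 = 13 bits). It assumes the "first" field sits in the lowest bits, as in the earlier 8-bit example; the post doesn't say which end the fields start from, and the function and constant names are hypothetical. The command word is followed in the list by its operand, as described above:

```python
# Assumed: fields are listed from bit 0 upward, so the "first four bits"
# (command class) are the lowest bits. Names are illustrative only.
FIELDS = [("cls", 4), ("register", 3), ("mode", 3), ("type", 3)]

LOAD = 0b0001         # command class
ACCUMULATOR = 0b001   # register
VALUE = 0b001         # mode
INT = 0b001           # type


def encode_instruction(**fields):
    word, shift = 0, 0
    for name, width in FIELDS:
        value = fields.get(name, 0)  # missing fields default to 0
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def decode_instruction(word):
    fields, shift = {}, 0
    for name, width in FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields


# "Load the accumulator with an int value"; the operand (42, arbitrary) is the next list entry.
program = [encode_instruction(cls=LOAD, register=ACCUMULATOR, mode=VALUE, type=INT), 42]
print(program[0])                      # 1169
print(decode_instruction(program[0]))  # {'cls': 1, 'register': 1, 'mode': 1, 'type': 1}
```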


3 minutes ago, sirhc DeSantis said:

The art of the ZX81

1 REM (internal char set interpreted as Z80 possibly with POKEs)

2 Something about USR

Runs away with nightmares. Those were the days, eh?

 

I was pretty sure the ZX series used variants of the 6502.

On the Apple II series, you entered the Monitor (and from there the mini-assembler) with “CALL -151”.

Link to comment
Share on other sites
