# Data Types, “density”, and Size

I am working on a script (long term project) that will use data to represent commands. As always, size is a concern. Therefore, I want to represent the commands using a data type that can “pack” the most data into the number of bytes the type costs.

Which do you think stores the most for the least cost? Integer, float, vector, rotation, or string? The data will be stored in a list, which is also a consideration.

Here are some thoughts: Integers could store data bitwise to represent different command types and modes. Floats could too, but of course any precision issues would be a bad thing. If vectors or rotations give you a “bonus factor” of data per byte used, that would be a good choice (as long as it avoids the precision issues). I’m trying to get away from strings for this use-case.

##### Share on other sites

My initial (and uneducated) guess is that integers are going to be the best choice due to size and simplicity. While you might be able to use floats (or sets of floats in the case of vector/rotation), I don't think there's any "extra combinations" you can get out of them compared to integers. (Unless maybe if you're some kind of demigod and are able to design your vector/rotation-based commands so that you can do cross-math on them and still get coherent results.) But of course, if you don't use all 32 bits of an integer, you've wasted the ones you didn't use. It really depends on what your "command" is. Is it just "if X, do Y"? Or does your command include variable data as well?

Assuming command + data -- The simplest solution is to just devote X bits of the integer to some value and just have the command / parameter laid out in segments.

For example, here are the lowest 8 bits of an integer, where the lowest 3 bits select the command, which in turn determines how the other values should be used:

```
 ┏━━━━━━━━ parameter 2
 ┃   ┏━━━━ parameter 1
 ┃   ┃  ┏━ command
━┻━━ ┻ ━┻━
0000 0 000
```

You would access the values by doing something like:

```
// input = 151 (1001 0 111)
integer command = input & 7;          // 7 (111)
integer param1 = (input >> 3) & 1;    // 0 (0)
integer param2 = (input >> 4) & 15;   // 9 (1001)
```

(If you don't understand what's happening there, look up "bit-masking" and the "bitwise AND" (&) operator.)

And from there you can use that information however you need. The problem with this is that you're extremely limited in what you can pass as a parameter, even if you have only 1-2 parameters that take up the whole 29-30 leftover bits. You'll have to almost hard-code the values that the parameters should represent in the script that receives the commands. The good thing is that based on the command, you can check for parameters of different lengths. Command 1 could have no parameters, command 2 could have three parameters, and command 7 could have one parameter. It's up to you.
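
Going the other way, packing values into that same layout is the mirror image of the masking shown earlier. Here is a minimal sketch, assuming the 3/1/4-bit layout from the diagram; `packCommand` is a hypothetical helper name made up for illustration:

```lsl
// Hypothetical helper: builds an integer in the layout from the diagram
// (bits 0-2 command, bit 3 parameter 1, bits 4-7 parameter 2).
integer packCommand(integer command, integer param1, integer param2)
{
    // Mask each field first so an out-of-range value cannot spill
    // into a neighbouring field.
    return (command & 7) | ((param1 & 1) << 3) | ((param2 & 15) << 4);
}

default
{
    state_entry()
    {
        // 7 | (0 << 3) | (9 << 4) = 7 + 144 = 151, the input used earlier.
        integer packed = packCommand(7, 0, 9);
        llOwnerSay((string)packed);
    }
}
```

Masking before shifting costs a few extra operations, but it keeps a bad parameter from silently turning one command into another.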

P.S. I usually default to "general you" since there's more than two people reading along with us!

Edited by Wulfie Reanimator
##### Share on other sites

I understand perfectly, programmer 32 years. I am planning to use the “numeric” command representation only for the command itself. The parameters will be secondary fields of appropriate types (string, int, float, etc.)

The idea for vectors / rotations sprang from: if certain commands WOULD have 2 or 3 numeric parameters or a subcommand, the data structure would probably have some savings vs. additional list entries. ‘Cause a list is all we got. Darn it!

##### Share on other sites

Bitwise, yes, just like how old assembly language instructions encoded their commands and modes (indexed, accumulator, etc.) bit by bit.

##### Share on other sites

37 minutes ago, Love Zhaoying said:

I understand perfectly, programmer 32 years. I am planning to use the “numeric” command representation only for the command itself. The parameters will be secondary fields of appropriate types (string, int, float, etc.)

The idea for vectors / rotations sprang from: if certain commands WOULD have 2 or 3 numeric parameters or a subcommand, the data structure would probably have some savings vs. additional list entries. ‘Cause a list is all we got. Darn it!

Then I have to ask a question: How many commands do you expect there to potentially be?

Besides that, I don't think there's any benefit to using vectors in a list vs floats or integers. (Ref to the Wiki, I assume all values are in bytes but these seem quite high... but they can't be bits either?)

For one, ints and floats are both 15 bytes. Vector is 31 bytes. Putting ints/floats in a list has an additional cost of 7 bytes each, vector gets 46 bytes (1 + 3 * 15). So a list of three floats is 66 bytes and a list of one vector is 77 bytes. That doesn't sound good.

I'm a bit unclear on what "list memory usage" means exactly, but based on what I know about pure C, my interpretation is that:

`string   4 + 1 per character`

Means 4 bytes (int) is for a pointer to the first character in the string, and each character (char) takes 1 byte. And by that logic...

Edit: Actually I don't know anymore. Why would integer take up 7 bytes then? *big head-scratch*

But I digress.

Edited by Wulfie Reanimator
##### Share on other sites

Let's say... 50 commands.

Aren’t Unicode strings at least 2 bytes per character..?

##### Share on other sites

30 minutes ago, Love Zhaoying said:

Let's say... 50 commands.

Aren’t Unicode strings at least 2 bytes per character..?

I don't wanna think about it, the wiki page is too vague and who knows how much of it is accurate (or what the specific implementations are).

Back on topic, in that case even the simplest integer from 1 to 50 (without any bitwise wizardry) would give you the most memory savings. It seems to have the smallest memory footprint and the least overhead in a list.
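
Since the wiki numbers are hard to interpret, one way to check is to measure in-world. This is a rough sketch, assuming the script is compiled as Mono so that `llGetUsedMemory` reports something meaningful; the element count of 200 is arbitrary, and the readings can be noisy because building a list with `+=` leaves temporary copies behind:

```lsl
default
{
    state_entry()
    {
        integer i;
        integer count = 200; // arbitrary sample size for this test

        integer base = llGetUsedMemory();

        // 200 integers stored as individual list entries.
        list ints;
        for (i = 0; i < count; ++i) ints += [i];
        integer withInts = llGetUsedMemory();

        // 200 vectors, each holding three floats.
        list vecs;
        for (i = 0; i < count; ++i) vecs += [<1.0, 2.0, 3.0>];
        integer withVecs = llGetUsedMemory();

        llOwnerSay("integers: " + (string)(withInts - base) + " bytes");
        llOwnerSay("vectors: " + (string)(withVecs - withInts) + " bytes");
    }
}
```

Dividing each figure by the element count gives an approximate per-entry cost to compare against the wiki table.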

Edited by Wulfie Reanimator
##### Share on other sites

Example use-case:

First four bits: Command class

0010=Store

0011=Move

Next 3 bits: Register

000=n/a

001=Accumulator

010=X-reg

011=Y-reg

100=Mem location (heap list)

etc.

Next 3 bits: Mode

001=Value

010=index X

011=index Y

100=indirect indexed

101=indexed indirect

Next 3 bits: Type

001=int

010=float

011=string

etc.

Next int following command is value or location to load..
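
A minimal sketch of how that layout could be packed and read back. It assumes "first" means the lowest bits (as in the masking example earlier in the thread), so class is bits 0-3, register bits 4-6, mode bits 7-9 and type bits 10-12. `makeOp` and the constant names are hypothetical, made up for illustration:

```lsl
// Values taken from the tables above; the names are made up for this sketch.
integer CLASS_STORE = 2;     // 0010
integer REG_ACCUMULATOR = 1; // 001
integer MODE_VALUE = 1;      // 001
integer TYPE_INT = 1;        // 001

// Hypothetical helper: packs the four fields into one integer.
integer makeOp(integer cls, integer reg, integer mode, integer dtype)
{
    return (cls & 15) | ((reg & 7) << 4) | ((mode & 7) << 7) | ((dtype & 7) << 10);
}

default
{
    state_entry()
    {
        // Strided list: each opcode is followed by its value or location.
        list program = [makeOp(CLASS_STORE, REG_ACCUMULATOR, MODE_VALUE, TYPE_INT), 42];

        integer i;
        integer n = llGetListLength(program);
        for (i = 0; i < n; i += 2)
        {
            integer op = llList2Integer(program, i);
            integer operand = llList2Integer(program, i + 1);

            // Unpack the four fields with shift-and-mask.
            integer cls = op & 15;
            integer reg = (op >> 4) & 7;
            integer mode = (op >> 7) & 7;
            integer dtype = (op >> 10) & 7;

            llOwnerSay(llList2CSV([cls, reg, mode, dtype, operand]));
        }
    }
}
```

The whole opcode uses only 13 of the 32 bits, so there is room left for a subcommand or flags before a second list entry is needed.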

##### Share on other sites

if you are happy to trade speed for memory then you can pack/unpack the bits into a string

example here:

##### Share on other sites

10 hours ago, Mollymews said:

if you are happy to trade speed for memory then you can pack/unpack the bits into a string

example here:

Hmm..that seems less efficient memory-wise, as each UTF-16 char is at least 2 bytes. But thank you!

##### Share on other sites

1 hour ago, Love Zhaoying said:

Hmm..that seems less efficient memory-wise, as each UTF-16 char is at least 2 bytes. But thank you!

the encoding method means that a list is not needed.  Can use a string instead of a list
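
To illustrate the string-instead-of-list idea, here is a minimal sketch that stores each integer command as six Base64 characters in one string. This is a simpler, swapped-in technique and not the linked encoding method; `appendCommand` and `readCommand` are hypothetical helper names:

```lsl
// Hypothetical helper: appends one integer command to the program string.
string appendCommand(string program, integer command)
{
    // llIntegerToBase64 returns 8 characters ending in "==";
    // only the first 6 carry data, so the padding is dropped.
    return program + llGetSubString(llIntegerToBase64(command), 0, 5);
}

// Hypothetical helper: reads the command stored at a zero-based index.
integer readCommand(string program, integer index)
{
    string chunk = llGetSubString(program, index * 6, index * 6 + 5);
    // Restore the "==" padding before decoding.
    return llBase64ToInteger(chunk + "==");
}

default
{
    state_entry()
    {
        string program = "";
        program = appendCommand(program, 151);
        program = appendCommand(program, 1170);

        // Reads back the second command that was appended.
        llOwnerSay((string)readCommand(program, 1));
    }
}
```

Every read costs a substring and a decode, which is the speed-for-memory trade mentioned above; whether it actually saves memory over a list is worth measuring rather than assuming.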

##### Share on other sites

43 minutes ago, Mollymews said:

the encoding method means that a list is not needed.  Can use a string instead of a list

👍🏾😻

##### Share on other sites

The art of the ZX81

1 REM (internal char set interpreted as Z80 possibly with POKEs)

Runs away with nightmares, those were the days eh.

##### Share on other sites

3 minutes ago, sirhc DeSantis said:

The art of the ZX81

1 REM (internal char set interpreted as Z80 possibly with POKEs)

Runs away with nightmares, those were the days eh.

I was pretty sure the ZX series used variants of the 6502..

On Apple 2 series, you entered the mini-assembler with “call -151”.

##### Share on other sites

I'd use integer wherever possible, it's the most compact & lossless data type we have on hand.

##### Share on other sites

42 minutes ago, Kyrah Abattoir said:

I'd use integer wherever possible, it's the most compact & lossless data type we have on hand.

Thanks, I finally found the LSL wiki page on memory use. Interesting that it’s different for locals and globals. List info is given too.