Data Types, “density”, and Size

Love Zhaoying · September 19, 2019

I am working on a script (long term project) that will use data to represent commands. As always, size is a concern. Therefore, I want to represent the commands using a data type that can “pack” a lot of data for the number of bytes required for the data type.

Which do you think stores the most for the least cost? Integer, float, vector, rotation, or string? The data will be stored in a list, which is also a consideration.

Here are some thoughts: Integers could store data bitwise to represent different command types and modes. Floats could too, but of course any precision issues would be a bad thing. If vectors or rotations give you a “bonus factor” of data per # bytes used, that would be a good choice (avoiding the precision issues). I’m trying to get away from strings for this use-case.

Wulfie Reanimator · September 19, 2019

My initial (and uneducated) guess is that integers are going to be the best choice due to size and simplicity. While you might be able to use floats (or sets of floats in the case of vector/rotation), I don't think there's any "extra combinations" you can get out of them compared to integers. (Unless maybe if you're some kind of demigod and are able to design your vector/rotation-based commands so that you can do cross-math on them and still get coherent results.) But of course, if you don't use all 32 bits of an integer, you've wasted the ones you didn't use. It really depends on what your "command" is. Is it just "if X, do Y"? Or does your command include variable data as well?

Assuming command + data -- The simplest solution is to just devote X bits of the integer to some value and just have the command / parameter laid out in segments.

For example, here are the lowest 8 bits of an integer, where the lowest 3 bits select the command, which determines how the other values should be used:

 ┏━━━━━━━━ parameter 2
 ┃   ┏━━━━ parameter 1
 ┃   ┃  ┏━ command
━┻━━ ┻ ━┻━
0000 0 000

You would access the values by doing something like:

// Example input: 151 = binary 1001 0 111
integer input = 151;
integer command = input & 7;          // lowest 3 bits: 7 (111)
integer param1 = (input >> 3) & 1;    // next 1 bit:    0 (0)
integer param2 = (input >> 4) & 15;   // next 4 bits:   9 (1001)

(If you don't understand what's happening there, look up "bit-masking" and the "bitwise AND" (&) operator.)
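Going the other way, a value can be assembled with shifts and bitwise OR. This is only a minimal sketch assuming the same 3/1/4-bit layout as above; the function name packCommand is made up for illustration.

```lsl
// Hypothetical helper: builds the 8-bit layout shown above.
// Bits 0-2 = command, bit 3 = parameter 1, bits 4-7 = parameter 2.
integer packCommand(integer command, integer param1, integer param2)
{
    // Mask each field first so an out-of-range value cannot spill
    // into a neighbouring field.
    return (command & 7) | ((param1 & 1) << 3) | ((param2 & 15) << 4);
}

default
{
    state_entry()
    {
        integer packed = packCommand(7, 0, 9);
        llOwnerSay((string)packed); // 151, i.e. 1001 0 111
    }
}
```

Masking on the way in costs a few extra operations, but it keeps a bad parameter from silently turning one command into another.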

And from there you can use that information however you need. The problem with this is that you're extremely limited in what you can pass as a parameter, even if you have only 1-2 parameters that take up the whole 29-30 leftover bits. You'll almost have to hard-code the values that the parameters represent in the script that receives the commands. The good thing is that, based on the command, you can check for parameters of different lengths. Command 1 could have no parameters, command 2 could have three parameters, and command 7 could have one parameter. It's up to you.
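A rough sketch of letting the command decide the parameter layout could look like this. The three layouts, the field widths, and the function name decode are all assumptions picked for the example, not a recommendation.

```lsl
// Hypothetical layouts, chosen only for illustration:
//   command 1: no parameters
//   command 2: three 8-bit parameters in bits 3-10, 11-18, 19-26
//   command 7: one 29-bit parameter in bits 3-31
decode(integer input)
{
    integer command = input & 7;

    if (command == 1)
    {
        llOwnerSay("command 1, no parameters");
    }
    else if (command == 2)
    {
        integer a = (input >> 3) & 255;
        integer b = (input >> 11) & 255;
        integer c = (input >> 19) & 255;
        llOwnerSay("command 2: " + (string)a + ", " + (string)b + ", " + (string)c);
    }
    else if (command == 7)
    {
        // >> is an arithmetic shift in LSL, so mask off the sign
        // extension to keep the 29-bit value non-negative.
        integer value = (input >> 3) & 0x1FFFFFFF;
        llOwnerSay("command 7: " + (string)value);
    }
}

default
{
    state_entry()
    {
        decode(1);
        decode(2 | (10 << 3) | (20 << 11) | (30 << 19));
        decode(7 | (12345 << 3));
    }
}
```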

P.S. I usually default to "general you" since there's more than two people reading along with us!

Edited September 19, 2019 by Wulfie Reanimator

Love Zhaoying · September 19, 2019

I understand perfectly, programmer 32 years. I am planning to use the “numeric” command representation only for the command itself. The parameters will be secondary fields of appropriate types (string, int, float, etc.)

The idea for vectors / rotations sprang from: if certain commands WOULD have 2 or 3 numeric parameters or a subcommand, the data structure would probably have some savings vs. additional list entries. ‘Cause a list is all we got. Darn it!

Love Zhaoying · September 19, 2019

Bitwise, yes, just like how old assembly language commands and modes (indexed, accumulator, etc.) were encoded by bit.

Wulfie Reanimator · September 19, 2019

37 minutes ago, Love Zhaoying said:

I understand perfectly, programmer 32 years. I am planning to use the “numeric” command representation only for the command itself. The parameters will be secondary fields of appropriate types (string, int, float, etc.)

The idea for vectors / rotations sprang from: if certain commands WOULD have 2 or 3 numeric parameters or a subcommand, the data structure would probably have some savings vs. additional list entries. ‘Cause a list is all we got. Darn it!

Then I have to ask a question: How many commands do you expect there to potentially be?

Besides that, I don't think there's any benefit to using vectors in a list vs floats or integers. (Ref to the Wiki, I assume all values are in bytes but these seem quite high... but they can't be bits either?)

For one, ints and floats are both 15 bytes. Vector is 31 bytes. Putting ints/floats in a list has an additional cost of 7 bytes each, vector gets 46 bytes (1 + 3 * 15). So a list of three floats is 66 bytes and a list of one vector is 77 bytes. That doesn't sound good.

I'm a bit unclear on what "list memory usage" means exactly, but based on what I know about pure C, my interpretation is that:

string   4 + 1 per character

Means 4 bytes (int) is for a pointer to the first character in the string, and each character (char) takes 1 byte. And by that logic...

Edit: Actually I don't know anymore. Why would integer take up 7 bytes then? *big head-scratch*

But I digress.
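Since the wiki numbers are hard to interpret, one option is to measure instead of guessing. The sketch below assumes a Mono-compiled script, where llGetUsedMemory should report a meaningful figure; COUNT and USE_VECTORS are made-up names, and the readings are approximate at best because of garbage collection.

```lsl
// Hypothetical test size; any count large enough to swamp noise works.
integer COUNT = 300;
// Flip to TRUE, then reset the script and run it again to compare.
integer USE_VECTORS = FALSE;

default
{
    state_entry()
    {
        list data;
        integer i;
        integer base = llGetUsedMemory();

        for (i = 0; i < COUNT; ++i)
        {
            if (USE_VECTORS)
            {
                data += [<1.0, 2.0, 3.0>];   // one vector per triple
            }
            else
            {
                data += [1.0, 2.0, 3.0];     // three floats per triple
            }
        }

        llOwnerSay((string)(llGetUsedMemory() - base) + " bytes for "
            + (string)COUNT + " triples");
    }
}
```

Running each variant from a fresh reset avoids comparing against memory that the previous run has not released yet.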

Edited September 19, 2019 by Wulfie Reanimator

Love Zhaoying · September 19, 2019

Let's say... 50 commands.

Aren’t Unicode strings at least 2 bytes per character..?

Wulfie Reanimator · September 19, 2019

30 minutes ago, Love Zhaoying said:

Let's say... 50 commands.

Aren’t Unicode strings at least 2 bytes per character..?

I don't wanna think about it, the wiki page is too vague and who knows how much of it is accurate (or what the specific implementations are).

Back on topic, in that case even the simplest integer from 1 to 50 (without any bitwise wizardry) would give you the most memory savings. It seems to have the smallest memory footprint and the least overhead in a list.

Edited September 19, 2019 by Wulfie Reanimator

Love Zhaoying · September 19, 2019

Example use-case:

First four bits: Command class

0001=Load

0010=Store

0011=Move

Next 3 bits: Register

000=n/a

001=Accumulator

010=X-reg

011=Y-reg

100=Mem location (heap list)

etc.

Next 3 bits: Mode

001=Value

010=index X

011=index Y

100=indirect indexed

101=indexed indirect

Next 3 bits: Type

001=int

010=float

011=string

etc.

Next int following command is value or location to load..
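Roughly, in LSL that layout could be sketched like this. It assumes the "first" bits are the lowest ones (class in bits 0-3, register in 4-6, mode in 7-9, type in 10-12); the constant names and the encode helper are made up for the example.

```lsl
// Field values taken from the lists above.
integer CLASS_LOAD = 1;   // 0001
integer REG_ACC    = 1;   // 001
integer MODE_VALUE = 1;   // 001
integer TYPE_INT   = 1;   // 001

// Hypothetical helper; assumes the command class sits in the lowest bits.
integer encode(integer cls, integer reg, integer mode, integer dataType)
{
    return (cls & 15) | ((reg & 7) << 4) | ((mode & 7) << 7) | ((dataType & 7) << 10);
}

default
{
    state_entry()
    {
        // "Load the accumulator with an integer value", followed by the value.
        list program = [encode(CLASS_LOAD, REG_ACC, MODE_VALUE, TYPE_INT), 42];

        integer op       = llList2Integer(program, 0);
        integer cls      = op & 15;
        integer reg      = (op >> 4) & 7;
        integer mode     = (op >> 7) & 7;
        integer dataType = (op >> 10) & 7;
        integer operand  = llList2Integer(program, 1);

        llOwnerSay(llList2CSV([cls, reg, mode, dataType, operand])); // 1, 1, 1, 1, 42
    }
}
```

All four fields use 13 bits together, so there would still be 19 bits free in the same integer for a subcommand or a small inline operand.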

Mollymews · September 20, 2019

if you are happy to trade speed for memory then you can pack/unpack the bits into a string

example here:

Love Zhaoying · September 21, 2019

10 hours ago, Mollymews said:

if you are happy to trade speed for memory then you can pack/unpack the bits into a string

example here:

Hmm... that seems less efficient memory-wise, as each UTF-16 char is at least 2 bytes. But thank you!

Mollymews · September 21, 2019

1 hour ago, Love Zhaoying said:

Hmm... that seems less efficient memory-wise, as each UTF-16 char is at least 2 bytes. But thank you!

the encoding method means that a list is not needed. Can use a string instead of a list
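To illustrate the string-instead-of-list idea in general terms (this is not the linked example, just one simple way to do it): each integer can be stored as six Base64 characters in one long string. The names packed, appendCommand and commandAt are made up, and whether this actually beats a list on memory would need measuring.

```lsl
// Hypothetical helpers. Each integer becomes 6 Base64 characters
// (llIntegerToBase64 returns 8, of which the last two are "==" padding).
string packed;   // all commands back to back, 6 characters each

appendCommand(integer value)
{
    packed += llGetSubString(llIntegerToBase64(value), 0, 5);
}

integer commandAt(integer index)
{
    integer start = index * 6;
    // Restore the padding before decoding.
    return llBase64ToInteger(llGetSubString(packed, start, start + 5) + "==");
}

default
{
    state_entry()
    {
        appendCommand(151);
        appendCommand(1169);
        llOwnerSay((string)commandAt(1)); // 1169
    }
}
```

Every read costs a substring plus a decode, which is the speed-for-memory trade mentioned above.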

Love Zhaoying · September 21, 2019

43 minutes ago, Mollymews said:

the encoding method means that a list is not needed. Can use a string instead of a list

👍🏾😻

sirhc DeSantis · September 21, 2019

The art of the ZX81

1 REM (internal char set interpreted as Z80 possibly with POKEs)

2 Something about USR

Runs away with nightmares, those were the days eh.

Love Zhaoying · September 21, 2019

3 minutes ago, sirhc DeSantis said:

The art of the ZX81

1 REM (internal char set interpreted as Z80 possibly with POKEs)

2 Something about USR

Runs away with nightmares, those were the days eh.

I was pretty sure the ZX series used variants of the 6502..

On Apple 2 series, you entered the mini-assembler with “call -151”.

Kyrah Abattoir · September 24, 2019

I'd use integer wherever possible, it's the most compact & lossless data type we have on hand.

Love Zhaoying · September 24, 2019

42 minutes ago, Kyrah Abattoir said:

I'd use integer wherever possible, it's the most compact & lossless data type we have on hand.

Thanks, I finally found the LSL wiki page on memory use. Interesting that it's different for locals and globals. List info is given too.
