
LSL Feature Request: LLSIMD


Jabadahut50



Hey there scripters,

I have made a new feature request over on the JIRA for a new LSL function I think could greatly benefit us.

https://jira.secondlife.com/browse/BUG-233810

I'd love to hear y'all's opinions, critiques, ideas on how to improve the feature request, etc.

Regards,

Your friendly neighborhood man you've never met before...

 

*edited because I clearly failed to spell Regards correctly*

Edited by Jabadahut50

How would I need to use this function to speedup this:

c = a + b;

something like:

c = llList2Integer(llSIMDMath(1, [a, b, 0, 0], "+"), 0);

I like my scripts to perform well, but I feel this is not the right way to achieve that goal. Instead, I would improve the LSL compiler in this example to use the SIMD possibilities provided by AWS in the generated bytecode for the + operator.


6 hours ago, Marvin Benelli said:

Instead, I would improve the LSL compiler in this example to use the SIMD possibilities provided by AWS in the generated bytecode for the + operator.

For sake of argument, how do you know the LSL compiler doesn't ~already optimize basic operations?
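One way to probe it empirically (not proof of what the compiler emits, but a rough baseline) is to time a tight loop with the script timer. A minimal sketch; the loop count is arbitrary and llGetTime measures wall-clock time, so the result is only indicative:

default
{
    state_entry()
    {
        integer i;
        integer c;
        llResetTime();
        for (i = 0; i < 100000; ++i)
            c = i + i;                       // 100k scalar integer adds, executed serially
        llOwnerSay("100k adds took " + (string)llGetTime() + " s");
    }
}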


19 hours ago, Quistess Alpha said:

Uhm, name one specific case/algorithm where this function would provide significant improvement.

We already have https://wiki.secondlife.com/wiki/LlListStatistics

llListStatistics is a great function, but as far as I'm aware SL doesn't employ any form of SIMD in LSL, implicitly or explicitly, as this request asks; as such, it's actually a good example of where this could provide an improvement. If you have a list of 100 or so values, llListStatistics ends up executing roughly 100 add operations one after another, whereas with SIMD it would issue far fewer operations, each doing the same total amount of work but much of it simultaneously.
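For concreteness, a minimal sketch (made-up values, meant for inside an event) comparing the built-in reduction with the hand-rolled loop it spares us:

list values = [3.0, 1.5, 4.0, 1.5, 9.0];                  // stand-in data
float total = llListStatistics(LIST_STAT_SUM, values);    // the built-in reduction

float sum;                                                // the equivalent scalar loop: one add per element
integer i;
integer n = llGetListLength(values);
for (i = 0; i < n; ++i)
    sum += llList2Float(values, i);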

15 hours ago, Coffee Pancake said:

That's my question too .. If you're doing the kind of computation that would benefit from this, you probably shouldn't be doing it in a prim.

Why am I getting crypto vibes...

14 hours ago, Quistess Alpha said:

I hadn't even considered how easy it might be to abuse SL for a cryptomining scheme. . .

Is that why breedables are so awful? :P

I hadn't considered crypto in SL either. Anyone trying to create some form of cryptocurrency inside of Second Life would, I think, likely be violating some TOS rule. 🤷‍♂️

This function could benefit cryptographic communication scripts by speeding up hashing that relies on bitwise XOR, but other than that I don't see any value, social or monetary, in anything "crypto" related in SL.
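As a rough picture of that kind of workload, a minimal sketch of XOR-folding a list of integers one element at a time, the sort of per-element loop a batched bitwise operation would collapse (names and usage are illustrative):

// fold a list of integers into one value with repeated XOR
integer xorFold(list data)
{
    integer acc;
    integer i;
    integer n = llGetListLength(data);
    for (i = 0; i < n; ++i)
        acc = acc ^ llList2Integer(data, i);
    return acc;
}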

9 hours ago, Love Zhaoying said:

Maybe it's for matrices?
Create Matrix in SL..

It would be interesting for SL to add proper 2D (or even 3D) arrays, but I think they're pretty set on sticking with the data types we have.
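The usual workaround is to emulate a matrix with a flat (strided) list and index arithmetic; a minimal sketch, assuming a fixed column count and float cells:

integer COLS = 4;   // assumed column count

// cell (r, c) of a rows x COLS matrix lives at flat index r*COLS + c
float matGet(list m, integer r, integer c)
{
    return llList2Float(m, r * COLS + c);
}

list matSet(list m, integer r, integer c, float v)
{
    integer i = r * COLS + c;
    return llListReplaceList(m, [v], i, i);
}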

7 hours ago, Marvin Benelli said:

How would I need to use this function to speedup this:

c = a + b;

something like:

c = llList2Integer(llSIMDMath(1, [a, b, 0, 0], "+"), 0);

I like my scripts to perform well, but I feel this is not the right way to achieve that goal. Instead, I would improve the LSL compiler in this example to use the SIMD possibilities provided by AWS in the generated bytecode for the + operator.

It would not benefit that one. The proposed function is for speeding up bulk math/logic by applying the same operation to 2, 3, or 4 pairs of operands at a time.

You would want to assign the output of llSIMD to a list variable first (either appending or prepending it) and then, from there, move it into individual variables if a list variable is not preferred.
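Put together, a hypothetical usage sketch based on the snippet earlier in the thread; llSIMDMath is the proposed (not existing) function, and its exact signature and operand packing are assumptions:

integer a = 7;
integer b = 35;
list packed = llSIMDMath(1, [a, b, 0, 0], "+");   // hypothetical call: one packed add, results returned as a list
integer c = llList2Integer(packed, 0);            // then unpack whichever lane(s) you need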

I do somewhat agree with your second point: yes, implicit SIMD would be nice, but unless they were very clear about how it's implemented, it would be hard for us to know how to take advantage of it. Giving an explicit method, as proposed here, makes it much easier to know exactly when you are benefiting.

41 minutes ago, Quistess Alpha said:

For sake of argument, how do you know the LSL compiler doesn't ~already optimize basic operations?

They likely do optimize basic operations to some degree. However, I could not find anything about SIMD; it's certainly possible they use it, but I could not find any documentation about it anywhere.


4 minutes ago, Love Zhaoying said:

When I read the request, my first thought was - "only one operation"? I'd think a series of operations (+, -, etc.) applied to each result for the next calculation/sequence would be more useful. 

It would, but this is a fundamental limitation of how SIMD (the CPU architecture feature this function aims to utilize) works: it packs multiple data points into wider registers and then runs a single instruction across all of them at once.

Edited by Jabadahut50
edited for clarification

12 minutes ago, Jabadahut50 said:

It would, but this is a fundamental limitation of how SIMD (the CPU architecture feature this function aims to utilize) works: it packs multiple data points into wider registers and then runs a single instruction across all of them at once.

It's what I figured.  Sorry, but I would not vote for your change, as I do not see a suitable use-case for a string of calculations as you describe (all using the same operator).


2 minutes ago, Love Zhaoying said:

It's what I figured.  Sorry, but I would not vote for your change, as I do not see a suitable use-case for a string of calculations as you describe (all using the same operator).

a few examples off the top of my head:

checking multiple binary flags at once with bitwise instructions (sketched below)

large math sums of long lists

speeding up bitwise XOR hashing

To each their own, but I feel that anywhere you can break dependency chains and need to do the same thing to several different pieces of data at once, it can provide quite a boost. Admittedly, many coders in other development environments consider it a form of late-stage optimization, but if you code with it in mind it can provide significant performance gains where it is useful. There are plenty of built-in LSL functions that do niche things but, when used properly, provide increased performance.
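To make the first example above concrete, here is what several independent flag tests look like today, each one its own bitwise op; a batched bitwise call would do these in one go. A minimal sketch, names and values made up:

integer FLAG_ARMED   = 0x1;
integer FLAG_LOCKED  = 0x2;
integer FLAG_VISIBLE = 0x4;

checkFlags(integer flags)
{
    integer armed   = (flags & FLAG_ARMED) != 0;
    integer locked  = (flags & FLAG_LOCKED) != 0;
    integer visible = (flags & FLAG_VISIBLE) != 0;
    llOwnerSay(llList2CSV([armed, locked, visible]));
}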


Reading a bit about SIMD, it does seem useful for cases where you need lots of computation that can be done in parallel. My Mandelbrot renderer could use some speed. 😋 My (very old) multi-legged walker could probably solve each leg at the same time... maybe not.

If anything, llListStatistics could be updated with new options... it can't even subtract! Whether they specifically use SIMD or something else on the CPU doesn't matter, as long as it's faster than LSL and produces the documented outcome.
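For reference, llListStatistics already covers reductions over a single list (LIST_STAT_SUM, LIST_STAT_MEAN, and so on); what it cannot do is element-wise work between two lists, such as a subtraction, which today still needs a hand-rolled loop. A minimal sketch, names illustrative:

// element-wise a - b; llListStatistics has no equivalent for this today
list listSub(list a, list b)
{
    list out;
    integer n = llGetListLength(a);
    integer i;
    for (i = 0; i < n; ++i)
        out += llList2Float(a, i) - llList2Float(b, i);
    return out;
}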

Edited by Wulfie Reanimator

18 minutes ago, Wulfie Reanimator said:

Whether they specifically use SIMD or something else on the CPU doesn't matter, as long as it's faster than LSL and produces the documented outcome.

^^ Agreed. As a general 'LSL principle', I'd rather have functions that provide some "general and obvious use" rather than something with a known specific implementation.

I could see something like llStridedListStatistics maybe being nice: do all the operations, and also be able to apply them to groups of n entries, either within each group or between groups.

Ex.

list result  = llStridedListStatistics(LIST_STAT_SUM,2,FALSE,[1,2,3,4,5,6]); // expect [1+3+5,2+4+6];
list result2 = llStridedListStatistics(LIST_STAT_SUM,2,TRUE,[1,2,3,4,5,6]); // expect [1+2,3+4,5+6];

and leave specific implementation to the devs.
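For reference, a plain-LSL approximation of the proposed semantics, for LIST_STAT_SUM only; llStridedListStatistics itself does not exist, and the function and parameter names here are illustrative:

// within == TRUE sums each group of `stride` entries; FALSE sums each "column" across groups
list stridedSum(integer stride, integer within, list src)
{
    list result;
    integer n = llGetListLength(src);
    integer i;
    if (within)
    {
        for (i = 0; i < n; i += stride)
            result += llListStatistics(LIST_STAT_SUM, llList2List(src, i, i + stride - 1));
    }
    else
    {
        for (i = 0; i < stride; ++i)
            result += llListStatistics(LIST_STAT_SUM, llList2ListStrided(llList2List(src, i, -1), 0, -1, stride));
    }
    return result;
}

// stridedSum(2, FALSE, [1,2,3,4,5,6]) -> [9.0, 12.0]        (i.e. [1+3+5, 2+4+6])
// stridedSum(2, TRUE,  [1,2,3,4,5,6]) -> [3.0, 7.0, 11.0]   (i.e. [1+2, 3+4, 5+6])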


27 minutes ago, Quistess Alpha said:

I could see something like llStridedListStatistics maybe being nice: do all the operations, and also be able to apply them to groups of n entries … and leave specific implementation to the devs.

This is a much more elegant solution than my suggestion, actually. A strided-list function with known SIMD optimizations under the hood (as long as we can get all the operations and not JUST those mentioned in llListStatistics) would better fit the overall nature of the LSL language while still getting the result I was looking for with my suggestion.


41 minutes ago, Quistess Alpha said:

I could see something like llStridedListStatistics … and leave specific implementation to the devs.

I gotta head to bed (3rd shifter and gotta work tonight) but do you mind if I rewrite my jira submission more around your suggestion (crediting you of course) or would you rather make the jira suggestion?


1 minute ago, Jabadahut50 said:

I gotta head to bed (3rd shifter and gotta work tonight) but do you mind if I rewrite my jira submission more around your suggestion (crediting you of course) or would you rather make the jira suggestion?

you can edit yours or something, I don't particularly mind, or care much about accreditation.


As I mentioned in the JIRA .. without an actual specific use case, this is dead.

We're not going to get new scripting functions because they might offer cool but only vaguely useful performance gains, if there are no specific examples that need those gains to function (and even then, a specific function to accelerate specific use cases is more likely to succeed).

  • speeding up compression and decompression algorithms (common algorithms in the 64 KB-limited environment of LSL)

Which we shouldn't be doing. We would be universally better served by pushing for bigger scripts.

  • speeding up physics calculations (for attachments, or in cases where the Havok physics system does not provide the desired behavior)

Avatars are a client-side phenomenon; as such, it is impossible to do anything server-side that might imply physicality. It will never, ever, match the observed avatar state.

If you wish to simulate physics independent of Havok using object updates, it doesn't matter how fast the math is. The object update part makes it garbage, which is why almost nothing tries to pull this off.

  • speeding up multiple bitwise flag checks

Kind of an edge case .. the overwhelming majority of scripts are wasting entire integers to store booleans.

  • speeding up bulk initialization of starting values

Eeehhhh .. 

  • speeding up any general bulk math or logic that does not have a dependency chain

for what use case ... 


Bleuhazenfurfle, in the JIRA comments, points out the FATAL flaw in any desire to invoke SIMD instructions in LSL. LSL does not support arrays. The overhead of unpacking one SIMD register's worth of data from a list, then repacking the result, will dwarf any advantage of the SIMD instruction. The true value of SIMD instructions comes from their use in loops that walk through arrays of data already in SIMD format.

This request is DOA.


I'm not so sure it's DOA…  As I said in my Jira response, it still beats the pants off doing it in a loop in LSL.  And as was pointed out above, SIMD maybe could still come into play, possibly, but it's certainly not going to be more than an implementation detail.

The list thing, though, that is a problem.  Of which I had two-ish minds:

  1. Adding a list-outputting version of llListStatistics (llListMathematics?), taking two input lists, and an operator, and producing one output list.  Not going to get much efficiency out of it, but, still much better than a loop in LSL — after all, llListStatistics doesn't provide anything we can't already do in LSL either, but I'm frequently glad it's there.
  2. Also, llListMathematicsStrided (and perhaps a matching llListStatisticsStrided) for doing the same on a strided list — the major benefit being it could potentially handle both horizontal and vertical striding (negative stride length indicating vertical rather than horizontal, perhaps) across multi-argument operations (e.g. sum groups of 5 numbers, average over groups, find the maximum among each group, etc.).  The major disadvantage, of course, being you'll usually have to compose that strided list for each individual operation — hence, allowing a choice of stride direction.  Even better if we can provide a separate list of the striding (similar to the specifiers list for llJsonGetValue), allowing it to pluck the numbers out of an existing strided list containing other data (hopefully, often entirely avoiding the need to recompose a new strided list).
  3. An entire RPN style bulk calculator with internal register heap, ala llGetPrimitiveParams.  This form would have a fighting chance of getting the efficiency, but I doubt it'd ever fly because it could REALLY rack up memory something shocking — UNLESS…

The memory horrors of the RPN calculator style could be mitigated by trading memory for time instead; essentially, the distinct operations build a stack of iterators, which only at the end then gets evaluated and the result returned as a list.  It's going to be about as far away from SIMD efficiency as you could possibly get, but, probably STILL better than a loop in LSL.  (Seriously, have you ever timed those things…?!?)  As reference, D's use of ranges, and pretty much every functional language ever, behaves much like this.  (Along with the benefit of a decent optimising compiler, but still…)

The problem is, it lends itself to the same objection that was floating around for why we didn't get regex until recently — this one command can burn a frightful amount of time.  Since regex is a thing in LSL now (at least minimally, with interest in more), I presume either they've figured out how to make it suspendable if it overruns the script's timeslot, or they figure the worst case isn't too bad and it should be safe enough.  In the former case, this might still have a chance.  In the latter case, keeping it strictly stack-based might be sufficient.

Another thing that would go a long way toward making this more sane is a mixing function to construct a strided list from two source lists, or from the concatenation of two or more source lists.  Of some amusement, my idea for the RPN calculator function could actually fill that role, also.
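A sketch of what such a mixer could look like in plain LSL, interleaving two equal-length lists into a 2-strided list while keeping each entry's original type; the name is illustrative:

list mix2(list a, list b)
{
    list out;
    integer n = llGetListLength(a);
    integer i;
    for (i = 0; i < n; ++i)
        out += llList2List(a, i, i) + llList2List(b, i, i);
    return out;
}

// mix2([1, 2, 3], ["x", "y", "z"]) -> [1, "x", 2, "y", 3, "z"]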

 

Edited by Bleuhazenfurfle

1 hour ago, Bleuhazenfurfle said:

I'm not so sure it's DOA…  […]  An entire RPN style bulk calculator with internal register heap, ala llGetPrimitiveParams.  This form would have a fighting chance of getting the efficiency …

Having recently written PEMDAS evaluator code, I like the idea of an RPN calculator.


Wulfie triggered me with his remark on parallel computations for fractals. I remember doing that (a long time ago) on a specialized image processor that had "vector graphics", meaning you could apply an operation to a vector of pixels. The typical application was to stuff an entire screen row into a vector, do the fractal calculation on the vector, and use the resulting color vector to paint the row on screen.

For applying this in SL, I am thinking of updating scripts in multiple instances of a product (e.g. a vendor, a TP portal, a tip jar, a group inviter) within a sim. Nowadays you would have to identify all objects to be updated and then loop over the identified set, updating each individually. That could be done in parallel.

So maybe:

list x;
integer op;
list y;

list z = llListProcess(x, op, y);

could be useful, which would be functionally identical to (but executed in parallel):

integer i;
z = [];
for (i = 0; i < llGetListLength(x); i++)
    z += [ op(llList2Integer(x, i), llList2Integer(y, i)) ];   // pseudocode: op stands for the chosen operation

I still believe we should leave it to the LSL compiler to decide when and where to use SIMD instructions, just as we don't give it hints about which (graphics) processor instructions to use.

I also agree with Coffee that practical use cases should come first.

Edited by Marvin Benelli

2 hours ago, Marvin Benelli said:

list z = llListProcess(x,op,y)

could be useful, which would be functionally identical to (but executed in parallel)

That's basically what I was calling llListMathematics, to go with llListStatistics, though I'd put the op first, personally.  I would suggest one small change: if one list is shorter than the other, then cycle its items in a ring.  i.e. (obviously not LSL):

integer Al = length(A), Bl = length(B), i = max(Al, Bl);
list Z[i]; while (i--) Z[i] = op(A[i % Al], B[i % Bl]);

This gives the effect of the SIMD N:1 form, along with some tricks like the sign inversions common in matrix math, without having to feed in an entire matrix worth of 1's and -1's.
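A plain-LSL sketch of that ring rule for multiplication (name illustrative); feeding a cycled [1.0, -1.0] as the second list flips the sign of every other entry without spelling out a full list of 1's and -1's:

list cyclicMul(list a, list b)
{
    integer al = llGetListLength(a);
    integer bl = llGetListLength(b);
    integer n = al;
    if (bl > n) n = bl;
    list out;
    integer i;
    for (i = 0; i < n; ++i)
        out += llList2Float(a, i % al) * llList2Float(b, i % bl);
    return out;
}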

8 hours ago, Love Zhaoying said:

Having recently written PEMDAS evaluator code, I like the idea of an RPN calculator.

When I was in school, we used to call it BODMAS…  And this "new math" business is just what new people learn in school, while the grown-ups' "old math" is just that thing more commonly referred to as "algebra".  Here's the secret though; ÷ and × work just the same in algebra as they do in "new math" — I guess people just think "new math" makes them sound smarter than "kids math".  Aaaanyhow…  Since you bring it up, a quick summary of what I came up with:

True RPN has a couple issues; it looks ugly, it's unfamiliar, it's complicated with its own set of operations just to manage the stack, it's hard to keep track of everything, it's virtually unreadable, and generally just a PITA.  Also, it doesn't work given the data types we have; do you REALLY want to have to use strings for your operators?!?  But if you use integers, then how do you tell them apart from the values?  And what if you want to let it work with strings, vectors, and rotations too? (You can add strings, at least.)  So right off, we swap the parameters around to resemble llGetPrimitiveParams rules.  We now have something that looks much better — but you still actually can't read it.  Which is why I mentioned registers before.  I ended up augmenting the stack with named registers, with the "null register" (an empty string) being the traditional RPN stack, and every instruction got a destination register, so it ends up looking more like Assembly language.  And if you have an empty string variable named STACK, and a column index where appropriate (1-based, with negative indexes forming a horizontal slice instead of a vertical one), you get this:

RPN_INPUT, /*result table*/ "input", /*stride*/ 2, /*length*/ 18, /*data*/ "A",5,"B",7,"C",9,"D",3,"E",1,"F",4,"G",2,"H",6,"I",8,
RPN_REGISTER, /*result table*/ STACK, /*source column*/ "input",-2, // need to take a horizontal slice (one row with the 9 scores)
RPN_GEOMETRIC_MEAN, /*result column*/ STACK,0, /*source table*/ "input", // horizontally collapses the entire table (one row, so one value)
RPN_GREATER_THAN, /*result column*/ STACK,0, /*lhs column*/ "input",2, /*rhs column*/ STACK,1, // single-row rhs is repeated cyclically
RPN_FILTER, /*result table*/ STACK, /*source table*/ "input", /*selection column*/ STACK,1 // filters entire table rows, output left on stack

So that's what I ended up with before I put the idea aside.  It's still RPN-esque, still has a stack so you don't have to name EVERYTHING, but avoids the worst of the stack fiddling (never need DUP or SWAP or the rest, because you just use named registers for the annoying ones), and you can skip the stack entirely with a few creatively reusable names like "A", "B", "C"…

Of interest, GEOMETRIC_MEAN (also ADD, and similar) horizontally collapses an entire table.  Since we want to process a column of numbers instead of a table of rows of numbers, the REGISTER op allows us to "slice" the values we want out of the source data, choosing whether to leave them as a column (positive index) or form them into a row instead (negative index, which is also why it doesn't get a destination column index).  And functions that return a single column can write that column back into an existing table, or replace it entirely by specifying column 0 (in the case of STACK, it pushes a new table), which also makes my OCD happier about the 1-based indexing.

These basic operations were also kind of interesting, since they horizontally collapse tables into a single column, but sometimes you want them to work on the columns of several distinct tables.  I had a table builder function (basically, a mass llList2List/llListReplaceList combo taking a range of source table columns, and using them to replace a range of destination table columns), and another one that slurps up the top N entries off the stack (N=0 to just take them all) and glues them into a single table.  But using either (possibly several times) before each of those operations seemed kind of icky — though it definitely seems the most flexible option, while being syntactically clean.

For example, [RPN_STACK_BUILDER, 3, RPN_ADD] would take the top three entries off the stack, glue them together into one big table, and then collapse it horizontally by summing all the values in each row.  I guess it even actually kind of looks more RPNish like that (this would be horrible if it was building the new table just to be consumed, but using the iterator stacking method I don't think it actually makes a practical difference).

All in all, I was mostly rather happy with the result.  Still don't think it'd ever become a thing, though.  (Also, I rather like my name, too.  :P )

Edited by Bleuhazenfurfle

I was never into RPN that much, but that's of course subjective.

For the operation parameter we could use the function ID of the LSL function, and then a list for the operands. We could then do things like:

x = llProcessList(LOADSCRIPTPIN,[gTargets, script, pin, running, start]) which would update all objects in the list gTargets.

x would be [] because llRemoteLoadScriptPin does not return a value.

Similarly, for sending messages to other objects within a sim using llRegionSayTo:

x = llProcessList(REGIONSAYTO,gTargets,[channel, message]);
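For comparison, this is the loop such a call would replace today, with llRegionSayTo issued once per target; gTargets, the channel, and the message are placeholders:

list gTargets;               // keys of the objects to message
integer gChannel = -777123;  // arbitrary example channel

tellAll(string message)
{
    integer i;
    integer n = llGetListLength(gTargets);
    for (i = 0; i < n; ++i)
        llRegionSayTo(llList2Key(gTargets, i), gChannel, message);
}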

 

