Parent/Child object Script Communications

Quote

As part of our ongoing efforts to improve script performance, we recently made changes to how scripts are scheduled and events are delivered. [...] We have now made further improvements that should prevent most of those problems, but even with those fixes there will be some changes in the timing and order of how scripts are run.

Can you go into any specific details about those changes? (Talk techy to me, LL.)

Or at least, what can we as scripters expect that is different from before, or is this just a performance increase?

Why did content break, or was it only because of those already existing race-conditions?

Edited by Wulfie Reanimator

20 minutes ago, Wulfie Reanimator said:

Can you go into any specific details about those changes? (Talk techy to me, LL.)

Or at least, what can we as scripters expect that is different from before, or is this just a performance increase?

Why did content break, or was it only because of those already existing race-conditions?

I'm not going to go into details about what changed beyond saying that some events (notably chat listens) were being collected and distributed once every script execution for every script. They are now queued immediately to the subscribed script(s), removing some work from the scheduler.

The window between the object_rez event and on_rez() has become more variable, and the delivery of chat messages to a channel has become faster. So, scripters that already use a handshake as recommended in the blog post will not notice any difference and their scripts will continue to work. However, if the rezzers are relying on a timeout to determine when their rezzed object is ready, those scripts may break.

on_rez() in multiple scripts should all occur within a server frame or two of each other (depending on the script load of the server); however, the order in which they occur remains arbitrary.
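
For readers who have not seen the handshake pattern, it could look roughly like the pair of scripts below. This is only a sketch under stated assumptions, not the blog post's exact code: the channel number, the "READY" and "CONFIG:hello" strings, and the choice to pass the channel as the rez parameter are all made up for illustration. The rezzer listens before it rezzes anything, and it sends data only after the rezzee has announced that its own listen is open.

```lsl
// Rezzer side (sketch). HANDSHAKE_CHANNEL and the message strings are
// assumptions chosen for illustration.
integer HANDSHAKE_CHANNEL = -48213;

default
{
    state_entry()
    {
        // Open the listen before rezzing so "READY" cannot be missed.
        llListen(HANDSHAKE_CHANNEL, "", NULL_KEY, "READY");
    }

    touch_start(integer n)
    {
        // Pass the channel to the rezzee as its start parameter.
        llRezObject(llGetInventoryName(INVENTORY_OBJECT, 0),
            llGetPos() + <0.0, 0.0, 1.0>, ZERO_VECTOR, ZERO_ROTATION,
            HANDSHAKE_CHANNEL);
    }

    listen(integer channel, string name, key id, string message)
    {
        // Only answer objects that this object rezzed.
        key rezzer = llList2Key(llGetObjectDetails(id, [OBJECT_REZZER_KEY]), 0);
        if (rezzer != llGetKey())
            return;
        llRegionSayTo(id, channel, "CONFIG:hello");
    }
}
```

```lsl
// Rezzee side (sketch). It announces readiness only after its listen is open.
integer listenHandle;

default
{
    on_rez(integer start_param)
    {
        if (start_param == 0)
            return; // rezzed by hand from inventory, no rezzer to talk to

        key rezzer = llList2Key(llGetObjectDetails(llGetKey(), [OBJECT_REZZER_KEY]), 0);
        listenHandle = llListen(start_param, "", rezzer, "");
        llRegionSayTo(rezzer, start_param, "READY");
    }

    listen(integer channel, string name, key id, string message)
    {
        // The listen is already filtered to the rezzer's key.
        llListenRemove(listenHandle);
        llOwnerSay("Received from rezzer: " + message);
    }
}
```

Neither script depends on how long the rezzee takes to start running, which is the point of using a handshake instead of a timeout.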


A nice solution would be to have the API pass a string, instead of just an integer, when rezzing something. Then all the info the rezzed object needs to start could be packed into a string, using the CSV or JSON functions. No race conditions and no handshaking.
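
The packing half of this idea can already be done with the existing JSON functions; only the delivery through the rez call is missing. A minimal sketch, with made-up field names ("color", "size", "label") that are purely illustrative:

```lsl
default
{
    state_entry()
    {
        // Pack: a flat list of key/value pairs becomes one JSON object string.
        string packed = llList2Json(JSON_OBJECT,
            ["color", "<1,0,0>", "size", 2.5, "label", "crate"]);

        // Unpack: each field is read back by name and cast to its type.
        vector color = (vector)llJsonGetValue(packed, ["color"]);
        float size = (float)llJsonGetValue(packed, ["size"]);
        string label = llJsonGetValue(packed, ["label"]);

        llOwnerSay("packed: " + packed);
        llOwnerSay("label=" + label + " size=" + (string)size
            + " color=" + (string)color);
    }
}
```

Today a string like this would still have to reach the rezzed object by chat or some other side channel, which is where the race conditions come in.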


I've seen quite some failures over time and never trusted undocumented timings or that things happen in an expected order.
So I'm not affected here.
Having a string as rez parameter would simplify rezzing quite a bit though.

10 hours ago, animats said:

A nice solution would be to have the API pass a string, instead of just an integer, when rezzing something. Then all the info the rezzed object needs to start could be packed into a string, using the CSV or JSON functions. No race conditions and no handshaking.

agree that being able to pass a string would be useful for the one-way case

altho Rider's case is about the two-way, the rezzer validating the presence of the rezzee before doing stuff (like Inventory Drop) on the rezzee

in the one way case when we need a bit more than an integer's worth of data, then currently a way is for rezzee to read the rezzer's object description field

45 minutes ago, Mollymews said:

... currently a way is for rezzee to read the rezzer's object description field

... tempting scripters to invent their very own race conditions

2 hours ago, Qie Niangao said:

... tempting scripters to invent their very own race conditions

can be yes, depending on how is done

the main issue is when a rezzer is rezzing objects on a fast timer, where it is possible for a unique message intended for rez_object_1 to be replaced by a unique message for rez_object_2 before rez_object_1 retrieves its own message. We can know when this happens by using a unique identifier for each rez_object instance passed as the rez parameter. Example:

// rezzer

rez_id++;
string msg = (string)rez_id + ":Some message for this object";
llSetObjectDesc(msg);
llRezObject(... rez_id);


// rezzee

default
{
   on_rez(integer param)
   {
      // find the rezzer, then read and split its description field
      key parent = llList2Key(llGetObjectDetails(llGetKey(), [OBJECT_REZZER_KEY]), 0);
      list desc = llParseString2List(llList2String(llGetObjectDetails(parent, [OBJECT_DESC]), 0), [":"], []);
      if (param == (integer)llList2String(desc, 0))
      {  // the id in the description matches my rez parameter
         string msg_for_me = llList2String(desc, 1);
      }
      else
      {  // is not for me
         // ... do fallback position: open a channel to rezzer ...
      }
   }
}

 


@animats & @Nova Convair

"Let's pass a string instead" is kind of out of the scope for this topic/update, though.

Besides, it's an impossible request in its current form because changing or adding a string to on_rez would (almost, if adding) definitely be a breaking change.

Edited by Wulfie Reanimator
12 hours ago, Wulfie Reanimator said:

Besides, it's an impossible request in its current form because changing or adding a string to on_rez would (almost, if adding) definitely be a breaking change.

Changing parameters on an already published API is not really a possibility.  It would absolutely be a breaking change.

16 minutes ago, Rider Linden said:

Changing parameters on an already published API is not really a possibility.  It would absolutely be a breaking change.

Right. SL doesn't have the Microsoft tradition of adding a new API endpoint with a suffix, such as GetFileAttribute and GetFileAttributeEx, for a breaking change that has to be backwards compatible. That's one way out of that problem. It's not worth the hassle for this, though.

9 hours ago, Rider Linden said:

Changing parameters on an already published API is not really a possibility.  It would absolutely be a breaking change.

 

9 hours ago, animats said:

Right. SL doesn't have the Microsoft tradition of adding a new API endpoint with a suffix, such as GetFileAttribute and GetFileAttributeEx, for a breaking change that has to be backwards compatible. That's one way out of that problem. It's not worth the hassle for this, though.

I don't really follow this, but as far as LSL and rezzing is concerned, adding another parameter isn't rocket science.

Rezzer key is the most recent example of data added that never existed before.

It had to be added to prims as a detail, it had to be added to the rez queue to be passed on.

To facilitate a "start_string" param, you would need a new rez function that accepts the input, a change in the rez queue handler to accept the new data to pass on.

In the rezzed object, you would need a function similar to llGetStartParameter(), say llGetStartString().

Is it worth it making this kind of change? Most likely not as, depending on data size, current methods like checking rezzer desc, using chat or using KVP are good enough.


Asynchronous tasks. You have to write them both assuming that you know nothing about the state of the other task that it hasn't explicitly told you.

I'd lay odds that just about every beginning coder has written themselves at least one race condition before these rules got properly wired into their heads and got burned by it.

We've a lot of gifted and creative folks in SL who are not professional coders, who did not come to LSL with the mental bruises to teach them why "this is always the way you do it, even if it looks like a simpler way will work." There's bound to be a lot of this out there on the grid, and it is the nature of race conditions that they are usually exposed as performance improves.

That these LSL coders are in this position of having their scripts break under a performance improvement is not LL's fault, but nor would it be fair to say that it is entirely the coders' fault either. This is their encounter with that painful lesson that almost all more experienced coders have had in their own history - it is part of the "experience" that causes a less experienced coder to become a more experienced one.

It's like triggering an animation when somebody sits on the object "knowing" that animation perms are automatically granted by a seated avatar instead of requesting them and waiting for a run_time_permissions event. Exactly the same race condition.
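
A minimal sketch of the request-then-wait version, assuming a single-prim seat; the sit target offset and the built-in "dance1" animation are arbitrary choices for illustration:

```lsl
default
{
    state_entry()
    {
        // A non-zero sit target is needed for llAvatarOnSitTarget() to work.
        llSitTarget(<0.0, 0.0, 0.5>, ZERO_ROTATION);
    }

    changed(integer change)
    {
        if (change & CHANGED_LINK)
        {
            key sitter = llAvatarOnSitTarget();
            if (sitter != NULL_KEY)
            {
                // Ask, and do nothing else until the answer arrives.
                llRequestPermissions(sitter, PERMISSION_TRIGGER_ANIMATION);
            }
        }
    }

    run_time_permissions(integer perm)
    {
        // Animate only once the permission has actually been granted.
        if (perm & PERMISSION_TRIGGER_ANIMATION)
        {
            llStopAnimation("sit");
            llStartAnimation("dance1");
        }
    }
}
```

The animation calls live only in run_time_permissions, so the script never assumes the grant has already happened by the time changed fires.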

 

Ultimately, since the performance improvements which expose these race conditions are now critical, I don't see that LL will have any choice but to publicize this as widely as possible for a short time and then bite the bullet and roll the update, letting the chips fall where they may for scripts that subsequently reveal themselves to have been mis-coded. Sadly, this means that where an object's scripter is no longer around or no longer supporting the object, if it contains this kind of error in its scripts that object will likely be junk. We've been there before. It wasn't pretty then and it won't be pretty now, but for all the inactive scripters whose legacy objects fall into obsolescence there will be others who are active and will create objects to fill the void, coding them to avoid this pitfall.


Is there a chance that the update here is making it so that some scripts don't execute at all, or are delayed indefinitely?  I've noticed a bunch of RLV things breaking on Dolly Dreams, which gets fixed with a region restart.  But when things stop working, it's like script roulette.  To be clear, these are scripts without a race condition, but simple things like a menu not popping up when requested by a link message spawned from a touch event, as if the script were not executing at all.  Are you sure this is working as intended, and every script is subscribed to the appropriate listens?


Link messages seem to be completely reliable unless you overflow the event queue for a script. The queue for each script has a capacity of 64 messages. Exceed that and you lose link messages and other events, with the newest events being lost. I've tested this, and it really is exactly 64 messages.

Remember that link messages are broadcasts - every script in the prim gets all link messages sent to that prim. So mixing high-traffic messages to fast scripts with low-traffic messages to slow scripts in the same prim can cause trouble. If you delay too long in a script, with llSleep or many calls to built-ins with long delays, you can lose messages. Also, every script that accepts link messages must always have enough free memory for the biggest message sent to its prim by any script, or you will get a stack/heap collision.

Within those constraints, all problems I've had with link messages have been from problems in my own code. I'm running an elaborate NPC system with heavy message traffic which will time out and send error messages if it loses a link message, and it's not reporting such errors.
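
For anyone who wants to repeat the overflow test, a pair of scripts along these lines should do it when both are placed in the same prim. This is a sketch, not the exact test I ran; TOTAL, the "test" string and the sleep length are arbitrary assumptions.

```lsl
// Sender (sketch): fires more link messages than the queue is said to hold.
integer TOTAL = 100; // assumption: any number well above the queue capacity

default
{
    touch_start(integer n)
    {
        integer i;
        for (i = 0; i < TOTAL; ++i)
            llMessageLinked(LINK_THIS, i, "test", NULL_KEY);
        llOwnerSay("Sent " + (string)TOTAL + " link messages.");
    }
}
```

```lsl
// Receiver (sketch): stalls on the first message so the rest pile up,
// then reports how many it actually got.
integer received;

default
{
    link_message(integer sender, integer num, string str, key id)
    {
        if (str != "test")
            return;
        if (received == 0)
        {
            llSleep(5.0);         // block while the sender keeps sending
            llSetTimerEvent(5.0); // report after the backlog has drained
        }
        ++received;
    }

    timer()
    {
        llSetTimerEvent(0.0);
        llOwnerSay("Received " + (string)received + " link messages.");
        received = 0;
    }
}
```

The gap between the sent and received counts shows how many events were dropped while the receiver was asleep.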

I'm not sure about listens. They have throttling rules and a filtering system. Anyone know for sure?

1 hour ago, Sarah Passerine said:

Is there a chance that the update here is making it so that some scripts don't execute at all, or are delayed indefinitely? 

Seems unlikely. The region you mention is currently pretty heavily script-loaded (6700 scripts with nearly 800 events per second) so of course there's no Spare Time at all and only about 2/3 of scripts run each frame -- and that's with only a few avatars in the sim. It's not the worst ever, but finding ways to shed some of those scripts would probably help. (In recent months we've seen some general flakiness about how efficiently sims run scripts from one restart to another, seemingly at random, but this one seems to be dealing about as well as can be expected with this volume of scripts, and there's not much except scripts competing for frame time.)

Still, it doesn't seem bad enough to cause scripts to be dropping queued events. The update was supposed to include some efficiency improvements in listen (as well as sensor) events so it's possible some timings have changed.

(I notice this specific problem is reported in a jira in case anybody wants to watch that.)

I haven't seen any other recent reports of listen or link_message bugs, but anything is possible. There is this recent jira related to RLV, but it may have been the same race condition, arising in communications with script-rezzed attachments. That also seems to be the case with this jira. And then there is this recent report of script failures in a particular product (not RLV related and apparently not the rezzed listen race condition but seemingly fixed by the same cure).

3 hours ago, Qie Niangao said:

The region you mention is currently pretty heavily script-loaded (6700 scripts with nearly 800 events per second) so of course there's no Spare Time at all and only about 2/3 of scripts run each frame -- and that's with only a few avatars in the sim.

Yes, it's a heavily scripted sim, and that's why it might be a good test bed to show where the new way of handling scripts is not working.  So far, this particular issue has caused my product updated to break nocopy items owned by others due to scripts failing to communicate, and caused general uncertainty with all sorts of gadgets, both worn and rezzed.  

12 minutes ago, Sarah Passerine said:

Yes, it's a heavily scripted sim, and that's why it might be a good test bed to show where the new way of handling scripts is not working.  So far, this particular issue has caused my product updated to break nocopy items owned by others due to scripts failing to communicate, and caused general uncertainty with all sorts of gadgets, both worn and rezzed.  

This issue is real but totally unrelated to the new scheduling system. "Scripts don't execute at all, or are delayed indefinitely" is a problem that has always existed in heavy-lag sims and situations. When Scripts Run is below 60-30 %, the problems become very obvious for very obvious reasons.

On 10/7/2019 at 8:41 PM, Lucia Nightfire said:

 

Is it worth it making this kind of change? Most likely not as, depending on data size, current methods like checking rezzer desc, using chat or using KVP are good enough.

But wouldn't giving us gridwide experiences let us pass semaphores via the experience data?

I'm working on a project now that would be so much better if I could count on using my experience, but I was not in the beta and so mine is not gridwide. 

Edited by Erwin Solo

2 hours ago, Erwin Solo said:

But wouldn't giving us gridwide experiences let us pass semaphores via the experience data?

Not exactly sure what you mean. KVP r/w is asynchronous. Although you technically can read and update a KVP entry IF you don't want/care about confirmation or the possibility of write failure, which can happen for reasons out of your control.

As far as an end-user product reading a static/known grid-scope experience's KVP entry that is altered via the original creator's server, yes, that is a common application.

One of my products queries an update server's url via grid scope KVP read. The entry is changed by the in-world server whenever its url becomes deactivated and has to be reset. This is to avoid having to use other external means or SL's dreaded llEmail() which has its own failures and is typically object UUID locked in a distributed content environment.
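
A stripped-down sketch of that lookup, not the actual product code. It assumes the script is compiled under the experience, and the key name "update_server_url" is hypothetical:

```lsl
string URL_KEY = "update_server_url"; // hypothetical key name
key readReq;

default
{
    state_entry()
    {
        // The answer arrives later in dataserver(); nothing is returned here.
        readReq = llReadKeyValue(URL_KEY);
    }

    dataserver(key req, string data)
    {
        if (req != readReq)
            return;

        // data is "1,<value>" on success or "0,<error code>" on failure.
        string payload = llDeleteSubString(data, 0, 1);
        if (llGetSubString(data, 0, 0) == "1")
            llHTTPRequest(payload, [HTTP_METHOD, "GET"], "");
        else
            llOwnerSay("KVP read failed: "
                + llGetExperienceErrorMessage((integer)payload));
    }

    http_response(key req, integer status, list meta, string body)
    {
        llOwnerSay("Update server replied with status " + (string)status);
    }
}
```

If the read fails, the product simply has no URL for now and has to try again later, so the failure branch matters as much as the success branch.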

Edited by Lucia Nightfire


You can use key/value pairs in a locked way using the llUpdateKeyValue function with "checked" set to TRUE. That's a "compare and swap" operation in computer science terms, a way to avoid race conditions in asynchronous operations. When you want to set something, you do an update with what you think the old value is. If some other script changed it while you weren't looking, you get an error back from the data server. Then you can get the current value and try again.
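
As a rough sketch, a shared counter could be incremented this way. It assumes the script is compiled under an experience and that the key already exists (for example, created earlier with llCreateKeyValue); the key name "visit_counter" is hypothetical.

```lsl
string KEY_NAME = "visit_counter"; // hypothetical key name
key readReq;
key updateReq;

default
{
    touch_start(integer n)
    {
        readReq = llReadKeyValue(KEY_NAME);
    }

    dataserver(key req, string data)
    {
        // data is "1,<value>" on success or "0,<error code>" on failure.
        integer ok = (llGetSubString(data, 0, 0) == "1");
        string payload = llDeleteSubString(data, 0, 1);

        if (req == readReq)
        {
            if (!ok)
            {
                llOwnerSay("Read failed: "
                    + llGetExperienceErrorMessage((integer)payload));
                return;
            }
            // Checked update: succeeds only if the stored value is still
            // exactly what was just read.
            string newValue = (string)((integer)payload + 1);
            updateReq = llUpdateKeyValue(KEY_NAME, newValue, TRUE, payload);
        }
        else if (req == updateReq)
        {
            if (ok)
                llOwnerSay("Counter is now " + payload);
            else if ((integer)payload == XP_ERROR_RETRY_UPDATE)
                readReq = llReadKeyValue(KEY_NAME); // value changed: re-read, retry
            else
                llOwnerSay("Update failed: "
                    + llGetExperienceErrorMessage((integer)payload));
        }
    }
}
```

The retry branch is the "get the current value and try again" step; a real script would probably cap the number of retries.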


My preferred method is different to the one explained in the blog post.

  • On state_entry, the rezzee sets up a listen for everything, in a preset channel. The object is taken to inventory with this listener active. The on_rez event does NOT reset the script (or there's no on_rez event whatsoever).
  • In the object_rez event, the rezzer sends the data to the rezzee via the same preset channel that the rezzee is listening on.
  • For security, the rezzee's listen event filters out the messages that don't come from the rezzer, by comparing (string)llGetObjectDetails(llGetKey(), (list)OBJECT_REZZER_KEY) with the id in the listener and returning if they don't match.
  • If no more messages are expected, when the message is successfully received the rezzee can now remove the listener. If more messages are expected, the rezzee can optionally set up a new listen with a filter for the key of the rezzer, and then shut down the old listener. The latter step is unnecessary, though: it would merely prevent script execution for messages that don't come from the rezzer.

This method is faster and uses less memory because there is less back-and-forth communication, which is why I prefer it. In particular, it removes the need for a listener in the rezzer in the frequent case of rezzer-to-rezzee communication only. It relies on the fact that the listen queue of the rezzee exists when the rezzer receives the object_rez event, regardless of whether the rezzee has already started executing or not. It works because while the rezzee is waiting for a chance to execute, the communication event is waiting in the queue, so it will eventually be received.

Rezzee:

integer ListenHandle;

default
{
    state_entry()
    {
        ListenHandle = llListen(1234, "", "", "");
    }

    listen(integer chan, string name, key id, string msg)
    {
        if (id != (string)llGetObjectDetails(llGetKey(), (list)OBJECT_REZZER_KEY))
            return;

        llOwnerSay(msg); // or some other message processing here

        llListenRemove(ListenHandle);
    }
}

Rezzer:

default
{
    touch_start(integer n)
    {
        llRezAtRoot(llGetInventoryName(INVENTORY_OBJECT, 0),
            llGetPos()+<0,0,1>, <0,0,0>, <0,0,0,1>, 1);
    }

    object_rez(key id)
    {
        llRegionSayTo(id, 1234, "message");
    }
}

But given that the official method endorsed by LL is different and more complex, I'd like to know if this simplified method is guaranteed to work in future.

Edited by Sei Lisa

1 hour ago, Sei Lisa said:

It relies on the fact that the listen queue of the rezzee exists when the rezzer receives the object_rez event, regardless of whether the rezzee has already started executing or not.

I don't think this is guaranteed in future, particularly if they re-work the whole scheduler/event-dispatching approach (to try to reduce the cost of idle scripts). It might, but I'd hate to have a lot of product depending on it.

