Jump to content

Order of events after a sim restart


animats
 Share

You are about to reply to a thread that has been inactive for 1478 days.

Please take a moment to consider if this thread is worth bumping.

Recommended Posts

Here's something that's been puzzling me, and is now starting to make sense. Every day or two, some of my animesh NPCs report that they've stalled. I get emails for that. This morning I got this:

 

The object 'Debug relay' has sent you a message from Second Life:
Testi v20.14 (AHQ) in trouble at Vallone <237.79840, 27.58077, 36.72852>: bhvscheduler: Stalled and reset. Last command: {"request":"pathbegin","requestid":8019,"target":"00000000-0000-0000-0000-000000000000","regioncorner":"<462592.000000, 306944.000000, 0.000000>","goal":"<236.000000, 14.000000, 36.000000>","stopshort":0.000000,"speed":2.500000}
[07:02:03] (Testi v20.14 (AHQ)) Path maze solver task: Maze solver finished, pathid 16697, seg 2. Free mem: 3376
[07:02:03] (Testi v20.14 (AHQ)) Path maze solver task: Maze route: (18,20) (18,21) (22,21) (22,20)
[07:02:07] (Testi v20.14 (AHQ)) Path maze solver task: Maze solver finished, pathid 16697, seg 4. Free mem: 3376
[07:02:07] (Testi v20.14 (AHQ)) Path maze solver task: Maze route: (18,20) (18,21) (18,20) (18,19) (19,19) (21,19) (21,20) (22,20)
[08:09:37] (Testi v20.14 (AHQ)) Testi v20.14 (AHQ) in trouble at Vallone <237.79840, 27.58077, 36.72852>: bhvscheduler: Stalled and reset. Last command: {"request":"pathbegin","reques...


The object 'Debug relay' has sent you a message from Second Life:
Salli v20.10 (OC) in trouble at Vallone <215.48090, 54.49610, 36.72841>: bhvscheduler: Stalled and reset. Last command: {"request":"pathbegin","requestid":17409,"target":"00000000-0000-0000-0000-000000000000","regioncorner":"<462592.000000, 306944.000000, 0.000000>","goal":"<235.000000, 54.000000, 36.000000>","stopshort":0.000000,"speed":2.500000}
[07:01:52] (Salli v20.10 (OC)) bhvpatrol: Path update: -210 obstacle:
[07:01:52] (Salli v20.10 (OC)) bhvpatrol: Stopped by fixed obstacle:
[07:01:52] (Salli v20.10 (OC)) bhvpatrol: Failed to reach goal, idle. Path update status: -210
[07:02:02] (Salli v20.10 (OC)) bhvpatrol: Patrol to <235.00000, 54.00000, 36.00000>
[07:02:07] (Salli v20.10 (OC)) Path plan task: No walkable during planning at <216.23090, 54.53387, 35.46311>
[08:09:37] (Salli v20.10 (OC)) Salli v20.10 (OC) in trouble at Vallone <215.48090, 54.49610, 36.72841>: bhvscheduler: Stalled and reset. Last command: {"request":"pathbegin","request...



The object 'Debug relay' has sent you a message from Second Life:
Kathi  v20.14 (OC) in trouble at Vallone <235.03360, 54.00143, 36.73669>: bhvscheduler: Stalled and reset. Last command: {"request":"pathbegin","requestid":19109,"target":"00000000-0000-0000-0000-000000000000","regioncorner":"<462592.000000, 306944.000000, 0.000000>","goal":"<206.000000, 54.000000, 36.000000>","stopshort":0.000000,"speed":2.500000}
[07:01:33] (Kathi  v20.14 (OC)) Path move task: Stopped by obstacle Salli v20.10 (OC) status: -2007
[07:01:33] (Kathi  v20.14 (OC)) Path start task: Retrying. Distance 21.485930m. Retry 1
[07:01:59] (Kathi  v20.14 (OC)) bhvpatrol: Path update: 0 obstacle:
[07:02:00] (Kathi  v20.14 (OC)) bhvpatrol: Path update: 0 obstacle:
[07:02:10] (Kathi  v20.14 (OC)) bhvpatrol: Patrol to <206.00000, 54.00000, 36.00000>
[08:09:37] (Kathi  v20.14 (OC)) Kathi  v20.14 (OC) in trouble at Vallone <235.03360, 54.00143, 36.73669>: bhvscheduler: Stalled and reset. Last command: {"request":"pathbegin","reques...



 

This is my own debug output, so it won't tell readers much. The important part is the timestamps. Each NPC has a debug log of recent events in a circular buffer, and when something goes wrong, the NPC phones home and sends the log . Three different NPCs reported that they timed out at 08:09:37 this morning, without anything going wrong before that. Since the NPCs are independent of each other, that probably means something happened to the region. The log shows everything going along normally around 7:02 AM. That message comes from a timer event. The timer event checks how long the current operation was in progress, and if it started more than 10 minutes ago, measured by "Unix time" of day, an error is reported and the script resets.

 

Second Life's status page says:

Scheduled - We will be performing rolling restarts for regions on the RC Channels on Wednesday, April 8th beginning at 7:00 AM PST.

So there was a restart around that time. With about an hour of downtime, apparently.

The NPC code checks for region restarts:

    changed(integer change)
    {
        if (change & CHANGED_REGION_START)                      // region restart
        {   llResetScript(); }                                  // reset the NPC completely
    }

I thought that would prevent getting bogus timeout messages after each region restart. But it doesn't. If the changed event reports a region restart before the timer event came in, there would be no error message. If the timer event comes in first, I get a false "stalled" error. Looks like a race condition between the timer event and the region restart event.

Added appropriate code to detect this situation and distinguish it from a real stall. I may be getting some real stalls for other reasons, so I need to filter out the bogus reports caused by ordinary region restarts.

 

 

Link to comment
Share on other sites

A queued timer event can most certainly trigger before a changed event.

This is why in apps such as my "Region Start & Environment Reader" I check for a change in region start time in the timer event only.

There is also a very rare case where the changed event won't trigger with region restarts.

I use CHANGED_REGION_START in the changed event to stop and restart the timer in case SVC-8165 ever happens again.

 

Link to comment
Share on other sites

There is no defined ordering between the changed event and the timer event; they can happen in either order.

I recently made a small script specifically to record how long a region is inactive during a restart and ran into this same issue. I solved it by recording the last restart time in a global variable from the changed event handler:

lastRestartTime = (integer)llGetEnv("region_start_time");

and then in my timer event handler I ignore timer events if the last restart time doesn't match:

    timer()
    {
        // when we're coming back up, the timer may fire before the changed event
        // so don't record the tick if the last restart time doesn't match
        // this will be fixed in the changed event handler
        if (lastRestartTime == getRestartTime())
        {
            recordLastTick();
        }
    }


 

Link to comment
Share on other sites

You are about to reply to a thread that has been inactive for 1478 days.

Please take a moment to consider if this thread is worth bumping.

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
 Share

×
×
  • Create New...