Jump to content

Obscure question: when does the simulator send EstablishAgentCommunication to the viewer?


animats
 Share

You are about to reply to a thread that has been inactive for 68 days.

Please take a moment to consider if this thread is worth bumping.

Recommended Posts

OK, more boring stuff to finish out the year.

Here's an annotated log of several successful double region crossings, and one failed one. This is from Sharpview.

 

====> Beginning of successful double region crossing <====
====> Thes are all at the same corner.
====>  [Bronlen] (462848,306688)
====>  [Charlesville] (462848,306944)
====>  [Vallone] (462592,306944)
====>  [Kama Center] (462592,306688)
====> This first one is from Vallone to Bronlen via Charlesville.

02:49:40 [WARN] From (462592,306944), event CrossedRegion: MsgServerEvent { region_handle: (462592,306944), event: CrossedRegion(EventCrossedRegion { look_at: <0.392944,-0.919562,0>, position: <1.00946,0.236431,35.6884>, socket_addr: 34.222.128.86:12035, seed_capability: "https://simhost-05f87cd2d2a7b0b7a.agni.secondlife.io:12043/cap/be1edc3a-5ba8-4b50-7ffc-b75380dcec9c", region_box: RegionBox { region_handle: (462848,306944), size: [256, 256] }, session_id: a0f40a7c-a5f3-11ee-9453-0242ac110003, agent_id: b5fde908-eebf-4505-b686-ca74c3f91979 }) }
02:49:40 [WARN] CrossedRegion: leaving region (462592,306944), entering live region [Charlesville] (462848,306944)
02:49:40 [WARN] CrossedRegion from (462592,306944) to (462848,306944), distance <256,0,0>
02:49:40 [WARN] CrossedRegion from [Vallone]
====> Vallone sim has handed over control of the avatar to Charlesville. Fine.
====> These errors are some late object updates straggling in from Vallone. That seems to be normal. Those are ignored.
02:49:40 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462848,306944)
02:49:40 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462848,306944)
02:49:40 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462848,306944)
02:49:40 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462848,306944)
02:49:40 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462848,306944)

====> Here are some seemingly bogus EnableSimulator messages. This is Charlesville telling the viewer to start talking to Bronlen and Kama Center.
====> The viewer has been talking to all four regions for several minutes at this point, so this is redundant. Harmless, though.

02:49:41 [ERROR] Event error: Duplicate add of region (462848,306688), event MsgServerEvent { region_handle: (462848,306944), event: EnableSimulator(EventEnableSimulator { region_box: RegionBox { region_handle: (462848,306688), size: [256, 256] }, socket_addr: 35.87.129.151:13013 }) }
02:49:41 [ERROR] Event error: Duplicate add of region (462592,306688), event MsgServerEvent { region_handle: (462848,306944), event: EnableSimulator(EventEnableSimulator { region_box: RegionBox { region_handle: (462592,306688), size: [256, 256] }, socket_addr: 34.215.138.142:13007 }) }
02:49:42 [WARN] From (462848,306944), event CrossedRegion: MsgServerEvent { region_handle: (462848,306944), event: CrossedRegion(EventCrossedRegion { look_at: <-0.332161,0.943223,0>, position: <4.8429,254.94,35.6402>, socket_addr: 35.87.129.151:13013, seed_capability: "https://simhost-043e6020eb4baf499.agni.secondlife.io:12043/cap/60ea19c3-0600-310b-3f10-56c9e008f05a", region_box: RegionBox { region_handle: (462848,306688), size: [256, 256] }, session_id: a0f40a7c-a5f3-11ee-9453-0242ac110003, agent_id: b5fde908-eebf-4505-b686-ca74c3f91979 }) }
02:49:42 [WARN] CrossedRegion: leaving region (462848,306944), entering live region [Bronlen] (462848,306688)
02:49:42 [WARN] CrossedRegion from (462848,306944) to (462848,306688), distance <0,-256,0>
02:49:42 [WARN] CrossedRegion from [Charlesville]
02:49:42 [ERROR] Main agent object is in region (462848,306944) but current CrossedRegion region is (462848,306688)
02:49:43 [ERROR] Event error: Duplicate add of region (462592,306688), event MsgServerEvent { region_handle: (462848,306688), event: EnableSimulator(EventEnableSimulator { region_box: RegionBox { region_handle: (462592,306688), size: [256, 256] }, socket_addr: 34.215.138.142:13007 }) }


=====> Here's another successful double region crossing - Bronlen to Vallone via Charlesville, the reverse of the previous move.
=====> This is a motorcycle with avatar going forwards and backwards right across the region corner.
02:49:58 [WARN] From (462848,306688), event CrossedRegion: MsgServerEvent { region_handle: (462848,306688), event: CrossedRegion(EventCrossedRegion { look_at: <-0.504536,0.863391,0>, position: <0.915858,1.00684,35.6747>, socket_addr: 34.222.128.86:12035, seed_capability: "https://simhost-05f87cd2d2a7b0b7a.agni.secondlife.io:12043/cap/be1edc3a-5ba8-4b50-7ffc-b75380dcec9c", region_box: RegionBox { region_handle: (462848,306944), size: [256, 256] }, session_id: a0f40a7c-a5f3-11ee-9453-0242ac110003, agent_id: b5fde908-eebf-4505-b686-ca74c3f91979 }) }
02:49:58 [WARN] CrossedRegion: leaving region (462848,306688), entering live region [Charlesville] (462848,306944)
02:49:58 [WARN] CrossedRegion from (462848,306688) to (462848,306944), distance <0,256,0>
02:49:58 [WARN] CrossedRegion from [Bronlen]
02:49:58 [ERROR] Main agent object is in region (462848,306688) but current CrossedRegion region is (462848,306944)
02:49:59 [WARN] From (462848,306944), event CrossedRegion: MsgServerEvent { region_handle: (462848,306944), event: CrossedRegion(EventCrossedRegion { look_at: <-0.504536,0.863391,0>, position: <254.984,2.26188,35.6209>, socket_addr: 34.212.163.17:13019, seed_capability: "https://simhost-04148d0076d95659a.agni.secondlife.io:12043/cap/2de81e9e-b7a6-53aa-66c8-f6bd753aa6c2", region_box: RegionBox { region_handle: (462592,306944), size: [256, 256] }, session_id: a0f40a7c-a5f3-11ee-9453-0242ac110003, agent_id: b5fde908-eebf-4505-b686-ca74c3f91979 }) }
02:49:59 [WARN] CrossedRegion: leaving region (462848,306944), entering live region [Vallone] (462592,306944)
02:49:59 [WARN] CrossedRegion from (462848,306944) to (462592,306944), distance <-256,0,0>
02:49:59 [WARN] CrossedRegion from [Charlesville]
02:49:59 [ERROR] Main agent object is in region (462848,306944) but current CrossedRegion region is (462592,306944)
02:49:59 [ERROR] Main agent object is in region (462848,306944) but current CrossedRegion region is (462592,306944)
02:49:59 [ERROR] Main agent object is in region (462848,306944) but current CrossedRegion region is (462592,306944)
02:49:59 [ERROR] Main agent object is in region (462848,306944) but current CrossedRegion region is (462592,306944)


====> And another successful double crossing, Vallone to Bronlen  via Charlesville.====
02:50:21 [WARN] From (462592,306944), event CrossedRegion: MsgServerEvent { region_handle: (462592,306944), event: CrossedRegion(EventCrossedRegion { look_at: <-0.0494177,0.998778,0>, position: <1.00549,0,35.6919>, socket_addr: 34.222.128.86:12035, seed_capability: "https://simhost-05f87cd2d2a7b0b7a.agni.secondlife.io:12043/cap/be1edc3a-5ba8-4b50-7ffc-b75380dcec9c", region_box: RegionBox { region_handle: (462848,306944), size: [256, 256] }, session_id: a0f40a7c-a5f3-11ee-9453-0242ac110003, agent_id: b5fde908-eebf-4505-b686-ca74c3f91979 }) }
02:50:21 [WARN] CrossedRegion: leaving region (462592,306944), entering live region [Charlesville] (462848,306944)
02:50:21 [WARN] CrossedRegion from (462592,306944) to (462848,306944), distance <256,0,0>
02:50:21 [WARN] CrossedRegion from [Vallone]
02:50:21 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462848,306944)
...
02:50:22 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462848,306944)
02:50:28 [WARN] From (462848,306944), event CrossedRegion: MsgServerEvent { region_handle: (462848,306944), event: CrossedRegion(EventCrossedRegion { look_at: <-0.0494177,0.998778,0>, position: <2.10215,254.998,35.6366>, socket_addr: 35.87.129.151:13013, seed_capability: "https://simhost-043e6020eb4baf499.agni.secondlife.io:12043/cap/60ea19c3-0600-310b-3f10-56c9e008f05a", region_box: RegionBox { region_handle: (462848,306688), size: [256, 256] }, session_id: a0f40a7c-a5f3-11ee-9453-0242ac110003, agent_id: b5fde908-eebf-4505-b686-ca74c3f91979 }) }
02:50:28 [WARN] CrossedRegion: leaving region (462848,306944), entering live region [Bronlen] (462848,306688)
02:50:28 [WARN] CrossedRegion from (462848,306944) to (462848,306688), distance <0,-256,0>
02:50:28 [WARN] CrossedRegion from [Charlesville]
02:50:28 [ERROR] Main agent object is in region (462848,306944) but current CrossedRegion region is (462848,306688)
...
02:50:29 [ERROR] Main agent object is in region (462848,306944) but current CrossedRegion region is (462848,306688)

====> And another successful double region crossing, back from Bronlen to Vallone via Charlesville
02:51:08 [WARN] From (462848,306688), event CrossedRegion: MsgServerEvent { region_handle: (462848,306688), event: CrossedRegion(EventCrossedRegion { look_at: <-0.744412,-0.667721,0>, position: <0,1.00766,35.6432>, socket_addr: 34.222.128.86:12035, seed_capability: "https://simhost-05f87cd2d2a7b0b7a.agni.secondlife.io:12043/cap/be1edc3a-5ba8-4b50-7ffc-b75380dcec9c", region_box: RegionBox { region_handle: (462848,306944), size: [256, 256] }, session_id: a0f40a7c-a5f3-11ee-9453-0242ac110003, agent_id: b5fde908-eebf-4505-b686-ca74c3f91979 }) }
02:51:08 [WARN] CrossedRegion: leaving region (462848,306688), entering live region [Charlesville] (462848,306944)
02:51:08 [WARN] CrossedRegion from (462848,306688) to (462848,306944), distance <0,256,0>
02:51:08 [WARN] CrossedRegion from [Bronlen]
02:51:08 [ERROR] Main agent object is in region (462848,306688) but current CrossedRegion region is (462848,306944)
02:51:08 [ERROR] Main agent object is in region (462848,306688) but current CrossedRegion region is (462848,306944)
02:51:08 [ERROR] Main agent object is in region (462848,306688) but current CrossedRegion region is (462848,306944)
02:51:14 [WARN] From (462848,306944), event CrossedRegion: MsgServerEvent { region_handle: (462848,306944), event: CrossedRegion(EventCrossedRegion { look_at: <-0.744412,-0.667721,0>, position: <254.998,2.55342,35.6153>, socket_addr: 34.212.163.17:13019, seed_capability: "https://simhost-04148d0076d95659a.agni.secondlife.io:12043/cap/2de81e9e-b7a6-53aa-66c8-f6bd753aa6c2", region_box: RegionBox { region_handle: (462592,306944), size: [256, 256] }, session_id: a0f40a7c-a5f3-11ee-9453-0242ac110003, agent_id: b5fde908-eebf-4505-b686-ca74c3f91979 }) }
02:51:14 [WARN] CrossedRegion: leaving region (462848,306944), entering live region [Vallone] (462592,306944)
02:51:14 [WARN] CrossedRegion from (462848,306944) to (462592,306944), distance <-256,0,0>
02:51:14 [WARN] CrossedRegion from [Charlesville]
02:51:14 [ERROR] Main agent object is in region (462848,306944) but current CrossedRegion region is (462592,306944)
02:51:14 [ERROR] Main agent object is in region (462848,306944) but current CrossedRegion region is (462592,306944)


====> Beginning of a failed double region crossing <====
====> This time, though, we start in Vallone and go through Kama Center, rather than Charlesville.  Stuck in Kama Center. Bronlen not talking. <====
====> Many unnecessary EnableSimulator messages.
====> Need to check if the server is trying to change the IP address or UDP port.
====> These messages are from Vallone, telling the viewer about Charlesville and Kama Center, which are neighbors of Vallone; they share an edge.
02:52:15 [ERROR] Event error: Duplicate add of region (462848,306944), event MsgServerEvent { region_handle: (462592,306944), event: EnableSimulator(EventEnableSimulator { region_box: RegionBox { region_handle: (462848,306944), size: [256, 256] }, socket_addr: 34.222.128.86:12035 }) }
02:52:15 [ERROR] Event error: Duplicate add of region (462592,306688), event MsgServerEvent { region_handle: (462592,306944), event: EnableSimulator(EventEnableSimulator { region_box: RegionBox { region_handle: (462592,306688), size: [256, 256] }, socket_addr: 34.215.138.142:13007 }) }
02:52:20 [WARN] From (462592,306944), event CrossedRegion: MsgServerEvent { region_handle: (462592,306944), event: CrossedRegion(EventCrossedRegion { look_at: <0.0256889,0.99967,0>, position: <255,254.989,35.6911>, socket_addr: 34.215.138.142:13007, seed_capability: "https://simhost-0e6e8ee1f4104630e.agni.secondlife.io:12043/cap/6ec6de3f-73fd-03a5-fe35-f73f1cc09a23", region_box: RegionBox { region_handle: (462592,306688), size: [256, 256] }, session_id: a0f40a7c-a5f3-11ee-9453-0242ac110003, agent_id: b5fde908-eebf-4505-b686-ca74c3f91979 }) }
02:52:20 [WARN] CrossedRegion: leaving region (462592,306944), entering live region [Kama Center] (462592,306688)
02:52:20 [WARN] CrossedRegion from (462592,306944) to (462592,306688), distance <0,-256,0>
02:52:20 [WARN] CrossedRegion from [Vallone]
02:52:21 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462592,306688)
02:52:21 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462592,306688)
02:52:21 [ERROR] Main agent object is in region (462592,306944) but current CrossedRegion region is (462592,306688)
====> Got to Kama Center OK. Then stuck.
====> Now this is Kama Center telling the viewer about its neighbors, Bronlen and Vallone.
====> Same UDP port as previously used for Bronlen, so the UDP connection is not being restarted.
02:52:22 [ERROR] Event error: Duplicate add of region (462848,306688), event MsgServerEvent { region_handle: (462592,306688), event: EnableSimulator(EventEnableSimulator { region_box: RegionBox { region_handle: (462848,306688), size: [256, 256] }, socket_addr: 35.87.129.151:13013 }) }
02:52:22 [ERROR] Event error: Duplicate add of region (462848,306944), event MsgServerEvent { region_handle: (462592,306688), event: EnableSimulator(EventEnableSimulator { region_box: RegionBox { region_handle: (462848,306944), size: [256, 256] }, socket_addr: 34.222.128.86:12035 }) }

====> First crossing was successful, second one never started.

The main thing to note about this is that nothing unusual happened. It just stopped working. Notes:

  • The second crossing at a region corner just never happened, according to the info the viewer sees. All the messages for the first crossing are correct and on the happy path. There's just nothing about a second crossing. To the viewer, it looks like the avatar just never left the second region for the third.
  • There's some "noise". There are unnecessary EnableSimulator events. Sometimes those show up on successful crossings. Sometimes they don't show up. There are also unnecessary RegionHandshake messages. It's not in this log, but there are also unnecessary KillObject messages for objects leaving a sim. Sometimes those are present, sometimes not. Viewers discard all this junk routinely. Henri Beauchamp reports seeing this stuff, too.
  • The "noise" suggests imperfect synchronization between neighboring simulators. We can't tell from the outside, but this may be a symptom of some inter-simulator communication issue. That's speculative. However, it's something concrete that the simulator-side devs could look at. When we all looked hard at EstablishAgentCommunication problems, that turned up things that needed to be fixed. This may be similar. I have a vague suspicion that there may be some kind of low-level inter-simulator timeout or reset which causes some loss of shared state, and those excess messages are side effects of recovery from that. But that's speculation on my part.
  • Sharpview, Firestorm, and Cool VL all log similar message patterns. I've looked at Firestorm logs in the past, and Henri Beauchamp has looked at Cool VL logs. Nothing seems to be wrong. Successful crossings and failed crossings look similar right up to where progress stops.
  • Viewer <-> Sim protocol for all this seems sound. The lockstep handoff for ChangedRegion/CompleteAgentMovement works fine. The state change for RegionHandshake/RegionHandshakeReply works fine. The system by which objects move from region to region works fine. There's a brief moment of confusion because control handoff (who has the keyboard and mouse?) and object movement (where is the avatar?) are not synchronized, but those fall into sync in a second or so.The architects of these parts did a good job.

I was hoping to find something clearly wrong that could be fixed or worked around viewer side. Or even just see different behavior in  Sharpview than what the C++ viewers provide, which would give us some ideas to follow up. Nope.

What's probably needed to fix this is to find four regions on the beta grid where there's a clear corner and very little activity. Then log everything the simulators do while an avatar on a vehicle is moved back and forth across the corner. Then look over the logs in great detail. Somewhere in the sim to sim traffic just before a stall, there will be something of interest.

It's not simulator to viewer.

 

Link to comment
Share on other sites

1 hour ago, animats said:

What's probably needed to fix this is to find four regions on the beta grid where there's a clear corner and very little activity. Then log everything the simulators do while an avatar on a vehicle is moved back and forth across the corner. Then look over the logs in great detail. Somewhere in the sim to sim traffic just before a stall, there will be something of interest.

Ideally, you'd need four neighbouring regions in a square, empty of any object, for a start: then you could try and compare your viewer logs with the Cool VL Viewer's with the proper DEBUG tags activated: you won't have all the objects updates spam then, and would only see those related with your own avatar.

Then, you could see what message is missing from your viewer, or what race condition happens...

Link to comment
Share on other sites

1 minute ago, Henri Beauchamp said:

Ideally, you'd need four neighbouring regions in a square, empty of any object, for a start: then you could try and compare your viewer logs with the Cool VL Viewer's with the proper DEBUG tags activated: you won't have all the objects updates spam then, and would only see those related with your own avatar.

Then, you could see what message is missing from your viewer, or what race condition happens...

I don't think we'll see this in the viewer traffic. This looks like a sim to sim problem, which we, as users, cannot see.

  • Like 1
Link to comment
Share on other sites

5 minutes ago, animats said:

I don't think we'll see this in the viewer traffic. This looks like a sim to sim problem, which we, as users, cannot see.

I'm pretty sure it's a missing viewer message or bad timing/order in message exchanges, on the contrary...

Unless Monty is pranking you and coded some specific server bogus reply when it detects Sharpview. 🤣

Edited by Henri Beauchamp
Link to comment
Share on other sites

Just now, Henri Beauchamp said:

I'm pretty sure it's a missing viewer message or bad timing in message exchanges, on the contrary...

Huh. What are you thinking is missing or mis-timed?

For anyone who is really, really bored, and knows how to read viewer logs, here's Sharpview's log of a few successful double region crossings followed by a failed crossing. Formats are different, but it's not that different from what you'd get in an LL or TPV log.

Notes:

  • Lines beginning "From:" are traffic from sim to viewer. "Message" means it was a UDP message. Just the message type is printed, because many of those are big. "Event" means it came in via an event message poller. The two numbers in parentheses are the location of the sim in-world that sent the message.
  • Lines beginning "To:" are UDP traffic from viewer to sim. All the data is displayed.
  • Look for CrossedRegion to see activity around region crossings. In the vicinity of those events, the other activity that makes a region crossing go will show up.
  • Also look for RegionHandshake and EnableSimulator messages. All four regions at the corner have been live for minutes at this point, so there's no good reason for those to have been sent to the viewer.
  • This is a viewer under development, and you'll see messages about message processing not yet implemented.

Yes, I know this is all terminally boring, but it's how stuff gets fixed.

(I'm still getting over some non-COVID flu-like virus, I'm not going anywhere, and it's rained a lot lately.)

  • Like 1
Link to comment
Share on other sites

1 hour ago, animats said:

Huh. What are you thinking is missing or mis-timed?

This is a pure logic deduction, I'm afraid... And I see no other possible reason for such an issue.

The problem is that there is too much ”noise” going on when experimenting in a full fledged environment (with sims full of objects) to detect the issue, thus my suggestion for starting with empty sims and see what happens then.

Edited by Henri Beauchamp
Link to comment
Share on other sites

10 hours ago, Henri Beauchamp said:

This is a pure logic deduction, I'm afraid... And I see no other possible reason for such an issue.

That's what I was hoping to find in all those logs - some message missing or wrong. But it doesn't look that way. A failed double region crossing looks like one successful crossing followed by the second crossing not starting at all.

10 hours ago, Henri Beauchamp said:

The problem is that there is too much ”noise” going on when experimenting in a full fledged environment (with sims full of objects) to detect the issue, thus my suggestion for starting with empty sims and see what happens then.

Yes. This is going to require tests in quiet sims on the beta grid with LL's internal sim to sim logging turned on. But we know what to look for now - viewer sees first crossing went fine, second crossing never started.

  • Like 1
Link to comment
Share on other sites

I've added more logging in Sharpview, and I can now see region crossing failures in some detail.

This is all about double region crossings - single crossings are working fine. Double region crossings are working most of the time.

I'm seeing two failure modes so far.

First, there's the simulator explicitly kicking an avatar off a vehicle. This I log as:

19:51:34 [WARN] Region crossing: agent object parenting change from Some(264974690) to None in region [Bronlen] at Place { position_relative: <1.0641974,254.99876,37.0042>, rotation_relative: <-0,-0,-0.97578573,0.21872917> }

This says that an ObjectUpdate came in for the the avatar with no parent (local id = 0), when the avatar previously had a parent. This is an unsit - the avatar was sitting, and now it's not. A simulator initiated this. TPVs can look for this and log it. A normal "stand" would also generate this message, but that's not what's happening here. This does not happen, even temporarily, in successful region crossings. So this is an explicit wrong message from a sim.

This is something LL could check for simulator-side. If an avatar shows this kind of unsit, and it's on top of a region corner, the recent history of the simulator should show something wrong. So, knowing this, the bug is easier to find. I have more logging info if anyone needs it.

Second, there's another kind of error.

04:42:57 [WARN] Region crossing out of sync: agent object is in (462592,306944) but control is in (462848,306688)
04:42:57 [WARN] Region crossing out of sync: agent object is in (462592,306944) but control is in (462848,306688)
04:42:57 [WARN] Region crossing out of sync: agent object is in (462592,306944) but control is in (462848,306688)
04:42:58 [WARN] Region crossing out of sync: agent object is in (462592,306944) but control is in (462848,306688)
04:42:58 [WARN] Region crossing out of sync: agent object is in (462592,306944) but control is in (462848,306688)

There are three moving parts in a vehicle region crossing, as seen by the viewer. (Sim side has more moving parts, but we can't see them in a viewer.) The moving parts are the avatar object (the "main agent"), which is what gets drawn on screen, the vehicle (which is itself an object), and control (who's getting the keyboard and mouse input.). Those all change regions separately, and not all at once. The error above shows that control was passed to a new region but the avatar object didn't make the trip. It's normal to see that once or twice in a region crossing, but this time it got stuck.

This is also something both a viewer and the simulator can detect as an error. If that out of sync condition persists, something has gone wrong.

This is not a bug fix. This is not even the beginning of a bug fix. But it is, perhaps, the end of the beginning of looking for something concrete to fix. This problem is moving from mysterious and un-knowable toward becoming understandable.

On 12/30/2023 at 11:41 PM, Henri Beauchamp said:

I'm pretty sure it's a missing viewer message or bad timing/order in message exchanges, on the contrary...

Henri was right. We can now see something explicitly going wrong, not just an absence of activity.

  • Thanks 2
Link to comment
Share on other sites

5 hours ago, animats said:

First, there's the simulator explicitly kicking an avatar off a vehicle.

Oh dear...

OK, imagine 3 regions, Sim A, the source region, Sim B, to the East of Sim A, and Sim C, South of Sim B.

 

Standard Region Crossing:

The Vehicular Idiot, sits ion their vehicle, aims it at Sim B, and stamps on the loud pedal.

Vehicle hits Sim A/Sim B boundary

Sim A detaches the Idiot from the vehicle

Sim A tells Sim B to rez a copy of the vehicle and set it in motion

Sim A the deletes original copy

Sim A TELEPORTS the idiot across the border

Sim B tries to resit the Idiot on the vehicle

 

 Assuming the vehicle wasn't moving so fast that it's travelled more than 64 m before the Idiot arrives on Sim B, this should "just work".

 

Corner Double Region Crossing.

The Vehicular Idiot, sits ion their vehicle, aims it at Sim B, and stamps on the loud pedal.

Vehicle hits Sim A/Sim B boundary

Sim A detaches the Idiot from the vehicle

Sim A tells Sim B to rez a copy of the vehicle and set it in motion

Sim A deletes the original copy

The vehicle hits the Sim B/Sim C boundary

Sim B tells Sim C to rez a copy of the vehicle and set it in motion

Sim B deletes it's copy

Sim A TELEPORTS the idiot across the border

Sim B tries to resit the Idiot on the vehicle

THERE IS NO VEHICLE! Reset FAILED, the Idiot NEVER hit's the Sim B/Sim C boundary on their vehicle because there is NO vehicle. The Idiot is formally announced as being Un-sat.

Sim C says "Hey guys, what the fork am I supposed to do with this moving piece of vehicle shaped prim litter?"

 

None of the Regions involved can sit an Idiot on a vehicle that only exists on ANOTHER Region. Drive too close to a corner, too fast, and there's no vehicle for you to be re-sat on, because it's crossed another boundary without you.

 

Hence the classic SL Vehicle User Advice. Don't drive too fast, don't cross too close to a corner, NEVER cross corners FAST.

 

Link to comment
Share on other sites

Posted (edited)

That's not too far off. What we can't see as users is the sim to sim traffic. Looking at viewer traffic alone yields a blind-men-and-the elephant problem.

When the vehicle arrives in Sim C, it should be carrying with it enough information to know what avatars are supposed to be on board. It's not clear if avatars are pushed (Sim B sends them to sim C) or pulled (Sim C asks sim B to send them over.) If they are pushed, maybe a back-up pull mechanism is needed. Sim C to Sim B - I have the vehicle, send me Vehicular Idiot. Sim B might have to wait for Vehicular Idiot to fully arrive from A before starting the transfer.

I've been talking to the Other Simulator devs about how they do this, and doing some testing over there. They have set up some test regions. Maybe we can get this fixed for the Other Simulator, where we can see all the logs.

Edited by animats
Link to comment
Share on other sites

  • Lindens

Duplicated (or worse?) RegionHandshake messages came up in one of my deep digs earlier in this thread.  Found one source for them.  Code appears to be very deliberate in the addition but no one knows the explanation now.  And it wasn't documented.  Smells like simple carelessness or whitewashing of something not understood (two messages are better than no messages so...).

As you've shown, viewer and simulator restoring synchronization after an RC/TP takes time (it's a protocol exchange).  So there's always a race scenario where queued or stale or retransmitted data becomes invalid or irrelevant after the full exchange.  And the problem is worse when it's a 3-body problem (one viewer, two simulators).  All participants must be prepared to detect and reject these cases.

That situation exists between simulators as well.  The transition milestones represented by the messages between viewer and simulators aren't the only transitions behind the firewall.  For example, an avatar or vehicle making a crossing isn't fully functional until sometime after the destination simulator declares the move complete.  This internal transition can come before or long after the viewer has acknowledged the movement.  In the meantime, such an object is free to initiate *new* transitions such as an RC/TP or even just a logout (which requires updates to inventory).  This is part of the reason why 4-body transitions (one viewer, three regions) are so fraught.

Viewers do experience these problems in other ways and DMs are one such area.  A version of avatar presence is maintained as global state outside of the simulators.  One use of this data is to direct DMs to the correct simulator for viewer uptake.  But that service is subject to all the delay, ordering, retry, and other protocol issues of any other distributed system.  It's notion of presence can be stale or very wrong in certain cases.  So each message in a DM stream takes that information as a hint and proceeds to hunt down the actual avatar location according to simulators.  But that is also a protocol exchange and doesn't necessarily produce a correct result when transitions are in-flight.  In this case, the IM system gives up and tosses the message into persistent storage for later recovery.  And that's one way messaging does really, really surprising things.

 

  • Thanks 1
Link to comment
Share on other sites

3 hours ago, Monty Linden said:

That situation exists between simulators as well.  The transition milestones represented by the messages between viewer and simulators aren't the only transitions behind the firewall.  For example, an avatar or vehicle making a crossing isn't fully functional until sometime after the destination simulator declares the move complete.  This internal transition can come before or long after the viewer has acknowledged the movement.  In the meantime, such an object is free to initiate *new* transitions such as an RC/TP or even just a logout (which requires updates to inventory).

That's what I inferred, from what I could see viewer side; here is a quote of an IM to @animats about it:

Quote

[02/01/2024 13:02:15 PST] Henri Beauchamp: That's what I said: the issue is likely that the (pseudo) handshake between sim 1 and viewer and viewer and sim 2 is raced by sim 1 to sim 2 messages...

 

3 hours ago, Monty Linden said:

This is part of the reason why 4-body transitions (one viewer, three regions) are so fraught.

Perhaps we would need an ”arbitrator server” (a role that might be playable by the presence server, or whatever you call it): the latter would receive a copy of all the traffic during simulators handovers of an avatar/viewer, and when one of the involved parties is left in the blue (and gets a timeout waiting for a message), it would report to the arbitrator the step it got stuck at, and the latter would ”kick” the simulator(s) into resending the appropriate data to the appropriate party(ies)...

This would allow to keep the current protocol (to avoid breaking legacy viewers), while allowing updated servers and viewers alike to use a ”panic” mode where they ask for help to the arbitrator when left in the blue.

 

Alternatively, a new protocol with a robust handshake could be implemented. In this new protocol, you would have new sets of messages and each message is acknowledged by another ”cando” (or ”no go”, e.g when there is a ban wall in the way or a restarting sim) or ”done” message to the same pole. Example:

  1. Sim A detects that avatar is about to cross border or got a TP request to sim B. It sends a ”connect to sim B” order to viewer.
  2. Sim A sends ”take over this avatar” message with serialized data to sim B.
  3. Viewer replies ”cando” to sim A and executes order, connecting to sim B.
  4. Sim B sends ”cando” to sim A and deserializes the data.
  5. Sim B replies to viewer (after the avatar data is deserialized) with message ”got your data”.
  6. Viewer sends message ”connection done” to Sim A after it got ”got your data” from sim B.
  7. Sim A replies with ”OK, goodbye” to viewer.
  8. Viewer sends ”ready message” to Sim B (and drops comms with sim A when needed).
  9. Sim B proceeds with objects updates.

If any message is lost, it is then easy to resend it, since we have a fixed operations order and appropriate handshake.

 

Also, when vehicles are crossing sim corners very close to the corner (like in a 3m circle around the corner, or rather a radius depending on the vehicle speed), there should be an algorithm in sim servers to send the vehicle and avatar data directly to the opposite sim instead of sending it to an adjacent sim which will itself have to handover that object/avatar to the final destination sim only a couple seconds or less later (a 2 seconds stay in the intermediate sim is usually safe, but below this duration, your risks to fail the crossing increases exponentially). All it would take is to extrapolate based on the vehicle speed vector and see where the vehicle is going to ”land” in the end... This would avoid many double-sim-crossing failures with the current protocol unchanged.

Edited by Henri Beauchamp
  • Like 2
Link to comment
Share on other sites

5 hours ago, Monty Linden said:

Duplicated (or worse?) RegionHandshake messages came up in one of my deep digs earlier in this thread.  Found one source for them.  Code appears to be very deliberate in the addition but no one knows the explanation now.

A missing RegionHandshake message means a region won't appear in the viewer. So one can see how a fix like that might go in. Worse, from a sim-side perspective, if a viewer fails to send RegionHandshakeReply to the viewer (an early version of Sharpview did that), it leaves the avatar sim-side in a bad state. On the Other Simulator, that lasts until sim restart, and it takes two logins to clear. SL seems to time out after a while. Sims really need that RegionHandshakeReply. If there's some way a RegionHandshakeReply could be lost or not be processed sim-side, that might justify sending extra RegionHandshake messages to get things unstuck.

There's still that 10 seconds / 1 minute stall around EstablishAgentCommunication. That's the next handshake that has to complete to get a region talking to the viewer. Amusingly, entering the region with the avatar in Sharpview will consistently generate an EstablishAgentCommunication message from the simulator quickly. I do this all the time at the Morris corner on the beta grid to get all four regions to go live quickly. Looks awful, but consistently works for testing. So, whatever isn't sending EstablishAgentCommunication is ready to send it.

Link to comment
Share on other sites

6 hours ago, Monty Linden said:

That situation exists between simulators as well.  The transition milestones represented by the messages between viewer and simulators aren't the only transitions behind the firewall.  For example, an avatar or vehicle making a crossing isn't fully functional until sometime after the destination simulator declares the move complete.  This internal transition can come before or long after the viewer has acknowledged the movement.  In the meantime, such an object is free to initiate *new* transitions such as an RC/TP or even just a logout (which requires updates to inventory).  This is part of the reason why 4-body transitions (one viewer, three regions) are so fraught.

Yes. It's a hard problem.

The worst case I've seen in practice was when a busload of furries tried to do a Drivers of SL drive. At each region crossing, they'd lose one or two passengers. The passengers would then try to teleport and fly to the bus and get back on. That was with ordinary single region crossings.

One of the best arguments for fixing this is that it makes group social activities work better. Boating, group trips, etc. The pew-pew crowd gets very unhappy when they fall out of a tank or aircraft.

It's interesting that single crossings with multiple avatars on a vehicle fail badly. It's not clear why they should. Single crossings with single avatars on a vehicle are reasonably reliable. We know there's a transient overload on a sim when avatars enter and leave; every major sales event suffers from that. Is some response to that overload leaving something un-done? That might be worth investigating.

I've thought about viewer-side workarounds, such as detecting stuck avatars in the viewer, then asking for a teleport to the vehicle location followed by a sit. I don't really want to implement that, but it's an option.

  • Like 1
Link to comment
Share on other sites

1 hour ago, animats said:

There's still that 10 seconds / 1 minute stall around EstablishAgentCommunication. That's the next handshake that has to complete to get a region talking to the viewer. Amusingly, entering the region with the avatar in Sharpview will consistently generate an EstablishAgentCommunication message from the simulator quickly. I do this all the time at the Morris corner on the beta grid to get all four regions to go live quickly. Looks awful, but consistently works for testing. So, whatever isn't sending EstablishAgentCommunication is ready to send it.

Normally, EstablishAgentCommunication for a neighbouring sim is sent when your avatar camera FOV crosses that region's border. Maybe your viewer is not sending the data (agent position, draw distance, camera axis) the agent region is expecting to send the EstablishAgentCommunication for its neighbours...

On login, you could try and send two sets of data: agent position at the center of the sim, draw distance at 384m or so, camera axis pointing alternatively North West and South East, and see what happens...

Link to comment
Share on other sites

30 minutes ago, Henri Beauchamp said:

Normally, EstablishAgentCommunication for a neighbouring sim is sent when your avatar camera FOV crosses that region's border.

That's what seems to happen. If the camera isn't close enough to the edge, the region won't come up at all. Once it's close enough, sometimes there is a one minute delay. That happens even with the avatar stationary.

 

Link to comment
Share on other sites

I'm encouraged that this is getting attention.

There are a number of ordinary bugs, such as the ones Monty Linden has located in the event poller system. If the event poller loses events, things will break and stall. So that's a clear bug, and once fixed, some other random misbehavior may disappear.

This is not about packet loss, or, at least, in the absence of packet loss, things still break. Sharpview logs all reliable UDP message losses, and keeps retransmitting, with backoff, until packets get through. These region crossing problems appear even with zero lost UDP messages. This does not seem to be a network problem on the UDP side.

There are some touchy timing constraints. I've previously noted that a 2 second delay on sending RegionHandshakeReply after a login seems to fix some interest list problems. There's a temporary viewer/simulator ambiguity over where the avatar is after a teleport or login, as the simulator resolves "last", "home", landing points, and obstacles into a sane avatar location. Waiting 2 seconds helps. There may be other problems like that. The EstablishAgentCommunication delay, perhaps. Although that's very consistent when I try it at Morris/Dore on the beta grid.

The deepest problem seems to be double region crossings. Everything else is, I suspect, an ordinary bug. But double region crossings may be a design problem. The usual stuck state is where a vehicle is in the third region but avatar(s) are stuck in the second. The second region doesn't push the avatar to the third region for some reason, and the third region doesn't seem to have a way to pull the avatar from the second region.

Possible rule: object with sitters cannot cross a region boundary until all seated avatars both have object presence (ObjectUpdate, you can see the avatar) and control presence (CrossedRegion, keyboard and mouse events are going to the right place) in the same region as the vehicle. This would only affect crossings at region corners and really fast vehicles. (The F-16 on afterburner can cause a double region crossing situation by sheer speed.) This means a second crossing of a vehicle can't start until the first one has reached a complete, stable state. It's not a hard stop at every region crossing; the sailors hate that. It's a hard stop at the second region crossing, which was about to fail anyway.

  • Thanks 1
Link to comment
Share on other sites

5 minutes ago, animats said:

Possible rule: object with sitters cannot cross a region boundary until all seated avatars both have object presence (ObjectUpdate, you can see the avatar) and control presence (CrossedRegion, keyboard and mouse events are going to the right place) in the same region as the vehicle. This would only affect crossings at region corners and really fast vehicles. (The F-16 on afterburner can cause a double region crossing situation by sheer speed.) This means a second crossing of a vehicle can't start until the first one has reached a complete, stable state. It's not a hard stop at every region crossing; the sailors hate that. It's a hard stop at the second region crossing, which was about to fail anyway.

This is very interesting!

I wonder what happens if, for example, a tour bus (or any multi-seat vehicle) stops in the middle of a region crossing? For your possible rule, would that mean "nobody actually crossed yet" if "not everybody crossed yet"?

Link to comment
Share on other sites

Just now, Love Zhaoying said:

I wonder what happens if, for example, a tour bus (or any multi-seat vehicle) stops in the middle of a region crossing? For your possible rule, would that mean "nobody actually crossed yet" if "not everybody crossed yet"?

Avatars seated on a vehicle are child prims. Under normal conditions, they are all under the control of the same sim as the vehicle. When the root prim of the vehicle is 1m past the region edge, the crossing triggers. The avatars in the back of the bus have their control shifted to the new region, even if they're still located in the old region.

It's a weird situation. Avatars tend to fall off multi-passenger vehicles at single region crossings. That indicates yet another bug.

One bug at a time. We all have to keep pushing.

  • Thanks 1
Link to comment
Share on other sites

Watching this thread with interest.

I feel like part of the issue is that simulators seem to be authoritative about login sessions. ie. When you log in to SecondLife using the viewer, you're actually logging into a simulator. I wonder if that is really ideal, after all, should a simulator disconnect really lead to having no valid login session?

One thing I've always wanted for SecondLife personally is the ability to roam. That means being able to swap between a mobile viewer and a computer or laptop seamlessly, walk around town (which causes your IP to change) without being logged out.

If login sessions were untied from simulators and viewers, and instead centralised, it would probably be easier to correct simulators and viewers that have suffered from desync, whilst also providing residents with the possible benefit of being able to roam between devices in the future.

 

Link to comment
Share on other sites

Here's a video of what this is all about. This is another Sharpview test run.

https://video.hardlimit.com/w/77eo95FLSQF6waGuCtfPjV

It's just driving around, and watching the one minute delay for regions to come live. It's sometimes possible to skip the delay by crossing to an adjacent region. The current goal is to keep the nearest four regions live at reasonably high detail. So draw distance is 120 meters.

I still can't figure out why this delay happens. It's really frustrating. Logs tell me that an EstablishAgentCommunication message comes in one minute late. Henri Beauchamp only sees a 10 second delay. With Open Simulator, there's no delay. Very puzzled.

(There's much test code running here. This isn't anywhere near final. There's no camera motion filtering yet, so it's a rough, but realistic, ride. Motion interpolation is still very dumb. That's so the cosmetic algorithms don't hide underlying bugs. Content loading priority policy doesn't pay attention to which way you're looking or how fast you're going yet.)

The large amounts of empty land are not a bug. That's the Kama City land market.

Edited by animats
  • Like 1
Link to comment
Share on other sites

  • 2 weeks later...

A bit more info.

This has nothing to do with region corners. I've been trying along the edges of the Morris regions on the beta grid, out of draw range for the corners. I consistently get that one minute delay in Sharpview.

For several fails, I've seen the following pattern from the event poller:

  • Event EnableSimulator comes in successfully. My viewer starts the process of connecting to the region, and all the UDP stuff with region handshaking, etc. goes fine. The event poller in the viewer polls again, with a 90 second timeout.
  • Exactly 30 seconds later, the event poller returns a transport error, "Connection aborted / Unexpected EOF". The viewer side polls again.
  • Exactly 30 seconds after that, the event poller returns a 500 Internal Server Error status. The viewer side polls again.
  • 2 seconds after that, a valid, but duplicate, EnableSimulator event comes in via the event poller. The viewer side polls again.
  • One second after that, a valid EstablishAgentCommunication event comes in, providing the needed seed capability, and the viewer proceeds to request and get caps, after which the region goes live in the viewer. Success, one minute late.

This now looks like an event poller bug. I'm probably seeing it because I have a totally different HTTP client than the C++ viewers do.

Henri Beauchamp has reported shorter delays (see above) and strange delays of regions not appearing when expected are not unknown in other viewers. It's probably not just me. Monty Linden has looked at that poll reply code with suspicion. I think we're getting close.

That 500 Internal Server Error is very suspicious. Who generates that? Apache? The sim itself?

Edited by animats
typo
  • Thanks 1
Link to comment
Share on other sites

11 hours ago, animats said:

A bit more info.

This has nothing to do with region corners. I've been trying along the edges of the Morris regions on the beta grid, out of draw range for the corners. I consistently get that one minute delay in Sharpview.

For several fails, I've seen the following pattern from the event poller:

  • Event EnableSimulator comes in successfully. My viewer starts the process of connecting to the region, and all the UDP stuff with region handshaking, etc. goes fine. The event poller in the viewer polls again, with a 90 second timeout.
  • Exactly 30 seconds later, the event poller returns a transport error, ”Connection aborted / Unexpected EOF”. The viewer side polls again.
  • Exactly 30 seconds after that, the event poller returns a 500 Internal Server Error status. The viewer side polls again.
  • 2 seconds after that, a valid, but duplicate, EnableSimulator event comes in via the event poller. The viewer side polls again.
  • One second after that, a valid EstablishAgentCommunication event comes in, providing the needed seed capability, and the viewer proceeds to request and get caps, after which the region goes live in the viewer. Success, one minute late.

This now looks like an event poller bug. I'm probably seeing it because I have a totally different HTTP client than the C++ viewers do.

Henri Beauchamp has reported shorter delays (see above) and strange delays of regions not appearing when expected are not unknown in other viewers. It's probably not just me. Monty Linden has looked at that poll reply code with suspicion. I think we're getting close.

That 500 Internal Server Error is very suspicious. Who generates that? Apache? The sim itself?

In C++ viewers, we use libcurl, which ”silently” retries such 500 errors (basically re-posting the same poll to the server, but unknown from the viewer code itself).

If I were you, I'd implement a 26s timeout in Sharpview for event polls in SL, like I did to fight TP race conditions (also due to the tearing and retrying of the poll due to the server error message).

See my former messages about it in this thread, and the Cool VL Viewer code (in lleventpoll.cpp, mostly, for the event poll timeout and restart parts, but also llvieweregion.cpp, llviewerdisplay.cpp and llagent.cpp for the TP race workaround itself).

Edited by Henri Beauchamp
Link to comment
Share on other sites

After looking further, this doesn't seem to be timeout-related. When I try this at the region corner in Morris, I quickly get three EnableSimulator events in a row, which is correct. Then about four seconds later, after all the region handshake stuff is complete for three regions, I get only one EstablishAgentCommunication, for one of the three adjacent regions. Always the same one.

All that happens within seconds of login. Polling hasn't had a chance to time out yet.

Link to comment
Share on other sites

You are about to reply to a thread that has been inactive for 68 days.

Please take a moment to consider if this thread is worth bumping.

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
 Share

×
×
  • Create New...