Jump to content

Obscure question: when does the simulator send EstablishAgentCommunication to the viewer?


animats
 Share

You are about to reply to a thread that has been inactive for 208 days.

Please take a moment to consider if this thread is worth bumping.

Recommended Posts

I have the happy path for attaching to neighbor regions working in Sharpview. The long poll seems to work well with a 90 second timeout viewer side. As Henri points out, not using libcurl seems to help. (Sharpview is all Rust code, so the lower levels are totally different from C++ land.)

I get that the seed capability is asynchronous. I realize that's part of the security system.

I haven't dealt with teleports yet. I'm currently looking at how objects cross region boundaries. I spend a lot of time watching the New Babbage trams making their rounds.

Miscellaneous questions:

  • What indicates a region restart to the viewer?
  • I accept EstablishAgentCommunication any time after sending UseCircuitCode. Sometimes it takes a while to show up, as in tens of seconds, and yes, it does seem to be driven by camera updates from the main region. Thanks for confirming that. As I said above, I thought I needed to do something to kickstart that. Logging in and just sitting there results in no EstablishAgentCommunication events from the simulator. I probably need to send some agent updates even if the agent isn't doing anything at first.
  • I notice that if I get an EnableSimulator but don't bring up the region, I keep getting EnableSimulator messages. Is it a problem for the simulators if I don't bring up a region viewer side the simulators would like to have go live? I'm considering a setup where I have a draw distance long enough to fully cover 4 regions around the main agent, but only the closest 4 regions are brought u in the viewer. Some kind of region impostor will be shown beyond the nearby regions. Is this going to create any problems simulator-side?
  • At logout, is CloseCircuit sufficient to close out child regions, or is some other handshaking required?

I really appreciate the answers posted above. Us TPV people have to try to figure this stuff out from logs and the viewer code, and that works well only for the happy path.

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

  • Lindens

These are going to require some digging.  Unexpected simulator restarts (crashes, hangs, host disappears) are mostly in the realm of undefined behavior.  The UDP circuit will timeout eventually and that's pretty authoritative - session is done at that point.  Clean restarts are mostly going to involve kicks.

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

Here's some logging info that provides a bit more insight. This is Wednesday evening on Aditi, and times are UTC. Selected log entries from Sharpview. Logged into Morris, standing near the Morris/Ahern/Bonafacio/Dore corner, a place I visit far too often.

Quote

06:06:47 [WARN] Region (255232,256256) state change from Idle to Discovered: New
06:06:47 [WARN] Region (255232,256256) state change from Discovered to Connected: Region handshake
06:06:47 [WARN] Region (255232,256256) state change from Connected to SeedCapabilityReceived: Have seed capability from login/teleport
06:06:48 [WARN] Region (255232,256256) state change from SeedCapabilityReceived to CapabilitiesReceived: Capabilities received.
06:06:48 [WARN] Region [Morris] (255232,256256) state change from CapabilitiesReceived to Live: Live
06:06:49 [WARN] Region (255232,256512) state change from Idle to Discovered: New
06:06:49 [WARN] Region (254976,256256) state change from Idle to Discovered: New
06:06:49 [WARN] Region (254976,256512) state change from Idle to Discovered: New
06:06:50 [WARN] Region (254976,256256) state change from Discovered to Connected: Region handshake
06:06:51 [WARN] Region (255232,256512) state change from Discovered to Connected: Region handshake
06:06:51 [WARN] Region (254976,256512) state change from Discovered to Connected: Region handshake
06:06:53 [WARN] Region (255232,256512) state change from Connected to SeedCapabilityReceived: Seed capability from establish agent communication event
06:06:53 [WARN] Region (255232,256512) state change from SeedCapabilityReceived to CapabilitiesReceived: Capabilities received.
06:06:53 [WARN] Region [Ahern] (255232,256512) state change from CapabilitiesReceived to Live: Live
06:07:51 [WARN] Region (254976,256256) state change from Connected to SeedCapabilityReceived: Seed capability from establish agent communication event
06:07:51 [WARN] Region (254976,256256) state change from SeedCapabilityReceived to CapabilitiesReceived: Capabilities received.
06:07:51 [WARN] Region [Bonifacio] (254976,256256) state change from CapabilitiesReceived to Live: Live
06:07:55 [WARN] Region (254976,256512) state change from Connected to SeedCapabilityReceived: Seed capability from establish agent communication event
06:07:55 [WARN] Region (254976,256512) state change from SeedCapabilityReceived to CapabilitiesReceived: Capabilities received.
06:07:55 [WARN] Region [Dore] (254976,256512) state change from CapabilitiesReceived to Live: Live

Here we're seeing the four regions being found by the viewer. They all get found. Morris gets found at login. The other three regions show as Discovered, which means EnableSimulator came in. Each of those regions quickly gets a RegionHandshake, and object updates start pouring in. (They go on an internal hold queue, though,  until the region is fully live.) Ahern goes live quickly; 2 seconds after the RegionHandshake, the EstablishAgentCommunication event comes in via the event poller. The region goes live and appears on screen.

But then, for a full minute, Bonafacio and Dore don't show up. They're stuck waiting for their EstablishAgentCommunication events. Note the times above. One minute later, the event poller in Sharpview reports an unexpected EOF on the poll request. Sharpview's poller has a 90 second timeout, so this is presumably a server-side timeout. So Sharpview does another poll, and now the EstablishAgentCommunication events for Dore and Bonafacio regions come in, and those regions quickly appear on screen properly.

So long polling isn't working properly here. With a long poll in progress, an event from the server did not cause an outstanding long poll to complete and quickly return the new event. The poll had to time out first, and the next poll retrieved the event.

Sharpview's polling is very simple - make HTTP request, wait 90 seconds. No libcurl retries. Uses the simplest Rust HTTPS crate: "ureq".

I have more logging info, and can make this happen on request. I've tried this three times. Moving the avatar around Morris near the corner, to force more agent updates, does not change this behavior. One minute delay, every time.

(I wonder if the libcurl retries are what makes this work for other viewers. A retry by the viewer would get the polling process un-stuck. That should not be necessary, but may be why it works in the mainstream viewers.)

Edited by animats
  • Like 1
Link to comment
Share on other sites

The problem is real, but it's more puzzling. I set the poll timeout on my side to 10 seconds. That gets me a timeout and re-poll every 10 seconds. But it still takes a full minute to get the EstablishAgentCommunication events. Puzzled. Somewhere server side, there is a 1 minute timer, apparently.

I'm doing something wrong, or a least differently, here.

Edited by animats
  • Like 1
Link to comment
Share on other sites

  • Lindens

I only have default logging active so can only compare some phases.  Ahern:

Quote

Oct 26 06:06:51 simhost-0ebe97adfa0d247d0.aditi.secondlife.io simulator[2447]: INFO: init_agent: init_agent - Creating new child agent <animats>
Oct 26 06:06:51 simhost-0ebe97adfa0d247d0.aditi.secondlife.io simulator[2447]: INFO:#AgentUsher LLAgentCommunication::grantSeedCapability: LLAgentCommunication::grantSeedCapability service URL: http://localhost:12040/service-proxy/cap/grant, private url http://localhost:13003/agent/<animats>/base-capabilities
Oct 26 06:06:53 simhost-0ebe97adfa0d247d0.aditi.secondlife.io simulator[2447]: INFO: LLInterestList::copyCacheState: global-interest-list provided 1331 cacheable subscriptions to for agent <animats>
Oct 26 06:06:53 simhost-0ebe97adfa0d247d0.aditi.secondlife.io simulator[2447]: INFO:#CapsInfo LLAgentCommunication::grantCapabilities: Starting grantCapabilities for agent <animats>

The last line indicates where you've invoked the seed capability and things proceed as expected.

Dore:

Quote

Oct 26 06:06:51 simhost-0ed8234b30a9d6270.aditi.secondlife.io simulator[8982]: INFO: init_agent: init_agent - Creating new child agent <animats>
Oct 26 06:06:51 simhost-0ed8234b30a9d6270.aditi.secondlife.io simulator[8982]: INFO:#AgentUsher LLAgentCommunication::grantSeedCapability: LLAgentCommunication::grantSeedCapability service URL: http://localhost:12040/service-proxy/cap/grant, private url http://localhost:13003/agent/<animats>/base-capabilities
Oct 26 06:06:53 simhost-0ed8234b30a9d6270.aditi.secondlife.io simulator[8982]: INFO: LLInterestList::copyCacheState: global-interest-list provided 5749 cacheable subscriptions to for agent <animats>
Oct 26 06:07:55 simhost-0ed8234b30a9d6270.aditi.secondlife.io simulator[8982]: INFO:#CapsInfo LLAgentCommunication::grantCapabilities: Starting grantCapabilities for agent <animats>

The seed caps are generated at the same time.  Interest list activity should indicate it is enabled (though I didn't check) so I think the RegionHandshakeReply is complete.  EAC should be clear to send.  But Dore's seed capability invocation by the viewer is delayed.  I'd normally blame the viewer for the delay.  :)  Lost EACs will be resent at a maximum rate of once per 5s but it must be driven by IL activity forwarded from main (Morris).  That is where I'd look.

(BTW, it may be beneficial to get a key logging setup going with your TLS support.  Viewer will get this in the future but being able to wireshark the https: stream and decode it now and in the future may be very useful.)

  • Thanks 1
Link to comment
Share on other sites

42 minutes ago, Monty Linden said:

I only have default logging active so can only compare some phases.  Ahern:

The last line indicates where you've invoked the seed capability and things proceed as expected.

Dore:

The seed caps are generated at the same time.  Interest list activity should indicate it is enabled (though I didn't check) so I think the RegionHandshakeReply is complete.  EAC should be clear to send.  But Dore's seed capability invocation by the viewer is delayed.  I'd normally blame the viewer for the delay.  :)  Lost EACs will be resent at a maximum rate of once per 5s but it must be driven by IL activity forwarded from main (Morris).  That is where I'd look.

(BTW, it may be beneficial to get a key logging setup going with your TLS support.  Viewer will get this in the future but being able to wireshark the https: stream and decode it now and in the future may be very useful.)

> RegionHandshakeReply is complete. 

Yes, it is, and object updates are pouring in. They go on a holding queue waiting for the EstablishAgentCommunication and the capability fetch for the asset capabilities. Once those are in, the region goes live and all the held object updates are processed, at which point the region snaps into visibility.

> But Dore's seed capability invocation by the viewer is delayed.  I'd normally blame the viewer for the delay. 

No, as soon as the viewer gets EstablishAgentCommunication, it makes the HTTPS request for the initial region capabilities. I have logging of that; I just didn't send an entire log file.

How does the viewer "invoke the seed capability"? RegionHandshakeReply is done, and we know the simulator got it, because object updates started coming in.

> but it must be driven by IL activity forwarded from main (Morris).  That is where I'd look.

"IL activity"? What is that?

There are definitely plenty of AgentUpdate messages being generated. The avatar can be moved around within Morris, and another viewer in Firestorm looking at the same spot sees it moving. Moving the avatar around, or keeping it still, does not seem to affect the 1 minute delay. Is there something else that the viewer must send to the main agent simulator?

I've tried making the event poller time out and retry every 15 seconds. That has no effect on this issue. I'll get several empty timed-out polls, then an EstablishAgentCommunication message. Usually I use a 90 second timer and let the simulator time out. Works the same either way. So it's not a consequence of the poller timeout. Maybe.

The 1 minute delay (± 2 secs) before EstablishAgentCommunication shows up is very consistent in all these scenarios.

More excerpts from the same log summarized previously:

06:06:53 [WARN] EventEstablishAgentCommunication, unparsed: Map({"seed-capability": String("CENSORED"), "agent-id": UUID(CENSORED), "sim-ip-and-port": String("54.202.5.63:13003")})
06:06:53 [WARN] Establish agent communication event: [MsgServerEvent { region_handle: (255232,256256), event: EstablishAgentCommunication(EventEstablishAgentCommunication { socket_addr: 54.202.5.63:13003, agent_id: CENSORED, seed_capability: "CENSORED" }) }]
06:06:53 [WARN] Establishing agent communication to (255232,256512)
06:06:53 [WARN] Region (255232,256512) state change from Connected to SeedCapabilityReceived: Seed capability from establish agent communication event
06:06:53 [WARN] Fetching capabilities for region (255232,256512)
...
06:06:53 [WARN] Fetched capabilities: RegionCapabilities(RegionCapabilities { start_time: Instant { tv_sec: 2291337, tv_nsec: 167534157 }, region_handle: (255232,256512), region_capabilities: {"GetMesh2": "CENSORED", "GetMesh": "CENSORED", "RenderMaterials": "CENSORED", "GetTexture": "http://asset-cdn.glb.aditi.lindenlab.com", "EventQueueGet": "CENSORED"} })
06:06:53 [WARN] Region (255232,256512) state change from SeedCapabilityReceived to CapabilitiesReceived: Capabilities received.
06:06:53 [WARN] Region [Ahern] (255232,256512) state change from CapabilitiesReceived to Live: Live

This is from the same Sharpview log as previously posted. There's been a 1 minute delay between RegionHandshake/RegionHandshakeReply for Ahern before EstablishAgentCommunication was received at the viewer end. That message was rerouted from the viewer's main agent region (Morris) to the neighbor region in the viewer (Ahern). That causes the viewer to request region capabilities "GetMesh2", etc, and those come back within the same second. Ahern then goes to Live state and appears on screen. So the delay is before EstablishAgentCommunication is received, not in fetching capabilities.

It's quite possible that Sharpview needs to send something more to wake up the simulator, but I don't know what.

Link to comment
Share on other sites

Thinking about it, I may have seen this, on rare occasions, in Firestorm. My home login location, in a second floor office with windows, overlooks three adjacent regions, and sometimes one of those regions comes up blank, then appears after a delay. If your login location doesn't overlook other regions, you might never see this. Comments?

  • Like 1
Link to comment
Share on other sites

1 hour ago, animats said:

Thinking about it, I may have seen this, on rare occasions, in Firestorm. My home login location, in a second floor office with windows, overlooks three adjacent regions, and sometimes one of those regions comes up blank, then appears after a delay. If your login location doesn't overlook other regions, you might never see this. Comments?

II remember hearing about a bug that happened before my time, from a friend, back 10? years ago.

Some minor change to the server code was made to "improve" communication between neighbouring regions, to rez in objects in your draw distance on those nearby regions.

Somehow, they messed up when dealing with regions DIAGONALLY to the south east and south west. All other directions worked fine, BUT, if you had another region to the SE or SW within your draw distance, your UDP packet rate went up by a factor of 1000%.

 

Appantently some people on "limited monthly bandwidth allowance" at the time, left SL, because due to living in the southern corners of the old Linden homes regions, they were burning a whole months internet bandwidth in a week, without realising it till the penalty overuse bills came in, just by standing in their Linden Homes.

 

So with regard to SOME regions taking excessive amounts of time to respond, is there a pattern to the relative direction, try your experiment somewhere in the middle of a group of 9 regions, say in Horizons, and see if the direction affects the response time.

 

It happened before, might have happened again, long shot but worth checking.

Link to comment
Share on other sites

  • Lindens
On 10/30/2023 at 11:27 PM, animats said:

> But Dore's seed capability invocation by the viewer is delayed.  I'd normally blame the viewer for the delay. 

No, as soon as the viewer gets EstablishAgentCommunication, it makes the HTTPS request for the initial region capabilities. I have logging of that; I just didn't send an entire log file.

How does the viewer "invoke the seed capability"? RegionHandshakeReply is done, and we know the simulator got it, because object updates started coming in.

This is where the delay shows up.  For Dore from the server side, at 06:06:51, the seed capability (cap-granting cap) is constructed and should be available in an EAC message.  At 06:07:55, the server handles an invocation of the seed capability (POST to the seed cap) which generates the full capability set.

The delay sits between those two points.  Will take as a given from the viewer log that viewer receives *an* EAC message by 06:07:55 but may not be the original message.  Was delay caused by:

  • Additional gate on sending first EAC message
  • Sim-to-sim loss of EAC message requiring regeneration
  • Sim-to-viewer loss of EAC message on EventQueue or elsewhere requiring regeneration
  • Delay in receiving or processing response somewhere
On 10/30/2023 at 11:27 PM, animats said:

> but it must be driven by IL activity forwarded from main (Morris).  That is where I'd look.

"IL activity"? What is that?

There are definitely plenty of AgentUpdate messages being generated. The avatar can be moved around within Morris, and another viewer in Firestorm looking at the same spot sees it moving. Moving the avatar around, or keeping it still, does not seem to affect the 1 minute delay. Is there something else that the viewer must send to the main agent simulator?

AgentUpdate messages to main region are required to drive updates into child regions.  Resulting Interest List activity there drives EAC message generation (and re-generation).  The only obvious artificial delay is the mentioned 5-second retry throttle.  The code is a tangle but there is no obvious 1min throttle involved.

 

On 10/30/2023 at 11:27 PM, animats said:

The 1 minute delay (± 2 secs) before EstablishAgentCommunication shows up is very consistent in all these scenarios.

More excerpts from the same log summarized previously:

06:06:53 [WARN] EventEstablishAgentCommunication, unparsed: Map({"seed-capability": String("CENSORED"), "agent-id": UUID(CENSORED), "sim-ip-and-port": String("54.202.5.63:13003")})
06:06:53 [WARN] Establish agent communication event: [MsgServerEvent { region_handle: (255232,256256), event: EstablishAgentCommunication(EventEstablishAgentCommunication { socket_addr: 54.202.5.63:13003, agent_id: CENSORED, seed_capability: "CENSORED" }) }]
06:06:53 [WARN] Establishing agent communication to (255232,256512)
06:06:53 [WARN] Region (255232,256512) state change from Connected to SeedCapabilityReceived: Seed capability from establish agent communication event
06:06:53 [WARN] Fetching capabilities for region (255232,256512)
...
06:06:53 [WARN] Fetched capabilities: RegionCapabilities(RegionCapabilities { start_time: Instant { tv_sec: 2291337, tv_nsec: 167534157 }, region_handle: (255232,256512), region_capabilities: {"GetMesh2": "CENSORED", "GetMesh": "CENSORED", "RenderMaterials": "CENSORED", "GetTexture": "http://asset-cdn.glb.aditi.lindenlab.com", "EventQueueGet": "CENSORED"} })
06:06:53 [WARN] Region (255232,256512) state change from SeedCapabilityReceived to CapabilitiesReceived: Capabilities received.
06:06:53 [WARN] Region [Ahern] (255232,256512) state change from CapabilitiesReceived to Live: Live

This is from the same Sharpview log as previously posted. There's been a 1 minute delay between RegionHandshake/RegionHandshakeReply for Ahern before EstablishAgentCommunication was received at the viewer end. That message was rerouted from the viewer's main agent region (Morris) to the neighbor region in the viewer (Ahern). That causes the viewer to request region capabilities "GetMesh2", etc, and those come back within the same second. Ahern then goes to Live state and appears on screen. So the delay is before EstablishAgentCommunication is received, not in fetching capabilities.

It's quite possible that Sharpview needs to send something more to wake up the simulator, but I don't know what.

Above fragment is from Ahern which is a child region that *doesn't* have the delay.  The capability setup ran through at the desired rate.  So there is a difference between children.  One appears to operate normally, two appear delayed.  We don't have enough logging enabled by default for me to tell from this run what the source of the delay is.

  • Thanks 1
Link to comment
Share on other sites

Thanks.

I'm looking hard at my logs right now. The stall may have started further back, at EnableSimulator. I get EnableSimulator for each neighbor region, and reply with UseCircuitCode and a CompleteAgentMovement to the neighbor region. Maybe I shouldn't be sending a CompleteAgentMovement to the neighbor region.

The neighbor region simulator sends the viewer a RegionHandshake without any delay, the viewer answers that with a RegionHandshakeReply, and update messages start. So the neighbor simulators seem to be happy, having accepted the connection and delivering updates.

One minute later, there's a duplicate EnableSimulator from the two neighbor regions that are not fully live yet, but not from the ones that are live and displayed. Each of those is immediately followed by the missing EstablishAgentCommmunication event. Then the region starts up.

It's starting to look like I did something wrong back at the EnableSimulator handshake. After UseCircuitCode, what's supposed to be sent to a neighbor region? There's a good chance I'm doing that wrong.

Puzzled that this all works for the first neighbor region, but not others. More later.

  • Like 1
Link to comment
Share on other sites

A bit more insight. If I set the draw distance to 64 meters, and log Sharpview in near the Ahern/Morris/Dore/Bonifacio corner on the beta grid, two of the regions take a full minute to appear. But if I log in from a point more than 64m from the region corner, and then get closer, crossing the 64m threshold triggers the appearance of the missing regions. So the simulator function that turns on regions as you get close is working, and something that apparently checks once a minute server side for the correct regions being up is working, but the initial state after login is not right.

I wonder if the one minute timer is related to making regions go away when you get far enough away. That's not immediate; there's some delay there, reasonably enough. to prevent frantic region connect/disconnect when close to the draw limit.

I probably need to do something to kick things into action right after login. But what? Time to go read SL viewer code again.

More later.

  • Thanks 1
Link to comment
Share on other sites

3 hours ago, animats said:

in near the Ahern/Morris/Dore/Bonifacio corner on the beta grid, two of the regions take a full minute to appear.

I'l ask again. Have you checked other corners, to see if the "take a full min" regions have anything in common as to DIRECTION, SL has already had ONE incident of a bug making SE/SW neighbour regions cause problems.

 

Check other corners, and see if the regions that have 1 min rez delays there are in similar directions or different directions, check also on the same corner of a DIFFERENT set of regions, 

Probably not relevant BUT you'd feel a fool if it turned out the problem ONLY happens on a North and East neighbours all along.

 

Link to comment
Share on other sites

Tests as requested:

This is the Dore (NW) Ahern (NE) Bonifacio (SW) Morris (SE) corner on the beta grid. The login region always comes up immediately.

  • Logged in at region center of Morris with 64m draw distance. Neighbor regions appear when 64m distance crossed. Logged into Morris near corner. Ahern appears immediately, Dore and Bonifacio after one minute. Have done this many times; repeatable.
  • Logged into Dore at region center with 64m draw distance. Neighbor regions appear when 64m distance crossed. Logged into Dore near corner. Morris appears immediately, other two after one minute.
  • Logged into Bonifacio at region center. Morris and Dore appeared when walked into range. Ahern took 1 minute. Logged into Bonifacio near corner. Morris appeared immediately. Other neighbors took one minute.
  • Logged into Ahern. Landed near corner due to landing point. Can't land at center. Bonifacio appeared immediately. Dore and Ahern took one minute.

The Other Simulator:

  • Logged in near region corner, all four regions come up immediately with no problems.

Note that region handshakes always take place, and object updates start coming in. It's only the EstablishAgentCommmunication message that doesn't show up. I suspect there's some undocumented thing I need to do after login to get things going, but I have no idea what. As far as I can tell from the code, I'm sending what the SL viewer sends.

I haven't yet reached the Wireshark level of reverse engineering desperation.

  • Thanks 1
Link to comment
Share on other sites

Since Monty Linden wrote that EstablishAgentCommunication is pumped by AgentUpdate from the viewer, I've been trying various things with Sharpview's sending of AgentUpdate messages. Turns out I was sending AgentUpdate as a reliable message. Switched to sending it as an unreliable message, and that helped. When idle, Sharpview sends it ever second. Twice, as hinted at by this code in the SL viewer.

This works a bit better. Not reliably. But sometimes a little motion of the viewpoint, motion that doesn't cross the view distance boundary, will cause adjacent regions to send their EstablishAgentCommunication messages.

In all this, AgentUpdate is doing its main job of moving the avatar around just fine. I can even log into the same area with another viewer and watch the avatar move. Whether sending reliably or unreliably, that works. It's only the mechanism that causes an AgentUpdate to be relayed to neighbor regions that seems to be not responding. Something in this area is very touchy and I'm not touching it properly, or at least not in exactly the same ways with the same timing that the C++ viewers do. There seems to be sensitivity to movement or timing that shouldn't matter.

Edited by animats
Link to comment
Share on other sites

3 hours ago, Monty Linden said:

Why on earth anyone would design a stitching protocol this way I can't imagine....

The protocol isn't that bad. It's a nicely robust approach. Simulators manage their connections to their neighbors and pass viewers from region to region without help from some central coordinator. So there's no single point of failure. Simulator failures take out only their own region. Servers can be added or deleted without affecting grid stability. The central services (login, baking, etc.) aren't on the time-critical paths. The entire SL grid hasn't failed in years. It's more reliable than most web services.

Most failure modes at the message level seem to be "stuck" situations where A is waiting for B and B is waiting for A. Teleport, region crossing, login, and neighbor connection troubles all look like that. The other major class of trouble is where A is out of sync with B but has no way of either detecting or fixing that. Interest list problems are like that. Both of those classes of problems are time-consuming to chase down. Much looking at logs is required. Tools for looking at logs can help.

Documentation and ordinary bugs are the big problems. This EstablishAgentCommunication thing looks to me like some ordinary kind of bug, complicated by lack of documentation on exactly how to do a neighbor connection properly. The Other Simulator uses the same protocol and doesn't have this problem when Sharpview talks to it.

Enough computer science philosophizing for one day. Onward!

  • Like 1
Link to comment
Share on other sites

Minor note on things I've tried:

  • The very first AgentUpdate always has a bogus camera_center value, because the viewer has to send an AgentUpdate before the simulator will send back ObjectUpdate messages that tell the viewer where the avatar is. So I've tried various initial guesses. <0,0,0>, <128,128,128>, and starting at <128,128,128> then slowly interpolating to the correct position. No effect on EstablishAgentCommunication.
  • Tried changing the "far" value in the initial AgentUpdate. 0, 1.0, and 400.0 all produce the same results. I get the impression that while the viewer does have to send that guessed AgentUpdate to get things started, the positions in it don't do much at the simulator end, which is reasonable enough. Both sides know those values are bogus.
  • Discovered that the simulator sends RegionHandshake twice. Confirmed with Henri Beauchamp that he sees that happening in his viewer. Now sending RegionHandshakeReply for both of them, which Henri says is necessary to prevent missing objects. Doesn't affect EstablishAgentCommunication.
  • Tried sending RegionHandshakeReply once a second while waiting for EstablishAgentCommunication, in hopes of waking things up at the simulator end. No effect on anything.
  • Tried sending AgentUpdate messages at faster and slower rates when not moving. No effect.
  • Tried sending AgentUpdate messages as both reliable and unreliable. No effect.

After all this, EstablishAgentCommunication comes in quickly for the first neighbor region, then at one minute for all the others, as before. None of the above has any effect on that behavior. Starting point is near the corner, far - 64.0 meters. The neighbor regions are sending ObjectUpdate messages as they should. It's just those EstablishAgentCommunication messages that contain the seed capabilities for the neighbor regions, passed back via the event poller of the login region, that don't show up on time.

All this is on the beta grid, at Morris.

It's so frustrating. Almost all the right stuff is happening. But I don't know the right incantation to make the last bit happen.

morriscorner.thumb.png.2ed799214e2c5aa5d809d3aa65b87893.png

Beautiful downtown Morris, beta grid. This is a region corner of Morris/Dore/Ahern/Bonifacio.

When I look at this with Sharpview, the login region, Morris, where the agent is now, comes up immediately. The region on the right comes up in a few seconds. The region on the left and the region ahead come up almost exactly one minute after start. It's that one minute delay that has me stumped.

(Some of the plants seem to be missing. They're present, just tiny. Each built-in Linden tree type has its own scale factor. They don't work like ordinary objects. I need to add more special cases for those legacy plants.)

 

Link to comment
Share on other sites

Went there, all four regions come up and everything rezzes in less than 20s. Here is the relevant log info with milliseconds accurate time stamps and UDP messages debugging enabled (repeated, irrelevant messages removed for clarity); maybe you will find what is missing in your viewer/servers communications....

But you could learn a lot by yourself via the use of the Cool VL Viewer logging features (that's how I managed to make threaded object cache reads work flawlessly):

  • Get the viewer, launch it, log in and go to the place you want in SL.
  • Disable ”Advanced” -> ”Caches” -> ”Threaded object cache reads” (this way the viewer will behave like LL's and other TPVs and the threads delay won't risk polluting the messages sequence).
  • Enable ”Advanced” -> ”Consoles” -> ”Precise timestamps in log file”.
  • Use the mini-map to help setting the draw distance so that you can see only the four sims in the FOV cone (192m did it for me at the approximate position of your screen shot).
  • Log off.
  • Relaunch the viewer, and at the login screen use ”Advanced” -> ”Debug tags”, and in the floater that opens, scroll down and check ”Messaging” in the list; you may also enable ”EventPoll” if you are interested in those in excess of what is already normally logged at INFO level. Close the floater. (*)
  • Log in, wait for everything to rez and your avatar to bake, then log off.
  • The CoolVLViewer.log will contain all the detailed login and rezzing sequences, with detailed log messages.

(*) Note: this floater allows to enable/disable DEBUG level messages in real time; once logged in there is also an ”Advanced” -> ”Consoles” -> ”Allow DEBUG messages” option (automatically checked when you did enable a debug tag), to disable/re-enable at once all the debug messages corresponding to the configured tag(s) when you deem appropriate, so to avoid too much/irrelevant spam in the log file (and console).

Edited by Henri Beauchamp
typos
  • Thanks 1
Link to comment
Share on other sites

Very helpful. Looking at Henri's log, it's very similar to what I get. Found something significant, though:

  • 10:09:45.972Z - The first neighbor region, Ahern, gets its EstablishAgentCommunication.
  • 10:09:55.977Z - The second neighbor region, Dore, gets its EstablishAgentCommunication.
  • 10:09:56.340Z - The third neighbor region, Bonifacio, gets its EstablishAgentCommunication.

Note the timing. First neighbor comes up quickly. Then there's a 10 second delay. Then the rest of the regions come up.

This is very similar to what I get, but the delay for Sharpview is 60 seconds instead of 10.

It looks like there's a deliberate delay server side, which can be 10 or 60 seconds. That's a precise 10 second delay, within 10 milliseconds of being exactly ten seconds. So what does that?

What am I doing differently?

  1. Sharpview doesn't have an object cache yet, just mesh and texture caches. Sharpview always sets "send all objects" on RegionHandshakeReply, which is like starting from an empty cache. Might this be some throttling system so that, when you start from an empty object cache, the viewer isn't flooded with new objects?
  2. Sharpview isn't sending ViewerEffect, because avatars are just blocks for now. That's probably irrelevant.

Somewhere server side, there seems to be a timer which can be either 10 or 60 seconds.

Comments?

  • Thanks 1
Link to comment
Share on other sites

Fixed Sharpview to use the simulator object cache API properly. Instead of asking for everything as if from an empty cache, it asks for ObjectUpdateCached messages. Those come in (98 objects each) and RequestMultipleObject messages are sent by the viewer to get those objects, currently in blocks of no more than 100. This has no effect on the 60-second delay for EstablishAgentCommunication.

So that wasn't the problem.

  • Like 1
Link to comment
Share on other sites

8 hours ago, Henri Beauchamp said:

Just a wild guess, but are you replying to StartPingCheck with CompletePingCheck too ?...

Yes. Put that in last week. No effect.

My logs are very close to your logs in terms of what messages are sent. What's in the messages may be an issue somewhere.

I just noticed that I'm not sending AgentThrottle. It would make sense that the default throttle settings might be low. I'll implement that next. This might be some throttling mechanism left over from the assets over UDP era.

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

14 minutes ago, animats said:

Yes. Put that in last week. No effect.

My logs are very close to your logs in terms of what messages are sent. What's in the messages may be an issue somewhere.

Or of a missing reply to some other server message, causing the said server to ”pause” messaging to your viewer, waiting for it to reply before finally timing out on the latter and continuing with the rest of the queued messages...

14 minutes ago, animats said:

I just noticed that I'm not sending AgentThrottle. It would make sense that the default throttle settings might be low. I'll implement that next. This might be some throttling mechanism left over from the assets over UDP era.

Yes, and it is still in use (even though all the asset and texture reserved bandwidth it accounts for should now be removed both viewer and server sides... A work for @Monty Linden ? 😜); this also means (and contrarily to an SL urban legend about supposed bad effects of a large UDP bandwidth setting) that you can now use a much larger bandwidth number (only part of it will be used for objects data, anims start/stop, etc, anyway). I personally use 32Mbps as a setting (I'm on an FTTH link) in the Cool VL Viewer (which defaults at 8Mbps).

Edited by Henri Beauchamp
  • Like 1
Link to comment
Share on other sites

  • Lindens
1 hour ago, Henri Beauchamp said:

Yes, and it is still in use (even though all the asset and texture reserved bandwidth it accounts for should now be removed both viewer and server sides...

This is actually how the simulator is made aware of the viewer's bandwidth setting.  It is a terrible mechanism and the weights used on both sides are nonsense these days.  I have outstanding Jiras to update this.  I don't think this is related to the EAC message problem but it is still desired.  (Defaults and limits keep things running regardless.)

  • Thanks 2
Link to comment
Share on other sites

You are about to reply to a thread that has been inactive for 208 days.

Please take a moment to consider if this thread is worth bumping.

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
 Share

×
×
  • Create New...