Everything posted by Monty Linden

  1. @Will2024 if you created your account in December or later, this may be a bug we're currently working on. Run through the checks as @Rowan Amore outlines. If they're good, it may be on us. We don't have our GitHub Issues access in place yet, but look for issue 593.
  2. I can't speak for Catznip but I suspect this was related to deprecation of certain UDP inventory operations. Those UDP messages are unlikely to come back and the way forward, good or bad, is the HTTP-based system.
  3. Understood. Happy to have any insight into a possible pattern. So thanks for the report; there is something to dig into there.
  4. Are the timeouts consistently to the login endpoint host (login.agni.lindenlab.com)? Worst-case metrics look good there so it's either stuck in front of the service or some *really* unhealthy instances.
  5. We're coming up on the holiday break and Lindens are going to start disappearing (myself included). If you are comfortable posting viewer logs to Jira, file a BUG Jira, attach the logs from a bad session, then mention the Jira number here so it can be found quickly.
  6. The SecondLife.log or SecondLife.old file associated with a bad session. It can be attached when filing a support ticket.
  7. Correct. We're happy to get any kind of traceroute between your home systems and a simhost in support tickets. But here are some hints on better tracerouting: better in the sense that it will terminate. It doesn't, for example, detect problems unique to UDP.
  8. I don't see it either, either in my own travels or in the metrics. But I can believe something's wrong. We just need data from the field.
  9. Support should bug you for more information soon enough. They are standing by hungry for your calls...
  10. Need to see those log files to know what's going on. I just did 10 TPs in 60s so this smells regional.
  11. Nothing really going wrong on the grid. Get some support tickets in and try login and TP to other regions.
  12. May be expected. SL viewer is shipping five or so expired certs right now. 2025 is when the magic cert expires and that will be an interesting day.
  13. Looking at something on Aditi's bake service. Things may or may not get better...
  14. FWIW: we just had an escalated case resolve with the ISP being responsible. They were handing out router/modems with shoddy firmware. Trust but verify.
  15. Let me fix that Linux install for you: https://www.microsoft.com/en-us/d/windows-11-pro/dg7gmgf0d8h4 Agreed. I lost that battle for now.
  16. Possibly. The message is somewhat misleading, as are many of our messages. There isn't a "done" message; the viewer just continues on into unrelated areas and possibly fails in something having nothing to do with capabilities. Someone needs to dive into the viewer log file to find out what's really going on. 12046 is the other capability port (http:). It took 32 hops from Boston:
      ...
      18  be-3212-pe12.910fifteenth.co.ibone.comcast.net (96.110.33.134)  71.089 ms  be-3211-pe11.910fifteenth.co.ibone.comcast.net (96.110.33.118)  63.584 ms  be-3412-pe12.910fifteenth.co.ibone.comcast.net (96.110.33.142)  63.681 ms
      19  * * *
      20  * * *
      21  * * *
      22  * * *
      23  * * *
      24  * * *
      25  * * *
      26  * * *
      27  * * *
      28  *  108.166.240.9 (108.166.240.9)  99.394 ms  108.166.232.47 (108.166.232.47)  99.456 ms
      29  * *  108.166.240.16 (108.166.240.16)  91.945 ms
      30  108.166.232.32 (108.166.232.32)  97.143 ms  108.166.232.34 (108.166.232.34)  91.746 ms  *
      31  * * *
      32  ec2-54-184-44-5.us-west-2.compute.amazonaws.com (54.184.44.5) <syn,ack>  93.140 ms  99.358 ms  *
      monty@Monty-DellXPS:~$
      Might take more than 50 hops in .eu and beyond.
  17. Some more hints (a quick connectivity sketch follows the list):
      • A better tracert. If you have WSL2, you can install 'traceroute' and get a better trace on Windows. 'sudo tcptraceroute -m 50 <simhost hostname or IP> 12046' can trace all the way to a simhost. (So what if WSL2 is just a 2GB traceroute installation?)
      • Test with a VPN. If you can access SL with a VPN but not without, that points to the ISP as the source of the fault. It should get more of their attention.
      • Read the viewer log file. There are more clues in there. (Do NOT post it here - personal information is in there.)
      • File a support ticket.
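      A minimal Python sketch of that kind of quick check: it only measures repeated TCP connects to the capability port rather than tracing the route, and the simhost address is a placeholder to be taken from a viewer log. Illustrative only, not a Linden tool.

          # Sketch only: repeated TCP connect/latency check against a simhost
          # capability port (12046). SIMHOST is a placeholder; use an address
          # from your own viewer log.
          import socket
          import time

          SIMHOST = "simhost.example.invalid"   # placeholder, not a real Linden host
          PORT = 12046

          for attempt in range(5):
              start = time.time()
              try:
                  with socket.create_connection((SIMHOST, PORT), timeout=10):
                      print(f"attempt {attempt + 1}: connected in {(time.time() - start) * 1000:.1f} ms")
              except OSError as err:
                  print(f"attempt {attempt + 1}: failed after {(time.time() - start) * 1000:.1f} ms ({err})")
              time.sleep(1)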
  18. If you mean the 'login.agni...' link, then you are through one potential blocker. Given that this happens with only one account, the answer is the same as the other recent failure: likely an inventory issue. Start with a support ticket.
  19. This and the rest point in the direction of an inventory issue. Keep working with support and supply the requested information. (E.g. The SecondLife.log from a bad session.)
  20. One thing to keep in mind: content like meshes and textures does not come directly from Linden. These assets are supplied by a CDN with PoPs (Points-of-Presence) around the world. Not all of these perform as well as they should. And we have seen cases where an ISP attempts to hijack the CDN using DNS: they point the CDN's DNS names at their own 'optimized' caching system, which then turns out to be both buggy and slow. This hijacking problem can be avoided by using a more trusted DNS server (8.8.8.8, 1.1.1.1, etc.).
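      As a rough illustration of that DNS check, a small Python sketch (requires the dnspython package; the CDN hostname is a placeholder taken from a viewer log, and geo-DNS means answers can legitimately differ by location, so a mismatch is a hint rather than proof):

          # Sketch only: compare the system resolver's answer for a CDN hostname
          # against a trusted public resolver (8.8.8.8). CDN_HOST is a placeholder.
          import socket
          import dns.resolver   # from the 'dnspython' package

          CDN_HOST = "cdn.example.invalid"   # placeholder hostname

          system_ips = {info[4][0] for info in
                        socket.getaddrinfo(CDN_HOST, 443, proto=socket.IPPROTO_TCP)}

          trusted = dns.resolver.Resolver(configure=False)
          trusted.nameservers = ["8.8.8.8"]
          trusted_ips = {rr.to_text() for rr in trusted.resolve(CDN_HOST, "A")}

          print("system resolver:", sorted(system_ips))
          print("8.8.8.8        :", sorted(trusted_ips))
          if not system_ips & trusted_ips:
              print("No overlap -- possible ISP DNS interception (or just a different PoP).")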
  21. Correct. This is sim->viewer only and none of the UDP activity. I hadn't enabled all the viewer logging, so details of what is going on there are not always clear. In this test, this is right after login, so the first EAC is implicit in the login payload. Most of this test is movement between two regions with frequent movement to the far end of 13000, beyond drawing distance. There was a third region involved but it wasn't local, so I don't have packet data from it. I expected to see additional EAC messages for the two test regions but never did. So this message has more conditionals on it than I thought. *sigh*
      This is the first Region Crossing from 12035 to 13000. The destination region actually gets set up at or before packet 1138 (see note). The LLAgentCommunication object gets constructed when the viewer wants a child agent. The EAC heads to the viewer at 1532, which *is* a bit delayed. The region crossing then happens at 1975 using the same seed cap sent in the EAC message. The same seed cap will be used for 13000 throughout - 12035 will flop onto new seed caps more frequently than I expected. Part of this is because the test path takes the viewer far from 12035, allowing it to fall out of the draw distance, whereas the viewer is kept in or near 13000 for the duration of the test.
      In this case, something interesting is suspected. The viewer navigated to the far edge of 13000 and hung out there until 12035 was removed from view, then approached 12035 and crossed into it at 3698. The viewer appears to make a valid request to 12035 at 3729, but here it gets interesting. At 3943, the viewer makes the next request to 12035 with an 'undef' ack value. This only happens if the LLEventPoll object was torn down and recreated in the viewer. The 12035 region does *not* take down its end of the connection, so it resends the id=16 payload with event 19 (AgentStatusUpdate). This is an example of that outer race condition being hit. 12035 seems okay and continues talking to the viewer.
      At 4491, the viewer crosses back into 13000 and the original seed cap is reused. We're going to cross back into 12035, but more interesting things happen. Packet 4635, I believe, is part of an abandoned request. Note that its end comes after the beginning of 5331. At 1:50:34 or so, I think the viewer kicked off a coordinated teardown of 12035: both the viewer's LLEventPoll and the simulator's LLAgentCommunication get taken down and then rebuilt, forgetting history. This is seen at pkt 5331 where ack=0 and id=1 (both sides new). As part of that teardown, a new seed cap is generated for 12035, which is supplied in the CrossedRegion message in 4713. This request was on the wire for over 28 seconds, getting very near both the viewer and simulator timeouts, but seems to have made it given 5343's exchange.
      That teardown and rebuild happens again between 6576 and 7217. Then another asymmetric and anomalous teardown happens at 8831. For no obvious reason, the viewer has torn down its LLEventPoll, causing an 'ack=undef' request while the simulator retains state. I see two preceding timed-out requests (7550, 8103), followed by a possible success (8447), followed by the anomaly, and I think maybe the HTTP handling in the viewer has a hand in this. For the viewer, a new LLEventPoll coro needs to be created to handle it. I believe it binds the URL only at creation time (may not be correct on this).
      On the simulator side, it's complicated. There's an aggressive attempt to cache and reuse Cap sets as they're somewhat expensive to set up. But we've thwarted it here, probably with active revocations. Addr and port remain the same, just new caps. And the old EventQueue is dropped on the floor. It should only happen as a side-effect of a viewer request (TP, RC), but the viewer doesn't have direct control. More API contract details to recover later.
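      To make that ack/id behavior easier to follow, here is a toy Python model of the resend rule as described above. The class and method names are invented for illustration; this is not the actual LLAgentCommunication or LLEventPoll code.

          # Toy model only: the simulator keeps the last (id, events) payload it
          # sent; if the next request does not acknowledge that id (e.g. ack is
          # None/"undef" because the viewer tore down and rebuilt its LLEventPoll),
          # the old payload is resent instead of being lost.
          class ToyEventQueue:
              def __init__(self, last_id=0):
                  self.last_id = last_id
                  self.last_payload = None
                  self.pending = []          # events waiting to be delivered

              def post(self, event):
                  self.pending.append(event)

              def handle_request(self, ack):
                  if self.last_payload is not None and ack != self.last_id:
                      # Previous payload never acknowledged: resend it.
                      return self.last_id, self.last_payload
                  # Previous payload acknowledged; build the next one.
                  self.last_id += 1
                  self.last_payload = list(self.pending)
                  self.pending.clear()
                  return self.last_id, self.last_payload

          # Pretend ids 1..15 were already delivered and acknowledged.
          q = ToyEventQueue(last_id=15)
          q.post("AgentStatusUpdate")
          print(q.handle_request(ack=15))    # (16, ['AgentStatusUpdate'])
          print(q.handle_request(ack=None))  # (16, ['AgentStatusUpdate']) resent after viewer reset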
  22. I've been doing some manual flow analysis and can share some of the data. These involve two regions running the new state machine-based code. Login is to a region on port 12035, and then a series of region crossings with a region on 13000 occurs with various delays. The packet capture is near the simulator, so it isn't necessarily identical to a capture on the viewer end. But there are already tons of oddities, such as the viewer either deliberately recreating LLEventPoll instances or allowing stale requests to run, and the simulator taking down the viewer's seed cap and LLAgentCommunication endpoint for reasons not captured here.
  23. Haven't had time to run down protocol details so that they can be documented. Still a thing I want to see. I've been doing dev test of the new EventQueue logic, among other things. I can confirm that the outer race condition I talked about before (LLEventPoll destroyed in viewer, LLAgentCommunication retained in simulator) does, in fact, occur with unfortunate results. So I have some work ahead of me...