
20 years, still full crash on teleport


You are about to reply to a thread that has been inactive for 358 days.

Please take a moment to consider if this thread is worth bumping.


1 hour ago, Gabriele Graves said:

Yet these disconnections still happen. Here's the kicker: it used to be a lot better. I remember a time a few years ago, when LL were working on region crossings, when it seemed from my experience that TP disconnects were a thing of the past. Gradually, over time, they've become more prevalent for me again.

I think it could be better.  It has been better.

+1

Not sure it is actually related, but it looks to me like TPs were way more reliable before the migration to AWS servers...


2 hours ago, Gabriele Graves said:

The viewer never crashes during TP for me.  However, it does fail to teleport maybe 1 in a dozen times and then disconnects me.  This is especially noticeable during weekend sales and it's very annoying.

I have a world-class gigabit fibre connection to the internet (and have had one for at least 5 years), which is incredibly reliable, and I don't suffer from any significant packet loss: the viewer stats show ~0.1%, even when I'm logged in for hours and hours.

Yet these disconnections still happen. Here's the kicker: it used to be a lot better. I remember a time a few years ago, when LL were working on region crossings, when it seemed from my experience that TP disconnects were a thing of the past. Gradually, over time, they've become more prevalent for me again.

I think it could be better.  It has been better.

I have to agree with a lot of your comments. I TP in SL and Open Sim. In OS, I don't experience as many disconnections currently, unless I TP on a hypergrid that's poorly maintained. Most of the major HGs in OS have slightly better uptime compared to SL, in my view.

The Kitely HG also uses Amazon's servers... their uptime is great, with very few issues, and the devs and CEO there comment a lot on their forums about tech issues. (I have four HG accounts in Open Sim.) Of course OS has its issues too, including disconnections... but a lot of it is politicking, don't get me started lol!

My connection for SL/OS currently is a 1 gig connection, part of it is coaxial but mostly fiber. +3 SLT/PT US.


I have definitely noticed it a lot during weekend shopping; it feels like you're going to get hit by it eventually if you TP enough. Not sure if it's related to the number of teleports you make or just the numbers game of encountering it, but it does feel like if I hop around a lot, it's always going to happen.

Not a huge bug by any means, but definitely annoying, and it happens with regularity. It is especially annoying if I've edited some unrigged attachment and forgot to detach/reattach it afterwards, since relogging will inevitably reset it to its previous state.

On 8/9/2023 at 8:09 PM, Paul Hexem said:

I don't care what the error message is a result of. Adding a button to click after you "decide you had enough" of staring at a broken, unrecoverable viewer that as you say, can no longer function, that's just a crash with extra steps. 

It's the digital equivalent of saying "I meant to do that" after you trip and fall.

Although to be honest, that's all irrelevant anyway, like I said. There's really no defending this. The fact that this still happens after 20 years, and that the option is "quit" and not "reconnect", makes LL blatant liars when they say they care about user retention.

You may not care but for the Viewer developer it is a crucial distinction.

Crashing, as Henri very nicely explained, means the application shut down unexpectedly due to the illegal execution of code, be it inaccessible, corrupted or otherwise. When this happens you do not properly log out from the server. This can lead to problems such as a faulty login if you restart the Viewer and reconnect (forcefully dropping your previous session). That in turn shows up here, for instance, as yet another complaint that the Viewer is buggy and faulty, that logins do not work, that inventories and friend lists don't load, etc., while leaving out the crucial bit of information that you previously crashed and did a quick relog, which is ill-advised precisely because it can cause these things to happen. Another issue is that crashing will often lose you your settings and changes (although technically it shouldn't, since settings are written periodically, often even immediately on change... so why the Viewer reverts them, and how, is beyond me...)

What pisses me off personally the most is when people say "crash" but they are talking about a disconnect that is simply masked by an immediate crash on disconnect, making it look like a normal random crash to the user. It sends me into a frenzy of trying to find the problem, wasting my time troubleshooting what I broke, when and how... just for it to turn out that the person was referring to a disconnect, which happened to result in an immediate crash due to illegal read/write attempts on now-stale data. Even worse is when they don't crash at all but still report it as a crash, despite the Viewer very clearly telling them that it disconnected. Crashes on TP are, 99% of the time, simply disconnects during the teleport sequence, caused by stale data still being used when it shouldn't be.

When someone says "they are crashing", I ask them what they are doing, and often the answer is "nothing, just looking around", which sounds like a nasty random crash. When I look at their logs, it turns out it's just a disconnect due to connection issues or a faulty login, which often also explains the additionally reported "staying gray" issues.

Additionally, I never get these random disconnect crashes myself: the Viewer shuts down perfectly fine for me and shows the "Disconnected" screen when it happens. That makes it impossible for me to look into these disconnect crashes; they could be caused by anything, anywhere, and it's not realistic to expect users to have a debugging environment installed just for this.

On 8/8/2023 at 11:12 PM, Henri Beauchamp said:

I'm afraid you are mixing things together...

A disconnection is not a crash. A crash happens when the viewer executes an illegal instruction (e.g. accessing some data outside the allocated address space, or jumping/branching/"returning" to a faulty address containing junk/random memory, causing faulty opcodes to be executed). In this case, the viewer will just "vanish", or the OS will report a crash and close it. If this happens to you, you should report the crash to the viewer developers (with the necessary data, i.e. the stack trace or crash dump, and the logs), so that the crash can be fixed. I personally never crash on TP any more with my viewer (and the rare times I did in the past, I fixed the corresponding bugs).

On the other hand, a TP may fail because the viewer failed to connect to the arrival sim server. This disconnection could be the result of a bad network (lost packets), a race condition (messages arriving out of order, or too late to allow a proper connection sequence), or a failed handover between servers (for the same kinds of causes: network or race issues). In this case, the viewer does not have any server left to speak with (not even the departing sim, which has already disconnected), so it reports a failure and presents you with a grey screen, indicating it is disconnected. At this point, the validity of the data it got can no longer be guaranteed, and the viewer does not "know" the reason for the disconnection. A reconnection is therefore impossible, and the best course of action is to ask the user to restart a fresh viewer session and reconnect. True, it would be theoretically possible for the viewer to clean up all its memory and offer a reconnection from the login screen (as if you had relaunched it), but it would be complex to implement for a rare occurrence, and the resulting second session would be handicapped by side effects of the previous one (such as virtual address space fragmentation); it is much cleaner and safer to quit and restart the viewer for good.

The TP protocol in SL is, however, way too fragile (I already made a few suggestions in the Server User Group meetings to strengthen it). The lack of a proper handshaking protocol, and the fact that the departing sim does not wait for the TP to complete (i.e. for the arrival sim to take over after a successful connection with the viewer), cause these cases where the viewer is "left in the blue", with no server to speak to!

I doubt they will ever listen. Anyone who has played a couple of online games could tell them that the way they handle teleports is incredibly fragile at best, and I can't imagine it being that hard to change and fix. I mean, come on: just keep the client connected to both regions until the teleport is done. Although I bet that would cause weird shenanigans, like you popping up twice in friend lists or in both regions while transitioning, hahaha....ha... I shouldn't be laughing... that's probably what would happen.
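A toy model of the two orderings discussed above may make the difference concrete. It assumes, purely for illustration, that the viewer's connectivity can be reduced to a set of simulator circuits; the class and function names are hypothetical and do not correspond to SL's actual messages or code. With break-before-make, a failed arrival leaves the viewer with no circuit at all (the grey "disconnected" screen); with make-before-break, the same failure leaves it standing in the departure region.

```python
from dataclasses import dataclass, field


@dataclass
class Viewer:
    """Toy model: the set of simulator circuits the viewer currently holds."""
    circuits: set = field(default_factory=set)

    @property
    def disconnected(self) -> bool:
        return not self.circuits


def teleport_break_before_make(viewer: Viewer, src: str, dst: str, arrival_ok: bool) -> bool:
    # Fragile ordering: the departing sim lets go without waiting for the
    # arrival sim to confirm that the viewer actually connected.
    viewer.circuits.discard(src)
    if arrival_ok:
        viewer.circuits.add(dst)
    return not viewer.disconnected


def teleport_make_before_break(viewer: Viewer, src: str, dst: str, arrival_ok: bool) -> bool:
    # Handshake ordering: keep the departure circuit until the arrival
    # circuit is confirmed, and only then release it.
    if not arrival_ok:
        return False  # the TP failed, but the viewer is still connected to src
    viewer.circuits.add(dst)
    viewer.circuits.discard(src)
    return True
```

The brief overlap in the make-before-break version is what the "stay connected to both regions" suggestion amounts to; the duplicate-presence side effects joked about above would have to be handled server-side, which this sketch ignores.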


The technical reason is that LL are still using the same login protocol, where your login session is not handled by a central server but rather by whatever simulator you're connected to. Thus, a handover is not fail-safe, and when people claim "it hardly ever happens", that only reflects their own experience: network conditions vary vastly around the world.

The political reason is that LL just haven't had the motivation to fix it. What will mobilize them, I don't know. Maybe their mobile viewer will get them to finally consider making login sessions float independently of the simulator connection. This would make sense, since mobile phones roam and change IP as you move between cell towers or Wi-Fi routers, which would currently cause you to lose your login session, whereas with a roaming login session you would remain logged in.
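A minimal sketch of the "roaming" idea, assuming a hypothetical central session service (nothing in the thread says LL has or plans one): the session outlives any individual region connection, so losing a circuit or changing IP only unbinds it, and a later connection re-binds the same session instead of forcing a fresh login.

```python
import secrets
from typing import Dict, Optional


class SessionService:
    """Hypothetical central service that owns login sessions."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Optional[str]] = {}  # session id -> bound region

    def login(self) -> str:
        sid = secrets.token_hex(8)
        self._sessions[sid] = None
        return sid

    def attach(self, sid: str, region: str) -> None:
        # A new region connection (or a new IP after roaming) only re-binds
        # the existing session; it never creates a new one.
        if sid not in self._sessions:
            raise KeyError("unknown or expired session")
        self._sessions[sid] = region

    def detach(self, sid: str) -> None:
        # Losing the region circuit leaves the session itself alive.
        if sid in self._sessions:
            self._sessions[sid] = None

    def region_of(self, sid: str) -> Optional[str]:
        return self._sessions.get(sid)

    def is_logged_in(self, sid: str) -> bool:
        return sid in self._sessions

    def logout(self, sid: str) -> None:
        self._sessions.pop(sid, None)
```

A real design would also need an expiry for sessions whose viewer never comes back; that is omitted here.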


17 hours ago, NiranV Dean said:

Another issue is that crashing will often lose you your settings and changes (although technically it shouldn't, since settings are written periodically, often even immediately on change... so why the Viewer reverts them, and how, is beyond me...)

For attachments, this is because any change to them is only committed to the inventory database when you either: detach them, TP away, cross a sim border, or logout cleanly.

Note that I won't complain in the least about this state of affairs, because it has the very nice corollary of letting you "undo" any changes to your attachments when you edit them and badly mess up: in this case, simply copy the attached item in your inventory and paste it (LL forbids copying attachments in their own viewer, but AFAIK all TPVs allow it). The pasted item will have all the changes "reverted" (actually, unsaved), and you can then detach and delete the crippled item and wear the "reverted" copy instead...

However, this means that in case of a real crash (and most often on spurious disconnections as well), your attachment state changes will be unsaved, and you will find them back in the state they were in at the last login, wearing, successful TP, or sim border crossing (whichever occurred last).

On spurious disconnections, however, I noticed that on the next login you often find your avatar moved back to the sim of the last successful login/TP, instead of the sim where you got logged out. I would definitely consider this a bug in LL's avatar "presence" management/tracking, and unlike the above "attachment reversal bug/feature", I find it an annoyance (especially if you changed your avatar's outfit in the meantime, and the outfit does not match the maturity level of the sim where you get logged back in).
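The deferred attachment save described above can be modeled in a few lines. This sketch assumes the commit points are exactly the four Henri lists (detach, TP, sim border crossing, clean logout); the class and event names are hypothetical. It also shows why the copy/paste "undo" trick works: the inventory copy reflects the last saved state, not the live edits.

```python
class AttachmentState:
    """Toy model of deferred attachment saves (names are hypothetical)."""

    COMMIT_EVENTS = {"detach", "teleport", "region_crossing", "logout"}

    def __init__(self, saved: dict) -> None:
        self.saved = dict(saved)  # what the inventory database holds
        self.live = dict(saved)   # what the viewer currently shows

    def edit(self, key, value) -> None:
        self.live[key] = value    # visible in-world, not yet committed

    def on_event(self, event: str) -> None:
        if event in self.COMMIT_EVENTS:
            self.saved = dict(self.live)

    def inventory_copy(self) -> dict:
        # Copying the worn item copies the saved state: the "undo" trick.
        return dict(self.saved)

    def on_crash(self) -> None:
        # Unclean exit: pending edits are never committed, so the next
        # login only sees the last saved state.
        self.live = dict(self.saved)
```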

17 hours ago, NiranV Dean said:

which just happened to result in an immediate crash due to illegal read/write attempts to now stale data.

Well, a viewer should never crash on bogus/stale data reading, even during a login after a spurious disconnection... Even in this case, you should understand why the crash happened and fix it (which is, of course, much harder than for crashes happening in a simple, reproducible way)...

Over the 16 years I spent thoroughly polishing LL's code to make it "mine", I took great care, in my viewer, to sanitize every place where such bogus data processing could lead to a crash. My principle is that "release quality" software should simply never, ever crash: if something unexpected happens, you must, as a programmer, still find a safe and sane path to handle the issue and minimize the consequences. For the viewer, for example, a render glitch would be acceptable, as long as a suitable warning is logged so that this "unexpected" issue can be further diagnosed and dealt with properly, and as long as the viewer does not crash but keeps "happily" chugging along. If you are curious enough, try this command line on my viewer sources:

grep -ri Paranoia linden/indra/* | wc -l

This will currently report 513 matching sites for the release branch, and 542 for the experimental (PBR+EE) branch... These sites are all places where I added a check to avoid a crash, provided a suitable fallback path for "unexpected" situations, and added a "// Paranoia" comment (though I did not always add a comment, especially 5+ years ago, before I started systematizing them)...
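The kind of "// Paranoia" check described above might look like this in spirit: validate data that could be stale or bogus, log a warning, and fall back to something harmless instead of crashing. The viewer itself is C++; this Python sketch uses a hypothetical texture lookup purely to illustrate the pattern, not any actual viewer function.

```python
import logging
from typing import Optional, Sequence

log = logging.getLogger("viewer")


def texture_for_face(textures: Optional[Sequence[str]], face_index: int,
                     fallback: str = "default_texture") -> str:
    # Paranoia: the face index (or the whole list) may come from a stale or
    # bogus object update, so validate before indexing. Negative indices are
    # rejected too, since Python would otherwise silently wrap them.
    if textures is None or not 0 <= face_index < len(textures):
        count = 0 if textures is None else len(textures)
        log.warning("Bad face index %d (object has %d textures), using fallback",
                    face_index, count)
        return fallback  # a render glitch instead of a crash
    return textures[face_index]
```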

Edited by Henri Beauchamp

8 hours ago, Henri Beauchamp said:

For attachments, this is because any change to them is only committed to the inventory database when you either: detach them, TP away, cross a sim border, or logout cleanly.

Note that I won't complain in the least about this state of affairs, because it has the very nice corollary of letting you "undo" any changes to your attachments when you edit them and badly mess up: in this case, simply copy the attached item in your inventory and paste it (LL forbids copying attachments in their own viewer, but AFAIK all TPVs allow it). The pasted item will have all the changes "reverted" (actually, unsaved), and you can then detach and delete the crippled item and wear the "reverted" copy instead...

However, this means that in case of a real crash (and most often on spurious disconnections as well), your attachment state changes will be unsaved, and you will find them back in the state they were in at the last login, wearing, successful TP, or sim border crossing (whichever occurred last).

On spurious disconnections, however, I noticed that on the next login you often find your avatar moved back to the sim of the last successful login/TP, instead of the sim where you got logged out. I would definitely consider this a bug in LL's avatar "presence" management/tracking, and unlike the above "attachment reversal bug/feature", I find it an annoyance (especially if you changed your avatar's outfit in the meantime, and the outfit does not match the maturity level of the sim where you get logged back in).

Well, a viewer should never crash on bogus/stale data reading, even during a login after a spurious disconnection... Even in this case, you should understand why the crash happened and fix it (which is, of course, much harder than for crashes happening in a simple, reproducible way)...

Over the 16 years I spent thoroughly polishing LL's code to make it "mine", I took great care, in my viewer, to sanitize every place where such bogus data processing could lead to a crash. My principle is that "release quality" software should simply never, ever crash: if something unexpected happens, you must, as a programmer, still find a safe and sane path to handle the issue and minimize the consequences. For the viewer, for example, a render glitch would be acceptable, as long as a suitable warning is logged so that this "unexpected" issue can be further diagnosed and dealt with properly, and as long as the viewer does not crash but keeps "happily" chugging along. If you are curious enough, try this command line on my viewer sources:

grep -ri Paranoia linden/indra/* | wc -l

This will currently report 513 matching sites for the release branch, and 542 for the experimental (PBR+EE) branch... These sites are all places where I added a check to avoid a crash, provided a suitable fallback path for "unexpected" situations, and added a "// Paranoia" comment (though I did not always add a comment, especially 5+ years ago, before I started systematizing them)...

I probably didn't make it clear enough: I was talking about Viewer settings, not the famous attachment revert.

I've been adding what I believe are a lot of paranoid checks all over my stuff, wherever things either already crashed once or could crash (based on other crashes I had, or simply due to copy-pasted code where I had already added them). But this is solely in my own code (and I mean my own-own code, not altered stuff in the main Viewer, just stuff I wrote from scratch, such as the Poser). I've also usually commented them when I believe they might be unnecessary (or were caused by some weird shenanigans), so people understand why I put something weird there that doesn't look necessary.


1 hour ago, Paul Hexem said:

”Technically” not a crash, but when the only possible option is the viewer shutting down... That's a crash.

No, that's a disconnection (and the first dialog is not even that: you can still retry a TP, preferably from another sim since the failure is likely a server to server communication issue); the viewer is still running (normally in the first case, disconnected in the second).

Sorry, but if you make up your own dictionary, changing the meaning of words, don't be surprised if people find your stance nonsensical.

Edited by Henri Beauchamp

53 minutes ago, Henri Beauchamp said:

No, that's a disconnection (and the first dialog is not even that: you can still retry a TP, preferably from another sim since the failure is likely a server to server communication issue); the viewer is still running (normally in the first case, disconnected in the second).

Sorry, but if you make up your own dictionary, changing the meaning of words, don't be surprised if people find your stance nonsensical.

Nonsensical is defending having to quit any time their servers lag or have issues.


27 minutes ago, Gabriele Graves said:

It's a testament to how crappy the LL viewer is.  Imagine if every time a web browser failed to connect you had to restart the browser.

You obviously do not have the slightest idea about how a web browser and a viewer are programmed...

For a start, the first is non-real-time (and will retry connections as needed: but you can still get 404 or 502 errors, with no way to view the web page and no other choice than closing the tab), while the second is real-time and must exchange data with the server so that everyone sees the same thing in the sim.

It's like comparing apples and cucumbers... Sure, they are both plants and are both edible... 🤣

And no, it's not the viewer's fault: it's a problem with the TP protocol itself (which lacks proper handshaking between the departure sim server, the viewer and the arrival sim server).

Edited by Henri Beauchamp

4 minutes ago, Henri Beauchamp said:

You obviously do not have the slightest idea about how a web browser and a viewer are programmed...

For a start, the first is non-real-time (and will retry connections as needed), while the second is real-time and must exchange data with the server so that everyone sees the same thing in the sim.

It's like comparing apples and cucumbers... Sure, they are both plants and are both edible... 🤣

And no, it's not the viewer's fault: it's a problem with the TP protocol itself (and the lack of proper handshaking between the departure sim server, the viewer and the arrival sim server).

No I don't and as a user I don't care.  The LL viewer is the only piece of internet connected software that I have ever used that needs closing and restarting when a connection fails.  As a user, that's crappy.

Edited by Gabriele Graves

Just now, Henri Beauchamp said:

Just as crappy as closing your tab in your browser because the server returns a 404 or 502, or simply never finishes loading the page...

But I'm done with trying to explain to nonsensical persons... Have a nice day !

Nobody asked you to man-splain anything and insulting me and/or anyone else here will get your post reported.  Have a nice day yourself! :)


53 minutes ago, Henri Beauchamp said:

You obviously do not have the slightest idea about how a web browser and a viewer are programmed...

For a start, the first is non-real-time (and will retry connections as needed: but you can still get 404 or 502 errors, with no way to view the web page and no other choice than closing the tab), while the second is real-time and must exchange data with the server so that everyone sees the same thing in the sim.

It's like comparing apples and cucumbers... Sure, they are both plants and are both edible... 🤣

And no, it's not the viewer's fault: it's a problem with the TP protocol itself (which lacks proper handshaking between the departure sim server, the viewer and the arrival sim server).

404s don't require you to close the browser: you can just go to another web page. That is what the viewers should do: if you can't teleport to a region, pick a different one; if you lose the connection to the current region, go to a different one.

There's really no way to defend telling residents to quit and go do something else any time there's the slightest flutter of the connection. I'm not sure why you're doing it.


  • Moles

Hey gang!

As a gentle reminder, this is a technical discussion about a topic that is a source of frustration for many people but can be difficult to explain and to understand. Spirited discussion and constructive disagreement are welcome, but please do your best to keep frustrations under control. 


1 hour ago, Dyna Mole said:

Hey gang!

As a gentle reminder, this is a technical discussion about a topic that is a source of frustration for many people but can be difficult to explain and to understand. Spirited discussion and constructive disagreement are welcome, but please do your best to keep frustrations under control. 

Thank you! No need to "crash" the Forums!


14 hours ago, Henri Beauchamp said:

and the first dialog is not even that: you can still retry a TP, preferably from another sim since the failure is likely a server to server communication issue

This has never worked for me. If I get that message it's game over, waiting for another 30-60s will bring up the "you have been disconnected" dialog, teleporting does not work once it has displayed.

 

Edited by AmeliaJ08

This is an authentic complaint.

Crashes on teleport still happen. They don't appear to be a pc/internet issue as they occur across several different types of build.

They appear to happen more frequently when draw distance is set above 128 and higher performance settings are selected.

It's disappointing that neither viewer creators nor SL developers have nailed this down.

I can also see the possibility that this harms retention.

Ironically the person that jumped in to say that this was a 'you problem', didn't stick around to see the others reinforcing the OP's position.

Unironically, people leapt at his profile picture rather than saying "Yeah, this sucks and happens to me too".


57 minutes ago, Unregistered said:

This is an authentic complaint.

Crashes on teleport still happen. They don't appear to be a pc/internet issue as they occur across several different types of build.

They appear to happen more frequently when draw distance is set above 128 and higher performance settings are selected.

It's disappointing that neither viewer creators nor SL developers have nailed this down.

If you get genuine crashes on teleport (or likely shortly after a TP succeeded), please do report them to the developers of the viewer you are using (via their respective JIRA for LL's viewer or FS, their forum for Cool VL Viewer, their SL group, Discord, or whatever bug reporting channel they elected for other TPVs): yes, crashes ”on” (more likely ”after”) TPs may (and did) happen, but these can be fixed as long as they get reported together with the proper debugging info (stack trace or crash dump).

I myself fixed quite a few crashes after TPs for the Cool VL Viewer over the 17 years I developed it: there could still be a couple of crash bugs lingering around, and I'd be equally happy to squash them (I do love to squash bugs 😛), but I can't when I don't encounter them by myself and they are not reported by the people encountering them !

Edited by Henri Beauchamp

  • Lindens

I don't monitor the 'General' forum closely but noticed this while replying in another thread.  Speaking personally, I've enjoyed and agree with the complaints here.  I, too, find teleporting/region crossing appalling and it needs to change such that any failure would be surprising and not part of the 'SL Experience.'

But this is where we are.  Teleport is a confluence of many buggy systems swirling into a whirlpool of frustration and failure.  It's going to be very hard to get this asynchronous pile of state sorted but that's the task and we want to see this done.

I'll share a recent finding to illustrate what this looks like.  If you've looked into the code you may be familiar with the TeleportFinish message.  This is sent to the viewer which triggers it to move into the destination region.  The discovery is that this message is often queued but never delivered to the viewer leading the viewer to disconnect (yes, it's the viewer that disconnects) and display the quit/IM message.  The cause centers around the EventQueueGet cap and if you're a TPV developer who has sniffed around that and found a bad smell, your nose was not wrong.

This will not be the *one* cause of TP failures nor are TP failures the only result of it.  That's what making this work as everyone would like will require:  many fixes over time incrementally improving the result while avoiding damage elsewhere.
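From the viewer's side, a TeleportFinish that is queued on the server but never delivered looks exactly like silence, so the viewer eventually gives up and disconnects. A simulated sketch of that wait, assuming a fixed timeout (the 30-second value used in the tests is an illustrative assumption, not the viewer's real setting) and a list of the messages the viewer actually receives, with their arrival times:

```python
from typing import Iterable, Tuple


def teleport_outcome(deliveries: Iterable[Tuple[float, str]], timeout_s: float) -> str:
    """deliveries: (seconds after the TP request, message name) for each
    message the viewer actually receives. Returns "arrived" or "disconnected"."""
    for t, name in sorted(deliveries):
        if t > timeout_s:
            break  # anything later than the deadline is too late to matter
        if name == "TeleportFinish":
            return "arrived"
    # A TeleportFinish queued server-side but never delivered never shows up
    # in `deliveries`: to the viewer that is indistinguishable from silence.
    return "disconnected"
```

This is why the fix Monty describes has to be on the delivery side: no amount of waiting in the viewer helps if the message never leaves the queue.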


3 hours ago, Monty Linden said:

The cause centers around the EventQueueGet cap and if you're a TPV developer who has sniffed around that and found a bad smell, your nose was not wrong.

More comments on that would be appreciated. I was just implementing event polling in Sharpview, but I don't have multiple regions working yet. So I have no hard data on teleports or region crossings at the message level. So far, my observation has been that I get 100% of SL event polls in serial number order starting with 1 with no numbers missing. That seems to work perfectly. (The Other Simulator sometimes skips event poll sequence numbers. Need to take that up with those devs.) That seems to be also true for UDP packets. I see occasional retransmits, but all reliable UDP messages do show up eventually. I don't think that packet loss or drop is the primary problem. Delays and re-ordering, though...
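The gap/duplicate check described above can be sketched as follows, assuming each event-poll response carries an integer sequence number starting at 1 (the function name is hypothetical). It reports missing numbers correctly even when responses arrive out of order.

```python
from typing import Iterable, List, Tuple


def check_sequence(ids: Iterable[int], start: int = 1) -> Tuple[List[int], List[int]]:
    """Return (missing, duplicated) sequence numbers from a run of polls."""
    seen = set()
    duplicates = []
    for i in ids:
        if i in seen:
            duplicates.append(i)
        seen.add(i)
    # Everything from `start` up to the highest number seen should be present.
    top = max(seen, default=start - 1)
    missing = [n for n in range(start, top + 1) if n not in seen]
    return missing, duplicates
```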

In-order delivery, reliable delivery, no head of line blocking - pick two. It is not possible to have all three. The event poller has the first two, reliable UDP has the last two. Event poller events and UDP messages are not synchronized with each other. So everything that handles messages has to be prepared for the unlikely event that something arrives in a non-usual sequence. If there's network delay or packet loss, out of order arrivals become more likely. I suspect, but have no definitive information, that network effects on teleports and region crossings have more to do with arrival order than packet loss.
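Because event-queue events and UDP messages are not synchronized, a handler has to cope with a message arriving before whatever it depends on. One common approach, sketched here under the simplifying assumption that every message depends only on a per-region "handshake" message having been seen first, is to park early messages and replay them once the prerequisite arrives. The message kinds are illustrative labels, not actual SL message names.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Set, Tuple

Message = Tuple[str, str, Any]  # (kind, region, payload)


class Dispatcher:
    """Parks messages that arrive before their region's handshake."""

    def __init__(self) -> None:
        self.ready: Set[str] = set()
        self.pending: DefaultDict[str, List[Message]] = defaultdict(list)
        self.handled: List[Message] = []

    def receive(self, kind: str, region: str, payload: Any = None) -> None:
        msg = (kind, region, payload)
        if kind == "region_handshake":
            self.ready.add(region)
            self.handled.append(msg)
            # Replay anything that arrived early, in its original order.
            self.handled.extend(self.pending.pop(region, []))
        elif region in self.ready:
            self.handled.append(msg)
        else:
            self.pending[region].append(msg)
```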

For teleports and region crossings, more than one region is involved, regions are unsynchronized, and each region has its own message data paths. Those region servers are talking to each other, too. This complicates the situation.

I've written before about the race condition at region connection that causes some objects not to be sent to the viewer, an "interest list bug". The problem is that the viewer has to tell the simulator where the camera is before the simulator tells the viewer where the avatar is. This results in a sudden jump in camera position at login and teleport, which seems to be able to mess up what object updates get sent. I discovered that adding a 2-second delay before asking the simulator to start sending objects made that problem disappear. That's not a fix, that's just a demonstration that there's a race condition.
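The 2-second delay only proves the race exists; an event-driven alternative is to wait for the avatar position rather than for a fixed time. This sketch assumes, hypothetically, that the viewer may send a corrected camera position and only then request objects; whether the real protocol tolerates that ordering is not established in the thread, and the message tuples are made up.

```python
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]


class RegionConnect:
    """Gates the object request on the avatar position, not on a timer."""

    def __init__(self, send: Callable[[tuple], None], camera_guess: Vec3) -> None:
        self.send = send
        self.camera_guess = camera_guess
        self.avatar_pos: Optional[Vec3] = None
        self.objects_requested = False

    def start(self) -> None:
        # Per the post, the camera position must go out before the sim
        # reports where the avatar is, so this can only be a guess.
        self.send(("camera", self.camera_guess))

    def on_avatar_position(self, pos: Vec3) -> None:
        self.avatar_pos = pos
        if not self.objects_requested:
            # Correct the camera first, then ask for objects, so the interest
            # list is built around the real position. Only the first position
            # report triggers this.
            self.send(("camera", pos))
            self.send(("request_objects",))
            self.objects_requested = True
```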

A state machine diagram, or diagrams, would be a big help here. Then the code could check for event A coming in while in state B, where that's not an allowed event in that state. This is the kind of problem where you need formalism to get it right.
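A minimal sketch of the transition-table approach suggested above: allowed (state, event) pairs map to the next state, and anything else is recorded instead of being acted on. The states and transitions are illustrative; the event names are borrowed loosely from SL message names, and this is not the actual teleport state machine.

```python
from enum import Enum, auto
from typing import List, Tuple


class TP(Enum):
    IDLE = auto()
    REQUESTED = auto()
    ARRIVING = auto()
    FAILED = auto()


# Allowed (state, event) -> next state. Anything not listed is flagged
# instead of being silently applied.
TRANSITIONS = {
    (TP.IDLE, "TeleportRequest"): TP.REQUESTED,
    (TP.REQUESTED, "TeleportProgress"): TP.REQUESTED,
    (TP.REQUESTED, "TeleportFinish"): TP.ARRIVING,
    (TP.REQUESTED, "TeleportFailed"): TP.FAILED,
    (TP.REQUESTED, "Timeout"): TP.FAILED,
    (TP.ARRIVING, "RegionHandshake"): TP.IDLE,
    (TP.FAILED, "Acknowledge"): TP.IDLE,
}


class TeleportFSM:
    def __init__(self) -> None:
        self.state = TP.IDLE
        self.rejected: List[Tuple[TP, str]] = []

    def handle(self, event: str) -> bool:
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is None:
            # Event A arrived in state B where it is not allowed:
            # record it for diagnosis rather than acting on it.
            self.rejected.append((self.state, event))
            return False
        self.state = nxt
        return True
```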

This stuff is intensely boring, but essential to correct operation. It's the sort of thing where it's hard to convince management that considerable formal design effort is required to fix things which "seem like simple bugs". I'm glad it's getting attention.


  • Lindens
5 minutes ago, animats said:

A state machine diagram, or diagrams, would be a big help here. Then the code could check for event A coming in while in state B, where that's not an allowed event in that state. This is the kind of problem where you need formalism to get it right.

Hahaha.  To try to fix the event-get issue I had to lay out a solution as a state machine diagram.  I hope to share these as part of the wiki documentation at a future point.  (More later...)
