
Degraded Performance since AWS move.



Ok, so I realise that not everything is fully finished or sorted out when it comes to the cloud move, but I am literally at my wit's end trying to deal with a multitude of issues that have cropped up over the past few months.

I'll try to be brief with this ;)

So, several months ago, prior to the cloud uplift, but when LL announced that they had started to change some code for better compatibility (there is a post somewhere from LL about this), I started to have issues with vehicles in SL. My main pastime in SL is exploring the grid using vehicles - planes, cars, boats etc, and in general terms, the moment LL started changing this code I've had some major issues with about 90% of my vehicle inventory.

Issues range from planes thinking they've just landed when crossing a region, animated rotors and propellers stopping at region crossings, boats switching off at region crossings, and numerous script errors relating to animations on region crossings. Whilst the region crossing time itself has seemed to improve, the sheer number of issues with most of my inventory has made any kind of travel really frustrating.

Now, I'm not the only one having these issues. Multiple people across several travel groups I am in all report the same problems, and there are currently at least two Jiras relating to these problems, both unresolved. One Jira is here: https://jira.secondlife.com/browse/BUG-229214

The only piece of concrete info from these Jiras so far is an indication that these vehicle problems seem to be worse for people who have a higher ping latency. So that suggests all is fine if you're in the USA, but the further away you get, the more problems you'll have. (I am a long way away from the SL servers and normally have a ping latency of between 180 and 230 ms.)

However, I've struggled along as best I could until more recently, with the final uplift of all regions to the cloud - and now there are even more issues coming to the surface.

I am getting terrible drops in FPS whenever my viewer has to rez more than basically empty land. Typically, where I have a home in Bellisseria, my FPS would never drop below about 30, even travelling through crowded regions. Now it's often under 7 FPS. It's gotten to the point that I can no longer use my Premium home. This seems to be occurring all across the grid too - FPS is fine in an empty area but becomes terrible when I have to rez numerous objects. My FPS has never dropped like this in the past.

Secondly, assets, mainly mesh, are simply not rezzing at all in many cases. I am now running into walls, houses, literally all manner of items that simply don't show up at all. I know they are there because when I try to edit them, the mesh wireframe shows up (or at least a wireframe box) - but sometimes I have to wait for 10 minutes before these objects "show up".

Another issue is what appears to me to be my viewer (Firestorm) simply forgetting what's in the texture cache. I can travel to a region, have everything rezzed in, then move into the next region for say 1 minute... when I then go back to the first region, almost everything is unrezzed and full of grey or blurry textures, as though none of these items have been cached. I am actually getting MORE lag and rez issues the second time I go to a place compared to the first time - which is frankly bizarre.

The latest issue is that my ping latency is now starting to creep up substantially. Since the last region restarts, my ping time on average is at least 100 ms higher, and quite often it is far worse than that. Again, others in groups I am in have also reported the same issues recently.

As it stands, right now I can do little in SL other than stand in a skybox; all of the regular things I enjoy are nothing but an exercise in frustration. I can only add that my computer hasn't changed, and I've repeatedly tested my internet connection and it's as good as it's ever been. I've already had quite a few people offer help and suggestions and tried them all, but nothing is making the slightest bit of difference. Most suggestions revolve around making minor changes that don't appear to make a huge difference, but my SL performance has degraded to such a degree that it logically has to be something really major that's impacting me, whether it be from the cloud uplift or elsewhere.

Does anyone have any ideas about what could cause a huge degradation in performance that I could resolve at my end?

 

 

 


I am experiencing similar troubles.  Everything is fine if all I do is buy mesh clothes and try them on :(  I've got rid of all my mainland, it was unusable, and I am seriously considering if I need Premium, my houseboat and contents take so long to rez when I log in...


Yep, I know I am not the only one having these issues. To add to my opening post, I remember two other issues too. The first is that when out exploring, even with a healthy draw distance of say 200m, the objects in the next region don't start to rez in until I am almost at the region crossing. With a longer draw distance they should be rezzing in as soon as they are within that field of view. Even Linden roads don't appear until I am almost on top of them, and again, I'm hearing the same from others.

The final issue, which relates to items not rezzing, is that often, once I cross into another region, the vehicle I am using seems to disappear completely, or at least some of it does. It still works, but it's invisible, or at least partly. I have to toggle wireframe mode to restore it, though often it will become visible on the next crossing anyway.


11 hours ago, Eowyn Southmoor said:

Does anyone have any ideas about what could cause a huge degradation in performance that I could resolve at my end?

The higher ping times are probably the source of your vehicle problems. It remains to be seen how much we can improve those - it's something we will be working on very soon. 

Your FPS and rezzing problems sound as though they are actually viewer issues unrelated to the uplift per se, unless you're also experiencing high packet loss (the stat immediately under Ping Time in the [official] stats floater)...


4 hours ago, Eowyn Southmoor said:

The final issue, which relates to items not rezzing, is that often, once I cross into another region, the vehicle I am using seems to disappear completely, or at least some of it does. It still works, but it's invisible, or at least partly.

[Attached image: vehiclepartialloss.jpg]

Yes, that's a problem.

Reproduced by adding 300ms of delay to the network connection and driving a demo car from Manji Automotive Factory in Burns around Robin Loop. The vehicle did not recover, even after about 8 more region crossings.

Latest release of Firestorm, 6.4.12, 64-bit, on Ubuntu Linux 18.04 LTS.

 

Edited by animats

12 hours ago, Eowyn Southmoor said:

Issues range from planes thinking they've just landed when crossing a region, animated rotors and propellers stopping at region crossings, boats switching off at region crossings, and numerous script errors relating to animations on region crossings. Whilst the region crossing time itself has seemed to improve, the sheer number of issues with most of my inventory has made any kind of travel really frustrating.

Yes. My own vehicles, which have extensive workarounds in their LSL programming to recover from those issues, are doing fine. I can take one of my bikes around Robin Loop at 60kph with a forced 300ms network lag without problems. (Normally I'll drive that at 100kph, but with 300ms lag, it's too hard to stay on the road.) I've discussed the technical issues around this before, so I won't repeat that.

Most of the serious problems are related to double region crossings. SL needs code sim-side to totally prevent a second region crossing starting until the previous one has been completed.  If you start a second crossing before the first is 100% complete, something will break. Vehicles and drivers can try to prevent this by slowing down, but it will only be airtight if done sim-side. Users and scripts cannot always stop soon enough, and cannot always tell if the crossing is 100% complete. I've made this point for years now, filed JIRAs, made videos demonstrating how to reproduce the problem, and brought it up at many Server User Group meetings. I hate to have to keep harping on this, but read the messages above. This is losing SL users outside the US.

SL will hold together reasonably well with 300ms network delay, except for double region crossings. It's a bit sluggish, but not that bad unless you're driving fast.

There are two region corners on Robin Loop roads in Heterocera, so it's a good place to test.
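To make the "no second crossing until the first is complete" idea concrete, here is a minimal Python sketch of such a gate. It is only an illustration of the rule described above, not how the simulator is actually written; the `CrossingGate` class, its method names, and the region names are all made up for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class CrossingGate:
    """Hypothetical per-vehicle gate: at most one region crossing in flight."""

    in_flight: Optional[str] = None  # destination of the crossing in progress
    waiting: Deque[str] = field(default_factory=deque)  # deferred destinations
    log: List[str] = field(default_factory=list)

    def request(self, destination: str) -> bool:
        """Ask to start a crossing. Returns True if it starts immediately."""
        if self.in_flight is None:
            self.in_flight = destination
            self.log.append(f"start {destination}")
            return True
        # A crossing is still in progress, so the new one has to wait.
        self.waiting.append(destination)
        self.log.append(f"defer {destination}")
        return False

    def complete(self) -> None:
        """Mark the current crossing as fully done, then start the next one."""
        if self.in_flight is None:
            raise RuntimeError("no crossing in flight")
        self.log.append(f"done {self.in_flight}")
        self.in_flight = None
        if self.waiting:
            self.request(self.waiting.popleft())


# A corner crossing: the vehicle clips Region B and heads straight for Region C.
gate = CrossingGate()
gate.request("Region B")  # starts right away
gate.request("Region C")  # deferred until B is complete
gate.complete()           # B done, C starts
gate.complete()           # C done
print(gate.log)
```

The point of doing this sim-side is that the gate is the only place that reliably knows when a crossing is 100% complete; a script or a driver can only guess.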


I have noticed slower rez times or sometimes things just not rezzing at all. The one thing I haven't noticed is any improvement in overall sim FPS. I do a lot of weddings with 'everything on', loaded with avis, and FPS on AWS is no different than it was on the Linden servers. But like the days of yore, some sims perform super good, some not so much. I always notice that on sims that perform well with good FPS rates the GPU is cranked, and on other sims with low FPS rates the GPU is mostly idle. I guess it was too much to hope for that the move to the AWS cloud would improve performance at all.


Regarding the ping time, this seems to be something that anyone discussing these issues with me seems to either gloss over or not get my point. Due to my RL location, I will never have a "good" ping time - around 180 ms is about as good as it's ever gotten.

However, I want to make it totally clear that even with a "normal" ping of around 200 ms, I've never had the slightest issue with region crossings or vehicles behaving oddly during a crossing. The double-crossing issue has also rarely affected me, because I choose the best spot to cross to minimise that, and I slow right down. I can confirm that during 2019, I averaged a whopping 50 hours of travel time in between crossing issues or unseats, showing just how trouble-free my experience usually is.

So whilst it's all well and good saying that high ping times are responsible for vehicle issues, this makes no logical sense, because in that case I should've been experiencing these issues for years, which I clearly haven't. I also don't think that I've just been "lucky" either with region crossing performance in the past - not when my SL experience over the past 4 years has been statistically very consistent and reliable.

As I already mentioned in the OP, the first vehicle problems commenced PRIOR to the uplift, as a direct result of code changes made supposedly to improve region compatibility with the cloud. Others experienced the same issues, and that is why I opened the Jira. These problems still continue now, 6 months after I first noticed them.

 


On 12/18/2020 at 6:45 PM, Oz Linden said:

The higher ping times are probably the source of your vehicle problems. It remains to be seen how much we can improve those - it's something we will be working on very soon. 

Your FPS and rezzing problems sound as though they are actually viewer issues unrelated to the uplift per se, unless you're also experiencing high packet loss (the stat immediately under Ping Time in the [official] stats floater)...

 

2 hours ago, Eowyn Southmoor said:

Regarding the ping time, this seems to be something that anyone discussing these issues with me seems to either gloss over or not get my point. Due to my RL location, I will never have a "good" ping time - around 180 ms is about as good as it's ever gotten.

However, I want to make it totally clear that even with a "normal" ping of around 200 ms, I've never had the slightest issue with region crossings or vehicles behaving oddly during a crossing. The double-crossing issue has also rarely affected me, because I choose the best spot to cross to minimise that, and I slow right down. I can confirm that during 2019, I averaged a whopping 50 hours of travel time in between crossing issues or unseats, showing just how trouble-free my experience usually is.

 

My experiences since August are pretty much the same as Eowyn's, and the approach to blame only the slightly increased ping times for the vehicle issues on crossings is highly odd to me as well.

I can say, while many people used to complain about region crossing issues all the time before this August, and people were always asking for help, advice "to reduce lag on sim crossings", etc, I have never had serious issues with crossings, other than the slightly annoying long interpolation times. I got used to that, though, and with normal region and grid (network) performance, crossings only rarely failed for me.

When the code changes began in August to "improve" region crossings... um, no, I know, it was to prepare for the Uplift, obviously not to benefit the vehicle-user communities (something that a lot of us had been waiting for, nevertheless). One thing is for sure: region crossing time dropped so much that, with two adjacent regions operating normally, it was as if there were no region crossing at all. Well, except for one problem: the vehicles throwing errors, or simply ceasing to respond to control inputs, etc. All the things that Eowyn described above, I have noticed as well.

I have noticed there were several adjustments made between August and November which affected region crossing behavior directly or indirectly; sometimes crossings got faster or slower overall (slower as in feeling like a step back to the original behavior, with long crossing and interpolation times). In its latest state, since the simulator servers got uplifted as well, region crossing time, with normal performance of the two adjacent regions, is insignificant (meaning: wonderful). Now if you pay attention to the words I outline in bold, the point is: region crossings improved a lot in that they are much, much faster, so their time aspect is the improvement. The quality of the crossings, however, changed in a less reassuring, or rather worrying, way.

One aspect of the quality of the crossing is that you won't get disconnected or stuck upon attempting the crossing. I dare say this has improved overall. Heavily scripted, complex-linkset vehicles used to contribute to failed crossings before; now, the overall experience is that these heavy vehicles can cross much better than originally. Better as in faster, and with less risk of a failed crossing. However, while like Eowyn I never really had issues with "corner crossings", where roads or narrow waterways were fabulously built onto region corners, now I really have to slow down before the first crossing, stop in the middle, and slowly proceed to the second crossing, and even that is no guarantee. However, I don't think it is a standalone issue. It is because of the "out-of-sync" issue we are experiencing.

Another aspect of the quality of the crossing is that you would arrive in the next region with your vehicle, sitting and positioned on your vehicle, in the exact same state as you were in the previous region. Now this is what degraded badly, beginning in August, well before the simulator server uplift and the overall ping time increase. Also, the ping times didn't increase drastically. While I'm in Europe, and I used to have 180-200ms Ping Sim values before the Uplift, now I get 200-240ms. It is a disappointment for sure, but doesn't account for the issues we have.
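For anyone who wants to put a number on "didn't increase drastically", here is a quick Python check of the relative increase implied by the Ping Sim figures above. Pairing the low ends (180 to 200 ms) and the high ends (200 to 240 ms) of the two ranges is my own assumption for the sake of the arithmetic, and `percent_increase` is just a helper name for this example.

```python
def percent_increase(before_ms: float, after_ms: float) -> float:
    """Relative increase from before_ms to after_ms, in percent."""
    return (after_ms - before_ms) / before_ms * 100.0


# Ping Sim ranges quoted above: 180-200 ms before the Uplift, 200-240 ms after.
low = percent_increase(180, 200)   # low end of each range
high = percent_increase(200, 240)  # high end of each range

print(f"{low:.1f}% to {high:.1f}%")  # prints "11.1% to 20.0%"
```

So on these figures the ping went up by somewhere between about 11% and 20%.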

My observation since the beginning, is that objects previously took a long time to proceed from the previous region to the next one, experienced an extensive delay in their script events, in the meantime, the receiving simulator server would already recognize the operating (seated) avatar in the receiving region as well (and sitting on the object), and the permissions that are triggered by being seated on the object, could be triggered (control inputs, camera, animations, most importantly).

My observations since August (after the code changes, but with still the original simulator server hosts and lower ping values) are that objects now transfer from the previous region to the next one much faster, thus their script events have virtually no delay. This results in the script almost instantly looking for the operating avatar sitting on the object, to trigger permissions, but the avatar, in fact, has not yet been recognized by the receiving simulator server as sitting on the object - what's more, as I suspect, in some cases the avatar has not even been recognized as being in the receiving region: this is just about 20% of the occasions, but there are script errors stating that the avatar is not in the same region, hence it failed to set permissions. In line with this, I have also noticed that the viewer struggles to get the "sitting" status of an avatar in the Radar function, if the avatar is not in the same region as you, and, after you move into the same region (or they do), the Radar still fails to show them as sitting - while they are in view range, and you can see them sitting in a boat, for example - which might or might not update within 1 minute after both of you arrived in the same region.

My further observation directly related to the "out-of-sync" error is that vehicles whose root is behind the sitting position of the operating avatar have a better chance of avoiding errors and loss of control and other permissions, while those with a sitting position forward of the root are highly prone to producing errors on every region crossing. I am not sure, though, whether this should really affect the transition of the vehicle and the avatar significantly, but it seems so.

On the other hand, since all simulator servers got uplifted, and are basically running on the same server version, I have noticed better region crossing quality, as well as time. The vehicles I use that were less prone to the "out-of-sync" errors produce even fewer errors, but unfortunately the more prone vehicles remain as unusable as before. Crossing time, and being able to cross without getting weird forced disconnections and other ugliness, have improved in a stable way, as far as I have noticed. This is another reason I don't believe a slight increase in ping time would account for the major issues with vehicles.

Just to address the other issue as well: yes, like several others, I have noticed the same issues since the simulator server uplift: "very slow asset loading", "object and texture cache apparently failing, caused by something not on client-side", more severe FPS drops, along with weird stuttering when trying to move while not everything has been downloaded and rendered yet on scene. This is, however, an entirely different issue, as far as I can see, and it really seems to be related to the higher ping times and other network-related issues.

As a conclusion, I think it is important to keep looking into each of these issues thoroughly, one by one, to figure out what exactly causes them, and especially with the "out-of-sync" issues, to find a way that would somewhat restore the behavior of script events and avatar transfer to a synchronized manner on the region crossings, because expecting the creators to find workarounds to compensate for the issue (well, or silently leaving it to them), is not the solution. Many vehicles would be perfectly usable without these errors, while their creators are no longer active in SL, so without a fix for the overall issue on LL's side, those products would remain impaired and unusable forever. @Oz Linden Can you please try to look into these suspicions? We have been talking about the same in previous topics, in JIRA reports, etc, and it seems like these suggestions bounced off, while it is just becoming more and more consistent what we are experiencing is due to the above reasons, and not simply because of the ping time increase itself. It would be necessary to sort this out, because despite the improved crossing time and stability we have been waiting for, this is now two steps back first, then one step forward only, if we still cannot use our vehicles properly, despite all efforts made so far.


48 minutes ago, AlettaMondragon said:

Can you please try to look into these suspicions? We have been talking about the same in previous topics, in JIRA reports, etc, and it seems like these suggestions bounced off, while it is just becoming more and more consistent what we are experiencing is due to the above reasons, and not simply because of the ping time increase itself. It would be necessary to sort this out, because despite the improved crossing time and stability we have been waiting for, this is now two steps back first, then one step forward only, if we still cannot use our vehicles properly, despite all efforts made so far.

Your note is a pretty good summary of the current situation, and aligns with the data we have about region crossings.

To be clear ... we didn't actually set out to improve object region crossing times (it had been on our list, but we had not gotten to it), but as part of moving the simulators we discovered a serious flaw in how it worked and the fix made the performance of object crossings much better. This has had the unfortunate effect of reversing the usual crossing order - it used to be that the avatar usually beat the vehicle to the region, and now it sometimes comes out the other way. Add the effects of some changes that were needed to the script scheduler for other reasons, and what we've got is a considerably different environment than vehicle scripts used to have. Not all of them are adversely affected, but it's certainly true that some are.
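The ordering change described above can be illustrated with a toy Python model. This is not LSL and not a description of how the simulator or any real vehicle script works; the timings are invented, and `Arrival` and `first_successful_attempt` are hypothetical names. The sketch assumes a script that asks for permissions as soon as the vehicle arrives and can only succeed once the seated avatar has also arrived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Arrival:
    vehicle_ms: int  # when the vehicle and its scripts arrive in the new region
    avatar_ms: int   # when the seated avatar is recognized in the new region


def first_successful_attempt(
    arrival: Arrival, retry_every_ms: int, max_attempts: int
) -> Optional[int]:
    """Return the 1-based attempt on which the request succeeds, else None.

    The first attempt happens when the vehicle arrives; later attempts follow
    at a fixed interval. An attempt succeeds only if the avatar is already there.
    """
    for attempt in range(1, max_attempts + 1):
        now = arrival.vehicle_ms + (attempt - 1) * retry_every_ms
        if now >= arrival.avatar_ms:
            return attempt
    return None


old_order = Arrival(vehicle_ms=400, avatar_ms=150)  # avatar beats the vehicle
new_order = Arrival(vehicle_ms=50, avatar_ms=300)   # vehicle beats the avatar

# A script that asks exactly once worked under the old ordering...
print(first_successful_attempt(old_order, retry_every_ms=100, max_attempts=1))  # 1
# ...but fails when the vehicle arrives first...
print(first_successful_attempt(new_order, retry_every_ms=100, max_attempts=1))  # None
# ...unless it keeps retrying until the avatar shows up.
print(first_successful_attempt(new_order, retry_every_ms=100, max_attempts=5))  # 4
```

Retrying is only one possible kind of script-side workaround, shown here as an assumption; it does nothing for vehicles whose creators are no longer around to update them.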

All but the last few things have been moved, and while there's always cleanup work to do, we do expect to be able to spend some time early next year on performance issues, including some that we believe should make a positive difference to the problems you describe.

As for the object rezzing issues, we've recently added new metrics in the viewer that are reported back to our statistics gathering system that we will try to use to understand why some users are affected and not others.


@Oz Linden One aspect of the move to AWS cloud servers has been a substantial increase in pings to each region...in the order of 10-25% dependent on the part of Europe the user is located.  In ordinary day-to-day activities this has a negligible effect on in-world activities.

However, in any and all competitive activities those outside the USA have long been at a significant disadvantage due to increased response times... this has now been magnified by the increase in ping. That surprised me, and I must assume that the routing to the simulator server now has more steps to it, since the geographical change is insignificant. I sincerely hope that LL will put significant effort into ameliorating this historical disadvantage suffered by non-US users.


The topology of the Internet is not nearly geographical.  It is weird.  It changes often.  Many recent changes seem to have been detrimental to performance which often means they were cost driven.  From my location I have three 10 gigabit connections to large peer networks and distribute those around the state on a 100 gigabit network.  It is not unusual to have to make traffic engineering changes to resolve user issues with a specific destination because a peer is using a peer that is doing something weird.  Unfortunately for the end users they must engage the network operators when this happens and we often have trouble understanding their reports.  In the case of Second Life and Linden Lab, they are end users with almost no visibility into the peering and transport systems.  ICMP ECHO is low priority traffic and MAY be discarded in transit and MAY be delayed or discarded by the target.  The same goes for “traceroute” probes.  Not all elements in the transport systems are IP hops.  Many are carrying IP over MPLS or some other encapsulation and this will not show as a hop on traceroute output.  I do think the Second Life Viewer still uses a lot of UDP packets for application data.  I have been seeing that some carriers are discarding UDP to avoid congestion and slowdown of TCP flows.  When I discover this I am usually troubleshooting an end user complaint.  The solutions available are to TCP encapsulate the UDP flows or change routing to avoid the lame carrier.  Ideally the UDP traffic would not be impeded.  UDP is selected for discard during near-congestion conditions because the vast majority of UDP traffic on the Internet is malicious in nature or being generated by peer-to-peer file sharing protocols that were originally designed to defeat now obsolete traffic management methods.
