Toysoldier Thor

Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?


[Screenshots: bytes 2.jpg (left) and bytes 5.jpg (right)]

 

I'll leave these here for you to analyze. 

On the left, I was crashing like crazy.  Finally gave up after the 5th crash.

On the right, I went back later, had no problems.

Thanks

Perrie

 


Perrie, the sim looks quite happy in both of those screenshots. The only real difference being that there were a lot more avatars on the sim when you were crashing. Assuming others on the sim weren't having the same problem, I'd have to suspect your viewer was (ungracefully) hitting some limitation in your machine configuration. Avatars are generally the most complex things a viewer has to render in a scene, so if it's close to the edge, a crowd is likely to push it over. Anyway, this seems strictly a viewer problem; if you're still using Firestorm, you might want to raise it with those developers to see if they have specific debug tips or tweaks to suggest.



Qie Niangao wrote:

Perrie, the sim looks quite happy in both of those screenshots. The only real difference being that there were a lot more avatars on the sim when you were crashing. Assuming others on the sim weren't having the same problem, I'd have to suspect your viewer was (ungracefully) hitting some limitation in your machine configuration. Avatars are generally the most complex things a viewer has to render in a scene, so if it's close to the edge, a crowd is likely to push it over. Anyway, this seems strictly a viewer problem; if you're still using Firestorm, you might want to raise it with those developers to see if they have specific debug tips or tweaks to suggest.

Thank you very much, Qie.  I never used to have this much trouble there, and it started for me at about the same time Toysoldier started posting about this trouble.  So while I know correlation does not equal causation, I thought it might be a possibility.  I have even done a clean install.

Back to the drawing board.................

 


Yeah, those two stats charts, one from when you were crashing and one from when you were healthy, both show that the sim is not under any stress in either situation.  The sim was busier when you were crashing, with almost 100% more avatars, but the sim was not having trouble dealing with it.

So Qie is very likely correct that the cause is a factor related to your local config or your connection to SL / the sim.  It could be a threshold of avatars crushing your viewer setup/resources.  Maybe changes to the sim code like pathfinder have even pushed your config over the edge, which would explain why it started happening around the same time as these major sim-wide crashes.  I don't know.

Did you ask others that frequent the sim if they have noticed themselves crashing on this specific sim like you?  If so, maybe it's a frequent target for griefers?  Maybe they hid something on the sim?  But then others would also notice what you did?

Just some thoughts, but going by the stats it's not the same problem we are generally looking at here.



Toysoldier Thor wrote:

Yeah, those two stats charts, one from when you were crashing and one from when you were healthy, both show that the sim is not under any stress in either situation.  The sim was busier when you were crashing, with almost 100% more avatars, but the sim was not having trouble dealing with it.

So Qie is very likely correct that the cause is a factor related to your local config or your connection to SL / the sim.  It could be a threshold of avatars crushing your viewer setup/resources.  Maybe changes to the sim code like pathfinder have even pushed your config over the edge, which would explain why it started happening around the same time as these major sim-wide crashes.  I don't know.

Did you ask others that frequent the sim if they have noticed themselves crashing on this specific sim like you?  If so, maybe it's a frequent target for griefers?  Maybe they hid something on the sim?  But then others would also notice what you did?

Just some thoughts, but going by the stats it's not the same problem we are generally looking at here.

Again, thanks for the replies.  The first night it happened to me there were quite a few others crashing.  But now it appears I am the only one with a constant problem.  I feel like the odd person out sometimes.  Ever since the introduction of Mesh I have had all kinds of problems.  When I use a Mesh viewer I take a 70 to 80% performance hit.  So most of my time in SL I still use Firestorm Beta.  There was a rather lengthy JIRA about my issue which Riuniti (sp?) Linden took a serious look at, but it did not result in a fix.

 


Hi everyone, I don't usually post in forums because quite frankly very little gets solved by complaining and there is usually far too much misdirection, intentional or unintentional. In a sense many of you that posted here are correct, so I am going to be blunt. Anyone that has been online and played online games has probably witnessed anomalies from time to time. The problem in SL at the moment is simply that, for whatever reason, the servers or connections to the servers are no longer adequate. It could be new content, cramming too many sims on each server, or even troublemakers, who knows? I do know the problem seems to become progressively worse with the first maintenance of each month, to the extent that simple tiny boats can't cross sim lines where once large yachts moved effortlessly. If anyone at LL is listening, I suggest you get on it and buy the equipment if that is what is required. I play some very dynamic online games, and the ones I no longer play are gone because excessive lag killed them and the money stopped. Thank you for reading everyone. TristaMay.



TristaMay wrote:

Hi everyone, I don't usually post in forums because quite frankly very little gets solved by complaining and there is usually far too much misdirection, intentional or unintentional. In a sense many of you that posted here are correct, so I am going to be blunt. Anyone that has been online and played online games has probably witnessed anomalies from time to time. The problem in SL at the moment is simply that, for whatever reason, the servers or connections to the servers are no longer adequate. It could be new content, cramming too many sims on each server, or even troublemakers, who knows? I do know the problem seems to become progressively worse with the first maintenance of each month, to the extent that simple tiny boats can't cross sim lines where once large yachts moved effortlessly. If anyone at LL is listening, I suggest you get on it and buy the equipment if that is what is required. I play some very dynamic online games, and the ones I no longer play are gone because excessive lag killed them and the money stopped. Thank you for reading everyone. TristaMay.

The elephant in the room that everyone appears to avoid is called MESH. I hope the demands of a few creators for something that has zero user-side benefit were worth destroying the grid for. Pathfinder (part of mesh) was another instant downgrade. They've also probably changed the rezzing priorities to accommodate mesh. Do not believe anyone that tells you that mesh does not cause AWFUL lag because it does - for everyone.

There was once another laggy mesh world called Blue Mars. How did that work out?

(Waits for the shills to attack like hyenas.)


Actually the elephant in the room is a combination of a) computers that are barely adequate for Second Life, b) good computers with poor setups for Second Life, and c) poor network connections either on the resident's end, LL's end or somewhere in between. Blaming recent additions to SL like mesh has always been fashionable ever since people claimed changing from texture hair to flexi hair and system skirts to flexi skirts was going to ruin SL for everyone.


I put in a land performance ticket a couple of weeks ago because of this exact issue.

I have a full sim which is mainly my sculpt store and building grid. I was informed that there is an "open bug" affecting all sims at random with this kind of sim failure.

I have had a few LL restarts in order to get the sim back on the map. So far they don't show any solutions, but do recognize that there is a problem.

Thane Woodford

 


I have information about the Lindens having problems in the Content & Mesh UG and the Server-Scripting UG meetings on my blog. It will post Wednesday morning 12/12.

The problem was worst in Borrowdale. We were dropping out from slow connections as best I could tell. After logging back in, twice, we moved to Grasmere. It was not as bad, but ran into the same performance problems.

In the Server Scripting group in Denby, we saw a similar problem but it never got so bad that we dropped out. 47 avatars present. About 40 minutes into the meeting it started to clear up. By the end of the meeting the region was running fine.

I brought up this thread at each meeting. Nyx is not really into server problems. Andrew has other things on his mind. I'll see if it happens in Thursday's meeting when Simon is there. He seemed to have the most interest in the problem.

 


The two lag-to-death crashes at Bay City's tree lighting on Saturday (North Channel) were similar. Again, around 40 people in the sim, and Network Time skyrocketing, along with Unacked Bytes.

One thing I didn't expect to see (and should have taken a screenshot, but didn't) was the time components summing to a much larger number than the reported total frame time. When the crashes were imminent, the Network time would often match the reported total Frame time (around 8000 msec), but Simulation and Agent and sometimes Images would also each have more than half the total frame time, all for the same frame. I have no idea what this means; it must be some reporting glitch, amplified by the extreme conditions.
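For anyone who does capture these numbers next time, the oddity described above is easy to check by hand. The snippet below is only a sketch: the function name is made up, and the readings are hypothetical values loosely shaped like the ones described (Network matching an 8000 msec total, the others each over half of it), not figures from an actual screenshot.

```python
def component_overrun(total_ms, components):
    """Return how many ms the summed component times exceed the reported total."""
    return sum(components.values()) - total_ms


# Hypothetical readings, NOT taken from a real statistics panel.
total_frame_ms = 8000.0
frame = {
    "Network": 8000.0,     # matches the reported total frame time
    "Simulation": 4500.0,  # more than half the total
    "Agent": 4200.0,       # more than half the total
    "Images": 4100.0,      # sometimes also more than half
}

overrun = component_overrun(total_frame_ms, frame)
print(f"components exceed the reported total by {overrun:.0f} ms")
```

A positive result only confirms that the components cannot all be disjoint slices of the same frame; it says nothing about why the statistics are reported that way.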

I was also surprised to see, in the worst of it, very rare frames with almost reasonable times. These did not indicate recovery; they'd be followed by many frames of deep dilation before maybe another "good" frame -- or a crash. (Not surprisingly, the unacked bytes remained at the same high level -- often over 2MB -- for the "good" frames as well as the rest.)

In the midst of it, I remembered that seeing high Network times does not necessarily imply a network cause for the crashes. That may be the case, but watching the Unacked Bytes climbing steadily, it's clear that those would just keep accumulating as long as the sim was stuck doing anything, as long as it could still count those unacked bytes in the buffers. Similarly, if processing reported as Network time has the highest priority (likely true), it may always be the current activity for reporting for all that "stuck" time, regardless of what's actually causing the stuck state.
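The accumulation argument above can be pictured with a toy model. This is a deliberately crude sketch, not a description of how the simulator actually counts anything: the function, the per-tick figure and the tick count are all assumptions chosen so the total lands near the 2MB level mentioned earlier.

```python
def unacked_during_stall(new_unacked_per_tick, stalled_ticks, start_unacked=0):
    """Toy model: while the sim is stuck, bytes awaiting acknowledgement keep
    piling up every tick and nothing clears them, whatever caused the stall."""
    unacked = start_unacked
    history = []
    for _ in range(stalled_ticks):
        unacked += new_unacked_per_tick  # nothing gets acknowledged during the stall
        history.append(unacked)
    return history


# Hypothetical numbers: 50,000 bytes added per tick over a 40-tick stall.
history = unacked_during_stall(50_000, 40)
print(history[0], history[-1])
```

The counter climbs the same way regardless of what the sim is stuck on, which is why a rising Unacked Bytes figure on its own does not point at the network as the cause.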


Hey Qie!


In a way I am glad you witnessed one of these instant sim crashes, if only to see how it progresses.  I have pretty much given up hope that LL is really looking into the root cause of the issue.  I personally think it's because the root cause is the pathfinder code: they don't have a solution for it and they do not want to remove it.

Sadly, because Andrew cancelled the Friday 4pm SLT meetings, I am no longer able to attend the LL meetings to bring this topic up.  So it's just another one of the grid problems that is being forgotten as LL focuses on releasing new code to introduce shiny new features.

It's just like how LL is still ignoring the major bug that all V3 viewers have with Notice Attachments going frozen and unusable (even if clicked from the Group Notice window).  This is a MAJOR ANNOYANCE and yet after all these months there has been no resolution to this problem.  And since we no longer have a Resident JIRA to bring attention to these major grid problems, LL is allowed to just ignore these problems.

It's really become a very frustrating situation.


I'm kind of glad I saw it first-hand, too, finally. I'd not been bringing it up at the Tuesday server meetings because I wasn't sure it was still happening, until I experienced those Bay City crashes. (With very rare exceptions, I can't attend the merged Thursday meetings on Aditi.)

One thing I wish I knew is whether there's any hope of relief as a result of a change deployed this week to the RC channels, about which Inara Pey reported:

Physics Memory / Region Performance

As reported, the physics memory issues affecting some regions had been tracked down by Simon Linden to a Havok issue related to navmesh rebakes. His fix for this problem cleared QA and forms a part of the RC deployments for the 12th December, together with a fix for a low-level threading problem within the simulator code which has also been causing region crashes.

[emphasis mine]

The release notes offer no details -- they're no longer allowed to even reference the castrated jira, so there's no guessing what the unspecified crash fixes are supposed to address. The thing is, as I hinted in my previous post, I'm now thinking that the inflated network statistics may not necessarily implicate a network cause to the problem. If I click my heels together enough times, I can make myself believe that network events might be just the only ones that get counted while the rest of frame processing is stuck in some threadlock. (That wouldn't by itself explain the inflated network time, but after I saw that the component times can sum to much more than the reported total frame time, I realized I had little understanding of how those time metrics are really measured.)

Anyway, the point is, it may be useful to know whether the same problem still occurs on RC channel sims this week, and if not, whether the problem occurs at all anymore once the main channel gets updated with that code.

Grasping at straws here, I know.



Qie Niangao wrote:

The release notes offer no details -- they're no longer allowed to even reference the castrated jira, so there's no guessing what the unspecified crash fixes are supposed to address.

The server notes do very occasionally reference a JIRA number, but not often enough.  From Beta Viewer 3.4.3 they seem to have started adding the JIRA numbers back in to the Viewer release notes - but only the 'MAINT' numbers rather than the original 'BUG' numbers etc. under which the issues were initially reported.  LL still have a long way to go to make it worthwhile again, but at least it is progress.


That's good to know, but I have to say: Much too little, much too late.

This is off-topic for the thread, I realize, but I've lost patience with Rod Humble getting a grasp on the gross incompetence of Lab management. I'm coming to the conclusion that he is now just another part of the problem.


Yes Qie, sadly it has become very clear that Rodvik is not the savior to improve SL; many of his actions have actually made matters far worse for the future of SL and its Residents.  He really does not see SL as being anything more than LL's current declining Cash Cow and hopes it will last long enough for LL to diversify into other revenue-generating product streams: new products that do nothing to restore SL's glory, improve SL's experience, or regain/retain the ever-declining SL customer base.

He has put policies into action that in fact further isolate LL from its customers.  Going by his direction this past summer to gag the resident JIRA (this was 100% his policy and direction, even against the recommendations from several LL staff internally), Rodvik does not want to FIX all the SL bugs that make the current SL experience poor; he simply wants to hide the problems and make it much harder for SL Residents as a community to point them out, complain about them, and get them resolved.

When Rodvik was openly and directly asked to explain the reason for this policy of gagging a service as invaluable as the Resident JIRA (in the Commerce Thread/forums a couple of months ago), he simply did not respond.

So major bugs like the likely PathFinder-induced Instant Sim Lag/Crash bug (this one) and the V3 viewers' Notifications with Attachments freeze-up (which has been a major thorn for so many in SL since it appeared early this year) cannot be escalated to LL.  As frustrating as these bugs are... the residents no longer have any way of communicating the impact of these major problems to LL, because Rodvik has successfully GAGGED the COMPLAINTS.

As such, I don't see problems like this getting resolved any time soon... Rodvik wants them to focus on more and more new features and just let the experience-degrading bugs fester.


So as bad as these instant sim lag crashes have been over the past 6 months, over the past few weeks either this problem has morphed into a new symptom or we now have a new type of serious sim stability bug on our hands.  It is being noticed at a lot of busy events and the venue owners, artists, and SL resident attendees are becoming even more frustrated.

At times during many of the larger events, a large number of the attendees at the event instantly crash (often including the singing artist themselves).  It does not happen that ONE person crashes... it will be like 5 or 10 attendees just crash off the sim at the very same time.

On Jan 1 I went to a new sim venue where they are having grand opening music events.  About 40-50 ppl were at the event including me.  During that time - 3 times I instantly crashed along with the singing artist and several others.  The funny thing is that it is not all the attendees and the sim does not crash either.

This has also been happening at several other sims / venues / events, so it's not just one isolated event.  This new serious stability symptom is being noticed by the community and is often part of what the artists talk about while on stage (i.e. how bad the SL grid sim stability has gotten, and their apologies to all those that have attended the event).

I suspect this might be more of a bug with the latest Firestorm release (which we already know has a major bug with the music streaming).  But since the root cause of the LL Instant Sim Hang has not been figured out, there is a chance it's a combination of the sim code instability and the latest Firestorm code.


I have received a couple of private messages from watchers of this thread who are too nervous to post their thoughts publicly but have had similar scenarios recently at their clubs / venues.  One of them said it has been really bad for their club and they suspected it was a new form of griefer attack.  They have brought this up with LL and, as suspected, LL has provided no response on the issue.  The thought is that LL might be aware there is a new form of security breach on the grid causing this and LL doesn't want to talk about it.

The club owners have recently implemented group-only access and supposedly the problems have stopped completely.  It seems there might be some merit to the theory.  It might explain why LL has not been engaging on the instant sim hangs and the crashing of many avatars at venues.

I read a blog from someone who reports on all the LL weekly User Group meetings, and she mentioned that Maestro reported that sim stability has been good over the past couple of weeks.  WHATEVER!  Clearly the Lindens want to ignore this problem.


It would be very interesting to see what's in the crashreport logs for those viewer crashes. I have no idea where Firestorm puts them, but the LL viewer, on Linux, drops them in ~/.secondlife/logs/SecondLifeCrashReport.log .

(FWIW, my viewer has been crashing more than usual recently, often when trying to select a Mesh item. Looking at the most recent log, it appears to have gotten a "memory allocation for vertex data failed" error in mapVertexBuffer from llrender/llvertexbuffer.cpp(1641) -- which probably has nothing to do with the simultaneous crashes.)
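If anyone wants to compare logs after one of the simultaneous crashes, something like the following would pull out the interesting lines. It is only a sketch: the helper names are made up, the path is the LL viewer's Linux location mentioned above (Firestorm's location is not known here), and the search string is just the error text quoted in this post.

```python
from pathlib import Path


def matching_lines(lines, needle):
    """Return (line_number, text) pairs for lines containing `needle`, ignoring case."""
    wanted = needle.lower()
    return [(n, text) for n, text in enumerate(lines, start=1) if wanted in text.lower()]


def search_crash_log(log_path, needle):
    """Search one log file; return an empty list if the file is not there."""
    path = Path(log_path).expanduser()
    if not path.is_file():
        return []
    return matching_lines(path.read_text(errors="replace").splitlines(), needle)


if __name__ == "__main__":
    # Path as given for the LL viewer on Linux; adjust for other viewers or systems.
    log = "~/.secondlife/logs/SecondLifeCrashReport.log"
    for number, text in search_crash_log(log, "memory allocation"):
        print(f"{number}: {text}")
```

If several people who crashed at the same moment ran the same search and posted the matching lines, it would at least show whether the viewers are failing in the same place.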

Also, a month or so ago, Simon thought that there was a possibility that the original problem might have been helped or maybe even fixed in one of the releases that must have gone grid-wide a bit before the holidays, so it would be good to know if they're still happening, too.


On New Year's Eve I was at an event where the sim instantly went dumb and crashed, and I and obviously the rest of the crowd got thrown off.  The sim was laggy beforehand, but since there were over 50 avatars, which for some sims is a lot of stress, I just thought it was normal lag.  But the crash happened so quickly that I could not note whether the original symptoms were at play (i.e. increasing pending uploads and downloads and unusually low Up and Down traffic flows).  Strangely, when I finally was allowed to log back into SL (it took 10 minutes) and got back onto the sim, which was still overly lagged, I opened up the Perf Stats screen and that froze my viewer instantly.  I had to hard close FS.

Over the past few weeks (i.e. over Xmas break) there have been a ton of live music events that I attended, and I cannot recall a sim going dumb where I had evidence of it (as mentioned above, because I crashed too quickly to tell).  But it seems that since the release of the latest FS code and of the code that Simon supposedly released across the grid (I suspect it was meant to resolve this problem), the trouble at larger live events has morphed into a new form of instability, and this form of instability has become much more frequent.

I agree with you - I would like to know what is in the crash logs I keep sending the FS team.  I even want to know if they look at these crash reports at all.  Also, the kind of crashes we are witnessing recently does not even generate crash logs for FS.  When I crash, the viewer doesn't just shut down/quit.  I get a friendly greyed-out screen with two prompts asking if I want to QUIT Firestorm or Read IMs and Quit.  Since it was a friendly exit of the viewer, when I start up again, FS doesn't think a crash report is needed.

This could be a completely different NEW problem, OR a new problem that cropped up when Simon tried to fix the old problem, OR a new problem that cropped up because of new FS & LL sim code, OR it's the old sim instant lag hang that has morphed.   OR... a new Griefer attack that FS and LL know about but don't want to talk about.

 


I sure hope that the LL staff don't think that the Large / Busy Event crashes that have plagued the grid since the PathFinder code was released have somehow gone away.

In fact, over the past few weeks it has gotten worse.  In the past 4 days I have been to 4 venues/events where the sim kicked everyone off instantly without warning.  I am writing this now as I wait to return to a venue that was in the middle of a concert with about 45 ppl at it and crashed.

The state of grid stability, with all LL's new shiny features being pumped into the sim code when they haven't even fixed the instability likely caused by the pathfinder code, is so frustrating and impacts so many that try to host and play events.

But I guess LL really doesn't care about grid stability, as this thread has gone on so long with no LL attention and no resolution.  They also do not address long-standing problems like the NOTICE ATTACHMENTS FAILING that showed up on the grid last summer as well.


And maybe if I report here every time the sim I am at crashes at a big show... maybe LL staff will notice.

Since LL doesn't even know how unstable they have made the grid (since they rarely come into the grid to experience it for themselves, like watching a great musician just to see the crowded sim crash), maybe reporting each one I get hit with will give them a hint of how many times the sims crash.

What is ironic is that I read a Nalates blog recently where she said LL believes their latest sim code upgrades have had no bad effects and all looks good.  COUGH COUGH.... of course it looks good when they never live in the grid themselves.

ANYWAY... I am posting here because once again today a music sim crashed when it had a lot of avatars on it.

Stay Tuned.... likely post again soon.


On the weekend of February 8-10 I ran a number of events for the relaunch of our Poppyport airport, marina & harbour, which is located on the Sansara mainland - see www.poppyport.com.  At one of these events, but only one of them, we suffered the Sudden Massive Lag (SML) problem where the sim seemed to freeze for about 15-20 minutes.  This happened on the Saturday night when Russell Eponym was about to perform at our marina arena.

Russell is a very popular performer in SL, and his events can attract about as many as a sim can hold.  On this occasion, however, we only had about 16 at the event - which I am inclined to put down to people not being able to get into the sim either initially or after relogging or being thrown out.  This was also fewer than the closing party we had the following day, which nearly 30 attended.

The event on the Saturday when Russell was due to perform was widely publicised through SL's Events Calendar, a number of groups, and on the internet through our own website, both our Google Calendar & Russell's, and his fan club on Facebook.

My conclusion is that SML was triggered on this occasion by everybody trying to tp into the same spot simultaneously, which the sim server could not handle.

It is interesting also that the event on the Saturday night when SML happened differed from that on the Sunday night, since the Sunday night party actually had a significantly larger number of people who attended but fewer arrived by tp'ing at the same time - as many arrived by boat or plane at the end of a cruise and arrival times were more staggered.

We have another performance by Russell Eponym planned at our marina arena for March 9th, and the question is, what can we do to prevent the same thing happening and the sim being hit by SML?

All I can think of at the moment, is that we should try to stagger how and when people arrive, for example by giving out slightly different start times and also perhaps getting some people to tp into the next sim and then cross over by e.g. boat.  Any suggestions would be welcome!
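The staggering idea above could be planned with something as simple as the sketch below. Everything in it is an assumption for illustration: the function name, the group names, the five-minute gap and the start time are placeholders (the year in particular is arbitrary), and nothing here has been tested against an actual sim.

```python
from datetime import datetime, timedelta


def staggered_start_times(first_start, groups, gap_minutes=5):
    """Give each invite group its own announced start time, `gap_minutes` apart."""
    return {
        group: first_start + timedelta(minutes=gap_minutes * index)
        for index, group in enumerate(groups)
    }


# Placeholder date and time; only the spacing between groups matters here.
first_start = datetime(2000, 3, 9, 13, 0)
groups = ["Events Calendar listing", "Fan club notice", "Arrivals by boat from next sim"]

schedule = staggered_start_times(first_start, groups)
for group, start in schedule.items():
    print(f"{start:%H:%M}  {group}")
```

Whether a few minutes between groups is enough to avoid the problem is an open question; the gap would have to be tuned by trying it at a real event.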

The only other thing which crosses my mind is that, if it were possible to get LL to increase a sim's capacity for large events, that might be a useful thing and one people might be prepared to pay for.
