Qie Niangao
Posts: 3,916
Registered: ‎02-25-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to hombra - view message

Don't know if that's the same problem or not, but it's definitely network-related: over 40,000 pending downloads, and most of the extended frame is spent in Network time. I think it's useful to see these reports, whether they turn out to be the same problem or not.

(Incidentally, the Network detail that you have expanded at the top is actually what the viewer is experiencing, not the conditions at the sim. Viewers have a lot of network traffic from sources other than the sim, and can have quite normal network performance while the sim is going belly-up.)

Posts: 2,539
Registered: ‎03-31-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to Qie Niangao - view message

I was at another event where, once it got to about 40 avatars, the sim tanked and I couldn't move. People were not crashing because the sim's host emergency-moved many of his guests to his home sim. When the count got down to about 20, the sim stabilized again.

I captured the stats while the sim was staling out and, once again, look at the PENDING DOWNLOADS. I watched the stats as people left, and the Pending Downloads dropped to under 5.

oct22-915p-40avatars.JPG

Then at the sim that everyone jumped to, there were about 30 avatars. It was laggy, but the normal kind of laggy. Look at the packets in and out and the pending downloads on a sim with about 12 fewer avatars: it was processing a lot more packets (over 4 times more) with only 1 pending download.

oct22-952p-28avatars.JPG

Toys SL Marketplace Store | Toys SL Art Gallery
Qie Niangao
Posts: 3,916
Registered: ‎02-25-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to Toysoldier Thor - view message

I tried at the Server user group yesterday:


[2012/10/23 12:49] Qie Niangao: Do we know any more about Toysoldier Thor's bug... BUG-355
[2012/10/23 12:50] Qie Niangao: (aka MAINT-1682, about sim-crippling network performance, with moderate-to-high avatar counts)
[2012/10/23 12:51] Simon Linden: I don't have any news on BUG-355
[2012/10/23 12:51] Qie Niangao: okay. reports are that it's still happening, so... probably will keep getting asked about it.
[2012/10/23 12:51] Nal (nalates.urriah): Talk to Nyx, I think we saw that in Barrowdale yesterday.
[2012/10/23 12:52] Nal (nalates.urriah): The region had 0.03 TD and 45 FPS... w/free script time and no PF...


Any progress is behind the MAINT jira wall, so it's hard to judge whether anybody is actively looking at this or not.

Posts: 2,539
Registered: ‎03-31-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to Qie Niangao - view message



There has been no progress on this JIRA and, as you can see, nothing has been mentioned in this thread or in the user groups about any progress on this bug. The JIRA's last activity was that it was ACCEPTED by LL shortly after I created it a few weeks ago. Since the JIRAs are now gagged and hidden from public view, it's just that much easier to let them go stale without any Public Eyes able to watch them.

The evidence is mounting that something is going wrong with the "NETWORKING" aspect of the sims, or the Debian OS underneath the sims, or their connectivity within the LL DCs. And in fact this has been a really bad week. Not sure if it's because recent sim code upgrades aggravate this problem, or because of unrelated problems with similar symptoms, but the sims have been really unstable this week.


Qie, I really wish that you, I, and some others with interest and knowledge of I.T. could be allowed under the covers at LL's DC. As a Network Architect, I find it fun and challenging to fix network-related problems. They may seem tough, but compared to many other I.T. problems, network problems are generally easier to isolate and find root causes for.

The problem is that LL seems focused on all their recent sim code upgrades and not really interested in fixing old problems. That's ironic, since the Sim team's objective is IMPROVING PERFORMANCE & STABILITY. If they could find the root cause of why the sim's network traffic goes stale / stupid, they could potentially resolve a lot of the most painful and frustrating lag / crash problems on the grid, which in turn means HAPPIER RESIDENTS!

All we can do is keep slapping the evidence of a network problem in the LL DC in LL's face on this forum until they place some focused effort into diagnosing these symptoms.

Qie Niangao
Posts: 3,916
Registered: ‎02-25-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to Toysoldier Thor - view message

Inara Pey reports from the Friday Server user group that this problem is planned to be (partially?) addressed in next week's Magnum RC rollout. Fingers crossed.

Posts: 2,539
Registered: ‎03-31-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to Qie Niangao - view message

Yeah Qie, that is what Simon said at the server meeting on Friday, but the impression I got from the way they were discussing this topic was that they only "hoped" this bug might be addressed, as if it would take some luck for it to be fixed. I really don't think they are focusing on diagnosing the network stale-out problem.

As such, I am writing this post while at yet another staled-out sim during a major event. Notice the network stats again: 3 Meg of UnAcked data and lots of downloads and uploads pending, and yet the PPS is ridiculously low even with all the data that is pending.

CrashedSim-Lexa-Oct27-920p.JPG

Posts: 2,539
Registered: ‎03-31-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to Toysoldier Thor - view message

I am personally nervous that this event-disrupting bug within the sim code or LL DC network infrastructure (which all but cancelled last night's music event because the sim would not recover in time for the concert to continue) will rear its ugly head today at 12 noon at the event where I am the featured artist.

The Paris Metro Art Gala will have a music artist streaming and is expected to host well over 70 avatars. A ton of planning and effort has gone into this event by several people. If what has been happening at several other major inworld events with these NETWORK LAG OUTS happens this afternoon, I am going to be ROYALLY TICKED OFF, LL!

I am crossing my fingers the sim will hold up under the stress and not trigger this bug in the sim code.  I am going to capture the perf stats during the event if I have time.

Posts: 2,539
Registered: ‎03-31-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to Toysoldier Thor - view message

UPDATE from the major event today and how it went.

LL... you need to start FOCUSING ON THIS BUG!!  

I am Royally Ticked Off that this thread has been going on for over a month. We have provided LL very clear evidence that there is a bug somewhere in the Network Layer of the LL Sim, or the underlying Server OS, or the LL Data Center network infrastructure. I have personally brought this issue up at many Friday LL Server/Sim user group meetings. I have created a formal JIRA on this issue. Yet there is still no serious effort by LL Staff to focus on diagnosing this SL BUG.

Maybe you Lindens do not understand the magnitude of the impact this bug is having on YOUR CUSTOMERS inworld, since you rarely come inworld to see it. But let me help you out. When there is a major event planned at a sim (e.g. a music, club, fashion, or art venue), not only has a lot of planning, time, and effort been put into the event, BUT there is also advertising / promotion revenue at stake and MONEY LOST because of this LL BUG ON THE SIMS.

Today's Paris Metro Art Gala was promoted through countless media avenues. One promotion of today's event went out to an audience of 80,000 members! There were big costs to hire a top-quality singer. Gifts were made for the guests.

And so what happened at today's event, which started at 12 noon SLT? Well, I got there right at 12 and, with not even 20 people on the sim, the lag was intolerable! I could barely move. This should not be the case with only 20 avatars!! By the time the sim got to between 50 and 60 avatars, people started crashing, and on the PERFORMANCE STATS I could see the LL SIM BUG starting to show its ugly head. I knew we were in trouble. The Singer crashed. I captured the stats before I crashed. Then the sim itself didn't crash but massively staled out, so that almost everyone crashed out and could not get onto the sim for about 10 minutes.

Then all of a sudden the sim was letting people TP back to it. The lag was substantially less. We only got about 30 avatars back, since a lot of people gave up on the event and never returned. WE LOST A LOT OF POTENTIAL SALES of art and PR for the Paris Metro sim.

Here is the screen capture of the sim just before I crashed. Notice the arrows. Notice the common symptom of the network PPS being far lower than it should be for a sim with 55 avatars on it. Notice the Pending Uploads and the UnAcked data.

ParisMetro-1222-oct28-lag.JPG

To the LL Team at the Friday Server/Sim User Group meeting: I will be doing a lot of talking about this topic, so I sure hope that by Friday your team has done some REAL DIAGNOSTICS on this issue and will have answers. There is no excuse that you are not aware of the problem, that we haven't pointed fingers at where the issue is, or that there is no JIRA.

Perrie Juran
Posts: 4,940
Registered: ‎10-16-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to Toysoldier Thor - view message

A question if I may?

Could some avatars be more susceptible to this than others? I have been having a lot of trouble with crashes at one venue. It is the only place I have had any trouble, so last night I was trying to watch my statistics bar while enjoying the event. I have had as many as 4 crashes in an hour. Last night, only one.

As far as I could discern, I was the only one who crashed last night, but I am pretty sure others have crashed at other times. Watching the statistics bar, I did keep seeing the UnAcked bytes climbing over 600, but the pending downloads stayed very low.

This stuff is a bit over my head. And I just loved the hostess's response when I asked if anyone else had crashed: "You might want to clear your cache."

Qie Niangao
Posts: 3,916
Registered: ‎02-25-2009

Re: Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?

Reply to Perrie Juran - view message

There are certainly many reasons for one user's sessions to crash more than others', and to diagnose that, we'd need to delve into all manner of details of your viewer, machine, and network configuration--which you may have already supplied ad nauseam somewhere else. I suspect, however, that your crashes are unrelated to the sim network problem(s) in this thread. A few hundred unacked bytes aren't a big deal; the problematic conditions are indicated when it's a few hundred thousand or more unacked bytes (among other apparent symptoms with which this thread has been wrestling).
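To make that rule of thumb concrete, here is a rough Python sketch of how one might sort statistics-bar readings into "probably nothing" versus "looks like the stall discussed in this thread". Everything in it is an assumption for illustration: the `StatsSnapshot` container, the function name, and both cut-off values are hypothetical, loosely based on figures quoted in this thread, and are not taken from the viewer or the simulator.

```python
from dataclasses import dataclass

# Assumed cut-offs, loosely based on numbers mentioned in this thread.
# They are illustrative only, not values used by the viewer or the sim.
UNACKED_BYTES_CONCERN = 200_000      # "a few hundred thousand or more"
PENDING_DOWNLOADS_CONCERN = 1_000    # hypothetical; the thread only cites 40,000+ as bad


@dataclass
class StatsSnapshot:
    """Hypothetical container for numbers read off the statistics bar."""
    unacked_bytes: int
    pending_downloads: int


def looks_like_sim_network_stall(snap: StatsSnapshot) -> bool:
    """Return True if either reading is at or above its assumed cut-off."""
    return (
        snap.unacked_bytes >= UNACKED_BYTES_CONCERN
        or snap.pending_downloads >= PENDING_DOWNLOADS_CONCERN
    )


if __name__ == "__main__":
    # A few hundred unacked bytes with low pending downloads.
    print(looks_like_sim_network_stall(StatsSnapshot(600, 2)))
    # Roughly 3 Meg of unacked data.
    print(looks_like_sim_network_stall(StatsSnapshot(3_000_000, 50)))
```

This only flags readings worth a closer look; as noted, the problem shows other symptoms too, so a snapshot that trips the check is a hint rather than a diagnosis.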

One thing I'm noticing is that Image Time sometimes does and sometimes doesn't climb along with Net Time. I'm not sure this actually means anything; Image Time is all about the network, and I suppose it's just a function of what the sim is trying to do on the (sickened) network that determines how the time is distributed.

I wish I knew the Ops guys' take on this problem.  We've never been able to see inside MAINT jiras, so it's not anything new that we're in the dark on this... but it sure is frustrating.