Toysoldier Thor

Increase in Instant SIM LAG & Crashes During Larger Events - Network Source?


I am writing this new thread here since creating a JIRA for public input and involvement is no longer possible AND because later today I will be bringing this serious issue up at the Weekly LL Server User Group meeting at 4pm SLT.


BACKGROUND:

A major part of my activity almost every night in SL is attending live music concerts / events / gigs at a wide variety of venues.  I usually attend 2 to 4 live events a night.  As such, I have a lot of experience being on sims where I frequently see 30 - 50 - 70+ avatars attending the more popular music artists in SL.  From years of attending these events, I know what kind of lag to expect.

I also know what is normal versus very unusual regarding the kind and intensity of lag, as well as sim crash frequency at events - crashes do happen but are usually quite uncommon, even when a sim is really full.

THIS WEEK THINGS CHANGED:

I don't know what changed, and I suspect it was changes that LL has deployed in the server code, but clearly something has changed that has seriously impacted a sim's ability to stay stable during these larger events.


On Tuesday night (Sep 25th) I attended 4 concerts that evening (SLT), all on different sims.  At two of these events, one with about 25 avatars and one with about 45 avatars, I personally witnessed very unusual and severe situations that impacted the club owners and artists holding the events.

What was more suspicious was that they both had very similar symptoms:

At the 1st event, with only about 25 avatars, I arrived almost at the beginning of the show.  I TPed into the club with massive intense lag.  I have been at events far larger with nothing near the lag I experienced at this event on the sim.  No one at the event could move.  When I looked at the lag meter it was ALL SERVER LAG as the Network and Client were both in GREEN.  It appeared that the sim was completely hitting the wall.

The stream was still working so the artists could be heard.  And then about 10 minutes into all this lag, all the avatars started crashing out of the club.  I was one of the last ones to survive without crashing.  The musicians crashed even though the stream stayed up.

Also, those that crashed could not get back on for a few minutes, and even when they got on, they could not return to the club/sim even though there was almost no one left and the lag had subsided.  After about another 10 minutes I crashed too, and then after returning - people could all start coming back to the sim.  It was like the entire sim had to be rid of all avatars before it could return to normal.

 

Later that night I went to another very popular concert that was preceded by another popular artist.  When I arrived at this sim, there already were about 40 ppl at the club listening to an artist.  There was no significant lag at all.  THEN... as soon as the stream was switched to the new artist's stream the sim instantly got HAMMERED by the exact same massive lag that I witnessed earlier at the first event.  It hit like a light switch and just as the stream was switched. 

Within minutes, all but 4 of us (I was one of the few that survived the avatar crashes) were kicked out of SL.  None of the 40+ avatars could log back into SL for several minutes.. and even when they got back on - none of them could TP back to the club even though they could TP to other sims.  Then almost at the same time, the remaining 4 of us all crashed off the sim as well.  We could get back pretty quick and we were all allowed to return to the sim after that - even those that crashed 20 minutes earlier.

FAST FORWARD TO LAST NIGHT....

I went to another popular club sim that frequently hosts very large crowded events.  I arrived to see a popular artist and there was already one on stage.  The club was a bit laggy as would be expected with about 50 avatars on it... but normal.

And then at the top of the hour the club switched streams for the new artist to start his show and BOOOM !!  Everything that happened on Tuesday night happened again.  MASSIVE server-side lag instantly hit..... only about 4 of us survived - I was one of them again.  And no one that could relog back into SL could TP to the sim.  Instead of waiting to crash, I logged out gently and re-logged back in... but for the remainder of the hour no one - including me - could return to the sim.

Once again another major show was disrupted.


I have also heard from another venue owner that they have noticed this severe increase in sim lag freeze / crashes recently.

So I don't know if this has been happening ever since PathFinder was deployed and got majorly worse after this Tuesday's code deployment.... but something severe has happened to the grid.


If you are a club / venue owner of big events... have you recently seen similar types of lag and crashes?

I will be asking the Lindens about this tonight.


PS....

Attention to the Server User Group Linden staff : 

Simon, Andrew, Baker, Cheesey

 

Hopefully you will read this thread prior to this afternoon's weekly Server User Group meeting so that I can just point to it and ask for your input / opinions on what's going on and whether LL code upgrades could be causing these major sim lag crashes.

 


Toy S

Seen and similar issue noted at Pine Wood sim, two (UK) nights ago.  It has put off a lot of Club "regulars" since they seem far more "crash-prone" than me.

Typical effects are freezing and "rubber-banding" when moving, extremely slow rezzing, streams faltering, etc.  This seems to have been a result of the rolling of the last set of code.

 

As a corollary we have seen issues related to sim-server/viewer communication even on lightly-loaded sims, including stream issues.  Also same server version, since it is now Grid wide.


I don't see how any server changes could have caused a sudden-onset problem in the last few days because the main channel hasn't seen any code changes since September 18 and the RC channels were last updated the week before.


I don't want this to be read as a specific suggestion, but this could be down to a memory leak, which means that sim servers gradually use more and more memory, until there isn't enough RAM, and the OS starts using virtual memory, and performance plummets.

That sort of problem would explain why the fault doesn't normally become apparent, because the weekly roll-outs reset the system before it gets too bad. I don't know how the location of the bad code could be discovered. It doesn't have to be all that recent a change.
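
To make that reasoning concrete, here is a minimal Python sketch of how a steady leak interacts with a weekly restart. Every number in it (starting footprint, leak rate, RAM size) is a made-up assumption for illustration, not a measurement of any real sim server, and `hours_until_swap` and `hidden_by_weekly_restart` are hypothetical helpers.

```python
HOURS_PER_WEEK = 7 * 24  # a weekly rolling restart resets memory use


def hours_until_swap(start_mb, leak_mb_per_hour, ram_mb):
    """Whole hours until memory use first exceeds physical RAM.

    Returns 0 when already over the limit and None when there is no leak.
    """
    if start_mb > ram_mb:
        return 0
    if leak_mb_per_hour <= 0:
        return None
    headroom = ram_mb - start_mb
    # Usage after h hours is start + leak * h; find the first h that exceeds RAM.
    return headroom // leak_mb_per_hour + 1


def hidden_by_weekly_restart(start_mb, leak_mb_per_hour, ram_mb):
    """True if a restart every week resets the process before it starts swapping."""
    hours = hours_until_swap(start_mb, leak_mb_per_hour, ram_mb)
    return hours is None or hours > HOURS_PER_WEEK


# Hypothetical figures: 1000 MB at start, 4000 MB of physical RAM.
print(hours_until_swap(1000, 10, 4000))          # slow leak
print(hidden_by_weekly_restart(1000, 10, 4000))  # masked by the weekly restart?
print(hidden_by_weekly_restart(1000, 20, 4000))  # a faster leak
```

Under these assumed figures the slower leak would never be noticed as long as restarts happen weekly, but it would surface if a server ran noticeably longer than usual between restarts.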


I have rented a parcel on the Hubble-SIM, and for the past few days I have been experiencing heavy Lag-Attacks. I slapped a LagMeter on the house and sometimes it drops down to near zero; at the same time it is practically impossible to move. Sometimes there are up to 200 - 300 of these Lag-Attacks per day. Note, this SIM is low traffic, usually with 2 to 6 avatars on it at the same time. According to GridSurvey, these 5 Sims are on the same server:

  • Immaculate - surveyed 2012-09-20
  • Jemmica - surveyed 2012-09-20
  • Yacumama - surveyed 2012-09-20
  • Getaway Cove - surveyed 2011-12-31 (older data - Region not visited on last survey)
  • Jeremis - surveyed 2011-12-28 (older data - Region not visited on last survey)

Checked these Sims, no avatars on them during the Lag-Attacks.

Another problem: An unusually high number of Sims - especially on the Sansara continent - can't process sim-crossings in a timely manner, with the result that you get kicked out of any automated vehicle, or the avatar keeps on walking forever until it gets stuck in the middle of nowhere. Or the error message comes up. Completing one of the longer YavaScript-Tours is practically impossible.

Also new: Sim simply stops responding. You can't move in any direction, only turning around one's axis is possible. Logout/Relogin usually fixes that problem.


( Happens with the default viewer as well as with Firestorm - Both are latest version.)

 


Right now there are several mainland sims that are running in a constant state of lag - less than half their normal frame rate - while being nearly empty. They're all showing abnormally high script times - often around 50 ms when it's usually in the neighborhood of 15 ms. Even so, they're shown as running only a very small percentage of script events per frame - well under 1%. In a perfectly running sim this should be 100% and even in very crowded sims I haven't seen this much below 40%. Other nearby regions are running completely normally.

I'm wondering if there might be a new griefing script out that can silently lag the servers.
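
For anyone wanting to compare regions the same way, here is a rough Python sketch of the check described above. The field names and the `looks_abnormal` helper are hypothetical, and the thresholds are just the figures quoted in this post (about 15 ms of script time being typical, and about 40% of scripts run being the lowest I'd normally see), not official limits.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    """Values read by hand from the viewer's statistics display (hypothetical record)."""
    name: str
    script_time_ms: float   # script time per frame
    scripts_run_pct: float  # percentage of script events run per frame


def looks_abnormal(stats, typical_script_ms=15.0, ratio_limit=3.0, min_run_pct=40.0):
    """Flag a region whose script time is several times the typical value
    while it is also running very few of its script events."""
    high_script_time = stats.script_time_ms >= ratio_limit * typical_script_ms
    low_scripts_run = stats.scripts_run_pct < min_run_pct
    return high_script_time and low_scripts_run


# Made-up sample readings, named generically on purpose.
samples = [
    RegionStats("laggy empty region", 50.0, 0.8),
    RegionStats("healthy region", 15.0, 100.0),
    RegionStats("crowded region", 20.0, 45.0),
]
for s in samples:
    print(s.name, looks_abnormal(s))
```

Requiring both conditions is a deliberate choice: a crowded region can show reduced script execution on its own without matching the pattern described here.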


Currently I'm on Heterocera, walking along the SLRR-Railway and most of the Sims I passed through are lagging heavily.

E.g. Mocis and Tenera. Aglia is especially bad; I'm just watching a train moving slower than a garden snail.

 


I can believe that a memory leak could be a potential cause of this problem, since it seems like the problem shows up immediately after reaching some threshold.

 

Like I mentioned, I have not seen a sim instantly lag up solidly so fast. It spiked like a switch was turned on. This is a major in-world experience problem.


Here are all the details I have about two of the LAG events (the one on Tuesday and the one last night):

 

EVENT #1

Lag trigger date/time:

Date:  Tuesday Sep 25, 2012

Time:  9pm SLT (almost to the minute as the new artist was taking over the stream from the previous artist)

Location:  http://maps.secondlife.com/secondlife//20/109/24

Location:  "The Pier at Wild Beach - Live Music Venue"

Description: There were about 40 ppl at the club / sim at the time the event triggered.  It happened just as the stream was changed from the previous singer to the new singer.  The artist did not even get on stage before the sim got hit with a massive server-side lag (not network - not client).  All but 7 avatars were kicked off the sim.  I survived for about 15 minutes waiting for the audience / friends to return.  None returned while the sim stayed up.  Then the remainder of us were forcefully shut down.  When we logged back in, we could get back to the sim and the audience eventually returned for only about 10-15 minutes of show.

UPDATE: I talked to the event host that reported the problem to LL and LL responded to them that it was a "STALE SERVER".... whatever that means.  The user said she did a LIVE CHAT call to report it and she talked to REED LINDEN.

 

EVENT #2

Lag trigger date/time:

Date:  Thursday Sep 27, 2012

Time:  10pm SLT (almost to the minute as the new artist was taking over the stream from the previous artist)

Location:  http://maps.secondlife.com/secondlife/Purple%20Magic/21/21/26

Location:  "Sherie's Gaslight"

Description: There were roughly 45 ppl on the sim/parcel of land at the time the massive server-side lag was triggered (it was not network or client when I looked).  Within a few minutes only 4 of us were left on the sim.  The sim itself did not crash, as I was able to stay on it during the event.  BUT, I decided to log out gracefully and then relog and return.  No one was ever able to return for the rest of the hour.

 

I do not have enough details of the first recorded event on Tuesday (ie. where and exactly when) but I believe it was about 6pm slt and the crowd was a bit smaller than the two recorded above.

I am hoping that LL has sim activity logs and can use the sim location and date/time to see if they saw ANYTHING in their logs for these sims at those times.
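
For anyone collecting reports like the two above, here is a small Python sketch that pulls the region name and coordinates out of a map link so the location can be recorded in a consistent way. It assumes the link layout seen in this thread (`/secondlife/<region>/<x>/<y>/<z>`); `parse_map_link` is a hypothetical helper, not anything LL provides. It reports a missing region name instead of guessing one.

```python
from urllib.parse import urlparse, unquote


def parse_map_link(url):
    """Split a maps.secondlife.com link into (region, (x, y, z)).

    region is None when the region segment of the link is empty.
    Raises ValueError if the link does not have the assumed layout.
    """
    parts = urlparse(url).path.split("/")
    # Assumed layout: ['', 'secondlife', region, x, y, z]
    if len(parts) != 6 or parts[1] != "secondlife":
        raise ValueError("unexpected map link layout: " + url)
    region = unquote(parts[2]) or None
    x, y, z = (int(p) for p in parts[3:6])
    return region, (x, y, z)


# The two links from the event reports in this post.
print(parse_map_link("http://maps.secondlife.com/secondlife/Purple%20Magic/21/21/26"))
print(parse_map_link("http://maps.secondlife.com/secondlife//20/109/24"))
```

The second link comes back with no region name, so a report built from it would need the region added by hand.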


I am the owner of a live music venue.  We moved to a different sim and had our first show there on September 13th.  We were hosting three hours of shows that night.  The first hour we had about 30 people on the sim.  The second hour, we had an increase in people and when the sim hit about 40 avatars, the time dilation and sim FPS tanked.  Nobody could move and people crashed off the sim and could not get back.  The sim never went down and several avatars remained on the sim.    We had to move the last 1.5 hours of shows to another venue in order to be able to hold them.  

 

After the shows were over, I filed an offline region ticket with LL since I could not even get on the sim to restart it.  The sim was restarted and seemed to stabilize.  We went ahead with our plans to hold a single hour of live music on Sept. 15.  Once the number of avs on the sim approached 40, the same issue occurred and we moved the show to another venue to finish it.

 

The sim owner filed a support ticket with LL and the sim was changed to a different server.  We did a massive script and av stress test to see what would happen and the performance seems to have stabilized.  However, I frequently have trouble getting audio streams to work on the land itself and often have to relog to hear sound when I teleport to a different sim, something that I never had an issue with before.

 

My venue has hosted events with as many as 95 people on the sim at one time and then we didn't suffer the kinds of performance issues that I mentioned before.  This issue has started to affect not only my venue, but other art, music and dance activities that are important to me and many others in Second Life.  When we can't perform basic functions such as being able to change media streams and having sim performance issues with a relatively low number of avs on a sim, this affects a lot of things that people enjoy about SL.  

 

~Thea Dee

Co-Owner, Ground Zero Music and Art

Co-Owner, Pumpkin Pictures

Soloist, Guerilla Burlesque


The sim where I had the issues is Eyefliez.  We also had some issues with music and sound on Atlantizz the week before that (our former venue location)


If you want to help resolve problems, understand that LATEST VERSION does not mean anything. It is hard and time consuming to try to figure out what version you had installed and whether or not it was the latest version. Open Help -> About in the viewer and copy the specific version, like: Second Life 3.4.0 (264911) Sep 19 2012 11:15:02 (Second Life Release)

Since you are discussing a server problem, include the server version, like: Second Life Server 12.09.07.264510, also in Help -> About.

The Lindens simply are not going to try and estimate which version people are on when they wrote their post and possibly be wrong and make things even more confusing. Be explicit. It will make the whole thing move forward faster.
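
If you are gathering reports from several people, a short Python sketch like the following can check that each one pasted a real version string rather than "latest". The two patterns are inferred only from the example strings in this post, so treat them as assumptions; third-party viewers may format their About text differently, and `parse_about_line` is a hypothetical helper.

```python
import re

# Patterns inferred only from the two example strings quoted in this post.
SERVER_RE = re.compile(r"^Second Life Server (\d+(?:\.\d+){3})$")
VIEWER_RE = re.compile(r"^Second Life (\d+\.\d+\.\d+) \((\d+)\)")


def parse_about_line(line):
    """Return a dict describing a viewer or server version line, or None."""
    line = line.strip()
    m = SERVER_RE.match(line)
    if m:
        return {"kind": "server", "version": m.group(1)}
    m = VIEWER_RE.match(line)
    if m:
        return {"kind": "viewer", "version": m.group(1), "build": m.group(2)}
    return None  # e.g. someone wrote "latest version"


print(parse_about_line("Second Life 3.4.0 (264911) Sep 19 2012 11:15:02 (Second Life Release)"))
print(parse_about_line("Second Life Server 12.09.07.264510"))
print(parse_about_line("latest version"))
```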


Thanks Nalates....

I will add this Server detail to my events posting.  Simon Linden is going to take a look at the logs for the date/time of these events for these two examples.


I got an IM from Grafx Newbold tonight that the very popular INSPIRE SPACE Park  on the Mainland SHINDA sim tonight seems to have had the same thing happen to it.

I asked him to provide all the details but here is what he told me so far:

Sim Location:  http://maps.secondlife.com/secondlife/Shinda/35/218/1560

Date: / Time :  Sept 30, 2012  @ about 7:40pm SLT  (time corrected from 8:40)

I will quote what was posted by the person that told Grafx the details:

 anonymous resident: : if it's any consolation, SL in general is being a big **bleep** this evening
[21:42]  anonymous Resident: Tempura was impossible...changing clothes is asking for a quick logout and half my TPs are failin
[21:43]  anonymous Resident: I got that same 'capabilities not granted' message for other destinations as well  Tempura and my outfits got sticky...so I left to try and force a change...and got logged out instead. Logged back to Smith and when I tried to TP out to Tempura I got the capabilities error.  it was a series of crashes before I could get back to Tempura and then it was still bad enough to crash me again, which is unusual

Since there was no owner available to restart the sim... the SHINDA sim was blocking ppl from TPing into it.  Then as we were IMing... just like the other problems I reported.  At the time of this reporting - the Mainland sim was still locked.  Since it's mainland and Saturday night - there was no one around to force a reboot of the sim.

Sounds like Tempura has the same issues but I can't confirm that. 

Hope anyone else can report details as well.

 

 

 


OK LL - you got yourself a big problem somewhere in your Server Code..... 

It happened again at another major music event venue and exact same symptoms as the other music event....

 

DATE / TIME:  Sep 29, 2012 @ 10pm SLT

Location:  http://maps.secondlife.com/secondlife/Musicland%20Isle/117/99/24

Population at time of event:  40-55 avatars (sim can support 65 I was told)

DETAILS:

When they were switching Music Stream from the outgoing singer to the new singer (top of hour).....  The sim SLAMMED into LAG.  And then - just like the other events about 5-9 minutes later almost all the avatars were booted off SL.  When they all could eventually log in - no one could get back to the sim - it was blocking them. 

I arrived at the sim that BORKED about 30 minutes later and it let me onto the sim - only 2 ppl were there.  So it seems like it ended up healing itself - unless the sim owner rebooted it.

 

This is the 2nd time this major incident has happened to the same popular music artist in SL.  It's affecting / disrupting a lot of major events.  The singer's management had to scramble to get everyone onto another sim and advertise for guests to come to the new sim.

 


There is a reason mainland regions are limited to 40 avatars; not 40 on a parcel, 40 total across a region. There's nothing "enhanced" about the setup of a private region ... it's running the same software, on the same hardware, and sharing server resources with as many other regions, as any mainland region. Private region owners "can" choose to set the avatar limit on their region as high as 100, I believe, but that doesn't mean the region will run well if that many are present. So pretty much anything over 40 avatars present starts to make lag almost an expected experience. A region supporting 65 avatars would be running at more than 160% of design limits for a well managed region. I don't understand why anyone would be surprised when the conditions you describe cause a poor experience.


FYI a mainland region restart can be requested in live chat, but you have to get the avatar count on the region below 10 before they'll do it.
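
For reference, the percentage above works out as follows in a quick Python check. It treats 40 avatars as the design limit, which is my working assumption in this post rather than a published figure, and `load_vs_design` is just an illustrative helper.

```python
def load_vs_design(avatars, design_limit=40):
    """Avatar count as a percentage of the assumed design limit."""
    return 100.0 * avatars / design_limit


print(load_vs_design(65))  # 162.5
```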


@Cincia,

Please read my OP posting.  I have been a part of very heavily congested sims countless times, as it's a very common situation with many of the popular inworld musicians.  During any given week I am at sims during live events where I see 30, 40, 60 and even over 80+ avatars.

Trust me when I tell you that I know full well what a sim feels like when it is overloaded with avatars.  I am not naive enough to think that 40 or 50 avatars on a sim will not cause lag.  A month ago I was on a sim during a 2 hour 2 singer event where there were almost 100 avatars.  I stood there and enjoyed the music coming from the sim's stream even though I was basically not able to move during the entire 2 hour show.  THAT IS NORMAL and everyone that attends events like this fully expects lag and just shrugs it off.  BUT... we also know that normally - even though the lag grows in intensity as more avatars arrive.... the sim does not crash or go "1/2 stale", boot out all the avatars, and then not even allow them to TP back to the sim.  And then - without any intervention from the sim owner - the sim goes back to normal.

Whatever is causing this - it's not normal lag or activity caused simply because there are a lot of avatars on the sim.

As I mention in following posts..... in all cases I have witnessed, the sim already had about 40 or 50 avatars on the club and there was no serious lag at all.... until all of a sudden a switch was pulled / trigger was hit and the sim went into DUMB mode.

If there was a public jira for inworld residents / sim owners to share info I would have placed this data there.  Thanks to Rodvik's policy to remove Resident JIRA, this is the only place to solicit resident sharing of larger problems.

If the root cause is because of a recent LL server code issue then I am hoping that providing them as much detail as possible will help them identify the problem. 

If this is being caused by some new griefer-generated script (or whatever) then providing LL this detail can hopefully point them to their internal logs, which can isolate the method of the attack and even isolate the griefer who was either on the sim each time wearing the griefing script or placed the object on the sim that caused it.  Through forensics - they could possibly go after the griefer as well as shut down the weakness that he / she is using.


Followup.... the one thing I have noticed in the 3 events I have witnessed, when the sim does not crash yet boots almost everyone off the sim... a few residents seem to survive for good or much longer than most.  I have been lucky enough to be one of the few that are not booted out of SL immediately (the viewer says "quit" and you must close the viewer and restart it).

In all three cases there were about 3 to 6 avatars remaining.  Most of them I know and we were all talking about what was happening, but I recall there was one avatar each time that stood in the back and did not participate in the discussion.  I would ask LL, when they look at all the logs during the event, to also look carefully at the avatars that survived the mass boot-out to see if there is a common avatar, and/or what the avatars were wearing, or what new objects showed up at the time of the event and who created them.

I don't know the root cause but I will be more diligent now about taking careful note of who survives the boot-out and recording them - just in case it is a new form of griefer attack.  I will tell the venue owners and artist management to also take careful notes. 

The more info - the better.



Cincia Singh wrote:

There is a reason mainland regions are limited to 40 avatars; not 40 on a parcel, 40 total across a region. There's nothing "enhanced" about the setup of a private region ... it's running the same software, on the same hardware, and sharing server resources with as many other regions, as any mainland region. Private region owners "can" choose to set the avatar limit on their region as high as 100, I believe, but that doesn't mean the region will run well if that many are present. So pretty much anything over 40 avatars present starts to make lag almost an expected experience. A region supporting 65 avatars would be running at more than 160% of design limits for a well managed region. I don't understand why anyone would be surprised when the conditions you describe cause a poor experience.

 

FYI a mainland region restart can be requested in live chat, but you have to get the avatar count on the region below 10 before they'll do it.

Cincia, MAINLAND regions are limited to 40 per region because they always have neighbors and an avatar in a neighboring region is considered a "child agent" that counts in the load on the server. I've been in standalone private regions with 70 avatars and the server hasn't shown major signs of strain - it's laggy as all get out on the client end, of course, and script running gets slowed down but the server frame rate itself stays pretty high.


Toy, also look at the posts in this week's server release thread - a number of regions are running slowly due to an abnormally high script time (over three times what you'd see normally), which is difficult to have happen unless a script is deliberately forcing the servers to do something unusual. Their neighbors are acting completely normal. I've also been in regions with the same number of avatars as you're describing that had events with changes of stream with no problems.

I've been seeing activity of some people who are trying to put together a griefing campaign. It reminds me more of the movie "Bugsy Malone" than anything significant but they may have some fair griefing technology and I'd suggest filing a JIRA as a security issue as well as updating this forum.


I am telling my Venue and Music Artist management friends to look for any details of the situation when it happens and, at minimum, to file a Security related JIRA and point to this forum thread as a suspicion that it's related.

