nikita Jefferson

Sim restarts


Seylar hasn't been restarted; its uptime is currently 20 days 21 hours and change. So they closed the ticket but didn't do anything... which might be okay if they're actually studying something about this "too sleepy" condition it's in.

Nuggy is dying of KittyCats. The times are super Physics heavy and there's a parcel with a whole host of KittyCats, many of which are in the active, Physical state. This is at ground level, so there may be more, but whatever else there is, Physics beacons will tell the tale there. (It was restarted Mon 2019-12-30 15:06 PST, so about 20 hours ago as I'm typing.)


It was the restart of Nuggy at 15:06 PST yesterday that broke it. We have no idea why it was restarted, as it had already been restarted on Saturday. Everything was working before that.

Edited by Eleanor8

53 minutes ago, Qie Niangao said:

 

Nuggy is dying of KittyCats. The times are super Physics heavy and there's a parcel with a whole host of KittyCats, many of which are in the active, Physical state. This is at ground level, so there may be more, but whatever else there is, Physics beacons will tell the tale there. (It was restarted Mon 2019-12-30 15:06 PST, so about 20 hours ago as I'm typing.)

Not true, Qie. The cats have been there for years. Also, the sim was healthy before yesterday's restart.

Sure, the cats cause some lag, but this is something entirely different. I've had Nuggy with Scripts Run on 90% average and the cats were there.

[EDIT] The cats' presence never affected my weekly dance parties. I can't say the same about LL and their mess-ups. So you see, almost 20 hours after I filed my first support ticket I still have no response. Should I be happy about it? It's all beyond a joke...

Edited by MBeatrix
adding info

1 hour ago, animats said:

Go to Server User Group today (1200 SLT, Denby) and complain, loudly.

I went there to see if there was anything happening. Still loads of Christmas Tree and decorative stuff. Interesting thing is that its lag monitor showed it was restarted 13 days ago.

The other thing is, complaining loudly might not be the best strategy. It's done there enough times, and somewhere in amongst the din of gestures last meeting I saw some people trying to complain about sim crossings, but I don't know whether it was possible for their complaint to be dealt with there. I had to flee before the fans blew up in my PC.

Complaining is only really effective when there is either an SLA to which the other party can be held, or a viable alternative to which the aggrieved parties can walk away if they don't get serious consideration. I'm not sure what we, as users, are actually entitled to.

Edited by Profaitchikenz Haiku

13 minutes ago, Profaitchikenz Haiku said:

I went there to see if there was anything happening. Still loads of Christmas Tree and decorative stuff. Interesting thing is that its lag monitor showed it was restarted 13 days ago.

Yep, I'm there, alone, no one else in sight. I guess that is also due to "maintenance"...

1 hour ago, MBeatrix said:

I've had Nuggy with Scripts Run on 90% average and the cats were there.

Is there any way the restart caused too many cats to wake up or something? I mean, that's a whole lotta physics going on there. I suppose it's possible something in the restart caused Physics to run slower -- there's plenty of ways sims are showing bad performance after all. When there's a lot of physics in the frame, though, it's not usually complicated to find the culprit.

10 minutes ago, Qie Niangao said:

Is there any way the restart caused too many cats to wake up or something? I mean, that's a whole lotta physics going on there. I suppose it's possible something in the restart caused Physics to run slower -- there's plenty of ways sims are showing bad performance after all. When there's a lot of physics in the frame, though, it's not usually complicated to find the culprit.

I don't think that's the case, Qie. Restarts may affect their cycle a bit, yes, but the cats do wake up on their own, anyway (8 hours asleep, 8 hours awake). This is something that happened before, and restarting the sim on another host always fixed it. The main problem is that LL has closed for holidays after restarting the sim (and who knows how many more).

I also had the Leeward Cruising Club finishing and starting Sunday cruises in Nuggy, having the sim full with the max avatars allowed, and it never crashed or anything. You bet there was a lot of physics involved...

No, as mentioned before, the scripts issue began a year and a half ago, or at least that's when I first noticed it. I do understand that collisions cause lag, but please believe me, it's always been minimal at Nuggy when the sim is healthy. There is something wrong with either the sims' software or the hardware they're running on, to the point of having people in this very forum questioning whether LL is running more sims per core...

Edited by MBeatrix

3 hours ago, MBeatrix said:

This is something that happened before, and restarting the sim on another host always fixed it. The main problem is that LL has closed for holidays after restarting the sim (and who knows how many more).

Oh, I'm very aware that there's a performance problem. I'm just trying to get used to the idea it could look like this, with so much physics time -- but evidently so. Why not?

As to how many sim restarts, I can say that I haven't seen that happening this week for the several SLRR regions I monitor on the Atoll. Doesn't mean there couldn't have been many that did restart, but at least it must not have been super widespread.

6 hours ago, Qie Niangao said:

Oh, I'm very aware that there's a performance problem. I'm just trying to get used to the idea it could look like this, with so much physics time -- but evidently so. Why not?

As to how many sim restarts, I can say that I haven't seen that happening this week for the several SLRR regions I monitor on the Atoll. Doesn't mean there couldn't have been many that did restart, but at least it must not have been super widespread.

If you go back to Nuggy and have a look at the antenna I have on a warehouse, at a dock, you'll see that the sim was restarted on the 28th (and came back healthy) and again on the 30th (with the result we know). Why was it restarted on the 30th? I have no idea.
http://maps.secondlife.com/secondlife/Nuggy/199/233/21

Edited by MBeatrix

8 hours ago, Qie Niangao said:

Oh, I'm very aware that there's a performance problem. I'm just trying to get used to the idea it could look like this, with so much physics time -- but evidently so. Why not?

As to how many sim restarts, I can say that I haven't seen that happening this week for the several SLRR regions I monitor on the Atoll. Doesn't mean there couldn't have been many that did restart, but at least it must not have been super widespread.

Just adding something that happened when we first noticed the scripts issue, a year and a half ago, and that is somewhat related to KittyCats.

At the time, a support representative went there to check what might be wrong. The cause they pointed to was KittyCats collisions. The list that representative (obviously incompetent) offered had about half a dozen recent collisions. One of the collisions in the "report" had even occurred almost a year before that date. They just wanted to close another support case without doing anything relevant for it...

By this I mean that it's easy to go somewhere, have a quick look around and point at something that may be causing issues without looking deeper to find out what the real problem is. Of course that's not your case, as you have no access to the info the Lindens can collect, if they are willing to.

There's no doubt the KittyCats contribute to lag, but so do many other things. Also, there is a limit on KittyCats per sim, and we could discuss whether that limit should be lower, but it is what it is. If that limit was set, it's because it was agreed to be a reasonable one.

And yes, I know you are aware of performance issues — it's been discussed for quite some time. As for the cause... You may well be right and physics sometimes cause problems during a restart. But the thing is, it's not only bad restarts around SL. Sometimes it feels like the whole structure is crumbling.

I am very disappointed with what the Lindens have been doing about the whole thing... The whole SL, that is. And it's kind of laughable that they expect me to extend my premium membership till September 2022 in advance when they offer a discount that still will make me pay more than I did before they played their tricks with prices and fees, and their product is the way it is. It almost feels offensive...

Edited by MBeatrix

53 minutes ago, MBeatrix said:

As for the cause... You may well be right and physics sometimes cause problems during a restart. But the thing is, it's not only bad restarts around SL. Sometimes it feels like the whole structure is crumbling.

I don't have anything like a fully formed theory of what's going on -- and I know it's not always the same thing everywhere. Seylar, for example, is doing something completely unlike stuff I've seen before, with nearly everything in "Sleep Time". In the past we've seen sims get stuck with much of the frame devoted to "Pump IO". And then there's a more common case that seems consistent with a restart landing on a very overloaded server: this is more the standard distribution of frame times but with everything needing more, so Scripts end up with very little left (and Pathfinding nearly stopped, in sims that have Pathfinding characters). I suspect now that this is just what's happening with Nuggy and that I just somehow never saw a region that had enough Physics to make it inflate so large when the sim is running slowly -- but it makes perfect sense that it could happen and I just never saw it before Nuggy.
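To make that "overloaded server" case a bit more concrete, here is a toy Python model of a frame budget. Everything in it is an assumption for illustration only: the 22.2 ms budget (a 45 fps target), the category names, the per-category times and the slowdown factor are hypothetical, not measurements from Nuggy or any other region. It only sketches the idea that if every non-script category takes longer on a slow host, whatever is left over for Scripts shrinks, and shrinks fastest where there is a lot of Physics.

```python
# Toy model only: all numbers below are hypothetical, not real sim statistics.
FRAME_BUDGET_MS = 22.2  # assumed frame budget for a 45 fps target


def script_time_left(other_times_ms, slowdown=1.0, budget_ms=FRAME_BUDGET_MS):
    """Return the milliseconds left for scripts in one frame.

    other_times_ms: dict of non-script frame-time categories (ms) on a healthy host.
    slowdown: factor by which every category is stretched on an overloaded host.
    """
    used = sum(t * slowdown for t in other_times_ms.values())
    return max(0.0, budget_ms - used)


# Hypothetical physics-heavy region versus a physics-light one.
physics_heavy = {"physics": 6.0, "agent": 1.0, "net": 0.5, "images": 0.5, "other": 1.0}
physics_light = {"physics": 1.0, "agent": 1.0, "net": 0.5, "images": 0.5, "other": 1.0}

for name, region in (("heavy", physics_heavy), ("light", physics_light)):
    for slowdown in (1.0, 2.0, 3.0):
        left = script_time_left(region, slowdown)
        print(f"{name} physics, slowdown x{slowdown}: {left:.1f} ms left for scripts")
```

In this sketch the physics-heavy region runs out of script time entirely at a 3x slowdown while the physics-light one still has room, which is the shape of the effect described above, nothing more.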

One thing I suspect is a factor in what gets worked on and what doesn't is the impending move to the cloud. Whatever the current virtualization scheme they're using to share a server full of CPU cores among simulations, if that would need fixing to overcome this "unlucky restart" problem (as I suspect), they may well be justified in thinking a better solution will be obtained sooner by finishing the cloud migration than by throwing effort into some infrastructure code that will be obsolete before it can even get through QA. But this is all a guess because anything about the cloud is shrouded in secrecy, especially anything regarding schedule or progress.

  • Like 1

1 hour ago, Qie Niangao said:

they may well be justified in thinking a better solution will be obtained sooner by finishing the cloud migration than by throwing effort into some infrastructure code that will be obsolete before it can even get through QA.

Snap.

It's making me wonder if there's actually any point in complaining at the server user group meetings, if they're on a course that can't be altered.

Edited by Profaitchikenz Haiku


Please excuse my asking what is probably a blindingly obvious question, Qie and Prof, but your assessment of the parlous state of SL at the moment leaves me wondering about the two-way communication between us users and The Lab.  

Does it even still exist?

It's not my choice of rumination on this first day of 2020, but it does concern me that this might be a very bumpy and dark year.

  • Like 1


Hi all! I was really hoping my first post of the new year would be more jolly, but here we are.  Happy New Year, though! 

For a couple of years now, we've had automated tools, aptly named "Grid Poking Bot" (GPB for short) responsible for doing region restarts, and this has been working quite well - most of the time. Very unfortunately, there was a problem with the GPB over the holidays, and due to a combination of events, it took us much too long to notice - and we finally caught it in part thanks to this very forum thread and a certain vigilant "Spray Can".  We're now actively pursuing the least disruptive ways to address this problem as quickly as possible.  We'll have a more detailed postmortem blog in a couple of days as well.  

We're very sorry about souring your holidays.  

  • Like 2
  • Thanks 7

2 minutes ago, Grumpity Linden said:

Hi all! I was really hoping my first post of the new year would be more jolly, but here we are.  Happy New Year, though! 

For a couple of years now, we've had automated tools, aptly named "Grid Poking Bot" (GPB for short) responsible for doing region restarts, and this has been working quite well - most of the time. Very unfortunately, there was a problem with the GPB over the holidays, and due to a combination of events, it took us much too long to notice - and we finally caught it in part thanks to this very forum thread and a certain vigilant "Spray Can".  We're now actively pursuing the least disruptive ways to address this problem as quickly as possible.  We'll have a more detailed postmortem blog in a couple of days as well.  

We're very sorry about souring your holidays.  

Thanks for the info, but Nuggy is still buggered and I've had no response from Support despite having filed a ticket on the 30th, not long after Live Chat closed (+3 PM SLT).

4 hours ago, Aishagain said:

Does it even still exist?

Communication? Yes, except when there are party-goers with multi-line chat gestures :)

However, because we haven't signed non-disclosure agreements (and for a few other reasons), there is a definite limit to what answers can be given, which can be a bit frustrating at times.

Edited by Profaitchikenz Haiku

1 hour ago, Grumpity Linden said:

and a certain vigilant "Spray Can"

It took me a few minutes to work out who said bug-splatter was :)  At least we now know they push their own button :)

  • Haha 1

8 hours ago, Profaitchikenz Haiku said:

It's making me wonder if there's actually any point in complaining at the server user group meetings, if they're on a course that can't be altered.

3 hours ago, Aishagain said:

Please excuse my asking what is probably a blindingly obvious question, Qie and Prof, but your assessment of the parlous state of SL at the moment leaves me wondering about the two-way communication between us users and The Lab.  

I don't know about complaining, per se, but I do think the Lab listens to us when we try to help them understand the impact of what we're experiencing. And I think they tell us as much as they think they can about what they're doing about problems, but within constraints I don't entirely understand.

I do understand that details about the cloud migration (one particularly important part of 2020 work) could lead users to some superstitious interpretations of what they're experiencing on the grid. (And I of all people, especially in this thread, should be cognizant of jumping to unwarranted conclusions!) But I do think it would be useful to tell us that some specific issues just won't get worked because solving them is throwaway work after the cloud migration is complete.

Thing is, they might also need to spread that word more broadly internally, and help Operations and Support understand the consequences to their workload: This stuff won't get fixed, so you may want to be on the lookout for it and find ways to make response more efficient temporarily, until the migration is complete. Maybe they already know, or maybe it doesn't matter to them, but I think I'd make it my business if I were managing those resources.

Another thing about what we share with the Lab:

They need to know details of these sim performance problems now, pre-migration, because they could gravely affect the cost effectiveness of cloud hosting. Sims that currently can't keep up with demand are representative of bottlenecks somewhere, likely caused by how sims are currently hosted on datacenter hardware. If those bottlenecks are lifted in the cloud, the resulting increased processing could get very expensive unless bottleneck-like throttles are in place. (Such throttles may be a natural effect of their specific cloud hosting arrangement, but if so, the performance problems need working independently of cloud migration.)

 

  • Like 3

16 minutes ago, Qie Niangao said:

They need to know details of these sim performance problems now, pre-migration, because they could gravely affect the cost effectiveness of cloud hosting.

(Deleted: this silly old duffer was confusing server restarts with region restarts.)

It would help a great deal if there was a definite set of figures to associate with a good, struggling or dead-as-a-parrot region when shifting to a different platform. It would also help if there was more up-to-date information about some of the newer metrics available in the statistics pane, for example the pinned and low LOD measurements, or what the relationship between some of the times means. 

Edited by Profaitchikenz Haiku
geriatria


Possibly related: I started seeing a number of Mainland regions restarting New Year's Day around 18:00 Zulu, so I guess about 10 AM Pacific.

[ETA: I saw the reports much later, but that's about when the restarts started happening.]
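
For anyone who wants to double-check a Zulu-to-Pacific conversion like the one above, here is a minimal Python sketch. It assumes a fixed UTC-8 offset, since Pacific time is on PST (not PDT) on 1 January, and it assumes the date is 2020-01-01; the function name `zulu_to_pst` is just made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: on 1 January, US Pacific time is PST (UTC-8), not PDT (UTC-7).
PST = timezone(timedelta(hours=-8), name="PST")


def zulu_to_pst(dt_utc: datetime) -> datetime:
    """Convert a timezone-aware UTC ("Zulu") datetime to PST."""
    return dt_utc.astimezone(PST)


# Assumed date and time: New Year's Day 2020 at 18:00 Zulu.
restarts_seen = datetime(2020, 1, 1, 18, 0, tzinfo=timezone.utc)
local = zulu_to_pst(restarts_seen)
print(local.strftime("%Y-%m-%d %H:%M %Z"))  # prints "2020-01-01 10:00 PST"
```

During daylight saving time the offset would be UTC-7 instead, so the fixed offset here is only valid for winter dates.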

Edited by Qie Niangao


One of my own regions is in bad shape. Ping regularly over 100, physics under 20, crashed twice last week. 

Ticket open with LL since the first crash on the 1st, still no answer.

  • Sad 1

2 hours ago, Gadget Portal said:

One of my own regions is in bad shape. Ping regularly over 100, physics under 20, crashed twice last week. 

Ticket open with LL since the first crash on the 1st, still no answer.

I recommend Live Chat Support. They're usually quick to respond and perform the restart. Ask them to restart the sim on another host.


It's referenced in Status, but not here in the forums. And the message is a bit unusually worded, but they've started running restarts.

The last restarts currently recorded in the Incident History section were at the end of November, and I am sure that is wrong. Somehow they seem to have lost a lot of what happened in December. Is it that they only list non-scheduled events under Incident History?

I'll try not to be surprised by Release Channel restarts on Wednesday, but they say often enough that they don't like having servers running for more than a couple of weeks without a restart, so this is all looking a bit clumsy.
