
Deploy Plan for the Week of 2021-03-15


Mazidox Linden


  • Lindens

Second Life Server:

No Roll

Second Life RC:

Version 556847 will be deployed to all RCs

Release Notes can be found at https://releasenotes.secondlife.com/simulator/2021-03-11.556847.html

Wednesday 2021-03-17 07:00-10:30 PDT

On Region Restarts:

All regions should have been restarted within the last 10 days, so we don't anticipate any rolling restarts other than the RC version change this week.

  • Thanks 11
Link to comment
Share on other sites

5 hours ago, Profaitchikenz Haiku said:

And after the restart - scripts run 98%

I can't remember when I last saw anything like that!


That's amazing!  I hope it's actual performance and not just adjusted numbers!

If you'd like to share your discoveries @Mazidox Linden I am super eager to hear the story of how you defeated the lag monster.

  • Like 1
Link to comment
Share on other sites

1 hour ago, NeoBokrug Elytis said:

I hope it's actual performance and not just adjusted numbers!

I hadn't thought of that. I won't know for certain until the main channel gets this and I can see the effect on my train and the funicular, both of which are really pitiful at the moment with 35% script run.

  • Thanks 1
Link to comment
Share on other sites

I can certainly think of ways in which the number of scripts running can be misleading, buried in the details of what a script does. Some old train scripts, still essentially standard, haven't been updated for a long time. I know there are "Puppeteer" scripts, moving prims in a linkset, which use an obsolete command with built-in waits.

It could have been the measurement, it could have been the reporting of the measurement, or it could have been both.

It's not been easy to find places to test.

Link to comment
Share on other sites

4 hours ago, arabellajones said:

It's not been easy to find places to test.

(That gave me an excuse to tidy up and combine a few HUD scripts I've accrued that sample the sim environment on teleport.)

Some regions currently running 556847 with rez zones adjacent to train stations (Spare Time estimates are very subjective and ephemeral):

  • Eagan. No Spare Time; 70-80% Scripts Run (no rez zone at ONSR stop, but allowed on VRC parcel)
  • Epirrhoe. About 11ms Spare Time.
  • Foxglove. About 5ms Spare Time.
  • Mocha. No Spare Time; about 70% Scripts Run.
  • Tenera. Most of the frame is Spare Time (but probably always was)
  • Tuliptree. Maybe 2ms Spare Time (bursty)
  • Tussock. About 7ms Spare Time; I happen to know this one had maybe 4 ms before this update
  • Thanks 1
Link to comment
Share on other sites

I'm confused. ("What else is new?" I hear you ask.)

I think my first question, as I've been a bit out of touch, should be: how can I tell if a region is on the RC or Main channel? I don't see it on the window nav bar, and the server_channel argument to llGetEnv seems to return "Second Life Server" everywhere. Alternatively, is there a list of which sims are in each channel?

As I understand it, the RC channel should have 556847 and the Main channel should have 556255.

The reason I'm looking into it is that I had thought that 556847 was only on the beta grid and that it had some sim crossing improvement code. But I'm actually seeing 556847 on a lot of main grid sims (Blake Sea etc.).

Yesterday I observed dramatically improved sim crossings compared to the previous few weeks.  Looks like the improvements made it to RC channel.
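For anyone wanting to sort regions by channel from the version number alone, here is a minimal Python sketch. It assumes only the two build numbers quoted above (556847 on RC, 556255 on Main) and the dotted version format seen in the release notes URL. The table `CHANNEL_BY_VERSION` and the function `channel_for` are made-up names for illustration, and the mapping only holds until the next roll moves a build to a different channel.

```python
# Build numbers are the ones quoted in this thread for the week of 2021-03-15.
# The mapping goes stale as soon as a build is promoted to another channel.
CHANNEL_BY_VERSION = {
    "556847": "RC",
    "556255": "Main",
}


def channel_for(sim_version: str) -> str:
    """Guess the channel from a simulator version string.

    Accepts a bare build number ("556847") or a dotted version such as
    "2021-03-11.556847"; only the last dot-separated component is used.
    """
    build = sim_version.strip().split(".")[-1]
    return CHANNEL_BY_VERSION.get(build, "unknown")


print(channel_for("2021-03-11.556847"))
print(channel_for("556255"))
```

How you obtain the version string for a region is left out on purpose; the sketch only covers the lookup step.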

 

Link to comment
Share on other sites

Yeah, I thought I was seeing 556847 on a lot of sims, more than were ever "RC" sims in the past, but I'm not sure. And yeah, server_channel doesn't seem to identify anything anymore, at least not anywhere I visited.

I have heard one report of a region that appears to have gotten noticeably worse Scripts Run % performance as a result of this new upgrade. I visited, but don't have any way of investigating deeper. It might be good to see if there are any patterns for regions that improve and those that don't.

  • Thanks 1
Link to comment
Share on other sites

One big question, which I'll try to get Simon to answer, is: what is the relative impact of the new code for each case of a border crossing?

Old > New
New > Old
New > New

Right now I'd say it's all looking pretty good, so maybe the better question is when this will go to the Main channel.

Also, is there any expected benefit to agent detaching and other sorts of collateral damage? (I.e., I'm not sure if that's just a possible consequence of a slow crossing or if it can occur for other reasons.)

  • Like 1
Link to comment
Share on other sites

The border-crossing issue is tricky. The NCI group has a balloon flight from Kuula, running through protected/public parcels, which has developed a bad failure on a crossing from Dedi region to Kalli region. Dedi region is 556847, the new server on the RC channel. Kalli is 556255 so Main channel. When we made a test flight the balloon stopped moving after making the crossing, close to the northern end of the boundary. Kalli has a 20m protected/public strip along the east and north sides.
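The Dedi to Kalli crossing can be labeled using the Old/New cases listed earlier in the thread. The following is only a rough Python sketch: `NEW_BUILD`, `crossing_case` and the `regions` table are illustrative names, and the build numbers are the ones reported in this post, not anything read from the grid.

```python
NEW_BUILD = "556847"  # the RC build discussed in this thread


def crossing_case(from_build: str, to_build: str) -> str:
    """Label a border crossing as 'Old > New', 'New > Old', and so on."""
    def label(build: str) -> str:
        return "New" if build == NEW_BUILD else "Old"

    return f"{label(from_build)} > {label(to_build)}"


# Builds as reported for the balloon route in this post.
regions = {"Dedi": "556847", "Kalli": "556255"}

print(crossing_case(regions["Dedi"], regions["Kalli"]))  # New > Old
```

Tagging each failed (or successful) crossing this way would at least show whether failures cluster in one of the mixed cases.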

There's just so many reasons why things might go wrong. I find myself wondering if the vaguely described monitoring tools the Lindens have installed will do anything useful. But the crossing worked, so why would anything be recorded?

 

 

  • Like 1
Link to comment
Share on other sites

I like to think that it is an actual fix even though I have no evidence of it yet; because it was a sudden change in performance.

Like the initial bug report, there was just a day where script performance drastically changed.  I have a 10 region estate that hosts rather complex MMO-like games (for SL).  Initially some of the systems behind the game started complaining as I didn't have failover checks for REALLY old components.  Then when trying to host events where there's a lot of avatars in the region and a simple rez intensive process going on, it really hit home to how bad things had gotten.  All of my regions had about 99% in most stats before the initial bug, but after that day is when I saw scripts run taking a huge dump.

I have been trying to develop a way to benchmark this invisible script lag, without any proper way to pull the "Scripts Run" stat. I've found a single way to at least detect poor script performance, but the results I'm getting from it aren't consistent. I'll definitely be re-benchmarking things once this hits main channel and provide some feedback.

Link to comment
Share on other sites

1 hour ago, Miller Thor said:

I don't believe this supposed performance increase in the execution of the scripts until I see them with my own eyes, on our two main channel regions, in the statistics bar. 

You better not, yes. West Haven was always around 98% Scripts Run on 556255, and after the 556847 update Scripts Run dropped to around 50%. ☹️

Link to comment
Share on other sites

1 hour ago, Profaitchikenz Haiku said:

Oh dear, perhaps the improvement I've seen was just the "restart it enough times and it'll come good" phenomenon? 

I hope not but it's weird that some regions got a good increase in performance while others got worse, much worse.

I checked a few more of my reference regions on 556847 and only found one with Scripts Run around 50%. That's Nautilus - Adherbal (with about 6000 scripts).

I don't have the time to check them all right now, but will be doing that in the next few days.

[EDIT] West Haven shows less than 5300 active scripts.

Edited by MBeatrix
adding info
Link to comment
Share on other sites

2 hours ago, MBeatrix said:

I hope not but it's weird that some regions got a good increase in performance while others got worse, much worse.

I checked a few more of my reference regions on 556847 and only found one with Scripts Run around 50%. That's Nautilus - Adherbal (with about 6000 scripts).

I don't have the time to check them all right now, but will be doing that in the next few days.

[EDIT] West Haven shows less than 5300 active scripts.

I'm not sure how useful that checking might be because scripts run % and a few other stats are also affected by the physical server and what regions are on the same server as well. It's been the case for years and didn't become any better after migration to AWS.

For example, my region: a regular 20k island, no adjacent regions, no breedables or other taxing stuff, a little less than 4k scripts running. I usually restart it (if needed after rolling restarts happen, and I'm not happy with what I see) until I see 90%+ for scripts run. If I get "lucky" and end up on an (apparently) less loaded server, it's often between 97% and 99%, but I have seen as low as 55%, in which case I just restart again.

I'll see if 556847 makes any difference when it rolls onto the main channel.

Edited by steeljane42
Link to comment
Share on other sites

2 hours ago, MBeatrix said:

it's weird that some regions got a good increase in performance while others got worse, much worse.

The variable performance was one of the hallmarks of the initial problem, after several region restarts the performance would be back right up at the top again, but there was no obvious rule as to how many times or how far apart the restarts should be to solve the problem.

I asked a question ages ago at a server user group meeting about a load-balancing mechanism, it was (understandably) brushed aside, but there has to be such a mechanism, and it may well be that it will need some tuning (and might not distinguish between server versions anyway). I expect the key criteria they'll be looking for are firstly that nothing gets broken, and secondly, nothing gets obviously worse.

 

  • Thanks 1
Link to comment
Share on other sites

@steeljane42 @Profaitchikenz Haiku So, according to you both, the problem isn't a bug. Is there a bug? LL says there is, and I tend to believe them... Especially because they took almost a whole year to acknowledge the issue.

Now, according to what I see around, the fix didn't work. A couple more regions on 556847 that I've checked didn't get any improvement — Lipshen has around 40% Scripts Run and Dallows has around 55%.

My home region, Nuggy, is on 556255, has lots of breedables (KittyCats) and has 90% Scripts Run, average.

I'm aware that what I just wrote reinforces your opinion, but if LL says there is a bug, I'm sure there is.

Edited by MBeatrix
typo
Link to comment
Share on other sites

10 hours ago, Profaitchikenz Haiku said:

The variable performance was one of the hallmarks of the initial problem, after several region restarts the performance would be back right up at the top again, but there was no obvious rule as to how many times or how far apart the restarts should be to solve the problem.

I remember it used to be that way on regions I track closely, but I personally haven't seen that behavior in at least a few months: I've been getting about the same Spare Time (or Scripts Run percentage) after a restart, until Tussock showed this big improvement after the update. Maybe that's coincidental with it having gotten mediocre restarts for months and now finally an especially "good" one. But it's all anecdotal anyway, n of 1, could be somebody unwittingly lifted a lag monster, even. I'd love to see real statistics across the sims before and after the update.
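As a very rough illustration of what "real statistics" could look like, here is a Python sketch that groups Scripts Run percentages by build. The figures are just the approximate numbers people posted earlier in this thread, so the sample is tiny and skewed toward regions someone bothered to report. `REPORTS` and `summarize` are made-up names, and the output shows the shape of a comparison, not a conclusion.

```python
from statistics import mean, median

# (region, build, approximate Scripts Run %) as posted in this thread.
# Anecdotal, self-selected observations; not a representative sample.
REPORTS = [
    ("West Haven", "556255", 98),
    ("West Haven", "556847", 50),
    ("Nautilus - Adherbal", "556847", 50),
    ("Lipshen", "556847", 40),
    ("Dallows", "556847", 55),
    ("Nuggy", "556255", 90),
]


def summarize(reports):
    """Group percentages by build and return count, mean and median for each."""
    by_build = {}
    for _region, build, pct in reports:
        by_build.setdefault(build, []).append(pct)
    return {
        build: {"n": len(values), "mean": mean(values), "median": median(values)}
        for build, values in by_build.items()
    }


for build, stats in sorted(summarize(REPORTS).items()):
    print(build, stats)
```

A useful version would need the same regions sampled repeatedly before and after the update, across several restarts, since restarts alone are reported to move the numbers.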

  • Thanks 1
Link to comment
Share on other sites

As I said, I don't believe in the miraculous performance increase in script usage until I see it on Tuesday and find confirmation in a long-term monitoring that goes over several weeks and several restarts. In the 13 years I have already had regions, I have already experienced so much that with it I only believe what I see in black and white in front of me.

  • Like 1
Link to comment
Share on other sites

I'm now at a loss to explain what's happening. I have a home in a region that is (used to be identified as) an RC channel, and last week it went to 556847. As I posted above, I saw a great improvement on the scripts run %. I have a parcel in Zephyr that as far as I know is a main channel, and it has lousy scripts run %, but when I checked today, before the scheduled rolling restarts, it too was 556847. I help out with a friend's parcel on a private island that also has frequently bad figures, and as far as we both know, is a main channel, and when we checked today, that also is 556847.

ETA: I discovered that the roll start time of 10:00 am predicted in the status posts I see when I log in isn't the same as the time quoted in the forum post; the rolls seem to have already happened.

Rolling Restarts for Second Life Main Channel

Tuesday, March 23rd, 2021, 10:01:28 AM (UTC)

And yes, the sim stats gatherer confirms Zephyr restarted just over three hours ago (at 6:40 or so AM).

So I have to say, based on experience, SecondLife IS a game, the game is trying to find out what published information you can believe and what you should just toss over your shoulder.

 

Whatever. It does though confirm Que's observation above, that 556847 does not by itself mean an improvement on the scripts run %. It also seems to confirm observations by somebody (Fourmilab?) that there are good and bad regions, good regions seem to stay good for some time, bad regions stay bad for some time, and very rarely you get one swap behaviour.

 

Edited by Profaitchikenz Haiku
Link to comment
Share on other sites
