Deploy Plan for the Week of 2020-11-16


Second Life Server

We will be updating all currently uplifted Second Life Simulators to version 552183.

https://releasenotes.secondlife.com/simulator/2020-11-12.552183.html

Scheduled Wednesday** 2020-11-18 07:00-10:30 PST

Second Life RCs

https://releasenotes.secondlife.com/simulator/2020-11-12.552183.html

Scheduled Tuesday*** 2020-11-17 07:00-10:30 PST

Additional Rolls

We will be moving the rest* of the Second Life Server channel that is currently running in the datacenter to the cloud, on version 552183, on Thursday 2020-11-19 07:00-10:30 PST.

On Region Restarts

The uplifted Grid Poking Bot is running on Aditi, and we hope to move it to Agni as soon as we've run a last battery of tests on it. This doesn't help much for restarting regions this week, but as noted above we anticipate restarting almost all of them as part of this week's rolls anyway.

Additional Notes

* We will be keeping a few regions (named "Debug1" and "Debug2") in the datacenter for Residents to use for workarounds to known issues with the Cloud.

** We've moved the Second Life Server roll back a day (see below) to allow faster iteration of some high priority fixes; if the fix released to RC Tuesday does what we want, we're open to rolling it to Second Life Server on Wednesday or Thursday.

*** We're rolling the RCs first this week to get additional data on how well a change our engineers implemented in the new version fixes issues Residents have been experiencing in uplifted regions.

Edited by Mazidox Linden
5 hours ago, Mazidox Linden said:

We will be keeping a few regions (Named "Debug1", and "Debug2") in the datacenter for Residents to use for workarounds to known issues with the Cloud

I would put autoreturn times on every parcel in those regions or you're going to have "trouble", heh.

4 hours ago, Lucia Nightfire said:

I would put autoreturn times on every parcel in those regions or you're going to have "trouble", heh.

Not too short, though. It depends what sort of load you want for testing. They're almost sandboxes in function.

I don't know how long you'll need them for, but having an identifiable debug sandbox for MC and each of the RCs might be a good idea, even if they end up on the Cloud. Not worth fussing about now, but I can see there being some post-Uplift tweaking of old customs. 


I really do not understand the differences between Main Server and Second Life Server...I have to say that these terms seem almost designed to confuse folk. 

As far as I can tell Thursday is the day that the balance of regions will be uplifted to the Cloud (excepting the ones referred to in Mazidox's original post).

With particular reference to you @Mazidox Linden, I am never going to get the image of you as a bug spray can on that Lab Gab video out of my head...it is priceless! 😄


I for one second Aishagain's request. I have known Main Channel (where our private sim cluster resides) and the RC channels for a long time. Does SLS mean everything is finally going to be uplifted by Thursday night? Right now the Raglan Shire cluster is 4 cloud sims and 4 regular sims. We would really like to have them all in the cloud, and we are hoping a lot of the little issues that folks have had in the last week will go away. We also end up wondering if we should restart our non-cloud sims, which are our busiest, ourselves. I know this frustration is being felt by other private clusters.

13 hours ago, arabellajones said:

The Wednesday activity has started.

I am very confused as to what is happening on Thursday, with the various new labels appearing. Different timezones for me, but should I bother with anything planned before Friday?

If I am reading this correctly, Mazidox Linden is telling us that as of today (Wednesday, 18 Nov 2020), most of Second Life has been rehosted to Amazon Web Services. And tomorrow (Thursday, 19 Nov 2020), all the rest will be rehosted to AWS. After that, only those two regions, Debug1 and Debug2, will continue to be hosted at Linden Lab's data center.

Within my experience, as a user, the uplift process has been very fast and seamless.  If your region has already been rehosted, no worries at all.  If it is still waiting to be moved on Thursday, treat it like any region roll day--just hold off on rezzing no-copy stuff and prepare to run away for a few minutes if you get a region restart message.

8 minutes ago, Camden McAndrews said:

If I am reading this correctly, Mazidox Linden is telling us that as of today (Wednesday, 18 Nov 2020), most of Second Life has been rehosted to Amazon Web Services. And tomorrow (Thursday, 19 Nov 2020), all the rest will be rehosted to AWS. After that, only those two regions, Debug1 and Debug2, will continue to be hosted at Linden Lab's data center.

Within my experience, as a user, the uplift process has been very fast and seamless.  If your region has already been rehosted, no worries at all.  If it is still waiting to be moved on Thursday, treat it like any region roll day--just hold off on rezzing no-copy stuff and prepare to run away for a few minutes if you get a region restart message.

That was my reading too, and it fits my observations of the regions I've TP'd to today.

All the RC channels finished being uplifted a couple of weeks ago, so Tuesday's rolls were just the RCs. Today has been the uplifted main channel regions.


Just to clarify: yes, you are all correct that by about 24 hours after this post (and hopefully a lot sooner!) we will have all regions on the main grid in the cloud, except for a couple of regions we're keeping around to help work around issues like BUG-229623, which reproduce in the cloud but not in the colo.


Saying 'goodbye' to my uptime record.  😉  Simulation of region <redacted> has been running 37 days 0 hours and 37 minutes.  It was a good run, too.  See you in the next hardware!

1 hour ago, Ardy Lay said:

Saying 'goodbye' to my uptime record.  😉  Simulation of region <redacted> has been running 37 days 0 hours and 37 minutes.  It was a good run, too.  See you in the next hardware!

Here's a record for me from when the region I typically hang out in went to AWS:

[2020/11/11 04:58:10]  Env Monitor (KVP)(GS): Animesh Adult has restarted.
 Start Date & Time: 2020-11-11 @ 04:58:01 PST
 Last Est. Stop Time: 2020-11-11 @ 04:53:12 PST, Est. Downtime: 4 minutes, 49 seconds
 Last Sim State Save Date & Time: 2020-11-11 @ 04:53:15 PST, Sever Time Loss: 4 minutes, 46 seconds
 Last Est. Uptime: 50 days, 23 minutes, 2 seconds
 Environment Change: Sim Version & Sim Host have changed.
 Current Sim Version: 2020-11-09.551942, Prior Sim Version: 2020-09-11T22:25:15.548903
 Current Sim Host: simhost-0e06ddc284f36b2d4.agni.secondlife.io Prior Sim Host: sim10707.agni.lindenlab.com
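
The downtime figures in a report like this follow directly from its timestamps. As a minimal sketch (in Python, not the monitor script itself; the helper name `elapsed` is hypothetical, and the " PST" suffix is dropped on the assumption that both stamps share one timezone), the two intervals can be re-derived like this:

```python
from datetime import datetime

# Timestamp layout used in the report above, with the trailing " PST" removed.
FMT = "%Y-%m-%d @ %H:%M:%S"


def elapsed(later: str, earlier: str) -> str:
    """Return the gap between two report timestamps as 'M minutes, S seconds'."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes} minutes, {seconds} seconds"


# Start time minus last estimated stop time.
downtime = elapsed("2020-11-11 @ 04:58:01", "2020-11-11 @ 04:53:12")
# Start time minus last sim state save time.
time_loss = elapsed("2020-11-11 @ 04:58:01", "2020-11-11 @ 04:53:15")

print(downtime)
print(time_loss)
```

Both results agree with the values printed in the report.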


I have to say that while I am all for the march of progress, this past week has nearly given me a stress breakdown with all of the issues I was having on my development sim. Inventory not saving, scripts taking forfreakingever to load, textures not rezzing properly and a host of other bizarre little happenings nearly did me in. I am sincerely hoping that today's restart comes with some serious fixes, because I am at my wits' end trying to get things done for all of the pre-black-friday-sales-events-deadlines I have piling up. And as if development were not challenging enough, I cannot seem to teleport more than about twice before I crash out.

@Lindens - how do I go about working in one of the non-updated servers?  For the sake of my RL health I need some reprieve from this frustrated/uplifted experience I am having.


There are not any regions left in the datacenter that are suitable for long or medium term projects. If you are encountering issues on uplifted regions, please file a bug report for each one at jira.secondlife.com.

18 hours ago, Mazidox Linden said:

Just to clarify: yes, you are all correct that by about 24 hours after this post (and hopefully a lot sooner!) we will have all regions on the main grid in the cloud, except for a couple of regions we're keeping around to help work around issues like BUG-229623, which reproduce in the cloud but not in the colo.

I just want to say CONGRATULATIONS!  This is over a month before Oz's long-time target of getting the whole grid on the cloud by the end of the year, which leaves ample time to stabilize, fix, and tweak before January 1.  In my long experience managing software teams, beating a committed schedule is a rare and joyous event! 

Kudos to the skill of the team in achieving this technical milestone, and to Oz in the artful management of expectations :)

 


Hope they are going to fix the idle script bug soon, too. Pretty sure the lower performance of my region is because of this: https://jira.secondlife.com/browse/BUG-229611

My region was among the last to get moved today, and script time went down from 90-100% with 1 ms of spare time left to 65-80% without any spare time. That is with exactly the same number of scripts/objects/events (it's my region, so no, there are no changes in anything). I'm doing manual restarts now, since even before the uplift I did encounter way lower performance after some rolls/restarts (my guess was that my region ended up on the same host as some very busy regions, like events or those afk sims), but no luck after 7 restarts.

So far I'm mildly disappointed. I thought that AWS servers would run it WAY better than the outdated hardware at LL's datacenter (according to what Oz said: https://modemworld.me/2020/11/07/lab-gab-november-6th-cloud-uplift-update/#what2 ). So far I see worse performance and nothing else. Picture of region stats below. I can and will cut a few more scripts (goal is 4.2k) after I finish rebuilding a couple of areas, but the point that it so far runs worse than on ancient hardware still stands.

 

2.PNG

Edit.

A few more restarts later and it's better (and still, it barely runs the same on modern AWS hardware as it did on the old datacenter's). 95-99% scripts run, although almost no spare time left. I'd really like it if LL could fix that inconsistency in region performance while they are messing with how it runs on AWS. A region shouldn't have up to a 1/3 difference in performance based on literally nothing other than some internal stats/hosts that we can't even see.

 

3.PNG

Edited by steeljane42

Lots of inconsistency among regions on the AWS cloud.

My parcel on a private region, which was running 99% Scripts Run and 7.50 ms of Spare Time before the uplift, is now running 84% Scripts Run and 0.001 ms of Spare Time after.

No changes in active objects or # of scripts before and after the uplift that I can see, although the script time is using up most of the server cycle time now.

1803471016_StatisticsAWScloudHomecrop.png.75ca90a6d64a423ff00220dab9fb3605.png

 

But on my mainland parcel, I don't see any difference between SL/agni and the AWS cloud performance:

897218301_StatisticsAWScloudSandstormcrop.png.122abe2f9cb7cc7d6f46ee84c7f16852.png

 

 

Edited by Jaylinbridges

If you start with an empty region and plot Active Scripts and 100% scripts run while ranging active scripts from 0 to 9000 you get an interesting chart.

I am wondering if a person could just sample the script load of existing regions to produce a reasonable data set or if the varied script activity would make too much noise.

Hey @Mazidox Linden !  I have a suggestion for you!

Edited by Ardy Lay

Just adding my two penn'orth to this: I echo Nika Talaj.

4 hours ago, Nika Talaj said:

I just want to say CONGRATULATIONS!  This is over a month before Oz's long-time target of getting the whole grid on the cloud by the end of the year, which leaves ample time to stabilize, fix, and tweak before January 1.  In my long experience managing software teams, beating a committed schedule is a rare and joyous event! 

Kudos to the skill of the team in achieving this technical milestone, and to Oz in the artful management of expectations

I am not, nor have I ever been, a software team manager, but I can appreciate a big project well done. Congratulations to @Oz Linden and the team.


Yes, good job. I don't want to sound like I am complaining about the uplift. I know it's a good thing. I have only found the one problem, and I suspect LL will be working to improve it and other things now.

19 hours ago, Dictatorshop said:

@Lindens - how do I go about working in one of the non-updated servers?  For the sake of my RL health I need some reprieve from this frustrated/uplifted experience I am having.

Our best belief is that performance and stability will be significantly improved following this week's rolls. For example, our monitors of teleport success rate have returned to normal.


I've said this before. It takes a while after restarts finish for the Grid to settle down. I'm not sure when things finished today; it was a long time after April Linden's announcement that the end of the maintenance popped up on Status. Also, there were a couple of severe drops in reported concurrency in the last 24 hours, one at about 18:30 UTC. I was logged in then, and didn't see anything obvious where I was.

I've come to expect high loads at weekends. Concurrency is already over 40,000, but right now would be a fair time to test. It's mid-range for concurrency.

The Lindens have done a big job, but I got the feeling that the grid status page could have used some better messages. A couple of times I wondered just what was happening. I found out, the hard way, that Uplifting a region does involve a restart, but some of the messages were oddly worded. I suspect that when they used "the SLS Channels" they should have typed "all SL's Channels". That's me guessing, it was a new way of labelling things, and I feel I shouldn't have to guess.

Anyway, things have worked out pretty well.  I expect there are things still to do, some likely tweaks to the sim servers and other services, and there's the real world out there to worry about too.

I'm in real-world lockdown where I am. I am planning for a lonely Christmas. I've had enough funerals this year. Wherever you are, be careful.
