animats

Is LL putting more sims on fewer servers?


Of possible interest to this thread, extracted from the Server User Group this afternoon:

Quote

[2019/05/28 12:06] @Torric Rodas : It's the time I've ever seen this region running less than 100% scripts, although most of us would be happy with the 93% this is achieving. Any news on that Simon?
[2019/05/28 12:06] @Simon Linden : Nothing concrete, but there's work going on with the script scheduling and we're pretty confident we can make it better
[2019/05/28 12:07]  Joe Magarac ( @animats ): Let us know when you have a sim on the beta grid where we can test.
[2019/05/28 12:07]  Torric Rodas: they've cloned one of mine to beta grid and that runs at 100% all the time
[2019/05/28 12:07]  Torric Rodas: on this grid it runs at 40
[2019/05/28 12:08] @Rider Linden : Yes.  That surprised me.  I was hoping for worse performance on Aditi...
[2019/05/28 12:08]  Simon Linden: having avatars there is the usual difference ... all our extras and HUDs and attachments loads down the regions
[2019/05/28 12:08]  Loading... ( @Whirly Fizzle ): Does the region have neighbours on Agni though?
[2019/05/28 12:08]  Rider Linden: Yes.
[2019/05/28 12:09]  Torric Rodas: it does whirly

I don't know which sim was cloned but if it happens to be part of the London cluster there indeed could be many avatars on the main (Agni) grid.

Edited by Qie Niangao


As discussed previously, idle scripts doing nothing still use sim time. Somewhere between 5000 and 6000 scripts, they use all the sim time available for scripts.

This just happened in Vallone. At around 4000 scripts, the sim was sluggish, but my pathfinding characters still worked. They were getting about 25% of the pathfinding cycles they should, but kept running. Then someone else on the sim built four skyboxes, and the number of active scripts went to about 5500. Pathfinding steps dropped to below 5% of normal. My pathfinding characters now run into walls, go off the parcel, and can't walk in a straight line. They used to work for weeks at a time. Now they fail every few minutes. Pathfinding has lower priority than scripts, so when script time gets really tight, pathfinding almost shuts down.

Not happy about this.


I have been following this discussion with intense interest, having had a similar experience: scripts running at a very low percentage, and HUDs, menus, and scripted items responding with long delays that do not occur when I retry them at Hippo Hollow or other low-use regions. I have submitted a JIRA that was just accepted: https://jira.secondlife.com/browse/BUG-227099

Lacking the extensive knowledge of many contributors to this thread, I can only affirm how frustrating this situation is. 

Edited by sunderezz


The main conclusion from Server User Group today is that the developers have no idea what's gone wrong with SL performance.

 


Some themes and patterns have emerged. In my attempt to learn and understand, I will list what I have "gathered." I am eager to hear what I missed or understood incorrectly.

Today the owner of the region where I have my venue took down several emitters of butterflies and moths that I had asked her to consider removing in mid-April. They were showing a very high rate of object updates. I was stunned to see that with these gone, the sim script % increased from 30ish% to 60ish%. The physics allocated memory of 52ish stayed stable. This has me curious about how to effectively test for non-script-related things that suck up server resources.

 

  1. Sometime in April many regions appear to have been affected by a systemic event related to SL servers. One outcome consistently reported is a slowing down of scripts in the affected regions. The exact nature of this event has yet to be definitively identified (or I have missed this).
  2. Situations within a region that amplify the impact of the "alleged event" and subsequent reduction of script speed may include:
    1. A high number of objects (over ______?)
    2. A high number of scripted objects and of this group, a high number of scripted objects with running scripts (over __________?)
    3. A high number of collisions
    4. A high number of emitters and moving objects that use simulator resources to read object updates constantly
    5. A high number of 1024 textures (while this has been named, is it a server side issue, client side, or both?) 
  3. Key indicators of region health include 
    1. Sim FPS 
    2. allocated physics memory 
    3. script percentage
    4. server time
    5. total frame time
  4. Steps to take to "trouble shoot" and improve a region's function seem to  include 
    1. Remove or stop running scripts in houses, buildings, landscaping, fires, lights, special effects (pretty swirling stardust for example), art, etc. 
    2. Carefully consider your choices when building & selecting scripted objects with "active scripts" ("active" meaning running scripts) used in a parcel/region. 
    3.  Keep the stats window open to assess impact of scripted objects as you work. 
    4. Remove all temporary rezzers
    5. Remove or stop all moving animals, breedables, and emitters that generate temporary objects generating constant object updates,  this can have  HUGE impact on server resources (ex - a dragonfly emitter, swimming fish) 
    6. Reduce total number of objects
    7. Reduce total number of scripted objects
    8. Become informed about the many ways scripts can "eat up" server resources
    9. See if you can amplify the issues in your region by "testing" via putting out a "set" of suspect items. For example, suspecting emitters were making things worse, I rezzed 10 emitters that showed a negative impact on sim stats script %.
    10. Sim restarts: https://community.secondlife.com/forums/topic/100232-what-happens-during-a-sim-restart/
  5. Tools to use in trouble shooting include
    1. Stats window kept open 
    2. Area search (highlight objects, right-click, select "show scripts" from the menu)
    3. The "pie chart," accessed by right-clicking on an object: find "scripts" and select it for script info and the option of stopping scripts in modifiable objects.
    4. Records made using print screen to capture region statistics using ONLY the Second Life Viewer (I have been told that Lindens will not use data from the FS viewer, but do not know if this is accurate).
    5. Turn on "Show Updates" In Developer Tools > Show Window > Show Updates to Objects . . . . ALL the blue flowing streams represent updates to objects.
    6. Every object's land impact rating: https://community.secondlife.com/knowledgebase/english/calculating-land-impact-r273/ and https://community.secondlife.com/forums/topic/83294-prims-prim-equivalent-land-impact-a-too-long-guide/
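
The "put out a set of suspect items and watch the stats" step in the list above can be made a little more systematic by writing down several "scripts run %" readings before and after a change and comparing the averages. What follows is only a rough sketch in Python: the function name `script_pct_change` is made up for illustration, and the sample readings are placeholders, not measurements from any region.

```python
from statistics import mean


def script_pct_change(before, after):
    """Compare 'scripts run %' readings taken before and after a change.

    Both arguments are lists of readings copied by hand from the viewer's
    statistics window. Returns (mean_before, mean_after, difference).
    """
    if not before or not after:
        raise ValueError("need at least one reading in each list")
    mean_before = mean(before)
    mean_after = mean(after)
    return mean_before, mean_after, mean_after - mean_before


if __name__ == "__main__":
    # Placeholder readings for illustration only, not real data.
    before = [30.0, 32.0, 31.0]
    after = [60.0, 58.0, 62.0]
    b, a, delta = script_pct_change(before, after)
    print(f"before {b:.1f}%  after {a:.1f}%  change {delta:+.1f} points")
```

Since the percentage can swing widely from minute to minute, taking more readings over a longer period on each side of the change should make the comparison less noisy.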

 

 

RELEVANT THREADS & LINKS

It really is the number of idle scripts that drags down a sim

 

The Down and Dirty Truth on Lag and How You Can Improve Your Viewer’s Performance.

https://blog.zoha-islands.com/the-down-and-dirty-truth-on-lag-and-how-you-can-improve-your-viewers-performance/

memory allocated too high?

 

 

Edited by sunderezz
4 hours ago, sunderezz said:

Today the owner of the region where I have my venue took down several emitters of butterflies and moths that I had asked her to consider removing in mid-April. They were showing a very high rate of object updates. I was stunned to see that with these gone, the sim script % increased from 30ish% to 60ish%

As I mentioned in my previous posts in this thread, I am fairly certain the script execution time issue is related to a networking issue on LL's end. I saw exactly the same thing on the regions I've been able to test on. Basically, the more packets per second the region is sending out, the lower the script execution is.
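
For anyone who wants to check the packets-versus-scripts relationship on their own region, one option is to note paired readings of outgoing packets per second and "scripts run %" and compute a correlation coefficient. This is just a sketch: `pearson` is a plain textbook Pearson correlation written for this post, and it says nothing about cause. A value near -1 would be consistent with the pattern described above, and a value near 0 would not.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("one of the sequences is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder pairs for illustration only: packets out per second,
    # and scripts run % noted at the same moment.
    packets_out = [100, 200, 300]
    scripts_run_pct = [60, 40, 20]
    print(f"correlation: {pearson(packets_out, scripts_run_pct):.2f}")
```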

2 hours ago, NeoBokrug Elytis said:

I am fairly certain the script execution time issue is related to a networking issue on LL's end

This was mentioned at the recent Server User Group meeting, with a particular type of vendor that made frequent network calls as an example. However, you're saying that it isn't just http-type activities, but basic server actions such as a newly-rezzed object, or a  changed position in the region of a moving object? Even a slideshow on a screen in a cinema where every 60 seconds the textures change is going to be doing this.

On 5/24/2019 at 2:43 AM, Resi Pfeffer said:

Is there a way to contribute to a Jira? Im renting a homestead parcel what has about ~30% script run since weeks (Higginston), and the landlord GLH just doesnt know what to do at all.

Sort of... Different people (users) have different permissions in the JIRA. That leads to various people (users) providing different answers as to what can and cannot be done in reports. So, two conflicting answers can both be right. Plus, some users can change the security level of the reports, and those changes affect what we can do and see.

If you cannot comment on a JIRA, post another Bug Report and reference the JIRA number you are responding to. If the Lindens can use the info they will add it to their internal report. Solid data is always welcome.

4 minutes ago, Nalates Urriah said:

Solid data is always welcome.

As the script run jumps between 25% and 75%, my data is not very solid, but it is worth a look :)
Thanks :)


As far as idle scripts go, I went to Calleta, which was supposed to be the hallmark of lag. Not even 3k scripts going. Don't think they were lagged either :(

But, are we hearing anything new?  Since the beginning of time scripts were blamed.  And if it isn't chicken farms, then horse farms, now I have a DFS farm as a neighbor.    Do these not cause lag anymore?  It used to not bother me, because those folks always abandoned in well under a week.  Until now...

I wish I could get a script count per parcel.  Because running around and deleting scripts from things that haven't been touched in well over a year, certain lights, and street signs, I don't see how you can get thousands of scripts by regular normal usage.  I think we've become uneconomical in our usage of things.  When I first started, it was like a panic just having a few scripts that you didn't delete.

Edited by Israel Schnute


The console you get with a private island shows scripts by resources, which is probably the best information you're ever going to get. I've never had a homestead so I can't comment on what they give, and I know that mainland doesn't give you much more than the parcel scripts list. What you need to know is how much time each script is consuming, which as far as I know can only come from the estate top-scripts tab. It was an eye-opener to me when I first started looking through what was on the island, and it's a good way to look for very intensive scripts and then play around with them to see what you can do to improve things.

 

You could manually get the counts per parcel from the parcel info tab which gives you the kBytes for each scripted object in the parcel, but I have found the size of a script is not indicative of the amount of time it is consuming. Think of a data store script that does nothing but receive a link message and add the data to a list of visitors if they have not already been entered. You would see seven scripts there (one for each day of the week), but the actual time consumed by those scripts would be very small. The visitor sensor that scans an area every 20 seconds, however, would be swallowing a lot more time.

Edited by Profaitchikenz Haiku


My script info box always worked well for my stuff, but there is a tab Control+Shift+1 and Control+Shift+2 that gives a lot of data on whatever sim I go to immediately.  
I just wonder if a sim says x number of active scripts, how to tell what number is on the exact parcel.  Might be something only the Labs can check since there was an abuse report option at one time for excessive sim resource usage.


 

6 hours ago, Profaitchikenz Haiku said:

You could manually get the counts per parcel from the parcel info tab which gives you the kBytes for each scripted object in the parcel, but I have found the size of a script is not indicative of the amount of time it is consuming...

1 hour ago, Israel Schnute said:

I just wonder if a sim says x number of active scripts, how to tell what number is on the exact parcel.  Might be something only the Labs can check since there was an abuse report option at one time for excessive sim resource usage.

You can kind of approximate script count from those memory numbers (for your own or your group's parcels only). The vast majority of modern scripts are going to list at 64K each (even though they won't be using that much really). And script count correlates surprisingly (disturbingly) well with script time. It's very crude, but there's some signal in the noise.
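
As a back-of-the-envelope version of that approximation, here is a small Python sketch. It assumes every script lists at 64K, which is only the "vast majority" case mentioned above, so the result is a rough estimate and not an exact count. The function name `estimate_script_count` is made up for illustration.

```python
def estimate_script_count(total_memory_kb, kb_per_script=64):
    """Roughly estimate how many scripts a parcel holds.

    total_memory_kb: total script memory listed for the parcel, in KB.
    kb_per_script:   assumed listed size per script (64 KB by default,
                     per the rule of thumb above; an assumption, not a spec).
    """
    if kb_per_script <= 0:
        raise ValueError("kb_per_script must be positive")
    if total_memory_kb < 0:
        raise ValueError("total_memory_kb cannot be negative")
    return round(total_memory_kb / kb_per_script)


if __name__ == "__main__":
    # Placeholder figure for illustration: a parcel listing 6400 KB.
    print(estimate_script_count(6400))
```

Scripts that list at a different size will skew the estimate, which is part of why this is "very crude."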

Of course there are plenty of special reasons a particular sim might be script lagged. I was looking into one problematic sim earlier today. Somewhere something was keeping scores of scripted objects temp-rezzed each minute. They may be tiny but it takes CPU time to load those scripts into the sim and scheduler and then remove them when the temp object expires, over and over.

Has temp-rezzing scripts gotten slower? Who knows, but also: who cares? Every sim would be better without any of it going on.

18 hours ago, Nalates Urriah said:

Sort of... Different people (users) have different permissions in the JIRA. That leads to various people (users) providing different answers as to what can and cannot be done in reports. So, two conflicting answers can both be right. Plus, some users can change the security level of the reports, and those changes affect what we can do and see.

If you cannot comment on a JIRA, post another Bug Report and reference the JIRA number you are responding to. If the Lindens can use the info they will add it to their internal report. Solid data is always welcome.

It's a known bug in the JIRA system that you can't comment on a JIRA entry by another person, even if you have "commentor" permission.


Here's an experiment we might try to get LL to run.

  • Find a region that's struggling with high script load.
  • Copy the region to the beta grid, something which LL does for testing.
  • Put it on a server with at least 2, preferably 4 CPUs, all by itself.
  • Record load on that server.
  • Invite people to visit the sim on the beta grid.

Some tasks in the sim code, such as rezzing and sim to sim copy, have been moved off the main thread. That only helps if you have an additional CPU to run that stuff. It's entirely possible that a busy region now needs more than one CPU to run it properly. This needs to be tested and we users need to know.

LL has sort of been assuming that one CPU per sim is a maximum, not a minimum.

7 hours ago, animats said:

Some tasks in the sim code, such as rezzing and sim to sim copy, have been moved off the main thread. That only helps if you have an additional CPU to run that stuff. It's entirely possible that a busy region now needs more than one CPU to run it properly.

Hmmm. A single core runs multiple threads, it just can't run them in parallel, so it can't get any benefit from steps to multi-thread the code. If those steps introduced overhead (very possible especially if there are rendezvous sync points) then the simulator could indeed have gotten slower as a result of multi-threading if there are no processors to spare.

(The opposite is also possible, of course. Back in the dark ages when POSIX threads were new and shiny, nobody had multiple cores, but multithreading was still advantageous, a relatively efficient way to let some processing proceed while some other processing was blocked in wait state.)


Talking about wait state, I spotted something last night in a sim I was visiting. It was the typical condition of scripts run under 50%: script time was up at 16 mSec with no spare time, but the Physics sleep time was 5 mSec. (This was Firestorm, which seems to give an extra time section that Singularity doesn't.) Is there a chance that the server is being forced into a sleep state when it should instead have been giving that time as spare time?

The region my parcel is on got a restart yesterday and has once again gone back to 102%. Script time is 15-17 mSec, spare time 2-4 mSec.

Edited by Profaitchikenz Haiku

4 hours ago, Qie Niangao said:

Hmmm. A single core runs multiple threads, it just can't run them in parallel, so it can't get any benefit from steps to multi-thread the code. If those steps introduced overhead (very possible especially if there are rendezvous sync points) then the simulator could indeed have gotten slower as a result of multi-threading if there are no processors to spare.

Right. The sim code has a main loop: get info from viewers, do physics, do scripts, update world, tell viewers. Repeat, 45 times a second. That's currently single-threaded, and if the sim is out of script time, it's using 100% of one CPU just to do that. When the sim runs out of CPU time, it first sacrifices pathfinding, then script time, then cuts the sim frame rate.
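
To make that sacrifice order concrete, here is a toy model in Python. It is not LL's scheduler, just a sketch of the priority order described above: at 45 frames a second each frame has about 22.2 ms, the non-script work (viewers, physics, world updates) is lumped into one hypothetical `fixed_ms` figure, scripts get what is left, and pathfinding gets whatever scripts leave behind. All names and numbers here are illustrative assumptions.

```python
FRAME_MS = 1000.0 / 45  # about 22.2 ms per frame at 45 frames per second


def allocate_frame(fixed_ms, script_demand_ms, pathfinding_demand_ms,
                   budget_ms=FRAME_MS):
    """Toy model of one sim frame under the sacrifice order described above.

    fixed_ms:              time for everything that is not scripts or
                           pathfinding (an assumed lump sum).
    script_demand_ms:      time the scripts would like to use.
    pathfinding_demand_ms: time pathfinding would like to use.
    Returns the time granted to each, plus the resulting frame length.
    """
    remaining = max(budget_ms - fixed_ms, 0.0)
    # Scripts outrank pathfinding, so they are served first.
    scripts_ms = min(script_demand_ms, remaining)
    # Pathfinding only gets what scripts did not use.
    pathfinding_ms = min(pathfinding_demand_ms, remaining - scripts_ms)
    # If the fixed work alone overruns the budget, the frame gets longer,
    # which means the frame rate drops.
    frame_ms = max(budget_ms, fixed_ms)
    return {
        "scripts_ms": scripts_ms,
        "pathfinding_ms": pathfinding_ms,
        "frame_ms": frame_ms,
    }


if __name__ == "__main__":
    # Lightly loaded: both scripts and pathfinding get all they ask for.
    print(allocate_frame(6.0, 10.0, 4.0, budget_ms=22.0))
    # Script demand exceeds what is left: pathfinding is starved first.
    print(allocate_frame(6.0, 20.0, 4.0, budget_ms=22.0))
```

In the second call the scripts are also cut short (they wanted 20 ms and get 16), which is the "scripts run below 100%" situation people are reporting.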

Other stuff, such as rezzing, and (not sure about this) some of the bulk copying associated with teleports and region crossings, has been moved to other threads. Those need CPU time too. If there are other cores available in the same computer, that work can proceed in parallel with the main loop. A single sim can now use more than one core effectively. Under heavy load, a sim may now need more than one core.

LL puts multiple sims on one computer. The usual allocation is supposedly 1 core for a full sim, less for homestead and open space sims. That may not be enough any more. LL really doesn't seem to have faced that. They're assuming that the sims on one computer will include some that need less resources than others, so that, across all the cores, there will be some spare CPU time. That assumption seems to have broken down. We've seen reports here of people restarting their estates repeatedly in hopes of being assigned to a computer with some spare time. That doesn't seem to work any more. The idle time just isn't there.
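
A quick way to picture the problem is to add up what each sim on a machine actually wants and compare it to the cores available. The sketch below is purely illustrative: the core count and the per-sim demand figures are hypothetical placeholders, since LL's real allocations and loads aren't public beyond the "supposedly 1 core for a full sim" figure.

```python
def host_spare_cores(core_count, sim_demands):
    """Return spare CPU on a host, in cores (negative means oversubscribed).

    core_count:  number of cores on the machine.
    sim_demands: how many cores' worth of CPU each sim on the machine
                 actually wants right now.
    """
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return core_count - sum(sim_demands)


if __name__ == "__main__":
    # Hypothetical host with 4 cores. Old assumption: full sims want at
    # most 1 core and smaller sims want less, so some slack is left over.
    print(host_spare_cores(4, [1.0, 1.0, 0.5, 0.5]))
    # If busy sims now want more than 1 core each, the slack disappears.
    print(host_spare_cores(4, [1.5, 1.5, 0.5, 0.5]))
```

Once the result reaches zero or goes negative, restarting in hopes of landing on a quieter machine only helps if some other machine still has slack.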

This is a big problem for LL. Their business model has assumed that the compute load per sim will not increase. It's especially a problem with the planned move to AWS, where cost goes up linearly with the number of CPUs. Look at the "C5" instances, for  heavy compute usage. If you buy your own machines, you get better price/performance if you buy more cores, but not on AWS. (AWS has to do this, or they'd get people buying big instances and subdividing them into little instances to undercut AWS's own pricing.)

I wonder if LL management has thought this through. There are articles indicating that AWS is a huge financial loss for compute-bound services.
