
Just How Bad Can Server Performance Get?


Odaks


Openspace sims are 32 sims per core, so it could be that, but I doubt it: the servers are overbuilt to handle any load, the SL of old has been gone for years, and performance hits on the physical server side do not exist. When a region, openspace, or homestead struggles, the cause is typically within that sim. Agent time is 23 ms and script time is 33.0 ms. There are 11k scripts on that region too, which I did not think was possible, but that count could be including the avatar.


9 hours ago, bigmoe Whitfield said:

Ah okay, yeah, those are stacked 4 or 8 per core now too. So it still holds weight, hehe. Thank you for the clarification.

Where did you get this idea? AFAIK full sims are still 1 per core (which may be 4 or more per CPU, of course). Homesteads have always been 4 per core. 

ETA: I popped into Space to see what's up and it's now behaving as well as can be expected for the script load it's handling -- currently 13,429 active scripts; it's running about 25% of them each frame which is quite good for so many scripts. The other time details are currently  pretty reasonable with practically no time dilation, possibly after the sim was restarted yesterday (2020-02-20 11:02 PST).

Edited by Qie Niangao
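
As a rough check on the figures in this post, the arithmetic can be sketched in Python. The function names are made up for illustration, and the "frames for a full pass" figure assumes the scheduler cycles evenly through all active scripts, which nobody in this thread has confirmed.

```python
def scripts_run_per_frame(active_scripts: int, percent_run: float) -> int:
    """Approximate number of scripts that get a turn in one frame."""
    return round(active_scripts * percent_run / 100.0)


def frames_for_full_pass(percent_run: float) -> float:
    """Frames needed for every script to get one turn.

    Assumption: scripts are serviced evenly, round-robin style.
    """
    return 100.0 / percent_run


# Figures quoted in the post: 13,429 active scripts, about 25% run per frame.
per_frame = scripts_run_per_frame(13429, 25)
full_pass = frames_for_full_pass(25)
print(f"about {per_frame} scripts per frame, {full_pass:.0f} frames per full pass")
```

So "25% run" on this region would mean a few thousand scripts getting a turn each frame, with any given script waiting about four frames between turns under that even-servicing assumption.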

The readings shown in the screenshot I took are really weird. A Total Frame Time of 284ms? Simulation Time 109ms? A Script Time of 33ms, and even a gnat's whisker of Spare Time, but nothing relying on scripts working.....

What on earth was going on?

Sure, later that day things improved. I guess a re-start fixed it all. Nevertheless, I've never seen anything like that bad before, or such weird readings.
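
To put the 284 ms reading in perspective, here is a small Python sketch that converts a Total Frame Time into frames per second. It assumes a nominal simulator target of 45 frames per second; that figure is not stated anywhere in this thread, so treat it as an assumption, and the function name is only illustrative.

```python
# Assumption: 45 frames per second is the nominal simulator target.
# The thread itself does not state this figure.
NOMINAL_FPS = 45.0


def frame_stats(total_frame_ms: float, nominal_fps: float = NOMINAL_FPS) -> dict:
    """Turn a Total Frame Time reading into a frame rate and a rough
    fraction of the nominal rate."""
    budget_ms = 1000.0 / nominal_fps      # time available per frame at nominal rate
    actual_fps = 1000.0 / total_frame_ms  # frame rate implied by the reading
    return {
        "budget_ms": budget_ms,
        "actual_fps": actual_fps,
        "fraction_of_nominal": min(1.0, actual_fps / nominal_fps),
    }


stats = frame_stats(284.0)  # the Total Frame Time from the screenshot
print(f"budget {stats['budget_ms']:.1f} ms, "
      f"actual {stats['actual_fps']:.2f} fps, "
      f"{stats['fraction_of_nominal']:.0%} of nominal")
```

Under that assumption, a 284 ms frame works out to roughly 3.5 frames per second against a budget of about 22 ms per frame.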


The readings are weird -- but also familiar. Almost all that "Simulation Time" is booked to something called "Pump IO" which seems as if it should mean something but I've never heard an explanation that made sense to me. Anyway, this misbehavior pops up occasionally, as a search of the Forums will attest, and the only "fix" I've seen is just what was done here: restart the sim until it behaves normally again, which is not always successful on the first try. According to the wiki, back in 2011 somebody reported at a Simulator User Group that they could make it go away by restarting voice only, so maybe try that? (I didn't even know that was possible.)


7 hours ago, Qie Niangao said:

Where did you get this idea? AFAIK full sims are still 1 per core (which may be 4 or more per CPU, of course). Homesteads have always been 4 per core. 

ETA: I popped into Space to see what's up and it's now behaving as well as can be expected for the script load it's handling -- currently 13,429 active scripts; it's running about 25% of them each frame which is quite good for so many scripts. The other time details are currently  pretty reasonable with practically no time dilation, possibly after the sim was restarted yesterday (2020-02-20 11:02 PST).

Somewhere in the forums, a Linden once posted how many they have stacked.


The thing about digging up old forum posts about SL operations is that SL operations change and those old posts don't magically update.

The last thing I heard from Linden Lab about simulator hosting may be out of date now.  At the time they were moving regions around on the simulator hosts to keep cores from being saturated.  If a region was using more CPU time than a given CPU type could keep up with then it got moved to a better CPU.  Some regions could share CPU cores because they were easy to run.  CPUs in service varied by performance with newer units having higher core and memory densities.  Regions were no longer tied to pre-determined hardware types.  Simulator hosts were replaced as performance to power consumption ratios and accounting dictated. <<<BIG FAT OLD NEWS WARNING HERE>>>


1 hour ago, Ardy Lay said:

The thing about digging up old forum posts about SL operations is that SL operations change and those old posts don't magically update.

This.

We have no idea how LL structure their data center, and even less of a clue now that 'the cloud' is involved.

Attempts to get any meaningful data at TPV meetings tend to fall on deaf ears, with 'better science' being cited as the reason to keep everything super secret. If we're not complaining about something, then everything must be fine. It's worth noting that this is the exact opposite of the LL of old, who would proudly cite their new, better, faster servers.

Regions could be running 1 to a core, 50 to a core, swapped out to disk when idle, physically ripped out of the racks and used to prop up desks.


If we'll all just agree to STFU about how many sims are running per core, I'm fine with that because it's a practically useless measure of anything, soon to become even more meaningless, post uplift. But if folks are going to make up random notions of that metric and then ascribe explanatory value to their imaginings, I'm going to keep asking Simon at the Server User Group, every year or so, just to rein in the crazy. Last time I asked -- which seems like it has to have been within the last six months, don't make me go back through my chatlogs and forums posts again -- the answer was an unequivocal one-per-core. Note that this means infinitely less than it may seem, not only because we have no clue what those CPUs are, but also any virtualization scheme behind sim hosting is a nearly total black box to us; indeed, one thing we do know is that a sim is  not pinned to a core, despite the happy accident of being provisioned one per core.

So for sure it shouldn't matter, it's utterly meaningless, so let's please all pledge that there's no utility in using imaginary hardware configurations to "explain" performance problems, fantasies of LL's financial conditions, or Grandma's choice of gun rack for her F-150.


4 hours ago, Ardy Lay said:

Who said anything about virtualization?

Exactly.

Well, to be clear, I wasn't suggesting there was full-on, hypervisor-based virtualization; I merely meant to suggest some sensible allocation of shared resources. That's why I mentioned pinning simulations to CPUs, a brute force way to prevent any one sim from tanking performance of the other sims sharing the CPU. I know they aren't doing that, and frankly I'd be surprised if they're doing much beyond letting Linux processes fight it out amongst themselves -- and anyway whatever they're doing now almost surely won't be the target cloud approach.


1 hour ago, Qie Niangao said:

Well, to be clear, I wasn't suggesting there was full-on, hypervisor-based virtualization; 

It's almost unthinkable there wouldn't be a full-on hypervisor at play (it's standard, if not best practice, at this point). We already know regions can be shuffled around if manual balancing is needed (and it only takes as long as a region restart to accomplish).

In any case, by the time the cloud move is completed, everything will be virtualized out the wazoo (that's just how wazoos work).


4 hours ago, CoffeeDujour said:

It's almost unthinkable there wouldn't be a full-on hypervisor at play (it's standard, if not best practice, at this point). We already know regions can be shuffled around if manual balancing is needed (and it only takes as long as a region restart to accomplish).

In any case, by the time the cloud move is completed, everything will be virtualized out the wazoo (that's just how wazoos work).

Oh. Now I really wish I'd learned details of all the sim platform migrations. It may have been more interesting than I thought.

To be sure, it's all ready for forgetting, post-uplift.


Moving a region from one simulator host to another does not include moving application software and operating system code around. Moving a region is little more than saving the region state to a central service, then loading it from another host where the operating system and application are already installed. I would hope that a similar strategy is used "in the cloud" so that moving regions will still be very quick. Sure, the OS and application can be in a virtual environment of some sort that can be pre-loaded as hardware is added. That would seem to make sense, yes. But I doubt that only one region can run per OS instance. Perhaps the application can even process multiple regions.

 

Edited by Ardy Lay
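
The save-then-load idea described in this post can be sketched as a toy model in Python. Everything here is hypothetical: `central_store`, `SimHost` and `move_region` are illustrative names, a plain dict stands in for the central service, and nothing about it reflects how Linden Lab actually implements region moves.

```python
import json

# Hypothetical stand-in for the central state service.
central_store: dict[str, str] = {}


class SimHost:
    """Toy simulator host; the OS and application are assumed to be
    installed already, so only region state ever moves."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.regions: dict[str, dict] = {}  # region name -> in-memory state

    def save_region(self, region: str) -> None:
        # Serialize the state to the central store and stop running the region here.
        central_store[region] = json.dumps(self.regions.pop(region))

    def load_region(self, region: str) -> None:
        # Restore the state from the central store and start running it here.
        self.regions[region] = json.loads(central_store[region])


def move_region(region: str, src: SimHost, dst: SimHost) -> None:
    """Move a region by saving on the source and loading on the destination."""
    src.save_region(region)
    dst.load_region(region)


old_host = SimHost("host-a")
new_host = SimHost("host-b")
old_host.regions["Space"] = {"active_scripts": 13429}
move_region("Space", old_host, new_host)
print(sorted(new_host.regions), sorted(old_host.regions))
```

The point of the sketch is only that the cost of a move is roughly one save plus one load of region state, which fits the observation that a move takes about as long as a region restart.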

  • 2 weeks later...

Just How Bad Can Server Performance Get?

As bad as whoever created this region makes it.

Around 12,000 active scripts and around 17,000 script events per second are far too much even for a full region, given that optimal performance, i.e. between 90 and 100%, starts to decline from around 4,500 scripts.

A house and a prison plus a weather system. Where did he put all the scripts?
Yes, that is a harsh dose of reality, but in this case I can really only say "blame yourself, no pity". You should find out beforehand how much a region can withstand rather than whine afterwards.

In my 12 years as a region owner, I have never encountered anything like this.

Contact support and have it explained to you in detail.

Edited by Miller Thor

Scripts run in spare time. They get what's left after everything else has run. They are also capped by max frame times, so if they exhaust this slice's spare time, they get punted into the next (hence the % run figure).

In short: if other things need more time, the script time is reduced.

Sure, script performance will be suboptimal .. like anyone will notice.

It's just easy to see how scripts might be actively doing something when looking at the stats, which fosters the assumption that everything else a region does is somehow less impactful. Script use represents a mostly flat load, while everything else scales up and down with the avatars on the region and in neighboring regions, so it's inevitable that at some point the stats will make it look like scripts have capped the region.

In reality, the region has capped the scripts.
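
The scheduling behaviour described above can be modelled with a toy Python sketch. It is not the real simulator scheduler: the frame budget, the cost of 4 microseconds per script and the strict round-robin order are all assumptions chosen for illustration, and `simulate_frame` is a made-up name.

```python
from collections import deque


def simulate_frame(queue: deque, frame_budget_us: int,
                   other_work_us: int, cost_per_script_us: int):
    """Run queued scripts in whatever time is left after other work.

    Scripts that get a turn go to the back of the queue; the rest stay
    at the front and wait for the next frame.
    Returns (scripts_run, percent_run).
    """
    spare_us = max(0, frame_budget_us - other_work_us)
    total = len(queue)
    ran = min(total, spare_us // cost_per_script_us)
    queue.rotate(-ran)  # move the scripts that ran to the back of the line
    percent = 100.0 * ran / total if total else 100.0
    return ran, percent


# Hypothetical numbers: 12,000 scripts, a 22,200 us frame, 4 us per script.
scripts = deque(range(12000))
quiet = simulate_frame(scripts, 22200, other_work_us=10000, cost_per_script_us=4)
busy = simulate_frame(scripts, 22200, other_work_us=18000, cost_per_script_us=4)
print(f"quiet frame: {quiet[0]} scripts run ({quiet[1]:.1f}%)")
print(f"busy frame:  {busy[0]} scripts run ({busy[1]:.1f}%)")
```

With the script load held constant, the percentage run drops only because the other work grew, which is the sense in which the region caps the scripts rather than the other way round.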


Despite the high number of scripts, that region has been ticking along for a very long time without going into breakdown. Sure, script performance has been on the slow side due to the inevitably low scripts run percentage, but I'd never seen any sim statistics figures go so haywire before. @Qie Niangao pointed out earlier that this sort of thing has happened before, with everything going loopy-loop, only to be fixed by re-starts.

Coincidentally, a region that I used to rent a parcel on ran consistently with 16,500 active scripts without breaking down like this. Looking around the grid, you can find many regions that are running with very high active script counts. There's no doubt, from what I've gleaned from discussions on this forum, that sim performance starts to fall off when active scripts exceed 4,500 - 5,000, but they keep running. All that happens, as @CoffeeDujour points out, is that script execution just gets queued to subsequent frames (or is this, too, another bit of stated fact that isn't a fact any more?).

Anyway, thanks to everybody so far for your contributions. Every little bit helps towards my scant understandings of these things.


58 minutes ago, Odaks said:

All that happens, as @CoffeeDujour points out, is that script execution just gets queued to subsequent frames (or is this, too, another bit of stated fact that isn't a fact any more?).

That's actually one of the very few metrics that could in theory be tested by script, although benchmarking scripts are themselves a much higher load than typical scripts (which sit about doing almost nothing from one frame to the next), and that is trivial for any scheduler to spot and preemptively throttle .. so the possibility of a false positive is very high.

Scripts do get punted to subsequent frames. This figure is shown in the stats as the percentage of scripts run, as in scripts that had been scheduled an opportunity to do something this frame. We only know how many scripts there are in total, how many are active (have events that require processing), the percentage of run requests granted and the number of events per second. There are an awful lot of numbers missing, and without those it's almost impossible to infer actual performance and load.

Note: it is possible to have less than 100% scripts run and still have more than enough spare time to have actually (and often easily) run 100% in the same frame.

At the end of the day, real evaluations of a region's performance come down to .. can I move smoothly (for SL), do I see other people move smoothly, is there rubber banding, if I poke the thing - does the thing do the thing, if I tp about do things appear quickly (assuming the CDN is all good), is avatar update rate within normal ranges.

Past that, as regions do not run in isolation on simulators, overall simulator load can make a mess of the numbers we get to see.

If your region was running ok, and now isn't, and you didn't dramatically change anything, the first step in debugging should be a support ticket to request your region be moved to a different simulator.


  • 3 weeks later...
On 2/22/2020 at 9:08 AM, Qie Niangao said:

If we'll all just agree to STFU about how many sims are running per core, I'm fine with that because it's a practically useless measure of anything, soon to become even more meaningless, post uplift. But if folks are going to make up random notions of that metric and then ascribe explanatory value to their imaginings, I'm going to keep asking Simon at the Server User Group, every year or so, just to rein in the crazy. Last time I asked -- which seems like it has to have been within the last six months, don't make me go back through my chatlogs and forums posts again -- the answer was an unequivocal one-per-core. Note that this means infinitely less than it may seem, not only because we have no clue what those CPUs are, but also any virtualization scheme behind sim hosting is a nearly total black box to us; indeed, one thing we do know is that a sim is  not pinned to a core, despite the happy accident of being provisioned one per core.

So for sure it shouldn't matter, it's utterly meaningless, so let's please all pledge that there's no utility in using imaginary hardware configurations to "explain" performance problems, fantasies of LL's financial conditions, or Grandma's choice of gun rack for her F-150.

❤️ 😘❤️

