animats

Huge intermittent sim overload


Vallone sim is having intermittent overloads again, or at least something on the same server is.

[Image: simrunningwellmed.png]

This is fine. Most of the time it looks like this. 98% of scripts running, some spare time.

[Image: slowscripts01med.png]

About once a minute it looks like this. 4% of scripts running. I've seen numbers as low as 1.4% scripts running. Big "simulation time". No spare time. My avatar is the only one in the sim. "Main agents" is 1, and this is not being caused by avatars entering and leaving the sim. I have "Nearby People" open and would see that. Note that "Simulation time" is way up, whatever that covers.

I've seen slow-running sims before, of course. But this periodic huge burst of "Simulation time" is new.

My NPCs freeze in place, or worse, continue to animate while not moving, because the anim controller isn't getting script time to stop the animation. I'm also seeing some of my NPCs stall out completely. I'd just run a group of them for a week without a failure, and have logged five failures since 2 AM SLT today. I suspect they were so starved of script time that an event queue overflowed. (I get phone messages for this: "[12:05] Dacy: Dacy in trouble at Vallone <197.06750, 13.88798, 36.35270>: Patroller task: Stalled and reset." I'd gone for days with none of those messages.) Walking is sluggish. Motorcycles slow down but don't fail.
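For anyone who wants to tally alerts like the one quoted above, here is a minimal sketch of a parser for that message format. It is written in Python, not LSL, and it is not the NPC system's own code: the regular expression and the field names (`sender`, `npc`, `region`, `detail`) are assumptions based only on the single example message in this post.

```python
import re

# Pattern inferred from one sample alert; the real format may vary.
ALERT_PATTERN = re.compile(
    r"^\[(?P<time>\d{1,2}:\d{2})\]\s+(?P<sender>[^:]+):\s+"
    r"(?P<npc>.+?) in trouble at (?P<region>.+?) "
    r"<(?P<x>[-\d.]+),\s*(?P<y>[-\d.]+),\s*(?P<z>[-\d.]+)>:\s+(?P<detail>.+)$"
)

def parse_alert(line):
    """Return a dict of fields from an alert line, or None if it does not match."""
    match = ALERT_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the region coordinates from text to floats.
    fields["position"] = (float(fields.pop("x")),
                          float(fields.pop("y")),
                          float(fields.pop("z")))
    return fields

sample = ("[12:05] Dacy: Dacy in trouble at Vallone "
          "<197.06750, 13.88798, 36.35270>: Patroller task: Stalled and reset.")
alert = parse_alert(sample)
print(alert["npc"], alert["region"], alert["detail"])
```

Counting parsed alerts per region and per hour would give a rough picture of when a sim was starving scripts.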

I have one task measuring delays between timer ticks, and it checks for long stalls. I've seen 8 and 10 seconds reported.
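The logic of that kind of check can be sketched roughly as follows. This is a Python illustration of the idea, not the actual in-world script: the function name `find_stalls`, the 5-second threshold, and the sample tick times are all made up for the example.

```python
def find_stalls(tick_times, stall_threshold=5.0):
    """Return (time_of_last_good_tick, gap_seconds) for each long gap.

    tick_times is a list of timestamps, in seconds, recorded each time
    the timer event fired. stall_threshold is an assumed cutoff.
    """
    stalls = []
    for previous, current in zip(tick_times, tick_times[1:]):
        gap = current - previous
        if gap >= stall_threshold:
            stalls.append((previous, gap))
    return stalls

# Made-up tick log: a nominal 1-second timer with two long stalls.
ticks = [0.0, 1.0, 2.0, 10.0, 11.0, 21.0, 22.0]
stalls = find_stalls(ticks)
for started, gap in stalls:
    print(f"stall after t={started:.0f}s lasting {gap:.0f}s")
```

With this sample data the two gaps found are 8 and 10 seconds, matching the size of the stalls described above.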

The last two times Vallone had real problems, it was some other process running on the same server, not in the same region.

Edited by animats


It seems there are a number of other regions on the Zindra continent that are stalling at times as well. One time I checked the other regions that reportedly were running on the same physical hardware as one of the regions in question; about half of them were running some sort of "breeder" farm. Another had a few automated vehicles running around on it.

Since LL does not publicly publish which regions are running on which server after server restarts, my little hunt could have been nothing but just a trek around the grid with none of the regions I visited actually running on the same hardware.


My first conclusion to this issue was "Child agents" on neighbouring sims.  These WERE known to cause considerable simulation-time drag on a sim if such avatars were visible from the region experiencing the problem.  I don't mean being within draw distance, just "able to see or be seen" from that region.  If they are "seeable" the region simulator must use resources to create them.

But there is much more to this issue than that.  A few months ago I questioned whether regions could cause "cross-talk" on other regions on the same server.  That would not need the region causing the issue to be actually neighbouring the region being troubled.  I've been told that can't happen, so I have nothing more to offer by way of explanation.  I  would hope someone at LL could explain this phenomenon, but I have yet to see any Linden showing willing there.


Sent in a trouble ticket. But, from the above message, this may be a broader problem.

1 hour ago, Aishagain said:

A few months ago I questioned whether regions could cause "cross-talk" on other regions on the same server.  That would not need the region causing the issue to be actually neighbouring the region being troubled.  I've been told that can't happen, so I have nothing more to offer by way of explanation.  I  would hope someone at LL could explain this phenomenon, but I have yet to see any Linden showing willing there.

I have questioned that over time and, like you, no answers came. I suspect since LL is working on eventually moving regions into a cloud, answers from them will mean little now as it should be a moot point once the move is done.


[Image: lowscripts0a.png]

A new low. 0.5% scripts run. Time dilation 0.4. "Simulation time" over 25ms. What is going on here? I've never seen numbers like that.

Again, it's intermittent. A few seconds later, things look normal; 75% of scripts running, simulation time way down. It's getting worse; now the sim is running in super-slow mode about half the time. I'm not sure what's using "simulation time".

Only 2 main agents (avis in current sim), and 3 child agents (in nearby sims within view distance). The one other avi in the sim isn't doing anything complicated, but it's slow for her, too. She left the sim; that had little effect. Nobody is entering or leaving the sim much, so it's not transients from that. I don't think it's child agents; the next sim has a club, about 50m distant, within easy draw range. It's often busy without hurting this sim. Right now the club is empty and that's not helping.

My NPCs are running, trying to use script time, but not much of it. They use 1%-2% of a sim's script capacity; I've tested in sandboxes with 50 of them. I've had them running all week, and until 9AM today, the sim was running fine. Dacy, in the picture, has her arms folded and looks annoyed because she's waiting for her path planner to run so she can move.

As far as I know, there are no breedables in this sim. I've looked in on all the skyboxes in the past; nobody has much up there doing anything.

I'm not seeing an obvious cause of huge intermittent load.

 


[Image: calleta03med.png]

Calleta sim in Heterocera has the same problem.

Intermittent very low script time and high simulation time. 2 main agents (avatars), 2 child agents (avatars outside sim but within view distance).

17 hours ago, animats said:

Sent in a trouble ticket. But, from the above message, this may be a broader problem.

Within the last year I rented 5 homestead parcels on different regions, from tiny ones of 2K sqm up to 21K. All of them had good times (>80% scripts run) and bad times (<40% scripts run).
I was never able to find the cause, even when I had a third of the region, the second parcel was empty, and the third was only slightly stuffed with prims.
Now I have moved again (not for that reason) and I hope I will stay at 98%. :)

I gave up asking the landlords to keep an eye on it. They don't even know what "scripts run" is, or where to find that info. 😩

Edited by Resi Pfeffer

1 hour ago, Resi Pfeffer said:

I gave up asking the landlords to keep an eye on it. They don't even know what "scripts run" is, or where to find that info

This varies. My parcel is on an estate where they obviously do know, because the last time there was a noticeable deterioration they restarted 5 times in rapid succession until they presumably saw the problem had cleared. In another place I visited yesterday, where the scripts run % has usually been very poor, it had dropped to less than 4% and none of the doors or TP pads were operating satisfactorily. The owner was in the parcel at the time, and I assume she was trying to get a ticket for a restart, because she said "I'm on it" when I started to say there was a problem.

Edited by Profaitchikenz Haiku


Here we are at 10:30 AM on Monday. Here's Calleta sim.

[Image: calletasimslow04med.png]

Calleta sim at 1.3% scripts running. Again, intermittent; it gets above 80% scripts running most of the time, and over 90% sometimes. This is the convenience store of the gas station near the railroad station. The sim is so slow that even walking across a region crossing doesn't work right. I fell through the ground and was caught in the open space below the road and above the terrain.

Calleta is a moderately busy sim, a major rail and ferry hub, with many stores and vendors, but I've never seen it this bad before. Doors take seconds to respond to a click, and then move too far and swing back.

No action on the trouble ticket yet.


Calleta is really screwed up. Physics Time is way too high when Scripts Run falls to 1%. Some animesh object caught in a wall or hole? I see Physics Time as high as 16 ms when Dilation drops to 0.50.

57fc260981502824db78fe68f15b88ac.png


This morning, Vallone is doing better, but Calleta is still stalling out intermittently.

[Image: calletaslow02med.png]

Calleta sim stats, Tuesday, 10 AM. 4% scripts running. High simulation time. As before, it spikes a few times a minute. So Calleta is still broken.

My NPCs at Vallone report trouble automatically. Most days, they report no problems. Their first trouble report was around 9 AM Sunday, and the last one was at 7 PM Monday. The reports were for scripts running so slow that a stall timer tripped and reset the NPC. So we can bracket the times when Vallone was broken.
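Bracketing like this can be automated if the trouble reports are logged with timestamps. A minimal sketch in Python, under loud assumptions: the function name `bracket_incidents`, the 24-hour quiet period used to split one incident from the next, and every timestamp below are invented for illustration and are not the actual report log.

```python
from datetime import datetime, timedelta

def bracket_incidents(report_times, max_quiet=timedelta(hours=24)):
    """Group report timestamps into (first, last) incident windows.

    Reports separated by more than max_quiet (an assumed cutoff) are
    treated as belonging to different incidents.
    """
    incidents = []
    for stamp in sorted(report_times):
        if incidents and stamp - incidents[-1][1] <= max_quiet:
            incidents[-1][1] = stamp          # extend the current window
        else:
            incidents.append([stamp, stamp])  # start a new window
    return [(first, last) for first, last in incidents]

# Placeholder timestamps, not real data.
reports = [
    datetime(2020, 1, 5, 9, 0),
    datetime(2020, 1, 5, 14, 30),
    datetime(2020, 1, 6, 2, 0),
    datetime(2020, 1, 6, 19, 0),
    datetime(2020, 1, 9, 8, 0),
]
incidents = bracket_incidents(reports)
for first, last in incidents:
    print(f"{first:%Y-%m-%d %H:%M} to {last:%Y-%m-%d %H:%M} ({last - first})")
```

Comparing windows like these across regions could show whether the slowdowns line up in time, which would point toward a shared cause.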

No response on the trouble ticket yet.


When I see a very low percentage of scripts run in a full region with few agents present and fewer than 4000 active scripts, I usually find "pinned objects > 0" and/or "low LOD objects > 0". In some if not all instances, these conditions have been the result of griefers doing what they do. Unfortunately I cannot tell you what sort of objects they used, as I do not know. All I know is that the objects are physical in some way and they are placing them on land they do not own. Unfortunately many owners of parcels on mainland regions leave build enabled for everyone, allowing a determined griefer to abuse the simulator even when auto return is enabled.


I've completely given up on LL resolving the absolutely abysmal region performance issues. It means that the venue you've created and are paying LL lots and lots of $$$ for will be pretty much unusable 50%-90% of the time now. Of course, my costs haven't gone down any. The problem has been getting worse for years, although there was a sweet spot about 5 years ago where things actually worked reasonably well from a sim performance standpoint.... The Mono Avatar TP in sim freeze bug had been fixed (it's back now of course) and while there were still lots of issues, Performance really wasn't one of them. Now performance is the biggest problem that basically makes it impossible to have a content rich active venue (which means a fair amount of scripts.) Forget pathfinding completely, with the current performance levels, that technology is Dead. Vehicles too. Try just driving around mainland roads - a frustrating experience that will have you aborting that effort rather quickly (it's not just the sim crossings...)

But SANSAR!

All development costs money, and good developers aren't cheap. LL has some great developers on staff, but when 95% of your focus is on new features (BOM, Animesh, Sansar, etc.) and pretty much nothing on stability, reliability and performance, this is where you end up. These performance issues (and sim crashes to go with them) should be the A#1 top priority due to their impact on users. I'm not saying that things like BOM and Animesh aren't cool or needed, but if the cost is stability and performance, I'd forgo those features in a heartbeat.

LL - please make these performance issues a priority. It will go a long way towards making SL great again (sorry, I couldn't resist...)


Vallone was restarted about two hours ago, and it went to 95%+ scripts running, with good operation. Calleta is still having drops to 3% occasionally.

This seems to have very little to do with what's going on in the region. It acts more like some kind of bug in the sim system. If it were something in the region content, restarts wouldn't clear it up.

10 hours ago, Sharie Criss said:

All development costs money, and good developers aren't cheap. LL has some great developers on staff, but when 95% of your focus is on new features (BOM, Animesh, Sansar, etc.) and pretty much nothing on stability, reliability and performance, this is where you end up. These performance issues (and sim crashes to go with them) should be the A#1 top priority due to their impact on users. I'm not saying that things like BOM and Animesh aren't cool or needed, but if the cost is stability and performance, I'd forgo those features in a heartbeat.

We had a discussion along those lines at Server User Group today. Many people there are bothered by the bugs and performance problems, and not that excited about any planned new features. The Lindens said that the group that shows up at Server User Group is atypical of the user community. That's true, but may not be relevant. The people who come to Server User Group can read the statistics windows, look at JIRAs, and are very familiar with the system. But they're mostly seeing the same problems as the new users who post "this game sux" on gamer forums - slow performance, visible artifacts in the graphics, and immersion-breaking teleport and region crossing fails. That stuff just doesn't happen in modern games.

What's in the pipeline, server side, besides "last names"? Moving to "the cloud", of course. 2019 got animesh, bakes on mesh, and EEP. Not too bad. In terms of advanced users asking for new features, mostly what comes up in the meetings is little stuff - more info available via llGetObjectDetails, a few more avatar-type features for animesh such as resizing sliders and bakes on mesh. 

There's some stuff going on viewer side, with two new graphics Lindens hired. But that's viewer-side, pretty much independent of server side. Project Arctan has server-side components, but it's mostly about what to draw when and at what level of detail and resolution, which is viewer side.

The devs are doing a good job for the resources they have. They just don't have enough resources for this very large system.


I see several "low LOD objects" in Calleta when the Time Dilation dives and Scripts Run drops to 4% or so. Somebody should investigate that. It's likely the cause, or possibly an effect. Usually when I see low LOD objects or pinned objects the region is performing very poorly. If I can find the things and remove them then the performance normalizes immediately.


Was trying to get a train, using VRC scripts, through Calleta at about 23:00 UTC, and the lag was huge. Sim crossings in and out of Sweetbay are pretty bad for trains but single vehicles are tolerable. Calleta-Cecropia is pretty rough, so is Calleta-Oculea, but any sim crossing involving Sweetbay is far worse.

I've never seen a report of this Low LOD thing in my viewer. I try and make fairly efficient meshes, keeping LI and Download Weight down, but I have seen signs of sluggish handling of the higher LOD versions of a mesh, things like the slow reappearance of a vehicle after a sim crossing. I'm using the current Firestorm viewer under Linux, so there seems sod all point in trying to tell the Lindens. Are these Low LOD reports something that's only in the SL viewer?


Look in the statistics "floater" for Simulator, Physics Details, Pinned Objects, Low LOD Objects

3 minutes ago, arabellajones said:

Are these Low LOD reports something that's only in the SL viewer?

In the Statistics Bar (any viewer) expand Simulator > Physics Details to see Pinned Objects, Low LOD Objects and Memory Allocated.

I assume these are only really relevant when Physics Time is a significant contributor to total frame time. That often obtains in cases of griefer-deposited junk, but of course there are normal land-owner uses for physics too (although it always weighs on sim performance).


Pinned and low LOD objects are counts of objects that have driven the physics engine to use way over a control threshold of time.  I am in the region currently, watching the number of objects in the region increase and decrease continuously.  Something is going on.  It's not my haunt so I am not familiar.  Yes, there are tracks and roads, waterways and aircraft.  Simulator probably should not be affected this much by the things going on but I also see many parcels are allowing everyone to build so all bets are off when looking for consistent behavior.  I have no way of telling what is changing.

image.png

That's really ugly simulator performance.

Edited by Ardy Lay


Calleta has been a griefer magnet since forever, but I suspect in this case it may just be the "normal" behaviour of some temp-rezzed physical "garbage bag" objects that spend their short lives bouncing around and bumping into each other. One batch is released around 204, 160 and another around 238, 156. There are many other active physical objects in this sim, too, including a smallish Yavascript pod hub, but I'm always suspicious of multiple instances of temp-rezzed stuff that collide among themselves.

That's not to say there haven't been recent griefer problems in the vicinity. Cecropia was practically dead to the world a couple weeks ago (not physics related); a support ticket response mentioned "some nasty little additions" that were removed to restore the sim's health.

Edited by Qie Niangao


Is there nothing like the Windows Task Manager that one could use to locate the problem?
Something like a list of all rezzed objects containing a script, with some information about how much memory and CPU time each one needs?

14 hours ago, Sharie Criss said:

Of course, my costs haven't gone down any.

Unless, of course, you count the monthly charge for owning a region, which has - more than once, in fact.

39 minutes ago, Resi Pfeffer said:

Is there nothing like the Windows Task Manager that one could use to locate the problem?
Something like a list of all rezzed objects containing a script, with some information about how much memory and CPU time each one needs?

Unlike Mainland, Estates have a "Top Scripts" window to get some of that information, although it's expensive data to collect so I don't think it's updated in real time. There are (even more expensive) ways to gather comparable information, piecemeal but in real time, on both Mainland and Estates, using other scripts.

But the problems described in this thread are mostly not caused by script time; in fact, scripts are usually starved of time because the sim is so busy doing other, higher priority stuff (such as Physics), so the Scripts Run percentage drops.

There's another problem, too, where it's the process of scheduling scripts for execution that seems to add unreasonable load to the sim. The Lab is definitely interested in that, and has been trying to improve script efficiency other ways, too, but the situations described in this thread appear (to me) to be more fundamental, with scripts only the canary in a very toxic coal mine.

Edited by Qie Niangao


From past experience with an island, Top Scripts and Top Colliders each have to be manually refreshed, so trying to catch the fluctuating conditions reported in this thread would be tricky. I left the island before things like pathfinding and animesh grew in popularity, so I don't know what additions have been made to the estate managers' tools for them. Mainland of course doesn't have such services for parcel owners.

The object owner section of the Objects tab in About Land used to show temp-on-rez items though, so several refreshes ought to be able to point to things with strange ownership as things worth querying.

