
Recent stack heap collisions


Phate Shepherd



2 hours ago, Phate Shepherd said:

Has anything changed recently as far as LSL goes? A product that has been running stable for months is now throwing a stack-heap collision on login. It does pull HTTP data from an off-world server, but the size of that data has not changed...

I had this happen a few months back after a roll, which was mysteriously rolled back and not much info was given.

The script had been running for months without issue, then repeated stack heaps after the roll, then none again after the rollback.

I had to create a debug channel listener and script pin injection setup to replace the script whenever it borked.

I'm wary that LL is screwing around with script memory or VM memory swapping/handling/priority or garbage collection protocols.

Edited by Lucia Nightfire

12 hours ago, Lucia Nightfire said:

I had this happen a few months back after a roll, which was mysteriously rolled back and not much info was given.

The script had been running for months without issue, then repeated stack heaps after the roll, then none again after the rollback.

I had to create a debug channel listener and script pin injection setup to replace the script whenever it borked.

I'm wary that LL is screwing around with script memory or VM memory swapping/handling/priority or garbage collection protocols.

I'm at a loss... it only happens on login. I hear you about using a script injector to fix dead scripts. I've done that on a mission critical object.

Is there any way to force garbage collection in a script to see true free memory? The llGetFreeMemory wiki says "amount of free memory available to the script prior to garbage collection being run." OK... I want to know what the free memory is after garbage collection; I'm not sure what the point is of knowing it beforehand. I do find it amusing that just adding an llOwnerSay free-memory line to my script stopped the stack-heap crash on login.

I guess it is time to see if llScriptProfiler is useful in this case.


1 hour ago, Phate Shepherd said:

Is there any way to force garbage collection in a script to see true free memory?

I've been using the below method for some time. Either line alone can sometimes yield no garbage collection.

llSetMemoryLimit(65536 - (llGetMemoryLimit() == 65536));
llSleep(0.03);

1 hour ago, Lucia Nightfire said:

I've been using the below method for some time. Either line alone can sometimes yield no garbage collection.

llSetMemoryLimit(65536 - (llGetMemoryLimit() == 65536));
llSleep(0.03);

Hm. I didn't know you needed to sleep.

Here's what I use:
 

//
//  pathneedmem -- need at least this much free memory. Return TRUE if tight on memory.
//
integer pathneedmem(integer minfree)
{
    integer freemem = llGetFreeMemory();                // free memory left
    if (freemem < minfree)                              // tight, but a GC might help.
    {   ////pathMsg(PATH_MSG_WARN, "Possibly low on memory. Free mem: " + (string)freemem + ". Forcing GC.");
        integer memlimit = llGetMemoryLimit();          // how much are we allowed?
        llSetMemoryLimit(memlimit-1);                   // reduce by 1 to force GC
        llSetMemoryLimit(memlimit);                     // set it back
        freemem = llGetFreeMemory();                    // get free memory left after GC, hopefully larger.
    }
    if (freemem < minfree)                              // if still too little memory
    {   ////pathMsg(PATH_MSG_WARN, "Low on memory after GC. Free mem: " + (string)freemem);
        return(TRUE);
    }
    return(FALSE);                                      // no problem
}           

The idea is not to force a GC unless you have to, because it's a slow operation. So you ask if there are minfree bytes available, and it only forces a GC if llGetFreeMemory says there's not at least minfree bytes available before GC.
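For instance, a hypothetical call site (the 4 KB threshold is arbitrary) inside whatever event does the heavy list work:

// hypothetical usage: ask for ~4 KB of headroom before rebuilding the lists
if (pathneedmem(4096))
{
    llOwnerSay("Still low on memory after GC; skipping this update.");
    return;
}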


2 hours ago, Phate Shepherd said:

I'm at a loss... it only happens on login.

This is something that has been puzzling me for a while. I would often log in and then be met with one or two error IMs from a couple of scripted objects. Adding extra diagnostics to them to send me offline reports showed, though, that the errors were happening at rare intervals and I was just getting the delayed error messages when I logged in. Are you sure this isn't the same in your case?

Is the culprit an attachment? All I can think of is that when you log in you effectively enter a region just as if you TP there when already logged in, and so there is an initial region loading as it tries to get all your scripts up and running.


9 minutes ago, Profaitchikenz Haiku said:

Is the culprit an attachment? All I can think of is that when you log in you effectively enter a region just as if you TP there when already logged in, and so there is an initial region loading as it tries to get all your scripts up and running.

Yes, it is a worn HUD. The script that is throwing the stack-heap collision does a lot of list manipulation when attached... and therefore also at login. Thing is, it never throws the error when attached from inventory. The only event handlers in it are state_entry, link_message, http_response and changed. It is the on_rez event in another script that triggers the HTTP get in the one that crashes.

I'm going to have to keep increasing the size of the HTTP response to see if I am just close to a memory limit, or if it is really a script startup issue at login.


This reminds me - on topic. This happened recently for the first time in a while:

I was actively working on a script, which had a known amount of initial memory available, and suddenly in one test sequence it had no memory. Stack-heap collision errors resulted.

I had to copy the script source to a brand new script to get it to work.

No, it was not somehow compiled in LSO (non-Mono).

Yes, I have seen this before - but only when I was revising multiple versions of a script (like this time).  It had been a few years since I saw it (and wrote posts about it somewhere here on the Forum).

Like the previous times this happened: the "original" copy of the script eventually did not have sufficient memory - on reset, on recompile, etc.

I know this does not sound like the problem y'all are describing, but ya never know!

 


2 hours ago, Phate Shepherd said:

The only event handlers in it are state_entry, link_message, http_response and changed

I wonder if you could play around with CHANGED_REGION to try and make it do the same when you log in as when you attach it from inventory? As far as I know, attach from inventory restores all the saved script details from when you detached it last, but TP-ing (and hence logging in) might not have the same access to stored last state as attach. Just guessing, but I am struggling with similar problems in other grids, where Hypergrid arrival results in attached scripts arriving in a dead state; they only resume operation when detached and re-attached.
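As a rough sketch of that idea (refresh_data() is a hypothetical stand-in for whatever the HUD normally does when attached from inventory):

changed(integer change)
{
    if (change & (CHANGED_REGION | CHANGED_TELEPORT))
    {
        // hypothetical: redo the same HTTP fetch / list rebuild that
        // attaching from inventory triggers
        refresh_data();
    }
}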


Look for large messages being sent using llMessageLinked. That function can send any sized message to any and all prims in the linkset, which means  a giant message can blow up anything accepting link_message events.

Suggestion: Make a wearable with a script that listens to link_message events and checks for big ones. Print  sender_num, num, id, and the first part of msg with llOwnerSay(). If that finds something, you've located the problem.
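A minimal sketch of such a monitor (the 512-byte threshold is arbitrary):

default
{
    link_message(integer sender_num, integer num, string msg, key id)
    {
        integer len = llStringLength(msg);
        if (len > 512)  // arbitrary "big message" threshold
        {
            llOwnerSay("link_message from " + (string)sender_num
                + " num=" + (string)num
                + " id=" + (string)id
                + " len=" + (string)len
                + " starts: " + llGetSubString(msg, 0, 63));
        }
    }
}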

Link messages ought to have some kind of filtering for length or source, but they don't.

Edit: No, can't send link messages between attachments. Has to be something in the same attachment. But I'd still look for something big being sent that way. There's no size limit.

 
Edited by animats

12 hours ago, Profaitchikenz Haiku said:

I wonder if you could play around with CHANGED_REGION to try and make it do the same when you log in as when you attach it from inventory? As far as I know, attach from inventory restores all the saved script details from when you detached it last, but TP-ing (and hence logging in) might not have the same access to stored last state as attach. Just guessing, but I am struggling with similar problems in other grids, where Hypergrid arrival results in attached scripts arriving in a dead state; they only resume operation when detached and re-attached.

That is probably a better alternative than logging out and in over and over. I can't get it to repeat on demand. I suspect it may take logging in to a region that has been empty for a long time. I thought I read that it takes a while before a region goes into a semi-sleep state, and that may contribute to script misbehavior when a region is in this state. If that is the case, then using your idea and wandering around the mainland going from empty sim to empty sim might be the closest I can get to replicating a login.

 

10 hours ago, animats said:

Look for large messages being sent using llMessageLinked. That function can send any sized message to any and all prims in the linkset, which means  a giant message can blow up anything accepting link_message events.

Suggestion: Make a wearable with a script that listens to link_message events and checks for big ones. Print  sender_num, num, id, and the first part of msg with llOwnerSay(). If that finds something, you've located the problem.

Link messages ought to have some kind of filtering for length or source, but they don't.

Edit: No, can't send link messages between attachments. Has to be something in the same attachment. But I'd still look for something big being sent that way. There's no size limit.

 

Thanks for your replies. In this case, the link messages are pretty short, under 1 KB, and it is the sender that is crashing, not the receiver.

At the moment, it is really looking like the list handling is at fault. I was waiting to make sure the received HTTP data was valid before clearing the existing list and re-populating it (not any sort of list-growing-out-of-control issue). I may have to accept that simply getting an HTTP 200 response is sufficient evidence of valid data, and clear the list before parsing the HTTP data and repopulating the global list.
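A rough sketch of that approach (gData is a hypothetical global list, not the actual script's variable):

list gData;    // hypothetical global holding the parsed data

default
{
    http_response(key req, integer status, list meta, string body)
    {
        if (status != 200) return;   // anything other than 200: keep the old data
        gData = [];                  // free the previous copy before parsing, so the
                                     // old list and the new data never coexist
        // ... parse body and repopulate gData ...
    }
}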


You may wish to add a "Content-Range" header to your llHTTPRequest to ensure that if the server for whatever reason encounters an error, it doesn't then send you a large error page (which would crash the script). Some applications will respond with a 200 status code even if they were unable to complete the request (instead sending the user to an error page).


3 hours ago, Jenna Huntsman said:

You may wish to add a "Content-Range" header to your llHTTPRequest to ensure that if the server for whatever reason encounters an error, it doesn't then send you a large error page (which would crash the script). Some applications will respond with a 200 status code even if they were unable to complete the request (instead sending the user to an error page).

I do have HTTP_BODY_MAXLENGTH set. I believe that should do it?


1 minute ago, Phate Shepherd said:

I do have HTTP_BODY_MAXLENGTH set. I believe that should do it?

In theory, yes - but sometimes, if you're requesting data from an API, it might include header data which you don't need. More often than not, people increase HTTP_BODY_MAXLENGTH to get around this, but that's a bad solution.

For example, if I was requesting weather, the API might respond:

<?xml version="1.0" encoding="UTF-8"?>
<header>
  <server>Nginx</server>
  <date>curDate</date>
  <shard>shardID</shard>
  <requestUUID>reqUid</requestUUID>
</header>
<result>Weather in Vancouver - Sunny, 22degrees, Wind 7kph NE</result>

The only thing I actually need from that result is the content within <result>, so I could use Content-Range to tell the server to send only the last x bytes of the page, which contain <result>.
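A hedged sketch of that idea (the URL is made up; in standard HTTP the request-side header for this is "Range", with Content-Range being its response counterpart, and whether the simulator passes it through HTTP_CUSTOM_HEADER should be verified):

key gReq;

default
{
    state_entry()
    {
        gReq = llHTTPRequest("https://api.example.com/weather?city=Vancouver",
            [HTTP_METHOD, "GET",
             HTTP_CUSTOM_HEADER, "Range", "bytes=-512",  // only the last 512 bytes
             HTTP_BODY_MAXLENGTH, 2048],                 // belt-and-braces cap
            "");
    }

    http_response(key req, integer status, list meta, string body)
    {
        if (req != gReq) return;
        // a server that honours the range usually answers 206 Partial Content
        llOwnerSay("status " + (string)status + ": " + body);
    }
}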


It would be handy to know which event it's processing when it gets the stack-heap collision. My hunch would be that it happens while handling that login attach, grinding lists, before it even sees the link_message from the other script. If so, the external server logs may reveal that it crashed before even issuing the llHTTPRequest.

I don't know why this would happen only at login, though. I'm not real sold on the idle region awakening theory unless this attachment is interacting with another, unattached script that's been stuck on the idle sim and that does something strange for being starved of processing time. I'm trying to make up a story about some events getting queued during logoff that clutter up the lists to be processed on login attach, but I can't come up with a mechanism for that to happen. (It's just that login is unique for being preceded by logoff.)

On 7/12/2022 at 11:51 AM, Phate Shepherd said:

I want to know what the free memory is after garbage collection; I'm not sure what the point is of knowing it beforehand. I do find it amusing that just adding an llOwnerSay free-memory line to my script stopped the stack-heap crash on login.

For normal purposes, it's the free memory prior to GC that matters, because if the script tries to use more than that it simply crashes; it doesn't trigger a GC. I suppose it's possible that the llOwnerSay somehow triggered a GC (or perhaps that statement added just enough code memory that something else triggered the GC), but the only way I know to force a GC is with the llSetMemoryLimit gambit described above. It is expensive, but maybe not a big deal if only done on attach. It would be really interesting to know whether the problem ever arises with a GC at the start of attach processing.
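If anyone wants to try that, a sketch of the experiment: an attach handler that runs the llSetMemoryLimit gambit before any heavy work (a fragment to drop into the existing script, not a complete solution):

attach(key id)
{
    if (id != NULL_KEY)                      // attached, including at login
    {
        integer limit = llGetMemoryLimit();  // force a GC via the gambit above
        llSetMemoryLimit(limit - 1);
        llSetMemoryLimit(limit);
    }
}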


16 hours ago, Qie Niangao said:

It would be handy to know which event it's processing when it gets the stack-heap collision. My hunch would be that it happens while handling that login attach, grinding lists, before it even sees the link_message from the other script. If so, the external server logs may reveal that it crashed before even issuing the llHTTPRequest.

I don't know why this would happen only at login, though. I'm not real sold on the idle region awakening theory unless this attachment is interacting with another, unattached script that's been stuck on the idle sim and that does something strange for being starved of processing time. I'm trying to make up a story about some events getting queued during logoff that clutter up the lists to be processed on login attach, but I can't come up with a mechanism for that to happen. (It's just that login is unique for being preceded by logoff.)

For normal purposes, it's the free memory prior to GC that matters, because if the script tries to use more than that it simply crashes; it doesn't trigger a GC. I suppose it's possible that the llOwnerSay somehow triggered a GC (or perhaps that statement added just enough code memory that something else triggered the GC), but the only way I know to force a GC is with the llSetMemoryLimit gambit described above. It is expensive, but maybe not a big deal if only done on attach. It would be really interesting to know whether the problem ever arises with a GC at the start of attach processing.

OK, the problem has been solved. Thanks to all who made suggestions.

It was crashing in the http_response event handler.

In the end, it did turn out to be that the HTTP response was JUST big enough to trip up the script at login. I made the response just a tiny bit bigger, and it crashed with a stack-heap collision every time, not just at login. Why it only happened at login with the original HTTP body is still a mystery, but apparently it was just a few bytes shy of crashing anyway.

Originally, the HTTP response body was parsed into a list of lines with llParseString2List(body, ["\n"], []), and those lines were then parsed and appended to sublists. This meant that the full body text, the parsed list of lines and the resulting sublists were all in memory at the same time.

I experimented with two ways to get rid of the list of lines. The first was to search for a linefeed and just parse the body from 0 to the linefeed, then use llDeleteSubString to chop that bit off the body and repeat until the body was empty. That worked, but it was dog slow.

The final method was to implement a sliding window on the body text. Essentially a form of llSubStringIndex, but with start and end character indexes. So I just moved my window through the body line by line, and never needed more than 256 characters for the sliding-window string.
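A minimal sketch of that sliding-window idea (not the poster's actual code; parse_line() and the 256-character window are illustrative, and it assumes individual lines are shorter than the window):

parse_line(string line)
{
    // ... split the line into fields and append them to the global sublists ...
}

scan_body(string body)
{
    integer pos = 0;
    integer bodyLen = llStringLength(body);
    while (pos < bodyLen)
    {
        // only a small window of the body is ever materialized at once
        string window = llGetSubString(body, pos, pos + 255);
        integer nl = llSubStringIndex(window, "\n");
        string line = "";
        if (nl == -1)                   // tail of the body, no newline left
        {
            line = window;
            pos = bodyLen;
        }
        else
        {
            if (nl > 0) line = llGetSubString(window, 0, nl - 1);
            pos += nl + 1;              // advance past the newline
        }
        if (line != "") parse_line(line);
    }
}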

With those changes, I can now parse a full 16 KB HTTP body and still have plenty of memory left over. It isn't as fast as parsing the body into lines with llParseString2List, but the penalty is worth it.

Edited by Phate Shepherd
misppellins

7 hours ago, Quistess Alpha said:

I've seen some older scripts do weird shenanigans like

llParseString2List(body+(body=""), ["\n"], []);

or similar. Does that kind of trick still work?

These kinds of hacks still work in LSO. The list hacks are workarounds for the way the LSO VM handles memory fragmentation; they are not needed under Mono because the Mono VM handles memory differently.

A number of other hacks are listed here: https://wiki.secondlife.com/wiki/LSL_Hacks

Edited by Mollymews
M

12 minutes ago, Mollymews said:

These kinds of hacks still work in LSO. The list hacks are workarounds for the way the LSO VM handles memory fragmentation; they are not needed under Mono because the Mono VM handles memory differently.

A number of other hacks are listed here: https://wiki.secondlife.com/wiki/LSL_Hacks

I was more familiar with the "list hacks" than the "string hacks".


10 hours ago, Quistess Alpha said:

I've seen some older scripts do weird shenanigans like

llParseString2List(body+(body=""), ["\n"], []);

or similar. Does that kind of trick still work?

I might give it a try, just to see what the final freemem looks like. Would be nice to have the speed back.

Edit: Unfortunately, as Mollymews suggested, it didn't work in Mono. I would have thought that construct would have pulled off a post-parse nulling of the body, but it didn't. I guess I shouldn't be surprised, since a similar construct is used to detect which VM you are using on the LSL_Hacks page.

Edited by Phate Shepherd
