Jump to content

Missing inventory (repetitively) in latest Firestorm


Rick Nightingale
 Share

You are about to reply to a thread that has been inactive for 372 days.

Please take a moment to consider if this thread is worth bumping.

Recommended Posts

To be fair, multiple too quick relogs have always caused (100% reproducible for me) all kinds of issues, whether this is actually cache corruption i never really checked but it would make sense given how the Viewer works and what Beq explained how the cache functions. This is nothing new in the performance Viewer, its just more noticeable because the shutdown process is a lot faster now. Though even before the performance Viewer i noticed that even shortly after the Viewer window got destroyed it was still doing something for a second or two.

What i find even more interesting is that running multiple instances of the same Viewer does not seem to have any adverse effects like this whereas a quick relog does. You'd think attempting to use the cache with two viewer instance at the same time ,preventing the first instance to close it properly, should in theory cause issues. Either way i hope this gets fixed, i'm seeing quite a lot of inventory related issues lately pop up in user reports.

Link to comment
Share on other sites

@NiranV Dean The inventory cache files in question aren't shared; there is an individual cache file for each account so there's no clash there from multiple instances*. A quick relog of the same account, on the other hand...

I'm 95% certain that I have had cache loss from quick relogs during all of this, but haven't been able to reproduce that over the last few days. From watching things, my theory is that my PC is, overall, just slow enough to take long enough to start the viewer up again and get to the point of logging in. That gives the cache writing from last time just enough time to finish on its fast SSD. I've watched the filesystem and seen the gzip file get finished just at the point of the (reloaded) viewer being ready to log in.

I do sometimes have a lot running on that PC though, so perhaps something slowing filesystem access down a little while I'm relogging swaps that fortunate timing around. The next viewer logs in before the cache is written from last time and causes a problem as access to the (unfinished) file clashes. All that's just vague guessing though since I've never looked at the code.

*Edit: I wonder what happens if I run a second viewer instance, and log in the same account on it... I know the first login is then auto-logged off by the server, but I wonder if the cache files in that case get messed up. Academic musings... I'm sure it will all come out in the wash now it's being looked into properly.

Edited by Rick Daylight
Link to comment
Share on other sites

19 hours ago, Beq Janus said:

There is nothing that I can see that suggests that is how it is meant to behave at all. As I said in the response above, all the of cache writing should be happening on the main thread and there is is parallel activity that should cause early termination

Ironically, I think that @Jaylinbridgesyou were more correct than I initially thought although I am still digging. The only way I can repro is to do the aggressive reboot and that would appear to prove the "viewer carries on after shutdown" conclusion that a number of you had come to. There have been a lot of changes to the threading in recent times and in particular a lot more stuff has been moved into the the threadpools. When closing the viewer you need to make sure that all the threads are terminated before you release shared resources. These are therefore triggered upon the close signal. This is what seems to be closing down the visible traces of the viewer leaving the main thread still running. On my own system I struggled to get this to repeat so I added a slowdown into the cache writer and it is more obvious now. 

It is worth noting that it is not just the inventory cache being corrupted. There are a number of other more minor things that are persisted after the cache and these will not happen when the rapid OS shutdown occurs either. Their effects are more subtle.

There are a number of things that need to happen now and I am exploring them.

1) The sequence of shutdown needs to be adjusted so that we have some visible indication that the viewer is running.

2) understanding why the OS is not flagging the process as busy when shutting down

3) make the writing and recovery more robust.

I am sure the lab will be looking into this too now that we have proven the reproducibility on the parent viewer.

 

  • Like 6
  • Thanks 2
Link to comment
Share on other sites

20 hours ago, Beq Janus said:

2) understanding why the OS is not flagging the process as busy when shutting down

If there is outstanding IO for a thread that is orderly shut down, the OS will commit those transactions before the thread itself is finally terminated. I guess most of that activity actually handles in the kernel so it will only show as kernel activity, while the process ID that created the thread is gone, or at least not visible in user space.

If the thread is abruptly terminated, then there will be write inconsistencies almost guaranteed and file corruption.  

  • Thanks 1
Link to comment
Share on other sites

Thank you @Beq Janus for the super-quick work on the fix in Firestorm for this issue. Sounds like the viewer shutdown process is on the 'interesting times' side of things, as in the saying. Hopefully LL will take on board the suggestion of a more robust approach in the future.

Those interested can read about it here: https://jira.secondlife.com/browse/BUG-232971

Looking forward to the next FS version :)

 

  • Like 2
  • Thanks 1
Link to comment
Share on other sites

Just returning here to update where we are. 

I have recently pushed a change to Firestorm that resolves the most pressing issue, which is the inventory writing and shutdown. to my mind it is not really the best solution but the correct option is quite a lot more complicated and I'll review that in due course, perhaps with a view to doing something in FS or working with LL to get a more maintainable shutdown sequence. 

For now at least there will now be a slightly longer pause at exit, the main window will not close until the inventory file has been written and zipped. This should resolve the immediate issue. Please note that if you have had any form of inventory loss in recent releases then I would recommend a one time cache clear once you get the next release to be sure that everything has been pulled in. 

I'll put more detail into the Jira.

I am not a windows expert by any means, but from this investigation it seems that the reason that windows will allow the OS to shutdown while the viewer is still running is because the main window has been closed. As far as I have been able to establish, windows will give you the "the following processes are preventing shutdown" prompt ONLY if there is an active window. It does not, at least by default, block if a non-windowed process is active. This would explain the situation that we have seen. 

Why is there no window? because the main window servicing was moved to a dedicated thread and that thread is allowed to terminate almost immediately upon the receipt of the quit request.

As @NiranV Deannoted earlier, there has always been, and arguably still is, a prospect of inventory corruption upon rapid restart. The fix I have committed makes this far less likely to occur with the OS shutdown scenario, and also reduces the likelihood of the rapid restart so long as the user waits for the previous instance to close the window. There is undoubtedly still a race condition if you start one instance up while shutting down another.

 

  • Like 1
  • Thanks 4
Link to comment
Share on other sites

  • 2 weeks later...

Hello,

I am experiencing similar persistent inventory loss as I posted here: https://community.secondlife.com/forums/topic/493981-persistent-inventory-loss/

From what I get you guess that the issue is caused by not waiting long enough after shutting SL off. But how were you able to get your inventory back in the first place? In my case, the inventory won't return no matter what I tried (The details of my attempts are in my post). I don't understand about the Jira talk if the answer is there :')

Thank you

Link to comment
Share on other sites

3 hours ago, peggy215 said:

I am experiencing similar persistent inventory loss as I posted

Hi Peggy,

It does sound as if you are falling victim to the "quick shutdown" problem but there are many reasons why this can happen. A very comprehensive guide is to be found in Firestorm's notes on missing Inventory: -

https://wiki.firestormviewer.org/fs_missing_inventory?s[]=inventory&s[]=loss

The basic "get my inventory to fully reload again" procedure is to simply log into a quiet region* somewhere, having cleared your inventory first. It should fully reload - just be patient while it does!

You didn't say what viewer you are using but, if it is not the standard LL viewer, and the above procedure does not work, try logging in similarly with the standard LL viewer instead.

 

* Any Linden water region will usually do; the notes referenced above suggest a few. HATTON used to be a favourite and still works fine, last time I tried it.

PS

If the search-term highlighting still shows on Firestorm's guide, simply click the magnifying-glass symbol at the Search box near the top. That will clear the search I used to find that document and the highlighting will disappear.

Edited by Odaks
Afterthought
Link to comment
Share on other sites

  • 1 month later...

Coming back to this thread after what feels like a lifetime.

The forthcoming Firestorm will no longer be prone to this issue. The basic fix was simple enough but it turned out to be a merry dance that consumed most of the Christmas period.

The basic issue was as I mentioned in my 1st December post. However that fix itself uncovered a veritable nest of vipers that resulted in the final fix being somewhat different (in some ways simpler).

To recap, the issue was that the new "fast shutdown" was in fact snake oil, the speedy closing of the main window was I think not really an intended result, but was perhaps seen as a nice side-effect, of moving many of the ancillary functions of the viewer into separate threads of execution (allowing the viewer main thread to spend more time drawing stuff). However, intended or not, the vanishing of the window just hid the fact that the viewer was still busy scurrying around in the background writing the caches out to disk and cleaning up. In Windows, the "premature shutdown" defence only applies to processes that have a UI and thus having killed the UI the operating system consider it fair game to kill the viewer and this was resulting in corrupted inventory files. Furthermore, the lack of a UI also signalled to us, human(-ish) operators, that we were free to restart another instance and that could also trigger issues.

The original "fix" resolved the inventory issue, but led to a number of people see the viewer hanging during the logout process. Not being able to reproduce this problem I relied on the valuable and patient assistance of @Rick Daylightand a couple of others to help test various iterations of fixes and debug builds as I tried to locate the hanging. 

This is now complete, the fix overall is underwhelming and simple compared to the journey taken to find it! But the result is that we should now find that inventory caching on shutdown is far more reliable.

There is a downside, the so-called "fast shutdown" myth is no more. The viewer will now pause as it always used to while it writes out the various caches. There may be scope to speed this up in future, but for now you will see the extended pause at shutdown, but remember that this is telling you that the viewer is still busy.

Addendum - for those technically minded readers

The hangs I had to fight were as a result of the shutdown order of the many event queues used and the fact that they were shared between threads that had potentially different life spans. In particular, the windows thread would continue to read and produce events (mouse movements, clicks etc) that get forward to other queues to be processed by parts of the viewer (including the main thread) when those threads stop reading then the queues fill up, the windows thread then blocks waiting for space to be made, space that will never come. Meanwhile the main thread is waiting for the windows thread to finish so that it can exit, resulting in a classic deadlock.

 

  • Like 4
  • Thanks 5
Link to comment
Share on other sites

57 minutes ago, Beq Janus said:

The hangs I had to fight were as a result of the shutdown order of the many event queues used and the fact that they were shared between threads that had potentially different life spans. In particular, the windows thread would continue to read and produce events (mouse movements, clicks etc) that get forward to other queues to be processed by parts of the viewer (including the main thread) when those threads stop reading then the queues fill up, the windows thread then blocks waiting for space to be made, space that will never come. Meanwhile the main thread is waiting for the windows thread to finish so that it can exit, resulting in a classic deadlock

This, and other issues (jerky mouse, bad window position saving, and several others) are the reasons why I kept the pre-performance-viewer code for LLWindowWin32, that is, I did not push the Windows events processing to a thread (which does not even bring any visible speed benefit at the frame rates level)...

You might also want to have a look at the modifications to the thread pool code I made for the Cool VL Viewer, because there are also many shutdown issues there... and also issues with multiple-threads pool startup.

Edited by Henri Beauchamp
  • Like 4
  • Thanks 2
Link to comment
Share on other sites

I'm quite happy to see the 'slow' shutdown return rather than having to guess when it's safe to restart. As always I prefer to know what is really happening. If it could be made genuinely faster, great, but it is what it is right now.

I've still not had a freeze (or inventory loss) since the final fix was implemented - good work Beq!

  • Like 5
Link to comment
Share on other sites

  • Lindens
18 hours ago, Beq Janus said:

The hangs I had to fight were as a result of the shutdown order of the many event queues used and the fact that they were shared between threads that had potentially different life spans. In particular, the windows thread would continue to read and produce events (mouse movements, clicks etc) that get forward to other queues to be processed by parts of the viewer (including the main thread) when those threads stop reading then the queues fill up, the windows thread then blocks waiting for space to be made, space that will never come. Meanwhile the main thread is waiting for the windows thread to finish so that it can exit, resulting in a classic deadlock.

What?!  We'd *never* write code like that!  🤣

  • Thanks 1
  • Haha 5
Link to comment
Share on other sites

I have been reading this thread and am left wondering how long is the time between the window closing and the last viewer related file being closed?  I tried several times to "break" mine but did not accomplish this.

Link to comment
Share on other sites

Just now, Ardy Lay said:

I have been reading this thread and am left wondering how long is the time between the window closing and the last viewer related file being closed?  I tried several times to "break" mine but did not accomplish this.

It varies by avatar. If you enable the "Inventory" debug logging (through logcontrol.xml) you will see the start and end of the writes. You need to zap it in between those and keep in mind that there's a good chance you won't notice. Take a look at the number of items written to the cache on a good exit. Then on a failed, it will be smaller, how much smaller depends on how badly it got interupted and whether you "care" about the items "missing" is entirely chance.

The original bug manifested if the shutdown (or the restart) occurs while the <your avatar UUID>.inv.gz is being constructed, this takes a few forms as we write to a plain text version, then pass it off to be zipped. When the viewer starts it tries to carry on as best it can and can end up with a truncated inventory cache. Times will vary, the larger your inventory, the slower your disk, CPU, etc.

  • Like 1
Link to comment
Share on other sites

  • 1 month later...

Late to the party, but just to add my two cents worth - the only thing I've found that works for me (not to dismiss anyone's helpful suggestions in here), is uninstalling FS and doing a clean install.

LL Viewer works just fine and so does BD - but FS?  Not a chance.

Clean install gives me a few weeks - maximum - of normal use before it starts its nonsense all over again.

I won't bother putting in a ticket since the last time I did that someone 'moved' my ticket - and it was never addressed or seen again - so I don't find that avenue particularly helpful. 

  • Like 2
Link to comment
Share on other sites

  • 1 month later...
16 hours ago, theo Hilltop said:

Hey all.

I've got the same issue now as well for days..

Whatever I do, the first login works, then the second one just get me an inventory bug.

Anyone got an idea?

Cheers,
Theo

Right click on your empty folder and reload that folder, it will appear. This happened to me on a huge batch of folders, and worked after I did this.

Link to comment
Share on other sites

You are about to reply to a thread that has been inactive for 372 days.

Please take a moment to consider if this thread is worth bumping.

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
 Share

×
×
  • Create New...