Beq Janus

Resident
Everything posted by Beq Janus

  1. I would be more strident on this. That version of the Second Life viewer should not be run, at all, ever. It is sad that it remains listed and available for download on the Linden Lab website, as it gives a very poor impression of Second Life at a time when more Linux users are potentially appearing (I see a small but growing number running Firestorm on Steam Decks). I would consider it an unsafe download given the ancient SSL and CEF libraries, not to mention the fact that it lacks support for a number of core features that are central to a modern Second Life experience.

     Just for those of you who think that Linux is no longer listed on the downloads page... that's because you are not running Linux. This is what you would see if you visited Secondlife.com on a Steam Deck or any other Linux-based device, looking to join up.

     It would be worth shouting out @Vir Linden, Signal Linden, @Monty Linden and @Alexa Linden. While I accept that the effort to restore upstream (i.e. Linden Lab) support for Linux would probably not see a clear commercial return (though I think there are indirect returns on the investment, through automation potential, etc.), it would be cheap (practically free, really) to at the very least have the downloads page changed to reflect the realistic case: while LL do not directly support Linux, it is a platform that they recognise as being used by their customers, and Linux users are fully supported by a number of actively developed TPVs such as Firestorm, CoolVL, Alchemy, etc. The present state simply misleads these users into downloading a decrepit, ancient viewer that then breaks and actively discourages them. Another lost customer that did not need to be lost.
  2. Thanks @arton Rotaru. Put simply, the whole thing is a bit of a lottery. It has been interesting over the last month or so. I came back from having been waylaid by RL and the need to earn money 🙂 over the summer to find that there were a number of different alpha rendering issues that emerged after the performance updates. I fixed some in FS and the Lab fixed the same ones and more in their Maint-N update, which in turn broke other things. So I've spent a while dancing circles around this stuff and playing bug whack-a-mole, because the "correct behaviour" has never been defined; people find a thing that they think is a cool hack to make sure that their pet creation renders as they want, but then we innocently change some code and the house of cards comes crashing down.

     There are a range of hacks, such as the use of materials (blank or otherwise) to change the apparent render priority, the use of full bright for the same, even the use of alpha mask cutoff when alpha blend is in use, all being treated as cast-iron rules, when in reality they are pure happenstance, a side-effect of the order in which the code runs. This "order" may be arbitrary and thus can change because we alter the way we store something, or just change the algorithm to make things faster; who knows, simply changing the compiler could affect some of these nuances.

     The attachment point agreement is step one of the route out of the dark ages. I hope by the end of the weekend to have a concrete solution as a proof of concept that will go a lot further, and I will seek support from the Lab to make that a documented and stable feature for the foreseeable future. By this I mean that if your content follows the rules and breaks, then we will fix the viewer; if you don't follow the rules, then you'll need to fix the content. Moreover, if and when the day comes that rendering upgrades and tech changes mean that some part of the rules can no longer be supported, we viewer devs (or more correctly for such changes, LL) will at least be able to broadcast this fact and make a breaking change in the full knowledge of what it is breaking. As with the attachment points, the plan is to get a working proof of concept, test it out, work out what the impact is on current content and then ask LL to adopt it; "Beq says so" should never be acceptable (or trusted) 🙂 but "LL says so" is something you should be able to hang your hat on.

     TL;DR: at the moment, the ordering within a linkset and within the faces of a mesh/prim is not fully guaranteed, and even where an ordering exists or can be deduced, it is not underwritten to stay that way. I hope to get that resolved soon(tm).
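     To make the idea of a documented ordering concrete, here is a minimal sketch (my own illustration, not the viewer's actual data structures nor the rule set we will propose) of what a stable, written-down sort key for the faces in a linkset could look like - link number, then face index - so that render order no longer depends on incidental storage order:

        #include <algorithm>
        #include <cstdio>
        #include <vector>

        // Hypothetical illustration only: a stand-in for "one face of one link in a linkset".
        struct FaceRef {
            int link_number;   // 1 = root, 2..n = children, as reported by the region
            int face_index;    // face/material index within the prim or mesh
            bool alpha_blend;  // blended faces are the ones whose order is visible on screen
        };

        int main() {
            std::vector<FaceRef> faces = {
                {2, 3, true}, {1, 0, true}, {2, 0, false}, {1, 2, true}
            };

            // A documented, stable ordering: by link number, then face index.
            // The point is not this particular key, but that the key is defined
            // somewhere and survives refactoring of the underlying containers.
            std::stable_sort(faces.begin(), faces.end(),
                [](const FaceRef& a, const FaceRef& b) {
                    if (a.link_number != b.link_number) return a.link_number < b.link_number;
                    return a.face_index < b.face_index;
                });

            for (const auto& f : faces) {
                std::printf("link %d face %d %s\n", f.link_number, f.face_index,
                            f.alpha_blend ? "(blended)" : "");
            }
            return 0;
        }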
  3. I bought into Google Stadia as a founder in the hope that it would be open enough to allow us to explore FS on there. Sadly, it was not made available in a way that I could justify trying to support. A lot of people found the old SL Go useful back in the days before Sony bought it and killed the service. With Stadia now closing, I am looking at GeForce Now as an alternative. It is not high on my list but definitely of interest.
  4. I would double check for inventory corruption. There's another thread here discussing this issue and it is plausible that some of what you are seeing could be the result of a previous bad shutdown. Hopefully we'll have a new set of betas real soon.
  5. Just returning here to update where we are. I have recently pushed a change to Firestorm that resolves the most pressing issue, which is the inventory writing at shutdown. To my mind it is not really the best solution; the correct option is quite a lot more complicated and I'll review that in due course, perhaps with a view to doing something in FS or working with LL to get a more maintainable shutdown sequence. For now at least there will be a slightly longer pause at exit: the main window will not close until the inventory file has been written and zipped. This should resolve the immediate issue. Please note that if you have had any form of inventory loss in recent releases, then I would recommend a one-time cache clear once you get the next release, to be sure that everything has been pulled in. I'll put more detail into the Jira.

     I am not a Windows expert by any means, but from this investigation it seems that the reason Windows will allow the OS to shut down while the viewer is still running is that the main window has been closed. As far as I have been able to establish, Windows will give you the "the following processes are preventing shutdown" prompt ONLY if there is an active window. It does not, at least by default, block if a non-windowed process is active. This would explain the situation that we have seen. Why is there no window? Because the main window servicing was moved to a dedicated thread, and that thread is allowed to terminate almost immediately upon receipt of the quit request.

     As @NiranV Dean noted earlier, there has always been, and arguably still is, a prospect of inventory corruption upon rapid restart. The fix I have committed makes this far less likely to occur with the OS shutdown scenario, and also reduces the likelihood of the rapid restart problem so long as the user waits for the previous instance to close the window. There is undoubtedly still a race condition if you start one instance up while shutting down another.
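     For the curious, the shape of the fix is roughly the sketch below - a minimal illustration with made-up function names rather than the actual Firestorm code - showing the write and the gzip completing before the main window is torn down:

        #include <cstdio>

        // Hypothetical placeholders for the real shutdown steps.
        bool writeInventoryCache() { std::puts("writing inventory cache (.inv.llsd)"); return true; }
        bool gzipInventoryCache()  { std::puts("gzipping cache to .inv.llsd.gz");      return true; }
        void stopWorkerThreads()   { std::puts("stopping worker threads");             }
        void destroyMainWindow()   { std::puts("closing main window");                 }

        int main() {
            // Order matters: keep the window alive until the cache is safely on disk,
            // so the OS still sees an active window and will not silently kill us
            // mid-write during a shutdown or restart.
            stopWorkerThreads();
            if (writeInventoryCache()) {
                gzipInventoryCache();
            }
            destroyMainWindow();   // only now is it safe for the process to disappear
            return 0;
        }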
  6. Ironically, I think that @Jaylinbridges you were more correct than I initially thought, although I am still digging. The only way I can repro is to do the aggressive reboot, and that would appear to prove the "viewer carries on after shutdown" conclusion that a number of you had come to. There have been a lot of changes to the threading in recent times and in particular a lot more stuff has been moved into the thread pools. When closing the viewer you need to make sure that all the threads are terminated before you release shared resources. These are therefore triggered upon the close signal. This is what seems to be closing down the visible traces of the viewer, leaving the main thread still running. On my own system I struggled to get this to repeat, so I added a slowdown into the cache writer and it is more obvious now.

     It is worth noting that it is not just the inventory cache being corrupted. There are a number of other, more minor things that are persisted after the cache, and these will not happen when the rapid OS shutdown occurs either. Their effects are more subtle.

     There are a number of things that need to happen now and I am exploring them. 1) The sequence of shutdown needs to be adjusted so that we have some visible indication that the viewer is running. 2) We need to understand why the OS is not flagging the process as busy when shutting down. 3) The writing and recovery need to be made more robust. I am sure the Lab will be looking into this too, now that we have proven the reproducibility on the parent viewer.
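     A rough illustration of the "terminate the threads before you release shared resources" point; this is a sketch with invented names, not the viewer's actual thread pool API:

        #include <atomic>
        #include <chrono>
        #include <cstdio>
        #include <thread>
        #include <vector>

        std::atomic<bool> g_running{true};
        std::vector<int> g_shared_cache;   // stands in for the caches the workers touch

        void worker(int id) {
            while (g_running.load()) {
                // pretend to do background cache work
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
            std::printf("worker %d stopped\n", id);
        }

        int main() {
            std::vector<std::thread> pool;
            for (int i = 0; i < 4; ++i) pool.emplace_back(worker, i);

            // Quit requested: signal the workers and *join* them before the shared
            // state goes away. Releasing g_shared_cache while a worker is still
            // running is exactly the kind of race that corrupts data on exit.
            g_running = false;
            for (auto& t : pool) t.join();

            g_shared_cache.clear();   // safe: no worker can touch it now
            std::puts("clean shutdown");
            return 0;
        }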
  7. There is nothing that I can see that suggests that is how it is meant to behave at all. As I said in the response above, all of the cache writing should be happening on the main thread and there is no parallel activity that should cause early termination. It is the case that a lot of non-critical (where critical means rendering and the main game loop) activities have been moved to separate threads as part of the performance changes, so it could be that something is tripping over itself, and hence my belief that somewhere something is causing the viewer to crash and burn without any trace. But as I said, this is mostly speculation right now.
  8. So here's some very early feedback. The logs and the notes are excellent Rick, thank you. The worrying thing is that the only conclusion I can draw at the moment is that the viewer is crashing on exit. Needless to say, I need to look deeper into this.

     Using your log A, we can see the lines at the start and then the two lines right at the end. Those two lines are the very last in the log and, to be blunt, that is not possible under normal operation. Those lines are written out by the inventory model when it is told to cache itself by the "disconnectViewer" function. It is not in a background thread. It should not have any route for a parallel exit (where some other thread decides to shut the process), and in fact the viewer window will not shut until this completes. To add to this, there is no obvious way for the saveToDisk function that does that logging to complete without logging at least one other line, either a success or a fail. All paths through the code are covered by an error handler. Yet we see no errors, no logging at all. We can be pretty sure that the viewer does stop at that point, because immediately after the saveToDisk() we call gzip on the file, and that in turn has logging associated with it. "But Beq," I hear you cry, "if the logging is killed then that would explain it." It could... but next time we start we find this mess... I've not looked into those in depth yet, but the attempts to recover here seem to be tripping over themselves. The fact that the .t file is there at all implies that the gzip did not complete, which means that it was not JUST the logging that stopped.

     I need to try to reproduce this in a debugger and work out what is going on. But that'll have to wait as I'm heading off to bed. There is a chance that tomorrow or the next day I completely recant all this when I find out that I overlooked something very obvious, but right now it does look like something peculiar is going on, and my best guess is that the viewer is meeting an untimely demise and not at the hands of the operating system. If I cannot get this to repro for me I might need to get you to turn on more debug logging to work out if we are exiting early and unannounced. For now though, g'night.
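     To show why the silence in the log is so suspicious, the save step is shaped roughly like this sketch (my own illustrative code, not the actual viewer source): every path out of the save logs either a success or a failure, and the gzip that follows logs too, so a clean stop at exactly that point should be impossible:

        #include <cstdio>
        #include <fstream>
        #include <stdexcept>
        #include <string>

        // Illustrative stand-in for the inventory cache save; real names differ.
        bool saveToDisk(const std::string& temp_path) {
            try {
                std::ofstream out(temp_path);
                if (!out) throw std::runtime_error("cannot open " + temp_path);
                out << "inventory contents...\n";
                std::printf("Inventory: saved cache to %s\n", temp_path.c_str());   // success path logs
                return true;
            } catch (const std::exception& e) {
                std::printf("Inventory: save FAILED: %s\n", e.what());              // failure path logs
                return false;
            }
        }

        bool gzipFile(const std::string& temp_path) {
            // The real viewer compresses the .t file here; this stub just logs.
            std::printf("Inventory: gzipping %s\n", temp_path.c_str());
            return true;
        }

        int main() {
            const std::string temp = "inventory.inv.llsd.t";   // hypothetical file name
            if (saveToDisk(temp)) gzipFile(temp);
            // If the process dies between the two calls, the .t file is left behind
            // and neither a success/failure line nor a gzip line appears in the log.
            return 0;
        }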
  9. Hi @Rick Daylight, I've just seen this (the forums told me I'd be tagged but I cannot see where). [Added later, as I seem to have forgotten to hit submit! Apparently you get notified even if the post is held for moderation, how weird.] I've skimmed most of the thread, and the Jira. If I understand correctly, you have concluded (so far) that the issue is that the OS shuts down before the writes complete. Is that a correct (if very compressed) summary?

     I am unconvinced that a premature termination is behind this; such a termination would result in a truncated zip file, which would then fail to unzip on restart, and the entire cache would be flagged as invalid because it cannot be unzipped. As you have noted, there are two stages: the writing of a temporary file (the '.t'), which is then gzipped. The premature termination can of course happen to the temp file, but that makes matters rather more complicated, because we are now looking for a situation where the temp file is partially written due to some other event, but the gzip is then triggered correctly. Intriguing.

     Can you try adding inventory logging to the viewer and then attaching a log file for a failed run to a Jira, please? In your settings folder you should find a logcontrol.xml file. Open that up in your code editor of choice. Towards the end of the file you should see a section that looks like this:

        <array>
            <!-- sample entry for debugging specific items
            <string>AnimatedObjects</string>
            <string>Avatar</string>
            <string>Inventory</string>
            <string>SceneLoadTiming</string>
            <string>Voice</string>
            <string>Avatar</string>
            -->
            <string>import</string>
            <string>export</string>
            <string>Outfit</string>
        </array>

     Simply move the line <string>Inventory</string> down so that it is outside of the comment block; place it between export and Outfit, for example. You'll want to revert that later as it will generate a lot more logging.

     This is normal for support tickets, as they are kept private because they can have more personal info in them. It's a bit of a fuzzy line really, but that is how things are set up at the moment.
  10. Could just be the normal "incremental" loading, hard to say. My brief glance at the code (above) didn't highlight any obvious use of the base and detail textures. Normal maps would be good, to a point; resolution becomes an issue, but we know it works given how many of us resort to mesh terrain now to counteract the built-in terrain sadness. A displacement map would be even better (with the same resolution caveats), and given the current plans on PBR, a couple of extra terrain maps are the least of my worries.
  11. Probably worth an OpenSim excursion one day to have a look at this. I'm focussed on some other things right now, but that's intriguing as I don't see anything in the code that screams out for that kind of error range.
  12. Nothing very useful to add on this. I tend to work by trial and error; I wrote my own Blender addon that uses an orthographic projection to create a height map from a model. That worked well enough for my needs (building a land that correlated to the scene I built for an FF region). I probably then just messed with the sliders until I got what I wanted in the textures.

     I've never looked in depth at the terrain code etc., so any comments here are based on a cursory glance through. The shaders seem to have nothing random in them, nor do they seem to have any FS-specific changes, at least not in the last 7 or more years that I looked back through. The surface patch code (which is responsible for generating the mesh for the terrain) does have FS (or at the very least TPV) specific changes, as we deal with the nightmare of VarRegions on OpenSim and the multitude of evils that fall from that. It is always possible that somewhere in history those changes have led to a variance in SL, but I have nothing but conjecture on that. Inside the surface patch creation code there is an element of noise being added. I've not looked closely enough to find out where and why, but that could explain why there are small differences from machine to machine with the same code. That noise has been there since the start of the revision history (around 16 years). (VarRegions, for those that do not know about OpenSim, is a feature that allows OpenSim estates to have regions of varying sizes, not just 256m x 256m. A nice idea that adds layers of complexity into parts of the viewer.)

     I think the issue here is likely part of the fuzziness. "High" is the minimum possible start for texture 4; that doesn't mean it will be, though by the same measure texture 1 should be capped at 16, but depending on the granularity of the heightmap at that point I can well imagine that those might overlap in such a small sample range. You mentioned "following the documentation", which (as I am sure you are well aware) is a very unreliable path to take in SL, as documentation is at best vague notes on what might have been intended and at worst a reverse-engineered guess. The code does not really support what is said about height levels. First it takes the four corner min-height values and does a bilinear interpolation, munges a few things and then gets an "exact height" to which it applies quite a large amount of noise (which would seem to support my assertion above about the narrow range of heights being the issue):

        //
        // Choose material value by adding to the exact height a random value
        //
        vec1[0] = vec[0]*(0.2222222222f);
        vec1[1] = vec[1]*(0.2222222222f);
        vec1[2] = vec[2]*(0.2222222222f);
        twiddle = noise2(vec1)*6.5f;                  // Low freq component for large divisions
        twiddle += turbulence2(vec, 2)*slope_squared; // High frequency component
        twiddle *= noise_magnitude;

        F32 scaled_noisy_height = (height + twiddle - start_height) * F32(NUM_TEXTURES) / height_range;
  13. Complaints are fine 🙂 (esp. if delivered constructively). I'll need to collect them up at some point, so if you are inclined to raise me a Jira that would be great. Physics only exists on the server, so I cannot create local physics as such, and I cannot think of any way to achieve that. I've been suggesting the need for client-side physics in SL for a few years now, as it would be a major boon to many "game-like" behaviours, enabling us to stop avatar AOs from ghosting through items etc. and have the world be generally more solid, but right now that's not even on the drawing board as far as I am aware. What I could do, in theory, is rez the physics mesh with a blue semi-transparent material, in the same way that I draw the physics shape in the edit tools (using the "eye" icon); you'd not be able to step on it, but it would let you visualise it. Let me know if that "compromise" is useful or not and I can add it to the list.
  14. There are some logistical problems with multipart mesh, which is the main reason I've left it alone for now; I'd definitely like to add that. The UI, as I think I noted in the blog, is rather rough-edged. I hate the sub-component pulldown box, for example. One option is to have a directory-like tree structure; another is simply to treat them as the uploader does today and rez the object as a set of component meshes. Whichever we choose, there are some things to consider. For multipart meshes we need to have a surrogate for each part, and I think we'd probably expect those to behave as a linkset; it will become hard to find lost parts if we rez the initial surrogate on the ground and then spawn unconnected children from it. Linksets are essentially a server-side thing, and there are a number of "tricks" that we have to play to maintain the local mesh "illusion", notably ignoring object updates from the region for those objects we've "hijacked". The more complicated we get, the more corner cases arise, and linksets are part of that problem space.
  15. I've had one person with whom this has repeated on the default viewer. @Whirly Fizzle may have more info. That one was slightly different though, as it seemed to be triggerable at a specific location. The number of people affected seems to be a small handful and so far I've only seen it reported on FS, but with such a small number affected that does not rule things out one way or the other.
  16. Things get very questionable as we head down this path, so all this advice is very speculative. The viewer (all viewers that are built from the foundation of the LL viewer) will run the main thread at 100% (on Windows this is hidden by the awful scheduler that moves it around the CPU cores). Nowadays, in addition to this, we have many threads doing extra work fetching textures and so forth, increasing the overall CPU usage while leaving the main thread free to draw more frames per second. We will also be hitting RAM and VRAM harder, as well as your hard drive where the cache is. OBS, meanwhile, is heavily demanding of the GPU to record the screen, will have some memory pressure (to allow it to keep up with the frame rate and streaming to disk) and disk IO. The challenge is to work out where the conflict is.

     As we are running faster, it could be that OBS is not getting enough CPU... I doubt this to be honest, though not knowing the machine specs that could be an issue, just not my first assumption. OBS will probably be streaming more data now, meaning that the IO demands on your drives are higher. FS is also more demanding on its caches etc., so ensure that they are writing to separate drives and use the Windows performance tools to see if IO is spiking high. The next of the usual suspects is RAM; both OBS and the viewer will be chewing on RAM to keep the pace up. It may be RAM pressure, so check the memory usage on your machine. The final, for now, suspect has to be the GPU, as you note in your own observations. With higher frame rates comes higher GPU and VRAM usage. OBS needs its access to the GPU too, and so contention there is a real possibility. Look at the VRAM and GPU usage using a video card monitor app (I think we can even do this natively in Windows now). If VRAM is maxed then you might want to tweak the viewer "dynamic VRAM" settings to be more generous towards the OBS process; other things such as shadow quality can also have a significant impact on the throughput of the GPU.
  17. I'd be interested to see a bug report (https://jira.firestormviewer.org) from someone willing to send us their settings. I've seen a couple of people (and it does seem to be a tiny minority) reporting unusual stalling (on hardware that you'd not expect it) and it is frequently resolved by a full settings wipe. It would be interesting to work out what combination of settings (and probably hardware) is causing this unexpected behaviour as it might allow us to do something about it. On the plus side, a wipe seems to resolve it, on the other hand, not knowing what weird combination is causing this means it might come back as you set things back to your liking.
  18. I just looked at it today (I tend to be reactive to threads here and only see things when I get pinged 🙂 ) and I'd have to take time (that for RL reasons I don't have at the moment) to examine the tests in more detail to form a proper conclusion. However, we'd need to assess how many draw calls each of those decomposed into for a start because, as you know from my previous "preachings", draw calls outweigh triangles by orders of magnitude. The server-side and network aspects can be largely excluded, though they do have an effect depending on what it is you are trying to measure. There is a difference between frame rate and what, for want of a better term, I will call "scene rate", where the latter is how long it takes for all of the assets in your view to be fully downloaded and made available for rendering.

     As you note, CPU/GPU balance and bottlenecks are very context-sensitive, both to the scene being rendered and, moreover, to the hardware doing the rendering. What you should be seeing when comparing a previous generation viewer to the current performant viewers is higher GPU utilisation; no matter how powerful the GPU, it is likely to be working harder than before. Likewise, the CPU profile should have changed. Historically you'd have seen the CPU running at 100% on a single core (Windows is absolutely rubbish at conveying this and what you will see is that 100% spread across all the cores, so an 8-core CPU would show on average 12.5% utilisation - just blame the awful Windows scheduler). Nowadays you'll still see that burning core that represents the main render thread, but increasingly you'll see more utilisation on the other cores, with more work moving to other threads to run in parallel; in particular you'll see this upon arrival in a new region as we fetch and decode all the textures. In your case, the 3090 was barely awake before, and in my experimentation that should hold true for most moderately complex scenes on anything more than around a GTX 980, because the fill rate of those GPUs could easily outstrip the supply rate of draws from the CPU, and the GPU therefore waits. We are now pushing far more work per draw call, and so the GPUs are having to step up and play their part, but how much they step up will vary. I do have a TODO list task to revisit my previous experiments on mesh avatars and perhaps to extend that work, but it is unlikely to happen this side of November due to RL work commitments.
  19. I think everyone overthinks the entire LI thing. LI is a composite number that was an estimate of the impact on systems 15 years or so back. It is intended as a constraint, a brake on otherwise uncontrolled resource usage. It is not directly, scientifically linked to performance, and having spent quite some time going back and forth over alternative calculations I am far from convinced that such a measure even exists.

     When LI was implemented:
     - We had slower networks
     - Assets were sourced from the region
     - Textures were sourced from the region
     - CPUs were slower
     - GPUs were slower

     LI thus has no real accounting for client-side performance; any linkage is more incidental than intentional. LI (as we know) is the largest of the three defining factors: the streaming cost, the physics cost and the server cost. Nothing in there says "rendering cost". I would go so far as to say that in reality LI was only ever a defence mechanism for the servers.
     - streaming cost = how much effort is it going to take the server to send this to clients
     - physics cost = how much effort is it going to take the server to calculate the physics collisions
     - server cost = how much effort is it going to take the server to run the scripts and manage the objects outside of the previous two cases

     Even the LOD mechanism plays to this. The LOD algorithm factors in both scale and proximity. Why? Because on average, far away things that are small are far less likely to need to be sent (by the server) to users, thus saving effort. Every single part of this is targeted at managing the server load based on a historical server setup. These days we are typically in a different place:
     - Faster networks (higher bandwidth, less packet loss, meaning fewer retransmissions)
     - Assets are almost entirely served from the content delivery networks (in SL, not necessarily in OpenSim)
     - CPUs (on servers) are faster
     - GPUs don't actually figure in this.

     The limits on a "sim" were undoubtedly calculated based on a point-in-time average/typical capability of the machines installed in the LL data centres, taking into account all their limitations at that point in time. So the conclusion from this is that LI is not a relevant measure of performance as perceived by you, nor was it ever intended to be. Moreover, the correlation to region performance is significantly watered down for Second Life and to a lesser extent for OpenSim too. LI is therefore just a tax on rezzing things that gives them a value/cost that only remains relevant because of how land is rented and the correlation of space to "land impact".
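     As a toy illustration of the "largest of the three" point (the numbers and weighting here are invented; the real streaming, physics and server formulas are considerably more involved):

        #include <algorithm>
        #include <cmath>
        #include <cstdio>

        // Invented example weights; the actual SL formulas are more complex.
        struct ObjectCosts {
            float streaming;   // server effort to send the asset to clients
            float physics;     // server effort to simulate the collisions
            float server;      // scripts and general object management
        };

        // Land impact is driven by whichever of the three costs is largest;
        // nothing in the calculation models client-side rendering cost.
        int landImpact(const ObjectCosts& c) {
            float worst = std::max({c.streaming, c.physics, c.server});
            return static_cast<int>(std::ceil(worst));
        }

        int main() {
            ObjectCosts heavy_mesh{42.3f, 3.1f, 0.5f};    // streaming-dominated
            ObjectCosts scripted_box{0.5f, 0.5f, 12.0f};  // server-dominated
            std::printf("heavy mesh LI ~ %d\n", landImpact(heavy_mesh));
            std::printf("scripted box LI ~ %d\n", landImpact(scripted_box));
            return 0;
        }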
  20. The cap with VSync is determined by your drivers, not the viewer, so it is hard to say where it comes from; I'll see if I can dig out something. The other complication (as mentioned elsewhere), which may or may not muddy the waters for you, is G-Sync and FreeSync adaptive rates. Frankly, the whole thing is a rabbit warren. It has been great to see the improvements that people are getting and, of course, a big thank you to the LL team that worked so hard on the core improvements.
  21. I suspect it was in the brief discussion of the future problems we are storing up for ourselves in my mesh body analytics posts (this one maybe - Why crowds cause lag). Towards the end is an appendix of sorts where I briefly discuss some of the common myths and assumptions. We are now in that future space where the CPU bottleneck has been widened enough that many of us are seeing our GPUs work harder. Now you are facing overdraw issues.

     In an ideal world we would have to shade each pixel only once; that would give you a minimum frame time on your GPU, dictated by your fill rate. As the Techopedia article says, there is no accurate or standard fill rate calculation, but if, for example, your GPU had 1000 pixel pipeline cores and a 2GHz clock speed then you might be able to shade 2 billion pixels per second. Let's now say that you have a 4K screen at 3840x2160 pixels. That gives you a max theoretical frame rate on that GPU setup of ~240 fps. In reality, you rarely shade a pixel only once; various overlays and shadow passes etc. complicate matters, but these are OK as they tend to be consistent. The issues arise when your friend walks onto the screen in their obscenely over-complex mesh body, wearing their obscenely over-complex clothes spewed out of Marvelous Designer by a creator who does not care about your electricity bill, hardware health or FPS, and all of those tiny triangles are being drawn to the screen in a small 256x256 patch. There are only 64K pixels and 250K triangles... overdraw hell; your GPU is wasting time drawing and redrawing and redrawing the same pixel pointlessly. This is also why we grumble about alpha blend a lot. Alpha blend is by definition overdrawing, as it needs to use the previous pixel colour to calculate the next one. Draw calls were the bane of FPS last year; we've moved on and now triangles are back with a vengeance.
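     Putting rough numbers on the fill rate argument (using the illustrative 2 billion pixels per second figure from above, not a measurement of any particular card):

        #include <cstdio>

        int main() {
            // Illustrative figures from the discussion above.
            const double fill_rate = 2.0e9;           // pixels shaded per second (example only)
            const double width     = 3840.0;          // 4K screen
            const double height    = 2160.0;
            const double pixels    = width * height;  // ~8.3 million pixels per frame

            // Best case: every pixel shaded exactly once per frame.
            const double max_fps = fill_rate / pixels;
            std::printf("theoretical ceiling: ~%.0f fps\n", max_fps);   // ~241 fps

            // Overdraw: a 256x256 patch of screen has only 65,536 pixels; cramming
            // 250,000 tiny triangles into it guarantees pixels are shaded many times.
            const double patch_pixels = 256.0 * 256.0;
            const double triangles    = 250000.0;
            std::printf("triangles per pixel in the patch: ~%.1f\n", triangles / patch_pixels);
            return 0;
        }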
  22. With energy prices as they are here in the UK, I can embrace the FPS and be warmer 🙂
  23. These "problems" are well-known and exist across the board at least for openGL solutions, I've not checked whether other engines do the same. The implementation of VSync is an openGL feature not a viewer feature. Vsync should not be confused with FPS Limiting and most people are using it for this reason and that's the mistake that underlies pretty much this entire conversation. VSync by definition has to sync to the end/start of a frame, therefore it can only ever be some fraction of the frequency of your monitor, you cannot have a 60Hz monitor and run at a constant 40 FPS, The game/viewer/whatever has to wait for the transition in the frame in order to sync. meaning it is syncing each frame, every other frame, every third frame etc. without that, it would not be in "sync". G-Sync and Free-Sync technologies alter this equation by adjusting the monitor refresh to the frame rate, though how well that works for SL which is typically running in a windowed form I have no idea. Not everyone has G-Sync or Free-sync in any case. Read the following article for example, which suggests a number of external frame rate limiters but concludes with the statement that in general if you are lucky enough to have an in-built limiter then use it. https://www.gpumag.com/fps-limiters/ Meanwhile, a short google excursion will yield many results relating to VSync behaviour in all kinds of games Take this reddit thread for example, but many similar things exist.
  24. Without seeing the stats it is hard to be certain, but in general a shadow quality that high will cause a lot more data to be shuffled between CPU and GPU, causing far larger SwapBuffer times (this is the delay enforced by the graphics driver while it gets all its work done). With the perf changes we are now utilising the GPU a lot more heavily, so in conjunction with the extra load of high-detail shadows it does not surprise me that you'd see a degradation like that, which, as you say, would be reduced by lowering the shadow quality setting. A fair chunk of supposition on my part in this statement, but I suspect it is a reasonable bet.
  25. The frame limiter in the viewer sets a time budget for each frame. If I set it to 20 FPS then it will set my limit at 50ms. If my frame takes 30ms to draw, then it will sleep for the remaining 20ms. If your frame takes more than 50ms then it doesn't do a thing. It is a proper cap on the performance and has no impact on frames below that cap. I'm not sure what the frame limiter in the NVidia tools does, but as it cannot force the viewer to sleep, I presume it must block on the screen swap similarly to VSync but without syncing. If I am right in that assertion then it will be better than VSync, but the built-in limiter is arguably best as it is localised to the viewer; I could probably argue the other case if I wanted 🙂. VSync is just a poor choice because, as we've seen, it is not a cap at a fixed rate but can cap at various lower rates and causes jitter.
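     A minimal sketch of the in-viewer limiter behaviour described above (20 FPS gives a 50ms budget); the real code differs, but the budget-and-sleep idea is the same:

        #include <chrono>
        #include <cstdio>
        #include <thread>

        int main() {
            using clock = std::chrono::steady_clock;
            const double target_fps = 20.0;
            const auto budget = std::chrono::duration<double, std::milli>(1000.0 / target_fps); // 50ms

            for (int frame = 0; frame < 3; ++frame) {
                auto start = clock::now();

                // Stand-in for drawing the frame; pretend it took 30ms.
                std::this_thread::sleep_for(std::chrono::milliseconds(30));

                auto elapsed = std::chrono::duration<double, std::milli>(clock::now() - start);
                if (elapsed < budget) {
                    // Frame finished early: sleep off the remaining ~20ms.
                    std::this_thread::sleep_for(budget - elapsed);
                }
                // If elapsed >= budget the limiter does nothing; it never slows
                // down a frame that is already over the 50ms budget.
                std::printf("frame %d: draw %.0fms, budget %.0fms\n",
                            frame, elapsed.count(), budget.count());
            }
            return 0;
        }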