
Beq Janus

Resident
  • Posts

    608
  • Joined

  • Last visited

Everything posted by Beq Janus

  1. Indeed, for now that is the case. That does not make it right. The reason I say it is worse than useless is that it quite literally makes people do the wrong thing on many occasions. There is a separate thread in the viewer forums where I shared a snapshot of the so-called worst offenders based on their ARC alongside their actual render time on the machine I was running on. There was next to no correlation between the two. Here it is. The above image shows the Beq-hacked version of a new tool inside the viewer that the lab are planning to release. In mine I have altered the white line graphs to show the actual rendering time, but it remains (in this image) sorted by ARC; you can very clearly see that ARC has no bearing on actual FPS performance. Here is the Lab's original: the same UI, just that the white line is indicating the ARC instead. Sorry for the blur; I cropped it from a video clip I made. Notice here how by sliding the, uhm, slider we are "causing the top of the list people to be jelly dolled"? Great idea, in theory, but when you look at the image from my version, where the white lines are the true render cost, you can see that by jellydolling the top ARC people you are 1) de-rendering the innocents and 2) not really achieving the intended result. In fact, in my example, just removing that one FPS-hog person at the top would have saved more FPS than jellydolling the top 6 or maybe 7 using ARC, and with the ARC option you'd still be left with the worst offender. You might as well just pick avatars based on their hair colour. I want to be clear here though: the tool itself is reasonably well-formed, and the intention is great. I like the UI shown above, but I am now lobbying and asking LL at every opportunity not to release it until there is something better than ARC in place. I will be updating my blog with my progress on what I hope will be an improved version of this tool. 
The trouble is that ARC is not just for the jellydolls; it is used elsewhere. People such as yourself use it to decide if their products are good (which is a really great thing, we need to take care over such things), but giving people the wrong advice is depressing and counter-productive. We desperately need tools. What we don't need is more badly misleading tools.
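To make the mismatch concrete, here is a tiny Python sketch. Both columns are made-up numbers in the spirit of the screenshots described above (no real viewer data): sorting "worst first" by ARC picks a different avatar than sorting by measured render time.

```python
# Hypothetical data: ARC scores vs measured per-frame render time (ms).
# None of these numbers come from a real viewer; they only illustrate
# how an ARC-sorted list can point at the wrong avatar.
arc       = [300_000, 250_000, 200_000, 150_000, 100_000, 50_000]
render_ms = [0.4,     0.3,     0.5,     3.8,     0.4,     0.6]

worst_by_arc  = max(range(len(arc)), key=lambda i: arc[i])
worst_by_time = max(range(len(arc)), key=lambda i: render_ms[i])

print(worst_by_arc, worst_by_time)  # 0 3 - ARC would jellydoll the wrong avatar
```

Jellydolling by ARC here removes avatar 0 (cheap to draw) while leaving avatar 3, the real FPS hog, untouched.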
  2. The viewers test the "texture entry" attribute's colour transparency value. That is essentially the "Transparency" box on the build/edit floater or the texture face alpha property in lsl terms. llSetAlpha(0.0, face); So in that regard it won't affect the complexity. Complexity itself is a worse than useless value and while I realise that (too) many people still consider it the arbiter of good quality products it is ever so badly flawed so don't place too much stock in what complexity says.
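The fully-transparent drop described above can be sketched like this. The data structures and names are hypothetical (the real viewer code is C++ and far more involved); it only shows the test being applied before dispatch.

```python
# Sketch: before dispatch, skip any texture face whose colour alpha is 0,
# i.e. the equivalent of llSetAlpha(0.0, face) on that face.
FULLY_TRANSPARENT = 0.0

def faces_to_draw(faces):
    """Return only the faces that would actually be dispatched to the GPU."""
    return [f for f in faces if f["alpha"] > FULLY_TRANSPARENT]

body = [
    {"name": "torso",        "alpha": 1.0},
    {"name": "hidden_slice", "alpha": 0.0},  # alpha-cut section, hidden
    {"name": "flat_feet",    "alpha": 0.0},  # unused multi-pose feet
]
print([f["name"] for f in faces_to_draw(body)])  # ['torso']
```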
  3. Part 2 just landed - https://beqsother.blogspot.com/2021/09/find-me-some-body-to-lovebenchmarking.html It's not as easy on the eyes as the last one. Pretty much. No major overhaul is likely until we get a completely new pipeline
  4. I've been meaning to get this all written up for a while now. Here is a two part blog on why we need to ditch alpha cuts and start making alpha layers again. The first part should be consumable by most. The second is more of a deep dive into the numbers and probably less interesting to most. part 1 - https://beqsother.blogspot.com/2021/09/why-crowds-cause-lag-why-you-are-to.html part 2 - https://beqsother.blogspot.com/2021/09/find-me-some-body-to-lovebenchmarking.html
  5. Part #1 of my blog is finally up. I'll start writing up the more numbers-based second part today.
  6. 😄 For the most part the shaders in FS correspond directly to those in LL. There are some cases where we have small tweaks and changes, typically to either fix a bug/problem that has not been addressed upstream or to maintain some feature or other that FS users want us to hang on to (and in fact the vast majority of those things tend to be outside the pipeline). If there is a repeatable use case I'm happy to look at it. Raise me a Jira to chew on. Outside of hard fact I cannot investigate speculation, or at least it is not generally a valuable use of my time to try to. The biggest difference in shaders introduced with EEP is in the water rendering, where the new refraction and reflection rendering is very costly in comparison to before (it is arguably also more accurate - but whatever). If you are seeing problems since EEP you might want to look at turning down the reflections in your settings and see if that fixes it. Try opaque water too.
  7. All of the above. At the pipeline level, every texturable face is a separate "mesh"; unfortunately the word gets overloaded here, so at times it can be hard to know what we are referring to. When I talk about a mesh for the rest of this reply, I will be talking about a single texture face on an object. Every rigged mesh is processed separately. The render pipeline is constructed of multiple passes, and different types of mesh need to be in different passes; if you have alpha blend transparency you go into a different set of passes than when you have an opaque texture, for example. The implication of this is that every mesh gets dispatched to the GPU separately. It is this "dispatching" that I refer to as a drawcall(). Because the drawcall overhead is so high, the more of them you have the slower things go. I will write up a full explanation tomorrow if I can. It is 1am now, so time to sleep, not to start a deep technical dive 🙂 That Jira is actually on about something slightly different. I also happen not to believe that IMG_INVISIBLE does anything valuable at present; it is mostly used within the bake system, not more generally. However, all the viewers, certainly all the TPVs, have code that will drop fully transparent "meshes" (see above note on what I mean by mesh) before they get rendered. This does not fully eliminate the overhead, but in rendering terms it mitigates the vast majority of the cost. This is not really the case. SLink Redux uses BOM exclusively; it has no alpha slicing and yet it supports breast and buttock mods quite happily. Meshes which are enabled/disabled through transparency fall into the same category as multi-pose feet and multi-style hairs. Which is to say that so long as the unused ones are fully transparent, the worst of the performance issues are avoided. 
I want to be very clear here that should some magic wand be waved and all of a sudden these disastrous bodies were all removed and replaced with lower segment versions, we'd most likely be able to see the downside of these transparent ghosts; but right now, in a world of heinous mesh body designs, they are a very minor evil. (i.e. even if it is not being drawn, it is using RAM and takes CPU time to load and process it; right now that cost is lost in the screaming nightmare of alpha cuts) For this you'd need to be able to wear and unwear an alpha layer. This would achieve the effect; in fact you have far greater control over things using this method, as you are not limited to where a body creator has placed the cuts. The problem is that I do not think we can add alpha layers from a script at present without the use of RLV, even with an experience. This is the only change that you'd need. Going back to the general "Beq advocates removing alpha cuts": yes, Beq totally does, but that is not to say it is an absolute thing. The trick that you can see with SLink Redux or Inthium Kupra, both of which are pure BOM bodies, is that they minimise the number of meshes and as a result are far more efficient to draw. I am not at all saying "thou shalt make all things as single mesh or be forever damned"; it is "use as few meshes as possible to achieve what you need". If one mesh body is made up of 240 meshes and another is made up of 24 meshes, the 24 mesh body will draw 10x faster. It is quite literally that simple and quite literally that linear for most people on most hardware. Liz and I worked on a full set of benchmarks to test all of this and the results are pretty compelling, but they also need a lot of explaining as there's a lot of data in there. I will do my best to finish my blog post on this and link it here tomorrow or Sunday, RL permitting. I've been trying to get this out for a few months though 😞
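The 240-vs-24 arithmetic above can be written as a toy cost model. The per-call and per-triangle overhead numbers are invented purely for illustration; only the shape of the relationship matters.

```python
# Toy model: per-frame CPU cost dominated by fixed draw-call dispatch overhead.
PER_DRAW_CALL_US = 40.0    # hypothetical dispatch overhead per call, in µs
PER_TRIANGLE_US  = 0.001   # hypothetical marginal cost per triangle, in µs

def frame_cost_us(draw_calls, triangles):
    """Estimated CPU time (µs) to dispatch one avatar's meshes for a frame."""
    return draw_calls * PER_DRAW_CALL_US + triangles * PER_TRIANGLE_US

# Same triangle budget, very different mesh counts:
segmented = frame_cost_us(draw_calls=240, triangles=100_000)
bom_body  = frame_cost_us(draw_calls=24,  triangles=100_000)
print(f"{segmented:.0f} µs vs {bom_body:.0f} µs, ratio {segmented / bom_body:.1f}x")
```

Because the fixed per-call term dwarfs the per-triangle term, the ratio tracks the draw-call count almost exactly, which is the "quite literally that linear" behaviour described above.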
  8. Beq Janus

    LODs

    It's really just a hack though. It does no harm and I don't particularly worry about it; there is a slight argument that it will load faster, but that's unproven and it would certainly not be noticeable. The reason this "works" is simply that the download cost of mesh is based upon the size of each LOD in the asset file. The asset file "zip" compresses the mesh data, and by sorting it in different ways you can achieve better or worse compression. For most meshes it makes little to no difference (it might be enough to shift you from 3.7 to 3.4 and thus save 1LI in the inworld sense). A few months back I actually got around to "making stuff" instead of fiddling with code. I made a bunch of hand tools for New Babbage where I live. I wanted each of these to work well enough that you could have them on a tool bench or in a workshop scene in the background, or be carrying them in your hand, and ideally I wanted them to be 1LI (though I don't typically wed myself to these figures, as people make far too many sacrifices for the sake of 1LI as it is). The problem we have is that the current Land Impact system (deliberately) penalises triangle usage in the lower LODs harshly, especially so for small items such as these. As a general (sweeping) approximation you have at most 20 triangles in the lowest LOD before you have no chance of getting your prized 1LI. Imposters are the way forward, not the lazy crumpled triangle nonsense of the item in the OP's image, which sadly we see far too much of (and is of course the very reason I wrote the LOD viewer capability for FS in the first place). For the wrench, which is adjustable, making an imposter per segment allowed the impostered view to remain correctly adjusted (a detail nobody but me would likely ever notice!!) https://i.gyazo.com/709b8bc22060110c18ffbdcf9b7fe993.mp4 My process for making LODs is typically to start from the high, or occasionally the medium, and deliberately cut away smaller sections of mesh. 
Get used to thinking about how small they will be on screen when they are at a given LOD in order to drive your design decisions. This can be tough, and the trade-off used by @Chic Aeon of either padding the bounding box to inflate the radius (which will affect the LI) or just accepting that the item is not going to be seen at that distance is entirely valid (I swear my RL house keys have the lowest LOD zeroed, as I frequently fail to see them when they are right there on the table). Here is a small video of me quickly whizzing through the attempts I made at making viable LOD models. https://i.gyazo.com/f1d06b356e38500465cf44ff71072357.mp4 Note here that most of the tools I am making are well suited to imposters; you see them side-on for the most part, and if they are not side-on (on a tool rack or similar) then you probably don't care much (they'll be lost against the avatar holding them etc.). The exception being something like the oil can, which was a total pain because it needs a proper volumetric LOD model and is asymmetrical, meaning I cannot use the "plant pot" imposter trick of a star mesh. In the end each one was 1LI, and each one is a single texture. I use a 512 for all except (apparently - I just looked) the wrench and the matches, to which I have granted a 1024, but I think that's because I haven't bothered to try the 512s yet! I also consider those a guilty pleasure given that I don't sell stuff much, so I'm not polluting anyone else's Second Life 😉
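The point about sorting affecting compression can be demonstrated with zlib (the same family of compression as the asset file's "zip"; the data here is synthetic, not real mesh data): the same bytes compress to very different sizes depending on their order.

```python
import random
import zlib

# The same bytes, once in a run-friendly order and once shuffled.
# Ordered data gives the LZ compressor long repeats to exploit.
random.seed(42)
ordered = bytes((i // 8) % 256 for i in range(4096))
shuffled = list(ordered)
random.shuffle(shuffled)
shuffled = bytes(shuffled)

print(len(zlib.compress(ordered)), len(zlib.compress(shuffled)))
# The ordered layout compresses to far fewer bytes than the shuffled one.
```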
  9. Indeed I meant something else entirely 🙂, but as a texture accounting scheme this is ok. In one sense I am not entirely sure what I am looking for here either. When we render a mesh, most of the CPU focus is on preparing the geometry for the GPU. This includes associating textures with faces and binding them. It is not clear to me whether the binding cost happens synchronously, i.e. at the time we do the bind in the code, or later on, when OpenGL decides to flush things. In the synchronous case we should be good; my stats have it covered. If it is happening asynchronously then that cost would be missing from my stats and almost certainly un-attributable to the avatar. I am not a graphics expert by any definition, so a lot of the time there are nuances that are missed. I am happy with these stats as I can prove that while they may not be 100% fully accountable, they are highly representative of the costs. It's always nice to have a fuller picture though.
  10. I'll be sharing my implementation once I have it in a sensible form, the main objective is to get something to LL that can make sure that this new floater adds some value. This is literally the first step, I have a related "overlay" mode which is fun but of course does not work well with Realtime stats as the overlay itself is changing the avatar render 😄 Indeed, and therein lies a problem, we can do those kind of things in benchmarks but it is not an end-user tool. I don't think you can easily extract per avatar costs at GPU level. I'll take a look into that once I get further on with this. The other problem which is related is textures. I'd be interested to hear from other devs as to whether there is a way to fully account for the texture transfer cost per avatar. I don't think we can. What I do have (not yet integrated) is measurements of the swapbuffer latency which to some extent relates to the volume of information being pushed to the card but it is per frame and cannot be easily subdivided as far as I know. Even so, this is a step in the right direction I hope. The more information we give users the better choices they can make.
  11. Whoo it works!! It needs some proper loving, but essentially it works. This (very early test build of Firestorm) has a measurement of the actual cost of rendering avatars. It builds upon the existing performance floater from LL, but I have integrated some proper accounting that measures the actual time spent on the CPU for each avatar. Of course CPU is not the whole story, but the stats I am using are effectively the proportion of each frame spent rendering a given avatar's geometry, any shadows, etc. I'll be adding more... For the purposes of this test I left the list sorted by complexity (ARC) and updated it so the graph (that white line) shows the true render cost. As a result you can see that the people at the top of the list are not typically the problem. Now, this being a first run through of this integration, the values might be misleading, so the following video shows a test. https://i.gyazo.com/c98b325ebd4bfb22699458db20be7fb1.mp4 First I disable the Bea person. Notice how the FPS noticeably goes up (a couple of FPS when we're only running at 12). I re-enable them. Down it drops. I then disable Poyi: there is no change. And even with snowflake, arguably the second largest render cost, we see only a slight change by comparison. Early days, but I think this highlights my concerns pretty well.
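The accounting itself boils down to wrapping each avatar's draw in a timer and accumulating the elapsed time per avatar. A minimal Python sketch of that idea (draw_avatar is a hypothetical stand-in for the real C++ render path, and the sleep is fake work):

```python
import time
from collections import defaultdict

render_time = defaultdict(float)  # avatar id -> accumulated CPU seconds

def draw_avatar(avatar_id):
    # Stand-in for rendering the avatar's geometry and shadow passes.
    time.sleep(0.002)

def timed_draw(avatar_id):
    start = time.perf_counter()
    draw_avatar(avatar_id)
    render_time[avatar_id] += time.perf_counter() - start

# "Bea" is drawn twice this frame (e.g. main pass plus a shadow pass),
# so it accumulates roughly double the cost of the others.
for av in ["Bea", "Poyi", "Snowflake", "Bea"]:
    timed_draw(av)

worst_first = sorted(render_time, key=render_time.get, reverse=True)
print(worst_first)  # "Bea" sorts to the top of the list
```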
  12. I totally agree. The problem is that your body is not good relative to the hair. It will almost certainly be 100 times worse (making an assumption that a mesh hair is likely to be <5 draw calls). We desperately need these tools to raise awareness of the problems we have. Unless this gets fixed we will be teaching people the wrong lessons. This is my concern: you should be empowered to make choices based on good data. If you have a higher complexity because of an item, but you are content with that item, then you are making an informed choice. The problem is that already you are feeling dissuaded from the perfectly good unrigged hair towards the rigged hair because the tool is misleadingly showing it as being better. It almost certainly is not. There is a reason why the unrigged hair shows higher. The unrigged hair has to have LODs that are populated because it switches LODs correctly. Rigged hair, due to ancient unfixed bugs, does not switch LODs when it "should", and as a result the creators can choose not to provide lower LOD models; this reduces the complexity score. What it does not do is change any aspect of the rendering time. In both cases the viewer will be drawing a few (<5) batches of data totalling ~25K triangles (or whatever) of hair. If anything, the otherwise identical rigged hair will take longer to display because there is a lot more mathematics involved in applying the weights to deform it to the rig. These are techy details that most people won't want to understand, and the tools should be saying "this one good", "this one not so good" in a way that we can trust. Right now it is blatantly misinforming us. 😞
  13. I have very very mixed feelings about this viewer. It needs a radical change to avoid making problems worse. A great example. This is exactly why this viewer cannot be released. It has just convinced you to change something, quite possibly for the worse. The 12 ARC is not that badly skewed; you can get hair with 10 or 20 times that (incorrectly calculated). In this case it may be that you have a rigged hair with multiple styles incorporated and a number of textures. If it were higher, I would more confidently guess that your "problematic" hair was unrigged. Unrigged hair is penalised by ARC unequally compared to its rigged equivalents. Given the same base mesh, the rigged ones will take at least as long to render, and yet their ARC can be an order of magnitude lower. Thus the entire premise of the data displayed in this floater is flawed. Back to the floater itself. I very much dislike the renaming of existing variables such that they have different names in preferences to those in the floater; this means that anything learned in the new floater does not easily transfer to the main preferences. These should be made the same. The avatar list is a nice display; however, it comes to the very heart of my dismay. It uses ARC, which is so fundamentally wrong as to be misleading. The Maitreya body, as an example (but not the worst), is a significant rendering overhead due to its clinging on to alpha segmentation. This is not a poke at Maitreya; ALL multipart bodies are very bad performers. In the case of Maitreya, a full body will require in the region of 300-400 draw calls (batches of triangles sent to the GPU); Legacy is the worst, with Belleza close behind. Male bodies are typically worse than female ones too. 
Compare this to bodies that have sensibly embraced BOM properly, such as Slink Redux or the Kupra, which have far fewer (typically 10-15% of the number of draw calls). The render time in my experiments is almost linear in terms of drawcalls (almost irrespective of the triangle count, even on older laptops without GPUs). ARC does not come close to reflecting this; it is mired in concerns of triangles. Triangle counts do have an impact, it is just hidden in the noise when draw calls are overused. The result is that you have a sorted list with supposedly the "worst" lagging avatars at the top, but which is generally misleading and in many cases utterly wrong. You can easily get into the situation where someone wearing a comparatively efficient SLink or Kupra body and a relatively low overhead unrigged mesh hair will appear with a very long line at the top of your list, meanwhile a person with a body full of alpha segments and a rigged mesh hair is way down lower. By using the tool you eliminate the efficient avatar and keep the inefficient one. Utter madness. Moreover, we'll see this used to persuade people, either through finger pointing by their friends/associates, or through good-willed self-improvement, to ditch their efficient outfits for worse ones, achieving the opposite effect to that intended. I am in the process of developing a way of scoring these that will alter this and give more accurate data personalised to your machine. Assuming I can get it to work, I'll be contributing it to the lab in the hope that we can avoid this new "finger pointing" disaster. TL;DR The presentation of the floater is not bad, and the idea is good, but the information that it provides is completely flawed and in many cases counter-productive.
  14. Do you have a screenshot to show what you mean? It does sound like you've messed up anti-aliasing somewhere. If you have changed your driver settings externally so that the driver settings override the application, then that can prevent the viewer from overriding it. One option is to hit the little "recycle" arrow on the right of the performance/quality slider in the viewer preferences. This will reset to the default for your hardware, and should (hopefully) wipe out any weirdness. Then you'll have to go through and fiddle with them again to get them how you want, of course.
  15. Seems reasonable (given I thought that was how it was working). I'll try to remember to look at this next time I'm in that code. If you feel like raising me a Jira, that'll make sure I don't forget.
  16. I guess it depends to some extent on what "jerkiness" means. Imposters only update periodically and are thus jerky by definition, but I am pretty certain that is not what you mean. The imposter rendering is not great in SL. You still have to render the entire avatar, attachments and all, but you don't do it quite so often. I've not actually measured imposter render cost directly, but I would assume it to be, per avatar, higher for one frame (a cost which is supposedly amortised over multiple frames to make it worthwhile). This is pure speculation, as I have not tested any of this, but if you have a large number of avatars being rendered ready for impostering, given that this takes a bit more effort and they then need to be redrawn some defined period later (IIRC redraw is triggered by a number of things: time, camera movement etc.), then every time we recalculate the imposters we get a burst of activity and a longer than normal frame time, followed by shorter frames. This is by definition a jerky experience. On the other hand, by rendering all these mesh avatar monstrosities you are getting a consistent, but lower, FPS. The logic of my reasoning only really holds up where, for your specific hardware and the circumstances of the scene being drawn, the cost saving of the imposters is low relative to the true cost (that's a lot of ifs and buts and guesses). There have been a number of changes to how jelly dolls are rendered that have flowed down from the lab in recent releases. These are related to imposters and it is possible that those changes have tipped the balance against the imposters. As I say, lots of speculative thoughts in that. I will try to measure the reality at some point and see if I can see any weirdness.
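The amortisation argument can be put in toy numbers (all hypothetical; they just show why refreshes feel "jerky" even when the average cost is low):

```python
# Hypothetical per-frame costs, in milliseconds.
full_render_ms    = 4.0   # rendering the avatar normally, every frame
impostor_build_ms = 6.0   # the frame on which the impostor is (re)built
update_interval   = 16    # frames between impostor refreshes

avg_impostor_ms = impostor_build_ms / update_interval
print(f"{avg_impostor_ms:.3f} ms/frame average vs {full_render_ms} ms/frame")
# The average is far lower, but the refresh frame itself spikes to 6 ms:
# one long frame every update_interval frames is exactly the jerkiness
# described above, versus the consistent-but-slower cost of a full render.
```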
  17. It can be the case; it comes with one of those annoying "it depends" tags. When you have settings set low, some things are moved from shaders on the GPU to being processed on the CPU. If, as is the case for the majority of people with a semi-modern graphics card, you are CPU bound in SL, then this is definitely going to make you slower overall, as it is increasing the load on the CPU that is already working hard while leaving the largely idle GPU twiddling its thumbs. It really depends on a great number of things. Constant grey textures, meaning that they never rez? At a high level, there are 3 factors that affect texture rendering: 1) How quickly they can be retrieved (network or cache - check whitelisting, make sure your cache is on a fast drive if possible). 2) How quickly they can be decoded and made ready to use (concurrent decode helps here - the size of textures affects this a lot). 3) How quickly they can be moved between CPU and GPU (the size and number of textures in a scene affect this a lot - this is harder to quantify as it comes down to a mix of RAM and VRAM and bus; once the texture is decoded it is sitting in RAM but needs to be sent to the GPU to be used; the sheer number of textures fighting to be shown is a typical bottleneck here, but it is very dependent on your hardware). Ironically, with the potential for mesh to "form" faster, you are more likely to see a fully rezzed object with no textures than you were previously. In the past it took a bit longer to unpack the mesh, so you didn't see the model as quickly. This can add to the feeling of things being slower (whether they are or not). It is impossible really to comment on general impressions of speed or slowness without actual benchmarks; there are far too many variables in SL. One other thing to keep in mind is that things getting slower over time does not necessarily mean that the viewer has become slower. 
Quite often the scenes have become more complex, so it is always best to do side-by-side comparisons rather than rely on historical data unless you are 100% sure that the scene is the same. Things like this are why it is so notoriously hard to benchmark anything in SL.
  18. This in particular sounds very much as if you do not have your cache folders whitelisted in your AV. Try this https://wiki.firestormviewer.org/antivirus_whitelisting
  19. It is late, and I am very tired; apologies for any silly typos and any stupid mistakes, but I think that this summarises the key points. The changes that are in the latest FS that affect performance are almost entirely off the main thread; as such there will be almost no FPS improvement. There is a small reduction in the idle processing and a possibility of beneficial side-effects through the other changes, but given the measurement uncertainty (the variance or jitter) in the viewer FPS, I don't think for a second that anyone would be able to detect that reliably. Any repeatable changes in FPS would be very very surprising to me; the placebo effect cannot be discounted in all these observations. TL;DR This viewer release IS overall faster than the previous one, courtesy of a handful of tweaks, as observed in a number of controlled benchmarks conducted on hardware that I own. The changes I detail below should imply a general speedup of scene building and rendering, but not in FPS; as always, your mileage may vary. *shrugs* If you are seeing a large slowdown in FPS (or indeed a large increase) it is very unlikely to be anything that we have done, as there are no changes that are expected to have that effect (good or bad). The key (potentially) performance related changes are: New disk cache: should mean faster cache retrieval, but frankly it is unlikely to make much difference. It will almost certainly have no impact at all on "new to me" places, as these are all going to be stuck on network retrieval before they even reach the cache. Network fetch is conducted in subsidiary threads. Concurrent image decoding: this is the decompressing of jpeg2000 data into RGBA data that is ready for the GPU. There is an interaction with the disk cache here: we load the file, and it is then passed to the KDU jpeg library for decoding. 
There are numerous threads doing this now, so what we can say with certainty is that, given say 2000 textures to load and 4 threads to do this on, the total time to process these has reduced significantly. This means that the decoded texture data is ready for use earlier than it was previously. We are also using more of the CPU resources of the machine than previously, and as this is done on separate threads it is not impacting the FPS (for good or ill). Faster mesh decoding: mesh data (each LOD) is sent as a blob of zip-compressed bytes. Previously these were decoded using a temporary buffer and copied into place. We have removed the intermediate buffering and this leads to an overall reduction in time taken, but also reduces memory and cache churn, which has an effect on latency. If this change has an effect, it is in making the mesh available sooner than before; this means that shapes will resolve from "grey blobs" to "grey meshes" quicker than they did before. Putting these together, what would I expect the typical user to notice? Not very much. None of these are on the main thread, and thus the FPS is going to remain fundamentally the same as the previous release. As always, there are a heap of other changes in any release, and while none of those are expected to impact the FPS for better or worse, never say never. While the FPS should remain largely the same, the textures may appear faster, especially when previously cached (and thus not reliant on the vagaries of network fetching), and meshes will take their form a little faster than before. For me this gives things a slightly more robust feel. It is hard to say that things are "faster", but they feel as though things are "ready" sooner, which is pretty much what I would hope for. 
Given that a new scene after a TP arrival takes many tens of seconds to resolve, any marginal improvements here are hard to quantify without explicit tests that do not rely on the fallibility of human observation or the dynamics of the SL interactions. If things are markedly slower then I would suggest a careful review of all of your AV exclusion rules; time and again this is the culprit, even when people swear black is white that it is not. You don't have to convince me: be honest with yourself, and do yourself a favour and double, triple, quadruple check those until you develop OCD about it. Make sure your cache is on the fastest storage you have. When you are "benchmarking", make sure that you do everything that you can to create the same scenario. In particular, make sure that you have the same number and type of avatars in the scene if you have any expectation of even approximate comparison. Use pre-cached textures; this means you will need to pre-cache before each run because of the different cache format. If you do not use pre-cached textures then you are adding a large uncertainty into the mix. Even with a cleaned cache, a subsequent fetch of a texture will likely hit an edge server of the CDN, the assets having been cached as near to you as the CDN can achieve. An initial fetch of a UUID that has never been seen before (by you, or anyone else using SL in the same geographical partition that the CDN allocates you to) has a higher overhead, as it will need to be fetched from the central servers. Review all the settings: shadows, ALM, DD, LOD multiplier etc. Finally, if you have a really old CPU, or simply don't believe the threaded image decoding is adding value, set the concurrency level to 1 and this will behave almost identically to the previous release, wherein a single thread (not the main one) is responsible for all the decoding. In all other cases I strongly advise leaving it at 0 (the default)
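A minimal sketch of the concurrent decoding idea, using Python's thread pool as a stand-in for the viewer's worker threads (decode_texture is hypothetical; the real path hands jpeg2000 data to the KDU library):

```python
from concurrent.futures import ThreadPoolExecutor

def decode_texture(texture_id):
    # Stand-in for decompressing jpeg2000 bytes into GPU-ready RGBA data.
    return texture_id, b"\x00\x00\x00\xff" * 4

# Four workers chew through the batch off the main thread, so the frame
# loop is never blocked; with CPU to spare, the batch wall time drops
# roughly in proportion to the worker count.
with ThreadPoolExecutor(max_workers=4) as pool:
    decoded = list(pool.map(decode_texture, range(2000)))

print(len(decoded))  # 2000
```

Setting the concurrency to 1, as mentioned above, corresponds to max_workers=1: same work, one worker, behaving like the previous single-decode-thread release.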
  20. I kinda agree, except that any time we do that we have 50% of people telling us we're wrong anyway.
  21. With the new cache system VFS has been retired so that message just replaces that during that initial startup phase.
  22. Sadly not. This is managed by the server side and is not able to be specified by the viewer. As others have noted, you can achieve some of this through scripts, and Liz (@polysail) and I have been prodding LL to fix a server-side problem that will allow full names of components of linksets to be retained. In conjunction with this change you will be able to export a list of positions and rotations (and any other attributes) from Blender, Maya etc. and paste those into a notecard to adjust positions of complex builds. See https://jira.secondlife.com/browse/BUG-202864 for more details on this.
  23. Yes, these apply across platforms. I've also fixed the block that prevents the OpenSim/non-havok build from using Analyze. It will now correctly allow the non-Havok hull decomposition which, while not as good as the Havok one, is useful to have.