ChinRey

Lag - prims, meshes or sculpts?


I've finally found some time to update my builders' blog. It's the first in a series of posts that are (hopefully) going to eventually cover all important aspects of lag. I started by looking at the geometry, comparing our three main building materials: prims, mesh and sculpts. I've tried to simplify as much as possible, with a minimum of "tech talk", but it is actually a very complicated topic, so if you are easily bored, you may not want to read through it all in one go. Maybe bookmark it and keep it as a reference instead.

Feedback is always appreciated, either here or on the blog. I do not claim I know everything about good building and there's always room for improvement.

https://chinrey.blogspot.com/2019/03/lag-geometry-prims-meshes-and-sculpts.html

Quote

Rendering an object isn't a single operation, it takes several steps and the various materials we can build with loads the different steps differently

I think I know what you mean, but your verb has either become disconnected from the subject, or should be plural.

Quote

 (and lots of things that once existed but has been deleted)

same problem

Quote

The Second Life assets database has several billion of entries

Extra word.

Quote

even a failry minor grid may well build a million entries base fast

one typo, and end of sentence doesn't make any sense.

Quote

 it's not at all unusual for the gpu to spend more than half it's processing time idly waiting for the gpu to finish its part of the job. 

Too many gpus.

Quote

What Mole mesh is? 

Sentence fragment?

 

On 3/17/2019 at 12:57 PM, ChinRey said:

I've finally found some time to update my builders' blog. It's the first in a series of posts that are (hopefully) going to eventually cover all important aspects of lag. I started by looking at the geometry, comparing our three main building materials: prims, mesh and sculpts. I've tried to simplify as much as possible, with a minimum of "tech talk", but it is actually a very complicated topic, so if you are easily bored, you may not want to read through it all in one go. Maybe bookmark it and keep it as a reference instead.

Feedback is always appreciated, either here or on the blog. I do not claim I know everything about good building and there's always room for improvement.

https://chinrey.blogspot.com/2019/03/lag-geometry-prims-meshes-and-sculpts.html

You keep saying "fitted mesh" when I assume you mean "rigged mesh." Generic rigged mesh is rigged to the main skeleton bones and "fitted" mesh is also rigged to the collision bones. If you aren't combining the two concepts you should make it clear why "fitted" mesh is so much worse than ordinary rigged mesh, which you say nothing about.

29 minutes ago, Theresa Tennyson said:

You keep saying "fitted mesh" when I assume you mean "rigged mesh." Generic rigged mesh is rigged to the main skeleton bones and "fitted" mesh is also rigged to the collision bones. If you aren't combining the two concepts you should make it clear why "fitted" mesh is so much worse than ordinary rigged mesh, which you say nothing about.

That's a good point. I don't really have much information about how rigged mesh performs. The general opinion seems to be that it is the same as fitted mesh but as you say, it does use a simpler rigging so it's reasonable to assume it's less cpu heavy.

I've added a brief chapter about rigged mesh and some other building materials I overlooked to the article.

23 hours ago, ChinRey said:

it does use a simpler rigging so it's reasonable to assume it's less cpu heavy.

Assuming that the rendering pipeline is similar to other realtime engines, deformation data is computed by the GPU from the polylist object computed by the CPU. So if this is also true for the SL viewers, the heavy CPU performance hit you refer to is due to the mesh construction/topology/vertex count, while the rigging (deformation data) is computed by the GPU in real time. In this sort of scenario, it's reasonable to think that more joint data = more calculations, but it's also worth noting that the per-mesh joint data limit splits these calculations into more "streams", depending on how the fitted mesh was sliced (in the case of avatar bodies) or separated into pieces, making the process lighter. @Beq Janus might shed some light on this part, as I'm not sure how viewers handle this.
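
To make the "more joint data = more calculations" point concrete, here is a minimal sketch of generic linear blend skinning in Python. It is not taken from any viewer code: the function names, the 3x4 matrix layout and the cap of 4 influences per vertex are all assumptions for illustration. The point is only that the work per vertex grows with the number of joints that influence it.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical joint transform: a 3x4 row-major affine matrix.
Mat34 = Sequence[Sequence[float]]

MAX_INFLUENCES = 4  # assumed per-vertex limit for this sketch


def transform(m: Mat34, v: Vec3) -> Vec3:
    """Apply a 3x4 affine matrix to a point."""
    x, y, z = v
    rows = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in m]
    return (rows[0], rows[1], rows[2])


def skin_vertex(v: Vec3, influences: List[Tuple[int, float]],
                palette: List[Mat34]) -> Vec3:
    """Blend a vertex over its weighted joints (one transform per influence)."""
    if len(influences) > MAX_INFLUENCES:
        raise ValueError("too many influences for one vertex")
    total = sum(weight for _, weight in influences)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    out = [0.0, 0.0, 0.0]
    for joint, weight in influences:
        p = transform(palette[joint], v)
        for i in range(3):
            out[i] += (weight / total) * p[i]
    return (out[0], out[1], out[2])


def translation(tx: float, ty: float, tz: float) -> Mat34:
    """Helper for the example: a pure translation matrix."""
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty], [0.0, 0.0, 1.0, tz]]


# Two joints: one at rest, one moved 2 units along x.
palette = [translation(0.0, 0.0, 0.0), translation(2.0, 0.0, 0.0)]
moved = skin_vertex((1.0, 1.0, 1.0), [(0, 0.5), (1, 0.5)], palette)
print(moved)  # the vertex ends up halfway between the two joint results
```

Whether this loop runs on the CPU or in a shader is exactly the open question here; the sketch says nothing about that.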

22 hours ago, OptimoMaximo said:

Assuming that the rendering pipeline is similar to other realtime engines, deformation data is computed by the GPU from the polylist object computed by the CPU. So if this is also true for the SL viewers, the heavy CPU performance hit you refer to is due to the mesh construction/topology/vertex count, while the rigging (deformation data) is computed by the GPU in real time. In this sort of scenario, it's reasonable to think that more joint data = more calculations, but it's also worth noting that the per-mesh joint data limit splits these calculations into more "streams", depending on how the fitted mesh was sliced (in the case of avatar bodies) or separated into pieces, making the process lighter. @Beq Janus might shed some light on this part, as I'm not sure how viewers handle this.

Yes, we need somebody familiar with the viewer code to answer this. But there is a significant delay before a fitted mesh is rendered properly at all. It's hard to see how that can be gpu related; it must be the cpu.

On 3/17/2019 at 4:57 PM, ChinRey said:

I've finally found some time to update my builders' blog. It's the first in a series of posts that are (hopefully) going to eventually cover all important aspects of lag. I started by looking at the geometry, comparing our three main building materials: prims, mesh and sculpts. I've tried to simplify as much as possible, with a minimum of "tech talk", but it is actually a very complicated topic, so if you are easily bored, you may not want to read through it all in one go. Maybe bookmark it and keep it as a reference instead.

Feedback is always appreciated, either here or on the blog. I do not claim I know everything about good building and there's always room for improvement.

https://chinrey.blogspot.com/2019/03/lag-geometry-prims-meshes-and-sculpts.html

Interesting reading, I don't 100% agree with some of the assertions but I don't 100% disagree with any of it 🙂

I would note that fitted mesh and rigged mesh are no different at all in the viewer, in the end, a vertex can have 4 influences from a set of 110 bones used with the mesh as a whole. Whether those influences are derived from so-called collision bones or regular bones makes no difference. In all cases, the matrix palette for the transforms is recalculated at least every frame, for every drawable. One of the optimisations that I introduced for Animesh, that has a benefit beyond Animesh, is a caching of the matrix palette so that it is calculated just once per drawable per frame. Prior to this release, it was recalculated every render pass, which added a significant additional maths overhead to advanced rendering for things like shadows, and materials. These are all computed on the CPU by the way, then passed down and unpacked in the shaders, though I should note that I am far from an expert in the rendering pipeline, Animesh was my first real foray into that space and a large learning curve to even get to where I did; it would not surprise me to find that the full answer is more nuanced, but right now I don't believe that there is very much magic happening on the GPU for this. 
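
For anyone who wants to picture the caching idea, here is a rough Python sketch of "compute the matrix palette once per drawable per frame instead of once per render pass". This is not the viewer's actual code; the class, the pass names and the stand-in palette function are all hypothetical, and it only illustrates the bookkeeping.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


class PaletteCache:
    """Keeps one matrix palette per drawable, valid for the current frame only."""

    def __init__(self, compute: Callable[[int], List[Matrix]]) -> None:
        self._compute = compute
        self._frame = -1
        self._cache: Dict[int, List[Matrix]] = {}
        self.computations = 0  # counts how often the expensive maths ran

    def begin_frame(self, frame: int) -> None:
        # Joint transforms change between frames, so drop stale palettes.
        if frame != self._frame:
            self._frame = frame
            self._cache.clear()

    def get(self, drawable_id: int) -> List[Matrix]:
        palette = self._cache.get(drawable_id)
        if palette is None:
            palette = self._compute(drawable_id)
            self._cache[drawable_id] = palette
            self.computations += 1
        return palette


def fake_palette(drawable_id: int) -> List[Matrix]:
    # Stand-in for the real per-joint matrix maths (assumption for the demo).
    return [[[1.0, 0.0], [0.0, 1.0]]]


cache = PaletteCache(fake_palette)
RENDER_PASSES = ["shadow", "depth", "materials"]  # hypothetical pass names

for frame in range(2):
    cache.begin_frame(frame)
    for _pass in RENDER_PASSES:
        for drawable_id in (10, 11):
            cache.get(drawable_id)

# 2 frames x 2 drawables = 4 computations, instead of 12 without the cache.
print(cache.computations)
```

The saving scales with the number of passes, which is why it matters most with shadows and materials enabled.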

6 minutes ago, ChinRey said:

Yes, we need somebody familiar with the viewer code to answer this. But there is a significant delay before a fitted mesh is rendered properly at all. It's hard to see how that can be gpu related; it must be the cpu.

Just as I was writing my answer 🙂 but yes, I concur... I don't think enough happens on the GPU, but at the same time, it is not always the case that just because something could be done on the GPU it should be (but that is a different tale).

A few comments on the blog itself.

Re: Asset serving

Quote

The difference is marginal though so as far as the assets server is concerned, it doesn't matter if it's prim, mesh or sculpts. It's the number of parts that counts. However, prim builds do tend to be made from more parts than meshes and sculpts so for that reason prims are generally heavier on the assets server than the other two.

While this is true, it is worth keeping in mind that both sculpts and mesh have an additional fetch overhead. This is not quite the same as the asset server discussion, really; the point is that the data fetched for a mesh or a sculpt comes in two parts: asset data, then "mesh" data, where "mesh" can be either a sculpt image or a triangular mesh.

From the CPU section

Quote

It's hard to quantify how much impact the various building materials have on the cpu load but the obvious differences are large enough that it's easy to rank them:

  1. Prims: very fast
  2. Regular mesh: quite fast
  3. Optimized sculpts: rather slow
  4. Unoptimized sculpts: slow
  5. Fitted mesh: very slow
 

I've not tested this... So I will bow to your assertions in the absence of any other credible argument. However, Prim rendering is not that efficient and it would be my guess that creating an exact equivalent of a given prim in Mesh could well be faster because in Mesh it is explicitly formed, where prims are in large part procedural. I am speaking here from a point of "informed ignorance", by which I mean I am stating what ought to be the case given what I know, but it is an area I have not paid close attention to. That said, taking the metadata description of a prim and drawing it, will be slower than slapping a bunch of triangles into a vertex buffer (see my comment on memory pressure though). Another example of such informed ignorance is in my belief that because prims have a number of faces, and these are to some extent hard-coded (to some extent because while a cube prim has 6 material faces, a hollow cube has more); it is my belief that a prim will always be rendered in as many parts as it has material slots, even if those are textured identically, and while the same is true of Mesh, the creator has explicit control over the number of faces and can choose to simplify/optimise. Thus (to give a somewhat stupid example) a plywood default prim will likely be drawn as 6 faces. A fully-plywood cube could be constructed to be of a single texture face and thus render faster. 

Extending this point, it is always worth remembering that what we might think of as a single mesh is drawn as up to 8 meshes, one for each texturable surface. There are good opportunities to optimise UV and VRAM use, and rendering performance, by managing these texture slots. And to echo your sentiments on alpha: if you have alpha on one texture face, consider giving it a separate texture sheet, because that removes shader passes from the faces that don't have alpha and avoids some of the glitches too. I can't say "always" here because there are arguments for texture reuse, where the data handling benefits may outweigh the rendering cost. Nobody ever said this was going to be easy 🙂
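
A toy Python sketch of the alpha point, under a big simplifying assumption: a face is treated as needing the alpha path whenever the texture it uses has an alpha channel. The face and texture names are made up, and the real viewer logic is certainly more involved than this.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_FACES = 8  # texturable faces per mesh, as mentioned above


@dataclass(frozen=True)
class Face:
    name: str
    texture: str


def alpha_faces(faces: List[Face], has_alpha: Dict[str, bool]) -> List[str]:
    """Return the faces that would go through the alpha path (simplified)."""
    if len(faces) > MAX_FACES:
        raise ValueError("a mesh has at most 8 texturable faces")
    return [face.name for face in faces if has_alpha[face.texture]]


# Hypothetical build A: everything shares one atlas that contains alpha.
shared = [Face("wall", "atlas"), Face("roof", "atlas"), Face("window", "atlas")]
shared_alpha = alpha_faces(shared, {"atlas": True})

# Hypothetical build B: the window gets its own sheet, the rest stays opaque.
split = [Face("wall", "opaque_sheet"), Face("roof", "opaque_sheet"),
         Face("window", "glass_sheet")]
split_alpha = alpha_faces(split, {"opaque_sheet": False, "glass_sheet": True})

print(shared_alpha)  # all three faces pay for alpha
print(split_alpha)   # only the window does
```

The trade-off mentioned above still applies: build B uses two texture sheets instead of one, so it is not automatically the better choice.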

Finally, a word on bottlenecks. In your blog section on CPU, you bridge a little into the subject of GPU because of their relationship.

Quote

 This part of the rendering flow has been all but overlooked until recently but it turns out it's a major bottleneck and it's not at all unusual for the gpu to spend more than half its processing time idly waiting for the cpu to finish its part of the job

The simple fact here is that "your mileage may vary". I alluded to an optimisation I made in the Animesh release of Firestorm. It slashed the number of matrix calculations the CPU makes per frame, especially for those of us running with shadows and ALM. This was a not inconsiderable CPU saving, but did it result in a significant speedup for everyone?

No, not really... The answer as to why is "because CPU was not your bottleneck". This is evident if you look at your CPU utilisation: it is not generally thrashing a core at 100% (people on Linux report otherwise, but I think that is a peculiarity of the viewer on Linux more than anything else). Mostly your CPU is busy but it is not running flat out. If it is... then good news, my change WILL have helped you. But if your machine is like mine, then the bottleneck is not CPU and not GPU; it is IO and memory/cache utilisation.

94968123d4def2ca59f0b815ef08e8f6.png

In profiling my system I found that for more than 50% of elapsed time the CPU was waiting for DRAM to catch up; pulling things from RAM into cache was/is causing the CPU to stall and wait. So I might get better frame rates if I buy faster RAM (yay for me, but it doesn't help the general case), or perhaps if I find a way to ensure better cache locality in the viewer pipeline (if only it were that easy).

 

 

 

 

 

Posted (edited)

I'm far less of an expert than Beq on this topic, so I will defer to her "informed ignorance". I just wanted to point out that prims appearing before everything else in a scene has nothing to do with their render impact, but rather with the fact that they're constructs of the viewer itself. However, people tend to think that means they're more efficient, since they just poof appear. From what I understand about the viewer, you don't have to download a prim mesh every time you want to look at one; rather, you're just getting info from the server: "prim box, these inputs". As such they appear in the scene before everything else. Which, as Beq noted, doesn't mean that they induce less GPU lag than, say, a similarly constructed box that someone uploaded.

Edited by polysail
5 hours ago, polysail said:

I'm far less of an expert than Beq on this topic, so I will defer to her "informed ignorance". I just wanted to point out that prims appearing before everything else in a scene has nothing to do with their render impact, but rather with the fact that they're constructs of the viewer itself. However, people tend to think that means they're more efficient, since they just poof appear. From what I understand about the viewer, you don't have to download a prim mesh every time you want to look at one; rather, you're just getting info from the server: "prim box, these inputs". As such they appear in the scene before everything else. Which, as Beq noted, doesn't mean that they induce less GPU lag than, say, a similarly constructed box that someone uploaded.

Prims are more "efficient" by way of bandwidth (which, obviously enough, is why they appear so quickly). They are also prims: fewest mesh triangles and polygons and vertices and all that mumbo-jumbo, which is why they are called prims (as in primitive shapes). This also makes them more efficient in computer construction: the shapes are limited, and the design you can do with them in a geometric sense is limited. The CPU can handle this easily.

The only way any other method can be more efficient is when it comes to textures. But anyone with an ounce of 3D modeling knowledge and understanding knows this already.

Posted (edited)

I can't possibly reply to everything Beq wrote, at least not right away - it'll take a bit of time to digest all that info. But a few points.

On 3/21/2019 at 2:25 PM, Beq Janus said:

I don't 100% agree with some of the assertions but I don't 100% disagree with any of it 🙂

That's the way it should be. My article is not static, it's under constant revision as more info becomes available.

 

On 3/21/2019 at 2:25 PM, Beq Janus said:

I would note that fitted mesh and rigged mesh are no different at all in the viewer, in the end, a vertex can have 4 influences from a set of 110 bones used with the mesh as a whole.

Is that confirmed info? Theresa had some concerns about this and I didn't know the answer.

 

On 3/21/2019 at 2:25 PM, Beq Janus said:

One of the optimisations that I introduced for Animesh, that has a benefit beyond Animesh, is a caching of the matrix palette so that it is calculated just once per drawable per frame. Prior to this release, it was recalculated every render pass

That's great news! :)

 

On 3/21/2019 at 2:25 PM, Beq Janus said:

There are good opportunities to optimise UV and VRAM use, and rendering performance by managing these texture slots and to echo your sentiments on alpha.

Oh yes. I was going to add another article with some practical tips how to reduce lag and this is definitely one of them.

 

On 3/21/2019 at 2:25 PM, Beq Janus said:

However, Prim rendering is not that efficient and it would be my guess that creating an exact equivalent of a given prim in Mesh could well be faster because in Mesh it is explicitly formed, where prims are in large part procedural.

The part you quote was about cpu, not gpu load and a prim will always be more efficient there. Of course that doesn't mean multiple prims will be faster than a single mesh but it's clear that the viewer code was specially optimized to handle prims as efficiently as possible while the bits that were bolted on to it to handle sculpts and meshes are not nearly as streamlined.

As for the gpu load, procedural and semi-procedural objects like prims and sculpts will always have a few superfluous triangles and vertices, but they are also far easier to handle when it comes to LoD. We should be careful not to put too much significance on raw triangle and vertex (and pixel) counts anyway. They are serious lag factors, yes, but their "lag curves" are not linear. You can add elements up to a certain level with little or no negative effect, but once you take it past a "switch point", you get a significant lag increase.

This is an illustration I made to demonstrate something else, but it isn't too irrelevant here. It's a mountain landscape I'm working on and I'm struggling to find something interesting to add to it. Here it is with just a (mesh) road:

931771835_Skjermbilde(2073).thumb.jpg.563dae724c4acceab999e7ebf29f7a18.jpg

Here I've added three big rock sculpts:

2073015789_Skjermbilde(2072).thumb.jpg.258ba6d691766c99c02d67cf09e20cd5.jpg

That's 3072 vertices and 6144 triangles and they didn't reduce my frame rate at all. In fact, adding those sculpts increased my fps slightly.

(Incidentally, I believe those "switch points" in the lag curve are the reason why the SSP demo is so much laggier than the sims the Moles are working on. It's the only explanation I can think of. I don't want to speculate further on that until we get some realistic load test results or see how the official launch performs, though.)

 

On 3/21/2019 at 2:43 PM, polysail said:

I just wanted to point out that Prims appearing before everything else in a scene has nothing to do with their render-impact, but rather the fact that they're constructs of the viewer itself.  However, people tend to think that means they're more efficient since they just poof appear.

That's true, but streaming cost is actually a significant lag factor. Not only does it add to the bandwidth requirements, downloading data also keeps the cpu busy. This is why LL developed prims in the first place and why they were so reluctant to add polylist mesh to SL. The difference in data amount between the three main building materials is profound.

The geometric shape of any prim, no matter how twisted it is, can be defined from 14 bytes of data. That includes all LoD models. In reality it probably uses a little bit more but not much.

The geometric shape of a regular 1024 vertex sculpt requires 3072 bytes - 3 KB - of data, all LoD models included. But because of the awkward way sculpts are implemented, they require four times as much in reality, so 12 KB - slightly more than 12,000 bytes. Sculpts can be done with fewer vertices - and less data - but that's an undocumented feature and hardly ever used.

Assuming SL meshes use 32 bit vertex coordinates (it may be higher) and ignoring weight data, each vertex of a mesh needs 30 bytes of data and each triangle 9 bytes. LoD models are not included here; each needs its own set of vertex and triangle data. This is a conservative estimate. In reality the data amount for vertices is quite a bit higher, but I don't know enough about exactly how meshes work and I'd rather underestimate than overestimate here.

Two examples of what that implies:

  • Cube:
    • Prim: 0.014 KB
    • The theoretical well implemented sculpt: 0.4 KB or possibly 0.2 KB
    • The actual SL sculpt: 12 KB (can be reduced to 1.5 KB or possibly 0.8 KB with a smaller sculpt map)
    • Mesh (with 32 bit coordinates): 1.6 KB
  • Torus:
    • Prim: 0.014 KB
    • The theoretical well implemented sculpt: 3 KB
    • The actual SL sculpt: 12 KB
    • Mesh (with 32 bit coordinates): 44 KB
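
The per-item estimates above can be put into a small Python calculator. This is only a sketch using the rough figures from this post (14 bytes per prim, 3 bytes per sculpt vertex with a factor of four for the actual implementation, 30 bytes per mesh vertex and 9 per triangle). The vertex and triangle counts in the mesh example are hypothetical and are not the ones behind the cube and torus numbers in the list.

```python
from typing import List, Tuple

# Rough per-item figures taken from the estimates in this post.
PRIM_BYTES = 14
SCULPT_VERTICES = 1024
SCULPT_BYTES_PER_VERTEX = 3
SCULPT_OVERHEAD_FACTOR = 4  # actual sculpts need about four times the ideal
MESH_BYTES_PER_VERTEX = 30
MESH_BYTES_PER_TRIANGLE = 9


def sculpt_bytes_ideal(vertices: int = SCULPT_VERTICES) -> int:
    """Data for a theoretical, well implemented sculpt."""
    return vertices * SCULPT_BYTES_PER_VERTEX


def sculpt_bytes_actual(vertices: int = SCULPT_VERTICES) -> int:
    """Data for a sculpt as actually implemented, per the estimate above."""
    return sculpt_bytes_ideal(vertices) * SCULPT_OVERHEAD_FACTOR


def mesh_bytes(lod_models: List[Tuple[int, int]]) -> int:
    """Sum over LoD models, each given as (vertex count, triangle count)."""
    return sum(MESH_BYTES_PER_VERTEX * vertices + MESH_BYTES_PER_TRIANGLE * triangles
               for vertices, triangles in lod_models)


# Hypothetical mesh: one LoD model with 24 vertices and 12 triangles.
example_lods = [(24, 12)]

print("prim:", PRIM_BYTES, "bytes")
print("ideal sculpt:", sculpt_bytes_ideal(), "bytes")
print("actual sculpt:", sculpt_bytes_actual(), "bytes")
print("example mesh:", mesh_bytes(example_lods), "bytes")
```

Since every LoD model of a mesh carries its own vertex and triangle data, adding entries to the list is how the mesh total grows, while the prim and sculpt figures stay fixed.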

 

Edited by ChinRey
Typos