CoffeeDujour

This is why we can't have nice things.


35 minutes ago, Penny Patton said:

Your computer has a limited amount of memory, even more so since SL will not use all of your available memory.

Textures are stored in that memory while being displayed on your screen.  If that memory is filled, several things happen:

  • Sharp decline in framerate
  • Texture thrashing (when textures keep derezzing due to being shuffled in and out of memory)
  • Stuttering (where SL freezes up as you try to move the camera around, because it's desperately trying to move textures in and out of memory)

All of these problems are caused by the specific way SL handles texture downloading, local storage, decompression and moving to the GPU. Those points alone account for 99% of the overhead. Once the texture is on the GPU, it's basically free. Your GPU's primary job is to shuffle textures around and it's damn good at it, way better at that than at geometry and lighting (this is incidentally why Nvidia have the Quadro line).

The decline in framerate comes from the progressive downloading, local file I/O and huge amount of decoding that gets done. It's a circular catch-22: when things are stressed, this extra work makes them worse. GOTO 10.

35 minutes ago, Penny Patton said:

There are objects, from tiny attachments to larger environmental objects, that use hundreds of MB worth of textures. To the point where it's not uncommon to see avatars using a couple hundred MB to nearly a full GB of textures.

If you have the memory, and it's available to SL, there is no penalty to using it.

Inspecting objects to get their full-resolution texture usage doesn't match how the viewer behaves when you're not looking at them. It's more like this: the object has a certain maximum memory footprint, but in actual use it can easily stay below that.

If you do everything you can to force the viewer to load everything full rez, then you're going to have a bad time.

35 minutes ago, Penny Patton said:

 In addition, textures need to be downloaded, and you have a limited amount of space dedicated to your SL texture cache. This means excessive amounts of bandwidth are used not just to download all these textures, but to repeatedly redownload them. This results in several issues:

The cache is getting a rework by LL. It does generally work, but it has some pretty harsh pitfalls.

35 minutes ago, Penny Patton said:
  • Excessively long rez times

Decoding textures is expensive: the cache doesn't save decodes, and every texture must go through multiple progressive decodes. Under normal use (when you're not inspecting everything) it will stop at a level based on screen space, distance, etc., typically using a fraction of the memory that the full-resolution image might require.

The issue is that this is very slow. Hopefully the new cache will save decoded images. This alone will dramatically improve performance, and there are a lot of other tricks that can be used to boost it even further.
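A hedged sketch of that stop-early behaviour: JPEG2000 discard levels halve the resolution at each step, so the decoder can stop once the decoded size roughly matches the texture's on-screen footprint. The function name and the cap of five levels here are invented for illustration; this is not the viewer's actual code.

```python
def discard_level(full_res, pixels_on_screen, max_levels=5):
    """How many discard levels can be skipped for a texture that only
    covers pixels_on_screen pixels. Each level halves the resolution."""
    level = 0
    res = full_res
    while level < max_levels and res // 2 >= pixels_on_screen:
        res //= 2
        level += 1
    return level

# A 1024px texture covering ~100px of screen can stop three levels early,
# decoding only a 128px version (1/64th of the full-resolution memory).
print(discard_level(1024, 100))   # 3
print(discard_level(1024, 1024))  # 0 (filling the screen: decode it all)
```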

35 minutes ago, Penny Patton said:
  • That problem where rigged mesh bodyparts appear floating around the space where your avatar should be for several minutes before finally snapping to your avatar

This has nothing to do with textures; it's more about how the mesh package is structured. The viewer gets and renders the geometry before it knows about the rigging; essentially, it just assumes it's a static mesh till it's told otherwise.

35 minutes ago, Penny Patton said:

Despite what some people think, there is no way we're going to see some magical software fix that allows for unlimited texture detail. Seriously, if anyone could figure how to pull that literal miracle off and patent it, they'd be rich. Every videogame developer in the world would be licensing it off of them. If there were an existing method to do this, then videogames would employ this miracle rather than carefully managing texture use (which is what they actually do). There are technologies employed to manage or reduce the memory use of textures where possible, but these technologies are always paired with efficient use of textures, not as a replacement for it.

 

Right now we have a perfect storm: I/O-heavy processes with an expensive decode, combined with a render engine that's weighted heavily towards getting stuff on your screen as fast as possible. Dropping down a step in texture resolution does not happen correctly.

The viewer waits till it's running out of VRAM (the bias figure in the texture console), then bins an arbitrary (typically full-resolution) texture from the pool to make space for the new item. The viewer then re-adds the removed and forgotten texture to the decode pool and starts from scratch. The way the texture to drop gets chosen is not ideal, and the result is thrashing: one part of the viewer bins it, another screams "but mooom we need that one" and puts it back, only for it to get immediately re-binned and forgotten. Rinse and repeat.

It is done this way because the decode phase is stupid expensive. Right now, making sure each frame has only the ideal resolution textures would cripple the viewer with I/O and decodes.
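A toy simulation of that loop (all names and numbers invented, not the viewer's actual logic): with five visible textures and VRAM budget for four, arbitrary eviction plus immediate re-request never settles, and every pass costs a full re-decode.

```python
import random

def evict_arbitrary(resident, needed, budget, steps=100):
    rng = random.Random(42)
    re_decodes = 0
    for _ in range(steps):
        if len(resident) > budget:
            # bin an arbitrary texture, not the least-needed one
            resident.discard(rng.choice(sorted(resident)))
        missing = needed - resident
        if missing:
            # "but mooom we need that one": re-queue and re-decode it
            resident.add(min(missing))
            re_decodes += 1
    return re_decodes

# Over budget by one texture: 100 frames, 100 wasted re-decodes.
print(evict_arbitrary({1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, budget=4))  # 100
# Within budget: the loop settles immediately and nothing is re-decoded.
print(evict_arbitrary({1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, budget=5))  # 0
```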

An updated cache (if it saves decoded information) will have a dramatic impact on performance. Even loading a large decoded texture from a slow HDD and passing it to the GPU will be significantly faster than the decode phase.

Other possibilities include keeping lower resolution decodes in system memory so the swapping can happen instantly (effectively putting a small secondary cache of low resolution versions of active textures into a viewer managed ram drive).
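That could look something like this minimal sketch (class and method names invented): keep a cheap low-res copy of each active texture in system RAM so eviction from VRAM can swap in a stand-in instantly, with no decode.

```python
class LowResFallbackCache:
    """Viewer-managed RAM cache of small decoded versions of active textures."""

    def __init__(self):
        self._low_res = {}  # texture UUID -> small decoded image

    def store(self, uuid, small_image):
        self._low_res[uuid] = small_image

    def on_evicted_from_vram(self, uuid):
        # Instant, decode-free stand-in; full-res can be re-decoded lazily.
        return self._low_res.get(uuid, "grey placeholder")

cache = LowResFallbackCache()
cache.store("tex-1", "128px version")
print(cache.on_evicted_from_vram("tex-1"))  # 128px version
print(cache.on_evicted_from_vram("tex-2"))  # grey placeholder
```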

Without the super expensive decode phase, stepping texture resolution down on the fly, one level at a time, becomes a real possibility. Because, in case I've not made it clear by this point, decoding textures is stupid expensive.

 

Till we get the new cache, you can improve texture performance by not having the texture console open (it has a non-trivial impact on how the decode pipeline works) and by not inspecting everything and everyone, which forces items to be loaded full rez. Textures are not automatically decoded to full resolution every time under all circumstances, but inspecting, or jamming your camera up super close, is a surefire way to make this happen.

2 hours ago, CoffeeDujour said:

The viewer waits till it's running out of VRAM (the bias figure in the texture console), then bins an arbitrary (typically full-resolution) texture from the pool to make space for the new item. The viewer then re-adds the removed and forgotten texture to the decode pool and starts from scratch. The way the texture to drop gets chosen is not ideal, and the result is thrashing: one part of the viewer bins it, another screams "but mooom we need that one" and puts it back, only for it to get immediately re-binned and forgotten. Rinse and repeat.

Do you actually know that, or are you guessing? Texture removal is random? Not based on texture priority or distance or something reasonable? Does it get dropped from the fast cache (the file, not RAM) too? That could be improved. If it's that dumb, it was probably coded under the assumption it was unlikely to be used. The texture system is actually pretty good in terms of implementation; it just needs attention to policy and evaluation.

"Policy" means "what texture should be shown first at what resolution?". The code for that is spread over many modules and hard to tune. There's a texture priority system, but its priorities need work. Has it been touched since the switch to CDN asset serving? I've pointed out before that priorities are different when you're standing still, moving, or looking around. When you're looking right at something at close range, it should get a big texture priority boost and be forced in. Moving fast, don't bother getting high-rez textures for the stuff you're going past; focus on what's ahead. Looking around a lot but not moving fast ("shopping mode"), go for equal resolution in all directions.
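Something like this, perhaps (mode names and weights entirely made up, just to show the shape of the policy):

```python
def texture_priority(base, distance, on_screen, camera_mode):
    """Sketch of a mode-aware priority: closer is higher, and the camera's
    behaviour scales the result. All weights are invented for illustration."""
    p = base / max(distance, 1.0)
    if camera_mode == "still" and on_screen and distance < 10:
        p *= 8.0  # staring at it up close: force it in at full resolution
    elif camera_mode == "moving" and not on_screen:
        p *= 0.1  # flying past: don't bother fetching high-rez versions
    # "shopping" mode: no bias, equal resolution in all directions
    return p

print(texture_priority(100, 5, True, "still"))     # 160.0
print(texture_priority(100, 80, False, "moving"))  # 0.125
print(texture_priority(100, 5, True, "shopping"))  # 20.0
```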

Texture decompression uses a lot of CPU time, but not in the main thread. SL's CPU load problems are almost all main thread on a multi-CPU machine.

(Interesting thought. Right now, if you give an object a color, it shows in that color, rather than grey, before the textures load. For many objects, like buildings, trees, and roads, that's a good start for a distant object. The world would look better during loading if all objects had a default color: the average of their textures. With an alpha, too, so those big flat planes of trees were translucent green until their textures loaded. As grey solids they look awful and block the view. The effect would be that the world starts out cartoonish, rather than grey, and then becomes realistic.)
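The average-colour idea is cheap to sketch; assuming the mean RGBA is computed once (say at upload time) and stored with the asset:

```python
def average_rgba(pixels):
    """Mean RGBA of a texture, 0-255 per channel.
    pixels: list of (r, g, b, a) tuples."""
    n = len(pixels)
    sums = [sum(p[i] for p in pixels) for i in range(4)]
    return tuple(s // n for s in sums)

# A tree "billboard": half green foliage, half fully transparent background.
foliage = [(30, 120, 40, 255)] * 50 + [(0, 0, 0, 0)] * 50
print(average_rgba(foliage))  # (15, 60, 20, 127) -> a translucent dark green
```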

Edited by animats
3 minutes ago, animats said:

Do you actually know that, or are you guessing? Texture removal is random? Not based on texture priority or distance or something reasonable? Does it get dropped from the fast cache (the file, not RAM) too? That could be improved. If it's that dumb, it was probably coded under the assumption it was unlikely to be used. The texture system is actually pretty good in terms of implementation; it just needs attention to policy and evaluation.

That's why texture thrashing is so noticeable. A texture gets dropped from the GPU and then re-added to the decode queue (as it's still required by an asset in the scene). It decodes through each of the discard levels; the final one loads (as it's a major part of the scene), pushes past the memory limit, and immediately gets canned. Round and round we go.

 

3 minutes ago, animats said:

Texture decompression uses a lot of CPU time, but not in the main thread. SL's CPU load problems are almost all main thread on a multi-CPU machine.

Running on a separate thread does not mean independent of the main thread. Depending on FPS, a certain number of textures are decoded each frame (and a single decode level counts as one); if you're struggling with single-digit FPS, then the number of decodes per frame is also in the single digits.
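A toy model of that coupling (the per-frame budgets are invented numbers): the same decode backlog that drains in a fraction of a second at 60 FPS can take tens of seconds at single-digit FPS, which is exactly the feedback loop described earlier.

```python
def frames_to_clear(backlog, decodes_per_frame):
    """Frames needed to work off a backlog of pending decode steps."""
    frames = 0
    while backlog > 0:
        backlog -= decodes_per_frame
        frames += 1
    return frames

# 120 pending decode steps: at 60 FPS with 10 decodes/frame that's 12 frames
# (~0.2 s); at 5 FPS with 1 decode/frame it's 120 frames (~24 s).
print(frames_to_clear(120, 10))  # 12
print(frames_to_clear(120, 1))   # 120
```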

We did try threading the decode of individual textures, but the threading overhead made it slower. Textures in SL aren't big enough to benefit.

3 hours ago, CoffeeDujour said:

This has nothing to do with textures; it's more about how the mesh package is structured. The viewer gets and renders the geometry before it knows about the rigging; essentially, it just assumes it's a static mesh till it's told otherwise.

Which takes longer to get told if your bandwidth is clogged with gigs of textures. I never knew it was even a problem until I started to see people complaining about it and tested it in some texture heavy sims. If you're trying to download a whole bunch all at once, everything takes longer to download.

8 hours ago, Penny Patton said:

Which takes longer to get told if your bandwidth is clogged with gigs of textures. I never knew it was even a problem until I started to see people complaining about it and tested it in some texture heavy sims. If you're trying to download a whole bunch all at once, everything takes longer to download.

The net code lags behind the decode pipeline; your bandwidth is not being fully used.

10 hours ago, CoffeeDujour said:

 

Running on a separate thread does not mean independent of the main thread. Depending on FPS, a certain number of textures are decoded each frame (and a single decode level counts as one); if you're struggling with single-digit FPS, then the number of decodes per frame is also in the single digits.

Why is that throttled so severely? The viewer knows how many CPUs are available.  That seems like a legacy decision from the days when most computers had only one CPU and the textures didn't come from a CDN. Is there ever enough texture decode work to tie up a second CPU? I never see more than about 125% CPU utilization, indicating the main thread is maxed out but the download and decode threads are not.

Quote

That's why texture thrashing is so noticeable. A texture gets dropped from the GPU and then re-added to the decode queue (as it's still required by an asset in the scene). It decodes through each of the discard levels; the final one loads (as it's a major part of the scene), pushes past the memory limit, and immediately gets canned. Round and round we go.

Ouch. Picking a distant texture, reducing its resolution, and putting back a smaller version would be a more effective way to relieve texture memory pressure. I've seen code in Firestorm which seems to be there to replace a texture with a smaller version, and a comment to not use this too much because it's an expensive operation. But it beats thrashing a cache-like system.
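A sketch of that alternative (names, sizes and the farthest-first rule are all invented, not Firestorm's actual code): shrink the farthest texture one level at a time until the pool fits, instead of binning whole textures.

```python
def bytes_for(res):
    return res * res * 4  # uncompressed RGBA

def relieve_pressure(textures, budget):
    """textures: {name: (resolution, distance)}. Halve the farthest
    texture's resolution until total memory fits the budget."""
    used = sum(bytes_for(r) for r, _ in textures.values())
    while used > budget:
        # farthest texture that can still be stepped down
        name = max((n for n, (r, d) in textures.items() if r > 64),
                   key=lambda n: textures[n][1])
        r, d = textures[name]
        used -= bytes_for(r) - bytes_for(r // 2)
        textures[name] = (r // 2, d)
    return textures

scene = {"near_wall": (1024, 2.0), "far_tree": (1024, 80.0)}
# 8 MB of textures into a 6 MB budget: the distant tree drops to 512px,
# while the wall you're standing next to stays sharp.
print(relieve_pressure(scene, budget=6_000_000))
```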

Quote

We did try threading the decode of individual textures, but the threading overhead made it slower. Textures in SL aren't big enough to benefit.

Right, that's an OpenJPEG feature. When your biggest texture is 1K x 1K (yes, I know some legacy 2K textures exist) it's probably not worth it to multi-thread at that level.

3 hours ago, animats said:

Why is that throttled so severely? The viewer knows how many CPUs are available.  That seems like a legacy decision from the days when most computers had only one CPU and the textures didn't come from a CDN. Is there ever enough texture decode work to tie up a second CPU? I never see more than about 125% CPU utilization, indicating the main thread is maxed out but the download and decode threads are not.

Ouch. Picking a distant texture, reducing its resolution, and putting back a smaller version would be a more effective way to relieve texture memory pressure. I've seen code in Firestorm which seems to be there to replace a texture with a smaller version, and a comment to not use this too much because it's an expensive operation. But it beats thrashing a cache-like system.

Right, that's an OpenJPEG feature. When your biggest texture is 1K x 1K (yes, I know some legacy 2K textures exist) it's probably not worth it to multi-thread at that level.

Programs have to be explicitly coded to benefit from multiple CPUs; it's not something that just works out of the box. Adding parallelisation to a program brings a lot of issues, because all those threads will process their work at different speeds and have to "somehow" lead to a cohesive result without the entire system having to wait on a given thread to finish (which would completely negate the benefits of parallelised computing). In addition, you have the issue of ensuring that whatever data one thread is going to work with isn't going to get modified by another while they are working.

There are entire books on the challenges it brings. Most software out there doesn't take advantage of multiple CPUs.

Edited by Kyrah Abattoir
21 hours ago, Love Zhaoying said:

Is there “texture abuse” like “prim torture”?

Only in the sense that prim torture is an unnecessary evil caused by thoughtless builders too. Remember to sedate the prim before you shape it and it won't feel a thing.

Share on other sites
51 minutes ago, ChinRey said:

Only in the sense that prim torture is an unnecessary evil caused by thoughtless builders too. Remember to sedate the prim before you shape it and it won't feel a thing.

Compared to necessary evils, like noob humiliation, etc.

3 hours ago, Kyrah Abattoir said:

Programs have to be explicitly coded to benefit from multiple CPUs; it's not something that just works out of the box.

If the parallel jobs are not time critical, I've found futures and promises work pretty well for lazy, trivially parallel jobs out of the box...
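For example, with Python's concurrent.futures (the same spirit as std::future / std::promise in the C++ world); the decode function here is just a stand-in:

```python
from concurrent.futures import ThreadPoolExecutor

def decode(texture_id):
    return f"decoded:{texture_id}"  # stand-in for an expensive decode

# Submit the jobs, keep working, collect results when they're ready;
# no manual lock juggling for this trivially parallel case.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(decode, t) for t in ("a", "b", "c")]
    results = [f.result() for f in futures]

print(results)  # ['decoded:a', 'decoded:b', 'decoded:c']
```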

8 hours ago, Kyrah Abattoir said:

Programs have to be explicitly coded to benefit from multiple CPUs; it's not something that just works out of the box. Adding parallelisation to a program brings a lot of issues, because all those threads will process their work at different speeds and have to "somehow" lead to a cohesive result without the entire system having to wait on a given thread to finish (which would completely negate the benefits of parallelised computing). In addition, you have the issue of ensuring that whatever data one thread is going to work with isn't going to get modified by another while they are working.

The texture system already has several threads. Texture download and decoding happen in parallel with the main thread that's doing the drawing. The main thread puts requests for texture UUIDs on a list, along with a priority for which ones it needs most, and the texture download system runs in parallel trying to fulfill those requests. Interestingly, the main thread can change the priority of a pending request; if you leave the area before a texture arrives, there's no need to download it. The viewer's texture download system is quite elaborate and handles all the hard cases. But its policy code (what to do first?) needs work.
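That request list can be sketched with a priority heap plus lazy invalidation, a standard pattern for mutable priorities (and only a guess at how the viewer actually structures it; all names here are invented):

```python
import heapq

class TextureRequestQueue:
    """Main thread files requests and may re-prioritise or cancel them;
    a download/decode worker pops the most urgent surviving request."""

    def __init__(self):
        self._heap = []     # (-priority, seq, uuid); stale entries linger
        self._current = {}  # uuid -> latest priority (None = cancelled)
        self._seq = 0

    def request(self, uuid, priority):
        self._current[uuid] = priority
        heapq.heappush(self._heap, (-priority, self._seq, uuid))
        self._seq += 1

    def cancel(self, uuid):
        self._current[uuid] = None  # e.g. you left the area

    def pop_most_urgent(self):
        while self._heap:
            neg, _, uuid = heapq.heappop(self._heap)
            if self._current.get(uuid) == -neg:  # skip stale/cancelled entries
                del self._current[uuid]
                return uuid
        return None

q = TextureRequestQueue()
q.request("floor", 1.0)
q.request("face", 5.0)
q.request("floor", 9.0)     # camera zoomed in: bump the priority
q.cancel("face")            # teleported away: no longer needed
print(q.pop_most_urgent())  # floor
print(q.pop_most_urgent())  # None
```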

(I had to fix a bug in there once, so I've looked at the code in Firestorm. The self-compiled development version of Firestorm is usually broken; if you want to use it, you have to be able to fix it.)

The big scarce resource in the viewer is main thread time; when the frame rate drops, you're usually at 100% of one CPU on the main thread. But if you have more CPUs, they can be doing other things.

Geometry download and decode is done outside the main thread, too, I think, but I haven't looked closely at that code.

12 hours ago, animats said:

Why is that throttled so severely? The viewer knows how many CPUs are available.  That seems like a legacy decision from the days when most computers had only one CPU and the textures didn't come from a CDN. Is there ever enough texture decode work to tie up a second CPU? I never see more than about 125% CPU utilization, indicating the main thread is maxed out but the download and decode threads are not.

Imagine a highway with many lanes, one for each CPU core. There are many vehicles on the road and they are all driven by idiots (especially all those Chrome dump trucks). Some of the cars flying along are texture decode processes, and in each one sits a hamster making a sandwich. There is a bendy-bus on the highway; that is the main thread. At some point, the hamsters in the cars have to pass their sandwiches to the old goat that runs the bar at the back, who, for reasons best left out of this tale of woe, is a pedantic jerk. He will accept one sandwich at a time and only wants the exact sandwiches ordered by his currently seated guests (penguins, probably). The number of seats varies, everyone has to sit and place an order, and no one can start eating till everyone has their sandwich and grace has been said. It's often lamented that hamsters are terrible drivers, get caught up in all kinds of traffic, don't arrive when they are expected in an orderly fashion, sometimes run each other off the road, crash, or screw up the order and hand in a half-chewed ball of bread covered in mayo.

* Passing sandwiches from moving cars to a bus, in traffic, is fiendishly complicated. Hamsters only have short arms. Larger critters with longer arms are slower to get going and tend to get themselves wrapped around the bus's wheels.

** Fitting the bus with a hopper into which sandwiches can be tossed fails because hamsters can't throw very well and the old goat has better things to do than continually check the hopper to see what's appeared in it. Likewise, attempts to give everything over to an ever-increasing fleet of hamsters tend to only result in squabbles over condiments.

*** Sandwich ingredients are procured by a separate fleet of cars driven by rabbits pulling off the highway, buying CDN-brand Happy Meals and then throwing out the pickles.

**** A Kitty did experiment with having multiple hamsters in multiple cars making the same sandwich, and while they performed admirably, dinner was always late.

***** The Kitty suggested an assembly line would make a better analogy, but I felt a tale of hamsters in a sweatshop being bossed about by a possibly fictitious cat to be a little dark.

If this tale has you more baffled than ever, that's intentional. I hope the confusion you're now feeling adequately communicates the complexities of multi-threaded coding.

 

Edited by CoffeeDujour
43 minutes ago, CoffeeDujour said:

Imagine a highway with many lanes, one for each CPU core. There are many vehicles on the road and they are all driven by idiots (especially all those Chrome dump trucks). Some of the cars flying along are texture decode processes, and in each one sits a hamster making a sandwich. There is a bendy-bus on the highway; that is the main thread. At some point, the hamsters in the cars have to pass their sandwiches to the old goat that runs the bar at the back, who, for reasons best left out of this tale of woe, is a pedantic jerk. He will accept one sandwich at a time and only wants the exact sandwiches ordered by his currently seated guests (penguins, probably). The number of seats varies, everyone has to sit and place an order, and no one can start eating till everyone has their sandwich and grace has been said. It's often lamented that hamsters are terrible drivers, get caught up in all kinds of traffic, don't arrive when they are expected in an orderly fashion, sometimes run each other off the road, crash, or screw up the order and hand in a half-chewed ball of bread covered in mayo.

* Passing sandwiches from moving cars to a bus, in traffic, is fiendishly complicated. Hamsters only have short arms. Larger critters with longer arms are slower to get going and tend to get themselves wrapped around the bus's wheels.

** Fitting the bus with a hopper into which sandwiches can be tossed fails because hamsters can't throw very well and the old goat has better things to do than continually check the hopper to see what's appeared in it. Likewise, attempts to give everything over to an ever-increasing fleet of hamsters tend to only result in squabbles over condiments.

*** Sandwich ingredients are procured by a separate fleet of cars driven by rabbits pulling off the highway, buying CDN-brand Happy Meals and then throwing out the pickles.

**** A Kitty did experiment with having multiple hamsters in multiple cars making the same sandwich, and while they performed admirably, dinner was always late.

***** The Kitty suggested an assembly line would make a better analogy, but I felt a tale of hamsters in a sweatshop being bossed about by a possibly fictitious cat to be a little dark.

If this tale has you more baffled than ever, that's intentional. I hope the confusion you're now feeling adequately communicates the complexities of multi-threaded coding.

 

You've just wonderfully explained traffic in LA.

4 hours ago, CoffeeDujour said:

It's often lamented that hamsters are terrible drivers,

They don't make good sandwiches either - always too much lettuce.


That's all very interesting talk of the cache and its potential evolution. 😍 Larger textures mean excellent texture atlases and increased visual quality, if used well. Enhancing the LOD system would certainly make turning up draw distance more appealing and create a far more immersive POV. As I mentioned elsewhere, it'd be nice for us to have occlusion planes and zones to build with too ;) People will very quickly notice the improvements if the cache and LOD systems come off well; it would be like a whole new world, just as exciting as any of SL's greatest achievements. Exploring the mainland would be a lot more fun and inviting too.

Edit: I wanted to add something I've thought about quite a few times. Wouldn't it be nice if the viewer had some built-in textures? Or even procedural textures that could be mashed up and customized by the end user in their creations? This could keep a lot of damn-near-identical textures from being overused, and there'd be no server calls needed to get them. And if texture penalties are coming, perhaps the lightest would apply to these.

 

Edited by Macrocosm Draegonne

