Why do viewers have (such low) limits on bandwidth and cache size?


Jennifer Boyle


Recommended Posts

23 minutes ago, Jennifer Boyle said:

My main question is completely stated in the title. I think, but am not sure, that the limits have not changed in many years, during which time my connection has evolved from 3 Mbps DSL to 1,000 Mbps fiber, and storage has evolved from conventional hard drives to SSDs.

The bandwidth slider in the viewer only applies to certain types of data and is broadly misunderstood: really it's the maximum speed at which the servers can send UDP data to your client. If it's set too high, you get packet loss and have a bad time. Really, it should be removed entirely.

The speed at which objects and textures are fetched from the CDN is determined by how much money LL are paying the CDN (it can certainly go a lot faster than it does for us).

The cache has problems once it grows past a certain size and can actually perform worse than downloading the desired asset from scratch. We are supposed to be getting an improved cache; LL did some initial work and then the project stalled .. they don't appear to be assigning it any ongoing developer time.

Edited by Coffee Pancake
  • Like 1
Link to comment
Share on other sites

I had posted about this a while back, musing that at the time SL was created a 10 or 12 Mbps internet connection was considered blazing fast. I really have no idea, and there seem to be multiple explanations. I have a 300 Mbps up/down connection, and I set mine in debug settings at 280000 kbps; it's been like that for over a year with no issues. And, monitoring Task Manager when first going into a new sim, I rarely ever see a receive rate of more than about 85 Mbps, and that is a brief spike. I think it is just some old code needed back in the day that really has no use anymore, but it just keeps getting carried on.

  • Haha 1
Link to comment
Share on other sites

1 hour ago, Coffee Pancake said:

The bandwidth slider in the viewer only applies to certain types of data and is broadly misunderstood: really it's the maximum speed at which the servers can send UDP data to your client. If it's set too high, you get packet loss and have a bad time. Really, it should be removed entirely.

The speed at which objects and textures are fetched from the CDN is determined by how much money LL are paying the CDN (it can certainly go a lot faster than it does for us).

The cache has problems once it grows past a certain size and can actually perform worse than downloading the desired asset from scratch. We are supposed to be getting an improved cache; LL did some initial work and then the project stalled .. they don't appear to be assigning it any ongoing developer time.

You are too kind. That bandwidth slider has devolved from being useful to being blatantly deceptive, as the bulk of the traffic it once managed (by sending a message to the service) has taken higher roads: from the service via TCP and from the CDN via TCP.

A few people may still get some use from it where TCP is being messed with by ...  "entities", and service support for sending assets via UDP has been re-enabled, with some rather grim limits.

Link to comment
Share on other sites

The bandwidth slider, as Coffee states, is a bit of a red herring these days. Only UDP traffic is affected and I think the goal is to eliminate all UDP.

At a higher level, SL performance is the product of server, connection, and viewer power. Imagine this from LL's perspective. As server compute power and connection bandwidth increase, you can allocate those improvements between two extremes, presuming a constant user base. (Growing the business is a related but different calculus.)

1) Throw all the extra performance at each resident's/region's experience.

2) Increase the number of residents and regions being supported by any quantity of improved performance, leaving each resident and region experiencing constant performance while you scale back your infrastructure expenditures.

The optimum solution for LL is somewhere between the extremes and is dependent on the perceived value of SL to its residents, as reflected in their expenditures for any level of performance.

Put simply, it's complicated, it depends, and it's probably not what you think.

 

  • Like 1
Link to comment
Share on other sites

As @Coffee Pancake points out, the bandwidth slider has a very different purpose to what it had in the past. The slider is a throttle to protect not only the viewer but also the server, and was a specific control back in the day when all assets were streamed directly from the server. It applies only to UDP traffic; today this is traffic such as the messaging from the server to the viewer informing it of updates to positions, and transferring your avatar "context" between regions when you TP.

UDP is an unreliable protocol; this means that data packets may not arrive and will not be recovered automatically. Managing the bandwidth minimised the packet loss by reducing the contention between packets and other traffic.

Today, the majority of our assets are pulled from the CDN over HTTP. HTTP is a reliable protocol, meaning that lost packets are automatically recovered. The fact that they come from the CDN means that they are not using the server's bandwidth, but they are using yours. That bandwidth is not constrained by the bandwidth slider.

So what does it do today? It still manages the size of the "pipe" between the server and the viewer. Most of the time this is very low bandwidth, and setting it high is (for the most part) pointless. There are potentially some edge cases to this, however. The UDP protocol is used to "squirt" all your avatar info between regions (and to some extent your viewer) when you cross borders/TP. The size of this "squirt" can be quite large, and there is an entirely unproven theory that increasing the size of the pipe reduces the risk of disconnects during TP/sim crossing.

With this in mind I asked (around 12 months ago) for the Lab to review the current settings with a view to updating the defaults in Firestorm. At the time SL was mid-transition to AWS and it was agreed that making any changes to Firestorm and/or the grid was not a good idea. I raised it again earlier this year and got no conclusive response back; it is not considered a worthwhile task for the server team at present, and I have no grounds to disagree, as this is, after all, fuelled by user feedback and speculation rather than testable evidence.

Why wouldn't FS just upgrade their default? We could, but I am worried that if we did so we would have a ripple effect that would cause problems. It is all very well individuals saying "oh but I made mine 89432849328904MB and it was fiiine"; that does not mean that everyone following suit would see the same. If we were to make it larger for every FS user, then that potentially has an impact on the server side. Consider this scenario (the numbers are made up for illustrative purposes, and it is putting "real values" to those numbers that I feel needs to happen).

Thought exercise: at the present time all viewers in a busy region are using a total of 2% of a server's bandwidth. If there are 20 regions on a shared machine, or sharing a network device, then combined traffic is now peaking at 40% on that device. If we were to double or triple the individual viewer bandwidth, we are now at 80% or 120%, potentially denying service, causing widespread packet loss, etc.
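The arithmetic of that thought exercise can be sketched in a few lines. The percentages are the made-up illustration numbers from the post, not measured values, and the function name is mine:

```python
def device_load_pct(per_region_pct, regions, multiplier=1.0):
    """Combined peak load on a shared network device, as a percentage of capacity."""
    return per_region_pct * regions * multiplier

baseline = device_load_pct(2.0, 20)       # 20 regions at 2% each -> 40%
doubled = device_load_pct(2.0, 20, 2.0)   # double per-viewer bandwidth -> 80%
tripled = device_load_pct(2.0, 20, 3.0)   # triple -> 120%, over capacity
print(baseline, doubled, tripled)
```

The point being that a per-viewer change that looks tiny multiplies across every viewer in every region sharing the device.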

Is this realistic? If not, what would the implication of this be? Nobody knows, and as such I am rather wary of just slapping in a change to the viewer and hoping it has no bad side-effects. As Firestorm carries the vast majority of users, we need to consider the cumulative effect of things. A more likely scenario is a related issue where the short-term bursts of traffic caused by sim crossings push over a limit, and the increase that benefits a handful right now ends up overall worse when we all have it.

Having said all that, there is every likelihood that the viewer setting has little to no actual effect on the server side; the server may simply ignore it, making changes on the viewer meaningless. However, without proper engagement from the server team at LL we can only guess. Overall I am with @Coffee Pancake on this: the setting should simply be removed if it has no impact, but I don't believe we have all the information we need to answer that. Perhaps @Simon Linden could discuss this at the server meeting tomorrow. It happens at a bad time for me RL-wise but I will try to attend.

Edited by Beq Janus
minor typos and grammos
  • Like 1
  • Thanks 2
Link to comment
Share on other sites

When asking questions about the Second Life viewer cache, perhaps one should consider that there are multiple cache structures. I agree that the texture asset cache seems to get unwieldy at larger sizes. This is possibly due to the structure used, where the initial 600 bytes of each texture are stored as a record in a large file and the remainder of the texture is stored alone in a file. I once experimented with a texture cache method that eliminated the 600-byte record-fragment database and simply stored the entire texture alone in a file, by leveraging the code that was used long ago when LL packaged some region and avatar assets with their viewer. I found it to be much faster, and it didn't become unwieldy even with hundreds of thousands of textures in the filesystem, until I had to do filesystem maintenance. Then it sucked. My texture cache grew to over 100 gigabytes on a Samsung EVO 850 SSD. A faster drive would have helped some, but maybe not a lot. This cache system did not have a mechanism to remove items that haven't been used for a while. I didn't finish the implementation when I found Linden Lab had abruptly removed some of the code I was taking advantage of.
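A minimal sketch of the split layout described above: the first 600 bytes of each texture held in a stand-in for the shared record file (a plain dict here), and the remainder in its own file. The class name, the `.body` file naming, and the dict-instead-of-record-database are all my hypothetical illustration, not the viewer's actual format:

```python
import os
import tempfile

HEADER_BYTES = 600  # size of the per-texture fragment kept in the shared record file

class SplitTextureCache:
    """Illustrative two-part texture cache: header record plus a body file per texture."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.headers = {}  # uuid -> first 600 bytes (stand-in for the record database)

    def put(self, uuid, data):
        # Split: first 600 bytes into the record store, the rest into its own file.
        self.headers[uuid] = data[:HEADER_BYTES]
        with open(os.path.join(self.root, uuid + ".body"), "wb") as f:
            f.write(data[HEADER_BYTES:])

    def get(self, uuid):
        # Reassemble the texture from the record fragment plus the body file.
        with open(os.path.join(self.root, uuid + ".body"), "rb") as f:
            return self.headers[uuid] + f.read()

# usage: round-trip a fake 2 KB texture
cache = SplitTextureCache(tempfile.mkdtemp())
texture = bytes(range(256)) * 8  # 2048 bytes of fake texture data
cache.put("some-uuid", texture)
restored = cache.get("some-uuid")
```

The alternative described in the post (the whole texture in one file) removes the record store entirely, trading database contention for a very large number of files on disk.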

Each sound asset is just dumped alone in a file.

Region objects are cached in a file per region, in a systematic structure with metadata so the viewer can tell if the in-world object is newer than the cached object.  For a long time, starting approximately when the viewer was open-sourced, the region object cache wasn't being read at all due to a typo or a merge error that affected the names of the files containing the region object cache.

There are more caches in use.  I have not investigated them as they have not annoyed me sufficiently.

Edited by Ardy Lay
Link to comment
Share on other sites

3 hours ago, Ardy Lay said:

This is possibly due to the structure used where the initial 600 bytes of each texture is stored as a record in a large file and the remainder of the texture is stored alone in a file. 

Is that being removed as part of the "file cache" update?

Link to comment
Share on other sites

10 hours ago, Beq Janus said:

The slider is a throttle to protect not only the viewer but also the server, and was a specific control back in the day when all assets were streamed directly from the server. It applies only to UDP traffic; today this is traffic such as the messaging from the server to the viewer informing it of updates to positions, and transferring your avatar "context" between regions when you TP.

 

10 hours ago, Beq Janus said:

So what does it do today? It still manages the size of the "pipe" between the server and the viewer. Most of the time this is very low bandwidth, and setting it high is (for the most part) pointless. There are potentially some edge cases to this, however. The UDP protocol is used to "squirt" all your avatar info between regions (and to some extent your viewer) when you cross borders/TP. The size of this "squirt" can be quite large, and there is an entirely unproven theory that increasing the size of the pipe reduces the risk of disconnects during TP/sim crossing.

Yes. There's a related problem and a related mechanism. The LL viewer and ones using the same code are mostly single-threaded. (The code dates from the era when personal computers had only one CPU core.) The same thread that refreshes the screen handles the incoming data from the network. This, by the way, is why "ping time" goes up when FPS is low: the timing completes when the refresh loop comes around and reads the network, not when the packet actually arrives.

So, too much incoming UDP data during one frame time would reduce FPS. That's the real reason for the throttle.

If you have lots of bandwidth, no throttle, and a viewer choking on drawing the scene, the arriving packets could use up too much of the frame time and lower the FPS even more.

So there's a backup system - the packet discarder in the receiver. If too many UDP packets come in during one frame time, some packets are discarded. There's a quota on how much UDP data gets processed per frame. This is why "packet loss" goes up when FPS is low, even if the network itself is not losing any packets.
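That discard mechanism can be sketched as a per-frame byte budget. This is a toy model of the behaviour described, not the viewer's code; the budget and packet sizes are invented for illustration:

```python
class FrameUdpThrottle:
    """Sketch of a per-frame UDP quota: packets over the budget are discarded."""

    def __init__(self, bytes_per_frame):
        self.budget = bytes_per_frame
        self.dropped = 0  # shows up as viewer-side "packet loss"

    def process_frame(self, packets):
        used = 0
        accepted = []
        for p in packets:
            if used + len(p) <= self.budget:
                accepted.append(p)
                used += len(p)
            else:
                # The network delivered the packet, but the per-frame quota drops it.
                self.dropped += 1
        return accepted

throttle = FrameUdpThrottle(bytes_per_frame=3000)
burst = [b"x" * 1200] * 5  # 6000 bytes arriving in a single frame
kept = throttle.process_frame(burst)
# only two packets (2400 bytes) fit the 3000-byte budget; three are discarded
```

This is why the viewer can report packet loss while a ping to the servers shows none: the loss happens inside the receiver, not on the wire.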

As Beq points out, UDP packets are not, of themselves, reliable. The essential lost messages do get resent after a second or so. SL networking appears to use a fixed retransmit timer. So dropping messages does not cause the sender to slow down. That's not good.

In practice, this isn't a major problem most of the time, now that assets come via the asset system over TCP. That part works exactly like fetching images from a web server. But when a viewer connects to a new region and has no info about that region in the cache, it asks the server to tell it everything about that region. So the server floods the viewer with too much information, which can result in packet drops and retransmissions. Delaying certain messages more than one second will cause a region crossing to fail every time, for example.

All this interaction is delicate. The packet-discard system's relationship with FPS makes the networking rather brittle. So it's hard to predict what will happen if you change something. Working on this would require instrumentation and testing. I can understand LL's reluctance to mess with this.

Link to comment
Share on other sites

5 hours ago, animats said:

So, too much incoming UDP data during one frame time would reduce FPS. That's the real reason for the throttle.

No, not really... UDP message processing is not at all taxing on the frame rate (compared to the render pipeline load, it is negligible).

The reason was most likely that, back in 2003 when SL was born, the Internet connectivity of the servers was much less beefy than it is today. Back then, leased lines cost a small fortune, and a single sim server was likely (but it would be interesting to get this inference of mine confirmed by a knowledgeable Linden) very limited in the bandwidth available to serve all users in the sim (a few Mbps per sim server, at most), so you could not have those users sucking up UDP messages at more than 500 Kbps or so (which was not even a big deal, since back then, ADSL down-links were limited to 512 Kbps at best).

Edited by Henri Beauchamp
Link to comment
Share on other sites

4 hours ago, Henri Beauchamp said:

No, not really... UDP message processing is not at all taxing on the frame rate (compared to the render pipeline load, it is negligible).

It's not much, but the throttle does kick in. If you see a packet loss percentage, and pinging the SL servers shows no packet loss, that's probably the throttling system. Whether it should be throttling is a separate issue.

The average UDP rate is low, but the peak rate is much higher. When you enter a new un-cached region, there's a huge blast of full object update and compressed object update messages. Processing them is a fair amount of work. For example, all the geometry for prims is generated on packet reception.

Edited by animats
Link to comment
Share on other sites

14 hours ago, animats said:

When you enter a new un-cached region, there's a huge blast of full object update and compressed object update messages. Processing them is a fair amount of work. For example, all the geometry for prims is generated on packet reception.

The processing of the contents of the messages (i.e. updating object data, or even decoding textures back when they were sent via UDP) has nothing to do with the messaging protocol itself; it would have to be performed whatever the transport (UDP or TCP). Whatever protocol you use, you would see the same amount of time taken to process the message contents.

However, the load of the UDP protocol itself, as it is implemented in SL, is negligible and certainly not the reason for the throttling.

Link to comment
Share on other sites

8 hours ago, Henri Beauchamp said:

The processing of the contents of the messages (i.e. updating object data, or even decoding textures back when they were sent via UDP) has nothing to do with the messaging protocol itself; it would have to be performed whatever the transport (UDP or TCP). Whatever protocol you use, you would see the same amount of time taken to process the message contents.

The problem is peak processing per frame, not total processing time. Throttling spreads out the load over multiple frame times.

(This is the difference between real time programming and non-real-time programming. It's peaks that matter, not averages.)
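The peak-versus-average point can be shown with a toy calculation. The frame budget and per-message cost are assumed numbers for illustration, not measurements of any viewer:

```python
FRAME_BUDGET_MS = 16.6   # roughly a 60 FPS frame budget (assumed)
COST_PER_MSG_MS = 0.05   # assumed processing cost per message

def worst_frame_ms(total_messages, frames_spread_over):
    """Busiest-frame processing time when a message burst is spread over N frames."""
    per_frame = -(-total_messages // frames_spread_over)  # ceiling division
    return per_frame * COST_PER_MSG_MS

burst = worst_frame_ms(2000, 1)    # whole burst lands in one frame: 100 ms hitch
spread = worst_frame_ms(2000, 10)  # same work over 10 frames: 10 ms per frame
# identical total work in both cases; only the per-frame peak differs
```

The total cost is the same either way; throttling only changes which frame pays it.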

  • Like 1
Link to comment
Share on other sites

5 hours ago, animats said:

The problem is peak processing per frame, not total processing time. Throttling spreads out the load over multiple frame times.

You do not seem to understand (or I don't express myself clearly enough)... Whatever the messaging protocol, this processing time (which is not dependent on the protocol implementation itself) will be the same. And there is no throttling for the HTTP texture fetcher, for example (and there, there's an awful lot of processing time involved in rezzing the textures, way more than in UDP object-data decoding), or for the mesh repository fetcher (HTTP too).

For example, I can easily reach 150 Mbps (with less than 1 Mbps of UDP bandwidth: all the rest is HTTP) when rezzing a scene after a TP into a non-cached sim with the Cool VL Viewer. Yes, the frame rate drops during rezzing, but it lasts only a few seconds, seeing how fast everything is decoded and rezzed.

Again, the UDP bandwidth throttling has nothing to do with "spreading out the load over multiple frames".

Edited by Henri Beauchamp
  • Like 1
Link to comment
Share on other sites
