
Odd network problem (pathfinding???)


Vulpinus



Vulpinus wrote:

Even trying to quit the viewer takes a lot longer than normal.

have noticed this also recently. Longer time to quit than before

and also I am now never connecting on the 1st attempt when logging on. Takes 2 attempts every time now, even when logging to different sims

and about 3 out of 4 times I never get any system avatar layer updates when I change them immediately after login. Have to relog and then the layer updates happen

I am on the latest LL viewer. It started happening on the previous version tho

i probably should take notice of which sim server version I log into. But bc I can work round it myself I never



wherorangi wrote:


and about 3 out of 4 times I never get any system avatar layer updates when I change them immediately after login. Have to relog and then the layer updates happen

I am on the latest LL viewer. It started happening on the previous version tho

 

That sounds like the bakefail bug.

BUG-10391 - [Project Azumarill] Avatar often bakes fails on Azumarill

BUG-11929 - Appearance update is STILL broken after recent changes in 4.0.4 (314579)

 

That bug finally seems to be fixed on the Maintenance-RC: http://wiki.secondlife.com/wiki/Release_Notes/Second_Life_Release/4.0.6.316883

 

 


\o/

Whirly Fizzle is legend (:

i went to the SLB13 talk with Oz Linden and Landon Linden. Landon Linden is there to keep Oz Linden focussed on whats important to us

Whirly Fizzle is there to keep Landon Linden focussed on whats even more important to us

i dunno what we and me would ever do without you Whirly. Good on you (:


Whirly, I've just got a good log and capture of a definite saturation event. I'll PM you the link in a moment. The packet capture was similar to what I mentioned previously.

...

Just to be clear, FS taking a long time to quit for me was because of the ongoing saturation of my internet connection. I haven't (yet, :D ) suffered generally from the problems mentioned above by wherorangi.

 


i am ADSL and NZ. at end of the world. Only penguins in Antarctica have slower internets than me :D (:

i am still waiting for fibre to come down my street. Was supposed to be this year but they say is next year now. I has a sad about that. but oh! well. I just wait and do what I can in the meantime


  • Lindens


At long last it has happened! The users are mad that assets are downloading too fast. I'm calling my mother, tell her her son did good, and retiring. :-)

@Vulpinus: Okay, more serious comments:

The UDP and HTTP paths in the viewer are almost entirely separated at this point. The Statistics panel is really only reporting on UDP (scene updates, certain assets, simulation generally). HTTP traffic (most textures, meshes, inventory) doesn't show up here. So you're only seeing a part of the story.

What is interesting in the first statistics panel:

  • 1.3% packet loss - not fatal but it does mean retry is in effect and a source of rubberbanding.
  • 651 ms sim ping - this ping isn't reliable but taken at face value this indicates a simulator under load.
  • 600MB of unacked data on the simulator. I've never seen a number that large before. I can't even imagine how the simulator got here or what it might be doing to it.

The Texture Console can show you current texture and mesh activity. Textures get most of the console area but there's a mesh line as well with request activity. Watching this console will give you an idea of what is going on.

That concurrent sessions were affected by cannon activity suggests scene changes followed by mesh and texture operations.

This bothers me:

llmath/llvolume.cpp(2417) : 2016-06-23T06:57:00Z DEBUG:#MeshStreaming LLVolume::unpackVolumeFaces: Failed to unzip LLSD blob for LoD, will probably fetch from sim again.
llcommon/llsdserialize.cpp(2168) : 2016-06-23T06:57:00Z DEBUG: unzip_llsd: Unzip error: -3


Its warning seems to guarantee pointless request activity. It also suggests a corrupt mesh asset or corruption in transport. Hope they just went away...

HTTP Pipelining. Connection re-use is now standard for most of our HTTP traffic. Pipelining adds serial concurrency by placing multiple requests on the same connection without waiting for previous responses.
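To illustrate why pipelining keeps a connection so much busier, here is a deliberately idealized timing model. It is only a sketch: the function names, the round-trip time and the per-response transfer time are assumptions picked for illustration, not measurements from the viewer or the CDN.

```python
def serial_time(n_requests, rtt_s, transfer_s):
    """Connection re-use without pipelining: each request waits for the
    previous response, so every request pays a full round trip."""
    return n_requests * (rtt_s + transfer_s)


def pipelined_time(n_requests, rtt_s, transfer_s):
    """Idealized pipelining: requests go out back to back, so only one
    round trip is paid and the responses then stream in order."""
    return rtt_s + n_requests * transfer_s


# Assumed values for illustration only.
rtt = 0.170       # round-trip time in seconds
transfer = 0.020  # time to receive one response body in seconds

print(f"serial:    {serial_time(8, rtt, transfer):.2f} s")
print(f"pipelined: {pipelined_time(8, rtt, transfer):.2f} s")
```

In this model the pipelined connection spends almost all of its time receiving data rather than waiting, which fits the bursts of back-to-back TCP segments described in this thread.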

Wireshark. GVSP should be a red herring. There is a Wireshark dissector for UDP if you want to trust the source. But disabling GVSP in the protocol set may get it to quiet down.

Mesh and HTTP debug settings. Not all of these are dynamic. You may need to restart the viewer before they take effect.

Things to try:

  • Bring the Statistics and Texture Consoles into play and identify the offending traffic. Kind of assuming Meshes but we'll see. Interested in mesh and texture request activity including Cread/Cwrite rates as well as Unacked data, UDP packet rate, packet loss rate from the Statistics console.
  • Review viewer cache settings. If too low, viewer will have to go to servers for assets.
  • Set bandwidth limit down a bit (below 1Mbps). A little less drive may result in better completion.
  • Traffic competition 1. Not certain this is the problem but let's play as if it were. Bandwidth control, like the Statistics console, doesn't interact with HTTP. The limiting controls for HTTP are currently indirect and limited to HTTP pipelining and mesh and texture concurrency settings. Using your network monitor, find settings that cap rate at 75-80% of your 18Mbps download rate. Observe effect on responsiveness.
  • Traffic competition 2. Cisco 1841 is a bit old (from the 90s, yes?). So I'm guessing it doesn't have significant QoS/Traffic Shaping controls. If it does or you have other shaping tools available, I'd look at limiting traffic from the CDN to 80% of ADSL limit (maybe with some borrow to 100%). Downstream shaping isn't always perfect (too late to shape) but it's a tool.
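As a quick worked example of the 75-80% target mentioned in the list above, the cap for an 18Mbps line can be computed like this. The helper name is made up for illustration; it only does the arithmetic and does not touch any viewer or router setting.

```python
def cap_range_mbps(line_rate_mbps, low=0.75, high=0.80):
    """Return the (low, high) rate cap in Mbps for a given line rate."""
    return line_rate_mbps * low, line_rate_mbps * high


low, high = cap_range_mbps(18)
print(f"target cap: {low:.1f} to {high:.1f} Mbps")
# prints "target cap: 13.5 to 14.4 Mbps"
```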

 


Hey Monty, thank you for joining in the fun :) and for the information.

Yeah - too much data - Seriously, if LL's kit can send it that fast, it has made me look at upgrading to 80Mbps down, 20Mbps up fibre. I'm currently exploring possibilities on that front re suitable hardware. My 1841 is old (about 9 years - not 90's :D ), but it has IOSv15 advanced IP services so it does *everything*. It's just too slow if I upgrade to fibre. It tops out at about 70,000pps.

The saturation events really seem quite random. I can sometimes provoke them by camming a little away, or they sometimes seem linked to an avatar/vehicle arriving in the area. At others, there is nothing going on at all; just me floating along in a well-visited, quiet area. That log I gave Whirly was such an example. There doesn't seem a definite, reproducible cause on the face of it.

I think the stats panel might be misleading during these saturation events. Maybe. I can well imagine that the UDP packet loss is entirely due to the saturation with TCP traffic. Also, the sim ping and unacked data both go to those sort of figures every time it happens, but are normal either side of the event.

I'll try watching the texture console next time I'm there. I hadn't noticed it had a mesh line. From the Wireshark logs, practically all the requests were for mesh and hardly anything for texture.

The llmath/llvolume.cpp errors (100s of them) were just in that one log and haven't occurred since. Just a localised, coincidental event that one.

Pipelining is fun - I was tracing the conversations in the Wireshark capture.

Right then - Things to Try:

I'll keep an eye on the texture console and stats and grab a few screen shots during and after events.

My cache is already at maximum (10GBish)

I had tried changing the mesh concurrency to just 1, and I still got the saturation. Quite impressive really that I'm getting that much data :) I'm pretty sure I restarted the viewer, but I will try again to be sure and see what can be done there.

My 1841 does all of that. I used to have it configured to try to limit incoming nntp data to my server if other stuff was active too. It never really worked... like you say, downstream shaping is a bit late. I'll give it a try though, after I've played a bit with the above stuff.


  • Lindens


Vulpinus wrote:

Right then - Things to Try:

I'll keep an eye on the texture console and stats and grab a few screen shots during and after events.

My cache is already at maximum (10GBish)

I had tried changing the mesh concurrency to just 1, and I still got the saturation. Quite impressive really that I'm getting that much data :) I'm pretty sure I restarted the viewer, but I will try again to be sure and see what can be done there.

My 1841 does all of that. I used to have it configured to try to limit incoming nntp data to my server if other stuff was active too. It never really worked... like you say, downstream shaping is a bit late. I'll give it a try though, after I've played a bit with the above stuff.

And as a stand-alone test, run with a lower (UDP) bandwidth limit.

The unack'd byte count is bothering me.  If we take the example of 15MB unack'd data, that's about 100 s of network time at 1.5Mbps to sink that data (assuming it's all yours, half if alt session is going to get a share).  A natural inclination would be to turn the knob to '11' but I'd like to go the other way...
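The arithmetic behind that estimate can be sketched as follows. This assumes decimal megabytes and ignores protocol overhead and retransmission, which gives 80 s for 15MB at 1.5Mbps, the same order as the rough figure quoted. The function name and the `own_fraction` parameter are illustrative only.

```python
def drain_time_s(unacked_bytes, link_mbps, own_fraction=1.0):
    """Seconds needed to deliver the unacked data over a link of link_mbps.

    own_fraction is the share of the unacked data destined for this session
    (1.0 if it is all yours, 0.5 if a second session takes half of it).
    """
    bits = unacked_bytes * 8 * own_fraction
    return bits / (link_mbps * 1_000_000)


print(drain_time_s(15_000_000, 1.5))                    # 80.0
print(drain_time_s(15_000_000, 1.5, own_fraction=0.5))  # 40.0
```

Either way, the backlog represents the better part of a minute of UDP traffic at that rate, which is why a lower bandwidth limit is worth testing.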

 


Funny you should mention the Unacked data...

I've just been sailing around Fanci's for thirty minutes. I had UDP BW set to 1Mbps and Mesh and Texture concurrency set to just 1.

I experienced a few, very brief, saturation spikes but mostly my download rate stayed below a few Mbps. I captured one spike that lasted a few seconds and Wireshark again showed a number of mesh gets and a ton of TCP segments.

However... I then moored up and stayed that way for a few minutes. I noticed the Unacked data, which mostly seemed to be around a few 100KB or less, was jumping up and down continually. It did this for at least fifteen seconds and reached into MBs - I'm pretty sure I saw 12MB and 17MB pop up momentarily. Then it settled down.

... Strike the above - it's just started jumping again and gone up to 60MB. About five minutes later and still moored up, typing this.

I have no idea what caused that - I had been sailing around that area for ages and didn't see anything happen to trigger it. Throughout that, my internet data was a trickle - less than 100Kbps.

...

Other than the above spikes and weirdness, stats were 0.0% packet loss, sim ping around 170ms. As I'm sat here moored, typing this, the packet loss green bar is slowly creeping up... just at the right hand end of the scale now but still reading 0.0.

Screenshot of texture console while moored:



To be continued... :D

 ETA: I have only had my alt logged on that once that I mentioned, so everything else is all my own fault.

 

 


No, my traffic shaping rules were disabled ages ago. I didn't see the point of keeping it on, wasting CPU time, when it was minimally effective. It was specific to my server's IP address anyway.

I was checking my 1841's config a few days ago and thought to just check that after you mentioned trying it; I was right.

I even tried putting my SL PC directly on my bridged public subnet for a while to mostly bypass the router... it made no difference.

I've not done any SL investigations for a day or two but I'll be back by the weekend. I'm busy investigating router options and trialing RouterOS on ESXi to duplicate what my Cisco does for when (if) I go fibre. I can't justify the cost of a new Cisco box (1941 or similar) at the moment - or rather I can't afford their new G2 licensing on the security version of it!


Right then... I've just ordered 80/20 fibre and arranged some faster routing. Should be live within a couple of weeks so, unless anyone sees anything obviously wrong in the logs and wants me to check, I'll leave off the testing for now.

 

80Mbps... fill that SL! :D


jejejje (:

i want what Vulpinus has got

you can keep your journeys Mr Monty ok (:

+

ps

altho I did get a good journey off you about 18 months ago now. And a gold cup even

Apparently Mr Phil Deakins is attempting to have another go at the same journey. And he is not having much luck at the moment. So maybe he will come by soon and tell you all about it (: 


  • Lindens

Vulpinus wrote:

It still could be... how fast are the CDN servers?

We'll soon find out. :D

 Positioned 'close' to the asset servers and with the viewer code out of the equation, GigE is possible without much of a sweat.  But you have to belieeeeeeeeeeeeeve!


  • 4 weeks later...

I thought I should put this post to rest, now I have my new 80/20 fibre. Not as fast as some have, but it's pretty good! It's FTTC and I actually get about 75/18. That might get better as things settle.

So far, I have not had the issue this thread has been about since upgrading. Yay :D

I have seen some very quick data coming from the content server connected to my ISP's network; easily sustaining 40Mb/s to 50Mb/s while a new place loads up, with occasional peaks at close to maximum. It really does make a huge difference when TPing somewhere busy. I once started another thread asking if extra internet bandwidth would be a benefit and never really got an answer (it's too 'personal' really - suck it and see is the only way). I have the answer now.

There is another thing I have discovered that might have been partially responsible for my previous issues. Bufferbloat.

My new kit seems to do very well, but when saturated my ping (not sim ping) rises from 16 ms to 100 ms average at times. That's actually not bad as bufferbloat goes, but I'm still trying some queuing tricks on my router to see if I can lower it. I'm not sure how much is caused by my end though, and how much at my ISP's end.

My guess is that my old kit, when the line was saturated up and down, was suffering much worse with bufferbloat too which would go towards explaining my problems.

So... problem solved and the local CDN server is impressive (thank you Monty :) )

...

Just for anyone curious, my new kit comprises:

  • Zyxel VMG8324 modem/router, in modem-only/bridged mode. The chipset suits the FTTC cabinet I'm on.
  • That feeds a Mikrotik RB850Gx2 router which splits out onto my various VLANs here and does firewalling etc. It's a very nice bit of kit for the price!
  • Oh, and a new TP-Link T1600G-52TS Layer 2+ switch which replaced my perfectly functional but too noisy and hot Cisco 3560G. The TP-Link is fanless - a wonder (and I think unique) for a 52-port gig switch with basic L3 IPv4 and IPv6 routing!

  • Lindens


Vulpinus wrote:

There is another thing I have discovered that might have been partially responsible for my previous issues. Bufferbloat.

My new kit seems to do very well, but when saturated my ping (not sim ping) rises from 16 ms to 100 ms average at times. That's actually not bad as bufferbloat goes, but I'm still trying some queuing tricks on my router to see if I can lower it. I'm not sure how much is caused by my end though, and how much at my ISP's end.


Sorry I couldn't convince you to be a lab rat.  But you have to pick your projects...

The 16 vs 100 ms ping time might be something interesting to look at.  Get a baseline traceroute (with perhaps more than the usual 3 probes per hop) at 16 ms.  When it gets bad, do another and compare to see where the deltas appear.  It may not be at your ISP and I'd love to see some of that data here.
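One way to do that comparison, sketched in Python. It assumes the per-hop round-trip times have already been copied out of the two traceroute runs into dictionaries by hand. The function name and the sample numbers are made up for illustration and are not data from this thread.

```python
from statistics import median


def hop_deltas(baseline, loaded):
    """Compare two traceroute runs given as {hop_number: [rtt_ms, ...]}.

    Returns a list of (hop, delta, step) where delta is the increase in
    median RTT at that hop and step is how much of the increase first
    appears at that hop compared with the previous one.
    """
    rows = []
    prev_delta = 0.0
    for hop in sorted(baseline.keys() & loaded.keys()):
        delta = median(loaded[hop]) - median(baseline[hop])
        rows.append((hop, delta, delta - prev_delta))
        prev_delta = delta
    return rows


# Made-up sample data: five probes per hop, RTT in ms.
baseline = {1: [1, 1, 2, 1, 1], 2: [9, 10, 9, 11, 10], 3: [16, 15, 16, 17, 16]}
loaded = {1: [1, 2, 1, 1, 1], 2: [80, 95, 90, 85, 100], 3: [98, 100, 104, 99, 101]}

for hop, delta, step in hop_deltas(baseline, loaded):
    print(f"hop {hop}: +{delta:.0f} ms total, +{step:.0f} ms added at this hop")
```

The hop with the largest step is the first place to look for the queue that is filling up. Using the median over several probes keeps one stray slow reply from dominating the comparison.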

Bufferbloat is a real thing but one that we unintentionally mitigated.  For various reasons, both textures and meshes are fetched in pieces using 'Range:' headers.  This means relatively little data comes back in each request reducing opportunities to buffer.  The balance was then shifted a bit towards bloat by first upping the connection concurrency and then introducing pipelining (and dropping concurrency), but the 'Range:' stuff persists.
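As a small illustration of fetching an asset in pieces, the sketch below generates the `Range:` header values needed to cover an asset of a given size. The `bytes=start-end` syntax is standard HTTP, but the function name and the sizes used here are hypothetical; the chunk sizes the viewer actually requests are not given in this thread.

```python
def range_headers(total_size, chunk_size):
    """Yield 'Range' header values covering total_size bytes in pieces
    of at most chunk_size bytes."""
    for start in range(0, total_size, chunk_size):
        # HTTP byte ranges are inclusive at both ends.
        end = min(start + chunk_size, total_size) - 1
        yield f"bytes={start}-{end}"


print(list(range_headers(10_000, 4096)))
# ['bytes=0-4095', 'bytes=4096-8191', 'bytes=8192-9999']
```

Each response is then limited to one piece, so any single request can only put a bounded amount of data into the buffers along the path.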

Where it really comes into play (I think, this is all conjecture) is throughout the network.  Your non-bloaty traffic is in competition with buffered traffic from others and every hop has to make the same decision on every packet:  queue, buffer, or drop.  When the pipe to your door is narrow, drops happen at its inlet.  As you improve that, they start happening elsewhere:  undersea cable sites, backbone-isp gateways, other places where there's an impedance mismatch.

Glad it's better and hope you do some tweaking. 

 

