Carol Darkthief

"Ping Sim" in viewer statistics window


What does the "ping sim" number in the viewer's statistics window actually measure?

There's a network utility called "ping", which uses special packets. Quoting Wikipedia, "Ping operates by sending Internet Control Message Protocol (ICMP) echo request packets to the target host and waiting for an ICMP response. In the process it measures the time from transmission to reception (round-trip time) and records any packet loss."

If the viewer is doing this, I'm not sure it would be reliable, and I don't see much point in sending an extra stream of packets to the server. I don't think an ICMP message would even be seen by the sim server.

The other obvious way of doing this is to time the reply from the sim server to a request for data. This doesn't need any extra packet, and can directly measure the server response. This could be affected by the change in the delivery of data, first with dedicated servers supplying textures, and now with CDN. I once saw Ping Sim increase to over 7000ms, with no reported packet loss, when on a Snack RC sim, which runs CDN.

But I could be missing the packet loss. The Packet Loss indicator in the Stats Window may be failing to respond to sudden, short-lived, packet loss. I see the 0% occasionally flicker, but it doesn't hold long enough to read.

It needs somebody who has read and can understand the code to answer this. I have a feeling that there is something lurking in how the numbers are displayed which has probably come from the Linden Code, making what we see unreliable. Do I have rock solid 0% packet loss? I suspect not. Was that 7000ms ping sim a display glitch?

With the viewer labelling the number "ping sim" I am inclined to doubt that the measurement is done with ICMP, but what is it measuring? The SL Wiki is rather vague on this.

And if all you can do is point me at that Wikipedia article on Ping (networking utility) why do you think that answers my question? (A clue: I have just quoted that page and given you the link.)

The "ping sim" is giving me about 30ms higher RTT than using a ping, from the command line, to the same domain name, and the ping times to Phoenix have been similar for a really long time. (Well, Flying Buffalo is in Scottsdale, which is next door.) Either that extra 30ms is something wrong, or it measures something different.

 


And just what is "Ping User", because that's the only time that phrase is used on the page. Mostly, it seems to be something totally different on Facebook when you try to find it elsewhere on the net, an "are you there" of social networking apps.

I have found a reference to it in a book about SL published in 2007, which implies it's another number displayed in the Stats Window. And it ain't there any more. And, I'll be honest, I can't see how you can measure just the one-way time instead of the RTT. It makes me wonder, a little, if you know what you're talking about. But let me know if you can find "Ping User". I might have missed it.


A Linden stated at the last TPV viewer meeting (which is up on YouTube) that once CDN is gridwide, ping sim could become largely irrelevant, so it might not be worth spending too much time trying to understand it. If you click help > about on your viewer menu while you're in world, you'll see your current packet loss there.



Carol Darkthief wrote:

And just what is "Ping User", because that's the only time that phrase is used on the page. Mostly, it seems to be something totally different on Facebook when you try to find it elsewhere on the net, an "are you there" of social networking apps.

I have found a reference to it in a book about SL published in 2007, which implies it's another number displayed in the Stats Window. And it ain't there any more. And, I'll be honest, I can't see how you can measure just the one-way time instead of the RTT. It makes me wonder, a little, if you know what you're talking about. But let me know if you can find "Ping User". I might have missed it.

They were separate statistics as of late 2005:

http://forums-archive.secondlife.com/111/5e/76871/1.html

Note there isn't a real explanation of the difference. However, the viewer and server are in continuous communication to update your avatar's position in-world - messages are sent and acknowledged constantly. There's no need to send a separate "ping". It would be possible by using timestamps to determine when each update is requested by the viewer, received by the server, and then sent back to the viewer. This would allow both legs of the round trip to be timed separately. You can turn network messages on in the debug console to see this; it's normally turned off because it's extremely spammy.


You're more-or-less summarising what I suspect about in-viewer Ping Sim.

But does anyone know? And why is the LL documentation on this so useless?

I'm trying to get my brain around RFC 1323, which defines TCP timestamps. It's not defined in a way which requires the clocks at each end to be in sync or ticking at the same rate, so I am not sure how anyone thought they could give a separate Ping Sim and Ping User instead of total RTT.
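To make the clock problem concrete, here is a minimal sketch of the usual four-timestamp exchange. It is not taken from the viewer or from any Linden protocol; the `Exchange` class, the function names and the `simulate` helper are all made up for illustration. It shows that the round-trip time survives an unknown clock offset, while the two "one-way" figures do not.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """Four timestamps for one request/reply, in milliseconds.

    t0 and t3 are read from the viewer's clock, t1 and t2 from the server's.
    """
    t0: float  # viewer sends request
    t1: float  # server receives request
    t2: float  # server sends reply
    t3: float  # viewer receives reply


def round_trip_ms(x):
    # Each bracket uses only one machine's clock, so a fixed offset cancels.
    return (x.t3 - x.t0) - (x.t2 - x.t1)


def apparent_legs_ms(x):
    # Each leg mixes the two clocks, so the offset leaks into both values.
    return (x.t1 - x.t0, x.t3 - x.t2)


def simulate(out_ms, back_ms, server_ms, offset_ms):
    """Hypothetical exchange where the server clock runs offset_ms ahead."""
    t0 = 1000.0
    t1 = t0 + out_ms + offset_ms
    t2 = t1 + server_ms
    t3 = t0 + out_ms + server_ms + back_ms
    return Exchange(t0, t1, t2, t3)


in_sync = simulate(out_ms=40, back_ms=60, server_ms=5, offset_ms=0)
skewed = simulate(out_ms=40, back_ms=60, server_ms=5, offset_ms=500)
print(round_trip_ms(in_sync), apparent_legs_ms(in_sync))
print(round_trip_ms(skewed), apparent_legs_ms(skewed))
```

With the clocks in sync the legs come out as 40 and 60; with the server half a second ahead they come out as 540 and -440, yet the round trip is 100 in both cases. So timing the legs separately only works if something else keeps the two clocks aligned.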

There are hints that it could get horribly complicated, with various possible methods of averaging RTT for controlling congestion. Speculating more wildly, with the TCP data for an SL connection coming from multiple servers, and now with a possible different route to the CDN server, the UDP/TCP combination could bite. But that's a MEGO situation for me.


"Ping User" showed the ping time to the user server, which tracked online avatars for IMs and so on. it was one of those pieces of SL that did not scale so well, and it was replaced by different systems.

the idea behind that suggestion on the wiki was to compare ping time between the current simulator and some other server on the same network, to help guess if the network was broken or if the problem was local to that simulator.

with SL now hosted across a few data centers, comparing those two pings to diagnose simulator problems would have become dubious, even if that stat was still there.


and yes, your doubts about what those pings even mean are valid too. they are not the same kind of ping you would get from a network utility, it's a measure of how long it takes from when a viewer sends out a message, to when the viewer receives a matching acknowledgement back from the simulator. so, it is really a measure of viewer delay plus network delay plus simulator delay. this number can be useful to see if something is wrong, but usually you will have to dig deeper to find out what the problem is.



ObviousAltIsObvious wrote:

and yes, your doubts about what those pings even mean are valid too. they are not the same kind of ping you would get from a network utility, it's a measure of how long it takes from when a viewer sends out a message, to when the viewer receives a matching acknowledgement back from the simulator. so, it is really a measure of viewer delay plus network delay plus simulator delay. this number can be useful to see if something is wrong, but usually you will have to dig deeper to find out what the problem is.

There have also been a few recent discussions about how modems/routers are affecting this.

The measurements are taken when the data reaches your computer.  ISPs have been doing things with the modems to control and shape traffic, and the result has been that if you have a lot of incoming data all at once it gets queued by your modem, resulting in a slowdown.


"a few data centers"?

The last I heard was that the servers were only in Phoenix AZ now. That doesn't exclude more than one data centre, but I've not pinned down any direct-from-Linden-Lab source for that, and it would have been easy enough to post to a blog or official webpage of some sort. It was allegedly said by a Linden at a TPV Developer meeting, but I don't even know which one.

(I sometimes wonder whether the UDP bandwidth setting has any significance with so much now being HTTP. Maybe it could be set lower than the past recommendations? All I've found on that hasn't changed in the last couple of years, while Monty Linden has been hustling away at HTTP for textures and mesh.)

I can see what you're getting at about checking different servers, but why don't they say that? It's a plausible guess, but it also looks like users trying to find a current explanation for some obsolete documentation.

 


I know about the ADSL modem buffering problem, and how the only fix may be an expensive "office" quality black box which can handle a lot of connections. I did check with my ISP over the QOS settings they use for Second Life. They say they don't do any fiddling with my modem and, frankly, I am sceptical about that detail. You seem to be describing the buffering problem and then blaming it on ISP doing something.

It's not really my choice at the moment, but I did have a look at a couple of sales sites, and I can foresee having to deal with salesmen over that side of things. Not fun. Some of them haven't even heard of IPv6 yet.



Carol Darkthief wrote:

I know about the ADSL modem buffering problem, and how the only fix may be an expensive "office" quality black box which can handle a lot of connections. I did check with my ISP over the QOS settings they use for Second Life. They say they don't do any fiddling with my modem and, frankly, I am sceptical about that detail. You seem to be describing the buffering problem and then blaming it on ISP doing something.

It's not really my choice at the moment, but I did have a look at a couple of sales sites, and I can foresee having to deal with salesmen over that side of things. Not fun. Some of them haven't even heard of IPv6 yet.

I'm really just stating the case.

I apologise that I don't have links to the discussion(s) on the modem problems handy.

But as far as ISP problems go, unfortunately the girl in this Thread had to hide the JIRA, because of some of the info she had to give LL so they could diagnose the problem; otherwise you could see more details.  But as she quoted Monty Linden there:

"You have bad DNS information for our bake service. Instead of talking to Linden servers, the viewer and your browser have been trying to connect back to your own computer. If 'xxxxxx.pt' is your ISP, the problem may be there. Show them the nslookup command and output and they should understand the problem."

And again: "There are some odd cases that arise when talking to the baking service which is why I hopped on this. They haven't all been resolved. But ISP problems are still far more common and, honestly, probably becoming more common."
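The specific failure Monty describes, a service name resolving back to your own computer, is easy to test for once you have the addresses. Below is a minimal sketch using only the Python standard library. The function names are made up for illustration, and the hostname in the usage note is a placeholder, not the real bake service name, which is not given here.

```python
import ipaddress
import socket


def points_back_home(address):
    """True for loopback (127.0.0.0/8, ::1) or unspecified (0.0.0.0, ::) addresses."""
    ip = ipaddress.ip_address(address)
    return ip.is_loopback or ip.is_unspecified


def resolve(hostname):
    """Addresses the system resolver returns for hostname (needs network access)."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def bad_answers(addresses):
    """Subset of addresses that would make a client talk to itself."""
    return [a for a in addresses if points_back_home(a)]
```

Calling `bad_answers(resolve("example.com"))` with the real service name in place of `example.com` should give an empty list on a healthy connection. A non-empty list is the symptom in the quote, and the `nslookup` output is still what the ISP will want to see.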

 

 


And the DNS problem did get fixed by using a different DNS server. That still doesn't sound like the ISP messing with the modem the customer is using. There's a chain of DNS servers between the ISP's DNS server and whatever server is "authoritative" for a domain. And I've known that chain get a "bad" mapping between domain name and IP address when something has changed at the authoritative server. It takes time for the change to propagate. Maybe an ISP is running a DNS server with too long a TTL on the DNS entries. A quick check on the Wikipedia entry on DNS shows a link to a report on Slashdot but it was ten years ago.

That old claim does seem to linger in Google searches, and it would explain some things. But, unless Linden Lab are changing IP addresses very frequently it wouldn't explain anything I see. That sort of thing would cause a lengthy total fail, and I've never seen that for Second Life. It is something I have seen in the past for other services. And the fix of trying the Google or OpenDNS servers makes a very obvious difference if the ISP's DNS server were doing something weird.

It's not the place for solutions, but I do know what the scary devil monastery is. And sometimes Second Life and Linden Lab do seem to need a LART.


First of all we are all in the same boat here. None of us knows the exact answer to your questions, so there is no point in getting grumpy about it.

Secondly, you are probably right about the documentation. The statistics bar was never completely documented and the documentation that there is is out of date by several years. There have been changes between then and now in the way it works and there have been some recent changes that may well have broken or changed the meaning of what is displayed.

An example of this is bandwidth. The description in the wiki may have been true at some point in time but ever since there was any HTTP usage it was wrong because that graph only displays UDP usage. (it has since been renamed UDP Data Received).

Thirdly, I believe that Linden Lab consider the statistics bar to be a part of the development interface and don't feel obliged to document it.

There is no one in the forums that can do anything about this. You could open a jira but the jira is triaged on the basis of what will bring the most advantage or benefit to the most users so this is not going to rank high.

Having said all this I still find the statistics bar useful, just take it with a pinch of salt.

 

ETA: The 'ping user' may be a reference to: The ping user EXEC command can be used to diagnose basic network connectivity. More info here: http://psut.jo/sites/asabha/new_page_2.htm

 

 


In the past year they have been using Chandler AZ and Dallas. Virginia appears to have been dropped. The user server, on the other hand, was located in San Francisco before it was retired.

I can see what you're getting at about checking different servers, but why don't they say that? It's a plausible guess, but it also looks like users trying to find a current explanation for some obsolete documentation.

That bit of documentation was old and informal, predating that wiki page. It did originate in informal checks used by LL and users alike, and probably shouldn't have been in there, since it was obsolete by the time that version of the documentation was produced. The meaning of the old "Ping User" stat is not something that needs to be guessed about, you can see how mUserServerPingStat was populated in old viewer sources.


Looking at the code (I had not had occasion to look at this before), the Ping Sim measure appears to be based on a separate Ping message (our own message type transmitted over UDP, not ICMP). Those messages are mixed in periodically with the other UDP messages that are more or less constantly flowing between the viewer and the simulator. Because it's the application that is turning that message around rather than a low level part of the network stack, the fact that it is consistently higher than ICMP ping to the same host isn't surprising. 
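As a rough illustration of an application-level ping of this kind, here is a minimal sketch that runs entirely on the local machine. It is not the viewer's code: the one-byte ping id, the echo thread standing in for the simulator, and both function names are assumptions made for the example.

```python
import socket
import struct
import threading
import time


def start_echo_server():
    """Hypothetical stand-in for the simulator: echoes each datagram back."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port

    def serve():
        while True:
            try:
                data, peer = server.recvfrom(64)
            except OSError:
                return  # socket was closed
            server.sendto(data, peer)

    threading.Thread(target=serve, daemon=True).start()
    return server


def app_ping_ms(address, ping_id, timeout_s=1.0):
    """Send one ping datagram and time the matching reply, or return None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout_s)
        started = time.perf_counter()
        client.sendto(struct.pack("!B", ping_id), address)
        try:
            while True:
                data, _ = client.recvfrom(64)
                if data and data[0] == ping_id:  # ignore unrelated datagrams
                    return (time.perf_counter() - started) * 1000.0
        except socket.timeout:
            return None  # no matching reply in time: counts as a lost ping


server = start_echo_server()
print(app_ping_ms(server.getsockname(), ping_id=7))
```

The timer here covers the sender's own code, the network, and however long the echo thread takes to get scheduled and reply. That is the same reason an application-level figure sits above an ICMP ping, which is answered lower down in the network stack.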

It's hard to guess what that 7 second display meant... it's possible that the simulator was temporarily not responding for a few seconds but eventually caught up, but without much more data that's just speculation.

When we first set up the Snack channel, we deliberately used regions that were getting a lot of avatar activity so that we would get a better sample of the resulting texture and mesh fetching activity and how it changed the load on the simulators. Due to some unforeseen aspects of how we did it, we ended up making some simulator hosts handle quite a bit more activity than is typical; in a very few cases, that resulted in noticeable problems for users (we moved some regions out of the test as a result). You may have seen one of those episodes.

The tests of the same CDN change on the BlueSteel RC at normal loads have not shown any problems like those, so we are confident that they were just a result of the unusual load pattern.

As others have commented here, both the Ping Sim and the packet loss numbers are based only on the UDP traffic; they don't reflect what may be happening on the HTTP/TCP connections, since any packet loss on those is detected and corrected automatically by the network stack.

One of the reasons that it can be bad to set your Bandwidth setting very high is that it allows the amount of UDP to increase too much; since UDP does not have any other form of congestion control, too much of it can cause loss of TCP packets, resulting in much reduced performance on those connections.
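To give a feel for what a higher cap permits, here is a generic token-bucket sketch. It is only an illustration of a rate cap in general, not the throttle the viewer or simulator actually uses; the class, the 1000-byte packet size and the 1 ms tick are all assumptions, and kbps is taken to mean 1000 bits per second.

```python
class TokenBucket:
    """Illustrative rate cap; not the viewer's or simulator's real throttle."""

    def __init__(self, rate_kbps, burst_bytes):
        self.bytes_per_second = rate_kbps * 1000 / 8
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)

    def tick_ms(self, milliseconds):
        """Refill for elapsed time, never beyond the burst capacity."""
        refill = milliseconds * self.bytes_per_second / 1000
        self.tokens = min(self.capacity, self.tokens + refill)

    def try_send(self, packet_bytes):
        """Spend tokens if the packet fits the budget; otherwise hold it back."""
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


def packets_in_one_second(rate_kbps, packet_bytes=1000):
    """Count packets released in one second when the sender always has more."""
    bucket = TokenBucket(rate_kbps, burst_bytes=10 * packet_bytes)
    bucket.tokens = 0.0  # start empty so only the refill rate matters
    sent = 0
    for _ in range(1000):  # 1000 ticks of 1 ms
        bucket.tick_ms(1)
        while bucket.try_send(packet_bytes):
            sent += 1
    return sent


print(packets_in_one_second(1500), packets_in_one_second(10000))
```

Under these assumptions a 1500 kbps cap releases 187 packets in the second and a 10000 kbps cap releases 1250. The cap itself does nothing to check whether the link can carry that much alongside the TCP traffic.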


We moved out of the facilities we had been using in Dallas; it's all in Phoenix now. Well, we call it Phoenix... I assume it's in that general area somewhere - I don't need to know, and usually you won't either.

With the use of the CDN, your network distance from the datacenter will make much less difference than it did before.



Oz Linden wrote:

<snip>

One of the reasons that it can be bad to set your Bandwidth setting very high is that it allows the amount of UDP to increase too much; since UDP does not have any other form of congestion control, too much of it can cause loss of TCP packets, resulting in much reduced performance on those connections.

First, many thanks for the clarifications and explanations.

But this comment really begs this question:

In This Thread we have someone raving about how good the new HTTP Viewer is.  But then they state:

"Set the draw distance to 512 meter and increase your maximum bandwith to 10000 kbps (normally you should not set it higher than 1500 kbps, but now you can)."

Why are we even allowed to do this if it is bad or is there some type of hard cap in the Viewer that we are not seeing?

The whole Bandwidth issue has been confusing for many, many users.  About as bad as the MeshMaxConcurrentRequests in Debug that still comes up.


Thanks, Oz. That clears things up a lot. Knowing that Ping Sim is UDP to the region server does eliminate a lot of my guesswork about the high Ping Sim readings I see.

May I suggest a revised wording for the description in http://wiki.secondlife.com/wiki/Viewerhelp:Statistics

Ping Sim: How long it takes data to go from your computer to the region sim server you're currently using. This does not reflect texture and mesh loading through the CDN service but measures UDP communication to the sim server itself and is affected both by the internet connection and by the load on the server hardware. It will normally be higher than an internet ping or traceroute time to the same address.

I have seen claims, some rather old, that combinations of UDP and TCP traffic on the same link can lead to problems. It doesn't matter where the CDN server is, it's all going to converge on my broadband connection, but the telco engineer vans are exhibiting breeding behaviour in my part of the world. Maybe something will suddenly improve.


It's probably a coincidence, but I'm seeing a better connection now. There are still bursts of high Ping Sim, but they are much smaller than they were. They may be related to events such as the arrival of a new Avatar in the region, but I wouldn't lay a bet on that.

I am seeing some of the same when I teleport, and maybe a very short burst of packet loss. 



Perrie Juran wrote:


Oz Linden wrote:

<snip>

One of the reasons that it can be bad to set your Bandwidth setting very high is that it allows the amount of UDP to increase too much; since UDP does not have any other form of congestion control, too much of it can cause loss of TCP packets, resulting in much reduced performance on those connections.

First, many thanks for the clarifications and explanations.

But this comment really begs this question:

In This Thread we have someone raving about how good the new HTTP Viewer is.  But then they state:

"Set the draw distance to 512 meter and increase your maximum bandwith to 10000 kbps (normally you should not set it higher than 1500 kbps, but now you can)."

Why are we even allowed to do this if it is bad or is there some type of hard cap in the Viewer that we are not seeing?

The whole Bandwidth issue has been confusing for many, many users.  About as bad as the MeshMaxConcurrentRequests in Debug that still comes up.

Well, I did get my answer to my question in the Thread I linked.

So for anyone reading, this 10,000 kbps setting is a recommended "test set up" for HTTP Viewer.  LINK.  So it is specific to this Viewer.


If you look back at that link the text above the line

"This service is live on Aditi now, in the “DRTSIM-258” channel "

was edited when the Snack channel went live. The text below refers to the Aditi test, which was live for a week or two before the Snack channel was set up. The recommended settings are for the Aditi test, not CDN in general.

If you read that page but ignore all information about "agni" and "snack", it will read more or less how it did when the CDN test went live on Aditi. There wasn't a public version of the HTTP viewer at that time.

The test was for any viewer and the 10,000 kbps advice was purely for the Aditi test.



Perrie Juran wrote:

So for anyone reading, this 10,000 kbps setting is a recommended "test set up" for HTTP Viewer.  So it is specific to this Viewer.

 

Hmm, that was probably not a good setting suggestion.  There is an outstanding Jira on values over 3Mbps.  Keeping it at or below 1.5Mbps is probably most reliable right now.
