Andrew Linden

Retired Linden
Everything posted by Andrew Linden

  1. Nalates is correct. The way I would put it is this: the viewer-measured ping time includes the time it takes for the viewer to get around to processing the network packets. Although a packet may have already arrived at the ethernet port and be sitting in the queue, its "arrival time" is measured when it is actually processed, which is done in the same thread as the render work.
  2. I just finished coding up a fix for this bug, and it should work for all ghosted region boundary objects. It will have to get tested and bundled into a server update so the earliest it could hit RC would be around 2013.10.09.
  3. I have a fix for this but it didn't make it in time for this week's update (for unrelated technical problems). It is currently deployed to the DRTSIM-208 channel on the beta grid (aditi) which includes the region "Gibson" if anyone wants to check it out. Antony Fairport, who first reported this problem in a jira issue, has tested it and has verified that it works.
  4. This is the first I've heard of an "invisible avatar" problem. We'll need some details to investigate, such as the region name, the time of the event, who couldn't see whom, and where they were standing when it happened. Ideally this info would be put into a jira issue where the investigation could be tracked (rather than putting it in this thread). BTW, it occurs to me that there is a feature where a parcel owner can specify that people outside the parcel cannot see people inside, and vice versa. I wonder if one of the parcels in the club is misconfigured with this option. One way to test this: when anyone driving a vehicle wanders into such a parcel the whole vehicle should disappear.
  5. Turns out my theoretical fix did not work, but I did get some clues today that I hope will help me figure it out.
  6. Yes, the problem only affects people with certain internet connections, in particular connections that suffer UDP packet loss at relatively low bandwidth settings. Doing a normal HTTP download speed test on such a connection may indicate a higher bandwidth than is supported for UDP traffic -- TCP traffic may be treated differently than UDP. The workaround is to lower your SL viewer bandwidth settings in the preferences. The one resident who tried this was able to make the scripts load at 350kbps or lower. This will not necessarily degrade your SL experience as much as you might expect. The bandwidth setting only applies to UDP traffic -- HTTP traffic (which includes texture downloads) automatically adjusts itself to what your connection can handle and is not regulated by the setting. I've got a theoretical fix for the problem but am having trouble reproducing a UDP restricted environment to test it. I've temporarily deployed the experimental server to the Gibson region on the beta grid. If anyone would like to test it there I will help by getting your account imported to the beta grid.
  7. Yes Ciaran, that sounds like the same thing. To really see it in action you would crank your viewer bandwidth settings down to a few hundred kbps, then turn away from a big crowd of Dwarfins, wait several seconds for the Dwarfins to move around, then turn around very quickly so they are all in view. The low bandwidth settings will cause a significant amount of time to pass before all of the linked pieces get updated correctly.
  8. I logged into Ruiz/159/80/1502 as per the instructions in BUG-1779 and finally noticed what Levio is talking about. For me it was rather hard to see because most of the meeroos would snap together pretty fast, but I dropped my viewer bandwidth settings down to 300kbps (from 1Mbps) and could reproduce it much more easily. Each visible part of a meeroo is a child-prim. The root is an invisible prim that typically doesn't actually move when the meeroo sits down or stands up -- each child prim moves to a new local transform to produce the effect of motion to a new stance. The problem is that each prim is being scheduled independently of the others so the updates arrive out of order, and may be incorrectly sorted so that some updates for one meeroo will arrive much later than others for the same animal. Hrm... I will ponder how to fix this.
  9. @Pierre - No, reducing your cache size will not help. That setting just controls how much hard drive space you are willing to devote to the cache, and in this context includes both the texture cache and the object cache which are stored in separate formats and different locations. It doesn't control how big each per-region object cache file will be, just how big of a filing cabinet you have for those files. A workaround was mentioned earlier in this thread. Something about "flipping alpha masks and atmospheric shaders settings". If that works then it seems like the best workaround that I know of. Another theoretical workaround (assuming that this really is a bug in the viewer cache retrieval system) would be to manually delete the object cache files before login. By deleting these files out from under the viewer you will force the server to re-send the cacheable data without clearing your texture cache, which is the real bandwidth saver. If the objects still don't show up right then the viewer bug isn't in the cache retrieval system, but is in the render code itself where it figures out what should be in the render pipeline. The object cache files live in different places depending on your operating system. For Windows XP I happen to know that they live in C:\Users\$USER\AppData\Local\SecondLife\objectcache\ and have names of the form: objects_1001_1200.slc.
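The manual workaround in the post above (deleting only the object cache files, not the texture cache) could be scripted roughly as follows. This is a minimal sketch, not an official tool: the function name `delete_object_cache` is made up for illustration, the `objects_*.slc` pattern is taken from the example file name in the post, and the cache directory has to be supplied by the caller because it differs per operating system.

```python
from pathlib import Path


def delete_object_cache(cache_dir):
    """Delete per-region object cache files and return the names removed.

    Only files matching objects_*.slc (the naming form quoted in the post)
    are touched; anything else in the directory is left alone.
    """
    removed = []
    for path in sorted(Path(cache_dir).glob("objects_*.slc")):
        path.unlink()
        removed.append(path.name)
    return removed


# Example call (hypothetical path; run it only while the viewer is closed):
# delete_object_cache(r"C:\Users\me\AppData\Local\SecondLife\objectcache")
```

As in the post, this should be done before login so the files are not removed out from under a running viewer.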
  10. "But why is the client suddenly having a problem it didn't have two months ago? " What has changed is that the server is now treating much more content as "cacheable". The server's definition of cacheable used to be something like: "is static and does not have a script". The new definition of cacheable is: "has not changed position or appearance in the last couple minutes". The viewer bug must exist inside the code that retrieves object data from cache and is more noticeable now because the viewer-side object cache tends to be bigger. I've brought the problem to the attention of one of the developers who is working on some viewer-side changes that will complement the next server-side interestlist changes (*), so we hope to figure it out soon. (*) The server will soon support two hints from the viewer: (1) "I don't have an object cache file for this region at all" --> the server will then be able to bypass some of the initial "cache probe" messages of the protocol, which currently looks something like this: Viewer: Hello, I would like to connect. Server: You are connected. Server: Do you have cache for object 123 whose version is 456? Viewer: No, I do not have cache for object 123. Server: Here is the data for object 123. Under the new system, when visiting a region for the first time the conversation will look more like this: Viewer: Hello, I would like to connect, and BTW I can tell you right now that I have no cache. Server: You are connected. Server: Here is the data for object 123. (2) "I'm willing to cache ALL object data in the region, including stuff that is too far away or too small to see" --> the server will eventually stream all cacheable data to the viewer, including sky boxes far above.
It sends the non-visible data with a delay so that if you're flying through the region on a jetpack and leave soon after arriving you won't get the extra data, and once it starts sending the non-visible objects it will be in a lazy fashion -- lower bandwidth than when sending what is in front of you.
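The two conversations in the post above can be mimicked with a toy model. This is only a sketch of the message flow as described there; the function `connect`, its parameters, and the message strings are invented for illustration and do not correspond to real protocol message names.

```python
def connect(region_objects, viewer_cache, no_cache_hint=False):
    """Return the messages exchanged while the server streams cacheable objects.

    region_objects: dict mapping object id -> current version on the server
    viewer_cache:   dict mapping object id -> version held in the viewer cache
    no_cache_hint:  True if the viewer announces up front that it has no cache
    """
    log = []
    log.append("Viewer: connect (no cache)" if no_cache_hint else "Viewer: connect")
    log.append("Server: connected")
    for obj_id, version in region_objects.items():
        if not no_cache_hint:
            # Old flow: probe the viewer cache before sending any data.
            log.append(f"Server: probe {obj_id} v{version}")
            if viewer_cache.get(obj_id) == version:
                log.append(f"Viewer: hit {obj_id}")
                continue
            log.append(f"Viewer: miss {obj_id}")
        log.append(f"Server: data {obj_id}")
    return log


old_flow = connect({123: 456}, viewer_cache={})
new_flow = connect({123: 456}, viewer_cache={}, no_cache_hint=True)
print(len(old_flow), len(new_flow))  # five messages versus three
```

For a first visit the hint saves one probe and one reply per object, which is the round trip the post says the server will be able to skip.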
  11. I can explain why the avatar bounces when you raise the terrain under its feet: The terrain info is stored in two formats: (1) a 2D array of floating point heights (a "heightfield") that is used for visualization and (2) a big list of triangles (a "mesh") which is used for collision purposes. Whenever you modify the terrain you are actually changing the heightfield data (1), which then must be translated into mesh form (2) before your avatar can collide with it. The conversion to mesh is computationally expensive and can take a significant amount of time to complete. To reduce lag caused by the mesh calculation the simulator employs two strategies: (a) it waits several seconds after the last terrain modification just in case the person editing it is going to make another change (no need to compute the mesh if it is about to change again) and (b) it performs the mesh calculation on a separate thread, which allows the simulator to continue moving things along in the meantime. The mesh format was added to help optimize the physics engine. It is faster than a heightfield shape for collisions in general and most ray-trace events, however the benefits of going to mesh did not outweigh the work required to get it done... until we added the pathfinding characters. Computing paths over a mesh shape performs much, much better than doing the same over a heightfield collision shape, hence the work was done. So, when you make a change to the terrain it can take several seconds for the collision shape to update, but why does the avatar bounce? There is special code in the simulator that checks to see if your avatar is under the terrain. If it finds this to be true then it will lift your avatar to the surface. Unfortunately the code that performs the underground check is still using the heightfield data rather than the mesh data.
The way to fix it would probably be to perform a ray-trace up against the terrain mesh from the avatar's feet and then if the ray hits the terrain we'd know the avatar's feet were actually below the mesh. In the meantime... don't stand where you're raising the terrain.
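One way to picture the bounce described above is a tiny one-dimensional simulation. This is an interpretation of the post, not simulator code: it assumes the avatar falls until it rests on the (stale) collision mesh, and that the underground check snaps it up to the (already updated) heightfield. The function name and the step size are made up.

```python
def simulate_bounce(heightfield_z, mesh_z, steps, fall_per_step=0.5):
    """Return the avatar height after each step.

    heightfield_z: terrain height used by the underground check (updated at once)
    mesh_z:        terrain height used for collisions (may lag by several seconds)
    """
    z = mesh_z  # avatar starts standing on the collision mesh
    trace = []
    for _ in range(steps):
        if z < heightfield_z:
            # Underground check: feet are below the heightfield, lift to the surface.
            z = heightfield_z
        else:
            # Nothing solid at this height yet, so fall toward the stale mesh.
            z = max(mesh_z, z - fall_per_step)
        trace.append(z)
    return trace


print(simulate_bounce(heightfield_z=2.0, mesh_z=1.0, steps=6))  # alternates up and down
print(simulate_bounce(heightfield_z=2.0, mesh_z=2.0, steps=3))  # stable once the mesh catches up
```

In this toy model the oscillation stops as soon as both representations agree, which matches the post's point that the glitch only lasts while the mesh is being recomputed.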
  12. The problem was caused by the new interestlist code which tries to not subscribe to objects that are outside of the camera view. When crossing a region boundary your viewer would extrapolate your camera forward, the vehicle would be created behind the camera, and the server would not subscribe to it. The faster the vehicle the more likely this would happen. The solution was to special case interestlist subscription to whatever you're sitting on -- the server now ALWAYS subscribes to your seat. We already had some special case code for seats, but it needed a little more.
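The special case described above can be sketched as a small predicate. Everything here is hypothetical: the name `should_subscribe`, its parameters, and the "in front of the camera" test, which is a crude stand-in for whatever view culling the server really does.

```python
def should_subscribe(obj_id, obj_pos, camera_pos, camera_dir, seat_id=None):
    """Decide whether to subscribe the viewer to an object.

    The seat is always subscribed; other objects only if they lie in the
    half-space in front of the camera (a simplification of real culling).
    """
    if seat_id is not None and obj_id == seat_id:
        return True  # special case: whatever the avatar is sitting on
    to_obj = tuple(o - c for o, c in zip(obj_pos, camera_pos))
    facing = sum(t * d for t, d in zip(to_obj, camera_dir))
    return facing > 0.0


# A vehicle created behind an extrapolated camera after a region crossing:
vehicle = dict(obj_id=7, obj_pos=(0.0, 0.0, 0.0),
               camera_pos=(10.0, 0.0, 0.0), camera_dir=(1.0, 0.0, 0.0))
print(should_subscribe(**vehicle))             # culled without the special case
print(should_subscribe(**vehicle, seat_id=7))  # kept because it is the seat
```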
  13. Janet, you mention that your cache is maxed, which sounds like you're talking about the cache on your local filesystem (hard drive), which can be configured to be large or small. I was talking about storage in memory (RAM). The viewer will take a lot of RAM, but there are limits to what it can get -- the rest of your operating system and other applications also need RAM to function. Also, there is special RAM in your graphics card which is also limited. If you are in a texture-heavy region (many large unique textures) then it may be that the viewer cannot hold all those visible AND those behind you, so it will swap out non-visible textures behind you to make room for the stuff it is trying to show in front. Also, it probably doesn't do all of this switching at once because it takes time to sort through and move lots of data around and to do so in one solid chunk of work would cause your render frame rate to block for a significant fraction of a second -- a big lag event. The work is probably parceled out across multiple frames so the scene will update in a timely fashion -- less lag over a longer period. You might try reproducing the problem in a region with lots of objects but far fewer textures (a big build still in its plywood stage). You might also try to discover how much memory is on your video card and consider testing the texture-heavy scene on another system with a higher-grade graphics card (if you have access to such).
  14. After looking through the SL UDP packet code a bit more I have a little better understanding of how it works than last time I spoke about it. I learned that when we actually pack data for a UDP packet we limit the payload to 1200 bytes. We're aiming for a max of 1500 bytes for the entire packet, counting UDP header and some ACK information that we add to the end of the normal 1200 byte payload, so we leave 300 bytes for that stuff. Once the final payload is computed (including the initial data and appended ACKs, but not including the UDP protocol header) we check the size and print a WARNING to logs if it exceeds 1500 bytes, but we send it anyway. This is a little bugged. We should be checking for "data_length > MTU - UDP_header_size" at this point. This wouldn't be hard to fix -- a little googling reveals that UDP_header_size = 8 bytes (64 bits). However the question is: when aiming for 1200 byte payloads, what is the maximum size of the true packet? Most of SL's packet types easily fit into just a few hundred bytes, but the ones that stream object data are packed to the maximum -- if we're ever exceeding an MTU then it would occur when packing those. I suppose I should survey the maximum packet sizes of a typical connection to a very full and busy region to see how big they are getting and if we are exceeding 1492, or maybe 1456, then maybe we should aim lower than 1200 bytes. The UDP packet size bug that was affecting Magnum last week would cause payloads to go over 3000 bytes! Interestingly most network routing hardware would transparently split these packets and reassemble them before they arrived at the client computer but not all -- some people were seeing terrible packet loss. If we are occasionally exceeding the true MTU of some connections then most people wouldn't be affected but it might show up as a steady background of packet loss for a few.
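The size check discussed above comes down to a little arithmetic, sketched here. The constants are standard values (a UDP header is 8 bytes and an IPv4 header without options is 20 bytes); subtracting the IP header as well is an addition to the check quoted in the post, and the function names are made up for illustration.

```python
ETHERNET_MTU = 1500  # bytes, the ethernet MTU mentioned in the thread
IPV4_HEADER = 20     # bytes, minimum IPv4 header with no options
UDP_HEADER = 8       # bytes


def max_udp_payload(mtu=ETHERNET_MTU):
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def fits(data_length, mtu=ETHERNET_MTU):
    """True if a payload of data_length bytes avoids fragmentation at this MTU."""
    return data_length <= max_udp_payload(mtu)


print(max_udp_payload())      # 1472
print(max_udp_payload(1492))  # 1464, for a link with the smaller MTU from the post
print(fits(1200))             # True: the nominal payload alone is fine
print(fits(1200 + 300))       # False: a full 300 bytes of extras would not fit
```

So under these assumptions a check against 1500 bytes is too generous by 28 bytes on plain ethernet, and by more on links with a smaller MTU.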
  15. Janet, what you're describing sounds like a behavior of the viewer. It is probably performing a lazy scan for textures that are not being rendered in view and slowly unloading those from memory -- in an effort to prepare the memory for any new textures that might show up, and to help keep your render frame rate high. This is definitely not related to the server delaying updates for objects behind you since these objects are not changing and thus would not be getting updates anyway.
  16. @Triple -- Thanks for the update. I think the fix to the UDP packet acks would drop your bandwidth from the terrible 1GB/H back down to the normal 5MB/H. Improvements beyond that must be attributable to other changes. Some of the lower bandwidth you're seeing is the lack of packets updating the old cloud density layer (now gone), and a slower update for the wind layer, however I estimate the savings there to be only about 0.15 MB/H. A possible source of savings would be from the improved server-side culling of updates for objects that are not visible. If there are any objects that are moving around or changing appearance near the bot but are outside its view then update rates for those objects will be much attenuated in Magnum. @Everybody else -- This behavior where updates for out-of-view objects are not sent until they are within view causes a visible glitch that some people have noticed. In particular, if you have a pile of objects that all move around or change appearance while your view is turned away and then you turn around to look at them you'll briefly see them in their old state and then they will update in your view after about 1 second or less. I haven't seen anyone complain about this glitch recently and I'm curious to know if it is still annoying anyone. If anyone has an opinion about this glitch I'd like to hear it. I've got a "fix" that reduces the glitch, but doesn't actually eliminate it. I'm planning on submitting that work as its own RC project and it is probably a couple of weeks down the RC schedule. I'd like to know how much the glitch will annoy people between now and the release of the "fix".
  17. Triple, with some details about the problem from Latif Khalifa I believe I was able to find the real problem in the server that is sending so much data to libOpenMV bots, and probably too much data to regular viewers as well. The SL UDP protocol appends ACKs for received packets on the end of outgoing packets that have extra room. Meanwhile some of the UDP packets are "zero coded", which is a simple compression scheme sometimes applied to packets that have large blocks of zeros. The bug (if my understanding of the code is correct) affects ACKs on the end of zero coded packets -- those ACKs may be lost or read incorrectly, which can make the protocol think that other packets have been lost --> it may resend. The SL UDP protocol does not require a packet to be zero coded -- it is up to the sender to decide to compress its packets or not. The receiver should be able to handle them in either case. So, in theory there is a workaround: configure libOpenMV to NOT send any zero coded packets. This configuration may be beyond most libOpenMV users, but I figure I would mention it in case there are any intrepid residents who want to try it before we can get an update to Magnum.
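To make "zero coded" concrete, here is a minimal run-length scheme for zero bytes. It is an illustrative sketch only: the post does not spell out the wire format, so the encoding chosen here (a zero byte followed by a one-byte run length) and the function names are assumptions, not the protocol definition.

```python
def zero_encode(data: bytes) -> bytes:
    """Replace each run of zero bytes with the pair (0x00, run_length)."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            # Extend the run, capped at 255 so the length fits in one byte.
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0, run])
            i += run
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def zero_decode(data: bytes) -> bytes:
    """Inverse of zero_encode; assumes well-formed input."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            out += bytes(data[i + 1])  # bytes(n) yields n zero bytes
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


payload = b"\x01" + bytes(10) + b"\x02"
encoded = zero_encode(payload)
print(len(payload), len(encoded))  # 12 bytes shrink to 4
print(zero_decode(encoded) == payload)
```

With a scheme like this, any raw bytes appended after the encoded body would be misread if the receiver decoded the whole buffer, since a zero byte in the appended part would look like the start of a run. Whether that is the actual mechanism of the ACK bug is not stated in the post.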
  18. I believe I've found the problem. There is a bug in our size calculation of the ObjectUpdate message that is causing us to send packets that are larger than an ethernet MTU (1500 bytes). So the current theory is that most network connections can handle such packets by breaking them up into smaller packets that fit their protocol and then reassembling them on the other side, however some connections must fail to do this. We're going to update some regions for test on the beta grid (aditi) with the new code. I believe Morris and Ahern will be included in this set.
  19. Sorry Jessica, I didn't mean "hack" in any pejorative way. I meant it as a "solution to a problem using unexpected methods or tools" -- I'm definitely not against such "hacks", and I was quite intrigued with this TPV innovation when I heard about it years ago. For some people "hack" has a negative connotation, so perhaps a better word would be "workaround" since the expanding draw distance feature is to reduce a "bug" that is external to the viewer, namely the poor near-to-far sorting by the server. Meanwhile, I've been thinking about the symptoms and also reviewing some of the changes in the server code looking for clues.... I have a tentative theory that I'm not satisfied with, but I'll share it anyway. The new interestlist code packs the same average bandwidth per viewer over a simulator frame, however it builds the individual packets a little faster and thus hands them to the network hardware in tighter temporal clumps. Picture, if you will, bursts of data that have the same area and frequency as before, but higher peak and narrower width. My most favorable tests for sending a storm of full updates showed packing taking about 50% less time (say 105 msec to pack instead of 200 msec), so while the average bandwidth is the same the momentary bandwidth on the wire might double. The theory is that these narrower bursts of data represent higher momentary bandwidth that is overwhelming some packet buffers of network equipment between the servers and some viewers. One way to test this theory is for those afflicted to try the following: (1) Via the preferences reduce the total bandwidth: Preferences --> Setup --> Total Bandwidth. (2) Reduce the draw distance: Preferences --> Graphics --> Advanced --> Draw Distance. (3) Relog. I don't have a lot of confidence in this theory, but if someone could test it and report back we can either elevate or retire the idea. One last thing I'll mention.
Although I can't find the post now I recall someone mentioning in this thread that one of the symptoms they noticed was that some linked sets were only showing up as their root prims. When the visible prims were selected the rest of the object showed up. This problem can be caused by out of order packets where the child prim info arrives before the viewer knows anything about the root -- the viewer is not very smart about connecting the child prims to their roots in this situation. Out of order packets don't explain all the problems reported here, but are indicative of general network problems. Edited to fix bad formatting.
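The burst theory in the post above rests on simple arithmetic: the same number of bytes handed to the network in less time means a higher momentary rate. A small sketch follows; the burst size of 25,000 bytes is a made-up figure, and only the 200 msec and 105 msec packing times come from the post.

```python
def momentary_bandwidth_kbps(bytes_sent, pack_time_ms):
    """Bits per millisecond, which is numerically the same as kilobits per second."""
    return bytes_sent * 8 / pack_time_ms


burst_bytes = 25_000  # hypothetical size of one burst of full updates
old_rate = momentary_bandwidth_kbps(burst_bytes, 200)
new_rate = momentary_bandwidth_kbps(burst_bytes, 105)
print(old_rate)                       # 1000.0
print(round(new_rate / old_rate, 2))  # 1.9, close to the doubling mentioned above
```

The ratio does not depend on the burst size chosen, only on the two packing times.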
  20. Thanks for the info everybody. Firestorm is off the hook -- it has been tested and works for some people, so the source of the problems must lie elsewhere. I've got a theory about the bot problems and a corresponding fix lined up for next week. The problem is that the current simulator on Magnum will initialize the viewer's camera as having a zero draw distance when they first connect to the region. It expects the SL client to send a non-zero draw distance in the AgentUpdate messages, but some bot implementations may specify zero draw distances. One might expect that a zero draw distance would minimize data from the server, however by a quirk of the logic it now causes the visibility of some objects to be constantly cleared and then streamed again. This thrashes the server a bit and causes it to send a lot of packets to the bot. The fix is to sanitize the "Far" field of the AgentUpdate messages at the server. If a client sends an unreasonable value there we will clamp it within an expected range. Also, the draw distance is initialized to a non-zero value -- just in case the client NEVER sends an AgentUpdate. If you have a bot on Magnum you probably want to configure it such that it sends a non-zero draw distance in the AgentUpdate messages. This appears to be possible in libOpenMV bots -- dunno about other varieties. Some (but not all) of the reports here sound exactly like bad network connectivity: rubber banding and packet loss in conjunction with objects not showing up or taking a long time to appear. But why are most of these reports being seen on Magnum? I'm not sure if the Magnum servers are all located in just one of our colocation facilities or are distributed across several. I'm going to query some of our operations people to see if they have any ideas about this.
One of the other changes currently on Magnum is that the server uses the CameraCenter field of the AgentUpdate message for ALL visibility logic, instead of using the location of the Avatar (previously it was a mixture of the two positions). If an SL client is sending zero, or some other incorrect value, for CameraCenter then I would expect the visibility calculations to be wrong -- some things wouldn't show up because the server thinks the camera is elsewhere and is sorting objects accordingly. I would expect all TPVs to send the correct CameraCenter, but some simple bots might not.
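The server-side fix described in the post above (clamp the "Far" value and start from a non-zero default) might look roughly like this. The limits and the default are placeholders picked for illustration; the post does not give the real range, and `sanitize_far` is a made-up name.

```python
import math

MIN_FAR = 1.0       # hypothetical lower bound, in meters
MAX_FAR = 512.0     # hypothetical upper bound, in meters
DEFAULT_FAR = 64.0  # hypothetical value used before any AgentUpdate arrives


def sanitize_far(far):
    """Return a draw distance that is safe to use for visibility logic."""
    if far is None or math.isnan(far):
        return DEFAULT_FAR  # client never sent a usable value
    return min(max(far, MIN_FAR), MAX_FAR)


print(sanitize_far(0.0))      # a bot sending zero is clamped up to MIN_FAR
print(sanitize_far(10000.0))  # an oversized value is clamped down to MAX_FAR
print(sanitize_far(None))     # no AgentUpdate yet, so the default applies
```

Clamping rather than rejecting keeps a misconfigured bot connected while avoiding the clear-and-restream thrash that a zero value triggers.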
  21. We ran one test with Firestorm from a residential connection but were not able to reproduce the problem. The test involved teleporting from pointA to pointB and back to pointA to see if the whole pile of scripted objects in pointA would show up on return -- they did. We noticed that the content took longer to show up in Firestorm, whereas it shows up almost immediately in the official SL client. I wonder if Firestorm is using a gradually increasing draw distance hack to help content get sorted near to far? I've heard rumors of this feature in TPVs but do not know if Firestorm is using it. For this test we had a 64m draw distance. I believe Sky Linneaus is typically running with a draw distance that is larger than the official viewer even allows (512m). We should repeat the test with a maximized draw distance in a region with a lot of content. Whirly, I take it that the packetloss is primarily happening on TP arrival?
  22. If someone could send me a landmark to a region or skybox where these problems are showing up I'd appreciate it. I'm going to check out Lusk to see if I can witness any of Ardy's reports. Triple, I don't think Sky's problem is related to bots since that region is usually empty. Sky, your region on the Magnum channel is public, but you've got some parcel setting that prevents teleports. I'm going to temporarily add myself to the Estate access list so I can look around to try to reproduce the problems.
  23. Sky - I'll number your symptoms so we can be specific: (1) Objects and avatars not showing up. (2) Objects showing up but slowly. (3) Slow fall through floor of skyboxes (this is about avatars specifically?). (4) Objects not updating position until they are back in camera view. (5) Unable to open objects to view their contents. Everything could be explained by a bad net connection except (4), which is known new behavior in the Magnum RC. I'm not positive that there was a network glitch, but some more info may be able to rule it out. Are symptoms (1)-(3) and (5) happening constantly or sometimes? Do they affect everyone at the same time or only some people? Item (3) sounds like "rubberbanding" which can happen when experiencing very long ping times (packet travel time) to the server -- did you notice other objects rubberbanding at the time? And do you see horizontal rubberbanding as well as vertical "fall through"? Item (5) sounds like outright network failure or extreme packet loss. Did you notice packetloss in the stats (CTRL + SHIFT + 1) when this was happening? BTW, I've got a fix that will reduce (4) significantly, but it is not yet ready for RC.
  24. Triple - I downloaded the LibOpenMetaverse code. It appears that libOpenMV's "bot" code is their "TestClient" program. I did some quick searches through the codebase for certain keywords and it appears that the TestClient does indeed support sending AgentUpdate messages, which makes sense since these messages must be sent if your bot is going to move around. However it is not clear whether a valid camera "Far" clip is sent in your case, or if it is sending zero. The values that are put into the AgentUpdate message depend on some config files. It makes sense to me that someone running a chat bot, or a parcel search bot, would set the camera's Far clip to zero to minimize streaming of visual objects which the bot would probably just ignore. I also looked over our own code again, but more carefully, and I'm prepared to put more confidence into my theory. I will try to submit a patched version of the code for update next week. It will have to pass QA over the weekend so there is no guarantee that it will get released, but I'll be working on that today.
  25. Triple - I've been examining the code to figure out why the simulator might start sending lots of data to a bot. I think I identified one mode: if the bot never sends a valid AgentUpdate UDP message (which includes info about the camera's view angles and draw distance) then the server-side culling will be using some initialized values that might trigger resends of data. I'll change the initialized camera properties to be more sane defaults and try to get this out in an update, but if you happen to know if your bot implements the AgentUpdate message or not I'd love to hear back. At the moment this is my only theory for the cause of the problem.