Henri Beauchamp

Everything posted by Henri Beauchamp

  1. I fly and sail with 512m draw distance. 256m is not enough to see where you are going. Not an issue with the Cool VL Viewer though (it is fast enough). 😜
  2. Whatever you wrote earlier is not the issue I reacted to (and yes, I did read your former messages, and even visited the sites you linked to). What I reacted to is the dangerous shortcut in the very message I replied to (what you wrote before that message did not shock me), and more precisely the part of that sentence of yours which I cited in my last post: that shortcut of yours amounts to ”LL TOS approval = Discord TOS approval”, and it is all wrong. This is exactly what I reacted to. Period. And there is strictly nothing ”rude” in my post, so please do not be ”rude” in your turn...
  3. Re-read my message above. The SL TOS is not the Discord TOS !... An SL Resident obviously agreed to the SL TOS, but who tells you they also agreed to Discord's (I certainly did not, which is why I am not using Discord) ? As per the SL TOS and Community Standards, you are not allowed to retransmit outside of SL, in any way, the chat you have with someone in SL, unless everyone in that chat has explicitly agreed to it beforehand. Now, if you are using your chat relaying tools for specific purposes, in a place (e.g. a shop, a private sim, etc) where any entering resident is forewarned that you are retransmitting their chat, this is acceptable (if they are not happy about it, they can move on to another place)... Same thing for a group for which you would relay the IMs, on the condition that this is clearly specified in the charter of said group (if they joined the group, they agreed to its charter).
  4. I do, and I get at the very minimum +10% fps rates under Linux when compared with Windows 11, both optimized to the extreme, with the same clocks on CPU and GPU, and, for the bloated Windoze 11, every useless/superfluous ”service” turned off (or outright uninstalled/removed/destroyed), including Search, Defender, etc. The difference in fps rates can go up to +25% in favour of Linux, depending on the scene being rendered. But what is most impressive is the difference in smoothness: under Windows, the same viewer (whatever the viewer, provided it got both native Linux and Windows builds) will experience way more ”hiccups” and unstable frame rates than under Linux. The difference there is massive. Not to mention stability, especially with some Windows drivers (I am in particular thinking about AMD's OpenGL drivers here)... Have a look at this post for some OpenGL performance comparisons.
  5. Do not even think about trying Linux, or it will make you cry, so much faster, smoother and more stable it is compared with the other (lesser) OSes... 🤣 macOS always had a very lame/ancient/partial/bogus OpenGL implementation, so it is no surprise at all that it is so much slower with SL viewers. If you want good performance in SL out of Macintosh hardware, then install Linux on it !
  6. There is one. I am not using it myself (I hate Discord), but exhuming a chat log taken during an Open Source meeting, here is the info about it: Channel: https://discord.gg/gP7H7XVAP3 Invite request form: https://docs.google.com/forms/d/1I0jtI2N_od9MxkECctnjpFa-W8Vc5Qke41gJcf0v5Yg/ It was set up to discuss content creation and such, but there are likely other ”rooms” (or whatever they call them in Discord) for other stuff...
  7. Today, I made an immense effort (no, I'm not even kidding here), and filed a JIRA issue, which took me a lot of precious time (that I could have much better spent developing my viewer instead) and made my old-fart-self grumble and rant against this poorly designed piece (to stay polite) of web site: my password forgotten by JIRA (again, as pretty much every time I use it), small text boxes to fill in when I need a wall of text, no ”draft” saving for the form (which would let you gather any missing data from another OS, reboot needed, and come back to editing the form once you have it), etc, etc... 😞 So, here you go Linden Lab: https://jira.secondlife.com/browse/BUG-234564 It will not be said that I do not make every effort to help improve SL...
  8. This is good for the timeout part, then (and proof that libcurl is the culprit for those silent retries we get in C++ viewers). Nope, not for poll requests... IIRC, only a few capabilities were configured with HTTP Keep-Alive (e.g. GetMesh2). However, even though you get the proper server-side timeouts at your Rust code level (which is indeed a good thing), you still have the issue with the race condition occurring during the timed-out HTTP poll request tear-down (as explained by Monty in the first posts of this very thread): you are then still vulnerable to this race condition, unless you use the same kind of trick I implemented... or Monty fixes that race server-side... or we get a new ”reliable events” transmission channel implemented (I still think that reviving the old, for now blacklisted, UDP messages would be the simplest way to do it, and would be plenty reliable enough).
  9. The timeout happens server-side after 30s without an event. If you do not observe this at your viewer code level with a 90s configured timeout, then you are also the victim of ”silent retries” by your HTTP stack. Fire up Wireshark (with a filter such as ”tcp and ip.addr == <sim_ip_here>”), launch the viewer and observe: when nothing happens in the sim (no event message) for 30s after the last poll request is launched, you will see the connection closed (FIN) by the server, and there, the Rust HTTP stack is likely doing just what libcurl does, silently retrying the request with SL's Apache server... Note that you won't observe this in OpenSim; I think this weird behaviour is due to the 499 or 500 errors ”in disguise” (you get a 499/500 reported in the body, but a 502 in the header) we often get from SL's Apache server (you can easily observe those by enabling the ”EventPoll” debug tag in the Cool VL Viewer: errors are then logged with both the header error number and the body)...
  10. No, it's an entirely different issue, and I wish we could get back to my original post, which is all about shadows (or the lack thereof)...
  11. The documentation is in the code... 😛 OK, not so easy to get a grasp on it all, so here is how it works (do hold on to your hat ! 🤣 ):

I added a timer for event poll age measurement; this timer (one per LLEventPoll instance, i.e. per region) is started as soon as the viewer launches a new request, and is then free-running until a new request is started (at which point it is reset). You can visualize the agent region event poll age via the ”Advanced” -> ”HUD info” -> ”Show poll request age” toggle.

For SL (OpenSim is another story), I reduced the event poll timeout to 25 seconds (configurable via the ”EventPollTimeoutForSL” debug setting), and set HTTP retries to 0 (it used to be left unconfigured, meaning the poll was previously retried ”transparently” by libcurl until it decided to time out by itself). This allows timing out on poll requests viewer-side, before the server would itself time out (as it would after 30s). Ideally, we should let the server time out on us and never retry (this is what is done, and works just fine, for OpenSim), but sadly, even when setting HTTP retries to 0, libcurl ”disobeys” us and sometimes retries the request once ”transparently” (probably because it gets a 502 error from SL's Apache server, while this should be a 499 or 500, and does not understand it as a timeout, thus retrying instead), masking the server-side timeout from our viewer-side code. This also involved adding ”support” for HTTP 499/500/502 errors in the code, so that these are not considered actual errors but just timeouts.

In order to avoid sending TP requests (the only kind of event the viewer is the originator of, and may therefore decide to send as it sees fit, unlike what happens with sim crossing events, for example) just as the poll request is about to time out (causing the race condition, which prevents receiving the TeleportFinish message), I defined a ”danger window” during which a TP request by the user shall be delayed until the next poll request for the agent region is fully/stably established. This involves a delay (adjustable via the ”EventPollAgeWindowMargin” debug setting, defaulting to 600ms), which is subtracted from the configured timeout (”EventPollTimeoutForSL”) to set the expiry of the free-running event poll timer (note: expiring an LLTimer does not stop it, it just flags it as expired), and which is also used, after the request has been restarted, as a minimum delay before which we should not send the TP request either (i.e. we account for the time it takes for the sim server to receive the new request, which depends on the ”ping” time and the delay in the Apache server). Note that since the configured ”EventPollAgeWindowMargin” may be too large for a delay after a poll restart (I have seen events arriving continuously at 200ms intervals or so, e.g. when facing a ban wall), the minimum delay before we can fire a TP request is also adjusted to be less than the minimum observed poll age for this sim, and I also take into account the current frame rendering time of the viewer (else, should the viewer render slower than events come in, we would not be able to TP at all). Once everything is properly accounted for, this translates into a simple boolean value returned by a new LLEventPoll::isPollInFlight() method (true meaning ready to send requests to the server; false meaning not ready, must delay the request). In the agent poll age display, an asterisk ”*” is appended to the poll age whenever the poll ”is not in flight”, i.e. we are within the danger window for the race condition.

I added a new TELEPORT_QUEUED state to the TP state machine, as well as code to queue a TP request triggered by the user whenever isPollInFlight() returns false, and to send it just after it returns true again. With the above workaround, I could avoid around 50% of the race conditions and improve the TP success rate, but it was not bullet-proof...

Then @Monty Linden suggested starting a (second) poll request before the current one would expire, in order to ”kick” the server into resync. This is what I did, this way: when the TP request needs to be queued because we are within the ”danger window”, the viewer now destroys the LLEventPoll instance for the agent region and recreates one immediately. When an LLEventPoll instance is deleted, it keeps its underlying LLEventPollImpl instance alive until the coroutine which runs within this LLEventPollImpl finishes, and it sends an abort message to the llcorehttp stack for that coroutine (suspended, since it is waiting for the HTTP reply to the poll request). As implemented, the abort will actually only occur on the next frame, because it goes through the ”mainloop” event pump, which is checked at the start of each new render frame. So the server will not see the current poll request closed by the viewer until the next viewer render frame and, as far as it is concerned, that request is still ”live”. Since a new LLEventPoll instance is created as soon as the old one is destroyed, the viewer immediately launches a new coroutine with a new HTTP request to the server: this coroutine immediately establishes a new HTTP connection with the server, then suspends itself and yields/returns back to the viewer main coroutine. Seen from the server side, this indeed results in a new event poll request arriving while the previous one is still ”live”, and this triggers the resync we need. With this modification done, my workaround is now working beautifully... 😜
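For the curious, here is a minimal, self-contained sketch of that ”danger window” logic (illustrative only: the names and structure are not the actual LLEventPoll code, and the frame-time adjustment is left out for simplicity):

```cpp
#include <algorithm>
#include <chrono>

// Illustrative stand-in for the free-running LLTimer + isPollInFlight() logic.
class PollAgeTracker
{
public:
    PollAgeTracker(double timeout, double margin)   // e.g. 25s and 0.6s
    :   mTimeout(timeout), mMargin(margin),
        mMinObservedAge(timeout), mStart(now())
    {
    }

    // Call just before posting a new poll request to the server.
    void onRequestStarted()
    {
        // Track the fastest request-to-request turn-around seen for this sim,
        // so the lower bound of the safe window can shrink when events arrive
        // in rapid succession (e.g. when facing a ban wall).
        mMinObservedAge = std::min(mMinObservedAge, age());
        mStart = now();    // reset the free-running timer
    }

    double age() const { return now() - mStart; }

    // true: safe to fire a TP request now.
    // false: inside the danger window, queue the request for later.
    // The real code also factors in the current frame render time.
    bool isPollInFlight() const
    {
        double min_age = std::min(mMargin, mMinObservedAge);
        double max_age = mTimeout - mMargin;
        double a = age();
        return a > min_age && a < max_age;
    }

private:
    static double now()
    {
        using namespace std::chrono;
        return duration<double>(steady_clock::now().time_since_epoch()).count();
    }

    double mTimeout;         // viewer-side poll timeout (EventPollTimeoutForSL)
    double mMargin;          // safety margin (EventPollAgeWindowMargin)
    double mMinObservedAge;  // shortest observed poll turn-around for this sim
    double mStart;           // when the current poll request was posted
};
```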
  12. The diagram is very nice, and while it brings some understanding of how things work, especially sim-server side, it does not give any clue about the various timings and potential races encountered server-side (sim server, Apache server, perhaps even the SQUID proxy ?)... You can have the best designed protocol at the sim server level, but if it suffers from races due to communications with other servers and/or weird network routing issues (two successive TCP packets might not take the same route) between viewer and servers, you still see bugs in the end.

What we need is a race-resilient protocol; this will likely involve redoing the server and viewer code to implement a new ”reliable” event transmission (*), especially for essential messages such as the ones involved in sim crossings, TPs, and sim connections. I like Animats' suggestion to split message queues: we could keep the current event poll queue (for backward compatibility's sake, and to transmit non-essential messages such as ParcelProperties & co), and design/implement a new queue for viewers with the necessary support code, where the essential messages would be exchanged with the server (the new viewer code would simply ignore such messages transmitted over the old, unreliable queue).

(*) One with a proper handshake and no timeout, meaning a way to send ”keep-alive” messages to ensure the HTTP channel is never closed on timeout. Or perhaps... resuscitating the UDP messages that got blacklisted, because the viewer/server ”reliable UDP” protocol is pretty resilient and indeed reliable !

Try the latest Cool VL Viewer releases (v1.30.2.32 & v1.31.0.10): they implement your idea of restarting a poll before the current one would time out, and use my ”danger window” and TP request delaying/queuing to ensure the request is only issued after the poll has indeed been restarted anew. It works beautifully (I did not experience a single TP failure in the past week, even when trying to race it and TPing just as the poll times out). The toggle for the TP workaround is in the Advanced -> Network menu. 😉
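To make the TP request delaying/queuing mentioned above a bit more concrete, here is a minimal sketch of the gating idea (illustrative only: TELEPORT_QUEUED and isPollInFlight() mirror what was described earlier in this thread, while the TeleportGate struct and its std::function hooks are invented for the example and must be bound before use):

```cpp
#include <functional>

enum TeleportState { TELEPORT_NONE, TELEPORT_QUEUED, TELEPORT_REQUESTED };

struct TeleportGate
{
    TeleportState state = TELEPORT_NONE;
    std::function<bool()> pollInFlight;     // e.g. bound to LLEventPoll::isPollInFlight()
    std::function<void()> sendTeleportUdp;  // sends the Teleport*Request UDP message

    // Called when the user asks for a TP.
    void requestTeleport()
    {
        if (pollInFlight())
        {
            sendTeleportUdp();              // safe: poll stably established
            state = TELEPORT_REQUESTED;
        }
        else
        {
            state = TELEPORT_QUEUED;        // danger window: delay the request
        }
    }

    // Called every frame from the main loop.
    void update()
    {
        if (state == TELEPORT_QUEUED && pollInFlight())
        {
            sendTeleportUdp();              // the poll has been re-established, go
            state = TELEPORT_REQUESTED;
        }
    }
};
```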
  13. Well, PBR is already ”live” on the main grid (in a few test regions), and Firestorm already got an alpha viewer with PBR support... So it's time for you to look at it ! 😛 I'm afraid not... LL opted to do away entirely with the old renderer (EE ALM and forward modes alike), and there will be no way to ”turn it off”. The only settings you will be able to play with are the ones for the reflections (reflection probes are extremely costly in terms of FPS rates, and won't allow ”weak” PCs to run PBR decently when turned on). Of course, you will be able to use the Cool VL Viewer, which already got (for its experimental branch) a dual renderer (legacy ALM+forward, and PBR, switchable on the fly with just a check box), but it will not stay like this forever (at some point in the future, everyone will have to bite the bullet and go 100% PBR, especially if LL finally implements a Vulkan renderer, which is very desirable on its own)...
  14. Quote from the blog: Well, it would be all nice and dandy (with indeed a tone mapping that is at last ”viewable” on non-HDR monitors), if there was not a ”slight” issue with the new shaders: they ate up the shadows ! Demonstration (taken on Aditi in Morris, with Midday settings): First, the current release viewer v6.6.15.581961: Second, the newest PBR RC viewer v7.0.0.581886, Midday with HDR adjustments: And even worse for the shadows (but better for the glass roof transparency), with the same RC viewer and legacy Midday (no HDR adjustment): Notice all the missing (or almost wiped out) shadows (trees and avatar, in particular), as well as how bad the rest of the few remaining shadows look now, when compared to the ”standard”... I raised this concern as soon as I backported the commit responsible for this fiasco to the Cool VL Viewer (and immediately reverted it), but I was met with chirping crickets... Let's see if crickets chirp here too, and whether residents care about shadows at all.
  15. It should not crash in the first place: report that crash either via the JIRA for SL official viewers, or to the developer(s) for TPVs (the support channel will vary from one TPV to another, but all TPVs should have a support channel), providing the required info (crash dump/log, viewer log, etc) and repro steps where possible.
  16. Currently, such a race will pretty much never happen viewer-side in the Agent's region... The viewer always keeps the LLEventPoll instance it starts for a region (LLViewerRegion instance) on receipt of the EventQueueGet capability URL, until the said region gets farther than the draw distance, at which point the simulator is disconnected, the LLViewerRegion instance is destroyed, and the LLEventPoll instance for that region with it; as long as the LLEventPoll instance is alive, it keeps the last received message ”id” on its coroutine stack (in the 'acknowledge' LLSD). However, should EventQueueGet be received a second time during the connection with the region, the existing LLEventPoll instance would be destroyed and a new one created with the new (or identical: no check is done) capability URL. For the agent's region, I have so far never, ever observed a second EventQueueGet receipt, so the risk of seeing the LLEventPoll destroyed and replaced with a new one (with a reset ”ack” field on the first request of the new instance) is pretty much nonexistent. This could however possibly happen for neighbour regions (sim capabilities are often ”updated” or received in several ”bundles” for neighbour sims; not too sure why LL made it that way), but I am not even sure it does happen for EventQueueGet. I of course do not know what the LLAgentCommunication lifespan is server-side, but if a race happens, it can currently only be because it does not match the lifespan of the connection between the sim server and the viewer.

In fact, ”ack” is a very badly chosen key name. It is not so much an ”ack” as a ”last received message id” field: it means that unless the viewer receives a new message, the ”ack” value stays the same for each new poll request it fires that does not result in the server sending any new message before the poll times out (this is very common for poll requests to neighbour regions). Note also that, as I already pointed out in my previous posts, several requests with the same ”ack” will appear server-side simply because these requests have been retried ”silently” by libcurl on the client side: the viewer code does not see these retries. For LLEventPoll, a request will not be seen timing out before libcurl has retried it several times and given up with a curl timeout: with neighbour sims, the timeout may only occur after 300s or so in LLEventPoll, while libcurl will have retried the request every 30s with the server (easily seen with Wireshark), and the latter will have seen 10 requests with the same ”ack” as a result. Also, be aware that with the current code, the first ”ack” sent by the viewer (on first connection to the sim server, i.e. when the LLEventPoll coroutine is created for that region, which happens when the viewer receives the EventQueueGet capability URL) will be an undefined/empty LLSD, and not a ”0” LLSD::Integer ! Afterwards, the viewer simply repeats the ”id” field it gets in an event poll reply in the ”ack” field of the next request. To summarize: viewer-side, ”ack” means nothing at all (its value is not used in any way, and the type of its value is not even checked), and can be used as the server sees fit.

Easy to implement, but it will not be how the old viewers work, so... Plus, it would only be of use should the viewer restart an LLEventPoll with the sim server during a viewer-sim (not viewer-grid) connection/session, which pretty much never happens (see my explanations above).
That hardening part is already in the Cool VL Viewer for 499, 500 and 502 HTTP errors, which are considered simple timeouts (just like the libcurl timeout) and trigger an immediate relaunch of the request. All other HTTP errors are retried several times (and that retry count is doubled for the agent region: this was of invaluable help a couple of years ago, when poll requests were failing left and right with spurious HTTP errors for no reason, including in the agent region).

This is already the case in the current viewers code: there is an llcoro::suspendUntilTimeout(waitToRetry) call for each HTTP error, with waitToRetry increased with the number of consecutive errors.

Already done in the latest Cool VL Viewer releases, for duplicate TeleportFinish and duplicate/out-of-order AgentMovementComplete messages (for the latter, based on its Timestamp field).

Frankly, this should never be a problem... Messages received via poll requests from a neighbour region that reconnects, or from a region the agent left a while ago (e.g. via TP) and comes back to, are not ”critical” messages, unlike messages received from the current Agent region the agent is leaving (e.g. TeleportFinish)...

I do not even know why you bother counting those... As I already explained, you will get repeated ”ack” fields at each timed-out poll request retry. These repeats should simply be fully ignored; the only thing that matters is that one ”ack” does not suddenly become different from the previous ones for no reason.

That's a very interesting piece of info, and I used it to improve my experimental TP race workaround, albeit not with an added POST like you suggest: now, instead of just delaying the TP request until outside the ”danger window” (during which a race may happen), I also fake an EventQueueGet capability receipt for the Agent's sim (reusing the same capability URL, of course), which causes LLViewerRegion to destroy the old LLEventPoll instance and recreate one immediately (the server then receives a second request while the first is in the process of closing (*), and I do get the ”cancel” from the server in the old coroutine). I will refine it (I will add ”ack” field preservation between LLEventPoll instances, for example), but it seems to work very well... 😜

(*) Yup, I'm using a race condition to fight another race condition ! Yup, I'm totally perverted ! 🤣
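Here is a rough sketch of that forced poll restart trick (simplified stand-ins, not the actual LLViewerRegion/LLEventPoll code):

```cpp
#include <memory>
#include <string>

// Conceptual sketch of "restart the poll to kick the server into resync": the
// old poll's HTTP request is only aborted on the next frame, so for a short
// while the server sees two live requests for the same region, which is
// exactly what triggers the resync.
class EventPoll
{
public:
    explicit EventPoll(const std::string& cap_url) : mCapUrl(cap_url)
    {
        // Start a coroutine posting the long-poll HTTP request (omitted here).
    }
    ~EventPoll()
    {
        // Request an abort of the pending HTTP request; as in the viewer, the
        // abort only takes effect on the next "mainloop" pump (next render
        // frame), so the server still sees this request as live for a while.
    }
private:
    std::string mCapUrl;
};

class Region
{
public:
    // Called on receipt of the EventQueueGet capability URL. Calling it again
    // with the same URL (the "faked receipt") destroys the old poll and starts
    // a new one immediately: old request still live server-side plus a new
    // request already posted equals the server resync we are after.
    void setEventQueueCap(const std::string& url)
    {
        mEventPoll.reset();                             // old poll: deferred abort
        mEventPoll = std::make_unique<EventPoll>(url);  // new poll: posted now
    }
private:
    std::unique_ptr<EventPoll> mEventPoll;
};
```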
  17. See this old post of mine: AMD drivers may not be the only culprits (though running a viewer ”fixed” with regard to VRAM leaks won't fix leaks happening at the OpenGL driver level, when said driver has bugs).
  18. I am fully aware of this; however we (animats & I) offered you a ”free lunch”: implementing those dummy poll reply messages server-side (a piece of cake to implement server-side, and which won't break anything, not even in old viewers) to get fully rid of the HTTP-timeout-related race conditions. Then we will see how things fare with TeleportFinish to start with, i.e. will it always be received by viewers ?... There is nothing to lose in trying this, and it could possibly solve a good proportion of failed TPs... If anything, even should it fail, it would allow eliminating a race condition candidate (or several), and reverting the code server-side would be easy and without any consequence.
  19. This is not the issue at hand, and not what I am observing, nor what would cause the race condition I do observe and am now able (thanks to the new ”poll request age” debug display in the Cool VL Viewer) to reproduce at will; this is really easy with the configured defaults (25s viewer-side timeout, experimental TP race workaround disabled): wait until the poll age display gets a ”*” appended, which will occur at around 24.5s of age, and immediately trigger a TP: bang, TP fails (with a timeout quit) !

The issue I am seeing in ”normal viewers” (viewers with LL's unchanged code, which my changes only allow to reproduce artificially and ”reliably”) is a race at the request timeout boundary: the agent sim server (or the Apache in front of it) is about to time out (30s after the poll request has been started viewer-side, which will cause a ”silent retry” by libcurl), the user requests a TP just before the timeout occurs, but the TeleportFinish message is sent by the server just after the silent retry occurred or while it is occurring. The TeleportFinish is then lost, so what would happen in this case is:

- The sim server sent a previous message (e.g. ParcelProperties) with id=N, and the viewer replied with ack=N in the following request (with that new request not yet used, but N+1 being the next ”id” to be sent by the server).
- The user triggers a TP just as the ”server side” (be it at the sim server or Apache server level, this I do not know) is about to time out on us, which happens 30s after it received the poll request from the viewer. At this point a Teleport*Request UDP message is sent to the sim server.
- The poll request started after the ParcelProperties receipt by the viewer times out server-side, and Teleport*Request (which took the faster UDP route) is also received by the sim server. What exactly happens at this point server-side is unknown to me: is there a race between Apache and the sim server, a race between the Teleport*Request and the HTTP timeout causing a failure to queue TeleportFinish, is TeleportFinish queued in the wrong request queue (the N+1 one, which the viewer did not even start, because the sim server would consider the N one dead) ?... You'll have to find out.
- Viewer-side, libcurl gets the server timeout and silently retries the request (unknown to the viewer code in LLEventPoll), and a ”new” request (actually the same request, retried ”as is” by libcurl) with the same ack=N is sent to the server (this is likely why you get 3 million ”repeated acks”: each libcurl retry reuses the same request body).
- The viewer never receives TeleportFinish, and never started a new poll request (as seen from LLEventPoll), so it is still at ack=N, with the request started after ParcelProperties still live/active/valid/waiting for a server reply from its perspective (since it was successfully retried by libcurl).

With my new code and its default settings (25s viewer-side timeout, TP race workaround OFF), the same thing as above occurs, but the request times out at the LLEventPoll level (meaning the race only reproduces after 24.5s or so of request age) instead of server-side (and then being retried at the libcurl level); the only difference you will see server-side is that a ”new” request (still with ack=N) by the viewer arrives before the former timed out server-side (which might not be much ”safer” either, race-condition-wise, server-side). This at least allows a more deterministic ”danger window”, hence the ease of reproducing the race, and my attempt at the TP race workaround (in which the sending of the UDP message corresponding to the user's TP request is delayed until outside the ”danger window”), which is sadly insufficient to prevent all TP failures.

As for the ack=0 issues, they too are irrelevant to the cases where TPs and region crossings fail: in these two cases, the poll request with the agent region is live, and so is the one for the neighbour region involved in a region crossing. There will be no reset to ack=0 from the viewer in these cases, since the viewer never kills the poll request coroutines (on whose stack ack is stored) for the agent region and the close (= within draw distance) neighbour regions.

But I want to reiterate: all these timeout issues/races would vanish altogether, if only the server could send a dummy message when nothing else needs to be sent, before the dreaded 30s HTTP timeout barrier (say, one message every 20s, to be safe).
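For illustration, a minimal sketch of what such a server-side keep-alive could look like (this is not LL's simulator code, just the principle, with a hypothetical 20s interval and invented Event/EventQueue/sendReply names):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

// Sketch of the proposed "no timeout" event queue: the server always answers
// an event poll before the HTTP timeout, with either real events or an empty
// keep-alive reply, so the connection is never closed by a timeout.
struct Event { std::string name; std::string body; };

class EventQueue
{
public:
    void push(Event e)
    {
        { std::lock_guard<std::mutex> lock(mMutex); mEvents.push(std::move(e)); }
        mCond.notify_one();
    }

    // Serve one poll request: wait up to the keep-alive interval (20s, well
    // under the 30s HTTP timeout) for events, then reply in any case.
    void servePoll()
    {
        std::unique_lock<std::mutex> lock(mMutex);
        mCond.wait_for(lock, std::chrono::seconds(20),
                       [this] { return !mEvents.empty(); });

        std::vector<Event> reply;
        while (!mEvents.empty())
        {
            reply.push_back(std::move(mEvents.front()));
            mEvents.pop();
        }
        // An empty 'reply' is the keep-alive/dummy case: the viewer just
        // re-polls immediately with the same "ack", and no race window opens.
        sendReply(++mLastId, reply);
    }

private:
    // Stub: in a real server this would write the HTTP response body (a map
    // with the "id" and the "events" array).
    void sendReply(int /*id*/, const std::vector<Event>& /*events*/) {}

    std::mutex mMutex;
    std::condition_variable mCond;
    std::queue<Event> mEvents;
    int mLastId = 0;
};
```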
  20. Depending on your window manager (Sawfish can do it, but some others can too), you could perhaps add a rule to it to disable window decorations (title bar, buttons, borders) for FS...
  21. LL's current viewer code considers these cases errors, which are retried only a limited number of times before the viewer gives up on the event polls for that sim server; they should therefore not happen under ”normal” conditions, and they do not happen, simply because the current code lets libcurl retry and time out by itself, at which point the viewer gets a libcurl-level timeout, which is considered normal (not an error) and retried indefinitely.
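A small sketch of the difference being discussed here, i.e. classifying 499/500/502 on an event poll as plain timeouts (retried immediately, forever) rather than errors (retried a limited number of times); names and structure are illustrative, not actual viewer code:

```cpp
enum class PollOutcome
{
    GotEvents,      // normal reply with events
    Timeout,        // retry immediately, indefinitely
    Error           // retry a limited number of times, then give up
};

PollOutcome classifyPollStatus(int http_status, bool curl_timed_out)
{
    if (curl_timed_out)
    {
        return PollOutcome::Timeout;        // libcurl-level timeout: always OK
    }
    switch (http_status)
    {
        case 200:
            return PollOutcome::GotEvents;
        case 499:                           // server-side poll timeouts "in
        case 500:                           // disguise", as often seen behind
        case 502:                           // SL's Apache front end
            return PollOutcome::Timeout;
        default:
            return PollOutcome::Error;      // genuine error: limited retries
    }
}
```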
  22. You can increase the timeout to 45s with the Cool VL Viewer now, but sadly, in some regions (*) this will translate into a libcurl-level ”spurious” retry after 30s or so (i.e. a first server-side timeout gets silently retried by libcurl) before you get a viewer-side timeout after the configured 45s delay; why this happens is unclear (*), but sadly, it does happen, meaning there is no possibility, for now, to always get a genuine server-side timeout in the agent region (the one that matters), nor to prevent a race during the first ”silent retry” by libcurl...

(*) I would need to delve into libcurl's code and/or instrument it, but I saw cases (thus ”in some regions”) where there were no silent retries by libcurl and I did get a proper server-side timeout after 30s, meaning there might be a way to fix this issue server-side, since it looks like it depends on some server(s) configuration (at the Apache level, perhaps... Are all your Apache servers configured the same ?)...

I already determined that a duplicate TeleportFinish message could possibly cause a failed TP in the existing viewers code, because there is no guard in process_teleport_finish() against a TeleportFinish received after the TP state machine has moved to another state than TELEPORT_MOVING, and process_teleport_finish() is the function responsible for setting that state machine to TELEPORT_MOVING... So, if the second TeleportFinish message (which is sent by the departure sim) is received after the AgentMovementComplete message (which is sent by the arrival sim, itself connected on the first TeleportFinish occurrence, and which sets TELEPORT_START_ARRIVAL), you get a ”roll back” in the TP state machine from TELEPORT_START_ARRIVAL to TELEPORT_MOVING, which will cause the viewer to fail to finish the TP process properly. So, basically, a procedure must be put in place so that viewers without future hardened/modified code will not get those duplicate event poll messages. My proposal is as follows:

- The server sends a first (normal) event poll reply with the message of interest (TeleportFinish in our example) and registers the ”id” of that poll reply for that message.
- The viewer should receive it and immediately restart a poll request with that ”id” in the ”ack” field; if it does not, or if the ”ack” field contains an older ”id”, the viewer probably missed the message, but the server cannot know for sure, because the poll request it receives might be one started just as it was sending TeleportFinish to the viewer (the request timeout race condition case).
- To make sure, when the ”ack” field of the new poll does not match the ”id” of the TeleportFinish reply it sent, the server can reply to the viewer's new poll with an empty array of ”events”, registering the ”id” for that empty reply.
- If the viewer's next poll still does not contain the ”id” of the TeleportFinish reply but does contain the ”id” of the empty reply, then obviously it did not get the first TeleportFinish message, and it is safe for the server to resend it...

EDIT: but the more I think about it, the more I am persuaded that the definitive solution to prevent race conditions is to entirely suppress the risk of poll request timeouts anywhere in the chain (sim server, Apache, libcurl, viewer). This would ”simply” entail implementing the proposal made above. By ensuring a dummy/empty message is sent before any timeout can occur, we ensure there is no race at all, since the initiative of closing the HTTP connection then belongs exclusively to the sim server (via the reply to the current event poll request, be it a ”normal” message or a ”dummy” one when there is nothing to do but prevent a timeout), while initiating the poll request HTTP connection only ever happens at the viewer code level.
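For illustration, the kind of guard that hardened code could add to process_teleport_finish() (a simplified sketch, not the actual viewer function, with only a subset of the TP states):

```cpp
// State names mirror the TP state machine discussed in this thread; the real
// viewer handler takes an LLMessageSystem* and does much more.
enum ETeleportState
{
    TELEPORT_NONE,
    TELEPORT_MOVING,            // set on (first) TeleportFinish receipt
    TELEPORT_START_ARRIVAL      // set on AgentMovementComplete receipt
};

static ETeleportState gTeleportState = TELEPORT_NONE;

void process_teleport_finish()
{
    // Guard against a duplicate TeleportFinish (sent by the departure sim)
    // arriving after AgentMovementComplete (sent by the arrival sim): without
    // it, the state machine would be rolled back from TELEPORT_START_ARRIVAL
    // to TELEPORT_MOVING and the TP would never complete properly.
    if (gTeleportState == TELEPORT_START_ARRIVAL)
    {
        return;                 // duplicate/out-of-order message: ignore it
    }

    // ... normal processing: connect to the arrival sim, etc. ...

    gTeleportState = TELEPORT_MOVING;
}
```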
  23. Cool VL Viewer releases (v1.30.2.28 and v1.31.0.6) published, with my new LLEventPoll code and an experimental (partial) race condition workaround for TP failures. The new goodies work as follows:

- LLEventPoll was made robust against the 499 and 500 errors often seen in SL when letting the server time out on its side (which is not the case with LL's current code, since libcurl retries long enough and times out by itself). 502 errors (that were already accepted for OpenSim) are now also treated as ”normal” timeouts for SL. It will also retry 404 errors (instead of committing suicide) when they happen for the Agent's sim (the Agent's sim should never be disconnected spuriously, or at least not until after many retries).
- LLEventPoll now sets HTTP retries to 0 and a viewer-side timeout of 25 seconds by default for SL. This can be changed via the ”EventPollTimeoutForSL” debug setting, whose new value is taken into account on the next start of an event poll.
- LLEventPoll got its debug messages made very explicit (with human-readable sim names, detailed HTTP error dumps, etc). You can toggle the ”EventPoll” debug tag (from ”Advanced” -> ”Consoles” -> ”Debug tags”) at any time to see them logged.
- LLEventPoll now uses an LLTimer to measure the poll request age. The timer is started/reset just before a new request is posted. Two methods have been added: one to get the event poll age (getPollAge(), in seconds) and a boolean one (isPollInFlight()) which is true when a poll request is waiting for server events and its age is within the ”safe” window (i.e. when it is believed to be old enough for the server to have received it, and not too close to the timeout). The ”safe window” is determined by the viewer-side timeout and a new ”EventPollAgeWindowMargin” debug setting: when the poll request age is larger than that margin and smaller than the timeout minus this margin, the poll is considered ”safe enough” for a TP request to be sent to the server without risking a race condition. Note that, for the ”minimum” age side of the safe window, EventPollAgeWindowMargin is automatically adjusted down if needed for each LLEventPoll instance (by measuring the minimum time taken by the server to reply to a request), and the frame time is also taken into account (else you could end up never being able to TP, when the event rate equals the frame rate or is smaller than EventPollAgeWindowMargin).
- The age of the agent region event poll can be displayed in the bottom right corner of the viewer window via the ”Advanced” -> ”HUD info” -> ”Show poll request age” toggle: the time (in seconds) gets a ”*” appended whenever the poll request age is outside the ”safe window”.
- An experimental TP race workaround has been implemented (off by default), which can be toggled via the new ”TPRaceWorkAround” debug setting. It works by checking isPollInFlight() whenever a TP request is made and, if not in the safe window, ”queuing” the request until isPollInFlight() returns true, at which point the corresponding TP request UDP message is sent to the server.
- To debug TPs and log their progress, use the ”Teleport” debug tag.