
Henri Beauchamp

Everything posted by Henri Beauchamp

  1. I'm more under the impression you are seeking just one favourable testimony to use as an excuse to follow your personal feeling/belief that an AMD card would be better suited for you... Just go ahead, buy whatever suits your own needs/preferences, and take responsibility for it. Just don't come back here to complain ”we” gave you bad advice, should you find out you made a mistake. 😜 As for graphics card prices, it might be wiser/smarter to wait a little: NVIDIA's cards are already seeing price adjustments as a result of AMD's newest card releases (competition is a Good Thing™). It will take some time to propagate to France (but you could just as well buy a card from a more reactive German supplier), but prices are going to drop a bit in the coming weeks. The second half of October is usually a good moment to buy computer hardware (long enough after people's return from Summer vacation, soon enough before Christmas). There is also the option of waiting for a sale/opportunity on the former card generation (even an RTX 3070 is plenty powerful enough for SLing).
  2. These are very, very low settings for a Core-i9 and an RTX 3080, even mobile ones... You could easily push the graphics settings between High and Ultra, and still keep your PC running cool enough by using the frame limiter at 60fps...
  3. Not for the full session, no, but typically for up to two minutes or so after the ”departure” (actually, after the region is out of draw distance): the event poll is terminated when it times out after the departure, then the region itself is removed from memory within one minute, the last thing to go being the UDP data circuit, about two minutes after departure. Of course, should the avatar go back to the region, things still around may be reused...
  4. The viewer will use the highest OpenGL version available from your drivers. The higher the better (better optimized, faster, more features). The viewer currently does not need features specifically introduced in the latest OpenGL versions (this might change ”soon”: the PBR viewer already needs OpenGL v3.2 features), but drivers with v4.6 and a good core profile implementation (the core profile gets rid of old OpenGL cruft) are much faster. NVIDIA proprietary drivers in OpenGL v4.6 core profile are typically +50% to +100% faster than in compatibility profile, something that does not happen with AMD proprietary drivers.
  5. I'd say I am not surprised, given the absence of retries on the server side when TeleportFinish is not received by viewers... Count me in among the ”guinea pigs”. 😜 I'll gladly help test a fix (or several) for this years-long TP bug that is plaguing SL.
  6. Yes, I tried to set up a larger timeout than the SL servers', and to take into account the bogus ”502 in disguise” errors I get, considering them simple poll timeouts. I then observed a strange thing, which can only be explained by some weird libcurl internal behaviour: the timeouts occur after 61.25 seconds or so (instead of 30.25 seconds or so, which would correspond to the server timeout plus the ping time), and I do see in Wireshark libcurl retrying the connection once on the first server timeout (i.e. after ~30s) instead of passing the latter to the application, as instructed (setRetries(0))!... Maybe it is due to that ”502 in disguise” issue (libcurl won't recognize a ”genuine” timeout and retries once?)... So in the end, the only way for me to see a genuine timeout occurring is to set the viewer-side timeout below the server one... Also, everyone, you can all stop holding your breath: after stress-testing it (and despite more refinements brought to its code), my workaround does not prove robust enough, and I can still see TP failures happening sometimes (rarer than without it, but still happening nonetheless)... I will publish it in the next Cool VL Viewer releases (with debug settings for a kill switch and several knobs to play with, and that handy poll request age debug display for easy repros of TP failures), but it is not a solution to TP failures, sadly. 😢 So, we will have to wait for Monty to fix the server side of things... 😛
  7. The problem is that you do not get that when logged into SL: you get a 499 or 500 error header (and ”502 error” printed in the body). Meaning, somehow, the 502 error gets mutated into another error code, and is then not recognized as such by the viewers. That is why you cannot let the server time out when connected to SL (everything works as expected when connected to OpenSim, where I do let the server time out in my code).
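The distinction above between OpenSim's honest 502 and SL's ”502 in disguise” can be sketched as a small classification helper. This is purely illustrative (the function name and signature are hypothetical, not the actual viewer code), assuming the SL behaviour described above: a 499 or 500 status header with a ”502” error page in the body.

```cpp
#include <string>

// Hypothetical helper: decide whether an event poll response should be
// treated as a plain server-side poll timeout rather than a real failure.
bool is_poll_timeout(int status, const std::string& body, bool in_second_life)
{
    if (!in_second_life)
    {
        // OpenSim honestly reports a 502 error on its own poll timeout.
        return status == 502;
    }
    // SL: a ”502 in disguise” carries a 499 or 500 header status, with the
    // 502 error report page in the body.
    return (status == 499 || status == 500) &&
           body.find("502") != std::string::npos;
}
```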
  8. Success! I managed to: reproduce reliably the TP failure modes related to event poll request expiration and restart delays (a race condition with the servers); find and implement a robust workaround for those. The problem seen is indeed due to how a TP request by the user can be sent to the server while the poll request is about to time out, or was just closed and is being restarted as the result of the receipt of an event poll message. If the server queues the TeleportFinish message (or any message, but this one is unique and supposed to be 100% reliable, unlike ParcelProperties & Co) while the viewer is in the process of restarting a poll request, somehow that message will never be received. To confirm this, I use an LLTimer which is reset just before I post (and yield) the request in LLEventPoll. I also use a 25s timeout and no libcurl-level retries for those requests, so that they always time out on the viewer side and that said timeout is always seen happening by the LLEventPoll code. I also implemented a debug display for that timer in the viewer window, so that I can easily manually trigger a TP just before or just after the event poll request has expired or started; doing so, I can reliably reproduce the TP failures that so far seemed to happen ”randomly”. As for the workaround, it is implemented in the form of TP queuing and event poll timer window checking; whenever a TP request is made 500ms or less before the agent region poll request would time out, or just after it has been restarted, the TP is queued (via a new LLAgent::TELEPORT_QUEUED state, which allows reusing the existing state machine implemented in llagent.cpp and llviewerdisplay.cpp), and the corresponding UDP message (either TeleportLocationRequest, TeleportLandmarkRequest or TeleportLureRequest) requesting the TP from the server is put on hold until the event poll request timer is again in the stable/established connection window, at which point the TP request message is sent.
So far (stress-testing still in progress), it works wonders and I do not experience failed TPs any more. If everything runs as expected and I am satisfied with the stress-testing, this code will be implemented in the next releases of the Cool VL Viewer.
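The queuing rule above can be sketched as a pure decision function. This is an illustrative sketch only (the names are hypothetical, not the Cool VL Viewer code); the 25s poll timeout and 500ms guard window are the figures given in the post.

```cpp
// Hypothetical sketch: decide whether a teleport request may be sent at
// once, or must be queued (TELEPORT_QUEUED-like state) until the event poll
// connection is back inside its stable window.
enum class TpAction { SEND_NOW, QUEUE };

TpAction teleport_action(double poll_age_s,     // seconds since poll (re)start
                         double poll_timeout_s, // viewer-side timeout (25s)
                         double guard_s)        // unsafe window (0.5s)
{
    // Unsafe just after a restart: the new request may not be registered
    // server side yet...
    if (poll_age_s < guard_s)
        return TpAction::QUEUE;
    // ...and unsafe just before the timeout fires, since the connection is
    // about to be torn down and restarted.
    if (poll_age_s > poll_timeout_s - guard_s)
        return TpAction::QUEUE;
    return TpAction::SEND_NOW;
}
```

A queued TP would then be re-tested (and its UDP message finally sent) on each frame, once the poll request age is back inside the safe window.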
  9. EEEK! Don't do that: viewers would see those ugly ”502 in disguise” errors, which would be considered poll request failures by the current viewers' code, and only retried a limited number of times! With the current viewer code and in SL (*), the poll request timeout must occur on the viewer side (yes, even though it is ”transparently” retried at the libcurl level: the important point is that the fake 502 error is not seen by the viewer code). If anything, increasing the server-side timeout from 30s to 65s or so (so that a ”ParcelProperties” message would make it through before each request times out) would reduce the opportunities for race conditions. (*) For OpenSim-compatible viewers, a (true) 502 error test is added, which is considered a timeout and retried like a viewer-side libcurl timeout, but this test is only performed while connected to OpenSim servers, which do not lie about 502 errors by disguising them as 499 or 500 ones in their header. Pretty please, make it so that these changes remain backward-compatible... One possible such change would be as follows: currently, viewers acknowledge the previous poll event ”id” on restarting a request, by setting the ”ack” request field equal to the previous result ”id”. It means that, for TeleportFinish, the server would normally see the ”id” used to transmit it coming back immediately in the ”ack” field of the request following its receipt by the viewer. If the server does not get it (because it does not get a new request posted by the viewer), then TeleportFinish was not received and should be resent. To be 100% sure that the request is not just in flight or delayed, the server could send two different commands in a row on TP: TeleportFinish first, then, for example, ParcelProperties, in a different message (different id): then, if no ”ack” for TeleportFinish has been received, re-issue it.
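The ack-based resend suggested above could look something like the following server-side sketch. All names here are hypothetical (this is a proposal illustration, not existing simulator code): the server remembers the poll reply ”id” that carried TeleportFinish, and resends if the next poll request's ”ack” does not echo it.

```cpp
#include <cstdint>

// Hypothetical server-side tracker for the suggested backward-compatible
// change: resend TeleportFinish when its poll reply ”id” never comes back
// in the ”ack” field of the next poll request.
struct TeleportFinishTracker
{
    int64_t mSentWithId = -1; // id of the poll reply carrying TeleportFinish
    bool mAcked = false;

    void onTeleportFinishSent(int64_t id)
    {
        mSentWithId = id;
        mAcked = false;
    }

    // Called for each incoming poll request; returns true when
    // TeleportFinish must be resent in the reply.
    bool onPollRequest(int64_t ack)
    {
        if (mSentWithId < 0 || mAcked)
            return false;       // nothing pending
        if (ack == mSentWithId)
        {
            mAcked = true;      // the viewer did receive TeleportFinish
            return false;
        }
        return true;            // ack mismatch: resend TeleportFinish
    }
};
```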
  10. The LLAppCoreHttp::AP_LONG_POLL policy group does not define the retry attempts, at least not in my viewer... But explicitly setting mHttpOptions->setRetries(0) causes ”502” errors in disguise (502 body, 500 or 499 header) to happen... However, setting mHttpOptions->setTransferTimeout(25) (25s timeouts, i.e. below the server timeout) together with mHttpOptions->setRetries(0) seems to work just fine: libcurl then times out after 25s and the viewer fires a new poll, as expected (and no trace of retries in Wireshark)... This would eliminate a possible cause for a race condition. And I got an idea to avoid TP failures that could result from a race between the processing of a received event, the triggering of a TP by the user at that very moment, the firing of a new poll, and the TeleportFinish transmission. I'll try to set an ”in flight” flag on starting the poll request, reset it on request return, and test that flag on TP: when not ”in flight”, yield to coroutines until the event poll coroutine can fire a new request (setting the flag); the TP would then be fired while the poll request is ”stable” and waiting for a server transmission.
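The ”in flight” flag idea can be sketched as a small gate object (a minimal sketch with hypothetical names, using plain callbacks where the viewer would use coroutine yields): teleports requested while no poll is posted are deferred, then flushed as soon as the next poll request fires.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of the ”in flight” gating: a TP is only sent while a
// poll request is stable and waiting on the server; otherwise it is
// deferred until the next poll request is fired.
class EventPollGate
{
public:
    void onPollStarted()            // a new poll request was just posted
    {
        mInFlight = true;
        for (auto& tp : mPending)   // flush the deferred teleports
            tp();
        mPending.clear();
    }

    void onPollReturned()           // the poll request completed/timed out
    {
        mInFlight = false;
    }

    void requestTeleport(std::function<void()> send_tp)
    {
        if (mInFlight)
            send_tp();              // poll is stable: send at once
        else                        // defer until the next poll is posted
            mPending.push_back(std::move(send_tp));
    }

private:
    bool mInFlight = false;
    std::vector<std::function<void()>> mPending;
};
```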
  11. Yup, you are right... I can see this with Wireshark. The retry is likely done at the libcurl level... More race condition opportunities! 😢 Which only advocates for a return of reliable message events such as TeleportFinish to the ”reliable UDP” path provided by the viewer...
  12. In fact, I could verify today that this scenario cannot happen at all in SL. I instrumented my viewer code with better DEBUG messages and a timer for event poll requests. Under normal conditions (no network issue, sim server running normally), event polls never time out in the agent region before an event comes in. Even in an empty sim, without any neighbour regions, the ParcelProperties message is always transmitted every 60 seconds (and for an agent region with neighbours within draw distance, you also get EnableSimulator for each neighbour every minute). Timeouts only occur for neighbour regions, when nothing happens in the latter, and only after 293.8 seconds. So, when a user requests a TP, the agent region will not risk seeing the poll request timing out just at the moment TeleportFinish arrives, causing a race condition in the HTTP connection tear-down sequence, as you described. However, what would happen if, say, a ParcelProperties message (or any other event in the agent region) arrived milliseconds before the user triggers a TP request?... Poll request N finishes with ParcelProperties, the TP request fires, and what if TeleportFinish is sent by the server just before the viewer can initiate poll request N+1 (reminder: llcore uses a thread for HTTP requests)?... Maybe a race condition could happen here (depending on how events are queued server side, and how the Apache delays in connection building and tear-down could lag/race things; this might explain why TeleportFinish is sometimes queued but never sent, maybe?)... In any case, I would suggest reconsidering the way TeleportFinish is sent to viewers: what about restoring the old reliable UDP path for it?... Or implementing a message for viewers to re-request it when they did not get it ”in time”...
  13. Yes, this might indeed happen... I will have to try and log one such scenario (I've got nice DEBUG level messages for event polls and, now, server/viewer messaging)... The problem here is that we do not have a way for the viewer to acknowledge the server's TeleportFinish message... The latter used to be a ”reliable” UDP message (with its own private handler), but got UDPDeprecated then UDPBlacklisted in favour of the event poll queue/processing... It was not the wisest move... A possible workaround would be to allow the viewer to (re)request TeleportFinish; in this case, a simple short (5 seconds or so) timeout could be implemented viewer side after a TP has started, and if TeleportFinish has not been received when it expires, the viewer would re-request it... EDIT: I also see a possible viewer-side workaround for such cases via the implementation of a ”teleport window” timer. That timer would be reset each time the viewer starts an event poll request for the agent region: when the user asks for a TP, the timer would be checked and, if less than, say, 2 seconds are left before the timeout fires (since it is set viewer side, at least in my code, for SL, it is known), the TP request would be delayed till the next poll is started...
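The 5-second re-request watchdog proposed above could be sketched as follows (hypothetical names; note that the message allowing a viewer to re-request TeleportFinish does not exist today, so this only illustrates the viewer-side timer logic). After a TP starts, the watchdog is polled with the elapsed time; it fires exactly once if TeleportFinish never arrived.

```cpp
// Hypothetical viewer-side watchdog for the proposed TeleportFinish
// re-request mechanism.
struct TeleportFinishWatchdog
{
    static constexpr double TIMEOUT_S = 5.0; // timeout suggested in the post

    bool mGotFinish = false;
    bool mReRequested = false;

    void onTeleportFinish() { mGotFinish = true; }

    // Polled regularly after a TP has started; returns true exactly once
    // when a re-request should be issued to the server.
    bool shouldReRequest(double elapsed_s)
    {
        if (mGotFinish || mReRequested || elapsed_s < TIMEOUT_S)
            return false;
        mReRequested = true;
        return true;
    }
};
```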
  14. All the TP failure modes I get happen before event polls are even started: the UDP message from the arrival sim just never gets to the viewer, so the latter is left ”in the blue” and ends up timing out on the departure sim as well...
  15. Do have a look at the comments I added in linden/indra/newview/lleventpoll.cpp, in the Cool VL Viewer sources, for the various modifications I implemented to deal with both SL and OpenSim idiosyncrasies... In particular:

    LLAppCoreHttp& app_core_http = gAppViewerp->getAppCoreHttp();
    // NOTE: be sure to use this policy, or to set the timeout to what it used
    // to be before changing it; using too large a viewer-side timeout would
    // cause to receive bogus timeout responses from the server (especially in
    // SL, where 502 replies may come in disguise of 499 or 500 HTTP errors)...
    // HB
    mHttpPolicy = app_core_http.getPolicy(LLAppCoreHttp::AP_LONG_POLL);
    if (!gIsInSecondLife)
    {
        // In OpenSim, wait for the server to timeout on us (will report a 502
        // error), while in SL, we now timeout viewer-side (in libcurl) before
        // the server would send us a bogus HTTP error (502 error report HTML
        // page disguised with a 499 or 500 error code in the header) on its
        // own timeout... HB
        mHttpOptions->setTransferTimeout(90);
        mHttpOptions->setRetries(0);
    }

Yes, it is indeed as bad as it looks... This said, my modified code performs just fine in both SL and OpenSim now, and the failed TP issues still seen are not related to event polls anyway (event polls are simply retried on timeouts).
  16. It does indeed seem that after ”a certain time” ((c)1955 Fernand Raynaud 😛) the edit option disappears... 😢 Another suggestion: add a message to this thread with a short summary in English, together with the ”updated” question.
  17. In the menu you get by clicking on the three grey dots, at the top right of each of your messages.
  18. Not necessarily; you could, for example, edit the title of your first message to put it in English, and the text of that same message to add a translated version.
  19. A table taken from a benchmark that has nothing to do with SL and says nothing about the test conditions. For instance, in which mode were the drivers running? Compatibility profile (i.e. a profile supporting deprecated/obsolete OpenGL commands) or core profile?... With NVIDIA drivers, the viewer shows +50 to +100% performance (depending on the rendered scene) in core profile mode, unlike AMD drivers where the performance is nearly identical in both modes. So, if the test was run in compatibility profile, the NVIDIA results look worse than they could be in comparison with AMD... Another point is the use of shared GL profiles, from which, here again, NVIDIA benefits more than AMD: there is an OpenGL command queue synchronization issue, which must be handled in the main thread with AMD drivers, whereas it can happen in worker threads with NVIDIA, avoiding frame rate ”hiccups”. Moreover, the test refers to OpenGL v4.5, while the version that really matters is the latest one, i.e. v4.6; one may therefore wonder about the scope of the tested features, in particular in the shaders... Besides, performance is only one aspect. Robustness (no crashes with NVIDIA drivers, where AMD literally falls flat on its face) and compliance with the OpenGL standard (*) are two others, which NVIDIA wins hands down. (*) The viewers' code contains bug workarounds for AMD drivers (and Intel ones, for that matter), which NVIDIA drivers do not need thanks to their strict compliance with the OpenGL specification.
This is why the link I chose in my previous message points to the testimony of a user who tried AMD, was disappointed, and finally returned the card to get an NVIDIA one, which did give him satisfaction... Note that I have nothing against AMD (my latest PC even uses a Ryzen 7900X, which is a great CPU that I am very happy with and can only warmly recommend). I am simply basing myself on my past experience (admittedly an old one) and on the feedback from ”viewer” users (of mine, and of the others), which match perfectly. You would have a better chance of getting an answer by asking your question in English...
  20. A forum search will let you gather other testimonies already posted. For instance this one. Note that the ”bug” aspect of the drivers is very important; as the author of the Cool VL Viewer, I got feedback from several AMD card users experiencing crashes whose crash dumps pointed right into the AMD drivers' code...
  21. AMD's OpenGL drivers are still much inferior to NVIDIA's. Under Linux, with Mesa, the differences are less visible, but under Windows, there is no contest: NVIDIA beats AMD (at equal card cost and GPU generation) hands down! Count on +30% performance with NVIDIA compared to AMD. Moreover, AMD's ”improved” (Adrenalin) drivers, which made up part of the performance gap, are riddled with bugs causing crashes, something you do not see with NVIDIA drivers... For OpenGL and Vulkan, there is no hesitation: NVIDIA is the logical and obvious choice.
  22. You might be interested in the Cool VL Viewer v1.30.2.27 (or experimental branch v1.31.0.5) I released today: I revamped the messaging logging so that the DEBUG level ”Messaging” tag logs all the messages (with the exception of the super-spammy and pretty irrelevant ”PacketAck” one) exchanged between the viewer and the server. You may toggle the ”Messaging” debug tag when needed (before or after login/TP/region change/draw distance change), from ”Advanced” -> ”Debug tags” in the login screen menu or ”Advanced” -> ”Consoles” -> ”Debug tags” in the main (post-login) menu. Also, I implemented optional threaded object cache reads (toggled via ”Advanced” -> ”Cache” -> ”Threaded object cache reads”), meaning the viewer won't block to read the object cache file(s) after a ”RegionHandshake” message from the server and will keep processing other messages, replying with ”RegionHandshakeReply” only after the object cache file(s) (the 's' is for the PBR viewer branch) has(have) been read: this highlights an interesting ”aspect” (bug?) of the server messaging algorithm. In particular, you will see that the server sends two (!) ”RegionHandshake” messages without even waiting for a first ”RegionHandshakeReply” or some timeout, except on login (where only one such message is sent by the login region server, unsurprisingly and as should ”normally” be the case)...
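The threaded object cache read described above boils down to deferring ”RegionHandshakeReply” until an asynchronous read completes, while the main loop keeps running. A minimal sketch, with hypothetical stand-ins for the viewer classes (the cache read is faked with a constant, and sending the reply is reduced to setting a flag):

```cpp
#include <chrono>
#include <future>

// Hypothetical sketch: on RegionHandshake, kick off the object cache read
// in a worker thread; send RegionHandshakeReply from the main loop only
// once the read has completed, without ever blocking that loop.
struct Region
{
    std::future<int> mCacheRead;   // result: number of cached objects read
    bool mRepliedHandshake = false;

    void onRegionHandshake()
    {
        mCacheRead = std::async(std::launch::async, []
        {
            return 42;             // stand-in for reading the cache file(s)
        });
    }

    void idle()                    // called once per main loop iteration
    {
        using namespace std::chrono_literals;
        if (!mRepliedHandshake && mCacheRead.valid() &&
            mCacheRead.wait_for(0ms) == std::future_status::ready)
        {
            mCacheRead.get();      // cache is now loaded
            mRepliedHandshake = true; // send RegionHandshakeReply here
        }
    }
};
```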
  23. Try the Cool VL Viewer... Or, if the legacy Windlight renderer works faster for you, and you are under Windows, try Genesis (a fork of Singularity).
  24. The ”RegionHandshake” message is likely what you missed: it should be received by your viewer after connecting to a region, and your viewer should reply to it via ”RegionHandshakeReply”. I do not see those messages in the diagrams above... In the C++ viewers, the handler for ”RegionHandshake” is set in llstartup.cpp (like other handlers). That handler is implemented in LLWorld::processRegionHandshake() and calls LLViewerRegion::sendHandshakeReply() for the corresponding region, the latter itself sending the ”RegionHandshakeReply” message back to the simulator server, after it got the objects cache loaded for the region.