JoyofRLC Acker

Resident
  • Posts

    176

Posts posted by JoyofRLC Acker

  1. 40 minutes ago, KT Kingsley said:

    An RC (Release Candidate) channel is a set of regions where the latest server versions are tested in-world before being promoted to the main channel. I don't know why they're rolled out on different days. Perhaps to ease the strain on both the hardware and the wetware.

    Wetware = not a bot?

  2. Is it my imagination or have the restarts been starting later the last few Tuesdays (and certainly finishing later)?

    While restarts have obvious benefits, the process is quite disruptive for any activity that traverses multiple sims on set routes (as opposed to just lurking in one sim contemplating one's navel or, better, someone else's *cough*): sailing, flying, driving, etc.

    We have had a long-standing event in Blake Sea at 11 am on Tuesdays (for over a decade). On the normal schedule, restarts are finished by then and all is good, but when they are late, it's almost impossible. Effectively, Blake Sea is 'down' for several hours.

  3. This is really interesting - just discovered the thread.   Do keep up the pressure on LL. 

    I'm not technical so can't contribute to the analysis. But I would like to add my observations about SL over the last two or three weeks, from the perspective of sailing (racing), primarily in Blake Sea. So we are talking about vehicles with quite "large" and slightly complex scripts.

    First, the TP / sim border issue: that definitely got better, but it seems to have deteriorated again somewhat, though not to the depths of the worst days a few months ago. I can't speak for TP much as I don't use it a lot, other than under script control, but the rate of 'falling off' the boat at sim borders is definitely up again. Anyone know why that might be?

    Second, we have been seeing increased 'lag' generally, as evidenced by sluggish response to controls. I also have a sense that the sim border processing delay has increased, but I don't know how to measure that objectively (I just see it because my viewer is set to 'stop' rather than 'predict' for sim borders).

  4. On 6/18/2019 at 12:21 PM, Profaitchikenz Haiku said:

    I'm not going to be able to make today's Server User Group meeting, I have a favour to ask: could somebody notecard me the salient points regarding script performance? I have a couple of Flamian Pobblebeads I can re-imburse them with to cover expenses.

     

    I was also going to ask a question about the priorities given to various duties the server performs following a recent observation, if anybody else here has seen the same and cares to raise the matter:

    During a recent chat with an avatar who had a 332,000 complexity value, bumping up the slider to see them resulted in awful chat lag, and the observation that each individual character typed into the local chat bar gets sent from the client to the server and returned before it becomes visible in the field, which made trying to correct a typo a few letters back from the insertion point next to impossible.

    Possibly relevant to the scripts issue, this problem was noticed on the parcel that currently has <50% scripts run time, and made me think that some of the dialog delays being experienced there might be due as much to character transmission delays as to script delays.  Is it possible to alter the priorities given to server network traffic to balance between seeing the pixels and seeing the characters other than by dropping the avatar complexity setting?

     

    I'm no expert, but I have been around a while. I have always thought that there was something pretty wonky about chat text handling. One of my first realisations that this place had a pretty broken* architecture was being told "hands off the keyboard" when sailing across a sim border (this was almost a decade ago).

    * broken isn't quite the right word, as it implies a prior state of un-brokenness. cough.

  5. 4 hours ago, animats said:

    We're going to find out on Saturday how well things are working. Drivers of SL is doing a drive/fly/boat run from the south end of Jeoghot to the north end of Sansara.

    That reminds me .... do we have any data on the extent to which the number of vehicles attempting a particular sim border crossing at the same time has an impact?   Either for disconnect or for agent detach problems?

  6. Theresa, I sort of follow where you are coming from, except that ... I think a lot of people have given up complaining as they consider it normal service. I know that we certainly don't file tickets every time someone falls off a boat. If we did, they'd have to buy new servers to manage the JIRA queues! We tend to file tickets or a JIRA bug for sims that are obviously broken, or maybe if we think things are especially bad, e.g. a couple of months ago.

    There's no doubt that individual set up (location, ISP, W/LAN, computer, SL settings) all play a part.  

    I'm certainly of the view that things are better (as of Friday, I haven't been in-world over the weekend) but certainly not fixed, by which I mean back to how things were two months ago. Specifically, prior to that, getting logged out was, for me, a rare event, and falling off the boat happened maybe once every third or fourth race. Then during the bad spell it was a logout about every 10 - 30 sims of sailing, or even while just standing on my dock editing scripts. Now it's a logout in about 50% of races, and falling off at least once per race, so not as good as it used to be.

    No changes in my ISP (broadband, 11 ms ping and 50+ Mbps local; variable to Phoenix of course) or network. A few weeks before the trouble started I upgraded my PC (Dell 8930 w 16 GB, NV GTX 1060). So any adverse changes from, say, 3 months ago (ignoring the peak disconnect period) are down to SL. Sorry for so many words, but that is what I mean when I say things aren't fixed (but are better than during the recent peak of the problem).

    • Like 1
  7. I think a definition of "fix" also has to include clear description, with evidence or code analysis, of the root cause and the changes made to resolve it.

    I'm reminded of the first lectures I attended on structured programming (ok, I'm dating myself) and the comment that most "testing" can only prove that a program is incorrect, and can never prove that it is correct.

    They have built something beyond their comprehension.

    My suspicion is that they have tinkered with the 'freeze' at sim borders. For me it's definitely a few seconds longer, but it's also now a genuine freeze with motion actually stopped, i.e. the boat's position at the end of the freeze is right where it was on the border, not skipped ahead into the next sim.

    Friday & Monday I had more or less clear sailing. Wednesday & Thursday I was getting logged out again. In fairness, one session was where I was inadvertently racing with a DD of about 850m (I did wonder why the fan was on but didn't make the correct deduction, alas).

    So, I'd say better thanks to the band-aid, whatever it is exactly, but definitely NOT fixed.

  8. Yesterday and Friday I'd say that sailing in Blake was a lot better, if not perfect. However, there is one thing that I think has changed on sim crossings. (I have the viewer prediction / extrapolation turned off.) Previously, if you froze on the border for more than, say, a couple of seconds, your vehicle would jump forward once visible motion resumed. In other words, you would be roughly where you would have been without any extended freeze. Now the freeze is consistently longer - 6 - 10 seconds I'd say (vs 2 - 8) - and the vehicle does NOT jump forward to where it should have been. In other words, the vehicle "really" stops.

    Is anyone else observing this sort of thing?

  9. Ok, now I'm confused. Could someone please classify what the various problems discussed here are? Also, please confirm or correct my understanding:

    - vehicle sim crossings have always had a number of problems, and the frequency has varied over time. These include (but there are probably more I'm overlooking):
          submarining / flying
          falling off, agent and vehicle appear to be in different sims (not always adjacent) - this sometimes follows apparently normal travel through 1 or 2 sims but the vehicle is non-responsive
          cam angle / focus goes wonky
          getting logged out (rare until the last few weeks)

    - TP can 'stall' for a number of reasons; and that may follow with a disconnect - this was not too common until the last few weeks

    My understanding is that a disconnect during a vehicle crossing is thought to be strongly related to the TP disconnect problem. For me the vehicle crossing problem was far more critical, as I wasn't having any unusual problems with TP.

    I have also had it said to me, and not just recently, that the other region crossing problems are caused by 'bad server hand offs', which sounds an awful lot like what is being said specifically about the TP problem.

    LL has done a server release that will "REDUCE incidence of TP disconnects" (my emphasis).   They do not say what they have done, nor that the fix will ELIMINATE the disconnects.   They don't even say how much of a reduction there should be.

    The big concern obviously is that if they haven't gotten to the root cause, the band-aid will fall off at some point.

    Hopefully the next Server User Group will shed more light on the situation.   Personally I'm still nervous.

     

     

     

     

  10. 2 hours ago, animats said:

    Region crossings are not fixed. I tried going from Bellasaria to Jeoghot in a speedboat, which I've been able to do before. I got as far as crossing from Nudibranch to Pearl Drop, just west of Jeoghot. Those are all new sims of open water. Boat disappeared. Stuck at bottom of ocean. Controls stuck. Avatar health "Stop avatar animations and revoke permissions" didn't get control back. A short teleport on the world map did.

    I have a swim HUD, so I tried swimming to a nearby island. A Coast Guard boat and helicopter went by, and I shouted to them, but they didn't reply. I tried swimming after the boat, and at every region crossing, there was a 10 second pause (exactly 10 seconds) and then I was forced to the bottom of the sea. The swim HUD would bring me back up. Kept swimming after the Coast Guard boat, but never caught up. I'd see the Coast Guard boat in the distance once in a while.  They didn't seem to make it all the way to Bellasaria. They disappeared south of the new continent, and later showed up at the new airstrip in Coral Waters.

    Swam all the way back to the southern lighthouse of Bellasaria. That took half an hour. Rezzed a boat at the new rez zone. Once in the boat, everything worked fine, and I boated back to my house and docked the boat.

    We're still at about 1-2 failures per hour. Worse for multiple passengers.

     

    One or two per hour is a big improvement on last week. Many of the people I race with (and I myself) would get a disconnect after 15 - 30 sims.

    Another data point ... we had two races today around noon SLT in Blake (and China thru Baltic). 4 boats completed both courses. This is a first in many weeks.

    Regarding the 10 second pause, this is interesting. Today I was racing a boat that is not the best at sim crossings generally (it has other virtues and I was in the mood for a stress test).
    It always pauses for a variable 4 - 8 "seconds" (using 'one Mississippi' type counting on a good sample of crossings) on sim borders - today the delays were less variable. And here's the odd thing: today I think the boat did actually stop - there was no 'leap ahead' when the freeze ended (I have the viewer extrapolation turned off).

    My sense (anecdotal, not hard data) is that something is making things better, maybe by masking the problem, but that it's not totally fixed.

    One other thing that I have noticed all along, and it's probably just a symptom: when the avi and vehicle get separated (part of the warm-up for a disconnect), no changed event for the detach is triggered in the scripts.

     

  11. 11 hours ago, Ardy Lay said:

    I can teleport around, without failures, fast enough to "hit the fence."  I teleported 28 times in 11 or 12 minutes, at somewhat erratic intervals due to numerous stale landmarks.  I might assume "the teleport issue" has been rectified.

     

    As of yesterday, in Blake Sea (all Main Channel) there were still significant issues.  Not clear if the situation was any better or not, but clearly not solved.

  12. 19 hours ago, MBeatrix said:

    It would be possible to test it more or less properly if the new code had been rolled to RC Magnum, at least for me, as there are quite some contiguous Magnum regions I cross daily.

    Indeed.  My point really was that LL uses what I call "testing in production", as opposed to a separate test system/environment.

    One of the many things about this situation that appal me is that LL was apparently unaware of the problem until some JIRAs were posted - and job 1 had to be to update the system to actually get log data on the problem.

  13. RC channels ... to the extent they are seamlessly attached to the main grid and are scattered, as it were, they are 'production' sims for all practical purposes. In most places you can't sail on "just" RC channel or "just" main channel (Blake Sea aside). Code released to production should be tested.

    • Like 1
  14. 1 hour ago, Qie Niangao said:

    This is news to me. I've had TP disconnects either way, but not enough volume to say one way or another. I'm assuming that within-sim teleports are excluded in both cases; of course within-sim TPs are very often scripted, but they're probably not part of the distinction being made here. I don't think I've ever heard of anybody getting disconnected during a within-sim teleport but come to think of it, that does kinda reinforce the relationship between the current sim-crossing and teleport problems.

    I didn't mean to make any point at all about "reports"... I guess I shouldn't have used that word because I really meant all the data about the problem, whether reported by users, sim-monitoring code, statistics from normal operations, or anything else. I agree they almost certainly installed some code for collecting extra data on sim-crossings as well as teleports, but the teleport problems are more "all-or-nothing" in the sense that we (or I, at least) never got them and now we do, whereas I've gotten sim-crossing disconnections (yes, I mean disconnections) for as long as I've been in SL (although, to be fair, I have a crappy Canadian ISP, so others may not have as much trouble).

    As I was typing, I was reminded of another challenge in diagnosing race conditions: a few lines of debug code can completely change the triggering circumstances, sometimes completely masking the whole event, or shifting it unrecognizably. They really are the worst bugs.

    Sorry I misinterpreted - I thought you were making a distinction but you weren't really. For the sake of clarity, my scripted TPs are just about always into different sims (and as I say, they are nearly always successful even now). The connection (haha) between region crossings and TP is that, according to what I was told, a region crossing in fact involves a TP: the agent and the vehicle cross separately and then reattach. But I have no idea of the credibility of my sources.

    Until now, region crossing disconnects were VERY rare for me, unlike falling off and cam dislocations, which were more common. I wonder if somehow the latter problems have turned into the former, i.e. the underlying cause was always there, but with some new code somewhere it's being handled differently by the servers / viewers, and so we see a different symptom for what is in fact the same cause.

    Finally - Heisenberg is alive and well!

    • Like 1
  15. Surely they have data on sim crossing failures as well as the TP failures - it's the same underlying issue, I thought. They don't have to rely on "reports" (but see below) for sim crossings; they would have the same server log data. I do not see why you say (well, imply) that a disconnect on a sim crossing is not a solid all-or-nothing event. It is no less catastrophic than a disconnect on a TP. To be clear, I am talking specifically about the horrendous number of DISCONNECTS on sim crossings that started about a month ago - not the other sim crossing problems that, as you say, have been around in varying degrees for a long time (but are also much increased currently) - like detachment, cam going kaflooey (sorry for the technical term) etc. - and which may or may not be related.

    Regarding your reference to "reports" - I infer that you are implying they are less useful than the hard data. I think both may be helpful: the reports may contain hints as to what else is going on, and may yield hints on where to look.

    An example of the contextual stuff is the apparent fact that TPs triggered by scripts are less likely to fail than ones initiated by user action. (Someone else put me on to this, which explains something that had puzzled me: for myself, I've been having very few TP problems at all the last month, but 90% of my TPs are initiated by my Nav HUD, whereas I have been getting very frequent disconnects when sailing, and even just standing on my dock chatting or editing scripts.)

  16. Can people please confirm that the viewer is not a factor? Almost everything I've read says this is strictly server side, and the LL notice on this says all viewers are affected.

    However, a sailing friend seems pretty convinced that the SL viewer is more robust in this regard (sailing / sim crossing specifically) than Firestorm. Other than possibly different thresholds for time-out disconnects, I can't see the viewer making a difference if it IS strictly server side.

    I'm probably being naive, but I find it a bit worrying that it is taking the 'crack team' so long to diagnose the problem. (Yes, the fix can be lengthy depending on the problem, but I'm surprised that diagnosis is proving so difficult.) It suggests to me that there may not be a single cause (other than the basic architectural design and the inherent byzantine nature of region crossings).

    It does also raise the question of how they test system releases.

     

  17. Any info on what they released today (Mon 15th)? It seemed like it was all channels, but the notice was not specific, and said nothing on the purpose.

    There was not any obvious dramatic improvement at our race today.

     

    PS: Do we need two categories, "Regularly Scheduled Maintenance" and "Unscheduled Scheduled Maintenance" or "Suddenly Scheduled Maintenance"? Sort of like "unknown unknowns", which is what the TP problem seems to be lol.

  18. On 4/5/2019 at 3:27 PM, Theresa Tennyson said:

    Homeowner (to Contractor): My basement keeps leaking - you come out to fix it and it starts leaking again a few months later!

    Contractor: Your foundation walls are shot - you should really consider getting a new house. They put these houses up really quickly back then and they all have issues.

    Homeowner: But I love my house!

    Contractor: Well, we might be able to jack up the entire house and build new foundation walls. It'd be really expensive and you'd have to move out for months.

    Homeowner: But I have to live in this house, I don't have anywhere else to go!

    Contractor: Well.... we might.... be able to replace the walls piece by piece, but it would still be expensive and there's no guarantee it wouldn't start leaking through the joints we'd need to put between the sections of wall.

    Homeowner: Well, how much would that cost?

    Contractor: (Writes up a quote.)

    Homeowner: That long? That much? I can't afford that!

    Contractor: Well, in that case all I can do is try to fix the leaks again.

    Homeowner (a few months later): My basement keeps leaking - you come out to fix it and it starts leaking again a few months later!

    __________

    The very idea of region simulators/region crossings as used in Second Life is fundamentally flawed, but it's what Second Life is built on. Ultimately, anything you do with Second Life will either be "fixing the leaks" or making such radical changes that it will become unrecognizable and risk destroying 16 years of history.

    It's not fundamentally flawed if you are building a shopping mall / nightclub, which is what I suspect the original idea was. This architecture is bonkers for anything like flying or sailing etc. 256m "regions" - come on!

    • Like 1
    • Haha 1
  19. On 4/12/2019 at 12:01 PM, Whirly Fizzle said:

    Not really. Most of the scary sounding warnings & errors in logs are totally normal though.

    Which does rather beg the question! (Think about it - errors are normal....)

    I do wonder about the resources involved in all those errors. Not so much the writing of the error log (but that builds up - my session logs get to about 3 MB, which is a LOT of text) ... as the processing involved in detecting the error and running whatever exception code etc.

  20. Speaking of Firestorm logs ...

    Yes, I see a TON of warning messages (that look more like errors to me!). The volume of them is absolutely staggering. An hour of sailing generates a log file over 1.5 MB (last night, 3.7 MB), which is a LOT of text msgs. An awful lot seem to relate to http issues of one sort or another. Packets out of sync, 404 errors, bad asset type, yadda yadda yadda.
    cef_log also has stuff like
    [0412/013705.830:ERROR:cache_util.cc(134)] Unable to move cache folder C:\Users\...\AppData\Local\Firestorm_x64\cef_cache\Cache to C:\Users\...\AppData\Local\Firestorm_x64\cef_cache\old_Cache_000
    [0412/013705.830:ERROR:disk_cache.cc(169)] Unable to create cache

    Is there any sort of info anywhere on which messages actually matter?
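
One way to get a handle on that volume is to count messages per level and source file rather than reading them one by one. A minimal Python sketch, assuming every line of interest follows the bracketed `[date/time:LEVEL:file(line)] message` layout of the two cef_log lines quoted above; the `tally` function and the sample list are made up for illustration, and the main Firestorm log may well use a different layout that would need its own pattern:

```python
import re
from collections import Counter

# Matches lines shaped like the cef_log excerpt quoted in the post:
#   [0412/013705.830:ERROR:disk_cache.cc(169)] Unable to create cache
# Assumption: other log files may use a different layout.
LINE_RE = re.compile(
    r"^\[(?P<date>\d{4})/(?P<time>\d{6}\.\d{3}):(?P<level>[A-Z]+):"
    r"(?P<source>[^()\]]+)\((?P<line>\d+)\)\]\s*(?P<message>.*)$"
)


def tally(lines):
    """Count log lines per (level, source file); skip lines that do not match."""
    counts = Counter()
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match:
            counts[(match.group("level"), match.group("source"))] += 1
    return counts


# Sample input: the two lines from the post plus one non-matching line.
sample = [
    r"[0412/013705.830:ERROR:cache_util.cc(134)] Unable to move cache folder "
    r"C:\Users\...\AppData\Local\Firestorm_x64\cef_cache\Cache to "
    r"C:\Users\...\AppData\Local\Firestorm_x64\cef_cache\old_Cache_000",
    "[0412/013705.830:ERROR:disk_cache.cc(169)] Unable to create cache",
    "a line in some other format",
]

if __name__ == "__main__":
    # Most frequent (level, source) pairs first.
    for (level, source), n in tally(sample).most_common():
        print(f"{n:5d}  {level:8s} {source}")
```

A tally like this only shows which messages dominate the file; it says nothing about which of them actually matter, which is still the open question.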

  21. More anecdotal stuff. This relates to sailing (sim borders have the same issues as TPs).

    My sense (based on my own experience) is that it's worse for race fleets than solo sailing. That is not inconsistent with a latency issue.

    Also that it's much worse in the central part of Blake than either the 'Aegean Channel' (China * heading east) or round Nautilus. Would be good to see the stats on that.

    Also, while this is strictly speaking a different issue, I've noticed an increased number of "falling off" problems (avi "separates" from boat at sim crossing). This is an old problem, but until the last couple of weeks I wasn't much affected by it. Interestingly, the detachment does NOT trigger a changed event in the boat.

  22. Thanks for the clarification, I had read somewhere that the move to cloud servers was under way.

    I find the log files almost impossible to read, but the blizzard of timeout & not found warnings can't be good. It's at the point where just writing all the error messages must take a fair bit of resource!!

    • Haha 1
  23. On 4/6/2019 at 5:16 AM, MBeatrix said:

    You have a good point there. Maybe.   ........

    ....

    [EDIT] Progressively moving SL to the cloud can be the opportunity to finally replace obsolete stuff by what may work better than more or less OK half of the time. See? I'm hopeful. Hopeful but not really believing it. 

    I would not bet against the move to cloud servers being part of the problem. It's hard to see how it could improve latency in server handoffs. (Sorry, I can't find the irony font.) iirc at one point they got some improvement in Blake Sea by co-locating the Blake servers. I guess that's out the window now.

    • Like 1
    • Haha 1
  24. On 4/4/2019 at 10:51 PM, animats said:

    "Don't half-heartedly wound problems - kill them dead." - Kelly Johnson, head of the Lockheed Skunk Works, designer of the U-2 and SR-71 spyplanes. What he meant by that is, if it's not working, don't go for a fix that makes it fail less often. Redesign it so it doesn't fail.

    The Linden Lab approach to region crossing problems has been to make some minor change to timing or message order that makes the problem behave slightly differently, then ignore it until users complain again. This approach has repeatedly failed. For years. This is a track record of sustained failure.

    Linden Lab needs to build enough tools and develop enough understanding that all region crossing problems are caught, logged, analyzed, and a permanent fix that always works developed. Nothing short of a sim crash or total loss of communications with the viewer should cause a sim crossing failure.

    The ongoing failure of Linden Lab to fix this problem properly devalues the whole big-world concept of Second Life. We cannot have routine group activities that span regions. Thousands of customers have boats, aircraft, and road vehicles which can seat multiple avatars. Yet we cannot routinely get a group of people together and take a trip without the risk of something failing badly and breaking immersion.  So most of those big yachts stay docked. Group tours are rare. Merely driving around is difficult. Users retreat to private islands or their own little spaces to avoid the risks of venturing out in the big world.

    We, as users, need to keep the pressure on Linden Lab and its management to fix the problem. This needs to be brought up by many people whenever Linden Lab CEO, Ebbe Altberg, speaks in public. Fixing this problem properly may require substantial resources. That's why it needs to be addressed at the CEO level.

    Here is Mr. Altberg's Twitter feed.

    Remember that what they built was basically a shopping mall. 256 m x 256 m sims and badly kludged logic for moving vehicles across were clearly never meant to be a full world simulation. It's an architecture issue as far as I can see.

    • Like 1
    • Haha 1