
requesting region capabilities


jIrisluna


Hello.
Since December 15 I have had a log-in problem (after the maintenance).
Log-in gets stuck at "requesting region capabilities 2".
I tried different sims and it is always the same.
Even if I manage to log in, I can't teleport, or only once.
Sometimes when I wear hair or clothes they are invisible. Only after a restart can I see them. Also, sometimes I can't take anything off my avi, even after a restart.

Never had this problem before.
I tried reinstalling the game, clearing the cache, deleting the antivirus, etc.
My net connection is fine.
Does anyone have the same issue? I don't know what to do. Please help TT

I'm sorry, my English is very bad.

P.S.: currently my avi has become a red cloud and my friends can't see me. I can't even change to a default/beginner SL avi.
 

Edited by jIrisluna


6 hours ago, AvaTrinity said:

Actually - I am having the same issue!  I can log into another AVI, just not this one.  It gets hung up on that screen!  Looking for help!

Have both of you contacted support?

https://support.secondlife.com/


I got the same issue in the past, just after the AWS "uplift"... It "resolved by itself" (read: AWS fixed their network/routing) after a few days...

And for the past three weeks, I have been seeing in the viewer logs another thing I saw back then as well, namely "event polls" failure warnings for some (not all) neighbouring regions...

It does not bode well...


  • Lindens
On 12/18/2021 at 5:14 PM, Henri Beauchamp said:

And for the past three weeks, I have been seeing in the viewer logs another thing I saw back then as well, namely "event polls" failure warnings for some (not all) neighbouring regions...

That is not necessarily bad.  These are long-poll-style requests on a 30-second timeout.  Several entities are involved in the timeout so the winner can vary.  You might get a 499 or a 503 or maybe something else.  If you see a 30-second periodicity to a target, that is usually a quiet connection working as expected.


12 hours ago, Monty Linden said:

That is not necessarily bad.  These are long-poll-style requests on a 30-second timeout.  Several entities are involved in the timeout so the winner can vary.  You might get a 499 or a 503 or maybe something else.  If you see a 30-second periodicity to a target, that is usually a quiet connection working as expected.

Not quite... I sent a notecard to you (and Rider) in SL as a follow-up to my initial report, but apparently you did not get it or read it. So, here are its contents:

Quote

Greetings,

This is a follow-up to my report about event polls issues in neighbouring sims.
I found the cause of those errors I am seeing.

Event polls time out when no event occurs and the HTTP request initiated by the viewer therefore gets no reply in time. Before the viewer used coroutines, the old HTTP request code let the server time out and simply relaunched a request after receiving the 502 HTTP error from the server.

After the migration to the coroutine HTTP stack, the timeout has been occurring viewer-side (because the default timeout parameter used for libcurl was simply changed to a shorter one in this new code), and so the event poll code no longer looked for a server timeout before retrying, but for a libcurl timeout error.

When I backported the code to my viewer, in addition to the libcurl timeout check, I kept the old (longer) timeout and the check for 502 errors, so as to stay compatible with OpenSim grids, and with the rationale that a longer timeout means fewer requests sent to the server, fewer ports opened and closed on it per unit of time, and thus a lighter load. It also covered the corner case where an event would occur just half a second before the viewer-side timeout kicked in and would therefore be missed by the viewer before it had a chance to send the next request...

This worked just fine until July 2019, after a rolling restart (I can be accurate about this date, because I added a comment in my viewer code), at which point SL servers started to send 502 HTML error pages "in disguise", i.e. with a 499 HTTP error code in the header. I adapted my code to take this change into account (simply considering that both error codes 502 and 499 meant a server-side timeout). And guess what: it happened again. For the past few weeks, some sim servers in SL have been sending 502 HTML error pages "in disguise", but this time with a 500 HTTP error code in the header!

Here is a capture I made today with event poll and core HTTP debug messages enabled:
-------------------
2021-12-20 19:44:40Z DEBUG: LLCoreHttpUtil::HttpCoroHandler::onCompleted: Error Http_499 - Cannot access url: https://simhost-0c6ac319f200fbb3e.agni.secondlife.io:12043/cap/a944d5ab-0f21-dfdd-ec25-7c71d729a23a - Reason: Malformed response contents
2021-12-20 19:44:40Z DEBUG: LLCoreHttpUtil::HttpCoroHandler::onCompleted: Returned body:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid?
response from an upstream server.<br />?
The proxy server could not handle the request <em><a href="http://localhost:13018/agent/b43c4b76-3816-49ce-933d-e1a4eef3226e/event-get">POST&nbsp;http://localhost:13018/agent/b43c4b76-3816-49ce-933d-e1a4eef3226e/event-get</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
2021-12-20 19:44:40Z DEBUG: LLEventPollImpl::eventPollCoro: Event poll <7> - Region: Audeburgh - Error Http_499:  - Ignored and treated as a timeout.
-------------------

OK, so just the usual "502 in 499 disguise" here... And now:

-------------------
2021-12-20 19:45:03Z DEBUG: LLCoreHttpUtil::HttpCoroHandler::onCompleted: Error Http_500 - Cannot access url: https://simhost-0465227a3def3041c.agni.secondlife.io:12043/cap/060bf067-02ee-c847-ab02-5d3865112e3d - Reason: Internal Server Error
2021-12-20 19:45:03Z DEBUG: LLCoreHttpUtil::HttpCoroHandler::onCompleted: Returned body:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid?
response from an upstream server.<br />?
The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
2021-12-20 19:45:03Z DEBUG: LLEventPollImpl::eventPollCoro: Event poll <2> - Region: Destonia - Error Http_500:
2021-12-20 19:45:03Z WARNING: LLEventPollImpl::eventPollCoro: Event poll <2> - Region: Destonia - Retrying in 25 seconds; error count is now 3
--------------------

A "502 in 500 disguise"!

In conclusion, SL servers lie about the error they report... :-P
It would be nice if a 502 error could actually be reported with a 502 error code in the HTTP header.

To work around this issue, I simply made the timeout the same as the one LL's official viewer is using (i.e. the default timeout of the coroutine-based HTTP stack) when logged in to SL; this way, the timeout happens at the libcurl level and the server's bogus reply is never encountered (well, at least as long as the server timeout is not made shorter... You never know what the future has in store for you...).

This workaround will be part of my next viewer release; not sure if other OpenSim-compatible viewers are also using a longer timeout or not... But now you know what to reply if their maintainers complain about the 500 errors...

Case closed for me (but you might want to clean up the HTTP error mess...).

Regards,

Henri.
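
For anyone who wants to spot the same mismatch in their own captures, here is a minimal Python sketch (not code from any viewer) that compares the status code in the HTTP header with the one advertised in the HTML title of the returned body. The function names and the regular expression are made up for illustration; only the "502 Proxy Error" title and the 499/500 header codes come from the logs above.

```python
import re
from typing import Optional

# Matches the leading three-digit code in an HTML title,
# e.g. "<title>502 Proxy Error</title>".
_TITLE_STATUS = re.compile(r"<title>\s*(\d{3})\b", re.IGNORECASE)


def status_in_body(body: str) -> Optional[int]:
    """Return the status code advertised in the HTML title, or None."""
    match = _TITLE_STATUS.search(body)
    return int(match.group(1)) if match else None


def is_disguised_proxy_timeout(header_status: int, body: str) -> bool:
    """True when the body is a 502 proxy error page but the header differs."""
    return status_in_body(body) == 502 and header_status != 502


# Shortened stand-in for the error page seen in the captures.
sample_body = (
    "<html><head>\n<title>502 Proxy Error</title>\n</head>"
    "<body><h1>Proxy Error</h1></body></html>"
)

for header_status in (499, 500, 502):
    print(header_status, is_disguised_proxy_timeout(header_status, sample_body))
```

With the two captures above, a header status of 499 or 500 would be flagged, while a genuine 502 header would not.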

 

Edited by Henri Beauchamp

  • Lindens
5 hours ago, Henri Beauchamp said:

Not quite... I sent a notecard to you (and Rider) in SL as a follow-up to my initial report, but apparently you did not get it or read it. So, here are its contents:

 

The former.  Not seeing any notecard.

5 hours ago, Henri Beauchamp said:

2021-12-20 19:45:03Z DEBUG: LLCoreHttpUtil::HttpCoroHandler::onCompleted: Error Http_500 - Cannot access url: https://simhost-0465227a3def3041c.agni.secondlife.io:12043/cap/060bf067-02ee-c847-ab02-5d3865112e3d - Reason: Internal Server Error
2021-12-20 19:45:03Z DEBUG: LLCoreHttpUtil::HttpCoroHandler::onCompleted: Returned body:

I checked this specific request and it left the 12043 service with a 499 status after 30.0s (started at 19:44:33z).  There's very little opportunity to change this status on our end although I wouldn't discount it completely.  There is a chunk missing in that second error response.  That is very interesting.

6 hours ago, Henri Beauchamp said:

In conclusion, SL servers lie about the error they report... 😛
It would be nice if a 502 error could actually be reported with a 502 error code in the HTTP header.

This may be overstating the API contract a bit.  🙂  The awful 499 often means timeout but it is also used for other 5xx-like problems.  For normal requests, treat the entire class (499+5xx) as a transitory failure and retry.  For long-poll, if a slow status return, treat request as complete (timed-out/no data), and start next cycle when ready.  Back off judiciously, etc.  (Does the coroutined SL viewer do this properly?   Well, perhaps not.)  We're now cloud-hosted.  And we're using additional pass-thru services.  Even https: traffic may be decrypted and re-launched allowing re-write.  It was always naive of Linden to assert sole purpose to status codes and now that will become apparent.
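
As a rough illustration of the handling described above (a sketch only, not the viewer's actual code), the policy could look like this in Python. The `Action` names and the `SLOW_RETURN_SECONDS` threshold are assumptions made for the example; the post itself only says "a slow status return" against a 30-second poll timeout.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process returned events"
    REPOLL = "treat as timed-out/no data and start the next cycle"
    RETRY_WITH_BACKOFF = "transitory failure: back off, then retry"
    FAIL = "not a transitory failure"


# Assumption: a status that arrives close to the 30-second poll timeout
# counts as a "slow status return". The exact threshold is not given.
SLOW_RETURN_SECONDS = 25.0


def is_transitory(status: int) -> bool:
    """499 plus the whole 5xx class."""
    return status == 499 or 500 <= status <= 599


def classify(status: int, elapsed: float, long_poll: bool) -> Action:
    """Map a response status and elapsed time to the next step."""
    if 200 <= status <= 299:
        return Action.PROCESS
    if is_transitory(status):
        if long_poll and elapsed >= SLOW_RETURN_SECONDS:
            return Action.REPOLL
        return Action.RETRY_WITH_BACKOFF
    return Action.FAIL


print(classify(500, 30.0, long_poll=True).name)
print(classify(500, 0.2, long_poll=True).name)
print(classify(499, 30.0, long_poll=False).name)
```

Under these assumptions, the 500 that arrived after 30.0 seconds in the capture above would be treated as a quiet poll rather than counted as an error.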

Timeouts.  Philosophies differ here.  I like accumulating/increasing values with distance from the final source(s) of data.  I.e. the old scheme where viewer had a longer timeout than the next layer in.  It's essential for protocols with unacknowledged entity transfer, of which we have a few.  Racing timeouts have the merits of exposing client code to more failure modes and pushing for resiliency.  I can't say whether these were even factors in the change, however.  But resiliency always because it may happen again.

This is also an opportunity to encourage testing on the RC channels.  There may be API dragons.

 


28 minutes ago, Monty Linden said:

The former.  Not seeing any notecard.

Are Lindens note-card-capped? That's ironic! 🤣

 

Quote

I checked this specific request and it left the 12043 service with a 499 status after 30.0s (started at 19:44:33z).  There's very little opportunity to change this status on our end although I wouldn't discount it completely.  There is a chunk missing in that second error response.  That is very interesting.

Maybe not "on your end", but on AWS' end?... One thing is certain, the 500 error in the HTTP header did not appear by magic in libcurl's code!

Also, I am not shocked that the server-side timeout error code is 502, 499 or 500 (whatever floats your boat), even if I would prefer to see the same error as in all OpenSim grids (which all report a 502 error); no, the surprising thing is that whatever error code I get from SL's sim servers (i.e. currently 499 "as usual" and "the new" 500), the HTML body for that error message is always titled "<title>502 Proxy Error</title>"... I would understand getting a 499 with "<title>499 Internal Error</title>" or a 500 with "<title>500 Internal Server Error</title>", but what is occurring currently is illogical... Could it be that AWS' infrastructure changes the 502 HTTP error code in the header to 499 or 500?...

 

Quote

Timeouts.  Philosophies differ here.  I like accumulating/increasing values with distance from the final source(s) of data.  I.e. the old scheme where viewer had a longer timeout than the next layer in.  It's essential for protocols with unacknowledged entity transfer, of which we have a few.  Racing timeouts have the merits of exposing client code to more failure modes and pushing for resiliency.  I can't say whether these were even factors in the change, however.  But resiliency always because it may happen again.

My viewer does increase the time between requests on errors (like the old LL code used to do), but these timeouts are not to be considered as errors and are just a "nothing to report" indication from the server (no event occurred), which prompts an immediate launch of a new (fresh) request by the viewer, should it still be interested in that sim's events (i.e. when the agent is still present in a neighbouring sim).

The problem with the new 500 error was that it was treated as a genuine error report from the server, and not just as a timeout. After 10 such errors (and an increasing amount of time between each retry), the viewer gave up, considering the server was somehow failing, and no event could be processed any more for that sim as a result.
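
To make the failure mode concrete, here is a small Python sketch of that kind of bookkeeping (illustrative only, not the viewer's code). The limit of 10 errors comes from the paragraph above; `BASE_DELAY` and `DELAY_STEP` are assumptions, picked only so that the third error gives the "Retrying in 25 seconds" seen in the capture earlier in the thread, and the reset on success is likewise assumed.

```python
from typing import Optional

MAX_ERRORS = 10   # from the post: the viewer gives up after 10 such errors
BASE_DELAY = 10   # assumption (seconds); the real formula is not given here
DELAY_STEP = 5    # assumption (seconds added per counted error)


class PollErrorTracker:
    """Counts genuine errors; timeouts are 'nothing to report' and not counted."""

    def __init__(self) -> None:
        self.error_count = 0

    def on_timeout(self) -> int:
        # A timeout only means no event occurred: re-poll immediately.
        return 0

    def on_success(self) -> int:
        # Assumption: a successful poll clears the error count.
        self.error_count = 0
        return 0

    def on_error(self) -> Optional[int]:
        """Return seconds to wait before retrying, or None when giving up."""
        self.error_count += 1
        if self.error_count >= MAX_ERRORS:
            return None
        return BASE_DELAY + DELAY_STEP * self.error_count


# If every server-side timeout is misread as an error, the poll dies:
tracker = PollErrorTracker()
delays = [tracker.on_error() for _ in range(MAX_ERRORS)]
print(delays)
```

The same ten responses routed through `on_timeout()` instead would leave the error count at zero, which is the behaviour described above for a quiet neighbouring sim.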

Not a problem any more for me and soon not one either for my viewer's users (next release with the workaround due next Saturday); at least now, if my viewer "breaks" due to another server-side change (such as a reduced timeout on the server, which would give up before libcurl does viewer-side), then LL's viewer will break in the exact same way! 😛

Edited by Henri Beauchamp

  • Lindens
12 minutes ago, Henri Beauchamp said:

Maybe not "on your end", but on AWS' end?... One thing is certain, the 500 error in the HTTP header did not appear by magic in libcurl's code!

This is an https: cap so the response isn't available to anyone until it hits the encryption endpoint.  No MITM opportunity on our side, at present.  No chance for an AWS re-write.  Yeah, this is one case where a 'normal error' status is desired.  What would have been even better would have been for the long-poll to return a bloody 200 with no data:  that succeeded, there was nothing to return.  A missed opportunity.

There's a lot that's suspicious in that response.  If you get more information on such occurrences, maybe send it along via email (monty@) since notecards aren't reliable either, apparently.  :(

 


56 minutes ago, Monty Linden said:

What would have been even better would have been for the long-poll to return a bloody 200 with no data:  that succeeded, there was nothing to return.  A missed opportunity.

Indeed!... All the other caps work that way, and do not report errors when a result is simply empty... It is not too late, since it would be possible to use a new capability to do just that (with fallback code to the old cap when the new one is not available: there are many such occurrences in the viewer code already).
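
The fallback idea is simple enough to sketch in a few lines of Python (illustration only). Both capability names below are placeholders: `EventQueueGet2` is entirely hypothetical, and `EventQueueGet` is assumed here as the name of the existing cap; neither is named in this thread. The URLs are dummies.

```python
from typing import Mapping, Optional, Tuple

NEW_CAP = "EventQueueGet2"  # hypothetical name for a cap returning 200 + no data
OLD_CAP = "EventQueueGet"   # assumed name of the existing long-poll cap


def pick_event_cap(caps: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Prefer the new capability, fall back to the old one, else None."""
    for name in (NEW_CAP, OLD_CAP):
        url = caps.get(name)
        if url:
            return name, url
    return None


# A region that only grants the old cap, and one that grants both.
old_region = {OLD_CAP: "https://example.invalid/cap/old"}
new_region = {OLD_CAP: "https://example.invalid/cap/old",
              NEW_CAP: "https://example.invalid/cap/new"}

print(pick_event_cap(old_region))
print(pick_event_cap(new_region))
print(pick_event_cap({}))
```

A viewer written this way keeps working on regions (or grids) that never grant the new cap, which is the point of the fallback.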

 

Quote

There's a lot that's suspicious in that response.  If you get more information on such occurrences, maybe send it along via email

I'm already using the fixed version and have no time to dedicate to more testing, I'm afraid (especially these days before Xmas).

But you could test it yourself, by simply downloading the current release of my viewer (v1.28.2.53) before I replace it with the next release on Saturday. Enable the debug console and use the "Advanced" -> "Consoles" -> "Debug tags" floater to enable the debug tags of interest: "CoreHttp" and "EventPoll" should provide all the needed info (I expanded the logging a lot compared with LL's official viewer, since it is invaluable in finding bugs and weird issues, including race conditions and such).

Just make sure to connect like any ”normal resident” (i.e. not from LL's own local network), via your normal ISP... Else you will not see the same things happening...


  • Lindens
29 minutes ago, Henri Beauchamp said:

Not too late, since it would be possible to use a new capability to do just that (with fallback code to the old cap when the new one is not available: there are many such occurrences in the viewer code already).

Yeah, made that an internal Jira.

