Monty Linden

February 16

5 hours ago, Ardy Lay said:

For a "living document", that wiki has a lot of necrotic tissue.

Bwahaha, we used a lot of phlogiston back in those days. Those appliances are long gone. Along with the Isilons and more. I'm surprised the URL patterns still worked or that anything on that page still applies.

I'm trying to bracket when the externally-visible behavior changed. I'm not web but I'm not seeing a cause...

February 16

3 hours ago, Marvin Benelli said:

- the redirection is an endless loop: everything below between the ===== markings is repeated forever until max_redirs gets hit

Yes, without cookies, it just goes into a redirect loop.

February 15

1 hour ago, Love Zhaoying said:

Again, sorry if I was not clear: if there was an (easy or obvious) solution, Monty would have provided it!

Perhaps another user will see your thread and chime in with a helpful response that includes how they got to "search" from within LSL code using llHTTPRequest().

Well, it was late and so I didn't (and still don't) have all the answers. There will probably be some issues filed on this (and I'll mention them here). And I'm going to run down some other answers.

The Search API page is notable for having last been edited in 2011. (Almost typed 1911.)

The curl command (now edited) isn't a solution for you, @VirtualKitten. But it is a tool that allows you to see what is occurring. The exciting thing is that it also exposes some bugs in curl (which we use internally). Curl 7.64.0 and 7.66.0 handle cookies slightly differently and we may be triggering something.

@VirtualKitten, when did this stop working for you?

February 15

Well, I can answer what the root error cause is (though several are involved in this case):

URLRequest Error: 47, Number of redirects hit maximum amount, http://search.secondlife.com/?query_term=Galadriel%27s&search_type=standard&collection_chosen=events&maturity=gma

Too many redirects and that information isn't passed back in any useful way.

February 14

On 2/14/2024 at 7:49 AM, VirtualKitten said:

I think the search.secondlife.com is either a NAME Index for a URL or a similar DNS service that does the same; this is why it returns no files. Can you provide the proper search URL please, and not a Named IDX or similar service? I think this is most likely what is transpiring.

DNS chains are silently followed and wouldn't cause this problem (ignoring TLS cert validation for the moment). If you run your search query through a 'curl' command such as:

curl --http1.1 -v -L -o /dev/null 'https://search.secondlife.com?query_term=Galadriel%27s&search_type=standard&collection_chosen=events&maturity=gma'

you will see the sequence of visits that are made as the request gathers state information into cookies. There are a number of possible reasons for llHTTPRequest failing on this:

- cookie management during the request
- redirect count (3xx codes)
- other limits (header length, response body length)
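To make the cookie and redirect-count interaction concrete, here is a minimal offline sketch in Python. Nothing in it talks to the real search service: `fake_server`, the `/search` and `/auth` paths, the `session` cookie, and the limit of 10 are all made up for illustration. It only shows why a client that drops cookies bounces between two hops until its redirect limit trips, while a client with a cookie jar finishes in a couple of hops.

```python
from typing import Dict, Optional, Tuple

REDIRECT_CODES = (301, 302, 303, 307, 308)


class TooManyRedirects(Exception):
    """Raised when the redirect limit is hit (compare curl error 47)."""


def fake_server(path: str, cookies: Dict[str, str]) -> Tuple[int, Optional[str], Dict[str, str]]:
    """Hypothetical two-hop site: returns (status, location, cookies_to_set)."""
    if path == "/search":
        if "session" in cookies:
            return 200, None, {}           # state present: serve the result
        return 302, "/auth", {}            # no state: send the client to auth
    if path == "/auth":
        return 302, "/search", {"session": "abc"}  # hand out state, bounce back
    return 404, None, {}


def fetch(path: str, use_cookies: bool, max_redirs: int = 10) -> Tuple[int, int]:
    """Follow redirects; return (final_status, redirects_followed)."""
    jar: Dict[str, str] = {}
    redirects = 0
    while True:
        sent = jar if use_cookies else {}
        status, location, to_set = fake_server(path, sent)
        if use_cookies:
            jar.update(to_set)             # remember state for the next hop
        if status not in REDIRECT_CODES:
            return status, redirects
        if redirects >= max_redirs:
            raise TooManyRedirects(f"gave up after {redirects} redirects")
        redirects += 1
        path = location


if __name__ == "__main__":
    print(fetch("/search", use_cookies=True))
    try:
        fetch("/search", use_cookies=False)
    except TooManyRedirects as exc:
        print("without cookies:", exc)
```

The real chain is longer and the cookies are different, but the failure shape is the same: without a jar, the state never accumulates and the only exit is the redirect limit.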

February 13

One discovery: search absolutely requires the use of cookies. The llHTTPRequest APIs are likely not helping in this area, so this may very well require some enhancement.

February 13

I think I agree that this should be better available. The support ticket is one way to try to kick things. You might also want to go to https://feedback.secondlife.com/ and file a bug or feature request on this.

I noticed the redirection chain is going through an auth endpoint and is doing a good amount of cookie churn to pass context along. There may be a way to start in the middle of the chain and get a search result back easily.

February 13

Necropost! Doubly so as oz is no longer here. A quick check with 'curl' shows that that URL needs to go through about seven 3XX redirects to finally produce output and the final response is about 24KB. I suspect http-out limits are generating a 499 error and extended error status would provide more information. (I haven't checked this - just throwing it back out there.)

February 9

There is a possible fix for late EAC taking shape in one of the active projects. It might over-generate messages but a change is coming. Schedule unknown as yet.

February 9

9 hours ago, animats said:

Check me on this. I may have misread something.

Oh, it's almost certainly true. It's what you get with evolution through a series of errors rather than design. The simulator didn't spec or implement the right thing. The viewer made an adaptation keeping old caps around. Simulator then had to follow the viewer keeping old caps alive per viewer spec so implements weak persistence and caching. That cache doesn't always work as expected with undocumented lifetimes, etc., and the viewer adapts freezing the evolved spec at a new behavior. Repeat.

The texture and mesh caps are 'local' resources that will supply grid-wide data, even before CDN, so likely showed up here first. Inventory and other APIs have similar looks-local-acts-global scope. But some really are local and you need those caps. Or you end up using another simulator as proxy.

I really do tip my cap to the OS people for trying to chase this moving target with very little help from us...

February 9

Oh, ho, the anticipation now....

January 27

It's us. Things are recovering. But never trust that aws status...

January 27

10 minutes ago, Henri Beauchamp said:

We could patch libcurl (it's already a patched version that is statically linked against the viewer binary, anyway) to insert a hook in its code and recover the decyphered data...

Yep, another possibility. For development work, I cheat. I just disable encryption everywhere and work in cleartext. That doesn't work on Aditi.

January 27

1 hour ago, animats said:

I see SL traffic event poller replies with no traffic in two forms - zero length body with status 200, and HTTP error 500. The spec says a no-traffic reply should be a 502 status, which I never see from SL sim servers. (The Other Simulator does send 502 items)

The wiki is a liar. Fixing that is going to take some time.

The <200, empty, text/html> responses bother me. We have a closed socket path, a zero-length event path <200, [], app/llsd+xml>, and some 499 paths. Not certain where this is coming from but it may be what the server timeout when a response is in-flight looks like. Ugly.

1 hour ago, animats said:

Hm. Possible, although difficult. What are you looking for?

Nothing for myself. HTTP client libraries tend to 'interpret' edge cases before giving callers a chance to inspect. So you sometimes need to capture wire protocol for naked truth analysis. And key logging at the TLS level is required for this now. I have a Jira to add this to the viewer at some point but that implies unpinning libcurl. So not today but you can do it in other languages at any time.
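As one example of "other languages", here is a minimal sketch using Python's standard `ssl` module (not the viewer's libcurl and not any Rust crate). It assumes Python 3.8 or later built against OpenSSL 1.1.1 or later, where `SSLContext.keylog_filename` is available. The helper name `make_keylogging_context` and the file name are made up for illustration.

```python
import os
import ssl
import tempfile


def make_keylogging_context(keylog_path: str) -> ssl.SSLContext:
    """Return a client TLS context that appends session secrets to keylog_path.

    The file uses the NSS key log format, which packet analyzers such as
    Wireshark can read to decrypt a captured TLS stream.
    """
    ctx = ssl.create_default_context()
    ctx.keylog_filename = keylog_path   # secrets are written at handshake time
    return ctx


if __name__ == "__main__":
    # Hypothetical location; pick somewhere private, the file holds secrets.
    path = os.path.join(tempfile.mkdtemp(), "tls-keys.log")
    ctx = make_keylogging_context(path)
    print("key log enabled at", ctx.keylog_filename)
```

Any connection made with that context (for example by passing it to `http.client.HTTPSConnection`) will then leave keys behind for decoding a packet capture. Treat the key log as sensitive and only use it for debugging.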

1 hour ago, animats said:

Yes. Haven't implemented avatar appearance, editing, voice, etc. yet. So those caps are the ones that the Sharpview viewer is using. I could fetch more, of course. Shouldn't affect this issue, since cap fetching comes after receiving the seed cap.

Agreed, it shouldn't, but have you seen our wiki? Simulator has a lot of conditional code and some of that is conditioned on what you would normally think is an unrelated sub-system. There's a tendency for the simulator to use SL viewer behavior as specification (which is both wrong and backwards). If you ask for a full set of caps regardless of whether you need them or not, you might see different behavior.

January 27

Okay, grabbed some Bonifacio logs and I'll dig into those later. One thing I didn't expect to see in the Apache log was a 200 with a 0-length body to event-get. Not certain I believe that, so...

A recommendation since you're chin-deep in protocol... look into getting the key log information out of the TLS support in the Rust crate doing the crypto. You'll be able to decode the https: stream and persist the decode and this may be necessary for more digging. And it's the only way to do inspection in the modern TLS world.

Quick question: it looks like you're asking for very few caps in the initial seed request. Far fewer than the SL viewer. Is this correct?

January 27

On 1/24/2024 at 5:03 AM, animats said:

Exactly 30 seconds later, the event poller returns a transport error, "Connection aborted / Unexpected EOF". The viewer side polls again.

Exactly 30 seconds after that, the event poller returns a 500 Internal Server Error status. The viewer side polls again.

I'd need accurate time and location information to track down a truthier story. But here goes...

First bullet. Simulator had started sending a response to the client when the 30-second simulator timeout fired. This simply closed the apache<>simulator socket. Apache had managed to start proxying data back to viewer (in response phase) when this happened so all it could do was close the viewer<>apache socket and give up. This would have lost event data.

Second bullet. New connection hits the 30-second timeout in the simulator and the apache<>simulator socket is closed. A response could have been in flight, but not necessarily. Apache hadn't yet started to return a response to viewer (in turnaround wait phase) so had some options: synthesize an HTTP status and response body or just close the client<>apache socket. It selected the former in this case. This might have lost event data or might simply be one of the dataless glitch modes around responses. Response headers and bodies might give insight into what really happened.

As always, either apache or simulator could elect to 5xx some query because of design/implementation choices (*cough*).

Added: It's also possible that the 500-vs-broken socket election is entirely in the Rust environment. In-progress response interruption (and data loss) always reports as broken socket while interruption during turnaround wait possibly becomes a synthetic 500. I don't know the details of the Rust runtime...
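For anyone writing a poller against this, the cases above can be sketched as a small classifier in Python. This is only an illustration of the reasoning in this post, not a spec: the outcome names, the `classify_poll` helper, and the choice to lump every non-200 status into one "ambiguous" bucket are my assumptions here.

```python
from enum import Enum
from typing import Optional


class PollOutcome(Enum):
    EVENTS = "events delivered"
    NO_EVENTS = "nothing to report"
    POSSIBLE_LOSS = "interrupted mid-response; events may have been lost"
    AMBIGUOUS = "error status; may or may not have lost events"


def classify_poll(status: Optional[int], body: bytes,
                  transport_error: bool = False) -> PollOutcome:
    """Map one event-poll result to an outcome (hypothetical scheme)."""
    if transport_error:
        # Broken socket / unexpected EOF: a response was already in flight.
        return PollOutcome.POSSIBLE_LOSS
    if status == 200 and body:
        return PollOutcome.EVENTS
    if status == 200:
        # Zero-length body with a 200, as seen in the logs.
        return PollOutcome.NO_EVENTS
    # 499, 500, 502 and anything else: possibly synthesized by a proxy.
    return PollOutcome.AMBIGUOUS


def should_poll_again(outcome: PollOutcome) -> bool:
    """In this sketch every outcome leads to another poll."""
    return True


if __name__ == "__main__":
    print(classify_poll(None, b"", transport_error=True))
    print(classify_poll(500, b""))
    print(classify_poll(200, b""))
```

The only thing the classification buys you is logging: knowing which polls might have dropped data makes it easier to line up viewer-side symptoms with server logs.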

January 26

5 hours ago, Kathrine Jansma said:

Ok, thank you for testing. Guess it was just bad luck with my sample regions then.

LSL is getting attention these days and sometimes that means new bugs. If you're seeing a repeatable failure, please file a bug report with script, inventory, and region details. You may not be imagining things...

January 26

Kama Center/Charlesville look fine on win11 with 7.1.2 SL viewer.

January 26

Not seeing (or expecting) any such change. Any more details?

January 17

We're still rolling back some RCs. Sorry for the inconvenience and I hope we'll have some more info for everyone shortly.

January 16

5 hours ago, Twist Mechanique said:

This happens at least 3-4x per day for me. The last time I did a full cache clear and reload to see if it would fix it. (about 5 hours ago.)

Is this always on login for you?

January 13

This is one fingerprint of a spurious inventory failure scenario that we're aware of. A relog will likely complete without error in this case (was that your experience?). If failure does persist, that indicates it is time to escalate to support. But let us know about the transient failures as well. Pressure and feedback drive development.

January 12

8 hours ago, Love Zhaoying said:

Right, but my point (poorly made, too much noise and no noise filter) was that it probably doesn't tell if the toy is actually "being used" (in or on someone's body).

An open API for the anonymous exchange of PGP-signed biometric data. Hmmm...

January 11

5 hours ago, animats said:

It's interesting that single crossings with multiple avatars on a vehicle fail badly. It's not clear why they should.

Yeah, the vehicle unpacking and repacking code is 'interesting.'

January 10

Duplicated (or worse?) RegionHandshake messages came up in one of my deep digs earlier in this thread. Found one source for them. Code appears to be very deliberate in the addition but no one knows the explanation now. And it wasn't documented. Smells like simple carelessness or whitewashing of something not understood (two messages are better than no messages so...).

As you've shown, viewer and simulator restoring synchronization after an RC/TP takes time (it's a protocol exchange). So there's always a race scenario where queued or stale or retransmitted data becomes invalid or irrelevant after the full exchange. And the problem is worse when it's a 3-body problem (one viewer, two simulators). All participants must be prepared to detect and reject these cases.
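One common way to "detect and reject" is to stamp messages with a generation number that changes on every transition. The Python sketch below shows the idea only. It is not how the SL protocol does it: the `generation` field, the `Receiver` class, and the method names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    generation: int   # hypothetical counter, bumped on every RC/TP
    payload: str


class Receiver:
    """Accepts only messages stamped with the current generation."""

    def __init__(self) -> None:
        self.generation = 0
        self.accepted: List[str] = []

    def begin_transition(self) -> int:
        """Start a new region crossing or teleport; older traffic goes stale."""
        self.generation += 1
        return self.generation

    def deliver(self, msg: Message) -> bool:
        """Return True if accepted, False if rejected as stale."""
        if msg.generation != self.generation:
            return False          # queued or retransmitted from before the move
        self.accepted.append(msg.payload)
        return True


if __name__ == "__main__":
    rx = Receiver()
    old = Message(rx.generation, "update from old region")
    rx.begin_transition()
    print(rx.deliver(old))                                 # stale, rejected
    print(rx.deliver(Message(rx.generation, "hello")))     # current, accepted
```

With more than two participants, every party needs the same check, and they need to agree on when the generation changes, which is where the difficulty really lives.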

That situation exists between simulators as well. The transition milestones represented by the messages between viewer and simulators aren't the only transitions behind the firewall. For example, an avatar or vehicle making a crossing isn't fully functional until sometime after the destination simulator declares the move complete. This internal transition can come before or long after the viewer has acknowledged the movement. In the meantime, such an object is free to initiate *new* transitions such as an RC/TP or even just a logout (which requires updates to inventory). This is part of the reason why 4-body transitions (one viewer, three regions) are so fraught.

Viewers do experience these problems in other ways and DMs are one such area. A version of avatar presence is maintained as global state outside of the simulators. One use of this data is to direct DMs to the correct simulator for viewer uptake. But that service is subject to all the delay, ordering, retry, and other protocol issues of any other distributed system. Its notion of presence can be stale or very wrong in certain cases. So each message in a DM stream takes that information as a hint and proceeds to hunt down the actual avatar location according to simulators. But that is also a protocol exchange and doesn't necessarily produce a correct result when transitions are in-flight. In this case, the IM system gives up and tosses the message into persistent storage for later recovery. And that's one way messaging does really, really surprising things.
