
Tools and Technology


Linden Lab

 


 

One of the challenges that virtual world creators face is the trade-off between rich visual detail and geometric complexity. Ideally, by adding more and smaller faces to an object, a designer can model different surface textures and create realistic variations in the interplay of light and shadow. However, adding faces also quickly increases the size of the model and its rendering cost. Normal and Specular Maps are ways to address this by allowing for the appearance of a complex surface without actually modeling fine scale geometry.

A Normal Map is an image where the color codes indicate how the renderer should reflect light from each pixel on a surface by modifying the direction that the pixel "faces" (imagine that each pixel could be turned on tiny pivots). This means that pixels on a simple surface can be rendered so that they appear to have much more detail than the actual geometry and at much lower rendering cost. Light and shadow are rendered as though the surface had depth and physical texture, simulating roughness, bumps, and even edges and additional faces.
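To make the encoding concrete, here is a minimal sketch in Python of how a renderer might decode a normal-map texel and use it for diffuse shading. The channel mapping (value * 2 - 1) is the standard convention; the texel values and light direction are made up for illustration.

def decode_normal(r, g, b):
    """Map an RGB texel in [0,1] to a unit normal in [-1,1]^3."""
    x, y, z = r * 2.0 - 1.0, g * 2.0 - 1.0, b * 2.0 - 1.0
    length = (x * x + y * y + z * z) ** 0.5
    return (x / length, y / length, z / length)

def lambert(normal, light_dir):
    """Diffuse intensity: clamp(N . L, 0, 1)."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, min(1.0, dot))

# A "flat" texel (128,128,255) faces straight out; a tilted texel darkens,
# giving the illusion of surface relief without extra geometry.
flat = decode_normal(0.5, 0.5, 1.0)
tilted = decode_normal(0.7, 0.5, 0.8)
light = (0.0, 0.0, 1.0)
print(lambert(flat, light), lambert(tilted, light))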

Similarly, a Specular Map allows each pixel to have its own degree of reflectivity, so that some parts of a single face reflect sharply, while adjacent pixels can be dull.

The open source developers of the Exodus Viewer are contributing Viewer support for Normal and Specular Maps, as well as some additional controls for how light reflects from faces. Linden Lab is developing the server side support so that this powerful tool will be available in Second Life.

Design and development are under way. Watch this blog and the Snowstorm Viewers page for information on when test Viewers with these new capabilities become available.

For additional information, or to learn more about how you can participate in the open source program, please contact Oz@lindenlab.com.

 

Linden Lab

When I came to Linden Lab over five years ago, Second Life had gone through a period of the coveted hockey-stick growth, and we had just not kept up with the technical demands such growth creates. One or more major outages a week were common.

In my first few months at the Lab, we removed more than a hundred major single points of failure in our service, but several major ones still loomed large, the granddaddy of them all being the core MySQL database server. By late Winter 2009 we were suffering from a core database outage a few times each week.

With a lot of hard work and countless long nights we stabilized the service and started making major improvements to the overall stability and performance of Second Life. However, despite our continued improvements, and the relative tranquility they have created, the spectres of technical debt and single points of failure still loom over our operations. In recent weeks some of them have struck and disrupted Second Life. So much so that I want to explain the outages that have occurred, how we addressed them, and what we are doing going forward.

First, that core MySQL database cluster still exists. It is still the core of many of our central functions. When the write server fails it takes a minimum of thirty minutes to promote a new server into position. The promotion itself is actually relatively quick, but its numerous dependent services must all be taken down and brought back up carefully to ensure that they are all functioning properly.
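To illustrate why the full rotation takes so much longer than the promotion itself, here is a hedged sketch in Python. This is not Linden Lab's actual tooling, and the service names are invented; the point is that every dependent service must be quiesced before the promotion and brought back carefully afterwards.

import time

DEPENDENT_SERVICES = ["logins", "inventory", "economy", "search"]  # hypothetical names

def stop(service):
    print(f"stopping {service}")
    time.sleep(0.1)  # stand-in for a careful, verified shutdown

def start(service):
    print(f"starting {service}")
    time.sleep(0.1)  # stand-in for startup plus health checks

def promote_replica(replica):
    print(f"promoting {replica} to write master")  # the quick part

def failover(replica):
    for svc in DEPENDENT_SERVICES:      # quiesce everything that writes
        stop(svc)
    promote_replica(replica)
    for svc in DEPENDENT_SERVICES:      # bring services back one at a time
        start(svc)

failover("db-replica-2")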

In the last two months the core MySQL write database has hit two different fatal hardware faults, driving us to temporarily halt most Second Life operations. In some sense, two major write database failures close together is bad luck, but we cannot depend on luck to ensure the reliability of Second Life. In the very near future, we are moving the core MySQL write server to a new hardware class, on which production read servers are already running. Moving the write server will further improve overall database performance and make failures less frequent. It does not, of course, solve the root single-point-of-failure problem, so in the coming days, weeks, and months we will be reducing the impact of database failures even further. This includes continued improvement to the rotation process, extracting more functions out of the core database cluster, and further reducing the number of features that depend on the single write server.

The core MySQL database, however, has not been our only problem recently. A few weeks ago there was a massive distributed denial of service attack on one of our upstream service providers that affected most of their customers, including us, and inhibited the ability of some to use our services. We have since mitigated future potential impact from such an attack by adding an additional provider. There have also been hardware failures in the Marketplace search infrastructure that have impacted that site, a problem that we are continuing to work through. Most serious, though, was this week's four-and-a-half-hour login outage.

On Tuesday morning, users stopped being able to get into Second Life. The root cause was created over ten years ago in a system designed to assign a unique identifier to the hand-off of sessions from login to users’ initial regions. At 7:40AM Pacific Time, that system quietly ran out of possible numbers to assign. It took us four hours to isolate the problem, test a fix, and deploy the change. Users could immediately log in at that point, but it took an additional two hours for systems to settle out. When tens of thousands of users rush back into Second Life following an outage, we have to deliberately throttle some services to prevent further breakage.
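The failure mode is easy to reproduce in miniature. The Python sketch below uses a deliberately tiny identifier space (the post does not say how wide the real one was) to show how an allocator can work silently for years and then fail the moment the space is exhausted.

class IdAllocator:
    def __init__(self, bits=32):
        self.next_id = 0
        self.max_id = 2 ** bits - 1

    def allocate(self):
        if self.next_id > self.max_id:
            # The quiet time bomb: nothing is wrong until the very last ID is gone.
            raise RuntimeError("identifier space exhausted")
        value = self.next_id
        self.next_id += 1
        return value

alloc = IdAllocator(bits=8)   # tiny space so the demo fails quickly
for _ in range(300):
    try:
        alloc.allocate()
    except RuntimeError as e:
        print(e)
        break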

Having such a hidden fault in a core service is unacceptable, so we are doing a thorough review of the login process to determine if there are any more problems like this lurking. We also intend to remove the identifier assignment service altogether. Not only was it the ultimate source of this outage, it is one more single point of failure that should have been dispatched long ago.

We want to apologize for all of the recent problems and the frustration they have caused. We too are frustrated and are intent on making our service better. Few things give me more pleasure than every day helping to make Second Life a happy and fun place. Thank you for your patience and support. We simply could not have a more devoted user base and for that we owe you better.

Most sincerely,

Landon Linden

 

Linden Lab

Somewhere below the regolith in the Linden Lab secret lunar base, some of the Lab’s top minds have been tackling performance issues in Second Life. The areas of focus range from infrastructure improvements to server-side texture compositing to object caching.

This year, Linden Lab is making the single largest capital investment in new server hardware upgrades in the history of the company. This new hardware will give Residents better performance and more reliability. Additionally, we are consolidating from three colocation facilities to two, which will significantly reduce our inter-colocation latency and further enhance simulator performance.

The Shining project is the name given to a recent Lab effort to identify and measure delays and other problems with streaming avatars and objects and to propose and implement solutions. During the Shining project, several small improvements have been identified and deployed. The next small improvement from Shining to be deployed will be pre-rendering the starter Avatars to improve the new resident experience.

As a result of the Shining investigation, the project has been split into three larger performance projects: Project Sunshine, Object Caching, and HTTP Library.

Altogether, the hardware upgrades and the Shining projects should improve avatar and object streaming speeds significantly. Depending on your unique situation, your mileage will vary.


Project Sunshine (Server-side Baked Texture Generation & Storage)


Currently, all client viewers are responsible for compositing their own Avatar textures, then sending the results back to the Sim for other viewers to access. This method can lead to slowdowns and errors. The actual calculations for compositing textures are straightforward and not particularly time-consuming. However, in order for the viewer to do the calculations it first has to download a lot of individual assets from the Sim, and must then upload large results back to the Sim. This pushes a lot of bits through the Sim / Viewer connection, which can be slow and unreliable.

Depending on client hardware to do the compositing and uploading of the resulting baked textures can introduce erroneous results (like way too much pink). In order to handle these errors, a number of retry and fallback mechanisms have been put in place. This adds further load and overhead to the whole system.

Project Sunshine stands up a Texture Compositing Server that is separate from the Sim servers. When a Viewer needs to render an Avatar, it sends a message to the Sim, which in turn sends a message to the Texture Compositing Server. The Texture Server then performs the texture compositing and sends the results back to the Viewer.

The Texture Server has access to a database of previously generated Avatar textures. If the Sim is asking for the baked textures for an Avatar that has been previously calculated, the Texture Server can pull the results out of the database instead of recalculating. Even if it’s not the same Avatar, but the set of clothes is the same, the Texture Server can use the previously calculated results.
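The reuse logic can be sketched as a cache keyed by the worn outfit rather than by the avatar. The Python below is illustrative only; it assumes a bake is fully determined by the set of worn item IDs, and the names and hashing scheme are invented.

import hashlib

bake_cache = {}  # outfit hash -> baked texture bytes (stands in for the database)

def outfit_key(worn_asset_ids):
    """Order-independent hash of the worn items."""
    digest = hashlib.sha256()
    for asset_id in sorted(worn_asset_ids):
        digest.update(asset_id.encode())
    return digest.hexdigest()

def composite(worn_asset_ids):
    # Stand-in for the real compositing work.
    return b"baked:" + ",".join(sorted(worn_asset_ids)).encode()

def get_baked_texture(worn_asset_ids):
    key = outfit_key(worn_asset_ids)
    if key not in bake_cache:
        # Same outfit on a different avatar still hits the cache.
        bake_cache[key] = composite(worn_asset_ids)
    return bake_cache[key]

print(get_baked_texture(["shirt-uuid", "pants-uuid"]))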

An additional advantage of having a Texture Generation server is that the heavy HTTP communication about the individual assets that make up an Avatar will happen on reliable, fast internal network connections instead of external connections.

Object Caching & Interest Lists

Caching objects on a resident’s hard drive is intended to provide faster drawing of objects that are most likely to be redrawn later in a resident’s session or in the next session.

The current object cache implementation has room for improvement. Currently, the viewer only caches objects at the point the resident disconnects from a region and only objects currently in view are cached. The current caching communication between Sim and Viewer when a scene is initially being loaded is inefficient and resource intensive.

The new system is designed to make sure that a viewer never has to download object information that it already has and is still valid. This will reduce the load on the Sim and on the Sim / Viewer connection.

With the new system, when the Viewer connects to a region, the Sim first says to the Viewer, “Here’s a list of all my object groups by UUID.” The Viewer checks its cache and replies, “Here are the timestamps for the object groups that I found in my cache.” (If an object group is not found in the cache, the timestamp is 0.) The Sim checks the Viewer’s timestamps against its own data and responds, “Here’s a list of object groups in your cache that are still valid.” While the Viewer starts drawing valid objects from its cache, the Sim starts streaming the new object data that the Viewer needs, based on the particular Viewer’s object priorities.
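Here is an illustrative Python sketch of that three-message exchange. The function names and message shapes are invented; the point is that only object groups whose timestamps no longer match get re-streamed.

def sim_list_groups(sim_objects):
    """Message 1: the sim advertises its object groups by UUID."""
    return list(sim_objects.keys())

def viewer_report_timestamps(cache, uuids):
    """Message 2: the viewer answers with cached timestamps (0 = not cached)."""
    return {u: cache.get(u, (0, None))[0] for u in uuids}

def sim_validate(sim_objects, reported):
    """Message 3: the sim says which cached groups are still valid."""
    valid = [u for u, ts in reported.items() if ts == sim_objects[u]["ts"]]
    stale = [u for u in reported if u not in valid]
    return valid, stale

sim_objects = {"uuid-a": {"ts": 100, "data": "tree"},
               "uuid-b": {"ts": 205, "data": "house"}}
viewer_cache = {"uuid-a": (100, "tree"), "uuid-b": (200, "old house")}

uuids = sim_list_groups(sim_objects)
reported = viewer_report_timestamps(viewer_cache, uuids)
valid, stale = sim_validate(sim_objects, reported)
print("draw from cache:", valid)   # uuid-a
print("stream from sim:", stale)   # uuid-b (timestamp mismatch)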

HTTP Library

Top performance for the Shining initiatives depends on the speed and reliability of HTTP messages initiated by both the Viewers and the Sims. The third part of the Shining project, the new HTTP Library, implements modern best practices in messaging, connection management, and error recovery.

Linden Lab

Mesh Update

Linden Lab is excited to announce the latest updates on one of our coolest new features — mesh. With mesh, you’ll be able to create, build and beautify even more incredible objects inworld! You’ll find a new Mesh Enablement functionality tab in the My Account section of your Account screen. It’s just another part of the innovation and imagination that makes Second Life the magical place it is.

Uploading Mesh

Now that mesh is in rollout stage and many users will soon be able to utilize its content-creation capabilities, there are a few things creators should know before giving it a whirl. Potential creators will need to satisfy a couple of requirements in order to gain access to mesh-upload capabilities. First, you’ll need to review a short tutorial that is intended to help Residents understand some of Linden Lab’s intellectual property policies — they can be kind of tricky, and we at the Lab want all content creators to be well-informed. This tutorial outlines some of the key points relating to intellectual property.

Second, if you have never given us billing information, you'll need to enter it into your account. Why, you ask? Because having payment information on file is an important step in establishing direct relationships with content creators who will be working in mesh. Note: We do realize that with the current configuration, some Residents will need to make a purchase in order to enter their billing information.

The first round of rollouts of mesh will be happening over the next few weeks! We look forward to seeing your continued imagination, skill and creativity in this exciting new format. We hope you’ll be pleased with our innovations — and can’t wait to see what you come up with!

Now go create something amazing!

Linden Lab

Hi! I’m a member of the Second Life Operations team. On Friday afternoon, major parts of Second Life had some unplanned downtime, and I want to take a few minutes to explain what happened.

Shortly before 4:15pm PDT/SLT last Friday (May 6th, 2016), the primary node for one of the central databases that drive Second Life crashed. That node holds some of the most central data in Second Life, and a whole lot of things stop working when it’s inaccessible, as a lot of Residents saw.

When the primary node in this database is offline we turn off a bunch of services, so that we can bring the grid back up in a controlled manner by turning them back on one at a time.

My team quickly sprang into action, and we were able to promote one of the replica nodes up the chain to replace the primary node that had crashed. All services were fully restored and turned back on in just under an hour.

One additional (and totally unexpected) problem that came up is that for the first part of the outage, our status blog was inaccessible. Our support team uses our status blog to inform Residents of what’s going on when there are problems, and the amount of traffic it receives during an outage is pretty impressive!

A few weeks ago we moved our status blog to new servers. It can be really hard to tune a system for something like a status blog, because the traffic will go from its normal level to many, many times that very suddenly. We now see that we have some additional tuning to do on the status blog in its new home. (Don’t forget that you can also follow us on Twitter at @SLGridStatus. It’s really handy when the status blog is inaccessible!)

As Landon Linden wrote a year ago, being around my team during an outage is like watching “a ballet in a war zone.” We work hard to restore Second Life services the moment they break, and this outage was no exception. It can be pretty crazy at times!

We’re really sorry for the unexpected downtime late last week. There are a lot of fun things that happen inworld on Friday night, and the last thing we want is for technical issues to get in the way.


April Linden

Linden Lab

At one time or another, many of us have suffered a Viewer crash that interrupted our time in Second Life. At Linden Lab, we collect crash data to help us in our ongoing efforts to identify and fix issues in the Viewer that can cause crashes. Thanks to that information, nearly every release we make includes fixes of this sort.

The data we collect also reveals that many users can greatly reduce their risk of Viewer crashes by taking a few steps to update their software outside of Second Life.

The nature of Second Life as a platform for user creativity means that the Viewer faces different challenges than client software for an online game, for example, which would just need to handle the limited and carefully optimized content created by the game’s developer. This can make Second Life a demanding application for your computer and can mean that if your operating system is out of date, your Viewer is more likely to crash.

The good news is you can take steps today to help this! Here are a couple of tips:

Upgrade your Operating System

There is a very clear pattern in our statistics - the more up to date your operating system is, the less likely your Viewer is to crash. This applies on both Windows and Macintosh (Linux is a little harder to judge, since "up to date" has a more fluid meaning there, and the sample sizes are small). Some examples:

  • Windows 8.1 reports crashes only half as often as Windows 8.0

    Those of you who stuck with Windows 7 (roughly 40% of users of our Viewer right now) rather than upgrade to 8.0 made a good choice at the time; version 7 still has a much better crash rate than 8.0, but not quite as good as 8.1 (now about 15% of users), so waiting is no longer the best approach.

  • Mac OSX 10.9.3 reports crashes a third less often than 10.7.5

    OSX rates do not have as much variation as Windows versions do, but newer is still better, and there are other non-crash reasons to be on the up to date version, including rendering improvements.

 Upgrading will probably also better protect you from security problems, so it's a good idea even aside from allowing you to spend more time in Second Life.

Use the 64 bit version of Windows if you can

For each version of Windows for the last several years, you have had a choice between 32 bit and 64 bit variants; if your system can run the 64 bit variant, then you will probably crash much less frequently by changing to it. While we don't have a fully 64 bit version of the Viewer yet, you can run it on 64 bit Windows, and statistically you'll be much better off if you do.

  • Generally speaking the 64 bit Windows versions report crashes half as often as the 32 bit versions.

    According to the data we collect, a little more than 20% of users are running 32 bit Windows versions; most of you can probably upgrade and would benefit by it.

If you bought your computer any time in the last 5 years, chances are very good that it can run the 64 bit version of Windows (as will some systems that are even older). Microsoft has a FAQ page on this topic; go there and read the answer to the question "How do I tell if my computer can run a 64-bit version of Windows?". That page also explains how to do the upgrade and other useful information.
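If you'd rather check programmatically, here is one hedged way to do it from Python. On 64-bit Windows, the PROCESSOR_ARCHITEW6432 environment variable is set for 32-bit processes, so checking it first avoids a false "32-bit" answer from a 32-bit Python.

import os
import platform

def os_is_64bit():
    # On Windows, PROCESSOR_ARCHITEW6432 is present when a 32-bit process runs
    # on 64-bit Windows; otherwise PROCESSOR_ARCHITECTURE (or, on other OSes,
    # platform.machine()) has the answer.
    arch = os.environ.get("PROCESSOR_ARCHITEW6432",
                          os.environ.get("PROCESSOR_ARCHITECTURE",
                                         platform.machine()))
    return arch.endswith("64")

print("64-bit OS" if os_is_64bit() else "32-bit OS")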

We'll of course continue working hard to find and fix things that lead to Viewer crashes. Even as we do that, though, you can decrease your chances of crashing today by taking the steps above.

Best Regards,

Oz Linden

 

Q Linden

Today we launched Viewer 2.5, now out of Beta. The most significant update is a new, web-based profile system, which allows profiles to be viewed and edited both on the web and in the Viewer. For example, here's mine. Please note this is just a starting point for the web-based profiles; we’ll be doing a lot of work to refine the usability and make them richer over time.

In response to your feedback from the early beta versions of Viewer 2.5, we've added some privacy settings that will allow you to control just how public your profile is. Once you’ve logged in, click on “Privacy Settings” in the upper right corner of your profile. Group settings set in the Viewer will be overridden by these group privacy settings.

  • "Everyone" means that the information is available to the whole Internet and can be picked up by search engines.
  • "Second Life" means that the information is available to all Second Life Residents who are logged into the website or inworld. This is the default for all existing Residents using Viewer 2.5.
  • "Friends" means that only your Second Life friends can see the information on the web and inworld.

This is why we have a beta process--to address concerns and improve your user experience. We will continue to iterate as we get more feedback. Thank you for all your help and comments. Please attend the Viewer 2 User Group meetings if you would like to share your thoughts and feedback directly with me and the Snowstorm team.

Viewer 2.5 also has some other new features. The one I like best is that you can now have your Favorite landmarks also appear on the login screen, so that you can log directly into your favorite locations. Torley made a video about this, so check it out! We've also improved some texturing performance and fixed another batch of bugs. Watching the internal data, we've already seen a noticeable improvement in stability and performance--on par with Viewer 1.23.

Download Viewer 2.5, try it out, and keep the feedback coming! And, if you Twitter, please use the hashtag #slviewer2.


Linden Lab

We're ready to start a limited beta test of an exciting new tool for creators: Experience Keys. These are new LSL functions and calls that make it possible to bypass the multiple permissions dialogs that you encounter with scripted objects today. Experience Keys will make it possible for users to create more immersive experiences inworld, because those interacting with the experience will be able to grant all the permissions necessary to participate just once, instead of having the experience interrupted by multiple permissions requests. To learn more, check out this brief video.

We used this technology when creating the Linden Realms game, and we're now ready to start putting this tool in the talented hands of creators in the Second Life community. Experience Keys is a powerful tool, and we need to be sure we test and roll out the feature carefully, so the first step will be a limited beta, then the viewer and server releases shortly after.

If you’d like to participate, send an email to slexp_beta@lindenlab.com with “Experience Key Beta” as the subject along with:

  1. Your experience name.

  2. The genre it fits in.

  3. A brief description of your experience.

  4. How your customers would benefit from Experience Keys.

Linden Lab

Available now is the ability for LSL to return an avatar’s shape type to scripted objects. With this information, scripters and creators of objects can determine the best animation, pose, or position to play when avatars interact with their objects.

Scripts can now read the avatar’s shape type (male or female) and hover height values. For a complete list of object and avatar agent size details, please visit the llGetObjectDetails() and llGetAgentSize() wiki pages.

Linden Lab

You may have noticed that we’ve added a Mesh Enablement tab to your Account. Get ready to explore the latest innovation from the Lab!

Mesh? What the heck is that?

We’ve had quite a few questions from users about mesh —  mostly along the lines of what is it? How can I use it? When will it become available? Well, we’ve got some answers for you — and some good news!

First of all, what is it, exactly? The term “mesh” refers to an object that consists of polygonal geometry data. Essentially, it’s what you see in modern video games, special effects and 3D animation. It's extremely flexible — and when it’s well-made, it can be way more efficient than the existing prims and sculpties you’ll find inworld currently. Mesh objects are first created in external programs, such as Blender or Maya, and then imported onto the Second Life grid. Once there, mesh objects can be manipulated in pretty much the same ways you would manipulate a regular old prim. And, here’s the best part — the wait is almost over! We’ve been rolling out mesh-upload capability to selected regions over the past week, and the rollout pace will increase in the coming weeks.

Mesh on the Main Grid

Because we’re rolling out mesh-upload capability across the Main Grid over the next few weeks,  mesh is not going to be available everywhere all at once. But, you can get a preview. Ready to try mesh? Here are a few steps you'll need to complete to get started:

1. Get enabled. To do this, you'll need to first complete a short tutorial about mesh and intellectual property rights.

2. Download the Mesh Project Viewer.

3. Once you’re inworld and using the Mesh Project Viewer, teleport to the selected Project Regions where mesh capabilities are enabled. If you're interested in uploading mesh objects, you will want to visit public areas to test. View the list of current mesh enabled sandbox regions here.



If you have a region and want to get upgraded early, check here.

Before you do visit a Project Region, you should check out the Mesh Project Wiki Page and read about uploading mesh.

It's also a good idea to read through the Mesh Forums to get more info and to ask questions of your fellow creators.

During the rollout period (until we enable upload in the next version of the Release Viewer), Linden Lab will charge a discounted fee for uploads. Also, during this initial period, no mesh items should be posted to the Second Life Marketplace due to the limited availability of places to rez mesh objects.

We’re excited to share our latest project with you — and can’t wait to see the awesome things you’ll build with mesh!

Go out and create something amazing – have fun!

Linden Lab

Keeping the systems running the Second Life infrastructure operating smoothly is no mean feat. Our monitoring infrastructure keeps an eye on our machines every second, and a team of people work around the clock to ensure that Second Life runs smoothly. We do our best to replace failing systems proactively and invisibly to Residents. Unfortunately, sometimes unexpected problems arise.

In late July, a hardware failure took down four of our latest-generation simulator hosts. Initially, this was attributed to a random failure, and the machine was sent off to our vendor for repair. In early October, a second failure took down another four machines. Two weeks later, another failure took down four more hosts.

Each host lives inside a chassis along with three other hosts. These four hosts all share a common backplane that provides the hosts with power, networking and storage. The failures were traced to an overheating and subsequent failure of a component on these backplanes.

After exhaustive investigation with our vendor, the root cause of the failures turned out to be a hardware defect in a backplane component. We arranged an on-site visit by our vendor to locate, identify, and replace the affected backplanes. Members of our operations team have been working this week with our vendor in our datacentre to inspect every potentially affected system and replace the defective component to prevent any more failures.

The region restarts that some of you have experienced this week were an unfortunate side-effect of this critical maintenance work. We have done our best to keep these restarts to a minimum, as we understand just how disruptive a region restart can be. The affected machines have been repaired and returned to service, and we are confident that no more failures of this type will occur. Thank you all for your patience and understanding as we have proceeded through the extended maintenance window this week.

 

Linden Lab

Since its introduction, the Linux version of the Second Life Viewer has been considered a Beta status project, meaning that it might have problems that would not have been considered acceptable on the much more widely used Windows or Mac versions. Because "Linux" isn't really one platform - it's a large (and fluid) number of similar but distinct distributions - doing development, builds, and testing for the Linux version has always been a difficult thing to do and a difficult expense to justify. Today, Linux represents under half of one percent of official Viewer users, and just a little over one percent of users on all viewers. We at Linden Lab need to focus our development efforts on the platforms that will improve the experience of more users.

While we hope to be able to continue to distribute a Linux version, from now on we will rely on the open source community for Linux platform support. Linden Lab will integrate open source community contributions to update the Linux platform support, and will build and distribute the resulting viewers, but our development engineering, including bug fixing, will be focused on the platforms more popular among our users. We hope that the community will take up this challenge; anyone interested in ensuring that their fellow Linux users can continue on their preferred platform is encouraged to reach out to us to find out where help is most needed.

 

Linden Lab

Hi! I’m a member of the Second Life operations team, and I was the primary on-call systems engineer this past weekend. We had a very difficult weekend, so I wanted to take a few minutes to share what happened.

We had a series of independent failures happen that produced the rough waters Residents experienced inworld.

Shortly after midnight Pacific time on January 9th (Saturday) we had the master node of one of the central databases crash. The database that went down is one of the most heavily used in Second Life. Without it, Residents are unable to log in, or do, well, a lot of important things.

This sort of failure is something my team is good at handling, but it takes time for us to promote a replica up the chain to ultimately become the new master node. While we’re doing this we block logins and close other inworld services to help take the pressure off the newly promoted master node when it starts taking queries. (We reopen the grid slowly, turning on services one at a time, as the database is able to handle it.) The promotion process took about an hour and a half, and the grid returned to normal by 1:30am.

After this promotion took place the grid was stable the rest of the day on Saturday, and that evening.

That brings us to Sunday morning.

Around 8:00am Pacific on January 10th (Sunday), one of our providers started experiencing issues, which resulted in very poor performance in loading assets inworld. I very quickly got on the phone with them as they tracked down the source of the issue. With my team and the remote team working together we were able to spot the problem and get it resolved by early afternoon. All of our metrics looked good, and my colleagues and I were able to rez assets inworld just fine. It was at this point that we posted the first “All Clear” on the blog, because it appeared that things were back to normal.

It didn’t take us long to realize that things were about to get interesting again, however.

Shortly after we declared all clear, Residents rushed to return to the grid. (Sunday afternoon is a very busy time inworld, even under normal circumstances!) The rush of Residents returning to Second Life (a lot of whom now had empty caches that needed to be re-filled), at a time when our concurrency is at its highest, put many other subsystems under several times their normal load.

Rezzing assets was now fine, but we had other issues to figure out. It took us a few more hours after the first all clear to stabilize our other services. As some folks noticed, the system that was under the highest load was the one that does what we call “baking” - it’s what makes the textures you see on your avatar - so we had a large number of Residents who appeared either gray or as clouds. (It was still trying to catch up from the asset loading outage earlier!) By Sunday evening we were able to re-stabilize the grid, and Second Life returned to normal for real.

One of the things I like about my job is that Second Life is a totally unique and fun environment! (The infrastructure of a virtual world is amazing to me!) This is both good and bad. It’s good because we’re often challenged to come up with a solution to a problem that’s new and unique, but the flip side of this is that sometimes things can break in unexpected ways because we’re doing things that no one else does.

I’m really sorry for how rough things were inworld this weekend. My team takes the stability of the grid extremely seriously, and no one dislikes downtime more than us. Either one of these failures happening independently is bad enough, but having them occur in a series like that is fairly miserable.

See you inworld (after I get some sleep!),

April Linden

Linden Lab

Hi! I wanted to take a moment to share why we had to do a full grid roll on a Friday. We know that Friday grid rolls are super disruptive, and we felt it was important to explain why this one was timed the way it was.

Second Life is run on a collection of thousands of Linux servers, which we call the “grid.” This week there was a critical security warning issued for one of the core system libraries (glibc) that we use in our version of Linux. This security vulnerability is known as CVE-2015-7547.

Since then we’ve been working around-the-clock to make sure Second Life is secure.

The issue came to light on Tuesday morning, and the various Linux distributions made patches for it available shortly afterwards. Our security team quickly took a look and assessed the impact it might have on the grid. Shortly after lunchtime on Tuesday, they determined that under certain situations the vulnerability might impact Second Life, so we sprang into action to get the grid fully patched.

The security team then handed the issue over to the Operations team, who worked to make the updates needed to the machine images we use. They finished in the middle of the night on Tuesday (which was actually early Wednesday morning).

Once the updates were available, the development and release teams sprang into action and pulled the updates into the Second Life Server machine image. It took until Wednesday afternoon to get the Second Life Server code built and tested, and for the security team to confirm that any potential risk had been taken care of.

After this, the updates were sent to the Quality Assurance (QA) team to make sure that Second Life still functioned as it should, and they finished up in the middle of the night on Wednesday.

At this point we had a decision to make - do we want to roll the code to the full grid at once? We decided that since the updates were to one of the most core libraries, we should be extra careful, and decided to roll the updates to the Release Candidate (RC) channels first. That happened on Thursday morning.

We took Thursday to watch the RC channels and make sure they were still performing well, and then went ahead and rolled the security update to the rest of the grid on Friday.

Just to make it clear, we saw no evidence that there was any attempt to use this security issue against Second Life. It was our mission to make sure it stayed that way!

The reason there was little notice for the roll on Thursday is twofold. First, we were moving very quickly, and second, because the roll was to mitigate a security issue, we didn’t want to tip our hand and show what was going on until after the issue had been fully resolved.

We know how disruptive full grid rolls are, and we know how busy Friday is for Residents inworld. The timing was terrible, but we felt it was important to get the security update on the full grid as quickly as we could.

Thank you for your patience, and we’re sorry for the bumpy ride on a Friday.

April Linden

Linden Lab

Facebook recently announced plans to deprecate an old Open Graph API, requiring all apps running version 1.0 to update to 2.0. We have completed this update for SLShare, but Facebook anticipates that the process on their end for migrating a given app may take up to a couple of weeks. During this migration period, there may be some service interruptions for some apps.

This means that when using SLShare (updating status, photo uploads, and check-ins from the Viewer) you may experience some temporary problems. Please be assured that we are aware of this and any issues you encounter should be resolved once the migration period is complete.

Thank you for your patience!

 

Linden Lab

As of today, several projects have reached the Project Viewer stage, and we wanted to share a bit about how you can expect to see your Second Life experience improve with these initiatives.

Graphics Settings Benchmarking:
This is a new way of figuring out the best default graphics settings. Maybe this has happened to you: you got an awesome new graphics card, fired up SL… only to discover your graphics settings are set to Low, and can’t be changed? No more! This Viewer does away with the old GPU table and instead uses a quick benchmark to measure your GPU and assign appropriate default graphics settings on startup. The settings on shiny powerful hardware should really let that hardware shine. Get a Project Benchmark Viewer today and help us gather metrics! Please file bugs in JIRA if you find them.
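As an illustration of the idea (not the viewer's actual benchmark, which measures the GPU), the Python sketch below times a stand-in workload and maps the measured score to a default preset; the thresholds are invented.

import time

def benchmark_score(iterations=10**6):
    start = time.perf_counter()
    total = 0.0
    for i in range(iterations):   # stand-in workload; the real test exercises the GPU
        total += i * 0.5
    elapsed = time.perf_counter() - start
    return iterations / elapsed   # higher score = faster machine

def default_preset(score):
    # Threshold values are made up for illustration.
    if score > 5e7:
        return "Ultra"
    if score > 2e7:
        return "High"
    if score > 5e6:
        return "Mid"
    return "Low"

print(default_preset(benchmark_score()))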

Installation and Login screen changes:
A new look for the login screen is coming in stages. We’re tweaking the login screen and A/B testing the results. We’ve simplified the new user login screen to remove distractions, and we’re adding instructions for new downloads and installs. We’re also streamlining the returning user login screen to help you get where you’re going faster - or find a new place to visit. You’re likely to see some incremental changes continue over the coming weeks.

Performance Improvements:
Two complementary projects are proceeding neck-and-neck:

  • A new texture and mesh asset service (a CDN) is in testing on a limited set of regions, and so far it’s showing encouraging results. Particularly for those who log in from places far away from our US data centers, this has the potential to significantly improve how quickly textures and meshes load. In the coming weeks, we will expand the number of regions using this new service as the next step in our testing.

  • We’re also taking the next step with the HTTP project - Pipelining. We will soon put out a viewer that will pipeline HTTP requests for texture and mesh fetches, improve inventory folder and item fetches, and have some general adjustments for using a CDN-enabled grid.

Separately, each of these will improve texture and mesh loading performance, but put together, you should really see some exciting improvements in how long it takes to load new areas and objects - making touring the many fabulous places in Second Life you have not yet visited even better!



Linden Lab

User-submitted bug reports help improve the Second Life experience for all Residents, so we greatly appreciate all of you who take the time to provide this invaluable information to us.

Because we want to make it even easier to report bugs, today we are making some changes that will streamline the bug reporting process, allowing us to more quickly collect information and respond to issues.

Following is a summary of the JIRA changes:

  • All bugs should now be filed in the new BUG project, using the more streamlined submission form.
  • Second Life users will only see their own reported issues. When a Bug reaches the "Been Triaged" status, the reporter will no longer be able to add comments to the issue.
  • Once a Bug reaches the “Accepted” or “Closed” status, it will not be updated. You can watch the Release Notes to see when and if a fix has been released for your issue.
  • Existing JIRAs will remain publicly visible. We will continue to review and work through these.


To those of you who have taken the time to alert us to bugs and provided the information we need to fix them -- thank you! We hope that you will continue to help us improve Second Life, and this new process should make it easier for all of us. Ideas about how we can continue to improve the bug reporting process can be shared here.

For more information, visit:
How to report a BUG (Knowledge Base Article):
Bug Tracker (wiki page):
Bug Tracker Status/Resolutions (wiki page):

Linden Lab

UPDATE: 04.24.2015 - For an update on how the Tool Viewer Release affects these systems, visit this blog.


 


We have made some changes to the Second Life System Requirements to bring them more up to date, and are making some related changes to the Viewer:




  • We have removed Windows XP and Mac OSX 10.6 from the list of supported operating systems. Microsoft has announced the end of support for XP, and it has been some time since Apple has released updates for 10.6. For some time now, the Viewer has been significantly less stable on these older systems, and the lack of security updates makes them more hazardous to use.
    We have no plans to actually block those systems, but problems reported on them that cannot be reproduced on supported systems are not likely to be fixed.




  • The Windows installer has been modified to verify that the system has been updated with the most recent Service Packs from Microsoft. Our data shows that the Viewer is significantly less stable on systems that have not been kept up to date, so the installer will now block installation until the updates have been applied. (We will not block installation on Windows 8 at this time, but we strongly recommend upgrading to 8.1 for greater stability.) This change will be effective in a Viewer version to be released in the next few weeks, so it would be a good idea to get your system up to date before then. You can find information on how to install the latest updates at the Microsoft Windows Update page.



Linden Lab

Yesterday, with much rejoicing, we promoted Viewer 3.7.28.300918 (Tools Update) to release. While this Viewer doesn’t have a shiny new featureset on the surface (other than reverting to a single-button login), it’s what’s inside that really matters - we’ve updated the numerous tools used to build the Viewer. The immediate expected effect of this is more stable performance and a decreased crash rate.

We go to great lengths to maintain backwards compatibility in Second Life, both to never break users’ creations and to support the wide range of systems our Residents use to log in. However, sometimes we have to make the hard decisions: a year ago we announced that we were dropping support for Windows XP and Mac OSX 10.5 & 10.6 (a complete list of current system requirements is available here). Today, with the Tools Update release, the Viewer will no longer run on those systems. You will still be able to log in with an older Viewer until it is aged out based on our deprecation policy; however, we strongly recommend updating your system.

It's unfortunate that we have to stop supporting some older systems, but upgrading the tools we use to build the viewer will help us to bring you other improvements to your Second Life experience more quickly and reliably.

Linden Lab

Over the past week, a number of Second Life customers may have noticed that they were not being billed promptly for their Premium membership subscriptions, mainland tier fees, and monthly private region fees, with some customers inadvertently receiving delinquent balance notices by email, as we described on our status blog.

This incident has now been corrected, and our nightly billing system has since processed all users that should have been billed over the past week.

I wanted to share with you some of the details from our internal post mortem of what happened to cause this outage and, more importantly, what we’re doing to prevent it happening in the future.

Summary

Every night, one of our batch processes collects a list of users that should be billed on that day, and processes that list through one of our internal data service subsystems. Internally, we refer to this process as the 'Nightly Biller'.
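As a sketch of what such a pass might look like (the schema and amounts are invented, and the real process goes through an internal data service rather than SQLite):

import sqlite3
from datetime import date

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (user TEXT, next_bill_date TEXT, amount REAL)")
db.execute("INSERT INTO accounts VALUES ('resident1', ?, 9.50)",
           (date.today().isoformat(),))

def bill(user, amount):
    print(f"billing {user}: {amount}")  # stand-in for the data service call

def nightly_biller():
    # Collect everyone whose billing date has arrived, then process each one.
    due = db.execute(
        "SELECT user, amount FROM accounts WHERE next_bill_date <= ?",
        (date.today().isoformat(),)).fetchall()
    for user, amount in due:
        bill(user, amount)

nightly_biller()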

A regularly scheduled deploy to that same data service subsystem for a new internal feature inadvertently contained a regression which prevented this Nightly Biller process from running to completion.

Event Timeline
On February 1st, 2016, we began a rolling deploy of code to one of our internal dataservice subsystems. For this particular deploy, we opted to deploy the code to the backend machines over four days, deploying to six hosts each day. The deploy was completed on February 4th.

The first support issue regarding billing was raised on February 8th; however, as we had only one incident reported to our payments team, we decided to wait and see if that user was billed correctly the next night.

However, we were notified on the morning of February 9th that 546 private regions had not been billed, and an internal investigation began with a team assembled from Second Life Engineering, Payments, QA and Network Operations teams. This team identified the regression by 9am, and had pushed the required code fixes to our build system. By noon, the proposed fix was pushed to our staging environment for testing.

Unfortunately, testing overnight uncovered a further problem with this new code that would have prevented new users from being able to join Second Life. On February 10th, we continued investigating this failure and how it was connected with both the Nightly Biller system and our new internal tool code.

By February 11th, we had made the decision to roll back to the previous code version that would have allowed the Nightly Biller to complete successfully, but would have disabled our new internal feature. One final review of the new code uncovered an issue with an outdated version of some Linden-specific web libraries. Once these libraries were updated and deployed to our staging environment, our QA team were able to successfully complete the tests for our Nightly Biller, our new internal tool, and the User Registration flow.

The new code was pushed out to our production dataservice subsystem by 7pm on February 11th, and the Payments team were able to confirm that the Nightly Biller ran successfully later that evening.

Follow Up
As a result of this incident, we’re making some internal process changes:

  • Firstly, we’ll be changing our build system to ensure that when new code is built, we’re always using the latest version of our internal libraries.
  • Secondly, we are implementing changes to our workflow around code deploys to ensure that such regressions do not occur in the future.

Conclusions
We're always striving for low-risk software deploys at the Lab, and each code deploy request is evaluated for its potential risk level. Further reducing our risk is internal documentation that describes the release process. Unfortunately, a key step was missed in the process, which inadvertently led to a high-risk situation and the failure of our Nightly Biller. The changes above are already in progress and will reduce the likelihood of incidents like this recurring.

 

Steven Linden

Linden Lab

The Snowstorm project is happy to announce that we have a Materials Project Viewer available for alpha testing of support for Normal and Specular maps on Second Life objects!  When generally available, this will enable content creators to make much richer looking materials with less geometric complexity.

This is a very early test version - there are still known bugs, and certainly some that we don't know about yet. We do not recommend using it to manipulate content that you care about. Bugs should be reported in the Materials Project Viewer (MATBUG) project on Jira.  If you use it, please leave automatic updates enabled, because we expect to release updates relatively frequently.

Information on how to prepare textures in a way that's compatible with the Materials Project Viewer is available on the wiki.

This project is a great example of how we can improve Second Life through collaboration: it includes contributions from members of the Exodus, Firestorm, and Catznip development teams, as well as from Linden Lab. We don't yet have a target date for when it will be in the regular Release Viewer, but we're working toward that, and your help testing will make that happen sooner and with higher quality.

Linden Lab

Chris Linden here. I wanted to briefly share an example of some of the interesting challenges we tackle on the Systems Engineering team here at Linden Lab. We recently noticed strange failures while testing a new API endpoint hosted in Amazon. Out of 100 https requests to the endpoint from our datacenter, 1 or 2 of the requests would hang and eventually time out. Strange. We understand that networks are unreliable, so we write our applications to handle it and try to make it more reliable when we can.
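A test along those lines is simple to reproduce. The Python sketch below (using the third-party requests package; the URL is a placeholder, not the real endpoint) fires 100 HTTPS requests and counts how many hang or time out.

import requests

def probe(url, attempts=100, timeout=10):
    failures = 0
    for _ in range(attempts):
        try:
            requests.get(url, timeout=timeout)
        except requests.exceptions.RequestException:
            failures += 1   # hung or timed-out requests land here
    return failures

print(probe("https://api.example.com/endpoint"), "failures out of 100")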

We began to dig. Did it happen from other hosts in the datacenter? Yes. Did it fail from all hosts? No, which was maddening. Did it happen from outside the datacenter? No. Did it happen from different network segments within our datacenter? Yes.

Sadly, this left our core routers as the only shared piece of hardware between the hosts showing failures and the internet at large. We did a number of traceroutes to get an idea of the various paths being used, but saw nothing out of the ordinary. We took a number of packet captures and noticed something strange on the sending side.

1521   9.753127 216.82.x.x -> 1.2.3.4 TCP 74 53819 > 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1400 SACK_PERM=1 TSval=2885304500 TSecr=0 WS=128
1525   9.773753 1.2.3.4 -> 216.82.x.x TCP 74 443 > 53819 [SYN, ACK] Seq=0 Ack=1 Win=26960 Len=0 MSS=1360 SACK_PERM=1 TSval=75379683 TSecr=2885304500 WS=128
1526   9.774010 216.82.x.x -> 1.2.3.4 TCP 66 53819 > 443 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=2885304505 TSecr=75379683
1527   9.774482 216.82.x.x -> 1.2.3.4 SSL 583 Client Hello
1528  10.008106 216.82.x.x -> 1.2.3.4 SSL 583 [TCP Retransmission] Client Hello
1529  10.292113 216.82.x.x -> 1.2.3.4 SSL 583 [TCP Retransmission] Client Hello
1530  10.860219 216.82.x.x -> 1.2.3.4 SSL 583 [TCP Retransmission] Client Hello

We saw the TCP handshake happen, then at the SSL portion, the far side just stopped responding. This happened each time there was a failure. Dropping packets is normal. Dropping them consistently at the Client Hello every time? Very odd. We looked more closely at the datacenter host and the Amazon instance. We poked at MTU settings, Path MTU Discovery, bugs in the Xen Hypervisor, TCP segmentation settings, and NIC offloading. Nothing fixed the problem.

We decided to look at our internet service providers in our datacenter. We are multi-homed to the internet for redundancy and, like most of the internet, use Border Gateway Protocol to determine which path our traffic takes to reach a destination. While we can influence the path it takes, we generally don't need to.

We looked up routes to Amazon on our routers and determined that the majority of them prefer going out ISP A. We found a couple of routes to Amazon that preferred to go out ISP B, so we dug through regions in AWS, spinning up Elastic IP addresses until we found one in the route preferring to go out ISP B. It was in Ireland. We spun up an instance in eu-west-1 and hit it with our test and ... no failures. We added static routes on our routers to force traffic to instances in AWS that were previously seeing failures. This allowed us to send requests to these test hosts either via ISP A or ISP B, based on a small configuration change. ISP A always saw failures, ISP B didn't.

We manipulated the routes to send outbound traffic from our datacenter to Amazon networks via the ISP B network. Success. While in place, traffic preferred going out ISP B (the network that didn't show failures), but would fall back to going out ISP A if for any reason ISP B went away.   

After engaging with ISP A, they found an issue with a piece of hardware within their network and replaced it. We have verified that we no longer see any of the same failures and have rolled back the changes that manipulated traffic. We chalk this up as a win, and by resolving the connection issues we've been able to make Second Life that much more reliable.

 

Linden Lab

Latest Update on Mesh

With the initial release of mesh in August, Linden Lab introduced an entirely new way to measure the impact that objects have in Second Life. Since then, we’ve seen that providing customers with this additional insight has allowed them to design with more performance efficiency.

Now, with this new release of mesh, creators get a more detailed understanding of the actual impact of objects on the world around them. Being able to see Resource Weights, as well as how they drive Land Impact, helps creators more easily build responsive and low-lag locations, improving the overall resident experience.

To better show what effect an object will have inside Second Life, we have developed four weight values: Download, Physics, Server, and Render. Each of these values can be used by creators to determine the impact that a given object will have on performance from different perspectives (e.g. client vs. server) and under different circumstances.

 


Here’s how they break down:


  • Download measures how much bandwidth we need to send out the information about an object so that everyone can see it.
  • Physics is a value that shows how difficult the object is for the physics (Havok) processes to create, track, and update.
  • Server is a value that shows how difficult the object is for the server processes to create, track, and update.
  • Land Impact (formerly PE or Prim Equivalence) is the highest of the above server-side weights (see the sketch after this list). It defines a theoretical limit under which performance remains acceptable for residents.

 

  • Render is a value that shows how difficult it is for the client renderer (your viewer) to draw the object. Because it’s related to the client, this is informational only. This is the same calculation used to create Avatar Rendering Weight.


    Knowledge Base Article on Weights and Impact.
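As a hedged sketch of the relationship described in the Land Impact item above (the exact rounding rule here is an assumption for illustration):

def land_impact(download, physics, server):
    # Land Impact is driven by the largest of the three server-side weights.
    return round(max(download, physics, server))

print(land_impact(2.4, 0.8, 1.1))  # -> 2: the Download weight dominates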


How you see it in the viewer:


[Screenshot: the weight values as displayed in the Viewer]



We hope understanding the relationship between objects and performance will help land owners and creators make even more efficient content. As always, we welcome feedback to help us make this information even easier to understand.


Linden Lab

Last week we deployed the change to serve all texture and mesh data primarily through the CDN, as we've been doing with avatar textures since March. In addition to reviewing feedback from Residents we've been monitoring and measuring the effects of the change, and thought it would be interesting to share some of what we've learned.

First the good news:

  • Load on some key systems on the simulator hosts has been reduced considerably. The chart below shows the frequency of high-load conditions in the simulator web services, and you can see the sharp drop as the CDN takes on much of that job. This translates into other things, including region crossings and teleports, being faster and more reliable.
  • For most users most of the time there has been a big performance improvement in texture and mesh data loading, resulting in faster rez times in new areas. The improvement has been realized both on the official viewer and on third party viewers.

[Chart: frequency of high-load conditions in the simulator web services, before and after the CDN change]

However, we have also seen that some users have had the opposite experience, and have worked with a number of those users to collect detailed data on the nature of the problems and shared it with our CDN provider. We believe that the problems are the result of a combination of the considerable additional load we added to the CDN, and a coincidental additional large load on the CDN from another source. Exacerbating matters, flaws in both our viewer code and the CDN caused recovery from these load spikes to be much slower than it should have been. We are working with our CDN provider to increase capacity and to configure the CDN so that Second Life data availability will not be as affected by outside load. We are also making changes to our code and in the CDN to make recovery quicker and more robust.

We are confident that using the CDN for this data will make the Second Life experience better. Making any change to a system at the scale of Second Life has some element of unavoidable risk; no matter how carefully we simulate and test in advance, once you deploy at scale in live systems there's always something to be learned. This change has had some problems for a small percentage of users; unfortunately, for those users the problems were quite serious for at least part of the time. We appreciate all the help we've gotten from users in quickly diagnosing those problems. We think that the changes we've begun making will reduce the frequency of failures to below what they were before we adopted the CDN, while keeping the considerable performance gains.

 

Linden Lab

CDN Unleashed

11.07.2014: An update on performance improvements and adjustments is available here.

 

Second Life was originally designed for nearly all data and Viewer interactions to go through the Simulator server. That is, the Viewer would talk almost exclusively to the specific server hosting the region the Resident was in. This architecture had the advantage of giving a single point of control for any session. It also had the disadvantage of making it difficult to address region resource problems or otherwise scale out busy areas.

Over the years we’ve implemented techniques to get around these problems, but one pain point proved difficult to fix: asset delivery, specifically textures and meshes. Recently we implemented the ability to move texture and mesh traffic off the simulator server onto a Content Delivery Network (CDN), dramatically improving download times for Residents while significantly reducing the load on busy servers.

Download times for textures and meshes have been reduced by more than 50% on average, and outside of North America the improvements are even more dramatic. That is great news, but the most amazing improvement has been on the simulator servers themselves. The following chart graphs servers on a production release-candidate channel with high HTTP load conditions before and after we rolled the CDN code onto them:

[Chart: high HTTP load conditions on a production release-candidate channel, before and after the CDN rollout]

The high load conditions almost completely disappeared! We knew that we would get a major drop in load with the move, but this blew us away. At first we didn’t believe it and spent two days trying to figure out what we did wrong. There was nothing wrong; this was real.

The results of all of this are faster scene loads, quicker object rezzing, far fewer problems with fuzzy or cloudy avatars, fewer teleport failures, and more! The feedback from Residents has been fantastic. We’re loving it, too! Everything is just so much snappier.

We have finished rolling the CDN code out to the grid, and the results have remained extraordinary. This week, we are also fully releasing our HTTP Project Viewer, which will make the CDN change even better by taking advantage of the elimination of server-side rate limiting. We have been extremely happy with the results so far (psst, we’re talking an 80% reduction in content download times). The CDN benefits are available to everyone regardless of which Viewer you choose to use. All users of the official Viewer will also be able to enjoy the results of the HTTP improvements, and third party developers are able to adopt these changes in their Viewers as well.

We are very happy to be finally releasing these improvements to everyone. Give it a try and let us know what you think!
