Tools and Technology


Linden Lab

Some of you know me as Soft Linden. I’m the information security manager at Linden Lab.

A large number of you attended the Tilia Town Hall  last week. Aside from the many questions you had about how Tilia affects Second Life L$ and monetary activity, privacy was a common concern. Grumpity asked if I would answer a few of the questions about Tilia privacy and security which surfaced in the town hall and in our forums. This has been a busy time for everybody who has worked on Tilia, but I’m glad I can take a few moments to share some information.
 

Where did the Tilia team come from? And why should I trust Tilia with my personal information?
 

The Tilia team is made up of people you previously knew as Linden Lab employees. We’re part of this team because we are passionate about privacy and security. The Tilia team includes employees who use Second Life alts in their free time, and we know many of you as friends and creators in Second Life. So not only are our practices aimed at complying with an ever-expanding list of U.S. regulations and laws, but we strive to go above and beyond. We want to protect the best interests of ourselves, our friends, and the countless Residents who support the world we love. We fully believe that Second Life wouldn’t be possible without working to earn your trust.

For example, we don’t like the way many other companies resell customer information. Because we disagree with those practices, the information you store with Tilia is never provided to third parties for purposes such as marketing. We want you to feel confident that you can play, experiment, and explore in Second Life without outside strangers learning anything about you that you have not shared on your own initiative.

We won’t even provide that information to the US government unless we are compelled to do so through a legal process such as a subpoena or a search warrant. 

But the privacy and security story goes much, much further.


Does Tilia change how my information is secured?
 

Yes! This project began years ago. Quite a bit of the work we do to improve Second Life is "behind the scenes" - things that users cannot directly interact with. Often it's not even possible for users to detect that something has changed. This is one such case.

A few years ago, we looked at Second Life, and how information security has evolved in the time since Second Life was created. We asked ourselves how we could better protect our most sensitive customer information.

Our engineers created a new “personal information vault” project. This vault uses modern algorithms to encrypt sensitive information in a way that would require both enormous computing power and an enormous amount of memory for an attacker to crack… if they could even get a copy of the encrypted data. These algorithms are specifically tuned to defeat expensive decryption acceleration hardware. And all of this new encryption is wrapped around the encryption we already used - encryption which was the industry standard at the time. These are entirely new layers, using encryption technologies that didn’t exist when Second Life was new.

Even after all of these changes, the old protection remains in place at the bottom of that stack. Figuratively speaking, we locked the old vault inside a bigger, stronger vault. We chose an approach where we didn’t need to decrypt information in order to enhance your protection.
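
To make the layering idea concrete, here is a minimal Python sketch of wrapping already-encrypted data in a new outer layer whose key is derived with a memory-hard function (scrypt). This is an illustration only, with hypothetical names and parameters; it is not Tilia's actual implementation.

```python
# Hypothetical sketch: wrap already-encrypted data in a new outer layer whose
# key comes from scrypt, a memory-hard KDF that is costly to attack even with
# specialized decryption-acceleration hardware.
import base64
import hashlib
import os

from cryptography.fernet import Fernet  # third-party "cryptography" package


def derive_layer_key(passphrase: bytes, salt: bytes) -> bytes:
    # scrypt's n/r/p parameters force both CPU time and memory per guess.
    raw = hashlib.scrypt(passphrase, salt=salt, n=2**14, r=8, p=1, dklen=32)
    return base64.urlsafe_b64encode(raw)  # Fernet expects a base64-encoded key


def add_outer_layer(old_ciphertext: bytes, passphrase: bytes):
    # The inner ciphertext is never decrypted; it is simply wrapped as-is.
    salt = os.urandom(16)
    outer = Fernet(derive_layer_key(passphrase, salt))
    return salt, outer.encrypt(old_ciphertext)


if __name__ == "__main__":
    legacy = b"...bytes already encrypted with the older, industry-standard scheme..."
    salt, wrapped = add_outer_layer(legacy, passphrase=b"example only")
    print(len(wrapped), "bytes; the old layer remains intact inside")
```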

There is another key part of this project: Our storage mechanisms for sensitive customer information are now isolated from Second Life. The information isn’t stored at the same physical location anymore, and hasn’t been for a while. But the difference is more than physical.

Second Life’s servers do not have direct access to Tilia information that isn’t required for daily Second Life usage. Even developers who have worked at the company for a dozen years - developers who have full access to every last Second Life server - do not have access to the servers that store and protect the most sensitive information. A policy of least privilege means fewer opportunities for mistakes.

Even within Tilia, key information is further segmented. This means that compromising one database inside of Tilia is insufficient to decrypt and correlate sensitive data without also compromising a different service. For data that is made available to Tilia employees, we have deployed numerous commercial products that help monitor for access, abuse, or attempts to copy data. This means that even an attacker with all employee access credentials, access to employee multifactor authentication tokens, and all Tilia access permissions would still face some challenges in avoiding early detection.
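
As an illustration only (the real services are not public), segmentation along these lines often looks like tokenization: one store maps an opaque token to ciphertext, and a separate service holds the per-record keys, so neither system alone yields usable data. A hedged Python sketch of that idea:

```python
# Hypothetical illustration of segmentation: the record store holds only opaque
# tokens and ciphertext, while a separate key service holds per-record keys.
# Compromising either one alone does not yield usable data.
import os
import secrets

RECORD_STORE = {}   # token -> ciphertext      (one system)
KEY_SERVICE = {}    # token -> per-record key  (a different, isolated system)


def _toy_cipher(data: bytes, key: bytes) -> bytes:
    # Stand-in for a real cipher, kept trivial so the example stays short.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def store_sensitive(value: bytes) -> str:
    token = secrets.token_urlsafe(16)
    KEY_SERVICE[token] = os.urandom(32)           # call to the isolated key service
    RECORD_STORE[token] = _toy_cipher(value, KEY_SERVICE[token])
    return token                                  # callers only ever see the token


def read_sensitive(token: str) -> bytes:
    # Requires access to BOTH systems; one compromised database is not enough.
    return _toy_cipher(RECORD_STORE[token], KEY_SERVICE[token])
```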

That was a lot to explain. But it is all important, because this is the technical foundation of Tilia. It’s a core piece of the Tilia story, and it is something we have worked on for years. Tilia was created in large part because we saw an opportunity to share these technologies with other businesses.

These technologies are in place today for all of the information you entrust Tilia to handle. 

I am proud of what our engineers have accomplished. These same technologies are only in the planning stages at other companies and institutions. Many of the bigger businesses who already handle sensitive data like credit reports and medical records are working to complete similar projects. But we have it today.
 

It sounds like a lot has changed at once. Aren’t large changes risky?
 

Tilia was designed with security and privacy as its primary considerations. These considerations apply not only to what we create, but how we create it, and how we validate ongoing changes to what we create.                                

For Tilia, we chose a newer security-focused programming language over Python and C++, the older languages which make up much of Second Life. It’s more difficult to make security errors in modern security-focused languages, but it’s not impossible. This is why we have created thousands of automated tests which exercise nearly every aspect of Tilia. Every change to Tilia triggers the execution of these tests, and the change is rejected if it causes nonconformant behavior.
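
As a purely illustrative example of the kind of invariant such tests might enforce (the actual suite is internal and not written in Python), a small pytest-style check could look like this:

```python
# Hypothetical pytest-style check of a money-handling invariant. In a real
# pipeline, every change runs tests like these, and a failure rejects the change.
import pytest


def apply_debit(balance_cents: int, amount_cents: int) -> int:
    """Toy stand-in for a Tilia-like ledger operation."""
    if amount_cents < 0:
        raise ValueError("debit amount must be non-negative")
    if amount_cents > balance_cents:
        raise ValueError("insufficient funds")
    return balance_cents - amount_cents


def test_balance_never_goes_negative():
    with pytest.raises(ValueError):
        apply_debit(balance_cents=100, amount_cents=101)


def test_negative_debits_are_rejected():
    with pytest.raises(ValueError):
        apply_debit(balance_cents=100, amount_cents=-1)
```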

The Tilia team also pays a security testing company to attempt to hack Tilia and to perform routine vulnerability assessments. Any Tilia service that is exposed to Second Life users is also exposed to outside security testers. These testers evaluate changes in a staging environment before they are ever presented to Second Life users.

We enlisted outside specialists to review some of our key privacy and security practices and procedures. We then invited a team from Amazon Web Services to sit in our offices with us and review every aspect of our service deployment and hosting infrastructure.

Every step we have taken has been cautious. When it comes to privacy and security, the Tilia engineering team believes that the tortoise wins the race.
 

What does Tilia mean for Second Life privacy and security in the future?
 

We have many plans for Tilia. Additional work is already under way.

While we have already moved regulated information out of Second Life and into Tilia, we are actively migrating additional forms of information. Now that we have a new privacy and security foundation, we can extend the amount of information that enjoys this level of protection. If it pertains to your real life identity, we believe in leveraging Tilia protection wherever possible.

Tilia will enable future Second Life projects as well. We designed Tilia to support additional business customers, so we are able to justify larger privacy and security projects to benefit new business customers and existing Second Life Residents alike.

Aside from ensuring compliance with upcoming privacy and security regulations, our early goals are largely driven by Second Life. These goals include the option for users to select stronger authentication mechanisms, better mechanisms for our team to identify callers who request account help, and additional tools which support our fraud protection team.

As to Second Life itself, by relieving the team of many of the heaviest privacy and security burdens, we believe we can help them be even more effective in developing the virtual world we all love.

Stay tuned to see what we can do.

Soft Linden

Linden Lab

Hi Residents!

We had one of the longest periods of downtime in recent memory this week (roughly four hours!), and I want to explain what happened.

This week we were doing much needed maintenance on the network that powers Second Life. The core routers that connect our data center to the Internet were nearing their end-of-life, and needed to be upgraded to make our cloud migration more robust.

Replacing the core routers on a production system that’s in very active use is really tricky to get right. We were determined to do it correctly, so we spent over a month planning all of the things we were going to do, and in what order, including full rollback plans at each step. We even hired a very experienced network consultant to work with us to make sure we had a really good plan in place, all with the goal of interrupting Second Life as little as we could while improving it.

This past Monday was the big day. A few of our engineers (including our network consultant) and I (the team manager) arrived in the data center, ready to go. We were going to be the eyes, ears, and hands on the ground for a different group of engineers who worked remotely, carefully following the plan we’d laid out. It was my job to communicate what was happening at every step along the way to my fellow Lindens back at the Lab, and also to Residents via the status blog. I did this to allow the engineering team to focus on the task at hand.

Everything started out great. We got the first new core router in place and taking traffic without any impact at all to the grid. When we started working on the second core router, however, it all went wrong.

As part of the process of shifting traffic over to the second router, one of our engineers moved a cable to its new home. We knew that there’d be a few seconds of impact, and we were expecting that, but it was quickly clear that something somewhere didn’t work right. There was a moment of sheer horror in the data center when we realized that all traffic out of Second Life had stopped flowing, and we didn’t know why.

After the shock had worn off we quickly decided to roll back the step that failed, but it was too late. Everyone that was logged into Second Life at the time had been logged out all at once. Concurrency across the grid fell almost instantly to zero. We decided to disable logins grid-wide and restore network connectivity to Second Life as quickly as we could.

At this point we had a quick meeting with the various stakeholders, and agreed that since we were down already, the right thing to do was to press on and figure out what happened so that we could avoid it happening again. We got a hold of a few other folks to communicate with Residents via the status blog, social media, and forums, and I kept up with the internal communication within the Lab while the engineers debugged the issue.

This is why logins were disabled for several hours. We were determined to figure out what had happened and fix the issue, because we very much did not want it to happen again. We’ve engineered our network in a way that any piece can fail without any loss of connectivity, so we needed to dig into this failure to understand exactly what happened.

After almost four very intense hours of debugging, the team figured out what went wrong, worked around it, and finished up the migration to the new network gear. We reopened logins, monitored the grid as Residents returned, and went home in the middle of the night completely wiped out.

We’ve spent the rest of this week working with the manufacturer of our network gear to correct the problem, and doing lots of testing. We’ve been able to replicate the conditions that led to the network outage, and tested our equipment to make sure it won’t happen again. (Even they were perplexed at first! It was a very tricky issue.) As of the middle of the week we’ve been able to do a full set of tests, including deliberately disconnecting and shutting down a router, with no impact to the grid at all.

Second Life is a really complex distributed system, and it never fails to surprise me. This week was certainly no exception.

I also want to answer a question that’s been asked several times on the forums and other places this week. That question is “why didn’t LL tell us exactly when this maintenance was going to happen?”

As I’ve had to blog about several other times in the past, the sad reality is that there are people out there who would use that information with ill intent. For example, we’re usually really good at handling DDoSes, but doing so requires our full capacity to be online. A DDoS hitting at the same time our network maintenance was in progress would have made the downtime much longer than it already was.

We always want what’s best for Second Life. We love SL, too. We have to make careful decisions, even if it comes at the expense of being vague at times. I wish this wasn’t the case, but sadly, it very much is.

We’re really sorry about this week’s downtime. We did everything we possibly could have to try to avoid it, and yet it still happened. I feel terrible about that.

The week was pretty awful, but does have a great silver lining. Second Life is now up and running with new core routers that are much more powerful than anything we’ve had before, and we’ve had a chance to do a lot of failure testing. It’s been a rough week, but the grid is in better shape as a result.

Thanks for your patience as we recovered from this unexpected event. It’s been really encouraging to see the support some folks have been giving us since the outage. Thank you, you’ve really helped cheer a lot of us up. ❤️
 

Until the next time,
April Linden
Second Life Operations Manager

 

Linden Lab

Over seven years ago, I posted my first set of Viewer release notes to the Second Life Wiki, where we have kept all of our release notes to this day. Over the years, we’ve made some minor tweaks to the appearance and how we generate them, but for the most part they have remained the same.

While the wiki has served us well for release notes, it’s time to improve their readability and browsability. We’ve been putting the finishing touches on a new website dedicated solely to release notes, with a new look and feel that makes the individual pages easier to find and easier to read - take a look!

Previous release notes will still be archived on the wiki; however, new releases will be shared and published on the new website.

Our goal is to improve overall accessibility and ease in browsing and reviewing release notes. I, personally, am excited to see the dedicated new website and hope you are too!

Steven Linden  

Linden Lab

Due to continued changes in the Facebook API, as of today the Second Life viewer will no longer be able to support Facebook Connect for sharing your inworld photos and posts.  We apologize for this inconvenience and will be removing the UI from the viewer shortly. We will, of course, be happy to see your SL posts on Facebook going forward, and you can always say hello and check out what’s happening on our official page: https://www.facebook.com/secondlife

Linden Lab
Many Residents have noted that in the last few weeks we have had an increase in disconnects during a teleport. These occur when an avatar attempts to teleport to a new Region (or cross a Region boundary, which is handled similarly internally) and the teleport or Region crossing takes longer than usual.  Instead of arriving at the expected destination, the viewer disconnects with a message like:

Darn. You have been logged out of Second Life.
You have been disconnected from the region you were in.
We do not currently believe that this is specific to any viewer, and it can affect any pair of Regions (it seems to be a timing-sensitive failure in the hand-off between one simulator and the next).  There is no known workaround - please continue logging back in to get where you were going in the meantime.
We are very much aware of the problem, and have a crack team trying to track it down and correct it. They’re putting in long hours and exploring all the possibilities. Quite unfortunately, this problem dodged our usual monitors of the behavior of simulators in the Release Channels, and as a result we're also enhancing those monitors to prevent similar problems getting past us in the future.
We're sorry about this - we empathize with how disruptive it has been.
Linden Lab
There are a number of scripts available that animate the Group Role title for your avatar. While no doubt entertaining, these take advantage of functionality that was never intended to serve this purpose. Server-side changes which will break Group Tag Animators are going to be rolling out to the grid soon. This change is deliberate: it will sharply restrict the rate at which these updates are allowed. The new limits are generous enough that a human changing things through a viewer should not exceed them, but strict enough to break the 'title animators'.
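
For readers curious what "restricting the rate" typically means in practice, here is a simplified token-bucket sketch in Python; the actual server-side limits and mechanism are not public, and the numbers below are made up.

```python
# Simplified token-bucket rate limiter: occasional, human-speed title changes
# pass, while a script changing the title several times a second is cut off.
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_second = refill_per_second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False          # over the limit: the update is rejected


# Example: allow a small burst, then roughly five title changes per minute.
title_limiter = TokenBucket(capacity=5, refill_per_second=5 / 60)
```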

The current Marketplace products which do this are being removed and the sellers notified.

Image from xkcd
Linden Lab
You may have been among those who had problems seeing avatars yesterday; this is a quick note to share with you what went wrong and what comes next.
Early yesterday morning, we deployed an update to the backend service that creates the baked textures and other data that make up your avatar’s appearance. The time was chosen to be at a low point in concurrency because we knew the update would create unusual load (making all-new appearances for every active avatar). It turned out that the load was a little higher than we expected; that probably would have been OK, but we had two simultaneous system failures in the servers that provide some of the key data to the service. By the time we had diagnosed those failures, the backlog of work to be done had grown considerably and the number of active users had also increased. It took a few hours for the backlog of avatar appearances to get caught up.
We realize how disruptive events like these can be (our own inworld meetings were full of mostly-gray Lindens and clouds yesterday), and very much regret the inconvenience.
We're increasing the redundancy of these services and modifying our deploy procedures to avoid a repeat of this kind of failure in the future.
The good news is that this was one of the last backend changes needed for us to be able to roll out the new Bakes-on-Mesh feature on agni; it will, with the updated viewer, allow you to apply system clothing and skins on your avatar even when you're using mesh body parts. Look for the viewer release announcement soon.
        - Oz Linden
Linden Lab
Hello everyone,
The software which we use for keeping track of the bugs found in Second Life is long overdue for an upgrade.  If you've never interacted with the system before, you don't need to start now; however if you are one of the dedicated Residents who spends time helping us improve Second Life via https://jira.secondlife.com then this message is for you.
We are planning the upgrade on Wednesday, August 29, 2018, starting at 8:30 pm PDT.  We're allowing a 6-hour window to give us time to chase down any problems, though hopefully we will be done much more quickly.
If you are interested in what this upgrade actually means, the good news is that most of the changes should be improvements behind the scenes, or cosmetic upgrades.  The look and feel of the user interface will be changing, which isn't surprising since we are going from Jira version 5 to version 7. Importantly for some of you, the new login system updates the email address that Jira uses for you every time you log in, instead of only the very first time, as the current system does.
The very first change you will likely see is a new login page:

 

If you want to see more about the upgrade, there are a few links on the Atlassian (maker of Jira) website: here and here.
Mention this blog to me inworld and get a free Linden Bear!
Ekim Linden
Linden Lab
We are pleased to announce that our newest viewer update (5.1.0.511732 AlexIvy) is the first Linden Lab viewer to be built as a 64-bit application on both Windows and Mac. We'd like to send a shout out to the many third party viewer developers who helped with this important improvement!  For Windows users whose systems are not running 64-bit yet,  there is a 32-bit build available as well; you don't need to figure out which is best for your system because the viewer will do it for you (see below, especially about upgrading your system).
Building the viewer as a 64-bit application gives it access to much more memory than before, and in most cases improves performance as well. Users who have been running the Release Candidate builds have had significantly fewer crashes.
This viewer also has updates to media handling because we've updated the internal browser technology. This version will display web content better than before, and more improvements in that area are on the way. You may notice that this version runs more processes on your system for some media types; this is expected.
There is one other structural difference that you may notice. The viewer now has one additional executable - the SL_Launcher. This new component manages the viewer update process, and on Windows also ensures that you've got the best build for your system (in the future it may pick up some other responsibilities). For Windows systems, the best build is usually the one that matches your operating system. For example, if you're running a 64-bit Windows, then you’ll get the 64-bit viewer. If not, then you’ll get the 32-bit viewer. However, some older video cards are not supported by Windows 10, so the launcher may switch you to the 32-bit build, which is compatible with those cards. You won’t have to do anything to make this work - it's all automatic - if you get an update immediately the first time you run this new viewer, it's probably switching you to the better build for your system.
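
Conceptually, the launcher's choice boils down to checking the operating system's architecture plus any known compatibility exceptions. This is a hedged sketch of that idea only; the real SL_Launcher is more involved and is not written in Python.

```python
# Hypothetical sketch of choosing a viewer build based on the reported
# OS architecture; the real SL_Launcher logic is more involved.
import platform


def pick_build(gpu_needs_32bit_build: bool = False) -> str:
    is_64bit_os = platform.machine().endswith("64")   # e.g. "AMD64" on Windows
    if is_64bit_os and not gpu_needs_32bit_build:
        return "64-bit viewer"
    return "32-bit viewer"   # 32-bit OS, or a video card that needs the 32-bit build


print(pick_build())
```
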
Important: If you have created shortcuts to run the viewer, you should update them to run the SL_Launcher executable (if you don't, the viewer will complain when you run it, and updates won't work). On Macs, the SL_Launcher and Second Life Viewer processes both show as icons on the Dock when running (hover over them to see which is which); this is a known bug, and in a future update we'll fix it so they only show as a single icon - we apologize for the temporary inconvenience, but think you'll agree that the performance improvement (quite noticeable on most Macs) is worth it.
Having a 64-bit viewer will help to make your SL experience more reliable and performant (and we have quite a few projects in the queue for this year to that end). However, if you're running older versions of Windows, and especially if you're not running a 64-bit version, you won't be able to get most of those benefits. In our Release Candidate testing, users on 32-bit Windows are seeing crash rates as much as three times higher than those on 64-bit Windows 10. Almost any Windows system sold in the last several years can run 64-bit Windows 10, even if it didn't come with that OS originally. We strongly suggest that upgrading will be worth your while (this is true even if you run a Third Party Viewer, by the way).
About Linux … at this time, we don't have a Linux build for this updated viewer. We do have a project set up to get that back. We're reorganizing the Linux build so that instead of a tarball, it produces a Debian package you can install with the standard tools, and rather than statically linking all the libraries it will just declare what it needs through the standard package requirements mechanism. We'll post separately on the opensource-dev mailing list with information on where that project lives and how to contribute to it.
A fun bit of trivia: the AlexIvy name comes from LXIV, the Roman numerals for 64.
Best Regards,
Oz Linden
Linden Lab
Hi everyone! Mazidox here. I’d like to give you an overview of what happened on Wednesday (09/06) that ended up with some Residents’ objects being mass returned.
Two weeks ago, we had several problems crop up all at once - starting with a DNS server outage (a server that helps route requests between different parts of Second Life). Unfortunately, when the dust settled, we started seeing a disturbing trend: mass-returns of objects.
We diagnosed an issue where a region starts up with incorrect mesh Land Impact calculations, which could lead to a lot of objects getting returned at once, as we had encountered several months ago. At that time we applied what we call a speculative fix. A speculative fix means that while we can’t recreate the circumstances that led to a problem, we are still fairly confident that we can stop it from happening again. Unfortunately, in this case we were mistaken; because the fix we applied was speculative, the problem wasn’t fixed as completely as it could have been, and we found out how incomplete the fix was in a dramatic fashion that Wednesday night.
When a problem like this occurs with Second Life we have three priorities:
1. Stop the problem from getting worse
2. Fix the damage that has been done
3. Keep the problem from happening again
We had the first priority taken care of by the end of the initial outage; we could be certain at that point that our servers could talk to each other and there weren’t going to be any more mass-returns of objects that day. At that point, we started assessing the damage and figuring out how to fix as much as we could. In this case it turned out that restarting affected regions where no objects had been returned fixed the problem of some meshes showing the wrong Land Impact.
For regions where a mass-return had happened, there wasn’t a quick fix. Our Ops team managed to figure out a partial list of what regions were affected by a mass object return, which kept our Support team very busy with clean up. Once we had helped everyone we knew of who had experienced mass object returns, our focus shifted once more, this time to keeping the problem from happening again.
In order to recreate all the various factors that caused this object return we needed to first identify each contributing factor, and then put those pieces together in a test environment. Running tests and finding strange problems is the Server QA team’s specialty so we’ve been at it since the morning after this all happened. I have personally been working to reproduce this, along with help from our Engineering and Ops teams. We’re all focused on trying to put each of the pieces together to ensure that no one has to deal with a mass-return again.
Your local bug-hunting spraycan,
Mazidox Linden
Linden Lab
Recently, our bug reporting system (Jira) was hit with some spam reports and inappropriate comments, including offensive language and attempts at impersonating Lindens. The Jira system can email bug reporters when new comments are added to their reports, and so unfortunately the inappropriate comments also ended up in some Residents' inboxes.
We have cleaned up these messages, and continue to investigate ways to prevent this kind of spam in the future. We appreciate your understanding as we work to manage an open forum and mitigate incidents like this. 
In the short term, we have disabled some commenting features to prevent this from recurring. This means that you will not be able to comment on Jiras created by other Residents. We apologize for this inconvenience as we look into long term solutions to help prevent this type of event from occurring.




 
Xiola Linden
The AssetHttp Project Viewer is now available on the alternate viewers page!  We expect it to help with speed and reliability in fetching animations, gestures, sounds, avatar clothing, and avatar body parts - but we need your help to make sure everything works!
Historically, loading assets such as textures, animations, clothing items and so on has been an area where problems were common. All such items were requested through the simulator, which would then find the items on our asset hosts and retrieve them. Especially in heavily populated regions it was possible for things to fail to load because the simulators would get overloaded. Having these requests routed through an intermediate host also made them much slower than necessary. 
A few years ago we made the change to allow textures to be loaded directly via HTTP. Now instead of asking the simulator for every texture, the Viewer could fetch textures directly from a CDN, the same way web content is normally distributed. Performance improved and people noticed that the world loaded faster, and clouded or gray avatars became much less common. 
Today we are taking the next step in enabling HTTP-based fetching. If you download this test Viewer, then several other common types of assets will also be retrieved via HTTP. Supported asset types include animations and gestures, sounds, avatar clothing, and avatar body parts such as the skin and shape.
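
Conceptually, HTTP-based asset fetching is just an ordinary web request made directly to a content delivery network instead of going through the simulator. A hedged sketch follows; the viewer itself is C++, and the host and URL format below are made-up placeholders, not the real asset endpoints.

```python
# Hypothetical sketch of CDN-style asset fetching: the client asks an HTTP
# endpoint for the asset directly rather than routing through the simulator.
from urllib.request import urlopen

ASSET_BASE = "https://asset-cdn.example.com"   # placeholder host, not the real one


def fetch_asset(asset_type: str, asset_id: str) -> bytes:
    url = f"{ASSET_BASE}/{asset_type}/{asset_id}"   # placeholder URL scheme
    with urlopen(url, timeout=10) as response:
        return response.read()   # served from a CDN edge cache when possible


# e.g. fetch_asset("sound", "11111111-2222-3333-4444-555555555555")
```
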
Based on our tests, this change also helps with performance. We need your help to make sure it works for people all over the world, and to identify any remaining issues that we need to fix. Please download the Viewer and give it a spin; we hope it will make your Second Life even faster and more reliable.
Sometime after these changes go into the default Viewer download, we will be phasing out support on the simulators for the old non-HTTP asset fetching process. We will let you know well ahead of that time so you can download a supported Viewer, and so that makers of other viewers will have time to add these changes to their products. Please let us know how it goes! 
Linden Lab
As promised, we’re sharing some release note summaries of the fixes, tweaks, and other updates that we’re making to the Marketplace and the Web properties, so that those following along can read through at their leisure.
12/01/16 - Maps:  Maps would disappear at peak use times.  That’s fixed now.  
11/28/16 - We have a new shiny Grid Status blog! You may notice an updated look and feel. If you  followed https://community.secondlife.com/t5/Status-Grid/bg-p/status-blog, be sure to update your subscriptions to status.secondlifegrid.net
11/22/16 - No more slurl.com. All http://maps.secondlife.com/ all the time.
11/21/16 - We did a minor deploy to the lindenlab.com web properties.
11/09/16 - Events infrastructure stabilization to fix a few listing bugs.
11/08/16 - Fixes to maps.secondlife.com were released, including:


- Viewing a specific location on maps.secondlife.com no longer throws a 404 error in the console
- Adding a redirect from slurl.com/secondlife/ requests to maps.secondlife.com
11/04/16 - A minor Security fix was released.
11/03/16 - We released a large infrastructure update to secondlife.com along with security fixes and several minor bug fixes.
As always, we appreciate and welcome your bug reports in Jira!
Stay tuned to the blogs for future updates as we complete new releases.
Linden Lab
Hi everyone!
As many Residents saw, we had a pretty rough day on the Grid yesterday. I wanted to take a few minutes and explain what happened. All of the times in this blog post are going to be in Pacific Time, aka SLT.
Shortly after 10:30am, the master node of one of the central databases crashed. This is the same type of crash we’ve experienced before, and we handled it in the same way. We shut down a lot of services (including logins) so we could bring services back up in an orderly manner, and then promptly selected a new master and promoted it up the chain. This took roughly an hour, as it usually does.
A few minutes before 11:30am we started the process of restoring all services to the Grid. When we enabled logins, we did it in our usual method - turning on about half of the servers at once. Normally this works pretty well as a throttle, but in this case, we were well into a very busy part of the day. Demand to log in was very high, and the number of Residents trying to log in at once was more than the new master database node could handle.
Around noon we made the call to close off logins again and allow the system to cool off. While we were waiting for things to settle down we did some digging to try to figure out what was unique about this failure, and what we’ll need to do to prevent it next time.
We tried again at roughly 12:30pm, doing a third of the login hosts at a time, but this too was too much. We had to stop on that attempt and shut down all logins again around 1:00pm.
On our third attempt, which started once the system cooled down again, we took it really slowly, and brought up each login host one at a time. This worked, and everything was back to normal around 2:30pm.
My team is trying to figure out why we had to turn the login servers back on much more slowly than in the past. We’re still not sure. It’s a pretty interesting challenge, and solving hard problems is part of the fun of running Second Life.
Voice services also went down around this time, but for a completely unrelated reason. It was just bad luck and timing.
We did have one bright spot! Our status blog handled the load of thousands of Residents checking it all at once much better than before. We know it wasn’t perfect, but it showed much improvement over the last central database failure, and we’ll keep getting better.
My team takes the stability of Second Life very seriously, and we’re sorry about this outage. We now have a new challenging problem to solve, and we’re on it.
April Linden
Linden Lab
Hi! I’m a member of the Second Life Operations team. On Friday afternoon, major parts of Second Life had some unplanned downtime, and I want to take a few minutes to explain what happened.
Shortly before 4:15pm PDT/SLT last Friday (May 6th, 2016), the primary node for one of the central databases that drive Second Life crashed. The database node that crashed holds some of Second Life’s most central data, and a whole lot of things stop working when it’s inaccessible, as a lot of Residents saw.
When the primary node in this database is offline we turn off a bunch of services, so that we can bring the grid back up in a controlled manner by turning them back on one at a time.
My team quickly sprung into action, and we were able to promote one of the replica nodes up the chain to replace the primary node that had crashed. All services were fully restored and turned back on in just under an hour.
One additional (and totally unexpected) problem that came up is that for the first part of the outage, our status blog was inaccessible. Our support team uses our status blog to inform Residents of what’s going on when there are problems, and the amount of traffic it receives during an outage is pretty impressive!
A few weeks ago we moved our status blog to new servers. It can be really hard to tune a system for something like a status blog, because the traffic will go from its normal amount to many, many times that very suddenly. We can see that we have some additional tuning to do now that the status blog is in its new home. (Don’t forget that you can also follow us on Twitter at @SLGridStatus. It’s really handy when the status blog is inaccessible!)
As Landon Linden wrote a year ago, being around my team during an outage is like watching “a ballet in a war zone.” We work hard to restore Second Life services the moment they break, and this outage was no exception. It can be pretty crazy at times!
We’re really sorry for the unexpected downtime late last week. There’s a lot of fun things that happen inworld on Friday night, and the last thing we want is for technical issues to get in the way.

April Linden
Linden Lab
Quite some time ago, we introduced viewer changes that moved the fetching of most assets from using UDP through the simulator to HTTP through our content delivery network (CDN) to improve fetch times and save cycles on the simulator for simulation. At the time, we did not disable the UDP fetching path so that all users would have time to upgrade to newer viewer versions that fully support the new HTTP/CDN path. We warned that at some time in the future we would disable the older mechanism, and that time has come.

The simulator rolling to the BlueSteel and LeTigre channels this week removes support for UDP asset fetching. In the normal course of events that version should be grid-wide within a couple of weeks. Viewers that have not been updated to the newer (and faster) protocol will no longer be able to fetch many asset types, including:
 
- sounds
- landmarks
- clothing
- body parts
- animations
- gestures
- meshes

Since some specific body parts are required to render avatars, anyone on viewers that cannot load them will be a cloud or “Ruth” avatar and unable to change from it.
All current official viewer versions are able to load assets normally. As far as we are aware, all actively maintained Third Party Viewers have had support for the HTTP/CDN asset fetching for many releases, so no matter what viewer you prefer there should be an upgrade available for you.
Regards,
Oz Linden
Linden Lab
Residents on our Release Candidate Regions got an extra bonus Region restart today. We apologize if this extra restart disrupted your Second Life fun, so we want to explain what happened.

The Release Candidate channels exist so that we can try new server versions under live conditions to discover problems that our extensive internal testing and trials on the Beta Grid don't uncover. Unfortunately, it simply isn't possible for us to simulate the tremendous variety of content and activities that are found on the main Grid. We appreciate that Region owners are willing to be a part of that process, and regret those occasions when a bug gets past us and disrupts those Regions.
Normally, Region state is saved periodically many times a day as well as when the Region is being shut down for a restart. The most recent Region state is restored when a Region restarts.
The extra roll today was needed because we found a problem that could have caused long-running Regions to fail to save that state.  Without the roll, there would be a significant chance that changes made to those regions might not be there following the regularly scheduled roll because the save data would be out of date. Good news! It’s been fixed, and today's roll applies that fix.  The roll took a little longer than usual because we took extra care to ensure that the Region saves would work normally for this roll.
We apologize for any disruption to your Second Life today, but at least you can rest assured that  Second Life saves!
Linden Lab
Hello amazing Residents of Second Life!
A few days ago (on Sunday, October 28th, 2018) we had a really rough day on the grid. For a few hours it was nearly impossible to be connected to Second Life at all, and this repeated several times during the day.
The reason this happened is that Second Life was being DDoSed.
Attacks of this type are pretty common. We’re able to handle nearly all of them without any Resident-visible impact to the grid, but the attacks on Sunday were particularly severe. The folks who were on call this weekend did their best to keep the grid stable during this time, and I’m grateful they did.
Sunday is our busiest day in Second Life each week, and we know there are lots of events folks plan during it. We’re sorry those plans got interrupted. Like most of y’all, I too have an active life as a Resident, and my group had to work around the downtime as well. It was super frustrating.
As always, the place to stay informed of what’s going on is the Second Life Grid Status Blog. We do our best to keep it updated during periods of trouble on the grid.
Thanks for listening. I’ll see you inworld!
April Linden
Second Life Operations Team Lead
Linden Lab
Hi everyone.
As I’m sure most of y’all have noticed, Second Life has had a rough 24 hours. We’re experiencing outages unlike any in recent history, and I wanted to take a moment and explain what’s going on.
The grid is currently undergoing a large DDoS (Distributed Denial of Service) attack. Second Life being hit with a DDoS attack is pretty routine. It happens quite a bit, and we’re good at handling it without a large number of Residents noticing. However, the current DDoS attacks are at a level that we rarely see, and are impacting the entire grid at once.
My team (the Second Life Operations Team) is working as hard as we can to mitigate these attacks. We’ve had people working round-the-clock since they started, and will continue to do so until they settle down. (I had a very late night, myself!)
Second Life is not the only Internet service that’s been targeted today. My sister and brother opsen at other companies across the country are fighting the same battle we are. It’s been a rough few days on much of the Internet.
We’re really sorry that access to Second Life has been so sporadic over the last day. Trying to combat these attacks has the full attention of my team, and we’re working as hard as we can on it. We’ll keep posting on the Second Life Status Blog as we have new updates.
See you inworld!
April Linden
Second Life Operations Team Lead
Linden Lab
Things were a little bumpy for users who tried to log into Second Life on Monday morning as a result of a scheduled code deploy. I wanted to share with you what happened, and what we're going to do to try to prevent this in the future.
That morning, I attempted to deploy a database change to an internal service. Without going into too much detail, the deploy was to modify an existing database table in order to add an extra column. These changes had been reviewed multiple times, had passed the relevant QA tests in our development and staging environments, and had met all criteria for a production deploy. Although this service isn't directly exposed to end-users, it is used as part of the login process and it is designed to fail open, i.e. if the service is unavailable, users should still be able to log in to Second Life without a problem.
During the database change, the table being altered was locked to prevent changes to it while it was being altered. This table turned out to be almost a billion rows in size, and the alteration took significantly longer than expected. Furthermore, the service did not fail open as designed, and caused logins to Second Life to fail, along with a handful of other ancillary services. Our investigation was further complicated by other problems seen on the Internet on Monday due to a configuration issue at one of the big ISPs in North America. Many of us work remotely, and while we saw problems early on, it wasn't immediately clear to us that the problem was internal rather than one caused by a third-party service. After some investigation, the lock on the database was removed, and services slowly began to recover. We did have to do some additional work to restore the login service, as the Next Generation login servers (as described by April here) are not yet fully deployed.
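
For readers wondering what "fail open" means here: the login path calls the ancillary service with a short timeout, and if that call fails, login proceeds without it. A hedged sketch with illustrative names only (the real internal service and endpoint are not public):

```python
# Hypothetical sketch of a fail-open call to a non-critical service during login.
# If the service is down or slow, we log the problem and let the login continue.
import logging
from urllib.error import URLError
from urllib.request import urlopen

log = logging.getLogger("login")


def fetch_optional_data(user_id: str) -> bytes:
    url = f"https://internal.example.com/extra/{user_id}"   # placeholder endpoint
    try:
        with urlopen(url, timeout=2) as response:    # short timeout is essential
            return response.read()
    except (URLError, OSError) as exc:
        log.warning("ancillary service unavailable, failing open: %s", exc)
        return b""                                   # login proceeds without it
```
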
I'm still looking to complete this deploy in the near future, but this time we'll be using another method which doesn't require locking the database tables, and won't cause a similar problem. We're also investigating exactly why the service didn't fail open as it was designed to, and how we can prevent it from happening in the future. 
Steven Linden
Linden Lab
Heya! April Linden here.
We had a pretty rough morning here at the Lab, and I want to tell you what happened.
Early this morning (during the grid roll, but it was just a coincidence) we had a piece of hardware die on our internal network. When this piece of hardware died, it made it very difficult for the servers on the grid to figure out how to convert a human-readable domain name, like www.secondlife.com, into an IP address, like 216.82.8.56.
Everything was still up and running, but none of the computers could actually find each other on our network, so activity on the grid ground to a halt. The Second Life grid is a huge collection of computers, and if they can’t find each other, things like switching regions, teleports, accessing your inventory, changing outfits, and even chatting fail. This caused a lot of Residents to try to relog.
We quickly rushed to get the hardware that died replaced, but hardware takes time - and in this case, it was a couple of hours. It was very eerie watching our grid monitors. At one point the “Logins Per Minute” metric was reading “1,” and the “Percentage of Successful Teleports” was reading “2%.” I hope to never see numbers like this again.
Once the failed hardware was replaced, the grid started to come back to life.
Following the hardware failure, the login servers got into a really unusual state. The login server would tell the Resident’s viewer that the login was unsuccessful, but it was telling the grid itself that the Resident had logged in. This mismatch in communication made finding what was going on really difficult, because it looked like Residents were logging in, when really they weren't. We eventually found the thing on the login servers that wasn’t working right following the hardware failure, and corrected it, and at this point the grid returned to normal.
There is some good news to share! We are currently in the middle of testing our next generation login servers, which have been specifically designed to better withstand this type of failure. We’ve had a few of the next generation login servers in the pool for the last few days just to see how they handle actual Resident traffic, and they held up really well! In fact, we think the only reason Residents were able to log in at all during this outage was because they happened to get really lucky and got randomly assigned to one of the next generation login servers that we’re testing.
The next step for us is to finish up testing the next generation login servers and have them take over for all login requests entirely. (Hopefully soon!)
We’re really sorry about the downtime today. This one was a doozy, and recovering from it was interesting, to say the least. My team takes the health and stability of Second Life really seriously, and we’re all a little worn out this afternoon.
Your friendly long eared GridBun,
April Linden
Linden Lab
Heya! April Linden here.
Yesterday afternoon (San Francisco time) all of the Place Pages got reset back to their default state. All customizations have been lost. We know this is really frustrating, and I want to explain what happened.
A script that our developers use to reset their development databases back to a clean state was accidentally run against the production database. It was completely human error. Worse, none of the backups we had of the database actually worked, leaving us unable to restore it. After a few frantic hours yesterday trying to figure out if we had any way to get the data back, we decided the best thing to do was just to leave it alone, but in a totally clean state.
An unfortunate side effect of this accident is that all of the web addresses to existing Place Pages will change. There is a unique identifier in the address that points to the parcel that the Place Page is for, and without the database, we’re unable to link the address to the parcel. (A new one will automatically be generated the first time a Place Page is visited.) If you have bookmarks to any Place Pages in your browser, or on social media sites, they'll have to be updated.
Because of this accident, we’re taking a look at the procedures we already have to make sure this sort of mistake doesn’t happen again. We’re also doing an audit of all of our database backups to make sure they’re working like we expect them to.
I’d like to stress that we’re really sorry this accident occurred. I personally had a bunch of Place Pages I’d created, so I’m right in there with everyone else in being sad. (But I’m determined to rebuild them!)
Since we’re on the topic of human error, I’d like to share with you a neat piece of the culture we have here at the Lab.
We encourage people to take risks and push the limits of what we think is possible with technology and virtual worlds. It helps keep us flexible and innovative. However… sometimes things don’t work out the way they were planned, and things break. What we do for penance is what makes us unique.
Around the offices (and inworld!) we have sets of oversized green ears. If a Linden breaks the grid, they may, if they choose to, wear the Shrek Ears as a way of owning their mistake.
If we see a fellow Linden wearing the Shrek Ears, we all know they’ve fessed up, and they’re owning their mistake. Rather than tease them, we try to be supportive. They’re having a bad day as it is, and it’s a sign that someone could use a little bit of niceness in their life.
At the end of the day, the Linden takes off the Shrek Ears, and we move on. It’s now in the past, and it’s time to learn from our mistakes and focus on the future.
There are people wearing Shrek Ears around the office and inworld today. If you see a Linden wearing them, please know that’s their way of saying sorry, and they’re really having a bad day.

Baloo Linden and April Linden, in the Ops team’s inworld headquarters, the Port of Ops.
Linden Lab
Heya!
April Linden here. I’m a member of the Second Life Operations team. Second Life had some unexpected downtime on Monday morning, and I wanted to take a few minutes to explain what happened.
We buy bandwidth to our data centers from several providers. On Monday morning, one of those providers had a hardware failure on the link that connects Second Life to the Internet. This is a fairly normal thing to happen (and is why we have more than one Internet provider). This time was a bit unusual, as the traffic from our Residents on that provider did not automatically spill over to one of the other connections, as it usually does.
Our ops team caught this almost immediately and were able to shift traffic around to the other providers, but not before a whole bunch of Residents had been logged out due to Second Life being unreachable.
Since a bunch of Residents were unexpectedly logged out, they all tried to log back in at once. This rush of logins was very high, and it took quite a while for everyone to get logged back in. Our ops team brought some additional login servers online to help with the backlog of login attempts, and this allowed the login queue to eventually return to its normal size.
Some time after the login rush was complete, the failed Internet provider connection was restored, and traffic shifted around normally, without disruption, returning Second Life back to normal.
There was a bright spot in this event! Our new status blog performed very well, allowing our support team to be able to communicate with Residents, even in a state where it was under much higher load than normal.
We’re very sorry for the unexpected downtime on Monday morning. We know how important having a fun time Inworld is to our Residents, and we know how unfun events like this can be.
See you Inworld!
April Linden
Linden Lab
There's been a lot going on with the Marketplace and our Web properties, and in an effort to give you a more granular view into what we're working on, we're going to put out release notes summaries on this blog going forward. Of course, some things will have to remain behind the scenes, but here's all the news that's fit to print:
10/31/16 New Premium Landing page
10/28/16 Several bug fixes to the support portal support.secondlife.com
10/24/16 We made an update to the Marketplace with the following changes:
- Fix sorting reviews by rating
- Fix duplicate charging for PLE subscriptions
- Fix some remaining hangers-on from the VMM migration (unassociated items dropdown + “Your store has been migrated” notifications)
- Fix to Boolean search giving overly broad results (BUG-37730)
10/18/16 Maps: We deployed a fix for “Create Your Own Map” link, which used to generate an invalid slurl.
10/11/16 Marketplace: We disabled fuzzy matches in search on the Marketplace so that search results will be more precise.
10/10/16 We made an update to the Marketplace with the following changes:
- We will no longer index archived listings
- We will now reindex a store's products when the store is renamed
- We made it so that blocked users can no longer send gifts through the Marketplace
- We added a switch to allow us to enable or disable fuzzy matches in search
9/28/16 We deployed a fix to the Marketplace for an issue where a Firefox update was ignoring browser-specific style sheet settings on Marketplace.
9/22/16 We made a change to the Join flow for more consistency in password requirements.
9/22/16 We updated System Requirements to reflect the newest information.
As always, we appreciate and welcome your bug reports in Jira! Please stay tuned to the blogs for updates as we complete new releases.
Linden Lab
Linden Lab is excited to announce the latest updates on one of our coolest new features — mesh. With mesh, you’ll be able to create, build and beautify even more incredible objects inworld! You’ll find a new Mesh Enablement functionality tab in the My Account section of your Account screen. It’s just another part of the innovation and imagination that makes Second Life the magical place it is.

Uploading Mesh

Now that mesh is in rollout stage and many users will soon be able to utilize its content-creation capabilities, there are a few things creators should know before giving it a whirl. Potential creators will need to satisfy a couple of requirements in order to gain access to mesh-upload capabilities. First, you’ll need to review a short tutorial that is intended to help Residents understand some of Linden Lab’s intellectual property policies — they can be kind of tricky, and we at the Lab want all content creators to be well-informed. This tutorial outlines some of the key points relating to intellectual property.

Second, if you have never given us billing information, you'll need to enter it into your account. Why, you ask? Because having payment information on file is an important step in establishing direct relationships with content creators who will be working in mesh. Note: We do realize that with the current configuration, some Residents will need to make a purchase in order to enter their billing information.

The first round of rollouts of mesh will be happening over the next few weeks! We look forward to seeing your continued imagination, skill and creativity in this exciting new format. We hope you’ll be pleased with our innovations — and can’t wait to see what you come up with!

Now go create something amazing!

Linden Lab