Why the Friday Grid Roll?
Hi! I wanted to take a moment to share why we had to do a full grid roll on a Friday. We know that Friday grid rolls are super disruptive, and we felt it was important to explain why this one was timed the way it was.
Second Life runs on a collection of thousands of Linux servers, which we call the “grid.” This week a critical security warning was issued for glibc, one of the core system libraries we use on our version of Linux. This security vulnerability is known as CVE-2015-7547.
Since then we’ve been working around the clock to make sure Second Life is secure.
The issue came to light on Tuesday morning, and the various Linux distributions made patches available shortly afterwards. Our security team quickly took a look at it and assessed the impact it might have on the grid. Shortly after lunchtime on Tuesday, they determined that under certain conditions this might affect Second Life, so we sprang into action to get the grid fully patched.
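For readers who run their own Linux machines, here is a minimal sketch of the kind of first-pass check one might do for this issue. It is not the process our security team used. It assumes the publicly documented upstream range for CVE-2015-7547 (glibc 2.9 up to, but not including, 2.23), and the function names are made up for illustration. Distributions usually backport fixes without changing the upstream version number, so a version inside the range only means "check your distribution's advisory," not "definitely vulnerable."

```python
import os
import re

# Upstream glibc releases affected by CVE-2015-7547: 2.9 up to, but not
# including, 2.23, the release that contains the fix.
FIRST_AFFECTED = (2, 9)
FIRST_FIXED = (2, 23)


def parse_version(text):
    """Extract (major, minor) from a string such as 'glibc 2.19'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def in_affected_upstream_range(version_text):
    """Return True if the version lies in the affected upstream range.

    Hypothetical helper: a True result does not prove the system is
    vulnerable, because distribution packages may carry a backported fix.
    """
    return FIRST_AFFECTED <= parse_version(version_text) < FIRST_FIXED


if __name__ == "__main__":
    try:
        # Available on Linux systems that use glibc; may be missing elsewhere.
        local = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        local = None

    if local is None:
        print("Could not determine the local glibc version.")
    elif in_affected_upstream_range(local):
        print(f"{local}: in the affected upstream range; check for a vendor patch.")
    else:
        print(f"{local}: outside the affected upstream range.")
```

Whatever the result, the package changelog or security advisory from the distribution is the authoritative source on whether a given machine is patched.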
The security team then handed the issue over to the Operations team, who worked to make the updates needed to the machine images we use. They finished in the middle of the night on Tuesday (which was actually early Wednesday morning).
Once the updates were available, the development and release teams sprang into action and pulled the updates into the Second Life Server machine image. It took until Wednesday afternoon to get the Second Life Server code built and tested, and for the security team to confirm that any potential risk had been taken care of.
After this, the updates were sent to the Quality Assurance (QA) team to make sure that Second Life still functioned as it should, and they finished up in the middle of the night on Wednesday.
At this point we had a decision to make: should we roll the code to the full grid at once? Since the updates were to one of the most fundamental libraries on the system, we chose to be extra careful and roll the updates to the Release Candidate (RC) channels first. That happened on Thursday morning.
We took Thursday to watch the RC channels and make sure they were still performing well, and then went ahead and rolled the security update to the rest of the grid on Friday.
Just to make it clear, we saw no evidence that there was any attempt to use this security issue against Second Life. It was our mission to make sure it stayed that way!
The reason there was little notice for the roll on Thursday is twofold. First, we were moving very quickly. Second, because the roll was to mitigate a security issue, we didn’t want to tip our hand and show what was going on until after the issue had been fully resolved.
We know how disruptive full grid rolls are, and we know how busy Friday is for Residents inworld. The timing was terrible, but we felt it was important to get the security update on the full grid as quickly as we could.
Thank you for your patience, and we’re sorry for the bumpy ride on a Friday.