Blocking 2.0.1 Logins Today

by Linden on ‎08-30-2010 03:07 PM

This afternoon, we’ll be blocking logins for the 2.0.1 version of Viewer 2. All users still running a 2.0.1 Viewer will be prompted to upgrade to 2.1.1. Our developers have made numerous stability and performance improvements that make the 2.1.1 release a much more reliable Viewer.

One of the notable changes in 2.1.1 is that we’ve defaulted HTTP Textures to ON for all users. HTTP Textures is a feature that allows us to transmit textures from SL over HTTP instead of UDP. This should make textures load much faster.

In the event you have any issues with HTTP Textures, you can turn it back off by enabling the Develop menu in your Viewer (CTRL-ALT-Q), then clicking Develop > ‘HTTP Textures’ to deselect it.

For a more detailed look at what’s in the 2.1.1 release, check out the release notes.

And, let us know what you think on Twitter using #slviewer2.

If you've been following these blogs, you've seen that we're making some major changes to how we work. In short: we're aiming to fix more bugs, faster, with more visible process. And if we're aiming to fix more bugs, we should fix our bug-tracking. But first, a quick introduction for those new to the topic.

What is the Second Life Bug-Tracker?

At Linden Lab we use the JIRA bug-tracking software from Atlassian to collect, organise and track bugs for fixing. Residents can file new bugs for our attention at jira.secondlife.com. Our engineering teams review submitted bugs and prioritise them for fixing in a process known as triage.

What kinds of bugs go into the bug-tracker?

This is a topic that confuses a lot of people, so here's a handy guide: If you've got a sudden problem that's affecting you, and you're not sure whether it's related to your account or your software, then please go to Support, not the bug-tracker. Our Support folks can help diagnose the problem and may be able to fix it quickly. If the problem is obviously a software defect that may be affecting a large number of people, it goes in the bug-tracker. So, a problem like "I can't log in and I don't know why" is a Support issue. A problem like "The 'Build' button flips upside-down whenever there's a hippo on the screen" is a bug.

That's what our bug-tracker's about. However, there's a lot about its current setup, and the way we use it, that we want to change and improve. Here are our priorities:

Triage, prioritise and fix more bugs, more effectively

Top of the list, because that's what bug-tracking's for.

A better experience for those who want to use JIRA

The current UI leaves a lot of room for improvement. We want Fast and Easy. (Fun may be a tall order, but we're open to suggestions.)

A simpler experience for those who don't

JIRA is a powerful and complex tool for those who want to go deep into the software development process. These people are in the minority. The default method of bug reporting should be far less intimidating and directly integrated into our customer support process.

More project transparency

Project teams that work in the open, such as Snowstorm, should have project management tools that those outside the Lab can follow. Everyone should be able to see the state of the current sprint, the arrangement of the project's backlog and the items that are scheduled against each upcoming release.

Better progress notification

Issue statuses have frequently not reflected the progress we're making internally. This needs to improve so that our customers can see us responding to their bug reports.

One JIRA for all

Up until now we've maintained two separate bug-trackers, external and internal, and we've imported issues between them. It's not worked that well; among other things, it's caused the issue status problem above. The two bug-trackers will be merged into one. (This won't mean that all our work will be publicly-visible; some types of issues, such as security or customer-support-related issues, need to be kept internal.)

Easier issue workflow

The workflow stages through which a bug moves should better reflect our working process. Issue statuses and resolutions will have clearer names. For example, "Needs More Info" is not a resolution; it's an intermediate state waiting on action from the reporter.
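
To make that distinction concrete, here's a minimal Python sketch (with hypothetical status and resolution names drawn from the examples in this post, not our actual JIRA configuration) of how an intermediate state differs from a resolution:

    from enum import Enum

    class Status(Enum):
        OPEN = "Open"
        NEEDS_MORE_INFO = "Needs More Info"  # intermediate: waiting on the reporter
        IN_PROGRESS = "In Progress"
        CLOSED = "Closed"

    class Resolution(Enum):
        FIXED = "Fixed"
        WONT_FIX = "Won't Fix"

    def resolve(status, resolution):
        # A resolution only applies once an issue reaches a terminal
        # status; "Needs More Info" is just a stop along the way.
        if status is not Status.CLOSED:
            raise ValueError("only closed issues carry a resolution")
        return resolution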

Now that we've listed our priorities, here's how we're going to address them.

Tomorrow morning we'll roll out some changes to the default workflow used in our main projects. Some status and resolution names will change, and existing issues will be updated to the new statuses. For a full description of the changes we're making, see this wiki page. The rollout will require some downtime, which will be tracked on the Second Life status blog.

In September, we'll be migrating our JIRA setup to new hosting and upgrading to the latest version. With JIRA 4.1, we'll get an interface that's easier to read, faster to navigate, and more responsive. We'll also be installing Greenhopper, the Scrum management tool for JIRA, which open projects such as Snowstorm can use to publish backlogs and organise sprints. JIRA will no longer have a login screen separate from the main secondlife.com site; logging into one also logs you into the other. This migration is planned for September 6th and will likely take most of the day, during which the bug-tracker will be read-only; you'll be able to look at issues, but not add or change them. If all goes to plan, the new system will be live on September 7th.

There'll still be work to do after this. Migrating internal work to the new JIRA will take a while. Some workflows or other processes will likely need tweaking as we settle in. On top of that, there's the ongoing process of helping our engineering teams be more transparent and responsive in their daily work while still giving them time to get the engineering done.

Down the road, we are looking to integrate our support case system and JIRA. This integration will allow Support to associate bug-related support cases with open JIRA issues. The Resident who submitted the case will be able to click through to the JIRA issue, and their case will automatically be updated as the JIRA status changes. We're also looking at ways of making bug reporting easier for everyone so that only those who want to navigate JIRA do so.

The proof of the bug-tracker, of course, is in the bug fixing. It's also in the two-way communication that enables you to see what we're working on, and tell us about other things we should be looking at. We deeply appreciate the effort you put in to filing detailed bug reports and we work to reward that effort with higher-quality software. Along the way, we want to keep all of you up-to-date on what we're doing with those reports and why we're doing it.

We thank everyone who has contributed to our bug-tracker so far; it's been incredibly valuable. You'll be hearing more from us, in many more ways, soon.

Where We’ve Been

Almost two years ago, we set about revamping the Second Life Viewer, and in March of this year we released Viewer 2. Over that two-year period, we took a heads-down approach to our design and development process to create a Viewer that would be easier for new Residents to use. This heads-down approach meant we had very limited contact with you, and left many Residents feeling alienated. Now, we are making some big changes to better communicate with you and include you in our development process. Specifically, we’re beginning a new open-source program -- known as Project Snowstorm -- that will show you exactly what we’re working on and will also start to bring Resident contributions into our mainline Viewer build. We’re extremely excited to be firing up this program, and we’re confident it will lead to a better Viewer, one that benefits from the tremendous talent and creativity we’ve seen from the most committed members of our development community.

Fast, Easy, Fun!

As part of Linden Lab’s recent reorganization, we’ve taken a hard look at the way we work together, the way we build software, and the way we interact with the open-source community. We’ve got a lot of improvements we want to make to the Viewer 2 user experience. Some of the Viewer’s workflows are cumbersome for some Residents and this has hurt Viewer adoption. We really can improve the Second Life user experience by rethinking the way our Viewer works and making it (and its features and functionality) faster, easier, and more fun for everyone. But we can’t do it without your help.

Where We’re Headed - Project Snowstorm

As we prepare to make Viewer 2.1.1 the mainline release, I’m really excited to introduce you to Project Snowstorm and the Snowstorm team, who will be working on the Second Life Viewer in the open and in a way that directly engages you.

Here are our goals:

Show Residents continuous visible progress

  • Work in the open by sharing not only our code, but our process publicly -- this includes our backlog and our discussion about it.
  • Engage with the open source community and aggressively accept contributions from the community into the Second Life Viewer.
  • Release new ‘Development’ Viewers frequently -- our initial target is bi-weekly. All builds from the ‘Development’ branch are visible and available for testing.

Improve the user experience 

  • Make continuous improvements to the design and implementation of the Viewer’s user interface.
  • Import desirable patches and features from Snowglobe and other Third Party Viewers.
  • Add small features and fixes that have high value and low cost, while still remaining consistent with an overall product vision.

Renew and deepen our relationship with the community

  • Integrate community work directly into the main Viewer rather than routing it through Snowglobe first.
  • Demonstrate rapid responsiveness to feedback and patches from the community.
  • Engage continuously with the community to develop new project proposals and provide resources that open source developers need to be effective.

I should note that it’s not just the Snowstorm team who are working on the Viewer. Several teams throughout the Lab are contributing features and bug fixes to the Viewer -- those teams will also be moving to a model where we work more closely with our Residents. The Snowstorm team will be focused on rapid iteration and constant improvement, while working closely with the open source community and sharing everything we do.

How’s that going to work, Esbee?

I’m glad you asked! Linden Lab has adopted the Scrum framework as a way of allowing our teams to work quickly and feel empowered to introduce new features and functionality to Second Life.

At the heart of our process is the Snowstorm Team Backlog. This is an ever-evolving ranked list of user stories that describe things we’d like to do with Viewer 2. Every team at Linden Lab has a backlog like this, and ours will be visible to you. If you look at the list and want to suggest something we should add or change, or if you just have questions, please let me know!

As a team, we’ll be gathering every two weeks to pick items off our Backlog to work on in our next Sprint. A Sprint is a development cycle where teams create tasks to fulfill a series of user stories and work to design, implement, and test those stories during that cycle. Snowstorm Team Sprints will last two weeks. Each day, the team will gather inworld for a Daily Scrum, where each team member will give a short (2-minute) status update. We’ll publish that status update on the Wiki after our Daily Scrum as well.

We’ll also be encouraging open source developers to work with us on Backlog items - or you’re welcome to propose ideas too! Oz Linden will be posting information about how to propose project ideas on the Wiki. Open source developers will be treated as any other team member and invited to our Daily Scrum to share their status as well.

As the Snowstorm team works with you to make changes to the Second Life Viewer with a focus on improving overall user experience, teams in the Lab will also be working on the Viewer. We’ll be sharing our Viewer Roadmap shortly so you can get an idea of all the work being considered for the Viewer this year, not just from the Snowstorm Team.

The Snowstorm team will be blogging at least weekly about their work, sharing their successes, failures, challenges, and ideas with you. We’ll be holding public meetings and sharing our design ideas and all of our documentation.

Who’s on the Snowstorm Team?

As I mentioned above, many teams across the Lab are working on the Viewer. The Snowstorm team will be managing the Development branch, coordinating contributions from the open source community, and will also contribute rapid feature development.

The team consists of:

  • Q Linden - Tech Lead, Team Lead
  • Esbee Linden - Product Lead, Business Lead
  • Oz Linden - Open Source Lead
  • Merov, Aimee, Tofu, Paul, Andrew, Vadim, Anya - Engineers
  • Open Source Community

Where can I learn more?

Snowstorm operates in the open; the home page of the Snowstorm team is on the Snowstorm Wiki page. It has pointers to our various communication channels, processes, and contact information.

The Snowstorm Team presented at the Second Life Community Convention on Aug 15. You can watch the recorded presentation here.

Let us know what you think!

Do you have questions about what we’re doing, where we’re headed, how we work, our Backlog, processes, etc.? The Snowstorm Team would love to hear your thoughts and feedback. Please feel free to respond to this post, tweet using the hashtag #snowstorm, reply to @snowstormsl, email us, or come to one of our weekly open source meetings.

Thanks!
Esbee Linden, Q Linden & Oz Linden

It’s been a summer of major change here at the Lab, and I wanted to give you a quick update on how we’re tackling a top priority -- platform enhancements that improve grid performance.

An Introduction to HTTP Assets
Last week, we enabled grid-wide changes that represent an important first step towards moving asset requests out of the simulator path and distributing certain assets (like textures) to an "edge" infrastructure (like a Content Distribution Network, or CDN). Internally, the project is called HTTP Assets and includes a series of initiatives intended to optimize how we manage, distribute, and store assets on the grid, enabling higher performance, better reliability, and faster rez times.

The HTTP Assets implementation accomplished two important objectives. The first was to move texture requests from UDP to TCP (via HTTP). Using TCP increases the reliability of delivery of those asset requests; UDP is not designed as a reliable messaging protocol and is used where dropping packets is preferable to waiting for all packets to arrive. The second was to reduce the need to have the simulator directly managing asset requests. In the past, the simulator would handle all requests from the viewer back into the inventory databases and asset complex, then back to the viewer. Needless to say, this is not the most efficient or fastest way to deliver those assets to the viewer. Now, the simulator will only be queried to provide a redirect, so that the viewer can fetch the asset directly from a front-end web proxy to the asset system servers. Ultimately, the simulator will be completely removed from this path and the viewer will query the asset proxy server directly. If you got a little lost, here’s a visual of what the asset request process now looks like, in two steps:

Step 1: the viewer requests the asset from the simulator, which responds with a redirect.

Step 2: the viewer follows the redirect and fetches the asset directly from the front-end web proxy.
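
For the technically curious, here's a rough sketch in Python of that two-step flow. The capability URL and parameter name here are made up for illustration; the real endpoints are internal to the grid and handed to the viewer by the simulator:

    import urllib.request

    # Hypothetical capability URL -- illustrative only, not a real endpoint.
    GET_TEXTURE_CAP = "https://sim.example.com/cap/GetTexture"

    def fetch_texture(texture_id):
        # Step 1: the request goes to the simulator, which no longer serves
        # the bytes itself; it answers with an HTTP redirect to the
        # front-end web proxy for the asset system.
        # Step 2: urllib follows that redirect automatically and downloads
        # the texture directly from the proxy over TCP.
        url = "%s?texture_id=%s" % (GET_TEXTURE_CAP, texture_id)
        with urllib.request.urlopen(url) as response:
            return response.read()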

OK, So Why Does This Matter?
Well, from a performance standpoint (translation: lag), you should begin to see better texture download times. Reducing our dependency on the simulator for these requests and eventually directing them to a services layer will also improve the reliability of those requests. Keep your eyes peeled for a blog post in a few weeks from Xan Linden, who runs the Systems Analysis team, that will quantify those performance improvements with hard data. He's been a Linden for a long time and is a wealth of information on the history of our codebase and infrastructure.

One More Big Performance Improvement Now Available in Viewer 2.1
Here’s an interesting fact. Did you know that textures and objects represent about 90% of downloads to the Viewer? That means that any Viewer bugs that hamper the performance of textures and objects make a big difference when it comes to lag.

I have good news to report. We found and corrected a bug in the latest version of Viewer 2.1. (This particular bug has been present for a while and also exists in the 1.23 code base.) The bug was related to object retrieval, and was causing object requests to bypass the cache and constantly make requests for those objects to the asset system. Generally, most Residents frequent the same regions, and the objects in those regions are cached locally so that they can be downloaded very quickly, improving overall rez time. It’s only when objects are changed (not very frequently), or when you visit different regions, that a request for objects is made back to the asset system. So, download Viewer 2.1 and you should notice an appreciable improvement in performance.
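
In other words, the fix restores a classic cache-first lookup. Here's a simplified Python sketch of the intended behavior (the names are illustrative, not actual viewer code; the real cache lives on disk and is invalidated when objects change):

    # Illustrative cache-first object retrieval, not the viewer's code.
    object_cache = {}

    def get_object(object_id, fetch_from_asset_system):
        cached = object_cache.get(object_id)
        if cached is not None:
            # Cache hit: no round trip to the asset system. The bug
            # described above effectively skipped this branch,
            # re-requesting objects the viewer already had.
            return cached
        obj = fetch_from_asset_system(object_id)  # the slow network path
        object_cache[object_id] = obj
        return obj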

More Goodness on the Horizon
Soon, you’ll be hearing from Xan on performance data; from Jack, who will give us his perspective on HTTP textures from more of a product angle; and from Vogt, with an update from Support.

My team and I look forward to answering your questions and hearing if you feel the impact of our recent changes.

FJ Linden

I'm feeling a little A.A. Milne this morning, so forgive the title.

Early this week, the additional relevancy improvements associated with last week's release finished rolling out. The index update took too long and we've made changing that a priority. It's a huge frustration that we share with Residents.

There were a couple of high-profile fixes in the last update. First and foremost, we have resolved WEB-2378 (Web and Inworld Search results are being mixed). This one was affecting relevancy for a number of keywords and we're glad to get the fix out! Next, we have WEB-1819 (Find is resetting to start page). Having a way to easily get back to a previous Search is central to basic usability. The Search team came up with a solution that we think fits that need without interfering with the core search experience. We know some people would prefer a different solution, but we hope you'll give this a chance while we iron out the bugs.

Unfortunately, we also found some bugs pretty quickly in the last release and we've scheduled another deployment next week to release the fixes for these. Namely: exact-phrase searching is no longer possible, and WEB-2536, a problem in our spam filter changes. In addition to that spam issue, we also see some enterprising Residents trying to spam in some pretty creative ways to get around the filter, and we'll be looking at those, too. Expect lots of upcoming tweaks!

Residents have taught me to be explicit about what else we're working on, so forgive me if there is a little repetition here. First off, we're replacing the maturity ratings system. This is a multi-phase project that we are testing at each step.

Let me segue into a side note about the current maturity system. Surprising as it may be, the current maturity system is working as intended. It is counter-intuitive, but it's not broken. The GSA looks at each set of results it's been given independently and it evaluates relevancy based on the set it has. Practically speaking, this results in some odd behaviour, but it also results in some expected behaviour.  So, I wanted to write a little bit about it.

First, the larger the data set, the less likely your parcel will be to show up highly ranked for your keywords. I know, this one makes sense. Adding more maturity levels generally means that your parcel is being evaluated against more parcels that might be similar. The GSA, just like Google, only returns the first thousand results. If you aren't there, try refining your keywords. Instead of optimizing for "beach" as your main keyword, try something more relevant to your business, for example, "beachfront rentals". The more specific queries will naturally have fewer relevant results. I have been a Search professional for a decade and I have never seen such an enormously long tail of search queries as I have seen in Second Life. Residents are using extremely specific queries!

Second, we realize that the current system is too unpredictable and frustrating for Residents. Our new system will eliminate the counter-intuitive behaviour as well as extend smarter maturity ratings to Classifieds, Events, and database sorting.

Other fixes included:

  • Tweaks and fixes to the html pages in web Search
  • Search team’s part of the fix for owner links breaking in parcel descriptions
  • Lots of small localization fixes and updates
  • Stemming refinements to increase relevancy
  • JavaScript warning enabled for Residents who have it disabled
  • Query speed testing and optimization
  • Lightened scroll bar in viewer Search
  • Spam system update

Next up, in addition to the items I already mentioned, we have a number of improvements on deck. Currently, Classifieds ads only count a portion of their actual clicks and teleports, so the fix for that is on its way, along with a number of localization changes and some work on the new events browse tab. In addition, we’re testing what we hope is a vastly improved way to roll out changes. This new deployment change won't be live with our release next week, but we're testing it and focused on getting it out as soon as possible.

Thanks again for your comments! Sadly, I can't enable them this week, but I'll ensure we have them up for the Search post next week!

Search Index Update in Progress

by Linden on ‎08-06-2010 09:01 AM

This morning, you should continue to see all the improvements associated with this week's release rolling out. It’s not complete yet, but we’re monitoring it over the weekend to ensure it goes out as expected.

In addition, we’re exploring other options for rolling out changes. When we do a big update, there are 1.5 million documents to reindex and process; this doesn't happen fast, and it doesn't even happen overnight. The Search team is actively planning ways to reduce that time so that, when things do change, they change quickly, without the odd, in-between state of the index. We plan to have this in place for our next release.

Thanks for your patience!

Search Release, it's an Event

by Linden on ‎07-28-2010 12:42 PM

We had a small release last week and have been fine-tuning performance in preparation for our next release. Improving performance is a major concern of ours and if you haven't yet, please download the latest Viewer. Search performance depends, in part, on Viewer performance and every Viewer release improves Search performance.

Just released!

You may have already noticed this additional option appear on your in-viewer Search last week. If not, check it out! You now have the option to browse events straight from the Find window in Viewer 2.

Some features of note:

1) Featured
The editorial team at Linden has stumbled across a few events we wanted to bring to your attention, so we've included one of their selections up top. If you have an event worthy of notice, please send in your suggestions!

2) Choose a date
By default, you'll see events listed starting with the top of the current hour. If you're planning ahead, we have a convenient drop-down calendar for you to choose another date.

3) Browse by interest
You also have the option to simply select an Event category. We’re looking to expand categories as well and we welcome your suggestions.

Additional fixes included:

  • Performance tweaks
  • Teleport not triggered for specific locations in viewer search
  • Changes to filtered words (which included the fix for WEB-2023)
  • Multiple repeat Destination guide entries on web search
  • Profiles missing images

Release next week!

The fix for WEB-2378, finally, is coming, along with a number of tweaks to the world pages, like visible HTML tags, table misalignment, classifieds tracking, and broken group links. Also coming next week is an updated FAQ with additional information on how Search works, including suggestions for all of you who would like to appear higher in listings for your keywords.

We're working hard to increase the frequency of releases and enable the Search team to deploy when needed in order to fix bugs in a more timely manner. Thanks for your ongoing patience and understanding.

On a related note, thank you to everyone who is talking about Search publicly, in social media, comments, and in forums everywhere. It's a tremendous resource and we value your input highly. Please keep it up!

Yesterday, we released the third beta of Second Life Viewer 2.1. (To learn more about Viewer 2.1, read our alpha blog post and the last Viewer 2.1 beta post.) We’ve been hard at work to significantly improve both the stability and performance of this Viewer. But we need more Residents to try it out to help us get it ready for prime time. That’s where you come in.

Now that the beta of Viewer 2.1 is more stable than your current viewer -- Viewer 2.0 -- we invite you to download the beta, send us crash reports, report any bugs that you run into, and let us know what you like and what we need to work on in the Viewer 2 Forum.

And remember, Viewer 2.1 is full of cool features that enable you to:

  • Easily share media
  • Customize the bottom bar
  • Resize the chat bar
  • Choose between sliding the world or overlaying it with the Sidebar
  • Control your camera more easily
  • And morph your inworld voice!

So download the third beta of Viewer 2.1 today and let us know what you think on Twitter using #slviewer2.

Windows | Mac | Linux
Release Notes

Looking forward to hearing from you!

Search: Today's Release

by Linden on ‎07-08-2010 11:13 AM

Following on from the previous blog post discussing the work we're doing to fix Search, today we're happy to announce the full release of our fixes and improvements.

Perhaps the most important part of today's release is the fix for the bug that was causing some parcels to be dropped from the search index. The fix involved replacing legacy components of the Search system, which has had the added benefit of greatly improving the speed at which the Search index gets updated. In other words, changes you make will be reflected in Search much more rapidly! This new system also brings us the increased dependability and consistency we need for Search.

Index Generation

Until this week, updates could take 72-96 hours to fully propagate in Search.

This meant that new parcels and regions would take a while to show up in Search, and merchants editing their search listings were not able to monitor their changes in a timely fashion. By comparison, the new system updates in under 12 hours! (Plus, we plan to cut that in half in the upcoming weeks!)

Spam

Also in this release are improved methods to deal with Search spam, starting with keyword stuffing (using large numbers of words, often the same or very similar, and often not related to the location itself). You should therefore notice that fewer of the top-ranking results will look like they were written solely to game the rankings. Increasingly, adding excessive numbers of keywords may count against your ranking. Try to use fewer and more accurate keywords, including key phrases, to increase your relevance. If you would like to refresh your memory on best practices, please read our guidelines.
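
To give a feel for what keyword stuffing looks like to a filter, here's a deliberately naive Python sketch. This is purely illustrative; it is not how our spam detection actually works:

    def stuffing_score(description):
        # Crude signal: the fraction of words that are repeats. A listing
        # that says the same few keywords over and over scores close to
        # 1.0; normal prose scores much lower.
        words = description.lower().split()
        if not words:
            return 0.0
        return 1.0 - len(set(words)) / len(words)

    stuffed = "beach beach beach rentals beach cheap beach beach"
    normal = "quiet beachfront rentals with weekly and monthly rates"
    assert stuffing_score(stuffed) > stuffing_score(normal)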

Improvements and Fixes List

More improvements include:

  • Improve search listing page generation
  • Increase crawl update speed and reliability
  • Remove picks data
  • Increase search index size
  • Adjust column headers for events
  • Exclude parcels for private sale in general search
  • Activate arrow keys for browsing Viewer results

Bug fixes include:

  • Results appearing with "None" text
  • Eliminate misleading result count estimates
  • Re-index groups missing from Search
  • Eliminate English spelling suggestions for non-English users
  • Encode special characters in V2 results

More Improvements ahead

We've had reports that the current maturity filter does not always restrict adult content to Residents who've asked to see it. Our next projects include ensuring that the maturity filters work the way our Residents expect them to, across search results, classifieds, and events. We have also heard from people who would like to browse certain types of content and restrict their search to specific collections before they search. Although we don't like the idea of showing empty Search pages to Searchers, we love the idea of having browsable collection options available to everyone who launches Search. So, we'll be designing and testing options for that, too!

Please note that during this first release of the new system, it could take up to 48 hours for the index to be fully populated.

Search: Upcoming Release

by Linden on ‎07-07-2010 07:47 AM

We intended to have a Search release incorporating a number of improvements in June. We did have one, sort of. During the release, we became aware, both internally and through Jira, that we had another scenario to take into consideration. So, we made the decision to roll back the release. Since then, we have fixed the Jira issue, dealt with the performance implications, and made some high-value additions to the release.

Our main focus for this next release is getting rid of our legacy Search system, one that has been patched, updated, and added to for years. We're replacing it with a new, faster, more dependable way of generating the Search indices. In short, it will be significantly improved. In addition, we wanted to tackle the two main issues we highlighted in the last post: a significant bug that caused us to drop parcels, and secondary information that was causing bad Search results. It's this second point I'd like to expand on a bit here, before our next release is out.

The Association Problem
When I first joined the Lab in December, I was made aware of a report from a Resident land owner whose region was somehow associated with the word "Nazi" in Search. There were no WWII role-playing events and no German history groups associated with his land, so he was justifiably unhappy about this. After investigating thoroughly, we narrowed it down to a picks problem. Some Residents in an inworld literary group who had chosen his place as a pick had phrases like "grammar nazi" in their profiles. We contacted the 2-3 Residents about the issue and asked them to change their descriptions or delete this particular pick. Unfortunately, this didn't solve the problem, because the GSA had "learned" that his region had something to do with Nazis. It took our GSA 6 upgrade and fresh Search indices for this problem to finally go away. This is one of many examples; we've heard from several Residents citing their own favourite examples of this issue.

More Picks Issues
Aside from these accidental association issues, we have three other main issues with the picks data in Search. First, some enterprising individuals have actually created businesses based on paying people to influence Search with picks. Second, Residents have been using picks to highlight those they care about, their friends and family in Second Life, rather than just places. I think it's nice to call out your important relationships in a public way; it's just not good data for Search. Third, places frequently come and go in Second Life and many picks are outdated. There are those who like to keep a pick of a former favourite spot as a memory, but often, people simply don't realize that their pick isn't accurate. In either case, it's more bad data from the search perspective.

These cases add to the problem of using picks data in Search, and chasing the consequences of these issues as they come up is a time-consuming pursuit that doesn't actually address the core issue. Though we think picks can be valuable information on important inworld spaces, right now, picks are a problem in Search.

We'll continue to test different and better ways of using picks information, but for now, starting with this upcoming release, picks are no longer included in Search relevancy. They will still be visible in Resident profiles in the viewer, but they won't affect Search until we find a better way to use that information and add value to Search. We do appreciate the value of picks, but when we do include them again we want to be sure it adds to the relevancy rather than making it worse.

I look forward to writing more this week about the rest of the improvements included in the upcoming release!

Viewer 2.1 Beta Available Today

by Linden on ‎06-23-2010 02:18 PM

Today, we released Second Life Viewer 2.1 Beta, focused on performance, stability, and usability improvements. Our engineering team continues to tune our software, integrating numerous performance and stability improvements and attacking top crashers in the Viewer. (So, please continue sending those crash reports when you run into trouble!) As Howard noted in his last blog post, we are focused on fast iteration cycles and hope to be releasing updates to the Viewer 2.1 Beta every one to two weeks. The faster we can get software into your hands, get your feedback, and review your performance statistics, the faster we can optimize and improve your Second Life experience! For more information about specific crashes, performance fixes, and bugs addressed in this version, check out the Viewer 2.1 Beta Release Notes.


Laser-Like Focus on Performance and Stability
During our last iteration cycle, we made an important decision to defer several new features that we had previously planned to include in this release. Right now, our priority is to improve Viewer 2 performance and stability and to create a strong software foundation on which to build new features in the future. That said, our Residents--both new and old--have a wide range of computer configurations and this presents unique challenges in terms of how we build the software and deliver a compelling 3D virtual world experience. We're tackling these challenges head on and have re-focused our development efforts toward delivering a viewer that performs well, with a minimal number of crashes and bugs, so that all Residents can enjoy their Second Life experience to the fullest.

Viewer 2.1 Beta is One Choice Among Many
As we've always said about Viewer 2, it is one choice among many great Viewers available today. Viewer 2.1 Beta and Viewer 2.0 will both continue to be available during the beta cycle; there will be no forced upgrade at this time as there will likely be more bugs and crashes than you'd find in a release version. At this time, Viewer 2.0 is still the latest "official" Second Life Viewer and Viewer 1.23 is still available (more on this below). In addition, the Third-Party Viewer Directory now contains 10 third-party viewers which address specialized needs of our Residents, many of which are based on the Snowglobe open-source program.


Viewer 1.23 Will Continue to Be Available
Many of you have expressed concern about the deprecation of the 1.23.5 viewer. Our usual policy is to deprecate the last viewer version 60 days after the latest version's release. With Viewer 2.0, we focused on creating a viewer that would better serve the needs of new users with a simplified user interface and easier window management model. We understand that Viewer 2.0 doesn't address the many complex needs of our existing Residents--yet. As a result, we have decided not to deprecate the 1.23.5 viewer at this time, and it will continue to be available for those who wish to still use it.

Thank you for trying Viewer 2 and providing such great feedback. Your comments, both positive and negative, are helping us prioritize and guide our design and development process. If you have thoughts you'd like to share, I encourage you to visit the Viewer 2 Forum and post updates to Twitter as well (using the #slviewer2 hashtag). Keep that feedback coming!

Resources
Download Page
Release Notes
Jira
SL Answers
Knowledge Base for Voice Morphing

A Step Forward for Second Life Search

by Linden on ‎05-25-2010 12:34 PM

Today, we wanted to update you on the status of Search and what we're doing to make it a more accurate and powerful Second Life tool. As many of you are aware, we have had some challenges with Search, particularly in Viewer 2. We know that many of you consider Search the most important item for us to fix and, consequently, it's a top priority for us.

Upgraded Google Search Appliances

We use the Google Search Appliance (GSA) to power most of your searches. Google's technology can crawl, index, and rank a huge amount of inworld content. With few exceptions, when you search inworld, the results you see are coming from our GSA machines. We recently upgraded from GSA 5 to the latest GSA 6 version. This change not only produces a better search experience, but it also causes some changes in search behaviour that we want to share with you.

The Impact of GSA 6

GSA 5 results were good, but not good enough. GSA 6 delivers more sophisticated relevance, scales more easily, and returns results much more rapidly, especially in serial searching and search pagination where we've seen a significant performance improvement. Second Life Search processes millions of searches every month, so query performance is something that we continually monitor and improve. It also supports multi-language relevancy tools -- a big win for our international Residents! Additionally, the new algorithms more efficiently factor in many types of "search gaming".

Due to the more sophisticated relevancy in GSA 6, search results will be more dynamic. That's a big change that will be most noticeable to those who may be accustomed to the more static search rankings from the previous version of GSA. You may already have seen that your search listing can move up and down in search results more than it did before. As the new GSA 6 boxes learn more about Second Life content, the results will settle down and become more relevant. However, you should expect that, in general, search results will still be less static.

Improvements and Fixes

As we mentioned, improving Search is a top priority. Here are some changes that went out last week.

Improvements include:

    • Doubling the number of results per page in Viewer and web
    • Improved update speeds for Destination Guide in Viewer 2
    • Added Destination Guide profiles to web search
    • Change to "Profile" labels in line with Viewer
    • Sorting by region name

Bug fixes include:

    • Maturity boxes auto-unchecking
    • Missing event details in Viewer
    • Unicode search failures for classifieds
    • Classifieds not triggering Viewer launch

More Improvements and Fixes Slated for June

Here are the significant issues that we're planning to address in June.

    1. We currently have a bug that can randomly cause parcels to drop from Search. Basically, when we crawl the hundreds of thousands of parcels inworld, occasionally one will have malformed data and it doesn't appear in the search index. It doesn't happen often, and when reported to us we can fix it, but clearly this is an important one to solve. We have a solution planned and will be working to push that through as soon as we can (see the sketch after this list for the failure mode).
    2. Relevancy is currently skewing in unexpected ways due to some of the secondary data that gets associated with a particular location. We understand the problem and, whilst subtle, it's definitely not helping, so we are taking steps to address this too.
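
To illustrate the failure mode in the first item, here's a hypothetical Python sketch of a crawl loop. This is not our crawler; it just shows the difference between silently skipping a malformed parcel record (the bug) and quarantining it for repair and re-indexing (the planned fix):

    def crawl(parcel_records, parse, index, quarantine):
        for raw in parcel_records:
            try:
                index(parse(raw))
            except ValueError as err:
                # Before the fix, a record that failed to parse simply
                # never reached the index, so the parcel vanished from
                # Search. Quarantining it instead lets us repair the
                # data and re-crawl the parcel.
                quarantine(raw, err)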

After launching Viewer 2, we found that events posted with deliberately inappropriate keywords were causing problems. This kind of "gaming" caused some events to dominate search results. This is why events were removed from the All tab. We'll be looking at how to best address event listings without causing search results to be unfairly skewed. For now, events can be found in the Events tab.

Questions? Check out the Search Guidelines

If you are having search-related issues, please read the Search Guidelines before you contact support. This is important because most of the reported search issues have actually been caused by the parcel being less relevant than a Resident expected; sometimes it's due to keyword spamming, bot abuse, or otherwise using content to game the search rankings. Not only is gaming Search a violation of our terms of service, but increasingly it will be ineffective and may well result in a lower ranking.

We are committed to improving Search and implementing GSA 6 was a big step forward, but we also know that we have plenty of work yet to do and some significant problems to address. We are monitoring results closely. You should begin to see some real improvement very soon. Meanwhile, we'll continue to keep you posted on our progress here in the blogs.

On the Road to Viewer 2.1

by Linden on 05-05-2010 10:26 AM

Howdy! Esbee Linden here, your friendly neighborhood Product Manager for Viewer 2. Last year, our product and engineering teams went heads down with our design partners to rethink the Viewer's new-user experience, the result of which was Viewer 2. Thank you to everyone who has downloaded Viewer 2, tried it out, and given us so much thoughtful and constructive feedback!

We're seeing great Viewer 2 adoption -- over 400,000 downloads so far! Viewer 2 is working well for new Residents, but many existing Residents are finding that it doesn't meet their needs -- yet! We hear your concerns and now we're turning our focus to your requests -- the upcoming Viewer 2.1 release will address some of your most consistent and pressing feedback. And it's always important to remember that we celebrate viewer choice. Viewer 1.23 continues to be available, and there are additional choices in the Third-Party Viewer Directory.

Viewer 2 hasn't been around all that long, but we're already seeing encouraging signs from new Residents. We're seeing a rise in people returning after seven days, and more new Residents are traveling to additional destinations once they've gotten started in Second Life. Early indications are that we've created a good foundation for new Residents, although it's also clear we have more work to do.

As part of that effort, we released Viewer 2.0.1 a couple of weeks ago, which addressed several performance and stability concerns, including the most common crashes, texture cache improvements, and voice connection issues. We've also improved search performance. If you've been frustrated by search in the viewer in the past, try a favorite search term again -- you'll be pleasantly surprised.


And we're already looking to Viewer 2.1, scheduled to roll out this summer (that is, summer in the northern hemisphere). Our engineering teams are focused on a few areas: usability, performance, and key new features like improved avatar customization and snapshot sharing via social networks.

Usability Enhancements
We've been following Resident reaction to Viewer 2 very closely since its release, and have noted a number of things we want to improve in response to your feedback. At the moment, we're hoping to get the following features and changes into the Viewer 2.1 release this summer. While we may need to change course on some of these (and will keep you updated if we do), here's a look at what we're planning:

  • Adding individual volume controls for Shared Media objects.
  • Customization of the bottom bar, so that you can quickly access the features and functionality that you use most often.
  • Updates to the camera and movement controls, so that you can pan and orbit your view of Second Life at the same time.
  • Adding the 'Build' option back to the right-click context menu.
  • Fixing the bug where CTRL-ALT-F1 does not hide all the Viewer UI as it should. This fix should solve a lot of problems for our machinimists and photographers.
  • Adding a preference that allows users to control whether opening the Sidebar resizes the world or slides over it.

Better Performance
We're continuing to make performance and stability enhancements as well. Our data on Residents using Viewer 2 indicates that a large number of people are running Second Life on lower-end machines. After reviewing the data, we're optimizing performance for all types of machines, bringing performance benefits to all users. We will also be making some rendering updates to improve the performance of texture downloads, which will also help SL run more smoothly on older computers.

The team is also continuing to improve internal processes for testing and shipping software to ensure that we deliver the highest-quality software possible. We are also looking at a few more crash fixes and addressing some memory leak issues. Your crash reports help us discover edge-case configurations and other issues that aren't always easy to identify. We're working rapidly to correct these, so please keep sending those crash reports!

New Features

Among the new features slated for Viewer 2.1 are improvements to avatar customization. First, we're introducing multi-wearables, which will allow you to wear more than one layer of a particular type of clothing. Want to wear two shirts? Go for it! Second, we hope to roll out multi-attachments, which will allow you to have more than one object at any given attachment point -- a feature that both consumers and merchants will appreciate. No more losing the collar of your jacket when you want to wear that favorite necklace! We're also improving our snapshot sharing functionality, making it easier for you to share photos with your friends via Facebook, Flickr, and other social networks.

Open Development
It's important to note that we plan to do much of the Viewer 2.1 development in the open. We continue to improve our open-source development processes and are working with the Snowglobe 2 team to make it easier to move features between Snowglobe and the official Viewer branches. For more information, check into the open development forum, and see how we're approaching multi-wearables and other projects.

As T Linden wrote in his post earlier this week, we're aiming to release Viewer 2.1 this summer, delivering on our promise of shorter Viewer 2 release cycles.

Lastly, please know that we are listening and that we care deeply about your opinions on Viewer 2 and SL in general. I hope to continue an ongoing dialogue with the community to ensure we are building features and functionality into Viewer 2 that will delight all our Residents and heighten your experience in Second Life. So please stay tuned for more updates! As always, please feel free to post feedback to the V2 Forum, our public Jira, and on Twitter (#slviewer2), and let me know what features and updates you'd like to see in Viewer 2!

Cheers!
Esbee Linden

Grid Outage

by Linden on 04-29-2010 10:03 PM

It would seem inevitable that the minute I blog about a change in my focus from stability to performance, we would have a major grid outage. As I'm sure many of you experienced, the grid was unavailable from just before 9pm PT until about 4am PT this morning. The cause of the outage was the loss of our Phoenix, AZ data center.

This data center not only contained simulator hosts, but also many key central services, including inventory databases and databases that login processes are dependent on. The reason we lost the Phoenix datacenter was a power outage which affected the floor where the Second Life servers are located. The root cause of the power outage and the circumstances around it were such that we were impacted despite having triple-redundant power systems in place.

I have spoken to our datacenter provider in Phoenix and let them, and their entire staff, know how much impact this outage had on our Residents. At this moment, we continue to experience inventory database issues that may still be causing login problems for some Residents. I sincerely apologize for the inconvenience this has caused Residents, merchants, content creators, and everyone who should expect the grid to be stable, reliable, and always up.

You can be assured that we are testing all servers, network elements, power supplies, and application processes to make sure that we have the grid up and in a stable condition as quickly as possible. I will also continue to work to drive better engineering into the grid, so that any failure (even one this catastrophic) will not result in such an extended time to bring systems reliably back online. We continue to make progress on this front but, as I have seen first hand over the last 18 hours, we still have a long way to go.

Search: Release Ready

by Linden on 04-27-2010 02:34 PM

Update May 3rd: This release is complete!

Update: With all of the work associated with the Grid Outage this week, we were asked to hold off on this upgrade until Monday. So, we'll get this out as soon as possible on May 3!

______________________________________________

This Thursday, the Search team is excited to complete the server search appliance upgrade announced here.

There may be some variability in search results during deployment, but the release should be complete within 24 hours. Directly after, expect to see immediate improvement in search query speed and relevancy.

We'll update again once the release is complete!

Grid Update

by Linden on 04-26-2010 07:08 AM

I know it's been a while since my last post, but with our major announcement around the new Resident experience, I didn't want important information about our infrastructure to be missed. So I'm back now to report progress and direction across the grid.

First up is the successful completion of our data center relocation and reconfiguration projects. We had undertaken three separate initiatives, led by our Escape from SFO project -- targeted at shutting down Linden's original data center in San Francisco. We also reconfigured our DFW (Dallas) data center, migrating servers to a newly designed space within the Dallas facility; extended LLnet (our private data network), establishing fully redundant and diverse fiber rings within the US; and commissioned our new DCA data center located just outside of Washington, DC. We closed the doors on the SFO data center at the end of February and completed the DFW reconfiguration in late March. As part of these changes, we also completed some server load balancing as we expanded our Phoenix (PHX) facility. We now have three data centers operational in the US -- PHX, DFW, and DCA. We'll be continuing to add servers to the DCA and PHX locations during the course of the year, then begin to turn our sights to Europe. We had hoped to target a physical presence in Europe sometime in late 2010, but that now looks to be more likely early 2011. The same goes for our plans for a presence in the Asia Pacific region -- planning to occur in 2010 and deployment in late 2011.

A significant part of that planning process will be the evolution of the simulator and the back-end systems and services that make up the grid. I've talked in the past about the need to change the way we manage capacity needs on the grid, and how server virtualization must be a central part of our infrastructure strategy. A main component of the Core Engineering roadmap for 2010 will be evaluation of server virtualization and how we design simulators that take advantage of this technology. It's clear that we cannot properly scale the grid under our current constraints -- from both a cost and performance standpoint. As has been pointed out to me many, many times, if we are to truly be a "green" technology, we've got to be able to supply server capacity on demand. With data showing that the grid itself is never more than 50% occupied (even at peak usage), that's a lot of server capacity drawing significant amounts of power while maintaining regions of Second Life that have no activity. Additionally, from a performance standpoint, it would be great to be able to respond to demands for additional capacity as simulators become heavily loaded with large numbers of avatars or avatars with high prim counts. Lots of analysis to do on this front, and it's not an easy engineering problem to solve, but it's one that we must solve in the next 12 months.

That brings me to my central topic for this post -- our asset system. Asset storage currently relies on two systems: Isilon storage clusters (managed internally in our PHX and DFW data centers) and Amazon S3 (secured storage in the AWS cloud). Over the past few months we've evaluated our current storage circumstance and have decided to move all of our assets to the S3 cloud and (eventually) utilize it as our primary storage tier. We looked seriously at making an additional investment in a "vertical" solution with our Isilon clusters, but decided that moving toward a cloud environment was best, not only from a cost and reliability perspective but, even more importantly, from a performance perspective. As much as my focus over the last 12-18 months has been on grid stability, that focus has now turned squarely to performance. I know how much our asset systems and the management of assets across the grid affect Resident performance and experience, so a redesign of this system is a "must complete" objective this year. We're still evaluating the final technology solution -- which may include the deployment of our own internal storage cloud -- but it's already clear that (directionally) we'll move all assets to a cloud technology as our primary storage method and deploy an (internally managed) caching layer that will serve most asset requests. It's a standard design for most content managed across the web and, with the deployment of a key piece of server code coming in June, we'll have the technical capability to move assets closer to the "edge" and eventually utilize web-standard CDNs (Content Distribution Networks). Eventually, I want us to be able to serve asset requests directly from the viewer to an internal cache or service, as opposed to always managing those requests through the simulator (the current methodology). In fact, a main objective of the current efforts around the simulator is to pull out as many ancillary services and processes as possible, allowing the simulator to perform its primary purpose: region simulation.

Right now we are in Phase I of the asset system redesign, which is to move all of our assets to S3 while still primarily serving from our internal Isilon clusters. We have been using S3 as a secondary tier/bulk storage system for almost a year now, but placing all assets on S3 will give us a couple of immediate benefits. First, it will prevent us from pushing the existing Isilon clusters any harder (they have proven unstable if utilization moves past certain thresholds) and second, it will allow us to complete a full "garbage collection" run on the system. As with our real lives, Second Life has lots of trash which accumulates over time and results in large amounts of data unnecessarily taking up storage capacity. Current storage is now over 450 terabytes and climbing.

So today, every asset request is made as follows:

  • Viewer to Simulator, Simulator to load balancer, load balancer to Isilon cluster.

If this path does not contain the asset, a message will be returned to the simulator and the simulator will then initiate a "get request" to S3 (via a proxy server), then direct the asset back to the viewer. So Residents will see a bit more delay with this configuration (if the asset requested must be retrieved from S3), but our internal performance data is actually showing better response rates from S3 than from our Isilons. This is promising as we move to S3 as the primary storage platform.
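
Expressed as pseudocode, the Phase I read path looks roughly like this (a Python sketch of the flow described above, not our server code):

    def serve_asset(asset_id, isilon, s3_via_proxy):
        # Primary tier: the internal Isilon clusters.
        asset = isilon.get(asset_id)
        if asset is not None:
            return asset
        # Miss: the simulator issues a "get request" to S3 through a
        # proxy server, then returns the asset to the viewer. This adds
        # a little latency today, though S3 response rates are already
        # competitive with the Isilons.
        return s3_via_proxy.get(asset_id)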

Our next phase will begin in early 3Q and will include building the mid-level cache layer and then moving primary storage to S3. In parallel, we're working on our asset backup plan, as the protection of Resident intellectual property must be part of any infrastructure strategy we implement.

    Starting with this posting, I'll be shifting my blogs away from stability work and toward performance and the evolution of the grid.

    Hi, everyone.

    As I said in my Viewer 2.0.1, Beta 1 blog post of last week, we've been working on a variety of stability, security, and performance issues for Viewer 2. Those fixes are now available in today's release of Viewer 2.0.1.

    In this release, we have addressed several items:

    • Improved performance, particularly relating to texture downloads.
    • Fixed a number of crash bugs that were found through our crash reporter. Big thanks to everyone who sent in crash reports.
    • Updated a few support libraries.

    We hope that these fixes will improve your experience with Viewer 2. Please see the Release Notes for additional details.

    Note that this release does not include any substantial changes to the user interface, but future releases will. We're very aware of all the discussion around UI, and we are listening. Thank you for all of your feedback and passion.

    If you're currently using Viewer 2, then it's important to download this update. Please click on the correct link below to begin the automatic download.

    Windows | Mac | Linux

    We look forward to your continued feedback on the V2 Forum, and please log any bugs that you find on Jira.

    Best regards,

    Q Linden

    Hi all,

    Q Linden here with some information about our latest Viewer 2 software update.

    Since launching Viewer 2 last month, we have received a huge amount of feedback about the new Viewer -- in the blogs, forums, and JIRA. We thank you for your enthusiasm and yes, for your criticism. We are lucky to have such a committed and vocal community, and we want you to know that we've been listening, responding when we can, and working hard on enhancements and improvements.

    Today we're releasing Viewer 2.0.1, Beta 1, which is aimed solely at improving overall security, performance, and stability in Viewer 2. There are no feature changes included in this release, but those will be coming soon in future releases.

    Viewer 2.0.1, Beta 1 addresses these key areas:

    • The top crashes we've seen from Residents since the launch of Viewer 2.
    • A set of fixes to texture threading that should improve the stability of the texture cache and reduce CPU usage.
    • Voice connection issues.
    • Various minor, but important, bits of cleanup.

    Please read the Viewer 2 Release Notes for more detailed information.

    For this beta release, we're using something called a Watchdog Timer. If it detects that the viewer has stalled, it will deliberately crash the viewer so that we get a crash report.
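
    The concept is simple enough to fit in a few lines. Here's a rough sketch in Python, for illustration only - the actual viewer is C++, and the class and method names below are invented:

    # Minimal watchdog-timer sketch (illustrative; not the viewer's actual code).
    import os
    import threading
    import time

    class Watchdog:
        """Deliberately crash the process if the main loop stops checking in."""
        def __init__(self, timeout_seconds=20):
            self.timeout = timeout_seconds
            self.last_heartbeat = time.monotonic()
            threading.Thread(target=self._watch, daemon=True).start()

        def pet(self):
            # Called once per frame by the main loop while it's healthy.
            self.last_heartbeat = time.monotonic()

        def _watch(self):
            while True:
                time.sleep(1)
                if time.monotonic() - self.last_heartbeat > self.timeout:
                    # Stall detected: abort so the crash reporter captures a stack trace.
                    os.abort()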

    A personal appeal -- please let the viewer send in crash reports!  We cannot fix bugs we don't know about, and the crash reporter is designed to give us enough information to find the causes of these bugs. While the technical information (the "stack trace") we get from the crash reporter usually tells us where in the code the crash occurred, it doesn't always provide enough information to tell us why the crash occurred. The Watchdog Timer will help provide the "why" of crash reports, by giving us the location on the grid where the crash occurred (which is sometimes helpful). And if you're given the chance, please give us any hints you think might help us find the cause of the crash.

    Thanks for your help.

    Q Linden

    UPDATE: The auto-updater is currently unavailable. If you wish to download the Viewer 2.0.1 Beta 1, please use one of the links below:

    Shared Media, Security, and Privacy

    by Linden on ‎03-15-2010 05:27 PM

    At Linden Lab, we take Resident security and privacy very seriously.  Since we launched Viewer 2 and Shared Media on 2/23, there's been a lot of excitement about the creative potential of Shared Media and we're already starting to see some very innovative uses of this feature, some of which are showcased in the Shared Media Destination Guide. There's also been a lot of discussion in the community about potential security issues when Residents view Shared Media inworld. We have been closely monitoring the blogosphere and speaking with many Residents about these concerns. I would like to take this opportunity to address them directly and share why we chose to auto-play Shared Media, what issues have arisen, what steps we're taking, and some things that you can do on your end to increase viewer security.

    Benefits of Shared Media on Auto-Play
    Many months ago, the Viewer 2 team and I discussed the benefits and potential drawbacks associated with having Shared Media auto-play set as the default. We weighed the drawbacks against the upside of having Shared Media capabilities ubiquitous throughout Second Life. After much debate, we made the auto-play default decision because of the overwhelming value, from an experience perspective, that Shared Media brings. The drawbacks are minimized because Residents have complete control of Shared Media and can easily disable it. Instructions are below.

    Your Concern

    Because Shared Media brings the web inworld, there is some potential information sharing to be aware of, just as there is whenever you use the Internet. For example, when you request a web page in your browser, your IP address may be transmitted to the web server (hosting the website) and then stored in log files. IP addresses can be used to pinpoint geographic location.

    Now that Shared Media introduces web content more broadly into Second Life, we have received Resident concerns around whether a website can gather IP information from avatars in Second Life. The answer is that it's possible, but it would take considerable effort. If you are concerned, then we recommend that you disable auto-play, as we outline below.

    So, we all need to be smart about information sharing inworld, just as we are on the Internet and in the real world. We're taking steps to help minimize the issue and you can, too.


    Shared Media Cookies Help Increase Resident Security

    The Viewer 2 Beta update, released today, has an enhancement that helps protect your privacy when accessing Shared Media. Viewer 2 now stores "cookies" per user rather than per Viewer. This is important because it means that a website can't tie multiple accounts to a single user. 
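
    To illustrate the difference, per-user storage just means the cookie jar is keyed to the logged-in account rather than shared by every account that uses the Viewer. A hypothetical sketch (the path layout and function name are invented for illustration, not the Viewer's actual scheme):

    # Illustrative per-user cookie storage (path layout is hypothetical).
    import os

    def cookie_jar_path(username):
        """One cookie file per logged-in account, so a website can't correlate
        two accounts through a shared, viewer-wide cookie store."""
        safe = username.replace(os.sep, "_")
        return os.path.join(os.path.expanduser("~"), ".viewer2", safe, "cookies.txt")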

    We are also exploring additional ways to increase privacy protections for Shared Media, including safe browsing technologies, which will roll out in coming releases.


    Steps That You Can Take
    There are a few easy things that you can do right now to help protect your privacy.

    • To turn Shared Media auto-play off, just go to Me > Preferences  > Sound & Media > Allow Media Auto-Play. Just uncheck that box and click "OK" to save the change. This simple action protects against information sharing with third-party websites.
    • You can also turn off the media attached to other avatars, which prevents attachment media from playing.
    • You can assess the URL of a website on a Shared Media object by using the inspector or the Nearby Media Panel.
    • Clear all cookies and other data by going to Me > Preferences > Privacy > Clear all user data.
    • And, please be as vigilant about data security inworld as you are on the Internet. Do not share sensitive information inworld.


    We look forward to your comments in the Shared Media section of the Viewer 2 Beta Forum. And, if you find Shared Media bugs, or have suggestions about features, please file a PJIRA, under the SEC heading.


    I'm very happy to announce that, as a core feature of the new Viewer 2 Beta, Second Life Search has been redesigned inside and out to make finding the people, places and content you're looking for easier than ever before. We've revamped the interface, overhauled what's happening under the hood, and refocused our work on Web-standard tools and methodologies that have not only improved our ability to bring you more relevant search results, but also made us faster on our feet as a development team. And, we've been building a first-class Search team--from Yahoo!, eBay, Google, and Amazon--to bring the best of search and advertising services to Second Life. We've been hard at work improving Second Life Search, and we're excited to share some of the work that we've recently completed in conjunction with the new viewer.

    Second Life Search, as part of Viewer 2, now includes:

    • A New and Improved User Experience: The first thing that you'll notice about search in Viewer 2 is that it's got a whole new look and feel. We've reorganized the interface to be more intuitive and to use filtering and sorting techniques common to search on the Web. We've also cleared away some of the legacy UI found in Viewer 1.23 to create a clean hierarchy of information and actions. One of our biggest goals in this redesign was to make the search interface more inviting for new Residents while still preserving many of the search options that current Residents have become accustomed to. For example, in the Viewer 2 Beta, there is now only one location for conducting a "Places" search (rather than a tab and a filter as found in Viewer 1.23), and that "Places" search can be filtered and sorted to achieve similar results as in Viewer 1.23.
    • A Robust Search Infrastructure: We've also redesigned the search infrastructure to be more robust and nimble. These changes, although maybe not as immediately visible, are a major step forward as we work towards innovating more rapidly and being responsive to Resident feedback. Previously, Search was hard coded in the XUI language of Viewer 1.23; now, search uses HTML on top of Django Web services and can be developed independently of Viewer 2. This gives us greater speed and flexibility as we develop additional search features, advertising products, and bug fixes.
    • Google Search Technology: As we all know, Google's relevance algorithms are the de facto industry standard on the Internet. In Viewer 1.23, Google Search Appliances (GSAs) served results to the "All" and "Group" search tabs for years. For Search in Viewer 2, we have expanded our use of the GSAs, and they now provide the first set of results for most search types. (Advanced search users can still access Linden Lab's proprietary search tools through category filters and sorts.) Google alone cannot provide the best results for Second Life Residents, however. So, the Search Team is continually adjusting how the GSAs work in order to provide the highest quality, most relevant results within Second Life.
    • Enhancement to Classified Advertising: Second Life business owners can benefit from a significant change to classified advertising within Second Life Search. Now, we can place classifieds alongside more searches, and advertisers will have a better opportunity to be found by relevant buyers. In Viewer 2, potential ad exposure (i.e., the number of search requests that serve ads) will increase to 100% of initial searches, whereas in Viewer 1.23 classified ads were only shown alongside search results in the All and Group tabs. Additionally, we have increased the number of featured classifieds from nine to twelve on the Search homepage (which appears when you click the magnifying glass or type CTRL-F).

    The bottom line is that the new Second Life Search benefits everyone within the Second Life economic ecosystem--most importantly businesses and Residents--as it plays a crucial role connecting inworld buyers and sellers. The more that Second Life Search can help Residents, particularly new Residents, find compelling content, communities and experiences, the higher the probability that they will become active, long-term Residents. Plus, more Residents mean more potential customers for inworld businesses. So, go download the Viewer 2 Beta, try the new and improved Second Life Search and let us know what you think.

    Searching Tips


    To browse the features in the Find window (including Search, Destination Guide and Classifieds), click the magnifying glass icon in the Search field at the top right of the Viewer 2 navigation bar. To initiate a search, type a keyword in the Search field on the Viewer navigation bar.


    Once you've initiated a search from the Viewer navigation bar, the Find window will open with an initial set of relevance-ranked results. You can then narrow your results with Category filters and organize the list with Sorting options. Your search terms are now carried from one category to the next, with no need to re-type!

    Further Reading:

    Resources to Help You Learn Viewer 2

    And, if something breaks or you're really stuck, then contact Support and we're happy to help.

    Edit to Post:

    • For more details, visit the Search Release Notes wiki. We'll update this for each release with highlights of what got done and a list of known issues we are working to resolve (e.g., "Events and Land Sales are not easily browsable or searchable in current UI").
    • Thanks for your feedback in the comments. I've posted a summary and responses below.

    Twitter OAuth Comes to Second Life

    by New Resident Gisele Linden on ‎03-02-2010 05:55 PM

    Hi, I’m Gisele Linden.  I’m a product manager here at Linden Lab supporting a variety of technical projects, and today I’m here on behalf of the Scripting team, also known as Team Pixie Dust (see below), to share exciting news with you. While working on new scripting features, we built a library that allows scripts in Second Life to talk to Twitter on our internal prototype platform. We recognized that this could be really useful to the Second Life community. And so with the help of Residents Cale Flanagan, Latif Khalifa and Strife Onizuka, we converted it to run on the current LSL scripting platform.

    What this means (as you may have read) is that you can now let people Tweet from within Second Life in a safe and secure way, without having to set up external Web servers, and without requiring Residents to re-enter credentials if they want to use Twitter from inworld. OAuth is an open protocol that provides secure, delegated API access to your account without exposing your login credentials. This means you can give Web sites or inworld objects the ability to update your Twitter account without actually having to give out your username or password. The Twitter OAuth Library we created also allows for fine-grained control over which Second Life objects can send updates directly to Twitter.
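
    For the curious, the heart of OAuth 1.0a is that each request is authorized by an HMAC-SHA1 signature over the request parameters instead of by your password. Here's a simplified Python illustration of that signing step (this is not the library's own code, and every key and token here is a placeholder):

    # Simplified OAuth 1.0a request signing (illustrative; keys/tokens are placeholders).
    import base64
    import hashlib
    import hmac
    import secrets
    import time
    import urllib.parse

    def sign_request(method, url, params, consumer_secret, token_secret):
        """Add the oauth_* bookkeeping fields and an HMAC-SHA1 signature to params.
        The caller supplies oauth_consumer_key and oauth_token in params."""
        params = dict(params)
        params.update({
            "oauth_nonce": secrets.token_hex(8),
            "oauth_timestamp": str(int(time.time())),
            "oauth_signature_method": "HMAC-SHA1",
            "oauth_version": "1.0",
        })
        # Percent-encode and sort all parameters, then fold them into one "base string".
        pairs = sorted((urllib.parse.quote(k, safe=""), urllib.parse.quote(v, safe=""))
                       for k, v in params.items())
        param_str = "&".join("%s=%s" % kv for kv in pairs)
        base_string = "&".join(urllib.parse.quote(s, safe="")
                               for s in (method.upper(), url, param_str))
        key = "&".join((consumer_secret, token_secret))
        digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
        params["oauth_signature"] = base64.b64encode(digest).decode()
        return params   # ready to be sent as the Authorization header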

    New Ways to Share Second Life With Your Friends

    The Twitter OAuth Library allows any object in Second Life to send updates to Twitter when any interaction you define occurs. What sorts of applications could utilize this new ability? The possibilities are endless! But here are a few ideas we think would be cool, just to get your wheels turning:

    • Having a HUD with a “Tweet” button that lets a Resident Tweet out their SL location at any time.
    • The ability for a vendor to send out updates when hot-selling inventory is moving off the virtual shelves.
    • A dance machine that informs your Twitter followers that you're hitting your favorite Second Life dance floor.
    • A scoreboard that can send out scores and results of games taking place inworld.

    Getting Started

    To get started with the Twitter OAuth Library for Second Life, check this wiki page, and take a look at this video of the Twitter OAuth Library in action. An LSL version of the Twitter OAuth Library is now available on Xstreet SL (for free, of course). We can't wait to see what you cook up, and we look forward to seeing your Tweets from Second Life! And if you are Tweeting from within Second Life and care to add a hashtag, we'll be keeping an eye out for #inSL so we can see what people are up to.

    Why We Call Ourselves Team Pixie Dust

    Scripts are lines of code that enable interactivity with objects inworld. They allow us to do just about everything with all objects in SL, from simple things like sitting in chairs and opening doors to really interactive things like dancing, flying around with a jetpack, and playing games.

    Simply put: Scripts are the pixie dust that brings Second Life to life!

    What's Next? Your Creations

    This is just the beginning of how our magic pixie dust can enhance our experiences both in and out of Second Life. We look forward to seeing all the amazing, creative things that you come up with. Please post your questions and creations to the discussion thread below.

    In the meantime, we look forward to seeing your Tweets! Join the conversation below:


    Thanks,

    Gisele Linden and all of Team Pixie Dust


    Grid Stability Update

    by Linden on ‎03-01-2010 06:00 AM


    I wanted to follow up on my Feb 18th posting, specifically to update you on the concurrency blips we’ve been experiencing over the last few weeks.  These blips would occur at regular intervals and mostly impact logins for short periods of time (less than 1 minute).  As I mentioned in that last posting, we were focused on some networking issues with our newly deployed Juniper core switches.  We’ve made great progress with Juniper and have those issues under control, but the network turned out to be neither the trigger nor the root cause of the login problems.  The trigger was a non-responsive state occurring in one of our mysql.agni slave databases (aux.mysql.agni).  This database contains session and state information that is called on during the login process, and when it became non-responsive, login failures would occur.  The problem would then “self clear” rapidly, and logins would return to a normal state.  The rapid occurrence of the problem and quick recovery seemed to match the signature pattern of a network-related event, and given the configuration issues we'd been working through with the new Juniper gear, we initially focused our attention in that area.

    After more extensive testing and isolation, we began to refocus on the configurations of our newly deployed database hardware.  As part of our datacenter projects, we have been refreshing some of our simulator hardware (all sims that were in the SFO datacenter), and have also made significant upgrades to our database hardware – including our key central database (mysql.agni).  When we began deploying the new simulator hardware, we found some initial production issues which exhibited the same failure mode that we have since been seeing on the aux.mysql.agni database hardware.  The root cause of the simulator hardware problems ended up being a configuration where hyper-threading was turned “on.”

    Hyper-threading is an Intel term for functionality built into its chips that allows multi-threaded code to process in parallel.  In theory, the promise is greater efficiency and performance for a multi-threaded application running on this chipset.  Hyper-threading had been deployed on previous Intel chips with controversial results.  We have equipment currently deployed on the grid with hyper-threading-capable chips, but we have turned it “off” by default.  However, on the latest generation of Intel chips (Nehalem Core i7), which are part of our newest hardware, Intel brought hyper-threading back, turned “on” by default.  We decided to give hyper-threading another go, and everything tested well, so we deployed the first set of new simulators with hyper-threading “on.”  When problems began to appear in production, we quickly made configuration adjustments to turn it “off,” and then changed our default configurations to ensure hyper-threading was “off” on all subsequent simulator hardware.  Unfortunately, our new database hardware was deployed after we made these default configuration changes, yet still had hyper-threading turned “on.”

    Last Friday, we rotated the aux.mysql.agni database to hardware with hyper-threading turned “off” and monitored performance through the weekend.  The good news is that this database has remained stable, with no login blips.  Our plan was to monitor through the weekend and evaluate more data (now complete), and we have concluded that hyper-threading was in fact the root cause of the problem.  This means we will be rotating all databases running on hardware with hyper-threading “on,” including the central mysql.agni database.  Look for a maintenance window to be announced this week as we rotate these databases and, while it’s unfortunate that we have to cause some additional Resident impact, I’m glad that we have resolved the issue and can get the grid back to a stable position.
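
    (For the technically curious: on Linux you can tell whether hyper-threading is active by comparing the "siblings" and "cpu cores" fields in /proc/cpuinfo - with hyper-threading on, each physical core exposes two logical CPUs, so "siblings" exceeds "cpu cores".  A quick illustrative check in Python:)

    # Quick hyper-threading check on Linux: "siblings" exceeds "cpu cores" when HT is on.
    def hyperthreading_enabled(cpuinfo_path="/proc/cpuinfo"):
        fields = {}
        with open(cpuinfo_path) as f:
            for line in f:
                if ":" in line:
                    key, _, value = line.partition(":")
                    fields[key.strip()] = value.strip()
                if "siblings" in fields and "cpu cores" in fields:
                    return int(fields["siblings"]) > int(fields["cpu cores"])
        return False

    print("hyper-threading:", "on" if hyperthreading_enabled() else "off")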

    "Escaping" our SFO Datacenter

    by Linden on ‎02-18-2010 09:16 PM


    We’re entering the final stages of our data center migration from San Francisco (SFO), and hit a significant milestone last week when we successfully migrated mySQL.agni (Second Life’s central database) from SFO to our Dallas datacenter (DFW).  This was a critical step for a number of reasons.

    First, the SL infrastructure has always been highly sensitive to backend latency between mySQL.agni, its associated slave databases and the SL simulators.  So, physically repositioning the central db over 1000 miles away from its original location had the potential to introduce many performance issues.  Fortunately, over the last 18 months, we’ve built many internal tools and analytics that gave us the data necessary to anticipate where issues might occur, and implemented a design modification to our internal network (LLnet), which kept us well within our latency tolerance windows.  LLnet is now operating as 2 independent and fully redundant fiber rings, in the eastern and western U.S.

    Second, we are now operating our most critical database (mySQL.agni) on the latest generation of server hardware.  This hardware was designed and tested in mid-to-late 2009, and performance data indicated that we would experience a marked improvement in query times and overall stability.  The new hardware is the first to include SSDs, which significantly improve disk I/O performance.  Preliminary data has already shown that these improvements are being realized, most visibly through a dramatic reduction in "slow queries" (those taking >1 second to complete).  Ahhhhh, the benefits of good capital investments in hardware!!  While it’s hard to translate these backend improvements into tangible Resident improvements, I am very confident that all of the combined infrastructure work will begin to manifest itself in lag improvements and a more delightful overall inworld experience. (M has internally designated 2010 as the year of DEEELIGHT for Residents!!) I’ll be tracking that data very closely, and will continue to keep a close eye on Resident Satisfaction metrics.

    As a side note, Charity Linden blogged about our planning and execution of the mySQL version upgrade in early January.  That planning work and the experience gained during the version upgrade proved invaluable in planning the central database migration from SFO to DFW.  We followed many of the same procedures and completed the physical migration in about the same time (1 hour) that it took to upgrade the major software version.  This is the kind of work that was deemed too complicated and risky just 12 short months ago.

    As far as the remaining work to complete our exit from SFO, we are now migrating SL regions, external and internal web properties, and many internal systems that support development efforts and corporate systems.  We are still well on track to complete this work by the end of February, and will then use the month of March to finish our DFW facility realignment and begin bringing simulators online in the new Virginia datacenter (DCA).  At that point, we’ll be turning our sights to international expansion, including a presence in Europe and a strategy for Asia-Pac.  Look for more updates from Les Linden as we move into the final stages of work.

    Lastly, I wanted to touch on some continued stability problems we have been facing.

    I’m always dubious about work that “pokes” our infrastructure.  While every bit of work that we are doing will be the foundation for a stable, scalable, and high-performance experience in the future, there are certainly short-term pains that Residents are experiencing.  One of the main causes of current problems is a new generation of datacenter core network switching equipment that we have deployed in DFW.  This equipment (Juniper EX 8XXX Series) is at the heart of our new intra-datacenter network design and represents a major improvement to our core network.  The new design allows fully redundant switches with automatic failover (currently, core switch failures require manual intervention).  These new switches have not been operating as expected and are intermittently experiencing outages, which cause very short-duration impacts to inworld performance and login (durations are less than 1 minute).  We have been working directly with Juniper and have a high degree of visibility into the problem (both within Juniper and Linden).  In the meantime, we’ve got our internal monitoring and support teams closely watching these particular elements of the grid, ready to react quickly if a failure occurs.

    Next month, I expect to blog about the redesign of our asset systems.  We’ve formed an internal team to review the current architecture and the options available, including more extensive utilization of cloud storage technologies.  I saw all of the questions from my last post, specifically regarding use of “the cloud” and the implications this has on areas from content and personal security to lag.  All of these issues are key questions that we will be considering as we decide on a future design.

    As always, thank you for your continued patience as we work through the final stages of our data center work.  Your feedback is always welcome and insightful - well, most of it ;-).

    Server 1.36 Deploy

    by Honored Resident Lil Linden on ‎02-17-2010 01:24 PM

    Update @ 2010-02-17 13:24: The full roll is complete.
    Update @ 2010-02-17 07:00: The full grid roll is about to begin.
    Update @ 2010-02-16 10:55: The even sims are being rolled back due to content being adversely affected by 1.36.  An updated 1.36 will be rolled to all sims tomorrow! (Schedule revised below)
    Update @ 2010-02-16 07:00: The even sim restart is about to begin.
    Update @ 2010-02-10 12:00: The Pilot regions have been rolled.

    Hi everyone, another server deploy is upon us!  The schedule for the rolling restart is thus:

    • 2010-02-10, 7a - 12p: Pilot Regions
    • 2010-02-16, 7a - 12p: Even numbered sims
    • 2010-02-17, 7a - 12p: All Sims

    Regions should be down no more than 30 minutes.  If yours is down longer than that, please leave a comment with the region name and we'll look into it.  This thread will be updated with the status of the deploy when it begins.

    Beta release notes can be found here.


    Pilot region information


    Hello and welcome to my first attempt to get the word out about some of the projects going on within the Global Technology group here at Linden, and more importantly, how they affect you - the Resident. Allow me to introduce myself: I'm Les Linden, lead program manager for GTech, coming to you from our Virginia Lab. You may have read FJ's latest blog describing some of the issues we've encountered while we continue to move our infrastructure onto a much more stable colocation platform. We've been planning this migration project for over a year, and we're now in the last critical weeks on our path to success.

    Unfortunately, as you may already know... we've run into a few bumps in the road. Most of these performance hits are expected, and posted on our Grid Status blog. The unexpected blips, usually impacting logins, teleports and other inworld experiences, are where I'd like to focus. For example, on Tuesday morning we had a load balancer failure that affected logins and concurrency. This was an unexpected hardware failure. I'm working with various teams to ensure that we're not only proactive about these outages via the Grid Status blog, but also retrospectively communicating grid issues and resolution plans.

    I want to make sure everyone understands that while the team is moving as quickly as possible to hit our goals, we're also being very careful. We have sacrificed (and will continue to sacrifice) our schedule when we think we may be making a change that impacts the grid. That said, these "blips" that were not encountered during testing are being identified and fixed as quickly as possible. This project is very complex with a lot of moving parts. That's not an excuse... it's more of an ask for understanding and patience on behalf of the team as we move our infrastructure onto bigger, faster, more capable hardware.

    So, what are we doing, you may be wondering... Well, in short, we're moving onto shiny new hardware and into shiny new colo space that will allow us to grow not only quickly, but smartly too! Here are some upcoming "blips" we're expecting, and blogging about...

    • Central Database Re-home to Dallas - Feb 8th, 5AM PST - Downtime 90mins
      • Logins will be disabled
      • In world residents will not be logged out, but will experience failures in
        • Teleporting
        • Mapping
        • Inventory
        • Linden Dollar Transactions
    • Inventory DB Migration - Feb 11th, 7AM PST - Downtime 30mins
      • Residents localized to the affected DB will be logged off and will not be able to log back in during the maintenance window

    Our ability to prepare for the future of Second Life hinges on us making smart architectural decisions. The upgrades we've made on the network, data center and infrastructure side are allowing for just that. There are a few more weeks left in this project, with a few big pieces left to move (central database move is happening Monday!). So please keep an eye on the Grid Status blog(s) for updates on how you may be impacted. I'll do my best to update this blog often with our plans and progress. Thanks in advance for your understanding during what is an awesome transition phase for Linden. Here's to an incredible 2010!!!

    Coming Soon: Viewer 2 Public Beta

    by Honored Resident Howard Linden on ‎01-21-2010 02:35 PM

    Happy New Year! As you may have noticed, it's 2010, and we haven't yet shipped the project we referred to last year as "Viewer 2009". We made the decision to delay for a very good reason: It wasn't up to our standards of quality and user experience, and we simply needed more time to make it better.

    A project the size of Viewer 2 (as it's now called) is a new thing for us at the Lab. We've not done much at this scale before and we've made some substantial changes to how we work in order to pull it off. Here's a bit of the back story on what we've been up to and what's coming soon.

    Focus on the New User and First Hour Experience

    Here's a sobering statistic: Over 50% of new Residents who register and download the Second Life viewer log in once and never come back a second time. We've made it way too hard for a new user to absorb all the wonderfulness that is Second Life.

    With Viewer 2, our revamped web site, a new Orientation Island and much more, we've taken a step back and tried to create an end-to-end experience that will be much more compelling and relevant for a new Resident. There's still more to do, but we believe we've made a pretty dramatic step forward.

    But what about existing Residents?

    When Viewer 2 ships, some current Residents will find it frustrating. While we have kept almost all existing functionality, the UI has changed dramatically. It looks different. Menus have moved. There are new ways of interacting and communicating. If you've come to know and love the existing user interface, it may be a challenging transition.

    Other Residents will embrace the new UI. It is more consistent and discoverable. The look is more polished and professional. Our UI team, along with our outside design partners 80/20 Studio took a fresh look at the entire interface, with the goal of making Second Life more immediately relevant and compelling. We worked very closely with a core set of Residents to make sure that we were making design choices that would enhance the experience. Sometimes we got it exactly right and sometimes they sent us back to the drawing board. We also did several rounds of formal usability testing with both new users and existing Residents. We took all of this feedback to heart and it helped us make Viewer 2 even better.

    The current viewer, version 1.23, will not go away any time soon. At the earliest, following the policy T described here, 1.23 will be around until 30 days after we ship Viewer 2.1 (our first update to the new viewer), about a quarter after delivery of Viewer 2. Even then, there will continue to be open source viewers based on the 1.23 code base available for users who wish to continue using them.

    Open Source is a Big Deal

    Like previous viewers, Viewer 2 will be released to open source. We've got a bit of work to do to rebase Snowglobe off of the Viewer 2 code base and merge in Snowglobe changes, but we're on it.

    Open Source is a big deal for us. What the adoption of open source viewers has made clear is that many Residents want a high-end "power user" experience. We know we can't do it all. We can't at the same time work towards expanding our user base to a broader, consumer market and address all of the needs of our high-end Residents. We need help from our robust, passionate developer community to develop "power user" and more niche market viewers.

    Of course, there are lots of other great reasons for our Open Source efforts, but they are beyond the scope of this note. For example, we're working with the Internet Engineering Task Force (IETF) to establish broad standards for virtual worlds, starting with formal definitions of the region and agent domains, a.k.a. VWRAP. Open Source is a key tenet of this work.

    We've stepped up our Open Source efforts and will continue to support the development community as best we can. For example, we've been meeting regularly with the Emerald team and are supportive of their development of a viewer that meets the needs of many long-time Residents.

    We're also keenly aware that a few bad actors are using open source viewer code to create viewers that enable functionality that violates our terms of service or enables intellectual property theft. As Cyn describes here, with our Third Party Viewer Policy our intent is to list viewers that comply with new guidelines and policies so that Residents can make informed choices about which viewer to use. At the same time, we will be taking aggressive action against developers that don't comply with these guidelines.

    New Ways of Working == Better Software

    If you're a software development geek, you might be interested to know that we're in the thick of a substantial transition in how we build software. Most of our teams have moved to a Scrum-inspired model, keeping iterations short, quality high and working off of a clearly prioritized product backlog.

    We've also built a new test automation team, and are building an automated test suite to help catch performance and functional regressions before they make it to Residents. Our automated "Crash Reporter" helps us find and fix all but the most obscure crashers. If we've done our jobs well (and I think we have), Viewer 2 should be the most stable SL Viewer ever. We've got lots more to do and you'll continue to see even more stable software, more regular updates, and fewer regressions. In other words, our goal is to deliver software that Just Works.

    Coming Soon: Viewer 2 Public Beta

    We're still putting the finishing touches on Viewer 2 and will be pulling the covers off very soon -- we hope in February. Watch this blog for an announcement. Of course, Viewer 2 is just the beginning, not the end. There is much more to do. Our plan is for Viewers 2.1 and 2.2 to follow along in much shorter order than it took to get Viewer 2 out the door. We're currently targeting quarterly high-quality releases, and with the changes we've made to how we build software, we've got a very good shot at achieving that goal.

    Did we get it all right with Viewer 2? No. Is it a substantial leap forward in the discoverability and usability of our Viewer? Yes, I think it is. And I hope you will too. Stay tuned...

    Search: Upgrade Update!

    by Linden on ‎01-20-2010 03:03 PM

    Update: We are creating a new timeline for the Search upgrade announced here. The colo and server changes announced here made it prudent for Search to wait for the metaphorical dust to settle. We're taking advantage of the extra time to hone our algorithms and increase the size of our document base. Thanks for your patience.

    ______________________________________________

    Update: We rolled back part of our Search release today after some unexpected complications with our server upgrade. The Self Identified Scripted Agent change was unaffected. We hope to have more information soon on a revised timeline for the Search upgrade.

    ______________________________________________

    Please note, we have started the Search upgrade release announced here and expect it to finish in approximately 24 hours. You may see some variability in results during deployment.

    In the release, we included the removal of Self Identified Scripted Agents from traffic scores. Starting tomorrow, you should see more accurate traffic scores reflected in Search. Read more and see the Search Guidelines. Remember to register your scripted agents!

    We'll update again once the release is complete!

    Search: Upcoming Release - Update

    by Honored Resident Liana Linden on ‎01-14-2010 02:42 PM

    Please note, the Search upgrade announced here has been scheduled for Wednesday, January 20th.

    Also on the 20th, we will be releasing code that excludes Self Identified Scripted Agents from the traffic score. Scores visible on January 21st and onwards will reflect the revised calculation.

    Scripted agents, aka bots, can be useful tools, but they can also unfairly influence Search results. For further policy information, see the Search Guidelines and previous discussions here and here. Many thanks to those who have already registered their scripted agents! And, to those who haven't, please register your scripted agents now. Cheers.

    Diary of a Paranoid Mysql Upgrade

    by Recognized Resident Charity Linden on ‎01-11-2010 01:28 PM

    At 6 am on January 6th, our central database was upgraded from mysql 4.1 to 5.0.  YAY!!  This was a LONG time coming, and we learned a lot in the process.

    This was not our first shot at an upgrade.  We first tried to upgrade way back in November of 2007, but it turned out that 5.0 was just not fast enough, despite what sysbench and our other benchmarks had indicated.  After two or three long, wretched days of cascading downtimes, degraded services, and intermittent data loss, we gave up and rolled back to 4.1, all thoroughly traumatized by the experience.

    This is the story of our successful second attempt, and all the things we learned and checked and verified in order to make it successful.


    So our central db had gotten stuck running 4.1, even as all its slaves and all our inventory clusters and all the other 100+ Linden db hosts graduated to 5.0.51a-lenny.  This only got more embarrassing with time.  In 2009 we decided to revisit the problem.  But this time, we needed to know for damn sure that our shiny new db was as fast or faster than the older one.  Not faster in some abstract sense, but verifiably faster *with our query set*.  We needed some way to capture and replay our actual production traffic, using hundreds of concurrent clients, executing actual sequences of queries in order, over and over again.

    Unfortunately, nothing like this existed.  So we wrote our own tool.  We -- Mark Lentczner, Aaron Racine, and myself (Charity Majors) -- wrote a distributed load-testing framework with mysql protocol support, using Python and RabbitMQ, which lets us capture hours' or days' worth of production traffic and replay it over and over on a db snapshot.  (We're actively working on open-sourcing this software, so stay tuned!)
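
    To give a feel for the replay side: each worker pulls captured queries off a RabbitMQ queue and fires them at the snapshot database.  A stripped-down sketch (not our actual tool; hostnames, credentials and queue names are placeholders):

    # Simplified replay worker: consume captured queries from RabbitMQ and
    # execute them against a db snapshot.  All names here are placeholders.
    import pika
    import pymysql

    db = pymysql.connect(host="db-snapshot.example", user="replay",
                         password="secret", database="snapshot")

    def replay(channel, method, properties, body):
        with db.cursor() as cur:
            try:
                cur.execute(body.decode())   # replay one captured production query
            except pymysql.MySQLError:
                pass  # log-and-continue: one bad query shouldn't kill the run
        channel.basic_ack(delivery_tag=method.delivery_tag)

    conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq.example"))
    channel = conn.channel()
    channel.queue_declare(queue="captured-queries", durable=True)
    channel.basic_consume(queue="captured-queries", on_message_callback=replay)
    channel.start_consuming()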

    We also needed to know that our data would be sound, that none of the SQL syntax changes or order-dependent execution changes between major versions would bite us, and that we weren't going to run into any replication bugs.  For this we used the maatkit toolset.

    System info

    Our central database master, mysql.agni, is currently running on an 8-core Xeon E5450 with 64 gigs of RAM and a 400-gig attached DAS using RAID 10.  The db itself is about 250 gigs.  It has about twenty slaves total, mostly chained off a few relay-only slaves.  All of the slaves have been running 5.0 for a long time now; only the master and a fallback host have been stuck running 4.1.

    Traffic on the master ranges from 2500-4500 QPS over the course of the day, with 200-450 concurrent threads.  It executes just over 100 million queries per day.  Most of the read load goes to the slaves, which are load-balanced using haproxy.  The remaining traffic to mysql.agni is approx 7/1 read/write.


    Mysql benchmarking and tuning

    • The first thing we did was benchmark our traffic on 4.1.11 versus 5.0.51.  This is the version we had previously tried to upgrade to, and we were dying to know just how much slower it actually was.  And for our query set, 5.0.51 was *30%* slower.  Holy smokes! 
    • After talking it over with Percona, we decided our target version would be mysql 5.0.84.  We benchmarked the vanilla 5.0.84 build, and found it was just about exactly as fast as 4.1.11.  So obviously a lot of the 5.0 speed issues have been addressed since 5.0.51.
    • Next we benchmarked 5.0.84 with the Percona/Google patchset.  It wasn't any faster, to our surprise.  In fact, on our production hardware, 5.0.84-percona was *slower* than 4.1.11 (9-10k QPS peak vs 11-13k QPS).  After much fiddling around, and back-and-forth with their devs, we figured out this was because our I/O was completely pegged.  Fortunately this is where the Percona patches really shine, because they allow you to shift work from the I/O subsystem to the CPU.  We added these lines to our my.cnf to take advantage of I/O parallelization:

    innodb_read_io_threads = 8    # background read I/O threads; number of CPUs
    innodb_write_io_threads = 8   # background write I/O threads, matched to cores
    innodb_io_capacity=1000       # I/O ops/sec budget for background flushing
    innodb_read_ahead=none        # disable read-ahead (Percona patch)
    innodb_adaptive_checkpoint=1  # smooth out checkpoint flushing (Percona patch)

    • We also switched the I/O scheduler from cfq to deadline (see the snippet after this list).  Just switching the scheduler gave us a 15% performance boost -- CFQ is *terrible* for databases.  We saw identical results for deadline and noop.
    • We also tested running with binlogs on a separate block device.  This is something we've actually been doing for a long time -- mysql data lives on the DAS, binlogs on the main disk -- but we were curious just how much it actually buys us.  Answer: about 10%.
    • Interestingly, the db warms up much faster with binlogs on a separate physical device -- 10-15 minutes instead of 35-40 minutes.  We saw some crazy long warmup times under mysql 5.0, often twice as long as 4.1 under identical circumstances.  After some experimentation, it seems the long warmup times are linked to a higher sensitivity to thread concurrency in mysql 5.  Above a certain threshold (about 300 concurrent threads on production db hardware), higher thread counts eat up significantly more disk I/O overhead than under 4.1.  The 5.0 db actually warms up just as fast as 4.1 does if we streamline the I/O bottlenecks, by using fewer client threads (250-280 is a sweet spot) and/or by placing binlogs on a separate block device.
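
    (For reference, the scheduler switch is a one-liner per block device; the setting doesn't persist across reboots, and "sda" below is just an example device name:)

    # Switch the I/O scheduler for one block device to deadline (Linux; not persistent).
    with open("/sys/block/sda/queue/scheduler", "w") as f:
        f.write("deadline")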

    After changing the I/O scheduler and innodb config, we finally achieved database nirvana -- 14-16k QPS peak concurrency, with only 80% I/O used.  In terms of raw query throughput, 4.1.11 topped out at 8200 queries/sec, while 5.0.84-percona executed 11,500 queries/sec.  Yay!

    Verifying the code

    This fell into two categories: making sure that our code would actually run under 5.0 and making sure it ran well.  The last thing we wanted was to find out that some dinky little query was optimized for 4.1 in a way that would suddenly start killing us under 5.0.  We also wanted to verify that replication worked the same way for 5.0->5.0 as it did for 4.1->5.0, since we know there were some order-dependency changes.

    • For query syntax checking, we ran mk-upgrade on 24 hours worth of production query logs.  Mk-upgrade is amazing -- it executes each query against 4.1 and 5.0 snapshots simultaneously, and reports any warnings or errors or differences in the results, including if a query took substantially longer to execute on one or the other.  It has a few limitations (it doesn't like queries that use the same column name twice, e.g. c.name and p.name, and you get checksum errors when the data type changes), but I was able to manually verify the queries that broke without any hassle.
    • We also hooked up our in-house query profiler tool to the 4.1 and 5.0 test instances, and verified that query execution times were in the same ballpark.
    • We also used the maatkit tools to verify replication integrity.  I hooked up a 5.0.84 slave to a 4.1 master, replayed 24 hours worth of logs, and ran mk-table-checksum/mk-table-sync on the whole thing as a consistency check.  Unfortunately, the consistency check showed that the slave and master had slightly inconsistent data after only a few hours.  Doubly-unfortunately, these inconsistencies were not unique to the test scenario; we also see them in replication between 4.1->4.1, 4.1->5.0, and 5.0->5.0.  This is probably a topic for another blog post; for now, suffice it to say, I verified that the upgrade was not going to introduce any *new* bugs, and left it alone.
    • Also, note that internally-assigned mysql data types changed a lot between mysql 4.1 and 5.0.  This was not such a huge deal to us, since we long ago converted our dev grids and dev stations to 5.0.  But back when we did *that*, this was a big pain in the ass.

    Upgrade plan

    The migration was performed early Wednesday morning by Ben O'Connor, Ben Hartshorne, and Landon McDowell.  Around 5:15 am they set the 4.1 master to read-only, switched DNS, and repointed the relay-only slaves to the 5.0 instance.  There was no actual downtime, but mysql.agni was read-only for about 45 minutes, from 5:15 to 6:00.  This caused services such as mapping, inventory, teleport, and LindeX to fail, as expected.  Full functionality was restored by 6 am.

    We also had a fallback plan in place, in case we needed to (god forbid) fail back to 4.1 again.  You can't hook up a 4.1 slave to a 5.0 master due to binlog incompatibilities, so I wrote a gnarly little script that ran on the master and flushed the binlogs every 10 minutes, then scp'd the most recently flushed binlog over to the 4.1 "slave" and replayed it there.  (We couldn't replay it over the network: that was slower by a factor of 20 than replaying from localhost, and couldn't keep up with the write load.  Thus the ssh hack.)
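
    The shape of that loop was roughly the following (a sketch, not the actual script - hostnames, paths and the binlog naming are placeholders, and credentials are assumed to come from option files):

    # Sketch of the binlog flush-and-ship fallback loop (all names are placeholders).
    import glob
    import os
    import subprocess
    import time

    FALLBACK = "mysql41-fallback.example"   # the 4.1 "slave" we could fail back to

    while True:
        # Rotate binlogs on the 5.0 master so the newest complete log is closed.
        subprocess.run(["mysql", "-e", "FLUSH LOGS"], check=True)
        logs = sorted(p for p in glob.glob("/var/lib/mysql/mysql-bin.*")
                      if not p.endswith(".index"))
        newest_closed = logs[-2]            # logs[-1] is the freshly opened, active log
        # Ship it to the fallback host and replay it locally there -- replaying
        # over the network was ~20x slower and couldn't keep up with the writes.
        subprocess.run(["scp", newest_closed, FALLBACK + ":/tmp/"], check=True)
        name = os.path.basename(newest_closed)
        subprocess.run(["ssh", FALLBACK, "mysqlbinlog /tmp/%s | mysql" % name],
                       check=True)
        time.sleep(600)                     # every 10 minutes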

    Aftermath

    So, in the end, nothing terrible happened.  (At least not yet.)  We missed one query with a SQL syntax change (the left/inner join syntax change), and an internal mysql concurrency metric broke due to field name changes in 5.0, but that seems to be it.  This was perhaps the most overprepared-for upgrade in the history of Linden Lab, but in the end it was worth it.  We got some great new load-testing tools built, and learned a lot about innodb and mysql performance tuning.  And mysql4 is now DEAD TO US!!


    Huge thanks to Percona for all their help and support, and for debianizing our mysql builds.