
WebRTC Is Replacing Vivox For Voice Communication In SL



LL has been working on replacing Vivox with WebRTC, and I have some questions about it.

If you want to get caught up on the subject, here are some links:

SL Wiki

Inara Pey

Daniel Voyager

TL;DR - WebRTC is a technology that lets web browsers do peer-to-peer video and audio conferencing without needing to install plugins. It can be used for other things, too: I know some sites used to use it to play regular video, and there's a web-based BitTorrent client that uses it. It's mainly known for teleconferencing, though. SL wants to upgrade from the obsolete Vivox system to WebRTC, and they are testing some implementations right now. We don't have all the details yet, but here's what we know so far: Regions that run WebRTC won't run Vivox, and vice versa. They say the sample rate will be better, it will have built-in gain control and noise cancellation, and security will be better, because LL will be running the streams through their own firewalls to hide people's IP addresses. I think I saw on a Twitter thread that it's limited to 50 avatars in a region, which is a lot in SL right now. They will probably increase that eventually. One really nice change is that there will be no separate voice.exe program to run your voice, which means one less thing to whitelist in your antivirus software.

My first reaction to this was... isn't WebRTC a security liability? I have a browser plugin that blocks WebRTC unless I want to run it, because WebRTC can expose your IP address, even through a VPN tunnel (similar to how Shoutcast exposes it). This is not a bug that WebRTC intends to patch; they say it is foundational to the technology.

When I read that LL is working on preventing that kind of leakage with a firewall, though, I was relieved.  I have no problem running their WebRTC, if the security concerns are addressed.

My second thought is: Can we use this for live music? What's the sample rate like? I mean, I know it's better than Vivox, but Vivox's sample rate is somewhere between 4 kHz and 48 kHz, so that's a very low bar. If WebRTC's audio quality beats Icecast and similar Shoutcast stream providers, then we can ditch Shoutcast and just use voice to run concerts in real time, without the annoying 30-second lag, and without sharing my IP address with everybody in the club! I wonder, if the latency is low enough, maybe this would allow us to do sing-alongs in real time? Imagine a crowded SL club singing "Do Wah Diddy Diddy" together. That would just be fun. Or imagine Rocky Horror with real-time, live callbacks!

Just to set the bar for this: Icecast streams at 256 kbps. It may be possible to perform at lower bitrates, like 128 kbps; I don't know, I haven't tried.
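Side note on units, since sample rate and bitrate get mixed up a lot: sample rate (kHz) is how many samples per second are captured, while figures like 256 or 128 for a music stream are bitrates (kbps) of compressed audio. A quick Python sketch of the arithmetic, assuming plain 16-bit PCM as the uncompressed baseline (the helper names are just for illustration):

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(sample_rate_hz: int, bits_per_sample: int, channels: int,
                      stream_kbps: float) -> float:
    """How much a codec must shrink the PCM to fit a given stream bitrate."""
    return pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels) / stream_kbps


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    # 48 kHz, 16-bit stereo before compression
    print(pcm_bitrate_kbps(48_000, 16, 2))         # 1536.0
    # squeezing that into a 256 kbps or a 128 kbps stream
    print(compression_ratio(48_000, 16, 2, 256))   # 6.0
    print(compression_ratio(48_000, 16, 2, 128))   # 12.0
    # 48 kHz sampling covers frequencies up to 24 kHz
    print(nyquist_hz(48_000))                      # 24000.0
```

So a 48 kHz voice channel and a 256 kbps music stream aren't directly comparable numbers; the codec and the bitrate it gets matter as much as the sample rate.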

One other concern: WebRTC has built-in noise cancellation. That feature may be an issue for this as a performance medium. No musician wants noise cancellation tuned for the human voice to muffle their guitar or piano when the instrument makes sounds outside the normal vocal range. That's why I buy expensive microphones, like the Shure SM57: it won't distort my guitar with noise cancellation, so all distortions in my performance are deliberate ones that I put there using my software and my audio interface.

 

Edited by Bubblesort Triskaidekaphobia
Fixed some awkward wording and inserted a link to WebRTC at the start of the post, for clarity

2 hours ago, Bubblesort Triskaidekaphobia said:

limited to 50 avatars in a region

It supports 100 in a region, 50 in ad-hoc calls.

2 hours ago, Bubblesort Triskaidekaphobia said:

My first reaction to this was... isn't WebRTC a security liability? I have a browser plugin that blocks WebRTC unless I want to run it, because WebRTC can expose your IP address, even through a VPN tunnel (similar to how Shoutcast exposes it). This is not a bug that WebRTC intends to patch; they say it is foundational to the technology.

This isn't really relevant for us because the IP would only be exposed to LL. (Unless you're worried about an attack coming from LL, but that's a whole other discussion.)

Concerts happening through voice seems unlikely.


"Firewall"? "The IP would only be exposed to LL"? Clearly this won't be simple peer-to-peer but I'm not grokking the architecture here at all.

[EDIT: Okay, I finally read the SL Wiki article so I see the "relay" of the SL Voice Servers… which must be pretty clever beasts if listeners will be able to mix/mute separate audio sources within the region, which is kinda expected, right?]

But then I don't know anything about how Vivox works now either, and strive to keep all SL audio muted as much as possible. Still… internet sing-alongs? That's gotta need some synchronization magic beyond my ken to not end up sounding like an echoey Zoom call.

Edited by Qie Niangao

  • Lindens
On 3/25/2024 at 11:17 PM, Bubblesort Triskaidekaphobia said:

My first reaction to this was... isn't WebRTC a security liability? I have a browser plugin that blocks WebRTC unless I want to run it, because WebRTC can expose your IP address, even through a VPN tunnel (similar to how Shoutcast exposes it). This is not a bug that WebRTC intends to patch; they say it is foundational to the technology.

When I read that LL is working on preventing that kind of leakage with a firewall, though, I was relieved.  I have no problem running their WebRTC, if the security concerns are addressed.

For Second Life's use of WebRTC voice, all connections will be made between the viewer and Second Life servers, even for single-user direct voice calls. With WebRTC voice, the only party your IP address is shared with is Linden Lab.

This is different from the existing voice solution, which uses peer-to-peer connections for single-user direct voice calls, enabling IP discovery if a user accepts a single-user voice call.
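To make the difference concrete, here's a toy Python model of whose IP address each party ends up exchanging packets with, in a direct peer-to-peer call versus a call relayed through a server. It is not LL's code: the names are hypothetical, the addresses come from the reserved documentation ranges, and it ignores ICE/STUN details, modelling only who talks to whom.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    ip: str


def ips_seen_by(observer: Endpoint, participants: list[Endpoint],
                relay: Endpoint | None = None) -> set[str]:
    """IP addresses `observer` exchanges packets with directly."""
    if relay is None:
        # Peer-to-peer: every participant connects straight to every other one.
        return {p.ip for p in participants if p != observer}
    if observer == relay:
        # The relay terminates every connection, so it sees everyone.
        return {p.ip for p in participants}
    # Relayed: participants only ever talk to the relay.
    return {relay.ip}


if __name__ == "__main__":
    alice = Endpoint("alice", "198.51.100.7")
    bob = Endpoint("bob", "203.0.113.9")
    server = Endpoint("voice-server", "192.0.2.1")

    print(ips_seen_by(bob, [alice, bob]))                # {'198.51.100.7'}
    print(ips_seen_by(bob, [alice, bob], relay=server))  # {'192.0.2.1'}
    print(sorted(ips_seen_by(server, [alice, bob], relay=server)))
```

In the relayed case the only party that learns both addresses is the relay operator, which matches the "only exposed to LL" point made earlier in the thread.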


Hi all, sorry for the delay in commenting on this (I've been heads-down tying up loose ends on this feature).

(Just confirming some of what was said above, and hopefully expanding on it.)

As far as security goes, we are not using WebRTC to connect client-to-client, even when calling a single peer. We bounce all connections off a WebRTC server (a WebRTC MCU; see more info on that here). Because we bounce everything off our MCUs, IP addresses are not leaked. Additionally, we have more control over authentication, authorization, and encryption. All attempts to connect to a P2P, conference, group, or spatial voice channel are authenticated by the simulators involved. When users are kicked, they cannot reconnect to the same channel: the simulator will deny them, so the voice server won't allow them to connect. WebRTC itself is encrypted: media travels over SRTP (keyed via DTLS), and data channels run over SCTP on top of DTLS.
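A rough sketch of the kick/rejoin flow described above, under heavy assumptions: the class and method names are hypothetical, and a real system would pass signed, expiring credentials between the simulator and the voice server rather than making a direct function call. What it shows is that the voice server defers every join decision to the simulator, so a kicked user can't simply reconnect.

```python
from __future__ import annotations

from collections import defaultdict


class Simulator:
    """Hypothetical stand-in for the simulator that authorizes voice joins."""

    def __init__(self) -> None:
        self._kicked: dict[str, set[str]] = defaultdict(set)

    def authorize(self, channel: str, agent: str) -> bool:
        return agent not in self._kicked.get(channel, set())

    def kick(self, channel: str, agent: str, server: VoiceServer) -> None:
        self._kicked[channel].add(agent)  # remember the kick...
        server.drop(channel, agent)       # ...and end the live session


class VoiceServer:
    """Hypothetical relay that admits only agents the simulator approves."""

    def __init__(self, simulator: Simulator) -> None:
        self._sim = simulator
        self.members: dict[str, set[str]] = defaultdict(set)

    def join(self, channel: str, agent: str) -> bool:
        # Every join is checked with the simulator; the voice server keeps no ban list.
        if not self._sim.authorize(channel, agent):
            return False
        self.members[channel].add(agent)
        return True

    def drop(self, channel: str, agent: str) -> None:
        self.members[channel].discard(agent)


if __name__ == "__main__":
    sim = Simulator()
    voice = VoiceServer(sim)
    print(voice.join("region", "alice"))  # True
    sim.kick("region", "alice", voice)
    print(voice.join("region", "alice"))  # False: the simulator refuses
    print(voice.join("region", "bob"))    # True
```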

As far as audio quality goes, we're running at a 48 kHz sample rate overall (mono to the server, stereo from the server), which is an improvement over Vivox for the most part. Running at a higher rate than that is pretty strenuous for the server, which potentially means fewer participants can take part. Bandwidth for those on lower-quality networks is also a concern. Still, there is more we can do: because we have access to the open-source code involved, and we're using our own code, we can make changes based on the needs of the community (unlike with the old system).
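LL hasn't said how the server builds the stereo mix from mono inputs, so purely as an illustration: one common, simple technique for turning a mono source into stereo is constant-power panning, where the left/right gains depend on where the speaker is relative to the listener. A minimal Python sketch, assuming a flat 2D layout and plain panning (real spatial audio may also use HRTFs and distance attenuation); the function names are hypothetical.

```python
from __future__ import annotations

import math


def pan_gains(source_xy: tuple[float, float], listener_xy: tuple[float, float],
              facing_rad: float) -> tuple[float, float]:
    """Constant-power (left, right) gains for a source around a listener.

    Angles are counterclockwise from the +x axis; `facing_rad` is the
    direction the listener is facing.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    relative = math.atan2(dy, dx) - facing_rad
    pan = -math.sin(relative)                # -1 = hard left, +1 = hard right
    theta = (pan + 1) * math.pi / 4          # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)  # left**2 + right**2 == 1


def to_stereo(mono: list[float], gains: tuple[float, float]) -> list[tuple[float, float]]:
    """Spread one mono stream into (left, right) sample pairs."""
    left, right = gains
    return [(s * left, s * right) for s in mono]


if __name__ == "__main__":
    # Listener at the origin facing +x; a speaker 10 m to their left.
    left, right = pan_gains((0.0, 10.0), (0.0, 0.0), 0.0)
    print(round(left, 3), round(right, 3))  # 1.0 0.0
    print(to_stereo([0.5, -0.25], (left, right)))
```

Keeping left² + right² constant means a voice doesn't get louder or quieter as it moves across the stereo field. Panning alone can't tell front from back, which is one reason real systems go further.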

If you do want to try performing using WebRTC...

We've done what we can to address latency, but the simple delay in sending data up to the server and back down does add some. Depending on network conditions, we're talking 100ms-200ms, perhaps. That can be a challenge if you want a 'tight' sound, but sing-alongs might be a possibility. Musicians who want to play with one another could use direct P2P links through some other channel as their 'monitor' source, then stream down to SL, which may reduce perceived latency. You'll have to experiment.
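To see why relayed latency matters more for call-and-response than for one-way listening, here's a back-of-the-envelope Python sketch. The per-leg numbers are made up for illustration, not measurements of LL's servers, and "jitter buffer" is an assumed receive-side delay; the point is that a performer hears the crowd's reply only after a full round trip, roughly twice the one-way delay.

```python
def mouth_to_ear_ms(uplink_ms: float, downlink_ms: float,
                    server_ms: float = 0.0, jitter_buffer_ms: float = 0.0) -> float:
    """One-way delay from a speaker's mic to a listener's ear via a relay server."""
    return uplink_ms + server_ms + downlink_ms + jitter_buffer_ms


def call_and_response_ms(performer_up_ms: float, audience_down_ms: float,
                         audience_up_ms: float, performer_down_ms: float,
                         server_ms: float = 0.0, jitter_buffer_ms: float = 0.0) -> float:
    """Delay before a performer hears the crowd answer a line (e.g. a callback)."""
    to_audience = mouth_to_ear_ms(performer_up_ms, audience_down_ms,
                                  server_ms, jitter_buffer_ms)
    back_to_performer = mouth_to_ear_ms(audience_up_ms, performer_down_ms,
                                        server_ms, jitter_buffer_ms)
    return to_audience + back_to_performer


if __name__ == "__main__":
    # Illustrative numbers only: 40 ms up, 60 ms down, 10 ms of server work,
    # 40 ms of jitter buffering at the receiving end.
    print(mouth_to_ear_ms(40, 60, server_ms=10, jitter_buffer_ms=40))               # 150
    print(call_and_response_ms(40, 60, 60, 40, server_ms=10, jitter_buffer_ms=40))  # 300
```

This is also why a direct P2P 'monitor' link between musicians can help: they hear each other over one short leg, and SL only has to carry the one-way relayed mix to the audience.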

WebRTC does have pretty decent noise cancellation, echo cancellation, and automatic gain control, which should improve things overall in most cases. For performers, though, those features are not desired, so I'd like to give users some control over them. Google's WebRTC core code gives pretty fine-grained control over those features. Performers are important, so we'll take their needs into account in the future. By all means, drop us a note at https://feedback.secondlife.com/webrtc-voice so we can track your needs.
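As an illustration of the kind of per-user switches being discussed, here's a small Python sketch of 'speech' versus 'performer' presets. The class and preset names are hypothetical. The dictionary keys are the standard browser `getUserMedia` audio constraint names (`echoCancellation`, `noiseSuppression`, `autoGainControl`), used only as a familiar reference point; the viewer uses the native WebRTC library, not a browser API, so its actual settings may be named differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProcessing:
    """Hypothetical per-user audio processing settings."""
    echo_cancellation: bool = True
    noise_suppression: bool = True
    auto_gain_control: bool = True

    def as_media_track_constraints(self) -> dict[str, bool]:
        """Same settings, spelled as browser getUserMedia audio constraints."""
        return {
            "echoCancellation": self.echo_cancellation,
            "noiseSuppression": self.noise_suppression,
            "autoGainControl": self.auto_gain_control,
        }


SPEECH = VoiceProcessing()  # defaults suit ordinary conversation


def performer_preset(wears_headphones: bool) -> VoiceProcessing:
    """Turn off voice-tuned processing for instruments and singing.

    Echo cancellation stays on for performers monitoring through speakers,
    since otherwise other people's voices coming out of the speakers would
    be picked up by the mic and sent back to everyone.
    """
    return VoiceProcessing(
        echo_cancellation=not wears_headphones,
        noise_suppression=False,
        auto_gain_control=False,
    )


if __name__ == "__main__":
    print(SPEECH.as_media_track_constraints())
    print(performer_preset(wears_headphones=True).as_media_track_constraints())
```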

As far as avatar limits go, right now I'm clamping spatial voice at 100 participants, but that may change as we evaluate the performance of the servers further (hopefully for the better). Group/conference sessions are currently capped at 50 participants in WebRTC, but again, that may change. We also have some other limitations: you can only hear the 8 or so others who are loudest to you, depending on position, signal strength, and such. Hopefully, you won't have 8 people yelling into your ears when you're at a concert. We may tweak that number depending on server performance and community needs.
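A minimal Python sketch of "only hear the N loudest", assuming loudness is simply the speaker's signal level divided by distance (inverse-distance falloff, clamped at 1 m). LL hasn't said what formula they actually use, and the names here are hypothetical. `heapq.nlargest` picks the top N without sorting the whole crowd.

```python
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    name: str
    x: float
    y: float
    level: float  # current signal level, 0.0 (silent) to 1.0


def perceived_loudness(s: Speaker, lx: float, ly: float, min_dist: float = 1.0) -> float:
    """Level scaled by inverse distance, clamped so very close speakers don't blow up."""
    distance = max(math.hypot(s.x - lx, s.y - ly), min_dist)
    return s.level / distance


def audible_speakers(speakers: list[Speaker], lx: float, ly: float,
                     limit: int = 8) -> list[Speaker]:
    """The `limit` loudest active speakers for a listener at (lx, ly), loudest first."""
    active = [s for s in speakers if s.level > 0]
    return heapq.nlargest(limit, active, key=lambda s: perceived_loudness(s, lx, ly))


if __name__ == "__main__":
    crowd = [Speaker(f"av{i}", float(i), 0.0, 0.5) for i in range(1, 21)]
    crowd.append(Speaker("dj", 9.0, 0.0, 1.0))       # loud but far: 1.0 / 9, about 0.11
    crowd.append(Speaker("whisper", 2.0, 0.0, 0.1))  # near but quiet: 0.1 / 2 = 0.05
    print([s.name for s in audible_speakers(crowd, 0.0, 0.0)])
```

With this rule a loud DJ further away can still make the cut over a quiet neighbour, which is roughly the behaviour you'd want at a concert.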

Feel free to ask additional questions.

 


1 hour ago, Roxie Linden said:

Depending on network conditions, we're talking 100ms-200ms perhaps.

Are you considering putting some of those WebRTC MCUs in AWS regions in different geographic locations?

For example, electing a preferred MCU based on the attending crowd's geolocation, to minimize average latency for them?


7 hours ago, Kathrine Jansma said:

Are you considering putting some of those WebRTC MCUs in AWS regions in different geographic locations?

For example, electing a preferred MCU based on the attending crowd's geolocation, to minimize average latency for them?

That's certainly an interesting idea. There are all sorts of things that would need to be sorted out, of course. What do we do when the mix of attendees changes? Is geolocation sufficient? Would the improved latency make much of a difference for practical purposes?

I'll think on it.


At least for privately owned regions, it would be nice if the owner could select a preferred location (Europe, Singapore, the US, etc.) to handle voice for that region.

That could make a world of difference in latency for regions on the other side of the Atlantic or Pacific.

For example, a European user typically has 100-150 ms latency to AWS East, so bouncing traffic off an MCU there would add up to a minimum of 200-300 ms, which is noticeable. Putting it in, e.g., Amsterdam or Frankfurt would slash that to 20-30 ms.
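The idea of picking a voice server location from the crowd can be sketched in a few lines of Python. The relayed delay between two attendees is roughly one attendee's latency to the server plus the other's, so you pick the candidate site with the lowest average over all attendee pairs. The site labels and the latency table are made up, loosely based on the 100-150 ms and 10-15 ms per-leg figures above, and the function names are hypothetical. The sketch also ignores the open question of what to do when the crowd changes mid-event.

```python
from __future__ import annotations

from itertools import combinations
from statistics import mean

# Hypothetical one-way latency (ms) from each attendee to each candidate site.
LatencyTable = dict[str, dict[str, float]]


def pair_delay_ms(latency: LatencyTable, site: str, a: str, b: str) -> float:
    """Speaker-to-listener delay when both attendees connect through `site`."""
    return latency[a][site] + latency[b][site]


def average_pair_delay_ms(latency: LatencyTable, site: str, attendees: list[str]) -> float:
    """Mean relayed delay over every pair of attendees."""
    pairs = list(combinations(attendees, 2))
    if not pairs:
        raise ValueError("need at least two attendees")
    return mean(pair_delay_ms(latency, site, a, b) for a, b in pairs)


def best_site(latency: LatencyTable, sites: list[str], attendees: list[str]) -> str:
    """Candidate site with the lowest average attendee-to-attendee delay."""
    return min(sites, key=lambda s: average_pair_delay_ms(latency, s, attendees))


if __name__ == "__main__":
    sites = ["us-east", "eu-central"]
    latency = {
        "eu1": {"us-east": 125, "eu-central": 15},
        "eu2": {"us-east": 125, "eu-central": 15},
        "eu3": {"us-east": 125, "eu-central": 15},
    }
    europe = ["eu1", "eu2", "eu3"]
    print(average_pair_delay_ms(latency, "us-east", europe))     # 250
    print(average_pair_delay_ms(latency, "eu-central", europe))  # 30
    print(best_site(latency, sites, europe))                     # eu-central
```

Averaging favours wherever most of the crowd is; minimizing the worst pair instead would favour the most distant attendees, and which is better is exactly the kind of policy question raised above.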
