Audio for VR: Binaural Recordings and Ambisonics for Totally Immersive Experiences

Just two years ago, producing audio for VR was a hit-or-miss, highly complex undertaking for anyone trying to build a true 360 audio experience for their projects. Even cutting-edge researchers such as eleVR in San Francisco (hands-down, our favorite non-commercial research group dedicated to VR) were just starting to explore which method would work best.

Initially, a format called binaural audio was thought to be a good solution because it produces a pseudo-3D feel for sound, and it seemed simple to deal with from a technical perspective. It’s “pseudo” because it isn’t a true 3D audio format, and relies simply on recording sounds with two microphones (one facing Right, one facing Left) in order to mimic a human’s head and pair of ears (Fig. 1). This is done to make it seem like sounds are “coming from within (a listener’s) head”.

 

Fig. 1

800px-dummyhead

One would think that mimicking the R/L stereo format we all know so well from headphones would provide a sufficient audio experience, but the problem is that stereo’s inherent “flatness” in sonic dimension becomes magnified when you become part of a 3D virtual world.

Imagine this: you’re walking around town with your earbuds in, listening to some music or a podcast (or, better yet, one of John Cage’s Roaratorio tracks!). You’ll experience what’s coming into your ears as a kind of “flat plane” in your mind, rather than as sound that is integrated into the surrounding (physical) space and possesses dimension. Try it, but THINK about how the sound feels, spatially, as you do…

Even if you captured a high-quality stereo recording in LA’s Union Station, with all of the wonderful hubbub of people passing through, the menagerie of mechanical noises floating around, as well as the kid jamming Rachmaninoff on the grand piano in the main waiting room, and you listened to your recording on your headphones immediately after (while still standing in the station), the experience would not have the same spatial richness that the actual environment provides. This may seem an obvious point, but it is an obvious point with serious implications when trying to create 3D audio for the experience of VR.

Though binaural audio is ideal for VR in the sense that one needs to direct sounds to both ears, binaural sound recordings aren’t going to cut it because, well, as discussed, they just aren’t as robust, and our minds aren’t that easily fooled. But, fear not, as the solution for those of you (especially) hoping to shoot live-action VR and record live sound resides in…doubling the mics, and placing them into a weird shape called a “tetrahedron” (Fig. 2).

 

Fig. 2

screen-shot-2014-08-30-at-5.37.47-pm1

You can capture a richer, “spatialized” (and VR-approved) sound by using a Tetrahedral mic array, which is fairly inexpensive (or you can even construct one yourself using four mics and some rubber bands! Well, almost…). Why is this a better format than the usual R/L binaural mic setup? Because it records sound in a spherical way, covering not only the R and L directions, but also Front and Back and Top and Bottom.

With regard to compositing sounds in constructed 3D worlds, it is also necessary to mold your audio in a way that provides a sense of spatial dimension to your created world, but we will get to that issue in a moment.

So, now what? Is it really as simple as collecting a bunch of sounds, or recording them with this Tetrahedral mic contraption, slapping all the audio into Audition and exporting it as an AIFF to place into your Unity environment? Nope.

Why? Because math.

What has been recently (re)discovered is that an old, commercially unviable, mathematically complex, audiophile-specific form of surround sound called Ambisonics, developed during the hard-rocking 1970s, is perfect for a 2010s multimedia technology. It’s an audio technology that mathematically warps (or, sculpts, if you will) sound towards a center point (Fig. 3) so that the user in the middle of a “room” (IRL, or in VR) is completely immersed in the most “dimensionally” rich soundscape possible (s/he is accosted from above, below, and from all sides…in a good way).

 

Fig. 3

ambisoniclogo.svg

Ambisonics is, Long Math Story short (Fig. 4), a form of “full-sphere” surround sound which takes the four audio tracks from your Tetrahedral mic, converts them into a direction-aware intermediate format (known as B-format), and then decodes that into a more normal, speaker- or headphone-friendly format (aka, basic stereo).
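To make that a bit less abstract, here is a minimal sketch of the two steps in plain Python: turning four capsule signals (often called A-format) into the four first-order channels W, X, Y, Z (B-format), and then folding those down to stereo. The capsule names, the 0.5 scaling, and the function names are assumptions made for illustration. Real converters also apply gain matching and correction filters that are left out here, so treat this as a toy, not as what any particular plug-in does.

```python
def a_to_b(flu, frd, bld, bru):
    """Convert four capsule signals (A-format) to W, X, Y, Z (B-format).

    Assumed capsule layout: front-left-up, front-right-down,
    back-left-down, back-right-up. Each argument is a list of samples.
    Gain matching and capsule correction filters are omitted.
    """
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(0.5 * (a + b + c + d))  # sum of all capsules: omnidirectional
        x.append(0.5 * (a + b - c - d))  # front capsules minus back capsules
        y.append(0.5 * (a - b + c - d))  # left capsules minus right capsules
        z.append(0.5 * (a - b - c + d))  # up capsules minus down capsules
    return w, x, y, z


def decode_stereo(w, y):
    """Naive stereo decode: two virtual mics aimed hard left and hard right.

    Assumes W and Y share the same gain convention (an assumption).
    """
    left = [0.5 * (wi + yi) for wi, yi in zip(w, y)]
    right = [0.5 * (wi - yi) for wi, yi in zip(w, y)]
    return left, right


# Hypothetical input: a sound picked up only by the two left-side capsules.
w, x, y, z = a_to_b([1.0], [0.0], [1.0], [0.0])
left, right = decode_stereo(w, y)
```

In this toy case the sound ends up entirely in the left channel, which is the behavior you would hope for. A headphone-oriented decoder would go further and apply head-related filtering, which is beyond this sketch.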

 

Fig. 4 – Here is a visual model of the “Long Math Story” (mentioned above): the spherical harmonics up to the third order. These are the directional patterns that Ambisonics uses as building blocks to describe sound arriving from every direction on a sphere, and the higher the order, the finer the spatial detail…HUH??? Don’t worry, you don’t need to follow the math to use it, and it works great for shaping a more robust VR audio experience!

spherical_harmonics_deg3

As you can probably surmise, such a technology In Real Life would be kind of a pain to deal with because a person would always have to sit at the precise center of a room using ambisonic audio. But in Virtual Life, the user is always the “center” of a VR environment (moving or still, as there is no other there, in “there”, for the computer), and the sounds can be programmed to be directed to that point at all times, with no pesky problems of Real Life getting in the way (e.g., turning your head just a bit to the right or left and ending up “out of range” from the sweet spot).
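One reason this works so well in a headset is that a first-order ambisonic sound field can be rotated with a little trigonometry, so the software can counter-rotate the whole soundscape whenever the tracker reports a head turn. The sketch below shows that idea for yaw (left/right turns) only. The axis convention (X forward, Y left, Z up), the sign of the yaw angle (positive means the head turns left), and the function name are assumptions for illustration, not the API of any particular engine.

```python
import math


def compensate_head_yaw(w, x, y, z, head_yaw):
    """Counter-rotate a first-order sound field for a head turn.

    w, x, y, z are lists of samples (assumed axes: X forward, Y left, Z up).
    head_yaw is in radians, positive when the head turns to the left.
    W (omni) and Z (height) are unaffected by a pure yaw rotation.
    """
    c, s = math.cos(head_yaw), math.sin(head_yaw)
    x_rot = [c * xi + s * yi for xi, yi in zip(x, y)]
    y_rot = [-s * xi + c * yi for xi, yi in zip(x, y)]
    return w, x_rot, y_rot, z


# Hypothetical case: a sound straight ahead, then the listener turns
# 90 degrees to the left. The sound should now sit on the listener's right,
# which in this convention means a negative Y component.
w, x, y, z = compensate_head_yaw([1.0], [1.0], [0.0], [0.0], math.pi / 2)
```

Pitch and roll work the same way with rotations around the other two axes. They are left out here to keep the sketch short.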

It should be mentioned that all of this doesn’t just apply to live-action VR recording. Constructed 3D worlds need “ambisonically massaged” audio as well. So, when taking both your individual diegetic (environmental sounds, character voices, etc.) and non-diegetic (mood music, voice-over, etc.) sounds and conforming them to your VR environment, you must apply the same mathematical methodology to your full set of audio files. Luckily, this can be accomplished with the same kinds of software and plug-ins used for live-action VR audio.
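For the curious, the “massaging” of an individual mono sound boils down to weighting it by where it should appear to come from. Here is a minimal sketch, assuming first-order channels W, X, Y, Z with X forward, Y left, and Z up, and with the W channel carrying the sound at full level. Other gain conventions exist (some scale W down), and the function names `encode_mono` and `mix_scenes` are hypothetical. The plug-ins linked below do this, and a lot more, for you.

```python
import math


def encode_mono(samples, azimuth, elevation=0.0):
    """Place a mono sound at a direction in a first-order sound field.

    azimuth: radians, 0 is straight ahead, positive toward the left.
    elevation: radians, 0 is ear level, positive is up.
    Returns (w, x, y, z) as lists of samples. W is left unscaled here,
    which is an assumption about the gain convention.
    """
    gx = math.cos(azimuth) * math.cos(elevation)  # front/back weight
    gy = math.sin(azimuth) * math.cos(elevation)  # left/right weight
    gz = math.sin(elevation)                      # up/down weight
    w = list(samples)
    x = [gx * s for s in samples]
    y = [gy * s for s in samples]
    z = [gz * s for s in samples]
    return w, x, y, z


def mix_scenes(*scenes):
    """Sum several (w, x, y, z) scenes of equal length, channel by channel."""
    return tuple(
        [sum(vals) for vals in zip(*channels)]
        for channels in zip(*scenes)
    )


# Hypothetical scene: a voice straight ahead plus a hum directly to the left.
voice = encode_mono([0.2, 0.4], azimuth=0.0)
hum = encode_mono([0.1, 0.1], azimuth=math.pi / 2)
w, x, y, z = mix_scenes(voice, hum)
```

Because every source ends up in the same four channels, the mixed scene can be rotated and decoded as a whole, no matter how many sounds went into it.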

Now that you have a decent idea of what it takes to create audio for VR, take a look at these helpful links to get you started. And, if you find others on your own that you feel are informative for future readers, please leave them in the Comments section below. It would be much appreciated!

 

Binaural Audio:

https://en.wikipedia.org/wiki/Binaural_recording

Ambisonics:

https://en.wikipedia.org/wiki/Ambisonics

http://www.ambisonic.net

http://ambisonics.iem.at/

Hardware (mics):

https://en-us.sennheiser.com/shape-the-future-of-audio-ambeo

http://core-sound.com/TetraMic/1.php

Software (for ambisonic audio integration for VR):

https://facebook360.fb.com/spatial-workstation/

 

Image citations/sources:

Top image: http://ambisonics.iem.at/

Fig 1: (Wikipedia) – By Gregory F. Maxwell PGP:0xB0413BFA – By uploader, GFDL 1.2, https://commons.wikimedia.org/w/index.php?curid=154665

Fig 2: (eleVR) – http://elevr.com/audio-for-vr-film/

Fig 3: (Wikipedia) – By Source (WP:NFCC#4), Fair use, https://en.wikipedia.org/w/index.php?curid=41429748

Fig 4: (Wikipedia) – By Dr Franz Zotter, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=30239736
