Conversion for VR

In a virtual reality ("VR") environment, 3D video and binaural sound are reproduced via VR glasses with headphones. Head position and rotation are processed in real time. 360° videos can also contain binaural sound, but only head rotation is processed, not the head position.

Fig. 9: VR glasses (Samsung)

If binaural sound is to respond to head tracking, a dummy head cannot be used as the recording method, since it captures the sound for only one fixed head angle. Instead, the following sound components are gathered separately and assembled:

  • "Audio object" with dry sound
  • Binaural filters: "HRTF" (+ Room: "BRIR")

Usually the audio object, e.g. a character in a VR video game, is a single source with a certain distance and 3D direction. It consists of dry sound, which is then processed via binaural and room filters (="binauralized") depending on its 3D direction. This direction is determined by the position of the audio object and by the position and head rotation of the listener within the VR scene.
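
The dependency of the 3D direction on object position, listener position and head rotation can be sketched in a few lines of Python. This is only an illustration under assumed conventions that do not come from the text: x points forward, y to the left, z up, azimuth is positive to the left, and only head yaw (rotation about the vertical axis) is considered; pitch and roll are left out for brevity. The function name `relative_direction` is hypothetical.

```python
import math

def relative_direction(source_pos, listener_pos, head_yaw_deg=0.0):
    """Return (azimuth_deg, elevation_deg, distance) of a source as seen from the head.

    Assumed convention (not from the article): x forward, y left, z up,
    azimuth positive to the left, head rotation limited to yaw.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dz = source_pos[2] - listener_pos[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    if distance == 0.0:
        raise ValueError("source and listener are at the same position")

    # Direction of the source in scene (world) coordinates.
    world_azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.asin(dz / distance))

    # Subtract the head yaw and wrap the result into [-180, 180).
    azimuth = (world_azimuth - head_yaw_deg + 180.0) % 360.0 - 180.0
    return azimuth, elevation, distance

# A source straight ahead in the scene; the listener turns the head 90° to the left,
# so the source should now be heard from the right-hand side (negative azimuth).
print(relative_direction((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), head_yaw_deg=90.0))
```

The resulting azimuth and elevation would select the binaural filter, while the distance could drive level and room-filter settings.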

The acoustical background signal of a scene, or "ambience/atmo", is a very special kind of audio source. It cannot be recorded dry, nor can it be mapped to a single point source. In principle it could be produced by the superposition of numerous audio sources in space, but often this would either be inefficient (e.g. trees in a forest) or impossible (live ambience from a venue).

Thus a group of several audio objects forming an array of virtual loudspeakers is used to reproduce a stereophonic recording of the ambience. This group of loudspeakers can be chosen from a 3D preset, for example the Dolby setup 5.1.4 or the Auro3D setup 9.1, in each case without a center loudspeaker. If no preset is available, one can define a cube around the listener, with one virtual loudspeaker at each corner.
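
As a rough sketch of the cube fallback, the following Python snippet places eight virtual loudspeakers at the corners of a cube centered on the listener and reports their directions. The function name `cube_loudspeakers`, the `half_edge` parameter and the axis convention (x forward, y left, z up) are assumptions made for illustration; a game engine such as Unity uses its own coordinate system.

```python
import itertools
import math

def cube_loudspeakers(half_edge=1.0):
    """Return the eight corner positions of a cube centered on the listener.

    Assumed convention (not from the article): x forward, y left, z up.
    """
    speakers = []
    for sx, sy, sz in itertools.product((1, -1), repeat=3):
        x, y, z = sx * half_edge, sy * half_edge, sz * half_edge
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
        speakers.append({
            "position": (x, y, z),
            "azimuth_deg": azimuth,
            "elevation_deg": elevation,
        })
    return speakers

for speaker in cube_loudspeakers(half_edge=2.0):
    print(speaker)
```

With this geometry all eight loudspeakers are equidistant from the listener, at azimuths of ±45° and ±135° and at elevations of roughly ±35°.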

These audio objects are "diegetic", i.e. exactly like their visual counterparts, they do not move in response to head rotation. This implies that their incidence angle relative to the head changes with head rotation, and thus the HRTFs change. The eight signals of the ORTF-3D microphone are utilized in this way to build up an optimal 3D live ambience in the VR environment.
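
The effect of head rotation on scene-fixed loudspeakers can be illustrated with a short Python sketch. It assumes yaw-only head tracking, azimuth positive to the left, and a purely hypothetical HRTF set measured on a regular azimuth grid (`step_deg`); real HRTF databases and renderers differ in layout and usually interpolate between measurements.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def head_relative_azimuths(world_azimuths_deg, head_yaw_deg):
    """Azimuths of scene-fixed sources as seen from a head turned by head_yaw_deg."""
    return [wrap_deg(az - head_yaw_deg) for az in world_azimuths_deg]

def nearest_grid_azimuth(azimuth_deg, step_deg=15.0):
    """Snap an azimuth to the closest direction of a hypothetical HRTF grid."""
    return wrap_deg(round(azimuth_deg / step_deg) * step_deg)

# Four scene-fixed loudspeaker azimuths (one layer of a cube setup).
world_azimuths = [45.0, -45.0, 135.0, -135.0]

# The loudspeakers stay where they are; only the angles relative to the head change.
for yaw in (0.0, 20.0, 45.0):
    relative = head_relative_azimuths(world_azimuths, yaw)
    filters = [nearest_grid_azimuth(az) for az in relative]
    print(yaw, relative, filters)
```

Each change of the head yaw shifts all incidence angles by the same amount, so a different filter pair is applied to every loudspeaker signal even though the virtual loudspeakers themselves remain fixed in the scene.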

As described above, the use of a first-order Ambisonic microphone cannot be recommended for this purpose. Being a small, coincident setup, it provides insufficient separation between channels, which reduces the quality of its spatial impression and 3D stereophonic imaging.

Fig. 10: Screenshot from Unity: Virtual 8.0 loudspeaker setup to reproduce live recorded ambience within a binaural environment
  • 3D Audio

    The new approaches included in "3D Audio" reproduce sound from all spatial directions. This includes the Dolby Atmos and Auro3D stereophonic systems; binaural / virtual reality ("VR") systems; and soundfield synthesis approaches such as Ambisonics and wavefield synthesis systems. 3D Audio can give distinctly better spatial perceptions than 5.1. Not only is the elevation of sound sources reproduced, but noticeable improvements can also be achieved with regard to envelopment, naturalness, and accuracy of tone color. The listening area can also be greater; listeners can move more freely within the playback room without hearing the image collapse into the nearest loudspeaker.
  • 32nd TEC Award: ORTF-3D