Project Background
The spatial quality of audio content delivery systems is becoming increasingly important as service providers attempt to
deliver enhanced experiences of spatial immersion and naturalness in audio-visual applications. Examples are virtual reality,
telepresence, home cinema, games and communications products. At the low end of the spatial quality range, mobile and
telecoms companies are increasingly interested in the spatial aspect of product sound quality. Here simple stereophony over
two loudspeakers, or headphones connected to a PDA/mobile phone/MP3 player, is increasingly typical. Binaural spatial
audio is soon to become a common feature in mobile devices. There is also substantial research at the high end of spatial
audio content authoring, coding and delivery, incorporating MPEG-4 scene encoding and wavefield synthesis rendering
techniques involving hundreds of loudspeakers. In the middle range, home cinema involving 5.1-channel surround sound is
one of the largest growth areas in consumer electronics, bringing enhanced spatial sound quality into a large number of
homes. Home computer systems, for example, are increasingly equipped with surround sound replay, and recent multimedia players
incorporate multichannel surround sound streaming capabilities.
The research trend is increasingly towards separating the rendering format from the method of coding/representation. This
suits scalable coding environments involving multiple data rate delivery mechanisms (e.g. digital broadcasting, internet, mobile
comms) and enables spatial audio content to be authored once but replayed in many different forms. The range of spatial
qualities that may be delivered to the listener will therefore be wide, and severe compromises in spatial quality may be encountered,
particularly under the most band-limited delivery conditions or with basic rendering devices. (Recent encoding standards
such as MPEG-4, for example, incorporate scalable, parametric and scene-based coding modes for delivering spatial
audio content over media with a wide range of data bandwidths, ranging from high-rate wired links and physical media to
mobile and internet communications where delivery bandwidth is highly restricted.) Encoding and rendering processes can
lead to spatial quality degradations including changes in source-related attributes such as perceived location,
width, distance and stability, and changes in environment-related attributes such as envelopment, spaciousness and width. Under
conditions of extreme restriction, major changes in spatial resolution or dimensionality may be experienced (e.g. when
downmixing from many loudspeaker channels to two). These lead to a reduction in overall spatial fidelity. Recent experiments
involving multivariate analysis of audio quality show that in home entertainment applications spatial quality accounts
for a significant proportion of the overall quality (typically as much as 30%).
The important research question that arises in relation to future spatial audio delivery systems is how to evaluate spatial
audio quality. One answer is formal subjective testing, but such tests are time-consuming and expensive.
Therefore it would be beneficial to employ an algorithm that could predict spatial quality on the basis of measured comparisons
between a reference reproduction and one that may have been impaired by coding or other forms of audio processing.
Such a system has not been developed yet. The current model for evaluating perceived audio quality (ITU-R BS.1387) does
not take into account the contribution of spatial quality to the overall user experience, concentrating instead on coding distortion,
noise and bandwidth degradations. It considers only monophonic signal characteristics, using a relatively simple
weighting process to combine the results from the left and right channels of a stereophonic signal. It does not allow for multichannel
audio signals or more sophisticated spatial rendering formats where the differences in spatial quality between reference
and impaired versions could be considerable, despite highly similar signal characteristics. The research community has
begun to recognise this problem in evaluating spatial audio coding systems, and has become more interested in developing
algorithms that evaluate spatial factors.