A Psychoacoustic Engineering Approach to Machine Sound Source Separation in Reverberant Environments
Start date: 2007
End date: 2010
Reverberation continues to present a major problem for sound source separation algorithms because it corrupts many of the acoustical cues on which these algorithms rely. Humans, however, demonstrate a remarkable robustness to reverberation, and many of the psychophysical and perceptual mechanisms involved are well documented. This project therefore considered the research question: can the reverberation-performance of existing psychoacoustic engineering approaches to machine source separation be improved?
The precedence effect is a perceptual mechanism that aids our ability to localise sounds in reverberant environments. Despite this, relatively little work has been done on incorporating the precedence effect into automated sound source separation. Consequently, a study was conducted that compared several computational precedence models and their impact on the performance of a baseline separation algorithm. The baseline algorithm included its own precedence model, which was replaced by each of the other precedence models in turn during the investigation. The models were tested using a novel metric in a range of reverberant rooms and with a range of other mixture parameters. The metric, termed Ideal Binary Mask Ratio, was shown to be robust to the effects of reverberation and to facilitate meaningful and direct comparison between algorithms across different acoustic conditions.
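To make the idea of a mask-based metric concrete, the sketch below builds an ideal binary mask from known target and interferer energies and scores an estimated mask by the fraction of time-frequency units on which the two masks agree. This is an illustration only: the function names, the 0 dB threshold and the agreement score are assumptions made here, and the score is not the published definition of the Ideal Binary Mask Ratio, which is given in the 2011 IEEE Transactions paper listed below.

```python
import numpy as np


def ideal_binary_mask(target_energy, interferer_energy, threshold_db=0.0):
    """Return 1 where the target exceeds the interferer by threshold_db, else 0.

    Both inputs are arrays of per-unit energy (for example frequency x time).
    The 0 dB default threshold is an assumption made for this sketch.
    """
    target_energy = np.asarray(target_energy, dtype=float)
    interferer_energy = np.asarray(interferer_energy, dtype=float)
    eps = np.finfo(float).eps  # avoids division by zero and log of zero
    local_snr_db = 10.0 * np.log10((target_energy + eps) / (interferer_energy + eps))
    return (local_snr_db > threshold_db).astype(int)


def mask_agreement_ratio(estimated_mask, ideal_mask):
    """Fraction of time-frequency units where the two binary masks agree.

    Hypothetical stand-in score, NOT the published Ideal Binary Mask Ratio.
    """
    estimated_mask = np.asarray(estimated_mask, dtype=bool)
    ideal_mask = np.asarray(ideal_mask, dtype=bool)
    if estimated_mask.shape != ideal_mask.shape:
        raise ValueError("masks must have the same shape")
    return float(np.mean(estimated_mask == ideal_mask))


# Toy example: 2 frequency channels x 2 time frames of made-up energies.
target = np.array([[4.0, 1.0], [1.0, 9.0]])
interferer = np.array([[1.0, 4.0], [2.0, 1.0]])
ibm = ideal_binary_mask(target, interferer)

# A hypothetical algorithm output that gets one unit wrong.
estimated = np.array([[1, 0], [1, 1]])
score = mask_agreement_ratio(estimated, ibm)
```

Because the score compares masks rather than resynthesised signals, it does not depend on signal-to-noise measurements of the reverberant output, which is the general motivation for a mask-based metric in this setting.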
Large differences between the performances of the models were observed. The results showed that a separation algorithm incorporating a model based on interaural coherence produces the greatest performance gain over the baseline algorithm. The results from the study also indicated that it may be necessary to adapt the precedence model to the acoustic conditions in which the model is utilised. This effect is analogous to the perceptual Clifton effect, a dynamic component of the precedence effect that appears to adapt precedence to a given acoustic environment in order to maximise its effectiveness. However, no work had been carried out on adapting a precedence model to the acoustic conditions under test. Specifically, although the need for such a component has been suggested in the literature, neither its necessity nor its benefit had been formally validated. Consequently, a further study was conducted in which parameters of each of the previously compared precedence models were varied in each room in order to identify whether, and to what extent, the separation performance varied with these parameters. The results showed that the reverberation-performance of existing psychoacoustic engineering approaches to machine source separation can be improved, yielding significant gains in separation performance.
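For readers unfamiliar with interaural coherence, the sketch below computes one common textbook form of it: the maximum of the normalised cross-correlation between the left-ear and right-ear signals over lags of about ±1 ms. The function name, the lag range and the whole-signal (rather than per-band, per-frame) calculation are assumptions made for illustration; the summary above does not specify how the coherence-based precedence model in the study was implemented.

```python
import numpy as np


def interaural_coherence(left, right, fs, max_lag_ms=1.0):
    """Maximum normalised cross-correlation of two ear signals within +/- max_lag_ms.

    Returns a value close to 1 when one signal is a delayed copy of the other
    (as for a dominant direct sound) and a lower value for diffuse,
    reverberant energy. Illustrative sketch only.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.ndim != 1 or left.shape != right.shape:
        raise ValueError("left and right must be 1-D arrays of equal length")

    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    if norm == 0.0:
        return 0.0  # at least one silent input: coherence is undefined, report 0

    # Full cross-correlation; zero lag sits at index len(left) - 1.
    xcorr = np.correlate(left, right, mode="full")
    centre = len(left) - 1
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    lo = max(centre - max_lag, 0)
    hi = min(centre + max_lag, len(xcorr) - 1)
    return float(np.max(xcorr[lo:hi + 1]) / norm)


# Toy example at an assumed 16 kHz sample rate.
fs = 16000
rng = np.random.default_rng(0)
source = rng.standard_normal(200)

# A pure 5-sample interaural delay: the two ears receive shifted copies.
left_direct = np.concatenate([source, np.zeros(5)])
right_direct = np.concatenate([np.zeros(5), source])
ic_direct = interaural_coherence(left_direct, right_direct, fs)

# Independent noise at each ear, a crude stand-in for a diffuse field.
ic_diffuse = interaural_coherence(
    rng.standard_normal(4000), rng.standard_normal(4000), fs
)
```

A precedence model built on this quantity would typically favour time-frequency regions where the coherence is high, on the assumption that those regions are dominated by direct sound rather than reflections.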
- Hummersone C, Mason RD, Brookes TS. (2013) 'A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments'. Journal of the Audio Engineering Society, 61 (7/8), pp. 508-520.
- Hummersone C, Mason R, Brookes T. (2011) 'Ideal Binary Mask Ratio: a novel metric for assessing binary-mask-based sound source separation algorithms'. IEEE Transactions on Audio, Speech and Language Processing, 19 (7), pp. 2039-2045.
Full text is available at: epubs.surrey.ac.uk/7195
- Hummersone C. (2011) 'A Psychoacoustic Engineering Approach to Machine Sound Source Separation in Reverberant Environments'. PhD Thesis, University of Surrey, IoSR.
Full text is available at: epubs.surrey.ac.uk/2923
- Hummersone C, Mason R, Brookes T. (2010) 'Dynamic precedence effect modeling for source separation in reverberant environments'. IEEE Transactions on Audio, Speech and Language Processing, 18 (7), pp. 1867-1871.
Full text is available at: epubs.surrey.ac.uk/2935
- Hummersone C, Mason R, Brookes T. (2010) 'A comparison of computational precedence models for source separation in reverberant environments'. 128th Audio Engineering Society Convention, London, UK, Preprint 7981.
Full text is available at: epubs.surrey.ac.uk/2936
- Hummersone C, Mason R, Brookes T. (2010) 'A perceptually-inspired approach to machine sound source separation in real rooms'. University of Surrey Postgraduate Research Conference.
Full text is available at: epubs.surrey.ac.uk/7250