Multimedia Signal Processing Laboratory

P. Kabal

Paper Abstracts 1996


Q.-G. Liu, B. Champagne, and P. Kabal

"A Microphone Array Processing Technique for Speech Enhancement in a Reverberant Space", Speech Communication, vol. 18, no. 4, pp. 317-334, June 1996.
See also: Demonstration

In this paper, a new microphone array processing technique is proposed for blind dereverberation of speech signals affected by room acoustics. It is based on the separate processing of the minimum-phase and all-pass components of delay-steered multi-microphone signals. The minimum-phase components are processed in the cepstrum domain, where spatial averaging followed by low-time filtering is applied. The all-pass components, which contain the source location information, are processed in the frequency domain by performing spatial averaging and by retaining only the all-pass component of the resulting output. The underlying motivation for the new processor is to use spatio-temporal processing over a single set of synchronous speech segments from several microphones to reconstruct the source speech, such that it is applicable to practical time-variant acoustic environments. Simulated room impulse responses are used to evaluate the new processor and to compare it to a conventional beamformer. Significant improvements in array gain and important reductions in reverberation in listening tests are observed.
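The minimum-phase/all-pass split and the cepstral averaging step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the standard homomorphic (cepstrum-folding) construction of the minimum-phase factor, and the function names, the FFT length `n` (assumed even), and the `cutoff` parameter are hypothetical choices made for illustration.

```python
import numpy as np

def min_phase_cepstrum(x, n):
    """Return the cepstrum of the minimum-phase factor of x, and the spectrum X.

    Assumes n is even and X has no zeros on the unit circle.
    """
    X = np.fft.fft(x, n)
    log_mag = np.log(np.maximum(np.abs(X), 1e-12))  # floor avoids log(0)
    c = np.fft.ifft(log_mag).real                   # real cepstrum
    # Fold the cepstrum onto non-negative quefrencies; this yields the
    # cepstrum of a minimum-phase signal with the same magnitude spectrum.
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    return c * fold, X

def split_min_phase_all_pass(x, n=1024):
    """Factor X = X_min * X_ap, where |X_ap| = 1 at every frequency bin."""
    c_min, X = min_phase_cepstrum(x, n)
    X_min = np.exp(np.fft.fft(c_min))
    X_ap = X / X_min
    return X_min, X_ap

def average_and_lifter(c_min_per_mic, cutoff):
    """Average minimum-phase cepstra over microphones (rows), then keep
    only the low-time part (quefrency indices below the assumed cutoff)."""
    c_avg = np.mean(c_min_per_mic, axis=0)
    c_avg[cutoff:] = 0.0
    return c_avg
```

In this sketch each row passed to `average_and_lifter` would be the folded cepstrum of one delay-steered microphone segment; how the cutoff is chosen and how the all-pass components are combined are left to the paper.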

S. Valaee and P. Kabal

"The Optimal Focusing Subspace for Coherent Signal Subspace Method", IEEE Trans. Signal Processing, vol. 44, no. 3, pp. 752-756, March 1996.

In this paper, we introduce a technique to determine an optimal focusing frequency for the direction-of-arrival estimation of wideband signals using the coherent signal-subspace processing method. We minimize the subspace fitting error to select an optimal focusing frequency. Direct optimization of this criterion can be computationally complex, since the complexity increases with the number of frequency samples. An alternative technique is introduced that performs nearly as well as the optimal method. This suboptimal technique is based on minimizing a tight bound on the error. The computational complexity of the suboptimal method is independent of the number of frequency samples. The simulation results show that the proposed method reduces both the bias of estimation and the resolution threshold signal-to-noise ratio (SNR).

Conference papers

J. H. Y. Loo, W.-Y. Chan, and P. Kabal

"Classified Nonlinear Predictive Vector Quantization of Speech Spectral Parameters", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Atlanta, GA), pp. 761-764, May 1996.
See also: Demonstration

Nonlinear predictive split vector quantization (NPSVQ) and classified NPSVQ (CNPSVQ) are introduced to exploit the correlation among the speech spectral parameters from two adjacent analysis frames. By interleaving intraframe SVQ with forward predictive SVQ, error propagation is limited to at most one adjacent frame. At an overall bit rate of about 21 bits/frame, NPSVQ can provide coding quality similar to that of intraframe SVQ at 24 bits/frame. Voicing classification is used in CNPSVQ to obtain an additional average gain of 1 bit/frame for unvoiced frames. Therefore, an overall bit rate of 20 bits/frame is obtained for unvoiced frames. The particular form of nonlinear prediction we use incurs virtually no additional encoding computational complexity. We have verified our comparative performance results using subjective listening tests.
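The claim that interleaving limits error propagation to one adjacent frame can be checked with a toy sketch. Everything here is a stand-in chosen for illustration, not the coder from the paper: a uniform scalar quantizer replaces the split vector quantizer, `tanh` is a hypothetical nonlinear predictor, and even-numbered frames are assumed to be the intraframe-coded ones.

```python
import numpy as np

def predict(prev):
    # Hypothetical nonlinear predictor (stand-in for the paper's predictor).
    return np.tanh(prev)

def quantize(v, step=0.05):
    # Uniform scalar quantizer (stand-in for split vector quantization).
    return np.round(v / step) * step

def encode(frames):
    """Even frames: intraframe coding. Odd frames: code the residual
    after predicting from the preceding (intraframe) decoded frame."""
    codes, prev_hat = [], None
    for k, x in enumerate(frames):
        if k % 2 == 0:
            code = quantize(x)
            prev_hat = code              # decoder output for this frame
        else:
            code = quantize(x - predict(prev_hat))
        codes.append(code)
    return codes

def decode(codes):
    out = []
    for k, code in enumerate(codes):
        if k % 2 == 0:
            out.append(code)             # no dependence on earlier frames
        else:
            out.append(predict(out[k - 1]) + code)
    return out

def affected_frames(frames, bad_index, error=1.0):
    """Indices of decoded frames that change when one code is corrupted."""
    codes = encode(frames)
    clean = decode(codes)
    corrupted = [c.copy() for c in codes]
    corrupted[bad_index] = corrupted[bad_index] + error
    noisy = decode(corrupted)
    return [k for k in range(len(frames))
            if not np.array_equal(clean[k], noisy[k])]
```

Because every predictive frame depends only on the intraframe frame just before it, a corrupted intraframe code disturbs that frame and the next one, and a corrupted predictive code disturbs only itself.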
