Multimedia Signal Processing Laboratory

P. Kabal


Paper Abstracts 2002


Conference papers

Y. Qian and P. Kabal

"Wideband Speech Recovery from Narrowband Speech Using Classified Codebook Mapping", Proc. Australian Int. Conf. Speech Science, Technol. (Melbourne), pp. 106-111, Dec. 2002.

Speech sounds occupy 8 kHz or more of bandwidth. However, current public telephone networks limit the speech bandwidth to 300-3400 Hz. Telephone speech is characterized by thin and muffled sounds, and degraded speaker identification. We describe an algorithm which generates the missing highband components from the narrowband speech signal. The algorithm is based on three acoustic-phonetic classified narrowband-to-wideband linear prediction (LP) spectrum mapping codebooks to recover the missing highband spectrum. Subjective tests show that the reconstructed wideband speech improves the speech quality. LP spectrum bandwidth expansion is used to avoid sharp spectral peaks. The mean SD (log-spectrum distortion) decreases by 0.93 dB, compared to non-classified codebooks without LP bandwidth expansion.
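
As an illustration of the LP bandwidth expansion step mentioned above, the following is a minimal sketch of the textbook form of the technique, in which each LP coefficient a_k is scaled by gamma**k so that the poles of 1/A(z) move toward the origin and spectral peaks become broader. The function name and the value of gamma are assumptions for illustration; the abstract does not state which expansion factor the paper uses.

```python
def bandwidth_expand(lp_coeffs, gamma=0.9):
    """Scale LP coefficient a_k by gamma**k (a_0 = 1 is listed first).

    This replaces A(z) by A(z / gamma), which multiplies every pole
    radius of 1/A(z) by gamma and so flattens sharp spectral peaks.
    The default gamma is an illustrative assumption.
    """
    return [a * gamma ** k for k, a in enumerate(lp_coeffs)]


# A(z) = (1 - 0.9 z^-1)^2 has a double pole at radius 0.9.
# With gamma = 0.5 the result is (1 - 0.45 z^-1)^2.
expanded = bandwidth_expand([1.0, -1.8, 0.81], gamma=0.5)
```

In this example the double pole moves from radius 0.9 to radius 0.45, which corresponds to a much wider resonance.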

J. Thiemann and P. Kabal

"Low Distortion Acoustic Noise Suppression Using a Perceptual Model for Speech Signals", Proc. IEEE Workshop Speech Coding (Tsukuba, Japan), pp. 172-174, Oct. 2002.

Algorithms for the suppression of acoustic noise in speech signals are generally Short-Time Spectral Amplitude (STSA) methods such as Spectral Subtraction. These methods have been effective at reducing or removing the background noise, but have a tendency (at low SNR) to add annoying artefacts, such as musical noise, and distortion of the speech signal. By employing an auditory model, psychoacoustic effects such as simultaneous masking can be used to apply spectral modification in a more effective manner, reducing the amount of overall modification necessary. In this way, the artefacts introduced by the processing are reduced. This paper proposes a method to significantly improve the reduction in the background acoustic noise in narrowband and wideband speech signals, even at low SNR. Here we show that the use of a subtraction strategy and psychoacoustic model originally intended for audio signals yields an output signal with little or no audible distortion.
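
For reference, the baseline Spectral Subtraction rule that the abstract contrasts with can be sketched as below. This is the generic magnitude-domain form with a spectral floor, not the perceptual method proposed in the paper; the function name, the over-subtraction factor alpha, and the floor value are illustrative assumptions.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.05):
    """Generic magnitude spectral subtraction for one frame.

    noisy_mag: magnitude spectrum of the noisy frame, one value per bin
    noise_mag: estimated noise magnitude per bin
    alpha:     over-subtraction factor (assumed value)
    floor:     fraction of the noisy magnitude kept as a lower bound,
               which prevents negative magnitudes (assumed value)
    """
    return [max(y - alpha * n, floor * y) for y, n in zip(noisy_mag, noise_mag)]


# The last bin is dominated by noise, so it is clamped to the floor.
clean = spectral_subtract([1.0, 0.5, 0.2], [0.2, 0.2, 0.3])
```

Bins that fluctuate around the floor from frame to frame are one source of the musical noise described above, which is what a masking model tries to avoid by modifying only the components that would be audible.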

T. Agarwal and P. Kabal

"Pre-Processing of Noisy Speech for Voice Coders", Proc. IEEE Workshop Speech Coding (Tsukuba, Japan), pp. 169-171, Oct. 2002.

Accurate Linear Prediction Coefficient (LPC) estimation is one of the key requirements for low bit-rate voice coding. Under harsh acoustic conditions, LPC estimation can become unreliable. This results in poor quality of encoded speech and introduces annoying artifacts.

This paper presents a two-branch speech enhancement preprocessing scheme for low bit-rate voice coders. This scheme consists of two parallel denoising blocks. One block will enhance the degraded speech for LPC estimation. Another block will increase the perceptual quality of the speech to be coded. The goal of this paper is to design the two-branch scheme. Test results show that the two-branch scheme can provide better perceptual quality compared to conventional one-branch speech enhancement techniques in noisy environments.

P. J. Smith, P. Kabal, and R. Rabipour

"Speaker Selection for Tandem-Free Operation VoIP Conference Bridges", Proc. IEEE Workshop Speech Coding (Tsukuba, Japan), pp. 120-122, Oct. 2002.

A conventional Voice-over-Internet Protocol (VoIP) conference bridge reduces the speech quality due to tandeming the mixed multi-speaker signal with high compression speech codecs. One solution is to select and forward the compressed signal(s) to the endpoints, where they are decoded and mixed. In such arrangements, speaker selection is usually accomplished with an order-based approach which prevents listeners from interrupting the current speaker(s). This paper presents an alternative in which talking privileges are assigned based on order of activity and signal power. Subjective evaluations indicate that speaker switching is smooth, nearly transparent, and unanimously preferred over a VoIP conference with tandemed connections.
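
The abstract does not give the selection rule itself, so the following is only a hypothetical sketch of how order of activity and signal power might be combined. Slots are first filled in order of activity, and a later speaker can take over the weakest slot if it is louder by a margin. The function name, the tuple layout, the number of slots, and the 6 dB margin are all assumptions, not details from the paper.

```python
def select_speakers(streams, max_talkers=2, barge_in_db=6.0):
    """Pick which compressed streams a bridge forwards (hypothetical rule).

    streams: list of (speaker_id, start_time, power_db) for the streams
             that are currently active.
    Slots go to the earliest active speakers; a later speaker replaces
    the weakest selected one if it is at least barge_in_db louder.
    """
    by_order = sorted(streams, key=lambda s: s[1])
    selected = by_order[:max_talkers]
    for challenger in by_order[max_talkers:]:
        weakest = min(selected, key=lambda s: s[2])
        if challenger[2] >= weakest[2] + barge_in_db:
            selected[selected.index(weakest)] = challenger
    return [s[0] for s in selected]


active = [("a", 0.0, -20.0), ("b", 1.0, -30.0), ("c", 2.0, -22.0)]
forwarded = select_speakers(active)
```

A purely order-based rule would keep "a" and "b" here; adding the power term lets "c" interrupt the quieter speaker.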

A. M. Wyglinski, P. Kabal, and F. Labeau

"Adaptive Filterbank Multicarrier Wireless Systems for Indoor Environments", Proc. IEEE Vehicular Technol. Conf. (Vancouver, BC), pp. 336-340, Sept. 2002.

We investigate the use of modulated filterbanks in adaptive wireless multicarrier systems operating in indoor environments. The motivation for using filterbanks, as opposed to OFDM, is that these spectrally selective modulation filters can decrease the amount of interchannel interference without using lengthy cyclic prefixes which are necessary in OFDM. The design of the synthesis and analysis filterbanks, based on a single lowpass prototype filter, is presented. Furthermore, optimal subcarrier MMSE equalization, adaptive bit and power loading, as well as null subcarrier placement techniques are employed to enhance system throughput and bit error rate performance when operating in frequency selective channels. The performance of this system is studied and compared with an IEEE 802.11a-compliant system, which is based upon OFDM modulation.

Y. Qian and P. Kabal

"Pseudo-wideband Speech Reconstruction from Telephone Speech", Proc. Biennial Symp. Commun. (Kingston, ON), pp. 524-527, June 2002.

Telephone speech is limited to a 300-3400 Hz bandwidth. The sound quality is much lower than that of broadcast radio and audio compact discs. We present an algorithm to regenerate the missing highband components (3.4-7 kHz). The highband spectrum recovery is based on a Line Spectrum Frequency (LSF) VQ codebook mapping from the narrowband speech to the high frequency components. The highband excitation is substituted by Gaussian white noise modulated by a bandpass (2-3 kHz) envelope. The modulation gain of the excitation exploits a lowband LSF VQ mapping codebook. Spectrograms demonstrate that the reproduced speech has recovered most of the missing components. Subjective tests show that the reconstructed speech is significantly more pleasant and natural than conventional telephone speech.
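
The codebook mapping idea can be sketched as a nearest-neighbour lookup in paired codebooks: the narrowband LSF vector is matched to its closest narrowband codevector, and the highband codevector stored at the same index is returned. This is a generic illustration; the function name, the squared-error distance, and the toy codebook values are assumptions and are not taken from the paper.

```python
def map_highband(nb_vector, nb_codebook, hb_codebook):
    """Return the highband codevector paired with the nearest
    narrowband codevector (squared Euclidean distance, assumed)."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    idx = min(range(len(nb_codebook)),
              key=lambda i: dist2(nb_vector, nb_codebook[i]))
    return hb_codebook[idx]


# Toy two-entry codebooks; real codebooks are trained on wideband speech.
nb_codebook = [[0.1, 0.2], [0.5, 0.6]]
hb_codebook = [[0.8], [0.9]]
highband = map_highband([0.45, 0.62], nb_codebook, hb_codebook)
```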

J. Thiemann and P. Kabal

"Noise Suppression using a Perceptual Model for Wideband Speech Signals", Proc. Biennial Symp. Commun. (Kingston, ON), pp. 516-519, June 2002.

Traditional algorithms for suppressing background noise in speech signals can add annoying artefacts to the resulting denoised signal. In applications requiring better than toll quality, it is desirable that noise suppression should not add any audible artefacts. This paper describes a method that is effective for narrowband signals and applies it to wideband signals. The method presented uses a high-resolution psychoacoustic model originally developed for the evaluation of audio quality, and combines it with a method originally developed for audio signal enhancement. It is shown that while the method works well in narrowband applications, in wideband signals the quality needs to be improved.

W. Pereira and P. Kabal

"Improved Spectral Tracking Using Interpolated Linear Prediction Parameters", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Orlando, FL), pp. I-261-I-264, May 2002.

Conventional interpolation of Linear Prediction (LP) parameters can provide a poor spectral match to the underlying speech signal for the intermediate subframes. In this paper, we present a method of modifying the interpolation endpoint LP parameters to improve the spectral tracking over all subframes. This 'warping' algorithm is based on minimizing the distortion between the interpolated LP parameters and those computed directly from the speech signal. The algorithm has been integrated into the Adaptive Multi-Rate (AMR) speech codec. Our results show that this method enhances coder performance by smoothing out the LP parameter tracks and reducing coding distortion.
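
The conventional interpolation that the abstract takes as its starting point can be sketched as a linear blend between the parameter vectors of two consecutive frames, evaluated once per subframe. This shows only the baseline, not the warping algorithm; the function name and the uniform weights are assumptions, and actual codecs such as AMR define their own interpolation weights.

```python
def interpolate_lsf(prev_lsf, curr_lsf, num_subframes=4):
    """Linearly interpolate LP parameters (e.g. LSFs) per subframe.

    Subframe i (1-based) uses weight i / num_subframes on the current
    frame, so the last subframe equals curr_lsf. Uniform weights are an
    illustrative assumption.
    """
    out = []
    for i in range(1, num_subframes + 1):
        w = i / num_subframes
        out.append([(1 - w) * p + w * c for p, c in zip(prev_lsf, curr_lsf)])
    return out


subframes = interpolate_lsf([0.0, 1.0], [1.0, 2.0])
```

Because the intermediate vectors depend only on the two endpoints, they may not match the spectrum actually measured in those subframes; adjusting the endpoints, as the paper proposes, changes every interpolated vector at once.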

M. Klein and P. Kabal

"Signal Subspace Speech Enhancement with Perceptual Post-Filtering", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Orlando, FL), pp. I-537-I-540, May 2002.
See also: Demonstration & software

A methodology for suppressing musical noise produced by signal subspace speech enhancement is presented. An auditory post-filter is placed at the output of the subspace filter to smooth the enhanced speech spectra. By utilizing a perceptual filter, averaging is performed in a manner similar to that of the human auditory system. As such, distortion to the underlying speech signal is reduced.

