Multimedia Signal Processing Laboratory

P. Kabal

Paper Abstracts 2004


S. Valaee and P. Kabal

"An Information Theoretic Approach to Source Enumeration in Array Signal Processing", IEEE Trans. Signal Processing, vol. 52, no. 5, pp. 1171-1178, May 2004.

In this paper, a new information theoretic algorithm is proposed for signal enumeration in array processing. The approach is based on predictive description length (PDL) that is defined as the length of a predictive code for the set of observations. We assume that several models, with each model representing a certain number of sources, will compete. The PDL criterion is computed for the candidate models and is minimized over all models to select the best model and to determine the number of signals. In the proposed method, the correlation matrix is decomposed into two orthogonal components in the signal and noise subspaces. The maximum likelihood (ML) estimates of the angles-of-arrival are used to find the projection of the sample correlation matrix onto the signal and noise subspaces. The sum of the ML estimates of these matrices is the ML estimate of the correlation matrix. This method can detect both coherent and noncoherent signals. The proposed method can be used online and can be applied to time-varying systems and target tracking.

S. Nikneshan, A. K. Khandani, and P. Kabal

"Soft-Decision Decoding of Fixed-Rate Entropy-Coded Trellis-Coded Quantizer Over a Noisy Channel", IEEE Trans. Vehicular Technol., vol. 53, no. 2, pp. 329-338, March 2004.

This paper presents new techniques to improve the performance of a fixed-rate entropy-coded trellis-coded quantizer (FE-TCQ) in transmission over a noisy channel. In this respect, we first present the optimal decoder for a fixed-rate entropy-coded vector quantizer (FEVQ). We show that the optimal decoder for the FEVQ can be a maximum likelihood decoder where a trellis structure is used to model the set of possible codewords and the Viterbi algorithm is subsequently applied to select the most likely path through this trellis. In order to add quantization packing gain to the FEVQ, we take advantage of a trellis-coded quantization (TCQ) scheme. To prevent error propagation, it is necessary to use a block structure obtained through a truncation of the corresponding trellis. To perform this task in an efficient manner, we apply the idea of tail-biting to the trellis structure of the underlying TCQ. It is shown that the use of a tail-biting trellis significantly reduces the required block length with respect to some other possible alternatives known for trellis truncation. This results in a smaller delay and also mitigates the effect of error propagation in signaling over a noisy channel. Finally, we present methods and numerical results for the combination of the proposed FEVQ soft decoder and a tail-biting TCQ. These results show that, by an appropriate design of the underlying components, one can obtain a substantial improvement in the overall performance of such a fixed-rate entropy-coded scheme.
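The decoder described above selects the most likely path through a trellis with the Viterbi algorithm. The following is a generic sketch of that search, not the FEVQ decoder itself: the trellis is assumed to be given as a hypothetical list of `(from_state, to_state, symbol)` branches, the branch metric is a caller-supplied log-likelihood, and tail-biting is not modelled.

```python
import math
from typing import Callable, List, Tuple

# A branch is (from_state, to_state, output_symbol).
Branch = Tuple[int, int, int]


def viterbi(branches: List[Branch], num_states: int, observations: List[float],
            log_lik: Callable[[int, float], float], start_state: int = 0):
    """Return (path_metric, symbols) of the most likely path through the trellis."""
    metric = [-math.inf] * num_states
    metric[start_state] = 0.0
    survivors: List[List[int]] = [[] for _ in range(num_states)]
    for obs in observations:
        new_metric = [-math.inf] * num_states
        new_survivors: List[List[int]] = [[] for _ in range(num_states)]
        for s_from, s_to, sym in branches:
            if metric[s_from] == -math.inf:
                continue  # state not reachable yet
            cand = metric[s_from] + log_lik(sym, obs)
            if cand > new_metric[s_to]:  # keep only the best path into s_to
                new_metric[s_to] = cand
                new_survivors[s_to] = survivors[s_from] + [sym]
        metric, survivors = new_metric, new_survivors
    best = max(range(num_states), key=lambda s: metric[s])
    return metric[best], survivors[best]
```

As a toy stand-in for a constrained codeword set, take a two-state trellis that forbids two consecutive 1s (branches `(0,0,0)`, `(0,1,1)`, `(1,0,0)`), map symbols to amplitudes of -1 and +1, and use a Gaussian log-likelihood `-(obs - amp)**2`. For observations `[0.9, 0.8, -0.7]` the decoder returns `[1, 0, 0]`, whereas symbol-by-symbol hard decisions would give the invalid sequence `[1, 1, 0]`.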


K. H. El-Maleh and P. Kabal

"Method and Apparatus for Providing Background Acoustic Noise During a Discontinued/Reduced Rate Transmission Mode of a Voice Transmission System", US Patent 6,782,361, Aug. 2004.

Natural-quality synthetic noise will replace background acoustic noise during speech gaps and will achieve a better representation of the excitation signal in a noise-synthesis model by classifying the type of acoustic environment noise into one or more of a plurality of noise classes. The noise class information is used to synthesize background noise that sounds similar to the actual background noise during speech transmission. In some embodiments, the noise class information is derived by the transmitter and transmitted to the receiver which selects corresponding excitation vectors and filters them using a synthesis filter to construct the synthetic noise. In other embodiments, the receiver itself classifies the background noise present in hangover frames and uses the class information as before to generate the synthetic noise. The improvement in the quality of synthesized noise during speech gaps helps to preserve noise continuity between talk spurts and speech pauses, and enhances the perceived quality of a conversation.

P. Kabal and H. Najafzadeh-Azghandi

"Perceptual Audio Coding", US Patent 6,704,705, March 2004.

A method and apparatus for perceptual audio coding. The method and apparatus provide high-quality sound for coding rates down to and below 1 bit/sample for a wide variety of input signals including speech, music and background noise. The invention provides a new distortion measure for coding the input speech and training the codebooks, where the distortion measure is based on a masking spectrum of the input frequency spectrum. The invention also provides a method for direct calculation of masking thresholds from a modified discrete cosine transform of the input signal. The invention also provides a predictive and non-predictive vector quantizer for determining the energy of the coefficients representing the frequency spectrum. As well, the invention provides a split vector quantizer for quantizing the fine structure of coefficients representing the frequency spectrum. Bit allocation for the split vector quantizer is based on the masking threshold. The split vector quantizer also makes use of embedded codebooks. Furthermore, the invention makes use of a new transient detection method for selection of input windows.

Conference papers

Y. Ould-Cheikh-Mouhamedou, S. Crozier, and P. Kabal

"Distance Measurement Method for Double Binary Turbo Codes and a New Interleaver Design for DVB-RCS", Proc. IEEE Global Telecommun. Conf. (Dallas, TX), 7 pp., Nov. 2004.

This paper presents a computationally efficient distance measurement method for double binary turbo codes, such as those used in the Digital Video Broadcast with Return Channel via Satellite (DVB-RCS) standard, based on Garello’s method. Distance spectra for all standardized DVB-RCS packet sizes and all standardized code rates are presented. A new interleaver design for DVB-RCS based on the dithered relative prime (DRP) interleaving approach is also presented. A minimum distance (dmin) of 36 has been achieved for an unpunctured ATM packet of 424 information bits with a DRP interleaver, whereas the dmin of the standardized DVB-RCS interleaver is 31. A dmin of 38 has been achieved for an unpunctured MPEG packet of 1504 information bits with a DRP interleaver, whereas the dmin of the standardized DVB-RCS interleaver is 33. Simulation results for code rate 1/3 show an improvement at high signal-to-noise ratios of at least 0.15 dB and 0.25 dB for ATM and MPEG packets, respectively.
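A DRP interleaver is built by composing a small local "read" dither, a relative prime permutation, and a small local "write" dither. The sketch below constructs such a permutation of K positions. The composition order, the dither vectors, the start index `s`, and the step `p` used here are hypothetical illustration choices, not the interleaver designed in the paper, and the couple (double-binary) structure of DVB-RCS is not modelled.

```python
from math import gcd
from typing import List


def local_dither(K: int, d: List[int]) -> List[int]:
    """Permute each consecutive block of len(d) indices by the dither vector d."""
    n = len(d)
    return [n * (i // n) + d[i % n] for i in range(K)]


def drp_interleaver(K: int, r: List[int], w: List[int], s: int, p: int) -> List[int]:
    """Compose read dither r, relative prime step p (start s), and write dither w."""
    if K % len(r) or K % len(w):
        raise ValueError("K must be a multiple of both dither lengths")
    if sorted(r) != list(range(len(r))) or sorted(w) != list(range(len(w))):
        raise ValueError("dither vectors must be permutations of 0..n-1")
    if gcd(p, K) != 1:
        raise ValueError("step p must be relatively prime to K")
    read = local_dither(K, r)
    rp = [(s + p * i) % K for i in range(K)]  # relative prime permutation
    write = local_dither(K, w)
    # Output position i reads input position read[rp[write[i]]].
    return [read[rp[write[i]]] for i in range(K)]
```

Because each stage is itself a permutation (the relative prime stage is one precisely when gcd(p, K) = 1), the composition is always a valid interleaver. For example, `drp_interleaver(24, [1, 0, 3, 2], [2, 0, 3, 1], 3, 5)` maps output position 0 to input position 12. Measuring the resulting minimum distance, as done in the paper with Garello's method, is outside this sketch.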

A. M. Wyglinski, F. Labeau, and P. Kabal

"Effects of Imperfect Subcarrier SNR Information on Adaptive Bit Loading Algorithms for Multicarrier Systems", Proc. IEEE Global Telecommun. Conf. (Dallas, TX), pp. 3835-3839, Nov. 2004.

In this paper, we evaluate and compare the robustness of several adaptive bit loading algorithms for multicarrier transmission systems when imperfect subcarrier signal-to-noise ratio (SNR) information is used. In particular, we investigate the impact of the uncertainty of data-aided channel estimation techniques on system performance. We also examine an implementation issue associated with adaptive bit loading algorithms that use metrics related to the SNR. Although such metrics can be derived via closed-form expressions, look-up tables are used instead to reduce system complexity, resulting in the SNR values being quantized. Thus, we examine the effects of SNR quantization on system performance. Finally, we present a technique for choosing SNR values in a fixed-length look-up table in order to minimize quantization error.
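To make the look-up-table step concrete, the sketch below quantizes an SNR value to the nearest entry of a sorted, fixed-length table and chooses the table entries to reduce mean squared quantization error over a set of training SNR values. The table design uses Lloyd's algorithm (one-dimensional k-means), which is swapped in here for illustration; the technique in the paper may differ, and the training values are hypothetical.

```python
import bisect
from typing import List, Sequence


def quantize(snr: float, table: Sequence[float]) -> float:
    """Map an SNR value to the nearest entry of a sorted look-up table."""
    j = bisect.bisect_left(table, snr)
    if j == 0:
        return table[0]
    if j == len(table):
        return table[-1]
    lo, hi = table[j - 1], table[j]
    return lo if snr - lo <= hi - snr else hi


def lloyd_table(samples: Sequence[float], size: int, iters: int = 50) -> List[float]:
    """Choose `size` table entries that locally minimize mean squared error."""
    xs = sorted(samples)
    # Initialize with evenly spaced order statistics of the training data.
    table = [xs[(2 * k + 1) * len(xs) // (2 * size)] for k in range(size)]
    for _ in range(iters):
        cells: List[List[float]] = [[] for _ in range(size)]
        for x in xs:
            # Nearest-neighbour assignment of each training value to a cell.
            nearest = min(range(size), key=lambda k: abs(x - table[k]))
            cells[nearest].append(x)
        # Centroid update; an empty cell keeps its previous entry.
        new = sorted(sum(c) / len(c) if c else table[k] for k, c in enumerate(cells))
        if new == table:
            break
        table = new
    return table
```

For training values `[0, 1, 2, 10, 11, 12]` and a two-entry table, the sketch converges to entries 1 and 11, and an SNR value of 3 is then quantized to 1.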

Y. Qian and P. Kabal

"Highband Spectrum Envelope Estimation of Telephone Speech Using Hard/Soft-Classification", Proc. Interspeech 2004 (Jeju Island, Korea), pp. 2717-2720, Oct. 2004.

The bandwidth for telephony is generally defined to be from 300 to 3400 Hz. This bandwidth restriction has a noticeable effect on speech quality. We present an algorithm which recovers the missing highband parts from telephone speech. We describe an MMSE estimator using hard/soft-classification to create the missing highband spectrum envelope. The classification is motivated by acoustic phonetics: voiced vowels and consonants, and unvoiced phonemes demonstrate different characteristic spectra. The classification also captures gender differences. A hard classification on phoneme characteristic parameters, such as a voicing degree and a pitch lag, reduces the MMSE of the highband spectrum envelope estimates. An estimator using HMM-based soft-classification can further reduce the distortion of the estimated highband spectrum by taking the time evolution of the spectra into consideration. Objective measures (mean log-spectrum distortion) and spectrograms confirm the improvement noted in informal subjective tests.

T. H. Falk, W.-Y. Chan, and P. Kabal

"Speech Quality Estimation Using Gaussian Mixture Models", Proc. Interspeech 2004 (Jeju Island, Korea), pp. 2013-2016, Oct. 2004.

We propose a novel method to estimate the quality of coded speech signals. The joint probability distribution of the subjective mean opinion score (MOS) and perceptual distortion feature variables is modelled using a Gaussian mixture density. The feature variables are sifted from a large pool of candidate features using statistical data mining techniques. We study what combinations of features and mixture model configuration are most effective. For our speech database, a five-feature, three-component GMM furnishes approximately 18% lower root-mean-squared MOS estimation error than ITU-T P.862 PESQ, the current best standard algorithm.
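Given a joint Gaussian mixture over features and MOS, one natural way to produce a MOS estimate is the conditional mean E[MOS | features], which is the posterior-weighted sum of per-component linear regressions. Whether the paper uses exactly this estimator is an assumption here. The sketch uses a single scalar feature (the paper uses five), and the mixture parameters in the example are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    weight: float  # mixture weight
    mu_x: float    # mean of the feature
    mu_y: float    # mean of the MOS
    var_x: float   # variance of the feature
    cov_xy: float  # covariance between feature and MOS


def estimate_mos(x: float, gmm: List[Component]) -> float:
    """Conditional mean E[MOS | feature = x] under a joint 2-D Gaussian mixture."""
    # Log of weight * N(x; mu_x, var_x) for each component.
    logs = [math.log(c.weight) - 0.5 * math.log(2 * math.pi * c.var_x)
            - 0.5 * (x - c.mu_x) ** 2 / c.var_x for c in gmm]
    top = max(logs)
    resp = [math.exp(v - top) for v in logs]  # shifted to avoid underflow
    total = sum(resp)
    estimate = 0.0
    for c, g in zip(gmm, resp):
        # Per-component linear regression of MOS on the feature.
        cond_mean = c.mu_y + c.cov_xy / c.var_x * (x - c.mu_x)
        estimate += (g / total) * cond_mean
    return estimate
```

With a single component (weight 1, feature mean 0, MOS mean 3, feature variance 1, covariance 0.5), a feature value of 2 yields an estimate of 4.0. The log-domain responsibilities keep the estimate well defined even when a feature value lies far from every component.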

C. Elliott and P. Kabal

"Synchronization of Speaker Selection for Centralized Tandem Free VoIP Conferencing", Proc. Interspeech 2004 (Jeju Island, Korea), pp. 633-636, Oct. 2004.

Traditional teleconferencing uses a select-and-mix function at a centralized conferencing bridge. In VoIP environments, this mixing operation can lead to speech degradation when using high compression speech codecs due to tandem encodings and coding of multi-talker signals. A tandem-free architecture can eliminate tandem encodings and preserve speech quality. VoIP conference bridges must also consider the variable network delays experienced by different packetized voice streams. A synchronized speaker selection algorithm at the bridge can smooth out network delay variations and synchronize incoming voice streams. This provides a clean mapping of the N input packet streams to the M output streams representing selected speakers. This paper presents a synchronized speaker selection algorithm and evaluates its performance using a conference simulator. The synchronization process is shown to account for only a small part of the overall delay experienced by selected packets.

A. M. Wyglinski, P. Kabal, and F. Labeau

"Variable Length Subcarrier Equalizers for Multicarrier Systems", Proc. IEEE Vehicular Technol. Conf. (Los Angeles, CA), pp. 394-398, Sept. 2004.

We present a novel algorithm for defining the lengths of subcarrier equalizers employed by wireless multicarrier transmission systems operating in frequency-selective fading channels. The equalizer lengths across the subcarriers are varied incrementally in a "greedy" fashion until the average mean squared error (MSE) is below some prescribed threshold. By varying the equalizer lengths, the overall complexity of the equalization is constrained while the system meets a minimum error performance. The results show that when a system employs variable-length equalizers defined by the proposed algorithm, it significantly outperforms a system employing constant-length equalizers of the same overall complexity.
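The greedy length assignment described above can be sketched as follows, assuming the MSE of each subcarrier as a function of its equalizer length is known in advance (the table `mse` is hypothetical) and that each step adds one tap to the subcarrier giving the largest MSE reduction. The selection rule is an assumption for illustration and may differ from the paper's.

```python
from typing import List, Optional, Sequence


def greedy_equalizer_lengths(mse: Sequence[Sequence[float]], threshold: float,
                             min_len: int = 1) -> List[int]:
    """mse[k][L-1] is the (hypothetical) MSE of subcarrier k with an L-tap equalizer."""
    n = len(mse)
    lengths = [min_len] * n

    def avg_mse() -> float:
        return sum(mse[k][lengths[k] - 1] for k in range(n)) / n

    while avg_mse() > threshold:
        best_k: Optional[int] = None
        best_gain = 0.0
        for k in range(n):
            if lengths[k] < len(mse[k]):  # a longer equalizer is available
                gain = mse[k][lengths[k] - 1] - mse[k][lengths[k]]
                if gain > best_gain:
                    best_k, best_gain = k, gain
        if best_k is None:
            break  # threshold unreachable with the available lengths
        lengths[best_k] += 1  # lengthen the most beneficial equalizer by one tap
    return lengths
```

For three subcarriers with MSE tables `[0.5, 0.1, 0.09]`, `[0.2, 0.18, 0.17]`, and `[0.05, 0.05, 0.05]` and a threshold of 0.12, a single extra tap on the first subcarrier suffices, giving lengths `[2, 1, 1]`. Total equalizer complexity grows only where it buys the most error reduction.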

R. Der, P. Kabal, and W.-Y. Chan

"Bit Allocation Algorithms for Frequency and Time Spread Perceptual Coding", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Montreal, QC), pp. IV-201-IV-204, May 2004.

We examine the problem of bit allocation when time-spread and frequency-spread perceptual distortion criteria are used. For such measures, standard incremental techniques can fail. Two algorithms are introduced for bit allocation: the first is a multi-band version of the greedy algorithm, and the second is an inverse greedy algorithm initialized by the bit allocation of a forward algorithm driven by a non-spread metric. Experimental results show that the second algorithm outperforms the first.
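The standard incremental (forward greedy) allocation driven by a non-spread metric can be sketched as below. The sketch assumes the high-rate distortion model D_k(b) = σ_k² 2^(-2b) for each band, which is an illustrative choice; the spread perceptual criteria of the paper, under which this procedure can fail, are not modelled, and neither is the inverse greedy refinement.

```python
import heapq
from typing import List, Sequence


def greedy_bit_allocation(variances: Sequence[float], total_bits: int) -> List[int]:
    """Forward greedy allocation under D_k(b) = var_k * 2**(-2b) (non-spread model)."""
    bits = [0] * len(variances)
    # Max-heap (via negated keys) of the distortion reduction from each band's next bit.
    # D(b) - D(b+1) = var * 2**(-2b) * 3/4, so the first bit of band k saves 0.75 * var_k.
    heap = [(-0.75 * v, k) for k, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        _, k = heapq.heappop(heap)  # band with the largest reduction (lowest index on ties)
        bits[k] += 1
        next_gain = 0.75 * variances[k] * 2.0 ** (-2 * bits[k])
        heapq.heappush(heap, (-next_gain, k))
    return bits
```

For band variances `[16, 4, 1]` and 3 bits, the sketch returns `[2, 1, 0]`. Each step is locally optimal only because the distortion of one band does not depend on the bits given to its neighbours, which is the property that spread metrics remove.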

Y. Qian and P. Kabal

"Combining Equalization and Estimation for Bandwidth Extension of Narrowband Speech", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Montreal, QC), pp. I-713-I-716, May 2004.

Current public telephone networks compromise voice quality by bandlimiting the speech signal. Telephone speech is characterized by a bandpass response from 300 to 3400 Hz. The voice quality is perceived as being much worse than for wideband speech (50-7000 Hz). We present a novel approach which combines equalization and estimation to create a wideband signal, with reconstructed components in the 3400 Hz to 7000 Hz range. Equalization is used in the 3400-4000 Hz range. There, its performance is better than that of statistical estimation procedures, because the mutual dependencies between the narrowband and highband parameters are not sufficiently large. Subjective evaluation using an Improvement Category Rating shows that the reconstructed wideband speech using both equalization and estimation substantially enhances the quality of telephone speech. We have also evaluated the performance on the narrowband output of several standard codecs. Overall, the use of equalization for part of the highband regeneration makes the system more robust to phonetic variability and speaker gender.

A. M. Wyglinski, F. Labeau, and P. Kabal

"An Efficient Bit Allocation Scheme for Multicarrier Modulation", Proc. IEEE Wireless Commun., Networking Conf. (Atlanta, GA), pp. 1194-1199, March 2004.

In this paper, we present an efficient bit allocation algorithm for multicarrier systems operating in frequency-selective environments. The proposed algorithm strives to maximize the overall throughput while guaranteeing that the mean bit error rate (BER) remains below a prescribed threshold. The algorithm is compared with several other algorithms found in the literature in terms of the overall throughput, mean BER, and relative computational complexity. Furthermore, the algorithms are compared with an exhaustive search routine to determine the optimal bit allocation in terms of maximizing throughput given the constraint on error performance. No power allocation is performed by the algorithms. Results show the proposed algorithm has approximately the same throughput and mean BER as the optimal solution while possessing a significantly lower computational complexity relative to the other algorithms with similar performance. When compared to algorithms which employ approximations to waterfilling, the computational complexity is comparable while the overall throughput is closer to the optimum.
