Multimedia Signal Processing Laboratory

P. Kabal


Paper Abstracts 1991


Papers

P. Kabal and E. Dubois

"Interpolating and Nyquist Filters with Constraints at Certain Frequencies", Signal Processing, vol. 24, no. 2, pp. 117-126, Aug. 1991.

This paper presents modification procedures for designing FIR interpolating and FIR Nyquist filters that allow the output of the filter to be error-free for certain frequencies. For a minimum mean-square error interpolator, the modifications result in an expanded set of linear equations which include the constraints. For minimax stop-band Nyquist filters, the modifications involve factoring out a filter which implements the constraints on the transfer function. Approximate techniques for implementing the constraints are also discussed.
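As background, the defining time-domain property of a Nyquist (M-th band) filter is that its impulse response is zero at every nonzero multiple of M samples from the centre tap. The sketch below checks that property on a Hamming-windowed sinc, which is an assumed stand-in design and not the constrained design procedure of the paper; the function names and parameters are illustrative only.

```python
import math

def windowed_sinc_nyquist(M, half_len):
    """Taps h[-half_len..half_len] of a Hamming-windowed sinc (assumed design)."""
    taps = []
    for n in range(-half_len, half_len + 1):
        # Ideal lowpass with cutoff pi/M, normalized so the centre tap is 1.
        if n == 0:
            ideal = 1.0
        else:
            x = math.pi * n / M
            ideal = math.sin(x) / x
        # Hamming window centred on n = 0.
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_len)
        taps.append(ideal * window)
    return taps

def is_nyquist(taps, M, tol=1e-12):
    """True if the centre tap is 1 and taps at nonzero multiples of M vanish."""
    centre = len(taps) // 2
    for i, h in enumerate(taps):
        k = i - centre
        if k != 0 and k % M == 0 and abs(h) > tol:
            return False
    return abs(taps[centre] - 1.0) <= tol
```

Windowing preserves the zero crossings because it only scales each tap, so the windowed filter remains a Nyquist filter for the same M.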

V. Iyengar and P. Kabal

"A Low Delay 16 kb/s Speech Coder", IEEE Trans. Signal Processing, vol. 39, no. 5, pp. 1049-1057, May 1991.

This paper studies a speech coder using a tree code generated by a stochastic innovations tree and backward adaptive synthesis filters. The synthesis configuration uses a cascade of two all-pole filters: a pitch (long time delay) filter followed by a formant (short time delay) filter. Both filters are updated using backward adaptation. The formant predictor is updated using an adaptive lattice algorithm. The multipath (M, L) search algorithm is used to encode the speech. A frequency-weighted error measure is used to reduce the perceptual loudness of the quantization noise. The speech coder has low delay (1 ms) and subjective tests show that the speech quality is equally preferred to 7-bit log-PCM.


Conference papers

A. K. Khandani, P. Kabal, and H. Leib

"Combined Coding and Shaping over a Multitone Channel", Proc. IEEE Globecom Conf. (Phoenix, AZ), pp. 1182-1186, Dec. 1991.

In this work, the idea of combined shaping and coding of a signal constellation over a multitone channel is introduced. In selecting such a constellation, we are faced with the problems of distributing the rate and the energy among the subchannels. Assuming continuous approximation, these factors can be selected independently. However, in the discrete case, one obtains a better performance by using a joint optimization procedure. More importantly, often the structure of the constellation boundary imposes some restrictions on the rate distribution. This provides a stronger coupling between these factors. We introduce two joint optimization methods, partly integer, for distributing the rate and energy. In the first method, the minimum distance to noise ratio (protection) along all the dimensions is the same. The proposed method maximizes this protection. In the second method, this restriction is relaxed. In this case, the average error probability is minimized. Neither of these methods has a higher complexity than the conventional schemes. The second method outperforms the first one. As part of the calculations, a closed-form formula has been found for the weight distribution of the scaled E8 lattice.

B. Champagne, A. Lobo, and P. Kabal

"On the Use of a Split-Beam Array for Tracking a Moving Talker", 122nd Meeting, Acoustical Society of America (Houston, TX), Nov. 1991 (abstract appears in J. Acoust. Soc. Amer., vol. 90, p. 2315, Oct. 1991).

Microphone arrays have been used in an audio-teleconferencing environment to pick up the speech signal from a known direction in the presence of noise and reverberation. In current algorithms, the direction of interest is obtained by measuring the output of a beamformer at a finite set of look directions and comparing it to a threshold. Unless the set of look directions is made sufficiently large, this method is not well suited to track a moving talker. In this paper a split-beam array configuration is used for the tracking problem. The time delay between the two halves of the array is obtained by a generalized cross-correlation method. A one-step Kalman predictor is then used to predict the delay for the next frame. This value is used to steer the beamformer. The system has been tested on computer-simulated data which modeled a talker moving along a linear trajectory in a reverberant room. Results indicate that this method can provide reliable estimates of the talker bearing angle in highly reverberant environments.
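The delay-estimation step can be illustrated with a minimal sketch. It uses plain, unweighted cross-correlation, the simplest member of the generalized cross-correlation family; the weighting used in the paper and the Kalman predictor are not reproduced here. The function name and the `max_lag` parameter are assumptions made for illustration.

```python
def estimate_delay(x, y, max_lag):
    """Return the lag d (in samples) that maximizes sum_n x[n] * y[n + d].

    A positive result means y is a delayed copy of x.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n, xn in enumerate(x):
            m = n + lag
            # Only accumulate where the shifted index falls inside y.
            if 0 <= m < len(y):
                acc += xn * y[m]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag
```

In a split-beam arrangement, `x` and `y` would be the outputs of the two half-arrays for one frame, and the estimated lag would feed the predictor that steers the beamformer for the next frame.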

A. Khandani and P. Kabal

"Optimum Block Based Signalling over 1D and 1-D^2 Partial Response Channels", Proc. Allerton Conf. Commun., Control, Computing (Allerton, IL), pp. 798-808, Oct. 1991.

In this work we discuss optimum block-based signaling over partial response channels. This consists of selecting a basis for the space (the modulating signals) and an appropriate boundary for the constellation. Specifically, the channels 1D and 1-D^2 are studied. It is shown that the optimum basis for each of these channels is either the set of sine functions or closely related to it. This facilitates the modulation operation through the use of fast sine transform algorithms.

The selection of the constellation (including its dimensionality) is based on minimizing the loss with respect to a flat channel. This allocates equal energy to the nonempty dimensions. The optimum basis is compared with the Fourier basis. Analytical expressions are found for the capacity of these channels. We also calculate the power spectrum associated with the proposed schemes.

A. Khandani and P. Kabal

"Unsymmetrical Boundary Shaping in Multidimensional Spaces", Proc. Allerton Conf. Commun., Control, Computing (Allerton, IL), pp. 779-789, Oct. 1991.

The power spectrum of a signal constellation depends on the constellation dimensions (modulating waveforms) and on the power allocated to each dimension. For a given orthonormal set of dimensions, spectral shaping can be achieved by allocating different powers to different dimensions. The price to be paid for a nonflat spectrum is a reduction in the signal space volume (rate). In this work the idea of unsymmetrical shaping is introduced. This is the selection of the boundary of a signal constellation which has different values of power along different dimensions. For a given set of powers, lambda_i, shaping is achieved by selecting a region with second moment lambda_i along the i-th dimension and with volume as large as possible. This is equivalent to maximizing the shape gain, gamma_s. A larger shape gain is achieved at the price of a larger constellation expansion ratio, CER_s. The structure of the regions which optimize the tradeoff between CER_s and gamma_s is discussed. A practical addressing scheme is given to achieve a point near the knee of the corresponding tradeoff curves.

B. Champagne, A. Lobo, and P. Kabal

"Efficient Methods for Simulating a Moving Talker in a Rectangular Room", Final Program and Paper Summaries, Workshop Applications Signal Processing to Audio, Acoustics (New Paltz, NY), pp. 139-140, Oct. 1991.

In this paper, we describe two methods for efficiently simulating the response of a microphone to a moving talker in a rectangular room. Both methods are based on an extension of the image method to moving sources. In the first method, the microphone output signal is obtained by performing a time-domain filtering operation on the original speech signal, while in the second method, a time-frequency representation of this filtering operation is used. In each case, computational load and memory requirements are considerably reduced by taking advantage of the fact that the talker velocity is much smaller than the speed of sound.
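For orientation, the classical image method for a static source can be sketched as follows. This is only the standard starting point, not the moving-source extension described in the paper. The sketch assumes a room with one corner at the origin, ideal reflections, and a speed of sound of 343 m/s; `order` here bounds the per-axis image index and is an illustrative parameter, as are the function names.

```python
import itertools
import math

def image_sources(src, room, order):
    """Image-source positions for a rectangular room with a corner at the origin.

    Per axis, the images of coordinate s in a room of length L lie at
    2*n*L + s and 2*n*L - s for integer n in [-order, order].
    """
    per_axis = []
    for s, L in zip(src, room):
        coords = []
        for n in range(-order, order + 1):
            coords.append(2 * n * L + s)
            coords.append(2 * n * L - s)
        per_axis.append(coords)
    return list(itertools.product(*per_axis))

def path_delays(src, mic, room, order, c=343.0):
    """Sorted propagation delays (seconds) from every image to the microphone."""
    return sorted(math.dist(img, mic) / c
                  for img in image_sources(src, room, order))
```

The image with n = 0 and the positive sign on every axis is the source itself, so the smallest delay corresponds to the direct path.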

J. Grass, P. Kabal, M. Foodeei, and P. Mermelstein

"High Quality Low-Delay Speech Coding at 12 kb/s", Proc. IEEE Workshop Speech Coding (Whistler, BC), pp. 12-14, Sept. 1991.

A. K. Khandani and P. Kabal

"Shaping multidimensional signal constellations", Proc. IEEE Int. Symp. Information Theory (Budapest), p. 4, June 1991.

Consider an optimally shaped N-dimensional (N even) signal constellation on a lattice. Assuming continuous approximation, the boundary of the two-dimensional subconstellations is a circle and the boundary of the whole constellation is a hypersphere. We derive analytical expressions for the optimum tradeoff between the shape gain and the Constellation-Expansion-Ratio and also between the shape gain and the Peak-to-Average-power-Ratio.
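As a point of reference for the hypersphere boundary, the standard continuous-approximation result for the shape gain of an N-dimensional sphere relative to a cube is pi (N + 2) / (12 Gamma(N/2 + 1)^(2/N)). The sketch below evaluates this known formula; it is not one of the tradeoff expressions derived in the paper, and the function names are illustrative.

```python
import math

def sphere_shape_gain(N):
    """Shape gain (linear) of an N-sphere over an N-cube, continuous approximation."""
    # Gamma(N/2 + 1)^(2/N), computed through lgamma to avoid overflow for large N.
    gamma_term = math.exp((2.0 / N) * math.lgamma(N / 2.0 + 1.0))
    return math.pi * (N + 2) / (12.0 * gamma_term)

def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)
```

For N = 2 the formula reduces to pi/3 (about 0.2 dB), and as N grows the gain approaches the limit pi e / 6 (about 1.53 dB).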

We introduce a method for achieving a point on the optimum tradeoff curves. This is based on mapping the constellation of the Voronoi region of the lattice D_n*, n = N/2. The addressing complexity is essentially that of decoding D_n*, which is simple when using D_n* = (2Z)^n U ((2Z)^n + (1)^n), where Z is the integer lattice. For dimensions up to 12 the point obtained is located on the knee of the curve. This coding method is less complex and has superior performance to that based on the Voronoi constellations.
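The two-coset description of the scaled lattice suggests why decoding is simple: round the input to the nearest all-even point and to the nearest all-odd point, then keep the closer of the two. A minimal sketch under that reading follows; the function name is hypothetical and the code illustrates only nearest-point decoding, not the addressing scheme of the paper.

```python
def nearest_in_scaled_dn_star(x):
    """Nearest point to x in (2Z)^n U ((2Z)^n + (1, ..., 1))."""
    def round_to_coset(v, offset):
        # Nearest value of the form 2k + offset, with k an integer.
        return 2 * round((v - offset) / 2.0) + offset

    even = [round_to_coset(v, 0) for v in x]  # candidate in (2Z)^n
    odd = [round_to_coset(v, 1) for v in x]   # candidate in (2Z)^n + (1)^n

    def dist2(p):
        # Squared Euclidean distance from x to the candidate point p.
        return sum((a - b) ** 2 for a, b in zip(x, p))

    return even if dist2(even) <= dist2(odd) else odd
```

The cost is two coordinate-wise roundings and one distance comparison, so it grows linearly with the dimension n.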

In a second method, the n-dimensional subconstellations are first shaped using the previous method. The N/n-fold Cartesian product of these subconstellations is further shaped by a lookup table. Analytical expressions show that even for small lookup tables this method obtains results near the optimum tradeoff curve.

P. Kabal

"Adaptive Linear Prediction in Speech Coding", Preprints IFAC/IFORS Symp. Identification, System Parameter Estimation (Budapest), pp. 203-207, July 1991.

Adaptive linear prediction is commonly used as a key step in digital coding of speech. This paper discusses some of the techniques that have been developed for adapting and coding the predictor coefficients in speech coders. The linear predictors in high quality speech coding often consist of two stages, a short-time span (formant) filter and a long-time span (pitch) filter. The use of such filters in analysis-by-synthesis coders is examined. In addition, backward adaptive strategies can be used to achieve high quality, low delay coding. The filters in these coders can be high-order (50 or more time lags) filters. Computational complexity and numerical stability of the algorithms are of prime concern for these filters. A number of new directions in the application of adaptive prediction in speech coding are also discussed.

Y. Qian, B. Jiang, Q. Zhu, and P. Kabal

"An Enhanced Predictive Multipulse LPC Speech Coder at 2.4 kbits/s", Program and Abstracts Int. IEEE/AP-S Symp., North American Radio Science Meeting (London, ON), p. 353, June 1991.

P. Kabal, F.-M. Wang, D. O'Shaughnessy, and R. P. Ramachandran

"Adaptive Postfiltering for Enhancement of Noisy Speech in the Frequency Domain", Proc. IEEE Int. Symp. Circuits, Systems (Singapore), pp. 312-315, June 1991.

This paper presents a new frequency-domain adaptive postfilter for enhancement of noisy speech. The postfilter suppresses the noise in spectral valleys and allows more noise in the formant regions where it is masked by the speech signal. First, we perform an LPC analysis of the noisy speech and calculate its log magnitude spectrum. After identifying the formants and spectral valleys, the log magnitude spectrum is modified to obtain the postfilter frequency response. This response has local minima in the regions corresponding to the spectral valleys and local maxima of equal magnitude at the formant frequencies. The filtering uses an overlap-add FFT strategy. Experimental results show that this new frequency-domain approach results in enhanced speech of better perceptual quality than that obtained by a time-domain approach. Our method is especially efficient in eliminating high frequency noise and in preserving the weaker, high frequency formants in sonorant sounds.

M. Foodeei and P. Kabal

"Backward Adaptive Prediction: High-Order Predictors and Formant-Pitch Configurations", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Toronto, ON), pp. 2405-2408, May 1991.

Backward adaptive linear prediction is used in low-delay speech coders. A good redundancy removal scheme must consider both near-sample (formant) and far-sample (pitch) correlations. Two approaches are considered: (1) separate pitch and formant predictors and (2) a single high-order predictor. This paper presents analysis and simulation results comparing the performance of several types of high-order backward adaptive predictors with orders up to 100. Issues in high-order LPC analysis, such as analysis methods, windowing, ill-conditioning, quantization noise effects, and computational complexity, are studied. The performance of the various analysis methods is compared with the conventional sequential formant-pitch predictor. The Auto-correlation method (50th order) shows performance advantages over the sequential formant-pitch configurations. Several new backward high-order methods using covariance analysis and a lattice formulation show much better prediction gains than the Auto-correlation method.
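The autocorrelation method mentioned above is conventionally solved with the Levinson-Durbin recursion. The sketch below is the textbook recursion, shown only to make the method concrete; it is not the windowing or conditioning procedure studied in the paper, and the function name and argument layout are illustrative.

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations by the Levinson-Durbin recursion.

    r: autocorrelation values r[0..order], with r[0] > 0.
    Returns (a, err), where the predictor is
        x_hat[n] = sum over k = 1..order of a[k-1] * x[n-k]
    and err is the final prediction-error energy.
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        k_m = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[k] - k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        err *= (1.0 - k_m * k_m)
    return a, err
```

The recursion needs on the order of `order` squared operations, which is one reason computational complexity becomes an issue at orders of 50 to 100.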

D. Bees, M. Blostein, and P. Kabal

"Reverberant Speech Enhancement Using Cepstral Processing", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Toronto, ON), pp. 977-980, May 1991.

Complex cepstral deconvolution is applied to acoustic dereverberation. It is found that traditional cepstral techniques fail in acoustic dereverberation because segmentation errors in the time domain prevent accurate cepstral computation. An algorithm for speech dereverberation is presented which incorporates a new approach to the segmentation and windowing procedure for speech. Averaging in the cepstrum is exploited to increase the separation between the speech and impulse response. An estimate of the room impulse response is built up, and a least squared error inverse filter is used to remove the estimated impulse response from the reverberant speech. Reduction of reverberation with this technique is demonstrated.

J. Grass and P. Kabal

"Methods of Improving Vector-Scalar Quantization of LPC Coefficients", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Toronto, ON), pp. 657-660, May 1991.

Methods of improving vector-scalar quantization of Linear Predictive Coding (LPC) coefficients with 20 to 30 bits per 20 ms frame are studied in this paper. The first innovation is to couple the vector and scalar quantization stages. The second innovation is the addition of a small adaptive codebook to the larger fixed codebook. Frame-to-frame correlation of the LPC coefficients is exploited at no extra cost in bits. The results of this paper show that the performance of vector-scalar quantization with the use of the two new techniques is better than that of scalar coding techniques currently used in LPC coders.

M. Foodeei and P. Kabal

"Low-Delay CELP and Tree Coders: Comparison and Performance Improvements", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Toronto, ON), pp. 25-28, May 1991.

There has been recent interest in network-quality speech coders with low delay at 16 kb/s for CCITT standardization. There is reason to believe that coders with rates below 16 kb/s will be able to meet the same quality standards. The challenge is to develop high-performance, mutually compatible components for the target coder. A stochastic tree coder based on the (M,L) search algorithm by Iyengar and Kabal and a low-delay CELP coder proposed by Chan are considered. First, the individual components (predictors, gain adaptation, excitation coding) of the two coders are analyzed. Second, the performance of the two types of coders is compared. The two coders have comparable performance at 16 kb/s under clean channel conditions. Finally, methods to improve the performance of the coders, particularly with a view to bringing the bit rate below 16 kb/s, are studied. Suggestions to improve the performance include an improved high-order predictor (applicable to both coders), and training of the excitation dictionary as well as a better gain adaptation strategy for the tree coder.

G. Roy and P. Kabal

"Wideband CELP Speech Coding at 16 kb/s", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Toronto, ON), pp. 17-20, May 1991.

This paper investigates the use of CELP (Code Excited Linear Prediction) in coding wideband speech signals at an operating rate of 16 kbits/s. The wideband signals under consideration are bandlimited to 7500 Hz and sampled at 16 kHz. In order to achieve a low operating rate, the coding places more emphasis on the lower frequencies (0-4 kHz), while the higher frequencies are coded less precisely, but with little perceived degradation. To this end, the basic CELP model is modified to operate in a split-band mode.

