Multimedia Signal Processing Laboratory

P. Kabal


Paper Abstracts 1994


Papers

S. Valaee and P. Kabal

"Alternate Proofs for 'On Unique Localization of Constrained-Signal Sources'", IEEE Trans. Signal Processing, vol. 42, no. 12, pp. 3547-3549, Dec. 1994.

In this correspondence, we provide simple proofs for the theorems in "On Unique Localization of Constrained-Signal Sources" by M. Wax. The approach is based on the topological dimension of a set. The set of all possible observation matrices is the legitimate set, and the subset of observation matrices that can have nonunique solutions is the ambiguity set. The elements of both sets are random matrices. We find the conditions under which the dimension of the ambiguity set is smaller than that of the legitimate set. In that case the ambiguity set has probability zero, so with probability one a unique solution to the localization problem can be found.

A. K. Khandani and P. Kabal

"Shaping of Multidimensional Signal Constellations Using a Lookup Table", IEEE Trans. Information Theory, vol. 40, no. 6, pp. 2058-2062, Nov. 1994.

This paper describes a lookup table for addressing an optimally shaped constellation. The method is based on partitioning the subconstellations into shaping macro-shells of integer bit rate and increasing average energy. The macro-shells need not contain an equal number of points. A lookup table is used to select a subset of the partitions in the Cartesian product space. By devising appropriate partitioning/merging rules, we obtain suboptimum schemes with very low addressing complexity and small performance degradation. The performance is computed using the weight distribution of an optimally shaped constellation.

A. Khandani and P. Kabal

"Block-Based Eigensystem of the 1±D and 1-D² Partial Response Channels", IEEE Trans. Information Theory, vol. 40, no. 5, pp. 1645-1647, Sept. 1994.

We find analytical expressions for the block-based input and output eigenvectors and eigenvalues of the systems with responses 1±D and 1-D². The input eigenvectors form an orthonormal basis that is the optimum modulator for a channel with that transfer function. The output eigenvectors form an orthonormal basis with the same spectral nulls as the corresponding system; this basis can be used to produce line codes with spectral nulls. Because the eigenvectors are sinusoids, fast transform algorithms can perform the modulation for a block of data, which reduces the computational complexity.
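The sinusoidal form of the eigenvectors can be checked numerically. The sketch below assumes a block model that the abstract does not spell out: a block of N input symbols passes through the full linear convolution with taps (1, 1), (1, -1) or (1, 0, -1), so H is an (N+L-1) x N matrix and the input eigenvectors are the eigenvectors of HᵀH. Under that assumption HᵀH for 1±D is tridiagonal with 2 on the diagonal and ±1 beside it, whose eigenvectors are the sampled sines sin(πk(j+1)/(N+1)) with eigenvalues 2 ± 2cos(πk/(N+1)); for 1-D² the even- and odd-indexed samples decouple into two such systems. All function names are illustrative, and the eigenvectors are left unnormalized.

```python
import math

def conv_matrix(taps, n):
    """(n + len(taps) - 1) x n full-convolution matrix H for a block of n inputs
    (assumed block model: the convolution tail is kept)."""
    rows = n + len(taps) - 1
    return [[float(taps[i - j]) if 0 <= i - j < len(taps) else 0.0
             for j in range(n)] for i in range(rows)]

def gram(h):
    """H^T H; its eigenvectors are the input eigenvectors of the block channel."""
    n = len(h[0])
    return [[sum(row[j] * row[l] for row in h) for l in range(n)] for j in range(n)]

def tridiag_eigenpair(c, n, k):
    """Closed-form eigenpair k (1..n) of the n x n matrix tridiag(c, 2, c)."""
    theta = math.pi * k / (n + 1)
    return 2.0 + 2.0 * c * math.cos(theta), [math.sin(theta * (j + 1)) for j in range(n)]

def one_minus_d2_eigenpair(n, parity, k):
    """1 - D^2: the Gram matrix couples only samples two apart, so the even
    (parity=0) and odd (parity=1) samples each form a tridiag(-1, 2, -1) system."""
    idx = list(range(parity, n, 2))
    lam, sub = tridiag_eigenpair(-1.0, len(idx), k)
    vec = [0.0] * n
    for pos, val in zip(idx, sub):
        vec[pos] = val
    return lam, vec

def eig_residual(mat, lam, vec):
    """max_j |(A v)_j - lam * v_j|; near zero when (lam, vec) is an eigenpair of A."""
    n = len(vec)
    return max(abs(sum(mat[j][l] * vec[l] for l in range(n)) - lam * vec[j])
               for j in range(n))

if __name__ == "__main__":
    n = 8
    for name, taps, c in (("1+D", (1, 1), 1.0), ("1-D", (1, -1), -1.0)):
        g = gram(conv_matrix(taps, n))
        worst = max(eig_residual(g, *tridiag_eigenpair(c, n, k)) for k in range(1, n + 1))
        print(name, "max eigen-residual:", worst)
```

Every residual comes out at rounding-error level, which is what makes fast sine-transform modulation possible under this block model.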

Y. Qian, G. Chahine, and P. Kabal

"Pseudo-Multi-Tap Pitch Filters in a Low Bit-Rate CELP Speech Coder", Speech Communication, vol. 14, no. 4, pp. 339-358, Sept. 1994.

The pitch filter in a low bit-rate CELP speech coder has a strong impact on the quality of the reconstructed speech. In this paper we propose a pseudo-multi-tap pitch filter that has fewer degrees of freedom than prediction coefficients, yet gives a higher pitch prediction gain and a more appropriate frequency response than a conventional one-tap pitch filter. First, we present an analysis model for the pseudo-multi-tap pitch prediction filter. Then, we introduce a pseudo-multi-tap pitch prediction filter with a fractional pitch lag. The prediction gain of the pseudo-multi-tap pitch filter is compared with that of conventional one-tap and three-tap pitch filters with integer and non-integer pitch lags. A switching configuration, in which the filter switches modes depending on the prediction gain, is also studied. The stability of a pseudo-multi-tap pitch synthesis filter in a CELP coder is considered, and we propose a stabilization method with a relaxed stability test, which gives better results than a strict stability test. Finally, we have incorporated the pseudo-multi-tap pitch filter into a 4.8 kbit/s CELP speech coder. Both the objective SNR and the subjective quality are better than with a conventional one-tap pitch filter.
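A rough sketch of the constrained-tap idea follows. The abstract does not give the pseudo-multi-tap parametrization, so the code assumes a hypothetical symmetric form with taps β·(α, 1, α): α is picked from a small grid and β is the least-squares gain, giving three taps but one fitted coefficient per α. Setting α = 0 reduces it to the conventional one-tap predictor, so the best grid point can never do worse than one tap. Only integer lags are handled; the fractional-lag variant is not shown. Prediction gain is the ratio of signal energy to residual energy over the analysis window.

```python
import math

def pitch_prediction_gain(x, lag, alpha):
    """Prediction gain of the hypothetical pseudo-three-tap predictor
    x_hat[n] = beta * (alpha*x[n-lag+1] + x[n-lag] + alpha*x[n-lag-1]),
    with alpha fixed and beta chosen by least squares. Returns (gain, beta)."""
    if lag < 2:
        raise ValueError("lag must be at least 2 so every tap refers to past samples")
    target = x[lag + 1:]
    ref = [alpha * x[n - lag + 1] + x[n - lag] + alpha * x[n - lag - 1]
           for n in range(lag + 1, len(x))]
    corr = sum(t * r for t, r in zip(target, ref))
    ref_energy = sum(r * r for r in ref)
    beta = corr / ref_energy if ref_energy > 0 else 0.0
    signal = sum(t * t for t in target)
    residual = signal - beta * corr          # residual energy after optimal scaling
    return (signal / residual if residual > 0 else math.inf), beta

def best_pseudo_three_tap(x, lag, alphas=None):
    """Grid search over alpha; the default grid includes 0.0 (the one-tap case).
    Returns (gain, alpha) for the best grid point."""
    if alphas is None:
        alphas = [i / 20 for i in range(-10, 11)]   # -0.5 ... 0.5 in steps of 0.05
    return max((pitch_prediction_gain(x, lag, a)[0], a) for a in alphas)
```

Because the taps are symmetric, this filter is zero-phase about the lag: it shapes the predictor's frequency response but cannot supply a fractional delay, which would have to come from the lag itself, as in the fractional-lag variant the abstract mentions.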

A. De and P. Kabal

"Auditory Distortion Measure for Speech Coder Evaluation - Discrimination Information Approach", Speech Communication, vol. 14, no. 3, pp. 205-229, June 1994.

In this article, we devise a fidelity criterion for quantifying the degree of distortion introduced by a speech coder. An original speech signal and its coded version are transformed from the time domain to a perceptual domain using an auditory (cochlear) model. This perceptual-domain representation gives the firing probabilities in the neural channels. The proposed cochlear discrimination information (CDI) measure compares these firing probabilities in an information-theoretic sense: it evaluates the cross-entropy of the neural firings for the coded speech with respect to those for the original speech. The performance of this objective measure is compared with subjective evaluation results. Finally, we provide a rate-distortion analysis by computing the rate-distortion function for speech coding using the Blahut algorithm. Four state-of-the-art speech coders, with rates ranging from 4.8 kbit/s (CELP) to 32 kbit/s (ADPCM), are studied in terms of their performance (as assessed by the CDI measure) relative to the rate-distortion limits.
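The Blahut algorithm named in this abstract computes points on the rate-distortion curve of a discrete source by alternating minimization. A generic textbook version is sketched below, assuming a finite source alphabet with probabilities p(x), a distortion table d(x, y) and a slope parameter β > 0. It does not reproduce the paper's auditory source model or the CDI distortion. Each β yields one (R, D) pair; sweeping β traces the curve.

```python
import math

def blahut_arimoto(p_x, distortion, beta, iters=200):
    """One rate-distortion point for slope parameter beta > 0 (generic Blahut iteration).
    p_x[x]: source probabilities; distortion[x][y]: d(x, y).
    Returns (rate in bits, expected distortion)."""
    n_x, n_y = len(p_x), len(distortion[0])
    q_y = [1.0 / n_y] * n_y                       # reproduction marginal, start uniform
    for _ in range(iters):
        # Conditional Q(y|x) proportional to q(y) * exp(-beta * d(x, y))
        cond = []
        for x in range(n_x):
            w = [q_y[y] * math.exp(-beta * distortion[x][y]) for y in range(n_y)]
            s = sum(w)
            cond.append([wy / s for wy in w])
        # Marginal update: q(y) = sum_x p(x) Q(y|x)
        q_y = [sum(p_x[x] * cond[x][y] for x in range(n_x)) for y in range(n_y)]
    # Mutual information and distortion of the final joint p(x) Q(y|x)
    rate = sum(p_x[x] * cond[x][y] * math.log2(cond[x][y] / q_y[y])
               for x in range(n_x) for y in range(n_y) if cond[x][y] > 0)
    dist = sum(p_x[x] * cond[x][y] * distortion[x][y]
               for x in range(n_x) for y in range(n_y))
    return rate, dist
```

For an equiprobable binary source with Hamming distortion the iteration reproduces the known closed form R(D) = 1 - h(D), where h is the binary entropy function.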


Chapter in book

A. K. Khandani, P. Kabal, and E. Dubois

"Efficient Algorithms for Fixed-Rate Entropy-Coded Vector Quantization", in Information Theory and Applications (T. A. Gulliver and N. P. Secord, eds.), Lecture Notes in Computer Science series, vol. 793, pp. 385-394, Springer-Verlag, 1994.

In quantization of any source with a nonuniform probability density function, entropy coding of the quantizer output can substantially decrease the bit rate. A straightforward entropy coding scheme, however, produces a variable data rate. A solution in a space of dimensionality N is to select a subset of elements in the N-fold Cartesian product of a scalar quantizer and represent them with codewords of the same length. A reasonable rule is to select the N-fold symbols of highest probability. For a memoryless source, this is equivalent to selecting the N-fold symbols with the lowest additive self-information. The search/addressing for this scheme can no longer be carried out independently along the one-dimensional subspaces. For a memoryless source, however, the selected subset has a high degree of structure, which can be used to substantially decrease the complexity. In this work, a dynamic programming approach is used to exploit this structure. We build the recursive structure required for dynamic programming in a hierarchy of stages, which has several benefits over conventional trellis-based approaches. Using this structure, we develop efficient rules (based on aggregating the states) that substantially reduce the search/addressing complexity while keeping the performance degradation negligible.
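To make the addressing problem concrete, the sketch below assumes the scalar self-informations have been scaled and rounded to small integer costs (a hypothetical simplification) and selects every N-tuple whose total cost is at most a budget. A memoized count of how many length-n suffixes fit a remaining budget lets the encoder map a selected tuple to its lexicographic index, and the decoder invert that map, without listing the set. This is a plain per-dimension recursion, not the hierarchical stage structure or state aggregation developed in the paper; the function names are illustrative.

```python
from functools import lru_cache

def make_counter(costs):
    """Return count(n, budget): the number of length-n symbol strings whose summed
    integer cost is at most budget. `costs` must be a tuple (hashable)."""
    @lru_cache(maxsize=None)
    def count(n, budget):
        if budget < 0:
            return 0
        if n == 0:
            return 1
        return sum(count(n - 1, budget - c) for c in costs)
    return count

def rank(vector, costs, budget):
    """Lexicographic index of `vector` within the selected set (encoder side)."""
    count = make_counter(tuple(costs))
    index, remaining = 0, budget
    for pos, sym in enumerate(vector):
        n_left = len(vector) - pos - 1
        # Every selected tuple sharing this prefix but with a smaller symbol here comes first.
        for smaller in range(sym):
            index += count(n_left, remaining - costs[smaller])
        remaining -= costs[sym]
    if remaining < 0:
        raise ValueError("vector is not in the selected set")
    return index

def unrank(index, n_dims, costs, budget):
    """Inverse of rank (decoder side): rebuild the tuple symbol by symbol."""
    count = make_counter(tuple(costs))
    vector, remaining = [], budget
    for pos in range(n_dims):
        n_left = n_dims - pos - 1
        for sym, c in enumerate(costs):
            block = count(n_left, remaining - c)
            if index < block:
                vector.append(sym)
                remaining -= c
                break
            index -= block
        else:
            raise ValueError("index out of range")
    return tuple(vector)
```

The work per address grows with N times the alphabet size rather than with the size of the selected set, which is the kind of structure the abstract says can be exploited for a memoryless source.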


Conference papers

Y. Qian, G. Chahine, and P. Kabal

"An Enhanced Adaptive Codebook for a CELP Coder", Proc. European Signal Processing Conf. (Edinburgh, Scotland), pp. 908-911, Sept. 1994.

In CELP coders, an adaptive codebook corresponding to a one-tap pitch filter is used to determine the pitch filter by an analysis-by-synthesis procedure. In this paper, we first present the formulation of an enhanced adaptive codebook for a pseudo-three-tap pitch synthesis filter, which performs better than a conventional one-tap pitch filter. We then focus on the stability analysis of pseudo-three-tap pitch filters and propose a sufficient test condition based on a relaxed stability criterion, which gives better performance than a strict stability check. We have employed the enhanced adaptive codebook, based on a pseudo-three-tap pitch filter with fractional pitch lags, in a 4.8 kb/s CELP speech coder. Both objective and subjective quality are improved with the enhanced adaptive codebook.

S. Valaee, P. Kabal, and B. Champagne

"A Parametric Approach to Extended Source Localization", Proc. European Signal Processing Conf. (Edinburgh, Scotland), pp. 764-767, Sept. 1994.

Point-source modeling is frequently used in array processing. Although this assumption is good for many applications, there are situations where point-source modeling is unrealistic. For instance, in a multi-beam echo sounder, the signal reflected from the sea floor appears as a spatially extended source. In this paper we investigate distributed sources. The approach is based on the assumption that the correlation kernel of the distributed source belongs to a family of parametric functions. We generalize the MUSIC algorithm to a distributed signal parameter estimator (DSPE). The DSPE localizer minimizes a scalar product between an estimated basis for the noise subspace and the array manifold. We study two cases, corresponding to completely correlated and totally uncorrelated signal distributions. We also discuss the limitations of applying ordinary beamformer techniques to spatially distributed signals by computing the array gain. It is shown that the array gain is upper bounded by a value that depends on the extension width of the source. Thus, increasing the number of sensors in a beamformer does not necessarily increase the resolution.
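The array-gain bound can be illustrated numerically. The sketch assumes a uniform linear array with half-wavelength spacing, an uncorrelated distributed source at broadside whose angular spread is Gaussian with standard deviation `width` (radians), the small-angle approximation, and white sensor noise. This Gaussian coherence model is an assumption for illustration, not the parametric kernel family used in the paper. A delay-and-sum beamformer steered at the source centre then has array gain (1/M)·Σ ρ(m-n) over all sensor pairs, which equals M for a point source (width = 0) but stays below Σ_l ρ(l) for any nonzero width.

```python
import math

def coherence(lag, width):
    """Spatial correlation rho(lag) between sensors `lag` apart (hypothetical model):
    E[exp(j*pi*lag*delta)] with delta ~ N(0, width^2), broadside source, spacing lambda/2."""
    return math.exp(-0.5 * (math.pi * lag * width) ** 2)

def array_gain(num_sensors, width):
    """Output/input SNR of a delay-and-sum beamformer steered at the source centre.
    The steering phases cancel, leaving (1/M) * sum over sensor pairs of rho(m - n)."""
    m = num_sensors
    total = sum((m - abs(lag)) * coherence(lag, width) for lag in range(-(m - 1), m))
    return total / m

def gain_bound(width, max_lag=10_000):
    """sum_l rho(l): an upper bound on array_gain for any number of sensors
    (series truncated at max_lag; the Gaussian tail is negligible for width > 0)."""
    return sum(coherence(lag, width) for lag in range(-max_lag, max_lag + 1))

if __name__ == "__main__":
    for m in (4, 16, 64, 256):
        print(m, round(array_gain(m, 0.0), 2), round(array_gain(m, 0.05), 2))
    print("bound for width 0.05:", round(gain_bound(0.05), 2))
```

With width = 0 the gain grows linearly with the number of sensors; with a nonzero width it levels off below the bound, which mirrors the abstract's point that adding sensors does not necessarily help for an extended source.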

S. Valaee and P. Kabal

"Wideband Array Processing Using Total Least-Squares Transformations", Proc. SP Workshop Statistical Signal and Array Processing (Quebec, QC), pp. 133-136, June 1994.

In this paper, we introduce a new technique for wideband array processing based on the total least-squares approach. Total least squares is an alternative to least squares that accounts for the fact that errors can exist both in the focusing location matrix and in the estimated location matrix at each frequency bin. To prevent focusing loss, we use a unitary approach to focusing. The new method does not require singular value decomposition, so its computational complexity is significantly lower than that of similar methods that do. Simulation results show that the new algorithm has a lower resolution signal-to-noise ratio (the SNR needed to resolve the sources) than the coherent signal-subspace method. The bias in the estimated directions of arrival is also smaller for the new method than for the coherent signal-subspace method.

Y. Qian and P. Kabal

"Speech Coding at 4.8 kb/s With an Improved Pitch Filter", Proc. Int. Conf. Commun. Technol. (Shanghai, China), pp. 24.01.1-24.01.4, June 1994.

The reconstructed speech quality in a low bit-rate CELP coder depends strongly on the performance of the pitch filter. In this paper, we present an improved pitch filter, a fractional pseudo-three-tap pitch synthesis filter, which performs better than a conventional one-tap pitch filter. We discuss the frequency response of the improved pitch filter and explore stability issues for three-tap pitch filters in a CELP coder. We have incorporated a fractional pseudo-three-tap pitch filter into a 4.8 kb/s CELP speech coder. Both objective and subjective quality are improved with the fractional pseudo-three-tap pitch filter.

A. K. Khandani and P. Kabal

"Efficient Addressing of Multi-Dimensional Signal Constellations Using a Lookup Table", Proc. Biennial Symp. Commun. (Kingston, ON), pp. 174-177, May 1994.

This paper describes a lookup table for addressing an optimally shaped constellation. The method is based on partitioning the subconstellations into shaping macro-shells of integer bit rate and increasing average energy. A lookup table is used to select a subset of the partitions in the Cartesian product space. By devising appropriate partitioning/merging rules, we obtain suboptimum schemes with very low addressing complexity and small performance degradation. The performance is computed using the weight distribution of an optimally shaped constellation.

J. Stachurski and P. Kabal

"A Pitch Pulse Evolution Model for a Dual Excitation Linear Predictive Speech Coder", Proc. Biennial Symp. Commun. (Kingston, ON), pp. 107-110, May 1994.

This paper introduces a new technique to model the excitation waveform for a linear predictive speech coder. The target application is high quality speech coding for rates near 4 kb/s. Our pitch pulse evolution model decomposes the excitation into two separate but simultaneous signals: the evolving pitch pulse component and the unvoiced, noise-like contribution. A number of formulations for decomposing the excitation waveform are suggested.
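A toy version of the two-component split is sketched below, under assumptions the abstract does not state: the excitation is cut into cycles of one fixed integer pitch period, the evolving pitch-pulse component at each cycle is a moving average of the time-aligned neighbouring cycles, and whatever is left over is treated as the noise-like contribution. This is closer to a waveform-interpolation style smoothing than to any of the paper's formulations; the function name and the `span` parameter are hypothetical.

```python
def decompose(excitation, period, span=2):
    """Split `excitation` into a slowly evolving pitch-pulse component and a
    noise-like remainder (hypothetical smoothing rule, fixed integer period).
    Each sample position within a cycle is averaged over the time-aligned
    samples of up to `span` neighbouring cycles on each side."""
    if period <= 0 or len(excitation) % period:
        raise ValueError("length must be a whole number of pitch cycles")
    cycles = [excitation[i:i + period] for i in range(0, len(excitation), period)]
    evolving = []
    for c in range(len(cycles)):
        neighbours = cycles[max(0, c - span):c + span + 1]
        evolving.extend(sum(cyc[t] for cyc in neighbours) / len(neighbours)
                        for t in range(period))
    # Whatever the smoothed pulse track does not explain is the noise-like part.
    noise = [e - v for e, v in zip(excitation, evolving)]
    return evolving, noise
```

The two outputs always add back to the input, and a strictly periodic excitation ends up entirely in the evolving component, with a remainder that is zero up to rounding.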

A. K. Khandani, P. Kabal, and E. Dubois

"Efficient Algorithms for Fixed-Rate Entropy-Coded Vector Quantization", Proc. IEEE Int. Conf. Commun. (New Orleans, LA), pp. 240-244, May 1994.

In quantization of any source with a nonuniform probability density function, entropy coding of the quantizer output can substantially decrease the bit rate. A straightforward entropy coding scheme, however, produces a variable data rate. A solution in a space of dimensionality N is to select a subset of elements in the N-fold Cartesian product of a scalar quantizer and represent them with codewords of the same length. A reasonable rule is to select the N-fold symbols of highest probability. For a memoryless source, this is equivalent to selecting the N-fold symbols with the lowest additive self-information. The search/addressing for this scheme can no longer be carried out independently along the one-dimensional subspaces. For a memoryless source, however, the selected subset has a high degree of structure, which can be used to substantially decrease the complexity. In this work, a dynamic programming approach is used to exploit this structure. We build the recursive structure required for dynamic programming in a hierarchy of stages, which has several benefits over conventional trellis-based approaches. Using this structure, we develop efficient rules (based on aggregating the states) that substantially reduce the search/addressing complexity while keeping the performance degradation negligible.

