Multimedia Signal Processing Laboratory

P. Kabal


Paper Abstracts 1992


Papers

A. Khandani and P. Kabal

"Two-dimensional Statistics of the Voronoi Constellations Based on the Shaping Lattices DN and DN*", J. China Institute Commun., vol. 13, pp. 38-48, Nov. 1992.

Shaping concerns the selection of the boundary of a signal constellation. The major purpose of this selection is to reduce the average energy of the set. In the Voronoi constellation, the Voronoi region of a lattice, denoted as the shaping lattice, is used as the boundary of the constellation. In this case, the set of the constellation points forms a group under vector addition modulo the shaping lattice. This property is used to achieve the addressing. In this work, some properties of the Voronoi constellations based on the shaping lattices DN and DN* are discussed. The induced probability distribution on the two-dimensional subspaces is found. A prefix coding scheme as an alternative for the addressing is presented. This code in some sense simulates the effect of the boundary by using points on the subspaces with nonequal probability.
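As a rough illustration of how a prefix code induces nonequal point probabilities (a toy sketch, not the code constructed in the paper): when equiprobable data bits are parsed by a complete prefix code, a codeword of length l is selected with probability 2^-l. The function name, the codeword lengths, and the point energies below are hypothetical values chosen only for illustration.

```python
def prefix_code_stats(lengths, energies):
    """Probabilities, average bits per point, and average energy for a complete prefix code."""
    # A complete prefix code satisfies the Kraft equality: sum of 2^-l equals 1.
    kraft = sum(2.0 ** -l for l in lengths)
    if abs(kraft - 1.0) > 1e-12:
        raise ValueError("codeword lengths must satisfy the Kraft equality")
    # With equiprobable input bits, a codeword of length l occurs with probability 2^-l.
    probs = [2.0 ** -l for l in lengths]
    avg_bits = sum(p * l for p, l in zip(probs, lengths))
    avg_energy = sum(p * e for p, e in zip(probs, energies))
    return probs, avg_bits, avg_energy


# Hypothetical example: four points, shorter codewords given to lower-energy points.
lengths = [1, 2, 3, 3]
energies = [1.0, 5.0, 9.0, 13.0]
probs, avg_bits, avg_energy = prefix_code_stats(lengths, energies)

# Equiprobable use of the same four points (2 bits per point) for comparison.
uniform_energy = sum(energies) / len(energies)
print(probs, avg_bits, avg_energy, uniform_energy)
```

In this toy case the average energy drops from 7.0 to 4.5 while the rate drops from 2 to 1.75 bits per point, which is the kind of rate/energy tradeoff that nonequal point probabilities introduce.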

D. Boudreau and P. Kabal

"Asymptotic Receiver Structures for Joint Maximum Likelihood Time Delay Estimation and Channel Identification Using Gaussian Signals", IEEE Trans. Signal Processing, vol. 40, no. 5, pp. 1258-1261, May 1992.

This correspondence addresses the problem of jointly estimating the relative time delay and the impulse response linking two received discrete-time Gaussian signals. Using two different methods, possible structures for the joint maximum-likelihood (ML) estimator are proposed, when the observation interval is long compared to both the delay to estimate and the correlation time of the various random processes involved. These structures generalize the cross-correlation method with prefiltering that implements the ML estimation of pure time delays.
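For orientation, the basic cross-correlation delay estimator that these structures generalize can be sketched as follows. This is a minimal sketch under simplifying assumptions (no prefiltering, no channel identification, a pure integer delay, synthetic data); the function name and all parameter values are illustrative and are not taken from the correspondence.

```python
import random


def estimate_delay(x, y, max_lag):
    """Return the lag in 0..max_lag that maximizes the cross-correlation of x and y."""
    best_lag, best_value = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate x[n] with y[n + lag] over the samples where both are defined.
        value = sum(x[n] * y[n + lag] for n in range(len(x) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Synthetic test signals: y is x delayed by true_delay samples plus weak noise.
rng = random.Random(0)
true_delay = 5
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [
    (x[n - true_delay] if n >= true_delay else 0.0) + 0.1 * rng.gauss(0.0, 1.0)
    for n in range(len(x))
]
estimated = estimate_delay(x, y, max_lag=20)
print(estimated)
```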

R. P. Ramachandran and P. Kabal

"Bandwidth Efficient Transmultiplexers, Part 2: Subband Complements and Performance Aspects", IEEE Trans. Signal Processing, vol. 40, no. 5, pp. 1108-1121, May 1992.

This paper examines the performance issues related to the quadrature amplitude modulation (QAM) and vestigial sideband (VSB) transmultiplexers synthesized in [1]. First, an analysis of the limitations of the configured systems regarding intersymbol interference and crosstalk suppression arising from the use of practical filters is made. Based on these observations, a new design technique for an FIR low-pass prototype that takes the practical degradations into account is formulated. The procedure involves the unconstrained optimization of an error function. A performance evaluation reveals that for four of the five systems, the new method is superior to a minimax approach in that lower intersymbol interference and crosstalk distortions are achieved with a smaller number of filter taps. For the other transmultiplexer, the advantage of the optimized design over the minimax design is in the added flexibility of taking crosstalk into account, thereby diminishing the crosstalk distortion. The five transmultiplexers can be converted into new subband systems. The authors show how the optimized design approach formulated for the transmultiplexers carries over to the new subband systems.

R. P. Ramachandran and P. Kabal

"Bandwidth Efficient Transmultiplexers, Part 1: Synthesis", IEEE Trans. Signal Processing, vol. 40, no. 1, pp. 70-84, Jan. 1992.

This paper develops a set of conditions which allow bandwidth efficient transmultiplexers to be synthesized. The synthesis procedure is based upon a generalized impulse response for the combining (modulating) and separation (demodulating) filters. In particular, the combining and separation filters are bandpass versions of one of two low-pass prototypes and are configured to cancel crosstalk by exploiting relationships between the center frequencies, delays, and phases in their impulse response. Based on the derived conditions, five different transmultiplexers are synthesized. Three of them implement multicarrier quadrature amplitude modulation (QAM). The other two accomplish multicarrier vestigial sideband modulation (VSB). Intersymbol interference is eliminated by appropriately designing the prototypes. The two band case is treated as a special case. For this case, the extra flexibility in choosing the center frequencies leads to the synthesis of additional transmultiplexers.


Conference papers

A. K. Khandani and P. Kabal

"Address Decomposition for the Shaping of Multi-Dimensional Signal Constellations", Proc. IEEE Globecom Conf. (Orlando, FL), pp. 1774-1778, Dec. 1992.

In this work, we introduce an efficient addressing scheme to realize points near to the knee of the tradeoff curves of an optimally shaped constellation. This scheme, called the address decomposition, is based on decomposing the addressing into a hierarchy of addressing steps, each of a low dimensionality. As the memory size associated with a direct addressing scheme has an exponential growth with the dimensionality, this decomposition of addressing results in a substantial decrease in the complexity. In this case, by using a memory of a practical size, one can move along a tradeoff curve which is nearly optimum. For example, in a space of dimensionality N=32, a block of memory of 2.5 kilo-bytes per N dimensions is used to achieve a shape gain of 0.92 dB with a constellation expansion ratio of 1.25 and a peak to average power ratio of 2.95. This scheme has no associated computation, is straightforward to implement, and is adaptable to the structure of a general coset coding scheme.
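To put the quoted 0.92 dB in context, the shape gain of an N-dimensional sphere over a hypercube can be evaluated from the standard continuous-approximation formula pi(N+2) / (12 Gamma(N/2+1)^(2/N)), whose limit for large N is pi*e/6. The sketch below only evaluates these textbook expressions; it is not the address decomposition scheme, and the function names are illustrative.

```python
import math


def sphere_shape_gain_db(n):
    """Shape gain (dB) of an n-sphere relative to an n-cube, continuous approximation."""
    # Use lgamma to evaluate Gamma(n/2 + 1)^(2/n) without overflow for large n.
    log_gamma = math.lgamma(n / 2 + 1)
    gain = math.pi * (n + 2) / (12.0 * math.exp(2.0 * log_gamma / n))
    return 10.0 * math.log10(gain)


def ultimate_shape_gain_db():
    """Limit of the sphere shape gain as the dimensionality grows: pi*e/6 in dB."""
    return 10.0 * math.log10(math.pi * math.e / 6.0)


for n in (2, 8, 16, 32):
    print(n, round(sphere_shape_gain_db(n), 3))
print("limit", round(ultimate_shape_gain_db(), 3))
```

For N=32 the spherical bound evaluates to roughly 1.17 dB, so the 0.92 dB reported above is obtained at a much smaller constellation expansion ratio and peak to average power ratio than a full sphere would require.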

A. De, I. Sasase, and P. Kabal

"Trellis-Coded Phase/Frequency Modulation with Equal Usage of Signal Dimensions", Proc. IEEE Globecom Conf. (Orlando, FL), pp. 1769-1773, Dec. 1992.

A trellis-coded modulation scheme, using a coded 2-FSK (frequency shift keyed)/2^m-PSK (phase shift keyed) modulation format instead of a conventional coded 2^(m+1)-PSK format for m-bits/symbol data transmission, is considered. Previous work [1,2] on coded 2-FSK/2^m-PSK modulation techniques chose two transmission frequencies equally often. In contrast, our method selects the signal points in such a manner that each of the signal dimensions is used equally. We compare the asymptotic coding gains of the proposed method to those of the previous schemes and calculate an upper bound to the bit-error probability. The spectral characteristics of the proposed coded modulation scheme are also studied vis-à-vis those of the coded 2^(m+1)-PSK format and of the other [1,2] coded 2-FSK/2^m-PSK formats. We show that the new scheme obtains higher coding gains, sometimes (but not always) at the expense of an increased bandwidth.

A. De and P. Kabal

"Rate-Distortion Function for Speech Coding Based on Perceptual Distortion Measure", Proc. IEEE Globecom Conf. (Orlando, FL), pp. 452-456, Dec. 1992.

In [1], we have proposed a perceptual distortion measure for speech coders using an auditory (cochlear) model. This measure evaluates the neural-firing cross-entropy of the coded speech with respect to that of the original one. In this paper, the output space of the cochlear model is explored using this measure so as to verify the existence of the pitch and formant information. However, the prime objective of this article is to provide a rate-distortion analysis for speech coding. We evaluate a lower bound to the rate-distortion function based on this distortion measure and also compute the exact rate-distortion function using the Blahut algorithm. Four state-of-the-art speech coders with rates ranging from 4.8 kb/s (CELP) to 32 kb/s (ADPCM) are studied from the viewpoint of their performances with respect to the rate-distortion limits.
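The Blahut algorithm mentioned above can be sketched for a generic discrete source as follows. This is a minimal sketch assuming a finite alphabet and an arbitrary distortion matrix; it uses Hamming distortion on a binary source as a stand-in and does not implement the perceptual (cochlear) distortion measure of the paper. The function name and the slope parameter beta are illustrative.

```python
import math


def blahut_rd_point(p_x, dist, beta, iters=200):
    """One point (rate in bits, distortion) on the R(D) curve for slope parameter beta > 0."""
    n_x, n_y = len(p_x), len(dist[0])
    q = [1.0 / n_y] * n_y  # output (reproduction) distribution, initialized uniform
    for _ in range(iters):
        # Test channel: Q(y|x) proportional to q(y) * exp(-beta * d(x, y)).
        cond = []
        for x in range(n_x):
            w = [q[y] * math.exp(-beta * dist[x][y]) for y in range(n_y)]
            total = sum(w)
            cond.append([v / total for v in w])
        # Output distribution: marginal of the test channel under the source p_x.
        q = [sum(p_x[x] * cond[x][y] for x in range(n_x)) for y in range(n_y)]
    distortion = sum(
        p_x[x] * cond[x][y] * dist[x][y] for x in range(n_x) for y in range(n_y)
    )
    # Rate: mutual information between source and reproduction, in bits.
    rate = sum(
        p_x[x] * cond[x][y] * math.log2(cond[x][y] / q[y])
        for x in range(n_x)
        for y in range(n_y)
        if p_x[x] > 0.0 and cond[x][y] > 0.0
    )
    return rate, distortion


# Uniform binary source with Hamming distortion, where R(D) = 1 - H(D) is known.
rate, distortion = blahut_rd_point([0.5, 0.5], [[0, 1], [1, 0]], beta=math.log(9))
print(rate, distortion)
```

Sweeping beta traces out the whole curve; larger values of beta give operating points with lower distortion and higher rate.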

Y. M. Cheng, D. O'Shaughnessy, and P. Kabal

"Speech Enhancement Using a Statistically Derived Filter Mapping", Proc. Int. Conf. Spoken Language Processing (Banff, AB), pp. 515-518, Oct. 1992.

We view the speech enhancement task in two aspects: reduction of the perceptual noise level in degraded speech and reconstruction of the degraded information, which may result in improvement of speech intelligibility. We are also very interested in noise-independent speech enhancement where test noise environments could differ in intensity from those of algorithm development. To this end, we have developed in this paper an algorithm called Noise-Independent Statistical Spectral Mapping (NISSM) to estimate a speech enhancement Wiener filter. NISSM consists of a noise-resistant transformation, which converts noisy speech to a set of noise-resistant features, and a spectral mapping function, which maps the features to autoregressive spectra of clean speech. We will show that the proposed algorithm effectively reduces noise intensity. When the noise intensity of training differs from that of testing, NISSM significantly outperforms a conventional spectral mapping. The algorithm operates frame-by-frame and is designed for real-time applications. The noise interference could be stationary or non-stationary white noise with variable intensity.

S. Valaee and P. Kabal

"A Unitary Transformation Algorithm for Wideband Array Processing", Proc. IEEE SP Workshop Statistical Signal, Array Processing (Victoria, BC), pp. 300-303, Oct. 1992.

A new method for broadband array processing is proposed. The method is based on a unitary transformation on the cross-correlation matrices of the array. It is shown that the Two-sided Correlation Transformation (TCT) generates unbiased estimates of the directions of arrival regardless of the bandwidth of the signals. The capability of the method for resolving two closely spaced sources is compared with that of the Coherent Signal-subspace Method (CSM). The resolution threshold for the new technique is smaller than the threshold for CSM.

Y. Qian and P. Kabal

"Backward Adaptive Prediction Cascaded with Forward Formant and Pitch Configurations", Proc. Canadian Conf. Electrical, Computer Engineering (Toronto, ON), pp. WM9.24.1-WM9.24.4, Sept. 1992.

Two kinds of cascaded backward-adaptive predictor (forward formant-backward formant-forward pitch and backward formant-forward pitch) are investigated in this paper. We have analyzed and tested two important parameters for the backward-adaptive formant predictor in these configurations: the update rate of the linear prediction coefficients and the analysis frame length. We have found that if the analysis frame length of the backward-adaptive formant predictor is shorter than the pitch period, the backward prediction gain degrades rapidly. We have found that the average prediction gain for the slower update rates is close to the fast update one. The slower the update rate, the fewer the computations. Particularly, the backward predictor with slower update rate behaves more like a linear filter. These new results provide a useful platform to explore the applications of backward adaptive prediction to low bit-rate speech coders, in which the backward-adaptive formant predictor is cascaded with a forward pitch predictor or the forward formant-backward formant-forward adaptive pitch predictor is used [1],[2].

A. K. Khandani and P. Kabal

"ISI-Reduced Modulation over a Fading Multipath Channel", Proc. Int. Conf. Universal Personal Commun. (Dallas, TX), pp. 11.02.1-11.02.5, Sept. 1992.

In this work, the idea of using the channel eigenvectors as the basis for a block based signaling scheme over a fading multipath channel is introduced. This basis minimizes the product of the average fading attenuations along different dimensions. The ISI from the preceding blocks (intra-block ISI) is modeled by an additive Gaussian noise. To reduce the effect of the intra-block ISI, a number of zeros are transmitted between successive blocks. The number of zeros is optimized to minimize the average probability of error. As the transmission of zeros reduces the bandwidth efficiency, this optimization procedure is more useful for lower bit rates. By applying quadrature amplitude modulation (QAM) to each dimension, we obtain a set of two-dimensional subchannels with unequal fadings. A coherent M-PSK constellation is employed over each QAM subchannel. We propose two methods to distribute the rate and energy between the subchannels. In both methods, we impose the restriction that the average error probability for all the subchannels is the same. In the optimum method, the energy is distributed equally between the nonempty subchannels and the rate is distributed to obtain equal average error probabilities. In a second method, the rate is distributed equally and the energy is distributed to obtain equal average error probabilities. The second method allows us to use the same modulator/demodulator for all the subchannels and thereby reduces the complexity. Numerical results are presented for the second method. The results over a space of moderate dimensionality show substantial performance improvement with a small increase in the complexity.

Y. Qian, Y. M. Cheng, and P. Kabal

"Backward Adaptation for Single-Pulse Excitation Coder", Proc. Int. Conf. Commun. Technol. (Beijing, China), pp. 26.03.1-26.03.4, Sept. 1992.

Backward-adaptive linear prediction has been successfully used in medium rate speech coders with high quality and low delay (less than 2 ms) at 16 kb/s. The prediction gain of a forward-adaptive formant predictor cascaded with a backward-adaptive formant predictor has been first studied. We have found that if the analysis frame length of the backward predictor is larger than the pitch period, the backward prediction gain can reach that of a non-linear predictor or a cascaded forward formant predictor. Results for several speech segments of male and female speakers, with different analysis window lengths, have been given and compared. The proposed cascaded adaptive filter configuration, a first forward-adaptive synthesizer followed by a second backward-adaptive synthesizer, has been incorporated into a 3 kb/s Single-Pulse Excitation/Code-Excited Linear Prediction (SPE/CELP) coder to improve the speech quality while maintaining almost the same bit-rate. Experimental results for the proposed SPE/CELP coder with backward adaptation show that the improvement of the segment SNR for voiced speech segments of several testing sentences can reach 1.02-2.06 dB.

A. Khandani and P. Kabal

"Using a Prefix Code for Addressing the Voronoi Constellations Based on Lattices DN and DN*", Proc. IEEE Int. Conf. Commun. (Chicago, IL), pp. 1431-1435, June 1992.

Signal constellations for representing data values for transmission benefit from shaping of the constellation boundary. In the Voronoi constellations, the Voronoi region of a lattice, denoted as the shaping lattice, is used as the boundary of the signal constellation. In this work, some properties of Voronoi constellations based on the shaping lattices DN and DN* are discussed. The induced probability distribution on the two-dimensional subspaces is found. A prefix coding scheme as an alternative for the addressing is presented. This code in some sense simulates the effect of the boundary by using the points of the subspaces with nonequal probability. An example of such a coding scheme is presented.

A. Khandani and P. Kabal

"Shaping of Multi-Dimensional Signal Constellations Using a Lookup Table", Proc. IEEE Int. Conf. Commun. (Chicago, IL), pp. 927-931, June 1992.

Shaping concerns the selection of the boundary of a signal constellation to reduce its average energy. Addressing is the assignment of the data bits to the constellation points. A major concern of the shaping regions is their addressing complexity. In this work, we use a lookup table for addressing. The method is based on partitioning the two-dimensional subconstellations into shaping shells of equal size and increasing average energy. A lookup table is used to select a subset of the cartesian product of the partitions. This partitioning is compatible with a multidimensional trellis coded modulation (TCM) scheme. As part of the calculations, we have found a closed-form expression for the weight distribution of the half integer lattice Z^N+(1/2)^N, for dimensionality N=4,8.
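The weight distribution of the half integer lattice (the number of points at each squared distance from the origin) can be checked numerically by brute-force enumeration. The sketch below is only such a check for small N and low-energy shells; it is not the closed-form expression derived in the paper, and the function name and the max_odd truncation parameter are illustrative.

```python
from collections import Counter
from itertools import product


def half_integer_shells(n, max_odd=5):
    """Count points of Z^n + (1/2)^n by squared norm, for the shells that are fully enumerated."""
    # Each coordinate is c/2 with c odd; enumerate odd c in [-max_odd, max_odd].
    coords = [c for c in range(-max_odd, max_odd + 1) if c % 2 != 0]
    counts = Counter()
    for point in product(coords, repeat=n):
        counts[sum(c * c for c in point)] += 1  # key is 4 * squared norm
    # Any point outside the enumerated box has 4 * norm >= (max_odd + 2)^2 + (n - 1),
    # so only shells strictly below that limit are complete.
    limit = (max_odd + 2) ** 2 + (n - 1)
    return {q / 4: k for q, k in sorted(counts.items()) if q < limit}


shells = half_integer_shells(4)
for norm, count in shells.items():
    print(norm, count)
```

For N=4 the innermost shell (squared norm 1) contains the 16 points with all coordinates equal to plus or minus 1/2.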

A. De and P. Kabal

"Cochlear Discrimination: An Auditory Information-Theoretic Distortion Measure for Speech Coders", Proc. Biennial Symp. Commun. (Kingston, ON), pp. 419-423, May 1992.

In this paper, our objective is to devise a fidelity criterion for quantifying the degree of distortion introduced by a speech coder. Towards this end, both original speech and its coded versions are transformed from the time-domain to a perceptual-domain using a cochlear model. This perceptual-domain representation provides information pertaining to the probability-of-firings in the neural channels. We introduce a cochlear discrimination measure which compares these firing probabilities in an information-theoretic sense. This measure, in essence, evaluates the neural-firing cross-entropy of the coded speech with respect to that of the original one. The performance of this objective measure is compared with subjective evaluation results.

S. Valaee and P. Kabal

"Selection of the Focusing Frequency in Wideband Array Processing - MUSIC and ESPRIT", Proc. Biennial Symp. Commun. (Kingston, Ont.), pp. 410-414, May 1992.

Wide-band array processing using the Coherent Signal-subspace Method (CSM) is discussed. It is shown that an optimal focusing subspace exists that improves the performance of the estimation. An error based on the subspace fitting is introduced. This error criterion gives the closest focused signal subspaces. Direct maximization of the criterion is very involved and the computational complexity increases with the number of frequency samples. A sub-optimal method is introduced that operates very close to the optimal case. This method is based on deriving tight bounds on the error. The computational complexity of the sub-optimal method is independent of the number of frequency samples. The sub-optimal method approaches the optimal case as the number of frequency samples increases. It is shown that the bias of the estimation is reduced by proper selection of the focusing subspace.

A. K. Khandani and P. Kabal

"Signaling in Multi-Dimensional Signal Spaces", Proc. Biennial Symp. Commun. (Kingston, ON), pp. 296-299, May 1992.

In selecting the boundary of a signal constellation used for data transmission, the objective is to minimize the average energy of the set for a given number of points from a given packing. Reduction in the average energy because of using a region C as the boundary instead of a hypercube is called the shape gain of C. The price to be paid for shaping is: (i) an increase in the factor CER (Constellation-Expansion-Ratio), (ii) an increase in the factor PAR (Peak-to-Average-power-Ratio), and (iii) an increase in the addressing complexity. The structure of a region which optimizes the tradeoff between the shape gain and the CER, and also between the shape gain and the PAR in a finite dimensional space is discussed. Examples of the optimum tradeoff curves are given. The optimum shaping region is mapped to a hypercube truncated with a simplex. This mapping has properties which facilitate the addressing of the signal points. We discuss two addressing schemes with low complexity and good performance. In spectral shaping, the rate of the constellation is maximized subject to some constraints on its power spectrum. This results in a shaping region which has different values of power along different dimensions (unsymmetrical shaping). This spectral shaping also involves the selection of an appropriate basis (modulating waveform) for the space. Finally, we discuss the selection of a signal constellation for signaling over a partial-response channel using both continuous approximation and discrete analysis. We also present a closed-form formula for the weight distribution of the scaled D4 and E8 lattices.

A. K. Khandani and P. Kabal

"Optimized PAM Transmission over a Fading Multipath Channel", Proc. Biennial Symp. Commun. (Kingston, ON), pp. 293-295, May 1992.

In a mobile communication system, the intersymbol interference (ISI) due to the multipath nature of the wave propagation has a serious effect on the performance. In this work, we study the structure of an optimized time-multiplexed Pulse Amplitude Modulation (PAM) system for the transmission over such a channel. The ISI is modeled by an additive Gaussian noise. To reduce the effect of the ISI, a number of zeros are transmitted between successive time multiplexed impulses. By applying Quadrature Amplitude Modulation (QAM), we obtain two dimensions with identical statistics from each baseband time impulse. A coherent M-PSK signal constellation is employed over this two-dimensional space. The duration of the time impulses and also the number of zeros transmitted between them is selected to minimize the probability of the error between the constellation points. As the transmission of zeros reduces the bandwidth efficiency, this optimization procedure is more useful for lower bit rates. The performance of this scheme is compared with a PAM system without zero transmissions. The numerical results show substantial performance improvement without any increase in complexity.

Y. Qian, J. Liu, C. Feng, and P. Kabal

"Speech Coding Using an Enhanced Sinusoidal Model at Low Bit-Rates", Proc. Biennial Symp. Commun. (Kingston, Ont.), pp. 29-32, May 1992.

An enhanced sinusoidal model, which employs the time-varying amplitudes of three components to track the fast dynamical variations during the transition speech segments, and exploits the redundancies between the near-neighborhood components to reduce the number of sinusoidal components to a maximum of 20 with high synthesized quality is presented. Many components can be determined by linear prediction of the dominant and fundamental components, thereby reducing the number of the parameters required to be transmitted and the corresponding bit rate. This approach improves the synthesized quality of the unvoiced and transition speech segments.

An optimal algorithm for extracting dominant frequencies by formants and pitches is compared with a DFT method. The effects on the synthesis quality of the number of the time-varying amplitudes and the different base functions are compared.

Two vector quantization codebooks with group classifications are developed to reduce the storage and computation load for a 4.8 kbits/s coder. Objective measurements give a cepstrum distance of 2.62 dB for several phonetically balanced sentences. Informal listening tests have shown that the proposed speech coder with an enhanced sinusoidal model can obtain good quality speech at 4.8 kbits/s.

K. Abboud and P. Kabal

"Wideband CELP Speech Coding at 12 kb/s", Proc. Biennial Symp. Commun. (Kingston, ON), pp. 25-28, May 1992.

This paper investigates the use of CELP (Code Excited Linear Prediction) as a coding scheme for wideband speech at an operating bit rate of 12 kbits/sec. With the help of different parameter coding techniques, the bit rate was lowered from 16 kbits/sec [2] to 12 kbits/sec while maintaining a similar speech quality. Three encoding schemes were used to improve the performance of the wideband CELP coder. The first approach used a combination of a three way split vector quantization and a new weighted distance measure for a set of line spectral frequencies (LSFs). The second approach used fractional pitch delays to improve the coder's performance for high pitched sounds. The third approach used perceptual noise weighting to improve coding in the high frequency region. The combination of all these three schemes resulted in a substantial increase in speech quality at a lower bit rate (12 kbits/sec).

S. Valaee and P. Kabal

"Detection of the Number of Signals Using Predictive Stochastic Complexity", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (San Francisco, CA), pp. V-345-V-348, March 1992.

In this paper, we propose a new algorithm for the processing of signals by an array of sensors. The objective is to find the number and the Directions Of Arrival (DOA) of signals impinging on a linear array. The Predictive Stochastic Complexity (PSC) criterion of Rissanen is used to select the best model order. To reduce the computational load, the algorithm operates with a suboptimal estimator while maintaining the consistency of the estimator. The proposed method is on-line and can be utilized in time-varying systems for target tracking. The method can be used for both correlated and uncorrelated signals.

B. Sylvestre and P. Kabal

"Time-Scale Modification of Speech Using an Incremental Time-Frequency Approach with Waveform Structure Compensation", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (San Francisco, CA), pp. I-81-I-84, March 1992.

This paper first tries to identify the primary sources of distortion in a non-recursive time-scale modification (TSM) algorithm which is based on the short-time Fourier transform (STFT) (Portnoff, [1]). A simpler version of this TSM algorithm is then proposed for processing speech, where incremental estimators eliminate the need for explicit linear time-scaling operations. Also featured in the design is a waveform structure compensation stage to prevent excessive deterioration of the rate-changed output. A polar (i.e., magnitude-phase) synthesis equation is used for increased efficiency. The new TSM method is capable of generating high-quality rate-changed speech at a reasonable computational cost.

A. K. Khandani and P. Kabal

"Lattice-Based Nonuniform Vector Quantization", Proc. Conf. Information Sciences, Systems (Princeton, NJ), pp. 677-682, March 1992.

We propose some practical methods for applying a lattice-based uniform vector quantizer to a nonuniform source. The first method, denoted as cluster quantization, is based on the k-fold cartesian product of a one-dimensional compander in conjunction with a lattice quantizer. This scheme has an asymptotic gain of 1.53 dB with respect to the optimum one-dimensional quantizer. The complexity is essentially the complexity of decoding of a lattice. The second method, denoted as quantizer shaping, is based on selecting an appropriate boundary for a lattice quantizer. By increasing the space dimensionality, this scheme becomes asymptotically optimum. As a practical shaping method, we use the Voronoi region around the origin of a lattice to shape the quantizer. By using binary lattices, we can construct quantizers with an integral bit rate. In an extension of this scheme, we use a lattice partition chain L_0/.../L_m/L_{m+1} to provide a set of m+1 Voronoi constellations, C(L_i/L_{i+1}), i=0,...,m. A copy of the Voronoi region of L_i is centered around each point of C(L_i/L_{i+1}). This results in higher resolution for the partitions around the origin. This is denoted as a nonuniform Voronoi quantizer. The group property of the Voronoi constellations is used to decrease the complexity of the operations involved in the quantization. These are the operations of shaping, encoding, addressing and reconstruction. The overall complexity is in the order of the linear mappings. By using binary lattices, we construct quantizers with a rate very close to an integer number. This reduces the redundancy associated with a binary indexing of the quantizer output.

S. Valaee and P. Kabal

"Selection of the Focusing Frequency in Wideband Array Processing", Proc. Conf. Information Sciences, Systems (Princeton, NJ), p. 431, March 1992.

Wideband array systems can be decomposed into several narrowband systems by sampling in the frequency domain. Focusing is the combination of these narrowbands by transforming them into a focusing subspace. Corresponding to each focusing subspace there is a focusing frequency. So far, there has been no optimal way for choosing the focusing frequency - usually it is chosen to be the mid-band frequency. In this work we propose a technique to choose the focusing frequency. Our method is based on minimizing the subspace fitting error. The simulation results show that using the selected frequency for focusing improves the performance of the estimation by decreasing the resolution threshold and reducing the bias.

A. K. Khandani and P. Kabal

"Spectral Shaping with Unequal Power Distribution", Proc. Conf. Information Sciences, Systems (Princeton, NJ), pp. 294-299, March 1992.

We maximize the entropy of a line code subject to some constraints on the power spectrum. The general tools are the selection of the constellation basis (modulating waveforms) and the power allocated to each constellation dimension. In our analysis, the basis is fixed and is selected to reduce the computational complexity of the modulation. The following constraints on the power spectrum are considered in detail: (i) A fraction of the total power equal to Fp is located in the frequency band [0,wc], and/or (ii) the spectrum has spectral nulls at the zero and/or at the Nyquist frequency. To realize a power spectrum with spectral null(s), we need a set of dimensions with the same set of nulls. We discuss the general structure of such a basis. Specifically, we give an analytical expression for the basis providing spectral nulls at zero frequency and/or at the Nyquist frequency. These are either sine bases or closely related to them. This property reduces the computational complexity of the modulation by allowing for the use of the fast sine transform algorithms. The energy allocation is computed by an optimization procedure. We also propose a method to match the spectrum of the line code to a partial response channel. This procedure maximizes the entropy of the code subject to having equal minimum distance to noise ratio along all the dimensions at the channel output. The noise is the sum of the additive Gaussian noise and the intersymbol interference.

