Multimedia Signal Processing Laboratory

P. Kabal

Paper Abstracts 2003


P. J. Smith, P. Kabal, M. Blostein, and R. Rabipour

"Tandem-Free VoIP Conferencing: A Bridge to Next Generation Networks", IEEE Commun. Magazine, pp. 136-145, May 2003.

This article surveys approaches to teleconferencing in voice over IP networks. The considerations for conferencing include perceived quality, scalability, control, and compatibility. Architectures used for conferencing range from centralized bridges to full mesh. Centralized conference bridges used with compressed speech degrade speech quality when multiple talkers are mixed and subjected to tandem coding operations. Full mesh and multicast solutions (mixing at the endpoints) are inappropriate when the number of conferees is large. This article discusses a hybrid solution that incorporates tandem-free bridging (the bridge selects and forwards packets) and endpoint mixing.

Conference papers

Y. Ould-Cheikh-Mouhamedou, P. Guinand, and P. Kabal

"Enhanced Max-Log-APP and Enhanced Log-APP Decoding for DVB-RCS", Proc. Int. Symp. Turbo Codes (Brest, France), pp. 259-262, Sept. 2003 (with corrections).

We present new decoding techniques for double-binary turbo codes, such as those in the Digital Video Broadcasting for Return Channel via Satellite (DVB-RCS) standard. The proposed techniques are referred to as Enhanced Max-Log A Posteriori Probability (APP) Decoding and Enhanced Log-APP Decoding. Results for ATM packets show a degradation of 0.05 dB in BER/FER for enhanced max-log-APP compared to conventional log-APP. For MPEG packets the enhanced max-log-APP outperforms the log-APP at high SNRs. For both packet lengths, enhanced log-APP outperforms log-APP at high SNRs. Simulation results for an effective early stopping criterion are also presented.
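For context, conventional log-APP and max-log-APP decoders differ in how they combine two log-domain metrics: log-APP uses the exact Jacobian logarithm, while max-log-APP keeps only the maximum. The sketch below illustrates that standard difference only; it does not show the enhancements proposed in the paper, which the abstract does not describe. The function names are illustrative.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact log(exp(a) + exp(b)), the combining step in log-APP decoding."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-log approximation: the correction term is dropped."""
    return max(a, b)


def correction(a: float, b: float) -> float:
    """Gap between the exact and approximate results (at most log 2)."""
    return max_star(a, b) - max_log(a, b)
```

The correction term is largest when the two metrics are equal and vanishes as they move apart, which is why the max-log approximation costs little at high SNR.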

Y. Qian and P. Kabal

"A Dual-Mode Wideband Speech Recovery from Narrowband Speech", Proc. European Conf. Speech Commun. Technol. (Geneva), pp. 1433-1436, Sept. 2003.

The present public telephone networks trim off the lowband (50-300 Hz) and the highband (3400-7000 Hz) components of sounds. As a result, telephone speech is characterized by thin and muffled sounds, and degraded speaker identification. The lowband components are deterministically recoverable, while the missing highband can be recovered statistically. We develop an equalizer to restore the lowband parts. The highband parts are filled in using a linear prediction approach. The highband excitation is generated using a bandpass envelope modulated Gaussian signal and the spectral envelope is generated using a Gaussian Mixture Model. The mean log-spectrum distortion decreases by 0.96 dB compared to a previous method using wideband reconstruction with a VQ codebook mapping algorithm. Informal subjective tests show that the reconstructed wideband speech enhances lowband sounds and regenerates realistic highband components.

A. M. Wyglinski, P. Kabal, and F. Labeau

"Adaptive Bit and Power Allocation for Indoor Wireless Multicarrier Systems", Proc. Int. Conf. Wireless Commun. (Calgary, AB), pp. 500-508, July 2003.

We propose an adaptive bit and power allocation algorithm for indoor wireless systems employing multicarrier modulation. The proposed scheme maximizes throughput via an incremental allocation algorithm while operating under a maximum mean bit error constraint. Unlike other bit allocation algorithms, which allocate a continuous distribution of bits followed by quantization, the proposed algorithm allocates a discrete distribution of bits. Moreover, the proposed algorithm employs a stricter subband power constraint to limit the interference to other users and satisfy government regulatory requirements, unlike other algorithms which only employ a total power constraint. Finally, the assumption of flat subchannels is dropped; a subcarrier minimum mean-squared error equalizer is therefore applied to the case of orthogonal frequency division multiplexing systems employing a cyclic prefix. The performance of the proposed system is evaluated in terms of throughput and bit allocation and compared with an IEEE 802.11a-compliant system. The results show that the proposed system outperforms the IEEE 802.11a-compliant system when transmitting at lower signal-to-noise ratios. Furthermore, the benefits of power allocation are noticeable at low signal-to-noise ratios.
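As a rough illustration of discrete, incremental bit allocation under a mean bit error constraint, the sketch below starts every subcarrier at the largest constellation and steps down the worst subcarrier until the bit-weighted mean BER meets the target. This is not the paper's algorithm: the step-down rule, the set of bit choices, and the closed-form BER approximation are all assumptions made for the example, and the subband power constraint and equalizer are not modeled.

```python
import math

# Assumed constellation sizes in bits: off, QPSK, 16-QAM, 64-QAM.
BIT_CHOICES = (0, 2, 4, 6)


def ber_approx(bits: int, snr: float) -> float:
    """Assumed exponential BER approximation for square QAM (linear SNR)."""
    if bits == 0:
        return 0.0
    return 0.2 * math.exp(-1.5 * snr / (2 ** bits - 1))


def mean_ber(alloc, snrs) -> float:
    """Mean BER over all transmitted bits (weighted by bits per subcarrier)."""
    total_bits = sum(alloc)
    if total_bits == 0:
        return 0.0
    return sum(b * ber_approx(b, s) for b, s in zip(alloc, snrs)) / total_bits


def allocate_bits(snrs, target_ber: float):
    """Greedy discrete allocation; target_ber is assumed to be positive."""
    alloc = [BIT_CHOICES[-1]] * len(snrs)
    while mean_ber(alloc, snrs) > target_ber:
        # Pick the active subcarrier with the highest BER and step it down.
        worst = max(
            (i for i, b in enumerate(alloc) if b > 0),
            key=lambda i: ber_approx(alloc[i], snrs[i]),
        )
        alloc[worst] = BIT_CHOICES[BIT_CHOICES.index(alloc[worst]) - 1]
    return alloc
```

Because the allocation only ever takes values from `BIT_CHOICES`, no quantization step is needed afterwards, which is the property the abstract contrasts with continuous allocation schemes.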

P. J. Smith, P. Kabal, M. Blostein, and R. Rabipour

"Tandem-Free Operation for VoIP Conference Bridges", Proc. IEEE Int. Conf. Commun. (Anchorage, AK), pp. 794-798, May 2003.

Traditional telephone conferencing has been accomplished by way of a centralized conference bridge. The tandem arrangement of high compression speech codecs in conventional VoIP conference bridges leads to speech distortions and requires a substantial number of computations. Decentralized architectures avoid the speech degradations and delay, but lack strong control and depend on silence suppression to make the endpoint bandwidth and processing requirements scalable. One solution is to use centralized speaker selection and forwarding, and decentralized decoding and mixing. This approach eliminates the problem of tandem encodings but maintains centralized control, thereby improving the speech quality and scalability of the conference. This paper considers design options and solutions for this model in the context of modern IP telephony networks. Performance was evaluated with real conferees over live conferences using a PC-based conferencing test bed, built using a custom software-based bridge and a third-party endpoint. Conferees strongly preferred the speech quality of the new arrangement to that of a conventional VoIP conference bridge.

A. Shallwani and P. Kabal

"An Adaptive Playout Algorithm with Delay Spike Detection for Real-Time VoIP", Proc. IEEE Canadian Conf. Electrical, Computer Engineering (Montreal, QC), pp. 997-1000, May 2003.

As the Internet is a best-effort delivery network, audio packets may be delayed or lost en route to the receiver due to network congestion. To compensate for the variation in network delay, audio applications buffer received packets before playing them out. Basic algorithms adjust the packet playout time during periods of silence such that all packets within a talkspurt are equally delayed. Another approach is to scale individual voice packets using dynamic time-scale modification. In this work, an adaptive playout algorithm based on the normalized least mean square algorithm is improved by introducing a spike-detection mode to rapidly adjust to delay spikes. Simulations on Internet traces show that the enhanced bi-modal playout algorithm improves performance by reducing both the average delay and the loss rate as compared to the original algorithm.
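The bi-modal idea can be sketched as a normalized LMS predictor of network delay that switches to a spike mode when an observed delay jumps far above the prediction. The code below is a minimal sketch under assumptions, not the paper's algorithm: the class name, filter order, step size, threshold, and the entry and exit rules for spike mode are all hypothetical, and the safety margin that a real playout scheduler would add on top of the predicted delay is omitted.

```python
from collections import deque


class SpikeAwareNlmsPredictor:
    """Predicts the next packet's network delay (ms) with an NLMS filter,
    following the observed delay directly while a delay spike lasts."""

    def __init__(self, order=4, mu=0.5, spike_threshold_ms=50.0, eps=1e-6):
        self.order = order
        self.mu = mu                                # assumed NLMS step size
        self.spike_threshold_ms = spike_threshold_ms  # assumed spike threshold
        self.eps = eps                              # avoids division by zero
        self.weights = [1.0 / order] * order
        self.history = deque(maxlen=order)          # most recent normal delays
        self.in_spike = False
        self.baseline = 0.0                         # prediction before the spike

    def _predict(self):
        return sum(w * d for w, d in zip(self.weights, self.history))

    def update(self, delay_ms):
        """Consume one observed delay and return the estimate for the next."""
        if len(self.history) < self.order:
            self.history.append(delay_ms)           # still filling the filter
            return delay_ms
        predicted = self._predict()
        error = delay_ms - predicted
        if not self.in_spike and error > self.spike_threshold_ms:
            self.in_spike = True
            self.baseline = predicted
        elif self.in_spike and delay_ms <= self.baseline + self.spike_threshold_ms:
            self.in_spike = False
        if self.in_spike:
            return delay_ms                         # track the spike, freeze NLMS
        norm = sum(d * d for d in self.history) + self.eps
        self.weights = [
            w + self.mu * error * d / norm
            for w, d in zip(self.weights, self.history)
        ]
        self.history.append(delay_ms)
        return self._predict()
```

Freezing the filter during a spike keeps the large transient out of the weight update, so the predictor resumes from its pre-spike state once delays return to normal.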

R. Der, P. Kabal, and W.-Y. Chan

"Towards a New Perceptual Coding Paradigm for Audio Signals", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Hong Kong), pp. V-457-V-460, April 2003.

A new frequency domain approach to coding audio signals is introduced. The bit assignment strategy is aimed at reducing the perceived loudness difference between the original signal and the coded signal. As such it uses perceptual effects (spread excitation patterns), but does not directly invoke masking results. At low bit rates, examples coded with the new approach sound better than a more traditional bit allocation based on noise-to-mask ratio.

P. Kabal

"Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Hong Kong), pp. I-824-I-827, April 2003.

This paper examines schemes that modify linear prediction (LP) analysis for speech signals. First, techniques which improve the conditioning of the LP equations are examined. White noise compensation for the correlations is justified from the point of view of reducing the range of values which the predictor coefficients take on. The efficacy of the procedure is measured over a large speech database. Various techniques for bandwidth expansion of the LP spectral peaks are also examined. These include lag windowing of the correlation, windowing of the predictor coefficients, and modification of the line spectral frequencies. New formulas for the bandwidth expansion factor are given.
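The standard forms of three of the techniques named above can be sketched in a few lines: white noise compensation scales the zero-lag correlation, lag windowing multiplies the correlations by a window (a Gaussian window is assumed here), and predictor coefficient windowing scales the k-th coefficient by gamma to the power k. This is a textbook-style sketch with illustrative function names and parameter values; it does not reproduce the paper's new formulas for the bandwidth expansion factor.

```python
import math


def white_noise_compensation(r, eps=1e-4):
    """Scale r[0] by (1 + eps), equivalent to adding a white noise floor."""
    return [r[0] * (1.0 + eps)] + list(r[1:])


def gaussian_lag_window(r, f0_hz, fs_hz):
    """Multiply correlations by an assumed Gaussian lag window."""
    return [
        rk * math.exp(-0.5 * (2.0 * math.pi * f0_hz * k / fs_hz) ** 2)
        for k, rk in enumerate(r)
    ]


def levinson_durbin(r):
    """Solve the LP equations; returns a[0..p] with a[0] = 1 for the
    prediction error filter A(z) = sum_k a[k] z^-k."""
    a = [1.0]
    err = r[0]
    for i in range(1, len(r)):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        padded = a + [0.0]
        a = [padded[j] + k * padded[i - j] for j in range(i + 1)]
        err *= 1.0 - k * k
    return a


def expand_bandwidth(a, gamma):
    """Window the predictor coefficients: a[k] -> a[k] * gamma**k.
    This moves the roots of A(z) radially inward by the factor gamma."""
    return [ak * gamma ** k for k, ak in enumerate(a)]
```

A typical use would chain the steps: compensate and window the correlations, solve for the predictor with `levinson_durbin`, then apply `expand_bandwidth` with a gamma slightly below one.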
