A. H. Khan and P. Kabal
"Tree Encoding for the ITU-T G.711.1 Speech Coder", Proc. Interspeech (Florence, Italy), pp. 2553-2556, Aug. 2011.
This paper examines enhancements to the ITU-T Recommendation G.711.1 PCM wideband extension speech coder. To further improve the coding performance of the core lower band, the use of vector quantization and delayed decision coding is studied. A particular case of delayed decision coding, tree encoding, is implemented within this standard. The resulting bitstream is compatible with both the legacy G.711 decoder and the G.711.1 decoder. PESQ (ITU-T P.862, Perceptual Evaluation of Speech Quality) is used to evaluate performance. Both the vector quantizer and the tree encoder perform better than the original core-layer encoder.
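Tree encoding can be illustrated on a toy problem. The sketch below is not the G.711.1 core encoder. It assumes a hypothetical 2-bit uniform quantizer and a hypothetical first-order error-weighting filter W(z) = 1 - B z^-1 with B = 0.7. Code paths are extended one sample at a time, and only the M lowest-cost paths survive (the M-algorithm). With M = 1 the search reduces to a sample-by-sample decision, equivalent to a simple error-feedback quantizer. With M large enough that nothing is pruned, it matches an exhaustive search.

```python
import itertools
import numpy as np

LEVELS = np.array([-1.5, -0.5, 0.5, 1.5])  # hypothetical 2-bit uniform quantizer
B = 0.7  # hypothetical error-weighting filter W(z) = 1 - B z^-1


def weighted_error(x, codes):
    """Sum of squared weighted error e_w[n] = e[n] - B*e[n-1], with e = x - Q(x)."""
    e = np.asarray(x, dtype=float) - LEVELS[list(codes)]
    ew = e.copy()
    ew[1:] -= B * e[:-1]
    return float(np.sum(ew ** 2))


def tree_encode(x, m):
    """M-algorithm tree search: keep the m lowest-cost code paths after each sample."""
    paths = [(0.0, 0.0, ())]  # (cumulative cost, previous error e[n-1], codes so far)
    for s in x:
        children = []
        for cost, prev_e, codes in paths:
            for c, q in enumerate(LEVELS):
                e = s - q
                children.append((cost + (e - B * prev_e) ** 2, e, codes + (c,)))
        children.sort(key=lambda t: t[0])
        paths = children[:m]  # prune: delayed decision over the surviving paths
    best_cost, _, best_codes = paths[0]
    return list(best_codes), best_cost


def exhaustive(x):
    """Brute-force search over all code sequences (only feasible for short x)."""
    best = min(itertools.product(range(len(LEVELS)), repeat=len(x)),
               key=lambda codes: weighted_error(x, codes))
    return list(best), weighted_error(x, best)


rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=6)
for m in (1, 4, 4 ** len(x)):
    codes, cost = tree_encode(x, m)
    print(f"M={m:5d}  codes={codes}  weighted error={cost:.4f}")
print("exhaustive:", round(exhaustive(x)[1], 4))
```

Here all decisions are released at the end of the block. A practical delayed-decision coder would instead release each decision after a fixed delay.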
Q. Gong and P. Kabal
"Improved Quality for Conversational VoIP using Path Diversity", Proc. Interspeech (Florence, Italy), pp. 2549-2552, Aug. 2011.
In Voice-over-IP, the quality of interactive conversation is important to users. Quality-based playout buffering seeks an optimum balance between delay and loss. However, such a scheme still suffers when packet losses are bursty. Path diversity can alleviate the effect of losses and improve perceived quality by providing redundancy. In this paper, a new scheme is proposed which evaluates the performance of both paths. We consider three different path diversity schemes. The playout scheduling algorithms are designed based on conversational quality including both calling quality and interactivity. The simulation results show the efficacy of our algorithms in correcting for losses (isolated and burst) and improving perceived conversational quality.
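The redundancy argument for path diversity can be illustrated with a small simulation. The sketch assumes that each path is an independent two-state Gilbert loss channel with hypothetical parameters, and that a packet duplicated on both paths is lost only if both copies are lost. It does not reproduce the paper's traces or its playout scheduling. It compares the loss rate and mean burst length of a single path with those of the combined stream.

```python
import numpy as np


def gilbert_losses(n, p_gb, p_bg, rng):
    """Two-state Gilbert channel: packets sent in the 'bad' state are lost.

    p_gb: P(good -> bad), p_bg: P(bad -> good). Returns a boolean loss mask.
    """
    lost = np.empty(n, dtype=bool)
    bad = rng.random() < p_gb / (p_gb + p_bg)  # start from the stationary distribution
    u = rng.random(n)
    for i in range(n):
        lost[i] = bad
        bad = (u[i] >= p_bg) if bad else (u[i] < p_gb)
    return lost


def mean_burst_length(lost):
    """Average length of runs of consecutive lost packets."""
    bursts = int(lost[0]) + np.count_nonzero(lost[1:] & ~lost[:-1])
    return lost.sum() / bursts if bursts else 0.0


rng = np.random.default_rng(1)
n = 400_000
path_a = gilbert_losses(n, 0.02, 0.3, rng)  # hypothetical channel parameters
path_b = gilbert_losses(n, 0.02, 0.3, rng)
both = path_a & path_b  # with duplication, a packet is lost only if both copies are lost

for name, mask in (("path A", path_a), ("path B", path_b), ("diversity", both)):
    print(f"{name:10s} loss={mask.mean():.4f}  mean burst={mean_burst_length(mask):.2f}")
```

With independent paths the combined loss rate is roughly the product of the per-path rates, and bursts become shorter. Correlated paths would reduce this benefit.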
A. Nour-Eldin and P. Kabal
"Memory-Based Approximation of the Gaussian Mixture Model Framework for Bandwidth Extension of Narrowband Speech", Proc. Interspeech (Florence, Italy), pp. 1185-1188, Aug. 2011.
In this paper, we extend our previous work on exploiting speech temporal properties to improve Bandwidth Extension (BWE) of narrowband speech using Gaussian Mixture Models (GMMs). By quantifying temporal properties through information-theoretic measures and using delta features, we have shown that narrowband memory significantly increases certainty about highband parameters. However, as delta features are non-invertible, they cannot be directly used to reconstruct highband frequency content. In the work presented herein, we embed temporal properties indirectly into the GMM structure through a memory-dependent tree-based approach to extend the representation of the narrowband. In particular, sequences of past frames are progressively used to grow the GMM in a tree-like fashion. This growth approach results in reliable estimates of the GMM parameters, such that Maximum Likelihood estimation is no longer necessary, thus circumventing the complexity accompanying high-dimensional GMM training.
"Correlation Properties of Quantization Noise", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (Prague, Czech Republic), pp. 5244-5247, May 2011.
This paper examines the correlation properties of quantization noise. The quantization noise energy is subtractive (the output energy equals the input energy minus the noise energy) if the quantizer output levels are optimized for the probability density of the input signal (pdf-optimized). This paper gives a new result showing that a quantizer (uniform or not) which has its break points midway between output levels (a minimum-distance quantizer) and is scaled to minimize the mean-square error also has this property. Examples illustrate the correlation properties that determine whether the quantization noise energy is subtractive or additive. This paper also considers a postfilter configuration that compensates for the quantization noise; the postfilter frequency-domain gains take the correlation properties of the quantization noise into account. An experiment on reducing the effect of quantization noise in speech indicates that taking account of the correlation is useful.
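The subtractive-noise property of a minimum-distance quantizer can be checked numerically. With e = x - y, E[x^2] = E[y^2] + 2E[y e] + E[e^2], so the noise energy is subtractive exactly when E[y e] = 0. The sketch assumes a unit-variance Gaussian input and an 8-level uniform quantizer. Both are assumptions; the result stated above also covers nonuniform quantizers. The expectations are evaluated in closed form, and the MSE-optimal step size is found by golden-section search. E[y e] is then compared at that step and at an arbitrary, non-optimal step.

```python
import math


def phi(x):
    """Standard normal pdf (0 at +/-inf)."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def xphi(x):
    return 0.0 if math.isinf(x) else x * phi(x)


def moments(a, b):
    """Integrals of x^k * phi(x) over [a, b] for k = 0, 1, 2."""
    m0 = Phi(b) - Phi(a)
    m1 = phi(a) - phi(b)
    m2 = m0 + xphi(a) - xphi(b)
    return m0, m1, m2


def quantizer_stats(step, n_levels=8):
    """Return (E[e^2], E[y*e], E[y^2]) for a uniform midrise quantizer, e = x - y."""
    levels = [step * (k - (n_levels - 1) / 2) for k in range(n_levels)]
    # Minimum-distance quantizer: break points midway between output levels.
    bounds = ([-math.inf]
              + [(levels[k] + levels[k + 1]) / 2 for k in range(n_levels - 1)]
              + [math.inf])
    mse = ye = yy = 0.0
    for k, y in enumerate(levels):
        m0, m1, m2 = moments(bounds[k], bounds[k + 1])
        mse += m2 - 2.0 * y * m1 + y * y * m0
        ye += y * (m1 - y * m0)
        yy += y * y * m0
    return mse, ye, yy


def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return (a + b) / 2


step_opt = golden_min(lambda s: quantizer_stats(s)[0], 0.01, 2.0)
for step in (step_opt, 1.0):
    mse, ye, yy = quantizer_stats(step)
    print(f"step={step:.4f}  E[e^2]={mse:.5f}  E[y*e]={ye:+.2e}  "
          f"E[y^2]={yy:.5f}  1-E[e^2]={1 - mse:.5f}")
```

At the MSE-optimal step, E[y e] vanishes to numerical precision, and E[y^2] equals 1 - E[e^2]. At a step of 1.0, the cross term is clearly nonzero.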
M. Konaté and P. Kabal
"Quantization Noise Estimation for Log-PCM", Proc. Canadian Conf. Elect., Computer Eng. (Niagara Falls, ON), pp. 1337-1341, May 2011.
ITU-T G.711.1 is a multirate wideband extension for the well-known ITU-T G.711 pulse code modulation of voice frequencies. The extended system is fully interoperable with the legacy narrowband one. In the case where the legacy G.711 is used to code a speech signal and G.711.1 is used to decode it, quantization noise may be audible. For this situation, the standard proposes an optional postfilter. The application of postfiltering requires an estimation of the quantization noise. In this paper we review the process of estimating this coding noise and we propose a better noise estimator.
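Because log-PCM noise scales with the signal level, a decoder can estimate it from the decoded samples alone. The sketch below makes two simplifying assumptions. First, the continuous mu-law curve (mu = 255) with 8-bit uniform quantization in the companded domain stands in for G.711's segmented approximation. Second, it uses a hypothetical local-step-size estimator, noise variance ≈ (Delta / F'(x̂))^2 / 12. This is neither the estimator proposed in the paper nor the one in the G.711.1 postfilter. The script compares the estimate against the true noise for Gaussian inputs at several levels.

```python
import numpy as np

MU = 255.0              # mu-law constant
N_LEVELS = 256          # 8-bit code words
STEP = 2.0 / N_LEVELS   # uniform step in the companded domain [-1, 1]


def compress(x):
    """Continuous mu-law compressor F(x) for |x| <= 1."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)


def expand(y):
    """Inverse of compress()."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU


def log_pcm(x):
    """Quantize uniformly in the companded domain, then expand (encode + decode)."""
    y = compress(np.clip(x, -1.0, 1.0))
    idx = np.clip(np.floor(y / STEP), -N_LEVELS // 2, N_LEVELS // 2 - 1)
    return expand((idx + 0.5) * STEP)


def estimated_noise_var(xhat):
    """Hypothetical estimator: local step STEP / F'(xhat), noise variance = step^2 / 12."""
    slope = MU / (np.log1p(MU) * (1.0 + MU * np.abs(xhat)))  # F'(x) of the compressor
    return (STEP / slope) ** 2 / 12.0


rng = np.random.default_rng(0)
results = {}
for sigma in (0.05, 0.1, 0.2):
    x = sigma * rng.standard_normal(200_000)
    xhat = log_pcm(x)
    true_noise = np.mean((x - xhat) ** 2)
    est_noise = np.mean(estimated_noise_var(xhat))
    snr_db = 10 * np.log10(np.mean(x ** 2) / true_noise)
    results[sigma] = (snr_db, 10 * np.log10(est_noise / true_noise))
    print(f"sigma={sigma}: SNR={snr_db:.1f} dB, estimate/true={results[sigma][1]:+.2f} dB")
```

The SNR stays roughly constant across input levels. This is the log-PCM property that makes a signal-dependent noise estimate, and hence a postfilter, feasible at the decoder.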