D. O'Shaughnessy, P. Kabal, D. Bernardi, L. Barbeau, C. C. Chu, and J.-L. Moncet
"Applying Speech Enhancement to Audio Surveillance", Proc. IEEE Int. Carnahan Conf. Security Technol.: Crime Countermeasures (Lexington, KY), pp. 69-72, Oct. 1988.
Audio surveillance tapes are prime candidates for speech enhancement, due to the many degradations and sources of interference that mask the speech signals on such tapes. In this paper we describe ways to cancel interference when an available reference signal is not synchronized with the surveillance recording, viz., the reference is obtained later from a phonograph record or an air-check recording from a broadcast source. As a specific example, we discuss our experiences processing a wiretap recording used in an actual court case. We transformed the reference signal to reflect room and transmission effects, and then subtracted the resulting secondary signal from the primary intercept signal, thus enhancing speech from the desired talkers by removing interfering sounds. Prior to subtraction, the signals had to be aligned properly in time. The intercept signal was subject to time-scale modifications due to variable phonograph and tape recorder speeds. While these speed differences are usually small enough not to affect perceived quality, they adversely affect the ability to cancel interference automatically. Concerning recording devices, we took into account four factors that affect signal quality: frequency response, nonlinear distortion, noise, and speed variations. The two methods that were most successful for enhancement were LMS adaptive cancellation and spectral subtraction.
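The LMS adaptive cancellation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the channel taps, step size, and signal powers are all assumed for the demo, and the "room and transmission effects" are reduced to a short FIR filter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical signals: the reference (e.g. a broadcast recording) and the
# primary intercept, where the interference reaches the microphone through
# an unknown room/transmission filter h (assumed for this sketch).
n = 20000
reference = rng.standard_normal(n)
h = np.array([0.6, 0.3, -0.2, 0.1])              # unknown channel (assumed)
interference = np.convolve(reference, h)[:n]
speech = 0.1 * rng.standard_normal(n)            # stand-in for the desired talkers
primary = speech + interference

# LMS adaptive canceller: adapt a filter on the reference to match the
# interference component of the primary, then subtract the estimate.
taps, mu = 8, 0.01
w = np.zeros(taps)
enhanced = np.zeros(n)
for i in range(taps, n):
    x = reference[i - taps + 1:i + 1][::-1]   # most recent sample first
    y = w @ x                                  # estimate of the interference
    e = primary[i] - y                         # residual = enhanced speech estimate
    enhanced[i] = e
    w += mu * e * x                            # LMS weight update

# After convergence, the residual power approaches the speech power alone.
print(np.mean(enhanced[-2000:] ** 2))
```

In the real surveillance setting the time alignment discussed above must be solved first; LMS cancellation of this kind assumes the reference and primary are already synchronized to within the adaptive filter's span.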
R. Tansony and P. Kabal
"A Variable Rate Adaptive Transform Coder for the Digital Storage of Audio Signals", Proc. IEEE Int. Conf. Commun. (Philadelphia, PA), pp. 42.1.1-42.1.6, June 1988.
A transform-coding algorithm designed for the mass storage of variable-quality audio signals is presented. The storage of both speech and music is considered. Automatic silence deletion and signal order, energy, and bandwidth estimation are employed to provide a continuously variable bit rate, adaptively matched to the characteristics of the input signal. Test results show that the algorithm offers storage savings of more than 75% over linear-PCM (pulse-code-modulation) coders, and more than 63% over equivalent quality log-PCM coders. Results also show improved performance over the CCITT standard wideband coder. The complexity of the algorithm is such that it can be implemented in real time on existing digital signal processors.
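Two of the mechanisms behind the variable rate — silence deletion and adaptive bit allocation across transform coefficients — can be sketched as below. The transform, allocation rule, and thresholds here are textbook adaptive-transform-coding choices assumed for illustration, not the paper's exact design.

```python
import numpy as np

def dct2(x):
    """DCT-II, the usual transform in adaptive transform coding."""
    n = np.arange(len(x))
    return np.cos(np.pi / len(x) * np.outer(n, n + 0.5)) @ x

def allocate_bits(coeffs, avg_bits=2.0, max_bits=8):
    """Classic rule: b_k = b_avg + 0.5*log2(var_k / geometric-mean variance)."""
    var = coeffs ** 2 + 1e-12
    b = avg_bits + 0.5 * np.log2(var / np.exp(np.log(var).mean()))
    return np.clip(np.round(b), 0, max_bits).astype(int)

def block_bits(block, silence_thresh=1e-4):
    """Variable rate: spend zero bits on blocks below an energy threshold."""
    if np.mean(block ** 2) < silence_thresh:
        return 0                      # silence deletion
    return int(allocate_bits(dct2(block)).sum())

rng = np.random.default_rng(0)
active = rng.standard_normal(64)          # stand-in active-signal block
silence = 1e-3 * rng.standard_normal(64)  # stand-in background block
print(block_bits(active), block_bits(silence))
```

The bit rate thus varies block by block: silent blocks cost nothing, and within active blocks bits concentrate where the spectral energy is.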
J. F. Michaelides and P. Kabal
"Nonlinear Adaptive Filtering for Echo Cancellation", Proc. IEEE Int. Conf. Commun. (Philadelphia, PA), pp. 30.3.1-30.3.6, June 1988.
This paper examines the problem of nonlinear adaptive filtering for echo cancellation. The high-speed requirements of digital subscriber loops and voiceband data modems place constraints on the design of adaptive echo cancellers due to the presence of nonlinearities. This paper considers table look-up structures and nonlinear filters based on the Volterra series for general nonlinearities, and nonlinear compensators for specific practical configurations. For the first category, means to speed up the initial convergence of an adaptive table look-up structure are suggested. The new configurations involve two table-driven structures, one for cancellation and one to form the reference signal. It is shown how the second structure can also be used for decision-feedback equalization. A combined linear and nonlinear structure with improved convergence behaviour is also proposed. New theoretical convergence rate results are presented for such structures. In the second category, nonlinear compensators are cascaded with linear filters to combat nonlinearities for specific channel models.
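A minimal sketch of a Volterra-series adaptive canceller of the kind considered above: an LMS update applied to a feature vector containing the linear taps plus all second-order products. The echo path, tap count, and step size are assumptions for the demo, not the paper's configurations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, taps = 30000, 4
far_end = rng.standard_normal(n)

def volterra_features(x):
    """Linear taps plus all second-order products (truncated Volterra series)."""
    return np.concatenate([x, np.outer(x, x)[np.triu_indices(len(x))]])

def echo_path(x):
    # Hypothetical nonlinear echo: a linear part plus a memoryless square term.
    return 0.5 * x[0] - 0.3 * x[1] + 0.1 * x[0] ** 2

dim = taps + taps * (taps + 1) // 2
w = np.zeros(dim)
mu = 0.002
err_power = 1.0
for i in range(taps, n):
    x = far_end[i - taps:i][::-1]      # most recent sample first
    phi = volterra_features(x)
    e = echo_path(x) - w @ phi         # residual echo
    w += mu * e * phi                  # LMS update on the Volterra kernel
    err_power = 0.999 * err_power + 0.001 * e * e

print(err_power)                       # smoothed residual echo power
```

The quadratic term in the feature vector is what lets the canceller track the x[0]**2 component that a purely linear filter would leave behind; the quadratic kernel's dimension grows as taps*(taps+1)/2, which is the convergence and complexity cost the paper's table look-up and compensator structures address.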
"High Quality 16 kb/s Speech Coding for Network Applications", Proc. Speech Tech'88 (New York, NY), pp. 328-331, April 1988.
The integration of speech coders into a common carrier network raises important issues for coder design. These include speech quality, coding delay, coder complexity, and robustness to speaker variations and channel errors. This paper discusses new directions in speech coding which address these constraints to allow for toll quality coding of speech at rates down to 16 kb/s. Such a coder uses backward adaptation to limit coder delay and uses an embedded stochastically populated code tree to achieve high quality. These concepts point the way to a reduction by a factor of two in bit rate for the present 32 kb/s coding standard.
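Backward adaptation, the key delay-limiting idea above, can be illustrated with a toy first-order predictive quantizer: the predictor is derived from past quantized samples, which both encoder and decoder possess, so no predictor parameters need be transmitted and no analysis frame delay is incurred. This is a minimal sketch on a synthetic signal; a real low-delay coder uses much higher-order adaptation.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.cumsum(rng.standard_normal(500)) * 0.1   # stand-in correlated signal

step, mu = 0.25, 0.01
a = 0.0        # first-order predictor coefficient, never transmitted
prev = 0.0     # last reconstructed sample (state shared by encoder and decoder)
recon = np.zeros_like(x)
for i, s in enumerate(x):
    pred = a * prev
    idx = round((s - pred) / step)     # the only quantity transmitted
    recon[i] = pred + idx * step       # decoder forms the identical value
    # Backward adaptation: gradient step using only quantized data.
    a = float(np.clip(a + mu * (recon[i] - pred) * prev, -0.99, 0.99))
    prev = recon[i]

print(np.max(np.abs(x - recon)))       # per-sample error bounded by step/2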
P. Kabal and R. P. Ramachandran
"Joint Solutions for Formant and Pitch Predictors in Speech Processing", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (New York, NY), pp. 315-318, April 1988.
The statistical correlation between speech samples motivates the use of prediction error filters that can remove this redundancy. Formant filters act to remove the near-sample redundancies while pitch filters remove the far-sample redundancies for samples separated by the pitch period. The use of these filters is especially beneficial in low bit rate speech coding. The coefficients of these predictors are usually determined sequentially. This paper discusses jointly optimized solutions for the two predictors. The filters are implemented in transversal form with the formant filter always preceding the pitch (F-P cascade). The jointly optimized methods are further subdivided into a combined solution and an iterative sequential solution. Both yield higher prediction gains than their conventional sequentially determined counterpart. A practical version of the iterated sequential approach, in which the filters are constrained to be minimum phase, generates decoded speech of high perceptual quality when applied in a coding environment.
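The conventional sequential determination that the joint methods above improve upon looks like this: fit the formant (near-sample) predictor first, then fit a one-tap pitch (far-sample) predictor on its residual. The synthetic signal, predictor orders, and lag range here are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
n, pitch = 4000, 80
excitation = np.zeros(n)
excitation[::pitch] = 1.0
speech = np.zeros(n)
for i in range(n):                     # AR(2) "formant" shaping (hypothetical)
    speech[i] = excitation[i]
    if i >= 1: speech[i] += 1.2 * speech[i - 1]
    if i >= 2: speech[i] -= 0.6 * speech[i - 2]

def formant_predictor(x, order=2):
    """Solve the autocorrelation normal equations for the near-sample predictor."""
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

a = formant_predictor(speech)
resid = speech.copy()
resid[2:] -= a[0] * speech[1:-1] + a[1] * speech[:-2]

# One-tap pitch predictor on the formant residual: pick the lag with the
# best normalized correlation, then the optimal tap for that lag.
lag = max(range(40, 120),
          key=lambda L: (resid[L:] @ resid[:-L]) ** 2 / (resid[:-L] @ resid[:-L]))
b = (resid[lag:] @ resid[:-lag]) / (resid[:-lag] @ resid[:-lag])
final = resid[lag:] - b * resid[:-lag]

gain_f = 10 * np.log10((speech @ speech) / (resid @ resid))
gain_p = 10 * np.log10((resid @ resid) / (final @ final + 1e-12))
print(lag, round(gain_f, 1), round(gain_p, 1))
```

The joint and iterated-sequential solutions in the paper revisit the formant coefficients after the pitch filter is known, rather than freezing them as above, which is where the extra prediction gain comes from.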
V. Iyengar and P. Kabal
"A Low Delay 16 kbit/sec Speech Coder", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (New York, NY), pp. 243-246, April 1988.
This paper studies a speech coder using a tree code generated by a stochastic innovations tree and backward adaptive synthesis filters. The synthesis configuration uses a cascade of two all-pole filters - a pitch (long time delay) filter followed by a formant (short time delay) filter. Both filters are updated using backward adaptation. The formant predictor is updated using an adaptive lattice algorithm. The multipath (M,L) search algorithm is used to encode the speech. A frequency weighted error measure is used to reduce the perceptual loudness of the quantization noise. The speech coder has low delay (1 ms); formal subjective testing shows its speech quality to be equivalent to that of 7-bit log-PCM.
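The multipath (M,L) search mentioned above keeps the M lowest-distortion paths through the code tree at each depth instead of committing greedily. A toy sketch, with an assumed binary innovations tree and a one-pole synthesis filter standing in for the coder's cascade (the depth, filter, and M values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
depth = 12
target = rng.standard_normal(depth)
innov = rng.standard_normal((2, depth))   # stochastic innovation per branch

def ml_search(target, innov, M, a=0.8):
    """Keep the M lowest-distortion paths at each depth. Here decisions are
    released only at the end (L equal to the full depth)."""
    paths = [((), 0.0, 0.0)]   # (symbols, last synthesis output, cost)
    for t in range(len(target)):
        cands = []
        for seq, prev, cost in paths:
            for s in (0, 1):
                y = a * prev + innov[s, t]   # one-pole synthesis filter memory
                e = target[t] - y
                cands.append((seq + (s,), y, cost + e * e))
        paths = sorted(cands, key=lambda p: p[2])[:M]
    return paths[0]

greedy = ml_search(target, innov, M=1)
multi = ml_search(target, innov, M=8)
exhaustive = ml_search(target, innov, M=2 ** depth)  # feasible at this toy depth
print(greedy[2], multi[2], exhaustive[2])
```

Because the synthesis filter carries memory, a branch that looks best now can lead to poor continuations; keeping M candidates usually closes most of the gap to the exhaustive search at a small fraction of its cost.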
P. Kabal, J.-L. Moncet, and C. C. Chu
"Synthesis Filter Optimization and Coding: Applications to CELP", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (New York, NY), pp. 147-150, April 1988.
The success of Code-Excited Linear Prediction (CELP) for coding speech signals depends on the accurate representation of the pitch structure and the formant structure of the input speech. In this type of coder, an excitation waveform chosen from a dictionary of waveforms drives a cascade of a pitch and a formant filter. This paper develops the methodology to allow for a joint optimization of the waveform selection process, waveform scaling, and the pitch filter determination. Methods to accommodate high pitch speakers (pitch lag smaller than the analysis frame size) are given. Additionally, the requirements for coding the synthesis parameters into a bit stream at 4.8 kb/s are discussed.
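The core CELP operation described above — selecting and scaling a dictionary waveform through the synthesis filters — can be sketched as below. The codebook size, frame length, and synthesis filter are assumptions for the demo; the pitch filter and the joint pitch optimization in the paper are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(5)
frame = 40
codebook = rng.standard_normal((128, frame))   # stochastic excitation dictionary
target = rng.standard_normal(frame)            # weighted target signal (stand-in)

def synth(x):
    """Toy all-pole 'formant' synthesis filter, zero initial state."""
    y = np.zeros(len(x))
    for i in range(len(x)):
        y[i] = x[i]
        if i >= 1: y[i] += 1.2 * y[i - 1]
        if i >= 2: y[i] -= 0.6 * y[i - 2]
    return y

best_score, best_k, best_gain = -1.0, -1, 0.0
for k, c in enumerate(codebook):
    y = synth(c)
    num, den = target @ y, y @ y
    # Optimal scaling for this codevector is g = <t,y>/<y,y>, so minimizing
    # ||t - g*y||^2 over the dictionary reduces to maximizing <t,y>^2/<y,y>.
    if num * num / den > best_score:
        best_score, best_k, best_gain = num * num / den, k, num / den

residual = target - best_gain * synth(codebook[best_k])
print(best_k, round(best_gain, 3), residual @ residual)
```

Only the codevector index and a quantized gain need be transmitted per frame, which is what makes the 4.8 kb/s budget discussed above workable.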