Feature Extraction for Automatic Recognition of Telephone Speech

Abstract

A speech recognition system can be considered as comprising a feature extractor and a word classifier. The research described in this thesis concerns a feature extraction system that transforms telephone speech into strings of symbols suitable for use by a word classifier. The feature extractor is based on the hypothesis that measurements of the relative changes in the speech data form a useful basis for the description of words. This hypothesis was formed from present-day knowledge of speech perception, production and analysis. Features produced by this system are a function of both time and the relative changes detected in an n-dimensional space defined by an analysis stage of the process. The outputs of fourteen bandpass filters are used to specify a point in 14-dimensional space. For each 10 ms spectrum sample, the angle and distance are measured from two reference points in the 14-dimensional space, producing two waveforms for the complete word. These waveforms are segmented according to the sign of their slope, and the total change that occurs between segment boundaries is used to signify a particular feature symbol. Thus two feature strings are produced to describe each word. Tests with the experimental system confirmed that distinctive descriptions can be obtained for words spoken in isolation by a single speaker over a simulated limiting telephone connection. They also demonstrated that these features are consistent across a number of repetitions of a particular word. The advantages of this technique are that segmentation and classification at the signal level are simplified and do not require prior knowledge of the speech sound characteristics of the language being used. Also, because only relative measurements are used, changes in the characteristics of the transmission medium should not affect the result.
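
The processing chain outlined in the abstract (filter-bank frames, angle and distance waveforms, slope-sign segmentation, symbols from total change) can be illustrated with a rough sketch. This is not the thesis implementation. The abstract does not say how the two reference points are chosen, how the angle is defined, or how changes map to symbols, so the following are all assumptions made for illustration: distance is Euclidean and measured from `ref_dist`; the angle is measured at `ref_angle` between the frame and a hypothetical fixed direction `axis`; and the symbol alphabet (`R`, `r`, `F`, `f`, `-`) and the `large` threshold are invented here.

```python
import math
from itertools import groupby


def distance(frame, ref):
    """Euclidean distance between one spectrum frame and a reference point."""
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(frame, ref)))


def angle(frame, ref, axis):
    """Angle (radians) at `ref` between the frame and a fixed direction.

    `axis` is a hypothetical reference direction; the abstract does not
    define how the angle is actually measured.
    """
    v = [x - r for x, r in zip(frame, ref)]
    norm_v = math.sqrt(sum(x * x for x in v))
    norm_a = math.sqrt(sum(a * a for a in axis))
    if norm_v == 0.0 or norm_a == 0.0:
        return 0.0  # angle is undefined for a zero vector; use 0 by convention
    cos = sum(x * a for x, a in zip(v, axis)) / (norm_v * norm_a)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def segment_changes(wave):
    """Split a waveform wherever the sign of its slope changes.

    Returns the total change across each segment, in time order.
    """
    signs = [(b > a) - (b < a) for a, b in zip(wave, wave[1:])]
    changes, start = [], 0
    for _, run in groupby(signs):
        length = len(list(run))
        changes.append(wave[start + length] - wave[start])
        start += length
    return changes


def to_symbols(changes, large):
    """Map each total change to a symbol (assumed alphabet and threshold)."""
    out = []
    for c in changes:
        if c == 0:
            out.append("-")
        elif c > 0:
            out.append("R" if c >= large else "r")
        else:
            out.append("F" if -c >= large else "f")
    return "".join(out)


def extract_features(frames, ref_dist, ref_angle, axis, large_dist, large_angle):
    """Produce two feature strings for one word (one frame per 10 ms)."""
    dist_wave = [distance(f, ref_dist) for f in frames]
    angle_wave = [angle(f, ref_angle, axis) for f in frames]
    return (
        to_symbols(segment_changes(dist_wave), large_dist),
        to_symbols(segment_changes(angle_wave), large_angle),
    )


# Usage with four made-up 14-channel frames (not real speech data).
frames = [[float(level)] * 14 for level in (0, 1, 2, 1)]
origin = [0.0] * 14
dist_string, angle_string = extract_features(
    frames, origin, origin, [1.0] * 14, large_dist=5.0, large_angle=0.5
)
```

Because every symbol is derived from a difference between two points on the same waveform, a constant offset added to a waveform leaves the strings unchanged, which is in the spirit of the relative-measurement argument made in the abstract.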

Publication DOI: https://doi.org/10.48780/publications.aston.ac.uk.00008039
Divisions: College of Engineering & Physical Sciences
Additional Information: Copyright © Putman, A. J., 1980. Putman, A. J. asserts their moral right to be identified as the author of this thesis. This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without appropriate permission or acknowledgement.
Institution: Aston University
Uncontrolled Keywords: feature extraction, automatic recognition, telephone speech
Last Modified: 17 Feb 2025 14:03
Date Deposited: 12 May 2010 14:11
Completed Date: 1980
Authors: Putman, Allan J.
