Quality Control in Remote Speech Data Collection

Poorjam, Amir Hossein, Little, Max A., Jensen, Jesper Rindom and Christensen, Mads Graesboll (2019). Quality Control in Remote Speech Data Collection. IEEE Journal on Selected Topics in Signal Processing, 13 (2), pp. 236-243.

Abstract

There is a need for algorithms that can automatically control the quality of remotely collected speech databases by detecting potential outliers that deserve further investigation. In this paper, a simple and effective approach for identifying outliers in a speech database is proposed. Using the deterministic minimum covariance determinant (DetMCD) algorithm to estimate the mean and covariance of the speech data in the mel-frequency cepstral domain, this approach flags as potential outliers those observations whose statistical distance in the feature space from the central location of the data exceeds a predefined threshold. DetMCD is a computationally efficient algorithm that provides a highly robust estimate of the mean and covariance of multivariate data even when 50% of the data are outliers. Experimental results using eight different speech databases with manually inserted outliers show the effectiveness of the proposed method for outlier detection in speech databases. Moreover, applying the proposed method to a remotely collected Parkinson's voice database shows that the outliers that are part of the database are detected with 97.4% accuracy, resulting in a significant decrease in the effort required to manually control the quality of the database.
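
The distance-thresholding idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and it is not DetMCD itself: it swaps in a simplified robust estimator that starts from the coordinate-wise median and refines the mean and covariance with a few concentration steps on the most central fraction of the data. The function names, the `h_frac` and `n_steps` parameters, the assumption that each row of `X` is one feature vector (for example, cepstral features of one recording), and the way the threshold is chosen are all hypothetical.

```python
import numpy as np


def robust_distances(X, h_frac=0.75, n_steps=20):
    """Squared robust distances of each row of X from a robust centre.

    Simplified stand-in for an MCD-type estimator (assumption, not DetMCD):
    a deterministic start followed by concentration steps.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    h = int(np.ceil(h_frac * n))  # size of the "clean" subset

    # Deterministic start: coordinate-wise median, scaled by the MAD.
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-12
    d2 = np.sum(((X - med) / mad) ** 2, axis=1)

    for _ in range(n_steps):
        # Keep the h observations currently closest to the centre.
        subset = np.argsort(d2)[:h]
        mu = X[subset].mean(axis=0)
        # Small ridge keeps the covariance invertible.
        cov = np.cov(X[subset], rowvar=False) + 1e-9 * np.eye(p)
        diff = X - mu
        new_d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        if np.allclose(new_d2, d2):
            break  # the subset has stabilised
        d2 = new_d2
    return d2


def flag_outliers(X, threshold):
    """Boolean mask: True where the squared robust distance exceeds threshold."""
    return robust_distances(X) > threshold
```

Calling `flag_outliers(X, threshold)` on a feature matrix returns a mask of candidate outliers for manual review. Because this sketch applies no consistency correction to the subset covariance, the threshold here is a tuning parameter rather than a calibrated statistical quantile.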

Publication DOI: https://doi.org/10.1109/JSTSP.2019.2904212
Divisions: Engineering & Applied Sciences > Mathematics
Engineering & Applied Sciences > Systems analytics research institute (SARI)
Additional Information: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords: Outlier detection, quality control, remote data collection, robust estimation, speech database, Signal Processing, Electrical and Electronic Engineering
Related URLs: https://ieeexpl ... cument/8664108/ (Publisher URL)
http://www.scop ... tnerID=8YFLogxK (Scopus URL)
Published Online Date: 2019-03-11
Published Date: 2019-05-01
Authors: Poorjam, Amir Hossein
Little, Max A. (0000-0002-1507-3822)
Jensen, Jesper Rindom
Christensen, Mads Graesboll

Version: Accepted Version
