Outlier detection with partial information: Application to emergency mapping

Abstract

This paper addresses the problem of novelty detection in the case where the observed data are a mixture of a known 'background' process contaminated with an unknown other process, which generates the outliers, or novel observations. The framework we describe here is quite general, employing univariate classification with incomplete information, based on knowledge of the distribution (the probability density function, 'pdf') of the data generated by the 'background' process. The relative proportion of this 'background' component (the prior 'background' probability), and the pdf and prior probabilities of all other components, are all assumed unknown. The main contribution is a new classification scheme that identifies the maximum proportion of observed data following the known 'background' distribution. The method exploits the Kolmogorov-Smirnov test to estimate the proportions, after which the data are Bayes optimally separated. Results, demonstrated with synthetic data, show that this approach can produce more reliable results than a standard novelty detection scheme. The classification algorithm is then applied to the problem of identifying outliers in the SIC2004 data set, in order to detect the radioactive release simulated in the 'joker' data set. We propose this method as a reliable means of novelty detection in the emergency situation which can also be used to identify outliers prior to the application of a more general automatic mapping algorithm. © Springer-Verlag 2007.
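
The two stages named in the abstract (a Kolmogorov-Smirnov based estimate of the background proportion, then a Bayes separation) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this record. It assumes a standard normal background, synthetic data, a simple bound that keeps the scaled background CDF inside a KS band around the empirical CDF, and a kernel density estimate of the mixture density. The function names `max_background_proportion` and `background_posterior` are hypothetical.

```python
import numpy as np
from scipy import stats


def max_background_proportion(x, background_cdf, alpha=0.05):
    """Largest pi for which pi * F0 stays inside a KS band around the empirical CDF.

    Illustrative bound only (an assumption of this sketch): at pi = 1 it
    reduces to the usual one-sample KS acceptance region.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    # Asymptotic KS critical distance at significance level alpha.
    eps = stats.kstwobign.ppf(1.0 - alpha) / np.sqrt(n)
    f0 = background_cdf(xs)
    tiny = np.finfo(float).tiny  # guards against division by zero
    i = np.arange(1, n + 1)
    # Lower tail: pi * F0(t) <= Fn(t-) + eps, checked just before each jump.
    lower = ((i - 1) / n + eps) / np.maximum(f0, tiny)
    # Upper tail: pi * (1 - F0(t)) <= 1 - Fn(t) + eps, checked at each jump.
    upper = (1.0 - i / n + eps) / np.maximum(1.0 - f0, tiny)
    return float(min(1.0, lower.min(), upper.min()))


def background_posterior(x, pi_bg, background_pdf):
    """Plug-in posterior P(background | x) = pi * f0(x) / f(x).

    The mixture density f is estimated with a Gaussian KDE (an assumption).
    """
    x = np.asarray(x, dtype=float)
    mixture_pdf = stats.gaussian_kde(x)(x)
    return np.clip(pi_bg * background_pdf(x) / mixture_pdf, 0.0, 1.0)


# Synthetic example: 80% background N(0, 1), 20% novel observations N(5, 1).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, size=1600),
                    rng.normal(5.0, 1.0, size=400)])

pi_hat = max_background_proportion(x, stats.norm.cdf)
posterior = background_posterior(x, pi_hat, stats.norm.pdf)
is_outlier = posterior < 0.5  # Bayes rule for equal misclassification costs

print(f"estimated background proportion: {pi_hat:.3f}")
print(f"flagged as outliers: {is_outlier.sum()} of {x.size}")
```

Because the band has half-width `eps`, the estimate is biased slightly above the true proportion, in keeping with the abstract's aim of finding the maximum proportion compatible with the background distribution.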

Publication DOI: https://doi.org/10.1007/s00477-007-0164-8
Divisions: College of Engineering & Physical Sciences > Systems analytics research institute (SARI)
Additional Information: The original publication is available at www.springerlink.com
Uncontrolled Keywords: novelty detection, known background process, contamination, unknown process, outliers, novel observations, knowledge of the distribution, probability density function, pdf, prior ‘background’ probability, prior probabilities, Kolmogorov–Smirnov test, proportions, afterwards data, Bayes optimally separated, classification algorithm, SIC2004 data set, detection, radioactive release, ‘joker’ data set, emergency situation, automatic mapping algorithm, Environmental Chemistry, Water Science and Technology, Environmental Engineering, Statistics and Probability, Civil and Structural Engineering
Publication ISSN: 1436-3259
Last Modified: 02 Jan 2024 08:11
Date Deposited: 28 Oct 2010 11:37
Full Text Link: http://www.spri ... 261351u0n52616/
Related URLs: http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Article
Published Date: 2008-08
Published Online Date: 2007-06-30
Authors: D'Alimonte, Davide
Cornford, Dan (ORCID Profile 0000-0001-8787-6758)
