A Time Series Dimensionality Reduction Method for Maximum Deviation Reduction and Similarity Search

Abstract

Similarity search over time series is essential in many applications. However, it may cause a “dimensionality curse” due to the high dimensionality of time series. Various dimensionality reduction methods have been developed. Some of them sacrifice max deviation to get a faster dimensionality reduction, such as equal-length segment dimensionality reduction methods: Piecewise Linear Approximation (PLA), Piecewise Aggregate Approximation (PAA), Chebyshev Polynomials (CHEBY), Piecewise Aggregate Approximation Lagrangian Multipliers (PAALM) and Symbolic Aggregate Approximation (SAX). Some sacrifice dimensionality reduction time for the smallest max deviation, such as adaptive-length segment dimensionality reduction methods: Adaptive Piecewise Linear Approximation (APLA) and Adaptive Piecewise Constant Approximation (APCA). APLA uses a guaranteed upper bound for the best max deviation with slow dimensionality reduction time. We investigate the existing basic dimensionality reduction techniques for time series data. We point out the limitations of the existing dimensionality reduction techniques and evaluate the dimensionality reduction techniques on real-life datasets. We also conduct preliminary research and make some improvements to the existing dimensionality reduction techniques. Our experimental results on several datasets compare and verify the efficiency and effectiveness of the existing techniques, including several dimensionality reduction techniques, one index-building method, and two k-Nearest Neighbours (k-NN) search methods. We propose an adaptive-length segment dimensionality reduction method called Self Adaptive Piecewise Linear Approximation (SAPLA). It consists of 1) initialization, 2) split & merge iteration, and 3) segment endpoint movement iteration. Increment area, reconstruction area, conditional upper bound and several equations are applied to prune redundant computations. The experiments show this method can speed up about n (time series length) times than APLA when reducing the dimensionality of the original time series with minor max deviation loss. We propose a lower bound distance measure between two time series to guarantee no false dismissals and tightness for adaptive-length segment dimensionality reduction methods. When mapping time series into a tree index, we propose Distance Based Covering with Convex Hull Tree (DBCH-tree) and split node and pick branch by using the proposed lower bounding distance, not waste area. Our experiments show that DBCH-tree helps improve k-NN search over time series using dimensionality reduction methods.

Additional Information: Copyright © Ruidong Xue, 2021. Ruidong Xue asserts his moral right to be identified as the author of this thesis. This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without appropriate permission or acknowledgement. If you have discovered material in Aston Publications Explorer which is unlawful e.g. breaches copyright, (either yours or that of a third party) or any other law, including but not limited to those relating to patent, trademark, confidentiality, data protection, obscenity, defamation, libel, then please read our Takedown Policy and contact the service immediately.
Institution: Aston University
Last Modified: 08 Dec 2023 09:00
Date Deposited: 14 Feb 2023 18:55
Completed Date: 2021-12
Authors: Xue, Ruidong

Export / Share Citation


Statistics

Additional statistics for this record