Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering

Abstract

The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have some drawbacks, e.g., they can be trapped into local optima and sensitive to initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering results for some special datasets regardless the selection of the initial centers. In this paper, we developed an integer linear programming (ILP) approach for the k-modes clustering, which is independent to the initial solution and can obtain directly the optimal results for small-sized datasets. We also developed a heuristic algorithm that implements iterative partial optimization in the ILP approach based on a framework of variable neighborhood search, known as IPO-ILP-VNS, to search for near-optimal results of medium and large sized datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI site were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The experimental results outperformed the conventional and other existing enhanced k-modes algorithms in literature, updated 9 of the UCI benchmark datasets with new and improved results.

Publication DOI: https://doi.org/10.1016/j.patcog.2019.01.042
Divisions: College of Engineering & Physical Sciences
College of Engineering & Physical Sciences > Aston Institute of Urban Technology and the Environment (ASTUTE)
College of Engineering & Physical Sciences > School of Engineering and Technology > Mechanical, Biomedical & Design
Additional Information: © 2019, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ Funding: This work is partially supported by the National Natural Science Foundation of China under grant nos. 71871003 and 61376042.
Uncontrolled Keywords: Categorical clustering,Data mining,Integer linear programming,Variable neighborhood search,Software,Signal Processing,Computer Vision and Pattern Recognition,Artificial Intelligence
Publication ISSN: 1873-5142
Full Text Link:
Related URLs: http://www.scop ... tnerID=8YFLogxK (Scopus URL)
https://www.sci ... 0482?via%3Dihub (Publisher URL)
PURE Output Type: Article
Published Date: 2019-06
Published Online Date: 2019-01-29
Accepted Date: 2019-01-24
Authors: Xiao, Yiyong
Huang, Changhao
Huang, Jiaoying
Kaku, Ikou
Xu, Yuchun (ORCID Profile 0000-0001-6388-813X)

Export / Share Citation


Statistics

Additional statistics for this record