Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering

Xiao, Yiyong, Huang, Changhao, Huang, Jiaoying, Kaku, Ikou and Xu, Yuchun (2019). Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering. Pattern Recognition, 90 , pp. 183-195.

Abstract

The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have some drawbacks, e.g., they can be trapped into local optima and sensitive to initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering results for some special datasets regardless the selection of the initial centers. In this paper, we developed an integer linear programming (ILP) approach for the k-modes clustering, which is independent to the initial solution and can obtain directly the optimal results for small-sized datasets. We also developed a heuristic algorithm that implements iterative partial optimization in the ILP approach based on a framework of variable neighborhood search, known as IPO-ILP-VNS, to search for near-optimal results of medium and large sized datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI site were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The experimental results outperformed the conventional and other existing enhanced k-modes algorithms in literature, updated 9 of the UCI benchmark datasets with new and improved results.

Publication DOI: https://doi.org/10.1016/j.patcog.2019.01.042
Divisions: Engineering & Applied Sciences
Additional Information: © 2019, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ Funding: This work is partially supported by the National Natural Science Foundation of China under grant nos. 71871003 and 61376042.
Uncontrolled Keywords: Categorical clustering,Data mining,Integer linear programming,Variable neighborhood search,Software,Signal Processing,Computer Vision and Pattern Recognition,Artificial Intelligence
Full Text Link:
Related URLs: http://www.scop ... tnerID=8YFLogxK (Scopus URL)
https://www.sci ... 0482?via%3Dihub (Publisher URL)
Published Online Date: 2019-01-29
Authors: Xiao, Yiyong
Huang, Changhao
Huang, Jiaoying
Kaku, Ikou
Xu, Yuchun ( 0000-0001-6388-813X)

Download

[img]

Version: Accepted Version

Access Restriction: Restricted to Repository staff only until 29 January 2020.

License: Creative Commons Attribution Non-commercial No Derivatives


Export / Share Citation


Statistics

Additional statistics for this record