Leveraging ensemble clustering for privacy-preserving data fusion: Analysis of big social-media data in tourism

Abstract

Discovering knowledge from social media has become a trend in many domains such as tourism, where users' feedback and ratings are the basis of recommendation systems. In this context, cluster analysis has been a major tool for disclosing user groups, with which collaborative filtering can better determine a personalised suggestion. Scaling this to big data is a challenge, and previous studies either implement conventional techniques on a distributed system or make use of data sampling. Among ensemble clustering methods specifically, only a few aim to achieve both scalability and privacy preservation, both of which are significant to handling social data. This paper presents a new bi-level framework of ensemble clustering in which an instance-segment based analysis is adopted to ensure data privacy and reduce the complexity of clustering the whole dataset. Unlike existing studies, which draw a single clustering from each segment, the framework selects multiple clusterings to better represent the instances therein. Based on published tourism datasets and different experimental settings, the new approach usually outperforms its baselines whilst being competitive with related methods found in the literature. Additional case studies on simulated big datasets and noisy variations are reported and discussed, together with an analysis of algorithmic parameters.
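
The abstract does not spell out the algorithm, so the following is only a rough illustrative sketch of what a bi-level, segment-based ensemble might look like, not the authors' method. It assumes plain k-means as the base clusterer, assumes that each segment shares only its centroids (never raw instances) with the upper level, and uses majority voting to fuse the multiple clusterings of a segment. The names `kmeans`, `bilevel_ensemble`, `local_ks` and `final_k` are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def bilevel_ensemble(segments, local_ks, final_k, seed=0):
    """Hypothetical two-level ensemble; returns one label list per segment."""
    rng = random.Random(seed)
    # Level 1: every segment builds several clusterings locally,
    # one per value in local_ks, and exposes only the centroids.
    local_models = [[kmeans(seg, k, rng) for k in local_ks] for seg in segments]
    # Level 2: pool the shared centroids and cluster them into final groups.
    pooled = [c for models in local_models for cents in models for c in cents]
    meta = kmeans(pooled, final_k, rng)
    # Back inside each segment: map every instance through each local
    # clustering to a final group, then take the majority vote.
    labels = []
    for seg, models in zip(segments, local_models):
        seg_labels = []
        for p in seg:
            votes = Counter(
                nearest(cents[nearest(p, cents)], meta) for cents in models
            )
            seg_labels.append(votes.most_common(1)[0][0])
        labels.append(seg_labels)
    return labels
```

Under these assumptions, calling `bilevel_ensemble(segments, local_ks=(2, 3), final_k=2)` on a list of per-segment point lists returns final cluster labels for each segment, while only centroids cross segment boundaries. Whether centroid sharing alone meets a given privacy requirement is a separate question that this sketch does not address.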

Publication DOI: https://doi.org/10.1016/j.ins.2024.121336
Divisions: College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies > Software Engineering & Cybersecurity
College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies
College of Engineering & Physical Sciences
Aston University (General)
Funding Information: This research work is based on the collaboration between Aberystwyth, Aston and Northumbria Universities. It is also partly supported by UK FCDO: Research and Innovation for Development in ASEAN (RIDA) programme 2023-24 (Grant no. RSA-03160), with the pro
Additional Information: © 2024 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Uncontrolled Keywords: Social media, Big data, Tourism, Ensemble clustering, Data fusion, Privacy preserving
Publication ISSN: 1872-6291
Data Access Statement: Data will be made available on request.
Last Modified: 21 Oct 2024 08:11
Date Deposited: 23 Aug 2024 10:59
Related URLs: https://www.sci ... 2507?via%3Dihub (Publisher URL)
http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Article
Published Date: 2025-01-01
Published Online Date: 2024-08-14
Accepted Date: 2024-08-09
Authors: Iam-On, Natthakan
Boongoen, Tossapon
Naik, Nitin (ORCID Profile 0000-0002-0659-9646)
Yang, Longzhi

Version: Published Version

License: Creative Commons Attribution
