Unsupervised event exploration from social text streams


Social media provides unprecedented opportunities for people to disseminate information and share their opinions and views online. Extracting events from social media platforms such as Twitter could help in understanding what is being discussed. However, event extraction from social text streams poses huge challenges due to the noisy nature of social media posts and the dynamic evolution of language. We propose a generic unsupervised framework for exploring events on Twitter which consists of four major steps: filtering, pre-processing, extraction and categorization, and post-processing. Tweets published in a certain time period are aggregated, and noisy tweets which do not contain newsworthy events are removed in the filtering step. The remaining tweets are pre-processed by temporal resolution, part-of-speech tagging and named entity recognition in order to identify the key elements of events. An unsupervised Bayesian model is proposed to automatically extract structured representations of events in the form of quadruples <entity, keyword, date, location> and further categorize the extracted events into event types. Finally, the categorized events are assigned event type labels without human intervention. The proposed framework has been evaluated on over 60 million tweets collected over one month in December 2010. A precision of 78.01% is achieved for event extraction using our proposed Bayesian model, outperforming a competitive baseline by nearly 13.6%. Moreover, events are also clustered into coherent groups with automatically assigned event type labels, with an accuracy of 42.57%.
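As a reading aid, the four steps named in the abstract can be sketched as a small data-flow skeleton. This is only an illustration under stated assumptions: the names `Event`, `run_pipeline` and the `toy_*` helpers are hypothetical, and the toy extraction step is a trivial positional rule standing in for the paper's unsupervised Bayesian model, which is not reproduced here.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    """Quadruple <entity, keyword, date, location> plus an optional type label."""
    entity: str
    keyword: str
    date: str
    location: str
    event_type: Optional[str] = None  # set during post-processing


def run_pipeline(
    tweets: Iterable[str],
    keep: Callable[[str], bool],
    annotate: Callable[[str], List[str]],
    extract: Callable[[List[List[str]]], List[Event]],
    label: Callable[[Event], Event],
) -> List[Event]:
    """Chain the four steps; each step is supplied by the caller."""
    kept = [t for t in tweets if keep(t)]        # 1. filtering
    annotated = [annotate(t) for t in kept]      # 2. pre-processing
    events = extract(annotated)                  # 3. extraction and categorization
    return [label(e) for e in events]            # 4. post-processing


# Hypothetical toy stand-ins, NOT the methods used in the paper.
def toy_keep(tweet: str) -> bool:
    """Drop tweets carrying an (assumed) advertising marker."""
    return "#ad" not in tweet


def toy_annotate(tweet: str) -> List[str]:
    """Whitespace tokenisation in place of POS tagging, NER and temporal resolution."""
    return tweet.split()


def toy_extract(docs: List[List[str]]) -> List[Event]:
    """Positional rule in place of the Bayesian model: first four tokens form the quadruple."""
    return [Event(*d[:4]) for d in docs if len(d) >= 4]


def toy_label(event: Event) -> Event:
    """Attach a placeholder event type label."""
    return replace(event, event_type="unlabelled")


if __name__ == "__main__":
    sample = ["Acme launches 2010-12-03 London", "buy now #ad"]
    for ev in run_pipeline(sample, toy_keep, toy_annotate, toy_extract, toy_label):
        print(ev)
```

Passing each step in as a callable keeps the skeleton independent of any particular tagger or model, so a real filter, annotator or extractor could be substituted without changing `run_pipeline`.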

Publication DOI: https://doi.org/10.3233/IDA-160048
Divisions: College of Engineering & Physical Sciences > Systems analytics research institute (SARI)
Additional Information: Copyright: 2017 – IOS Press and the authors. The final publication is available at IOS Press through http://dx.doi.org/10.3233/IDA-160048 Funding: This work was funded by the National Natural Science Foundation of China (61528302), the Natural Science Foundation of Jiangsu Province of China (BK20161430), the Innovate UK under the grant number 101779 and the Collaborative Innovation Center of Wireless Communications Technology.
Uncontrolled Keywords: Bayesian model, event extraction, social media, unsupervised learning, Theoretical Computer Science, Computer Vision and Pattern Recognition, Artificial Intelligence
Publication ISSN: 1571-4128
Last Modified: 15 Jul 2024 07:29
Date Deposited: 16 Oct 2017 10:00
Related URLs: http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Article
Published Date: 2017-08-19
Accepted Date: 2017-08-19
Authors: Zhou, Deyu
Chen, Liangyu
Zhang, Xuan
He, Yulan (ORCID Profile 0000-0003-3948-5845)



Version: Accepted Version
