Understanding U.S. regional linguistic variation with Twitter data analysis

Abstract

We analyze a large data set of geo-tagged tweets collected over one year (Oct. 2013–Oct. 2014) to understand regional linguistic variation in the U.S. Prior work on regional linguistic variation usually required a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation at fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for Twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using principal component analysis (PCA). Finally, a regionalization method is used to discover hierarchical dialect regions from the PCA components. The regionalization results reveal regional linguistic variation in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
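
As a rough illustration of the first steps the abstract describes (per-user alternation frequencies, county aggregation, and smoothing), here is a minimal Python sketch. It is not the authors' implementation. The tuple layout, the function names, the toy counties, and the unweighted neighbor-mean smoother are all assumptions made for illustration; the abstract does not specify the smoothing method, and the PCA and regionalization steps are not covered here.

```python
from collections import defaultdict


def user_proportion(count_a, count_b):
    """Share of variant A among a user's uses of one lexical alternation.

    Returns None when the user never used either variant.
    """
    total = count_a + count_b
    return count_a / total if total else None


def county_means(users):
    """Average per-user proportions within each county.

    `users` is a list of (county_id, count_a, count_b) tuples
    (a hypothetical layout, chosen for this sketch).
    """
    sums = defaultdict(float)
    ns = defaultdict(int)
    for county, a, b in users:
        p = user_proportion(a, b)
        if p is None:
            continue  # user contributes no evidence for this alternation
        sums[county] += p
        ns[county] += 1
    return {c: sums[c] / ns[c] for c in sums}


def smooth(values, neighbors):
    """Replace each county value with the unweighted mean of itself and
    those neighbors that have data (an assumed stand-in smoother)."""
    out = {}
    for c, v in values.items():
        pool = [v] + [values[n] for n in neighbors.get(c, []) if n in values]
        out[c] = sum(pool) / len(pool)
    return out


# Toy data: three hypothetical counties, one alternation (variant A vs. B).
users = [("A", 3, 1), ("A", 1, 1), ("B", 0, 4), ("C", 0, 0)]
neighbors = {"A": ["B"], "B": ["A", "C"]}

raw = county_means(users)
smoothed = smooth(raw, neighbors)
```

Repeating this for every alternation would give one smoothed variable per alternation for each county, which is the kind of county-by-variable table that a PCA step could then take as input.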

Publication DOI: https://doi.org/10.1016/j.compenvurbsys.2015.12.003
Divisions:
College of Business and Social Sciences > Aston Institute for Forensic Linguistics
College of Business and Social Sciences > School of Social Sciences & Humanities > Centre for Language Research at Aston (CLaRA)
Additional Information: © 2015, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/
Uncontrolled Keywords: American dialects, linguistic, regionalization, social media, spatial data mining, Twitter, US regions, Ecological Modelling, Environmental Science (all), Geography, Planning and Development, Urban Studies
Related URLs: http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Article
Published Date: 2016-09
Published Online Date: 2015-12-31
Accepted Date: 2015-12-15
Submitted Date: 2015-09-28
Authors: Huang, Yuan
Guo, Diansheng
Kasakoff, Alice
Grieve, Jack (ORCID Profile 0000-0003-3630-7349)
