Mohamed, Muhidin and Oussalah, Mourad (2020). A hybrid approach for paraphrase identification based on knowledge-enriched semantic heuristics. Language Resources and Evaluation, 54, 457–485.
Abstract
In this paper, we propose a hybrid approach for sentence paraphrase identification. The proposal addresses the problem of evaluating sentence-to-sentence semantic similarity when the sentences contain named entities. The essence of the proposal is to compute the semantic similarity of named-entity tokens separately from that of the rest of the sentence text. Specifically, the approach integrates word semantic similarity derived from WordNet taxonomic relations with named-entity semantic relatedness inferred from Wikipedia entity co-occurrences and underpinned by Normalized Google Distance. In addition, the WordNet similarity measure is enriched with word part-of-speech (PoS) conversion aided by the Categorial Variation database (CatVar), which enhances the lexico-semantics of words. We validated our hybrid approach using two datasets: the Microsoft Research Paraphrase Corpus (MSRPC) and TREC-9 Question Variants. In our empirical evaluation, we showed that our system outperforms the baselines and most of the related state-of-the-art systems for paraphrase detection. We also conducted a misidentification analysis to identify the primary sources of our system's errors.
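The abstract names Normalized Google Distance (NGD) as the basis for the named-entity relatedness. As a minimal sketch of that distance alone, and not the authors' implementation, the function below computes the standard NGD from occurrence counts. The function name, the parameter names, and the handling of zero counts are assumptions made for illustration; treating the counts as numbers of Wikipedia articles that mention each entity is likewise an assumption.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD computed from occurrence counts (illustrative sketch).

    fx, fy : number of documents mentioning entity x / entity y
    fxy    : number of documents mentioning both entities
    n      : total number of documents in the collection
    """
    # Assumption: entities that never occur, or never co-occur,
    # are treated as infinitely distant.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)

    # Assumption: if the rarer entity appears in every document,
    # the distance is defined as zero to avoid dividing by zero.
    if denominator == 0:
        return 0.0

    return (max(log_fx, log_fy) - log_fxy) / denominator


# Hypothetical counts, chosen only to exercise the function.
distance = normalized_google_distance(fx=1000, fy=100, fxy=50, n=1_000_000)
print(round(distance, 4))
```

A smaller value indicates that the two entities tend to be mentioned together; how such a distance is turned into a relatedness score and combined with the WordNet measure is described in the paper itself and is not reproduced here.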
Publication DOI: | https://doi.org/10.1007/s10579-019-09466-4 |
---|---|
Divisions: | Aston University (General) |
Additional Information: | © The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
Uncontrolled Keywords: | Named-entity semantic relatedness, Paraphrase identification, Wikipedia, Word category subsumption, WordNet, Language and Linguistics, Education, Linguistics and Language, Library and Information Sciences |
Publication ISSN: | 1572-8412 |
Last Modified: | 16 Dec 2024 08:25 |
Date Deposited: | 07 May 2019 09:47 |
Related URLs: | http://www.scop ... tnerID=8YFLogxK (Scopus URL) |
PURE Output Type: | Article |
Published Date: | 2020-06 |
Published Online Date: | 2019-04-16 |
Accepted Date: | 2019-04-01 |
Authors: | Mohamed, Muhidin; Oussalah, Mourad |