Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes

Abstract

This paper describes an approach to named entity classification based on Wikipedia article infoboxes. It identifies the three fundamental named entity types: Person, Location, and Organization. Classification is accomplished by matching entity attributes extracted from the relevant article's infobox against core entity attributes built from Wikipedia Infobox Templates. Experimental results show that the classifier achieves accuracy and F-measure scores of 97%. Based on this approach, a database of around 1.6 million named entities of these three types is created from the 20140203 Wikipedia dump. Experiments on the CoNLL-2003 shared task named entity recognition (NER) dataset demonstrated the system’s outstanding performance in comparison with three state-of-the-art systems.
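
The attribute-matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute names in CORE_ATTRIBUTES, the overlap-count scoring rule, and the min_overlap threshold are all assumptions made for illustration, and the paper builds its own core attribute sets from Wikipedia Infobox Templates.

```python
# Hypothetical core attribute sets, chosen for illustration only.
# The paper derives its own sets from Wikipedia Infobox Templates.
CORE_ATTRIBUTES = {
    "Person": {"birth_date", "birth_place", "occupation", "spouse", "nationality"},
    "Location": {"coordinates", "population_total", "area_total_km2", "timezone"},
    "Organization": {"founded", "headquarters", "industry", "num_employees"},
}


def classify_entity(infobox_attributes, min_overlap=1):
    """Return the entity type whose core attributes overlap most with the
    given infobox attribute names, or None if no type reaches min_overlap.

    Scoring by raw overlap count is an assumption of this sketch.
    Ties are resolved in favour of the type listed first.
    """
    attrs = {name.strip().lower() for name in infobox_attributes}
    best_type, best_score = None, 0
    for entity_type, core in CORE_ATTRIBUTES.items():
        score = len(attrs & core)  # number of shared attribute names
        if score > best_score:
            best_type, best_score = entity_type, score
    return best_type if best_score >= min_overlap else None


if __name__ == "__main__":
    print(classify_entity(["birth_date", "occupation", "website"]))
    print(classify_entity(["Coordinates", "population_total"]))
    print(classify_entity(["website"]))
```

In this sketch an article whose infobox shares no attribute with any core set is left unclassified rather than forced into one of the three types.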

Publication DOI: https://doi.org/10.14569/IJACSA.2014.050725
Divisions: College of Engineering & Physical Sciences
Additional Information: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.
Publication ISSN: 2156-5570
Last Modified: 29 Oct 2024 14:38
Date Deposited: 23 Aug 2019 09:43
Related URLs: https://thesai. ... csa&SerialNo=25 (Publisher URL)
PURE Output Type: Article
Published Date: 2014-07-01
Authors: Mohamed, Muhidin; Oussalah, Mourad

Version: Published Version

License: Creative Commons Attribution
