Enhancing Word Complexity Prediction Through Contextual Analysis

Uzzam, Muhammad and Htait, Amal (2024). Enhancing Word Complexity Prediction Through Contextual Analysis. IN: The Second UK AI Conference 2024. University of Birmingham, 2024-11-22 - 2024-11-22. (Unpublished)

Abstract

This paper presents a solution for predicting word complexity using contextual sentence information, a problem that traditional methods often struggle to address. It also introduces a user-friendly interface that dynamically assesses word complexity and provides explanations by considering both individual word features and their surrounding context. Three distinct approaches were explored in this work. The first approach applied a Bidirectional Long Short-Term Memory (Bi-LSTM) model, trained on linguistic and semantic features extracted from the text. The second approach used Bidirectional Encoder Representations from Transformers (BERT) with two separate models: one for sentence-level complexity and another for word-level complexity, with the predictions combined for a more context-sensitive result. The third approach introduces a novel method that combines XLNet word embeddings with a Random Forest classifier to process both sentence and word embeddings for predicting complexity levels. A diverse dataset covering religious, biomedical, and parliamentary texts was used in this work, as it is pre-categorised into five complexity levels (Very-easy, Easy, Medium, Hard, Very-hard). To ensure balanced class representation, data augmentation techniques were applied. Evaluation metrics revealed that the XLNet-based model (third approach) outperformed the others, achieving 80% accuracy (Macro-Average F1-measure = 0.78) and particularly excelling at identifying highly complex words (F1-measure = 0.95). The BERT-based model closely followed, with an accuracy of 78% (Macro-Average F1-measure = 0.75), and the Bi-LSTM model achieved an accuracy of 63% (Macro-Average F1-measure = 0.63). The best-performing model (XLNet-based) was then selected as the engine behind a user-friendly interface created with Gradio, which can detect complex words in an input sentence and provide explanations.
This work highlights the importance of utilising both word-level and sentence-level embeddings for effective complexity prediction. The developed models, along with the user-friendly interface, have significant potential applications in education by helping language learners navigate challenging vocabulary.
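
The third approach described in the abstract (a Random Forest over combined word and sentence embeddings, scored with accuracy and macro-averaged F1) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and NumPy, replaces the XLNet embeddings with synthetic random vectors so that it runs without downloading a model, and uses a hypothetical helper `build_features` along with assumed sizes and hyperparameters. Combining the two embeddings by concatenation is also an assumption, since the abstract does not say how they are combined.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# The five complexity levels named in the abstract.
LEVELS = ["Very-easy", "Easy", "Medium", "Hard", "Very-hard"]


def build_features(word_vecs, sent_vecs):
    """Hypothetical helper: join each word vector with its sentence vector."""
    return np.concatenate([word_vecs, sent_vecs], axis=1)


rng = np.random.default_rng(0)
n_samples, dim = 500, 32  # assumed sizes; real XLNet vectors are much larger
labels = rng.integers(0, len(LEVELS), size=n_samples)

# Synthetic stand-ins for XLNet embeddings: noise plus a label-dependent
# shift, so the classifier has some signal to learn from.
word_vecs = rng.normal(size=(n_samples, dim)) + 0.5 * labels[:, None]
sent_vecs = rng.normal(size=(n_samples, dim)) + 0.25 * labels[:, None]

X = build_features(word_vecs, sent_vecs)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0
)

# Hyperparameters here are illustrative assumptions, not reported values.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Macro-averaged F1 weights every complexity level equally,
# regardless of how many examples each level has.
accuracy = accuracy_score(y_test, pred)
macro_f1 = f1_score(y_test, pred, average="macro")
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
```

The scores printed by this sketch reflect only the synthetic data and say nothing about the results reported above. In a real pipeline, the random vectors would be replaced by embeddings extracted from an XLNet model for each target word and its sentence.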

Divisions:	College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies > Software Engineering & Cybersecurity; College of Engineering & Physical Sciences > Aston Centre for Artificial Intelligence Research and Application; College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies; College of Business and Social Sciences > Aston Institute for Forensic Linguistics
Event Title:	The Second UK AI Conference 2024
Event Type:	Other
Event Location:	University of Birmingham
Event Dates:	2024-11-22 - 2024-11-22
Last Modified:	09 Apr 2026 07:05
Date Deposited:	03 Dec 2024 11:53
PURE Output Type:	Poster
Published Date:	2024-11-22
Authors:	Uzzam, Muhammad; Htait, Amal (0000-0003-4647-9996)

Version: Accepted Version
