Bari, Nimra, Saleem, Tahir, Shah, Munam, Algarni, Abdulmohsen, Patel, Asma and Ullah, Insaf (2025). A Filter-Based Feature Selection Framework to Detect Phishing URLs Using Stacking Ensemble Machine Learning. Computer Modeling in Engineering and Sciences, 145 (1), pp. 1167-1187.
Abstract
Phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. Several anti-phishing solutions exist, such as heuristic detection, visual similarity detection, blacklists and whitelists, and machine learning (ML). However, phishing attempts remain a problem, and establishing an effective anti-phishing strategy is a work in progress. Furthermore, although most anti-phishing solutions achieve very high accuracy on a given dataset, they suffer from an increased number of false positives, and they are ineffective against zero-hour attacks. A high False Positive Rate (FPR) is costly because phishing sites misclassified as genuine can cause people who visit them to lose a lot of money. Feature selection is critical when developing phishing detection strategies. Good feature selection helps improve accuracy, whereas duplicate features add noise to the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks, including constant feature removal, duplicate feature removal, quasi-constant feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different ML classifiers (Random Forest, Artificial Neural Network (ANN), AdaBoost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting Classifiers, and Support Vector Machine (SVM)) and with two types of ensemble models, stacking and majority voting, to achieve a low false positive rate. The stacked ensemble classifier (gradient boosting, random forest, support vector machine) achieves 1.31% FPR and 98.17% accuracy on Dataset 1, 3.47% FPR and 96.47% accuracy on Dataset 2, and 2.81% FPR and 97.61% accuracy on Dataset 3.
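The filter steps named in the abstract, plus the FPR and accuracy metrics it reports, can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The abstract gives neither thresholds, nor step order, nor how mutual information and ANOVA are combined. The values `quasi=0.99`, `max_corr=0.9`, the top-`k` cut-off, ranking by a single `scorer`, the dict-of-columns data layout and all function names are therefore hypothetical. The plug-in mutual-information estimate assumes discrete-valued features. Which class counts as "positive" for FPR is also an assumption (`positive=1` here).

```python
from collections import Counter
from math import log, sqrt

# Hypothetical sketch of a filter-based feature selection pipeline.
# Data layout (assumption): dict of feature name -> list of numeric values,
# plus a parallel list of class labels y.


def quasi_constant(col, threshold):
    """True if a single value covers at least `threshold` of the rows.
    With threshold=1.0 this flags only fully constant features."""
    top_count = Counter(col).most_common(1)[0][1]
    return top_count / len(col) >= threshold


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either column has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return 0.0 if va == 0 or vb == 0 else cov / sqrt(va * vb)


def mutual_info(col, y):
    """Plug-in mutual information (in nats) between a discrete feature and y."""
    n = len(y)
    px, py = Counter(col), Counter(y)
    joint = Counter(zip(col, y))
    return sum(c / n * log(c * n / (px[v] * py[k])) for (v, k), c in joint.items())


def anova_f(col, y):
    """One-way ANOVA F-statistic of the feature values grouped by class."""
    groups = {}
    for v, k in zip(col, y):
        groups.setdefault(k, []).append(v)
    n, g = len(col), len(groups)
    grand = sum(col) / n
    means = {k: sum(vs) / len(vs) for k, vs in groups.items()}
    ssb = sum(len(groups[k]) * (m - grand) ** 2 for k, m in means.items())
    ssw = sum((v - means[k]) ** 2 for v, k in zip(col, y))
    if ssw == 0:  # feature is constant within every class
        return float("inf") if ssb > 0 else 0.0
    return (ssb / (g - 1)) / (ssw / (n - g))


def filter_features(cols, y, quasi=0.99, max_corr=0.9, k=5, scorer=mutual_info):
    """Drop (quasi-)constant, duplicate and highly correlated features, then
    keep the k best remaining features ranked by `scorer`. Thresholds are
    illustrative assumptions, not values from the paper."""
    kept = {}
    for name, col in cols.items():
        if quasi_constant(col, quasi):
            continue
        if any(col == other for other in kept.values()):
            continue
        if any(abs(pearson(col, other)) > max_corr for other in kept.values()):
            continue
        kept[name] = col
    ranked = sorted(kept, key=lambda name: scorer(kept[name], y), reverse=True)
    return ranked[:k]


def fpr_and_accuracy(y_true, y_pred, positive=1):
    """FPR = FP / (FP + TN) and accuracy = (TP + TN) / N."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return fp / (fp + tn), (tp + tn) / len(pairs)


# Toy data with hypothetical feature names; label 1 is the positive class here.
y = [1, 1, 1, 1, 0, 0, 0, 0]
cols = {
    "const": [0] * 8,                                        # constant: dropped
    "a": [1, 1, 1, 1, -1, -1, -1, -1],                       # separates the classes
    "a_copy": [1, 1, 1, 1, -1, -1, -1, -1],                  # duplicate of "a": dropped
    "a_near": [0.9, 1.0, 1.1, 1.0, -1.0, -0.9, -1.1, -1.0],  # |r| > 0.9 with "a": dropped
    "b": [1, 1, 1, -1, -1, -1, -1, 1],                       # weakly informative
    "c": [1, -1, 1, -1, 1, -1, 1, -1],                       # independent of y
    "rare": [0, 0, 0, 0, 0, 0, 0, 1],                        # 7/8 one value: dropped at quasi=0.8
}
print(filter_features(cols, y, quasi=0.8, k=2))  # ['a', 'b']
print(fpr_and_accuracy(y, [1, 1, 1, 0, 0, 0, 0, 1]))  # (0.25, 0.75)
```

Passing `scorer=anova_f` ranks the surviving features by the ANOVA F-statistic instead of mutual information. A practical version would more likely use scikit-learn's `mutual_info_classif` or `f_classif` with `SelectKBest`. The stacking step (gradient boosting, random forest and SVM base learners) could similarly be built with `StackingClassifier`. The meta-learner the authors chose is not stated in the abstract.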
| Publication DOI: | https://doi.org/10.32604/cmes.2025.070311 |
|---|---|
| Divisions: | College of Business and Social Sciences<br/>College of Business and Social Sciences > Aston Business School > Cyber Security Innovation (CSI) Research Centre<br/>College of Business and Social Sciences > Aston Business School<br/>Aston University (General) |
| Funding Information: | This research was financially supported by the Deanship of Scientific Research and Graduate Studies at King Khalid University under research grant number (R.G.P.2/21/46) and in part by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia, under Grant KFU253116. |
| Additional Information: | Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
| Publication ISSN: | 1526-1506 |
| Data Access Statement: | The datasets used in this study are publicly available from the following sources:<br/><br/>• Dataset 1: Sourced from Mohammad et al. (2012) and available at the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Phishing+Websites (accessed on 21 August 2025).<br/><br/>• Dataset 2: Sourced from Buber (2019). The data was collected from PhishTank and OpenPhish, which are publicly accessible platforms: https://www.phishtank.com/ (accessed on 21 August 2025) and https://openphish.com/ (accessed on 21 August 2025).<br/><br/>• Dataset 3: Sourced from Hannousse (2021). This dataset is a benchmark for machine learning-based phishing detection and is available for research purposes. DOI: 10.1016/j.engappai.2021.104347 (accessed on 21 August 2025). |
| Last Modified: | 05 Mar 2026 18:54 |
| Date Deposited: | 10 Feb 2026 10:42 |
| Full Text Link: | |
| Related URLs: | https://www.tec ... ES/v145n1/64339 (Publisher URL) |
| PURE Output Type: | Article |
| Published Date: | 2025-10-30 |
| Accepted Date: | 2025-08-22 |
| Authors: | Bari, Nimra<br/>Saleem, Tahir<br/>Shah, Munam<br/>Algarni, Abdulmohsen<br/>Patel, Asma (ORCID 0000-0003-1636-5955)<br/>Ullah, Insaf |