Hybrid Phishing Detection Model: Integrating BERT with TF-IDF for Enhanced Email Security
Abstract
Phishing emails remain a major cybersecurity problem because they exploit users' trust by impersonating legitimate messages. Standard NLP methods such as TF-IDF and FastText are efficient, but they often miss the subtle contextual cues found in today's sophisticated phishing attempts. Deep learning models such as BERT capture context well, but they demand substantial computational resources. In this paper, we propose a hybrid approach that combines the lightweight statistical strengths of TF-IDF with the contextual representations of BERT embeddings to build a more robust phishing detection system. We ran experiments on datasets of 1,000, 5,000, and 10,000 emails and compared five models. The hybrid models consistently outperformed the single-method ones. TF-IDF + BERT was the most accurate on the smallest dataset (1,000 samples), whereas on the larger datasets (5,000 and 10,000 samples) TF-IDF + FastText offered the best balance of accuracy and speed. Although the BERT hybrid was slightly more accurate, its slower processing time is an obstacle to scaling up. We believe the proposed framework offers a practical and effective tool for real-world cybersecurity teams.
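As an illustration of the hybrid idea described in the abstract, the following minimal sketch builds one feature vector per email by concatenating an L2-normalized TF-IDF vector with a dense embedding. The abstract does not state how the two representations are merged, so concatenation is an assumption here, not the authors' confirmed method. The function `placeholder_embedding` is a hypothetical stand-in for a BERT sentence embedding (it hashes tokens into a small fixed-size vector) so that the example runs without downloading a model; all function names and the sample emails are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would clean the text further.
    return text.lower().split()


def l2_normalize(vec):
    # Scale a vector to unit length; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def tfidf_vectors(docs):
    # Build the vocabulary and one L2-normalized TF-IDF vector per document.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for term, count in Counter(toks).items():
            vec[index[term]] = count * idf[term]
        vectors.append(l2_normalize(vec))
    return vocab, vectors


def placeholder_embedding(text, dim=8):
    # ASSUMPTION: hypothetical stand-in for a BERT embedding. It only hashes
    # tokens into `dim` buckets and carries no contextual information.
    vec = [0.0] * dim
    for tok in tokenize(text):
        vec[sum(map(ord, tok)) % dim] += 1.0
    return l2_normalize(vec)


def hybrid_features(docs, embed=placeholder_embedding):
    # ASSUMPTION: fuse the two views by simple concatenation.
    vocab, sparse = tfidf_vectors(docs)
    return vocab, [s + embed(d) for s, d in zip(sparse, docs)]


emails = [
    "verify your account now",
    "meeting notes attached",
    "your account is suspended verify now",
]
vocab, features = hybrid_features(emails)
print(len(vocab), len(features[0]))
```

In a real system, `embed` would be replaced by a function that returns a BERT embedding for the email text, and the concatenated vectors would be passed to a classifier. Normalizing each part separately keeps either representation from dominating the other purely because of scale.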
Manuscript received: 3 Jul 2025 | Revised: 25 Aug 2025 | Accepted: 7 Sep 2025 | Published: 30 Nov 2025
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.