HybridEval: An Improved Novel Hybrid Metric for Evaluation of Text Summarization

Raheem Sarwar
Bilal Ahmad
Pin Shen Teh
Suppawong Tuarob
Tipajin Thaipisutikul
Farooq Zaman
Naif R. Aljohani
Jia Zhu
Saeed-Ul Hassan
Raheel Nawaz
Ali R Ansari
Muhammad A B Fayyaz

Abstract

The present work re-evaluates how text summarization systems are assessed. Two state-of-the-art assessment measures, Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Bilingual Evaluation Understudy (BLEU), are discussed along with their limitations before a novel evaluation metric is presented. The scores these measures produce differ significantly with the length and vocabulary of the sentences, which suggests that their primary limitation is an inability to preserve the semantics and meaning of the sentences or to distribute weight consistently over the whole sentence. To address this, the present work organizes phrases into six distinct groups and proposes a new hybrid approach, HybridEval, for evaluating text summarization. Our approach combines a weighted sum of cosine scores obtained from InferSent's SentEval algorithms with the original scores, achieving high accuracy. HybridEval outperforms existing state-of-the-art models by 10-15% in evaluation scores.
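
As a rough illustration of the scoring idea described above, the sketch below blends a semantic cosine score with an original metric score such as ROUGE or BLEU. It assumes sentence embeddings (for example, from InferSent) have already been computed; the helper names, the alpha weighting, and the 4096-dimensional random vectors are illustrative assumptions, not the paper's exact formulation or its published phrase-grouping scheme.

import numpy as np

def cosine_similarity(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def hybrid_eval_score(candidate_emb, reference_emb, base_metric_score, alpha=0.5):
    # Weighted sum of a semantic cosine score (e.g., from InferSent embeddings)
    # and an original metric score (e.g., ROUGE or BLEU).
    semantic = cosine_similarity(candidate_emb, reference_emb)
    return alpha * semantic + (1.0 - alpha) * base_metric_score

# Usage example with random vectors standing in for InferSent embeddings.
rng = np.random.default_rng(0)
candidate, reference = rng.normal(size=4096), rng.normal(size=4096)
print(hybrid_eval_score(candidate, reference, base_metric_score=0.42))

In practice, the weight alpha could be tuned separately for each of the six phrase groups; the exact weighting used by HybridEval is given in the full paper.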

Article Details

How to Cite
Sarwar, R., Ahmad, B., Teh, P. S., Tuarob, S., Thaipisutikul, T., Zaman, F., Aljohani, N. R., Zhu, J., Hassan, S.-U., Nawaz, R., Ansari, A. R., & Fayyaz, M. A. B. (2024). HybridEval: An Improved Novel Hybrid Metric for Evaluation of Text Summarization. Journal of Informatics and Web Engineering, 3(3), 233–255. https://doi.org/10.33093/jiwe.2024.3.3.15
Section
Thematic (Pervasive Computing)

