Intelligent Abstractive Summarization of Scholarly Publications with Transfer Learning
Abstract
Intelligent abstractive text summarization of scholarly publications refers to the automatic generation of summaries that capture the essential ideas of an article while preserving semantic coherence and grammatical accuracy. As the volume of published information continues to grow at an overwhelming rate, text summarization has emerged as a critical area of research. In the past, summarization of scientific publications relied predominantly on extractive methods, which select key sentences or phrases directly from the original document to create a summary or generate a suitable title. Although extractive methods preserve the original wording, they often fail to produce a coherent, concise, and fluent summary, especially for complex or lengthy texts. In contrast, abstractive summarization represents a more sophisticated approach. Rather than extracting content from the source, abstractive models generate summaries using new language, often incorporating words and phrases not found in the original text. This allows for more natural, human-like summaries that better capture the key ideas in a fluid and cohesive manner. This study introduces two models for generating titles from the abstracts of scientific articles. The first employs a Gated Recurrent Unit (GRU) encoder coupled with a greedy-search decoder, while the second uses a Transformer model, known for its capacity to handle long-range dependencies in text. The findings demonstrate that both models outperform the baseline Long Short-Term Memory (LSTM) model in efficiency and fluency. Specifically, the GRU model achieved a ROUGE-1 score of 0.2336 and the Transformer model 0.2881, both substantially higher than the 0.1033 achieved by the baseline LSTM model. These results underscore the potential of abstractive models to enhance the quality and accuracy of summarization in academic and scholarly contexts, offering more intuitive and meaningful summaries.
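To make the first model concrete, the sketch below illustrates the general shape of a GRU encoder over abstract tokens with a greedy-search decoder that emits a title one token at a time, always keeping the single most probable next word. It is a minimal illustration, not the authors' implementation; the vocabulary size, embedding and hidden dimensions, and the <sos>/<eos> token indices are illustrative assumptions.

import torch
import torch.nn as nn

class GRUTitleGenerator(nn.Module):
    """Toy GRU encoder plus greedy-search decoder for title generation."""
    def __init__(self, vocab_size=20000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder_cell = nn.GRUCell(emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def encode(self, abstract_ids):
        # abstract_ids: (batch, src_len) integer token ids of the abstract
        _, h = self.encoder(self.embed(abstract_ids))  # h: (1, batch, hidden_dim)
        return h.squeeze(0)                            # (batch, hidden_dim)

    @torch.no_grad()
    def greedy_decode(self, abstract_ids, sos_id=1, eos_id=2, max_len=20):
        # Greedy search: at each step keep only the single most probable token.
        h = self.encode(abstract_ids)
        token = torch.full((abstract_ids.size(0),), sos_id, dtype=torch.long)
        generated = []
        for _ in range(max_len):
            h = self.decoder_cell(self.embed(token), h)
            token = self.out(h).argmax(dim=-1)   # most likely next word id
            generated.append(token)
            if (token == eos_id).all():          # stop once every title has ended
                break
        return torch.stack(generated, dim=1)     # (batch, generated_len)

# Illustrative usage with a dummy batch of two "abstracts" of 50 token ids each.
model = GRUTitleGenerator()
dummy_abstracts = torch.randint(3, 20000, (2, 50))
print(model.greedy_decode(dummy_abstracts).shape)

The generated titles would then be scored against the reference titles with ROUGE-1, i.e. unigram overlap between the generated and reference title, which is the metric reported above (for example with a library such as rouge-score).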
Article Details
This work, like all articles published in JIWE, is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License. Readers are allowed to
- Share — copy and redistribute the material in any medium or format under the following conditions:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use;
- NonCommercial — You may not use the material for commercial purposes;
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.