Enhancing Conversions and Lead Scoring in Online Professional Education

Wen Yang Yim; Khai Wah Khaw; Shiuh Tong Lim; XinYing Chew

doi:10.33093/ijomfa.2024.5.1.2


Published: Feb 29, 2024


Keywords:

Machine Learning, Lead Conversion, Lead Scoring

Wen Yang Yim

Universiti Sains Malaysia, Malaysia

Khai Wah Khaw

Universiti Sains Malaysia, Malaysia

https://orcid.org/0000-0003-2646-6477

Shiuh Tong Lim

Universiti Sains Malaysia, Malaysia

XinYing Chew

Universiti Sains Malaysia, Malaysia

https://orcid.org/0000-0001-5539-1959

Abstract

This study seeks to enhance lead conversion for online professional education providers by applying supervised machine learning algorithms to lead conversion targeting and lead scoring. The algorithms compared are Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Naïve Bayes, Random Forest, Bagging, Boosting, and Stacking. A lead dataset was used to train and test the machine learning models, and Recursive Feature Elimination (RFE) was used to establish a precise lead profile. The performance of the trained lead conversion models was evaluated and compared using 10-fold cross-validation based on accuracy, precision, recall, and F1-score. The results show that Stacking is the best model, with an accuracy of 0.9233, precision of 0.9391, and F1-score of 0.8939. Meanwhile, the Logistic Regression-based lead scoring model demonstrated promising potential for automating lead scoring, achieving an accuracy of 0.9019, recall of 0.9019, precision of 0.9015, and F1-score of 0.9014. The optimal lead scoring threshold is 0.20, which struck the best balance between accuracy, sensitivity, and specificity.
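The threshold selection described in the abstract can be illustrated with a small sketch. The abstract does not state the exact selection rule, so this example assumes one plausible reading: among a set of candidate cut-offs, pick the one where accuracy, sensitivity, and specificity lie closest together. The function names (`metrics_at`, `pick_threshold`, `lead_score`), the 0 to 100 score scale, and the toy labels and probabilities are all hypothetical and are not taken from the paper or its dataset.

```python
from typing import Dict, Sequence


def metrics_at(y_true: Sequence[int], probs: Sequence[float],
               threshold: float) -> Dict[str, float]:
    """Metrics when leads with probability >= threshold are predicted to convert."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: float, den: float) -> float:
        # Guard against empty classes so the sketch never divides by zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def pick_threshold(y_true: Sequence[int], probs: Sequence[float],
                   candidates: Sequence[float]) -> float:
    """Assumed rule: choose the cut-off with the smallest gap between
    accuracy, sensitivity, and specificity (ties go to the first candidate)."""
    def spread(t: float) -> float:
        m = metrics_at(y_true, probs, t)
        values = (m["accuracy"], m["sensitivity"], m["specificity"])
        return max(values) - min(values)

    return min(candidates, key=spread)


def lead_score(prob: float) -> int:
    """Map a predicted conversion probability to a 0-100 lead score (assumed scale)."""
    return round(100 * prob)


# Made-up toy data: 1 = converted lead, 0 = not converted.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.8, 0.45, 0.3, 0.15, 0.25, 0.1, 0.05, 0.12]
CANDIDATES = [0.1, 0.2, 0.3, 0.5]

best = pick_threshold(y_true, probs, CANDIDATES)
hot_leads = [lead_score(p) for p in probs if p >= best]
```

On this toy data the three metrics coincide at one of the lower candidates rather than at 0.5, which shows why a cut-off well below 0.5 can be the balanced choice when converted leads tend to receive modest predicted probabilities. In practice the probabilities would come from a fitted classifier such as logistic regression, and the candidate grid would be much finer.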

How to Cite

Yim, W. Y., Khaw, K. W., Lim, S. T., & Chew, X. (2024). Enhancing Conversions and Lead Scoring in Online Professional Education. International Journal of Management, Finance and Accounting, 5(1), 15–63. https://doi.org/10.33093/ijomfa.2024.5.1.2

Issue

Vol. 5 No. 1 (2024)

Section

Management, Finance and Accounting

