Enhancing Conversions and Lead Scoring in Online Professional Education

DOI: https://doi.org/10.33093/ijomfa.2024.5.1.2

Wen Yang Yim
Khai Wah Khaw
Shiuh Tong Lim
XinYing Chew

Abstract

This study seeks to improve lead conversion for online professional education providers by applying supervised machine learning algorithms to lead conversion targeting and lead scoring. The algorithms include Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Naïve Bayes, Random Forest, Bagging, Boosting, and Stacking. A lead dataset was used to train and test the models, and Recursive Feature Elimination (RFE) was used to establish a precise lead profile. The trained lead conversion models were evaluated and compared using 10-fold cross-validation on accuracy, precision, recall, and F1-score. Stacking was the best model, with an accuracy of 0.9233, precision of 0.9391, and F1-score of 0.8939. The Logistic Regression-based lead scoring model also showed promise for automating lead scoring, achieving an accuracy of 0.9019, recall of 0.9019, precision of 0.9015, and F1-score of 0.9014. The optimal lead scoring threshold is 0.20, which struck the best balance between accuracy, sensitivity, and specificity.
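The threshold selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the labels and probabilities are made-up toy values, the function names are hypothetical, and the selection rule (choose the cutoff at which accuracy, sensitivity, and specificity lie closest together) is only one plausible reading of "best balance", since the abstract does not state the exact criterion.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], probs: Sequence[float], cutoff: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when leads with probability >= cutoff are flagged as likely converters."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics_at(
    y_true: Sequence[int], probs: Sequence[float], cutoff: float
) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), and specificity at one cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, probs, cutoff)
    return {
        "cutoff": cutoff,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def balanced_cutoff(
    y_true: Sequence[int], probs: Sequence[float], cutoffs: Sequence[float]
) -> Dict[str, float]:
    """Assumed criterion: pick the cutoff with the smallest gap between the three metrics."""

    def spread(m: Dict[str, float]) -> float:
        values = (m["accuracy"], m["sensitivity"], m["specificity"])
        return max(values) - min(values)

    return min((metrics_at(y_true, probs, c) for c in cutoffs), key=spread)


def lead_score(prob: float) -> int:
    """Map a predicted conversion probability to a 0-100 lead score."""
    return round(prob * 100)


# Toy data (hypothetical): 1 = converted lead, 0 = not converted.
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted probabilities, e.g. from a fitted logistic regression.
probs = [0.92, 0.81, 0.64, 0.45, 0.33, 0.27, 0.18, 0.12, 0.07, 0.03]

cutoffs = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
best = balanced_cutoff(y_true, probs, cutoffs)
print(best)
print([lead_score(p) for p in probs])
```

On these toy values the rule selects a cutoff of 0.3, where all three metrics equal 0.8. That figure reflects only the invented data and is unrelated to the 0.20 threshold reported in the study. In practice the probabilities would come from the fitted model on held-out leads, and a finer grid of cutoffs would normally be scanned.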

Section
Management, Finance and Accounting
