Enhancing Conversions and Lead Scoring in Online Professional Education
DOI: https://doi.org/10.33093/ijomfa.2024.5.1.2
Abstract
This study seeks to improve lead conversion for online professional education providers by applying supervised machine learning algorithms to lead conversion targeting and lead scoring: Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Naïve Bayes, Random Forest, Bagging, Boosting, and Stacking. A lead dataset was used to train and test the machine learning models, and Recursive Feature Elimination (RFE) was used to establish a precise lead profile. The trained lead conversion models were evaluated and compared using 10-fold cross-validation on accuracy, precision, recall, and F1-score. Stacking was the best model, with an accuracy of 0.9233, precision of 0.9391, and F1-score of 0.8939. The Logistic Regression-based lead scoring model showed promising potential for automating lead scoring, achieving an accuracy of 0.9019, recall of 0.9019, precision of 0.9015, and F1-score of 0.9014. The optimal lead scoring threshold was 0.20, which struck the best balance between accuracy, sensitivity, and specificity.
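As an illustration of the threshold selection described in the abstract, the following minimal Python sketch sweeps candidate cut-offs over predicted conversion probabilities and picks the one at which accuracy, sensitivity, and specificity lie closest together. The toy labels and probabilities, the function names, and the "smallest spread" criterion are all assumptions made for illustration. They are not taken from the study, and the sketch is not expected to reproduce its 0.20 threshold.

```python
from typing import Sequence, Tuple


def metrics_at(y_true: Sequence[int], probs: Sequence[float],
               threshold: float) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) at one cut-off."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, probs):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return accuracy, sensitivity, specificity


def balanced_threshold(y_true: Sequence[int], probs: Sequence[float],
                       candidates: Sequence[float]) -> float:
    """Pick the candidate where the three metrics are closest together.

    Assumption: "balance" is measured as max(metric) - min(metric).
    """
    def spread(t: float) -> float:
        m = metrics_at(y_true, probs, t)
        return max(m) - min(m)

    return min(candidates, key=spread)


# Hypothetical data: 1 = lead converted, 0 = lead did not convert.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical conversion probabilities, e.g. from a logistic regression.
probs = [0.92, 0.75, 0.35, 0.25, 0.55, 0.32, 0.15, 0.12, 0.05, 0.02]

candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
best = balanced_threshold(y_true, probs, candidates)

for t in candidates:
    acc, sens, spec = metrics_at(y_true, probs, t)
    print(f"t={t:.1f}  acc={acc:.3f}  sens={sens:.3f}  spec={spec:.3f}")
print(f"selected threshold: {best:.1f}")
```

On this toy data the sweep selects 0.3. Leads whose predicted probability is at or above the chosen cut-off would be flagged as likely to convert. Lowering the cut-off raises sensitivity at the cost of specificity, which is the trade-off the abstract refers to.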