Term Standardisation With LDA Model To Detect Service Disruption Events Using English And Manglish Tweets

Noraysha Yusuf
Maizatul Akmar Ismail
Tasnim M.A. Zayet
Kasturi Dewi Varathan
Rafidah MD Noor


Rapid transit is one of Malaysia's most important modes of public transportation, and commuters rely on it for their daily travel. Any disruption to the service affects their daily routines, so detecting such disruptions is essential. In this study, disruptions to Malaysia's rapid transit services were assessed through Latent Dirichlet Allocation (LDA) using English and Manglish (a combination of English and Malay) tweets. The gathered tweets were classified into event and non-event tweets, and LDA was applied to the event tweets. Manglish event tweets were pre-processed using the proposed term standardisation technique. The results show that LDA is efficient at topic detection for both English and Manglish tweets, with better performance on Manglish tweets: the LDA_English model achieved its best event detection rate at a likelihood of 80%, while the LDA_Manglish model achieved its best detection rate at a likelihood of 60%.
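The pipeline summarised above (standardise Manglish terms, run LDA on event tweets, then apply a likelihood cut-off) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `STANDARD_TERMS` dictionary, the sample tweets, and the helper names `standardise`, `lda_gibbs` and `detect_events` are hypothetical; LDA is fitted with a small collapsed Gibbs sampler in pure Python rather than whatever library the study used; picking the disruption topic by the seed word "broken" is an illustrative rule; and "likelihood" is assumed to mean a tweet's probability under the disruption topic, compared against a threshold such as 0.6 or 0.8.

```python
import random
import re

# HYPOTHETICAL Manglish/short-form -> standard English lookup, for illustration
# only; the study's actual term-standardisation dictionary is not given here.
STANDARD_TERMS = {
    "tren": "train", "stesen": "station", "rosak": "broken",
    "lambat": "late", "tunggu": "wait", "lama": "long", "lagi": "again",
    "kat": "at", "tak": "not", "x": "not", "pls": "please", "plz": "please",
}


def standardise(tweet):
    """Lowercase, split into alphanumeric tokens, map each to its standard term."""
    tokens = re.findall(r"[a-z0-9]+", tweet.lower())
    return [STANDARD_TERMS.get(tok, tok) for tok in tokens]


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return (theta, topic-word counts, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * k for _ in docs]             # document-topic counts
    nkw = [[0] * n_vocab for _ in range(k)]   # topic-word counts
    nk = [0] * k                              # tokens per topic
    z = []                                    # topic assignment of every token
    for d, doc in enumerate(docs):            # random initial assignments
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][index[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], index[w]
                # Remove the token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                # p(z = j | rest) is proportional to (n_dj + alpha)(n_jw + beta) / (n_j + V*beta).
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta)
                           / (nk[j] + n_vocab * beta) for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    # Smoothed per-tweet topic distribution; each row sums to 1.
    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    return theta, nkw, vocab


def detect_events(theta, topic, threshold=0.6):
    """Flag tweets whose probability for `topic` meets the likelihood threshold."""
    return [row[topic] >= threshold for row in theta]


if __name__ == "__main__":
    tweets = [  # hypothetical sample tweets
        "Tren rosak lagi, tunggu lama kat stesen!",
        "LRT lambat pls, tren rosak",
        "Nice weather for a walk in the park",
        "Lovely park walk this morning",
    ]
    docs = [standardise(t) for t in tweets]
    theta, nkw, vocab = lda_gibbs(docs, k=2, seed=1)
    # HYPOTHETICAL rule: the disruption topic is the one where "broken" is most frequent.
    broken = vocab.index("broken")
    disruption = max(range(2), key=lambda j: nkw[j][broken])
    print(detect_events(theta, disruption, threshold=0.6))
```

Raising `threshold` makes detection stricter, so fewer tweets are flagged; the abstract reports different best cut-offs for the English (80%) and Manglish (60%) models, which suggests the threshold should be tuned per language.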
