Scratch Train for Lightweight Models for Face Mask Detection

Kai Liang Lew; Lazaroo Shane; Chean Khim Toa; Tetuko Kurniawan

doi:10.33093/ijoras.2026.8.1.9


Published: 31 March 2026


Keywords:

Lightweight, Classification, COVID-19, Mask Detection, Convolutional Neural Network

Kai Liang Lew

Faculty of Engineering and Technology, Multimedia University (Malaysia)

Lazaroo Shane

Infineon Technologies, Melaka (Malaysia)

Chean Khim Toa

School of Computing and Data Science, Xiamen University Malaysia (Malaysia)

Tetuko Kurniawan

Institute of Fundamental Technological Research, Polish Academy of Sciences, Pawinskiego (Poland)

Abstract

The need for automated systems that detect face mask use in public became urgent during the COVID-19 pandemic. Most existing mask detection research fine-tunes ImageNet pre-trained backbones on relatively small mask datasets. This approach raises concerns about model performance when computational resources are limited or when external pre-trained weights are not accessible. In addition, there is limited comparative analysis of recent lightweight architectures under consistent training conditions for mask detection. This paper evaluates four state-of-the-art lightweight architectures for binary mask detection: RepViT, ShuffleNetV2, EdgeNeXt-Small, and EfficientFormer. The models are trained from scratch using identical training protocols on two datasets containing 7,553 and 12,000 RGB images, respectively. Performance is then assessed using standardised metrics, including accuracy, precision, recall, and F1-score. Results show that ShuffleNetV2 achieves the best balance between classification accuracy and computational efficiency among the evaluated models, delivering 0.987 accuracy on Dataset 1 and 0.998 accuracy on Dataset 2 while maintaining the fastest inference time (0.464 to 0.667 milliseconds) and the smallest model size (1.26 million parameters). RepViT and EdgeNeXt-Small achieve slightly higher accuracy but require significantly more computational resources. EfficientFormer consistently underperforms across all evaluation metrics. These findings indicate that extremely lightweight CNNs can excel at mask detection when trained from scratch, making ShuffleNetV2 the preferred choice among the evaluated models for resource-constrained deployment scenarios.
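The four classification metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name binary_metrics, the BinaryMetrics container, the choice of label 1 as the positive class, and the toy labels are all assumptions made for illustration, since the abstract does not state which class is treated as positive or how the metrics were implemented.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    """Container for the four metrics (hypothetical helper, not from the paper)."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: `positive` marks the class of interest (for example
    "with mask"); the abstract does not say which class is positive.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    # Tally the four cells of the confusion matrix.
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no predicted or true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels for illustration only; these are not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

With these toy labels there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics equal 0.75. On a roughly balanced binary dataset accuracy alone is informative, but precision and recall separate the two error types, which matters when a missed "no mask" case and a false alarm carry different costs.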

Manuscript received: 12 Jun 2025 | Revised: 30 Jul 2025 | Accepted: 24 Sep 2025 | Published: 31 Mar 2026

How to Cite

Lew, K. L., Lazaroo Shane, Chean Khim Toa, & Tetuko Kurniawan. (2026). Scratch Train for Lightweight Models for Face Mask Detection. International Journal on Robotics, Automation and Sciences, 8(1), 87–95. https://doi.org/10.33093/ijoras.2026.8.1.9

Issue

Vol. 8 No. 1 (2026): International Journal on Robotics, Automation and Sciences

Section

Article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

