Scratch Train for Lightweight Models for Face Mask Detection

Kai Liang Lew
Lazaroo Shane
Chean Khim Toa
Tetuko Kurniawan

Abstract

The need for automated systems that detect face mask use in public became urgent during the COVID-19 pandemic. Most existing mask detection research fine-tunes ImageNet pre-trained backbones on relatively small mask datasets. This approach raises concerns about model performance when computational resources are limited or external pre-trained weights are not accessible. In addition, few studies compare recent lightweight architectures for mask detection under consistent training conditions. This paper evaluates four state-of-the-art lightweight architectures for binary mask detection: RepViT, ShuffleNetV2, EdgeNeXt-Small, and EfficientFormer. The models are trained from scratch with identical training protocols on two datasets containing 7,553 and 12,000 RGB images, respectively. Performance is assessed using standard metrics: accuracy, precision, recall, and F1-score. The results show that ShuffleNetV2 offers the best balance between classification accuracy and computational efficiency, reaching an accuracy of 0.987 on Dataset 1 and 0.998 on Dataset 2 while having the fastest inference time (0.464–0.667 milliseconds) and the smallest model size (1.26 million parameters). RepViT and EdgeNeXt-Small achieve slightly higher accuracy but require significantly more computational resources. EfficientFormer consistently underperforms across all evaluation metrics. These findings indicate that extremely lightweight CNNs can excel at mask detection when trained from scratch, making ShuffleNetV2 the ideal choice for resource-constrained deployment scenarios.
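The four metrics named in the abstract can be illustrated with a minimal sketch for the binary case. This is not the authors' evaluation code: the function name `binary_metrics`, the choice of label 1 as the positive class, and the toy labels below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper: `positive` marks which label counts as the
    positive class (assumed here to be 1; the paper does not say).
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t == positive:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels (made up for illustration, not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

With these toy labels there are three true positives, three true negatives, one false positive and one false negative, so all four metrics come out to 0.75.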


Manuscript received: 12 Jun 2025 | Revised: 30 Jul 2025 | Accepted: 24 Sep 2025 | Published: 31 Mar 2026

Article Details

How to Cite
Lew, K. L., Lazaroo Shane, Chean Khim Toa, & Tetuko Kurniawan. (2026). Scratch Train for Lightweight Models for Face Mask Detection. International Journal on Robotics, Automation and Sciences, 8(1), 87–95. https://doi.org/10.33093/ijoras.2026.8.1.9
