A Hybrid Convolutional Neural Network and Vision Transformer Framework for Robust Counterfeit Logo Detection in Brand Protection Systems
P. Yamini,
N. Mahendar,
N. Anirudh,
M. Ajay
Counterfeit logos on products, packaging, and digital media cause significant economic losses to
brands and undermine consumer trust. Manual inspection is inefficient and error-prone for large-scale
monitoring. This paper proposes a hybrid Convolutional Neural Network (CNN) and Vision
Transformer (ViT) framework for robust counterfeit logo detection in brand protection systems. The
model combines the CNN's local feature extraction (via a ResNet or EfficientNet backbone) with the ViT's
global attention mechanism to capture both fine-grained forgery artifacts (e.g., texture inconsistencies,
edge distortions) and holistic structural deviations. Preprocessing includes data augmentation and
normalization; a fusion layer produces the final classification, either binary (genuine/fake) or multi-class.
Evaluated on benchmark data (FlickrLogos-32 and a custom counterfeit-logo dataset) and real-world images,
the framework achieves an accuracy of 96.8%, precision of 96.5%, recall of 96.2%, and F1-score of 96.3%,
with a low false positive rate. It demonstrates superior robustness to lighting variations, distortions,
and partial occlusions compared to standalone CNN or ViT models. The system supports real-time
integration in e-commerce, supply chain monitoring, and anti-counterfeiting platforms while preserving
computational efficiency.
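
The abstract does not specify how the fusion layer combines the two feature streams. The following is a minimal sketch of one common choice, concatenation-based late fusion followed by a linear layer and softmax. Everything in it is an assumption made for illustration: the function names, the feature dimensions (8 and 4), the random vectors standing in for CNN and ViT backbone outputs, and the untrained random weights.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(cnn_features, vit_features, weights, biases):
    """Concatenate both feature vectors, apply one linear layer, return class probabilities.

    Concatenation is an assumed fusion strategy; the paper may use a different one.
    """
    fused = list(cnn_features) + list(vit_features)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Random stand-ins for backbone outputs (hypothetical sizes, not from the paper).
rng = random.Random(0)
cnn_features = [rng.gauss(0.0, 1.0) for _ in range(8)]   # local texture/edge cues
vit_features = [rng.gauss(0.0, 1.0) for _ in range(4)]   # global structure cues

# Untrained weights for a binary genuine/counterfeit head.
class_names = ("genuine", "counterfeit")
fused_dim = len(cnn_features) + len(vit_features)
weights = [[rng.gauss(0.0, 0.1) for _ in range(fused_dim)] for _ in class_names]
biases = [0.0 for _ in class_names]

probs = fuse_and_classify(cnn_features, vit_features, weights, biases)
label = class_names[probs.index(max(probs))]
print(label, probs)
```

Extending `class_names` (and the matching rows of `weights` and `biases`) turns the same head into the multi-class variant mentioned in the abstract. In a real system the weights would be learned jointly with the backbones rather than sampled at random.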
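
The reported metrics follow the standard confusion-matrix definitions. The sketch below shows how they relate, assuming a binary setting in which "counterfeit" is the positive class (the abstract does not state this). The confusion counts are made up for illustration and are not the paper's data; the function names are likewise hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        # Share of genuine logos wrongly flagged as counterfeit.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Consistency check on the abstract's reported precision and recall.
reported_f1 = f1_from(0.965, 0.962)
print(f"F1 implied by reported precision/recall: {reported_f1:.4f}")
```

The harmonic mean of the reported precision (96.5%) and recall (96.2%) is approximately 0.9635, which agrees with the reported F1-score of 96.3%.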