A Hybrid Convolutional Neural Network and Vision Transformer Framework for Robust Counterfeit Logo Detection in Brand Protection Systems

doi:10.33425/3066-1226.1214

Global Journal of Engineering Innovations and Interdisciplinary Research

Counterfeit logos on products, packaging, and digital media cause significant economic losses to brands and undermine consumer trust. Manual inspection is inefficient and error-prone for large-scale monitoring. This paper proposes a hybrid Convolutional Neural Network (CNN) and Vision Transformer (ViT) framework for robust counterfeit logo detection in brand protection systems. The model combines the CNN's local feature extraction (via a ResNet or EfficientNet backbone) with the ViT's global attention mechanism to capture both fine-grained forgery artifacts (e.g., texture inconsistencies, edge distortions) and holistic structural deviations. Preprocessing consists of data augmentation and normalization; a fusion layer then produces the final classification, either binary (genuine/fake) or multi-class. Evaluated on benchmark datasets (FlickrLogos-32 and a custom counterfeit logo set) and on real-world images, the framework achieves an accuracy of 96.8%, precision of 96.5%, recall of 96.2%, and F1-score of 96.3%, together with a low false positive rate. It is more robust to lighting variations, distortions, and partial occlusions than standalone CNN or ViT models. The system supports real-time integration into e-commerce, supply chain monitoring, and anti-counterfeiting platforms while preserving computational efficiency.
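The fusion step described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes late fusion by concatenation followed by a single linear layer and a softmax, which is one common way to combine two feature branches; the abstract does not specify the fusion design, so this is not necessarily the authors' method. The feature sizes, the random stand-in features and weights, and the names `linear`, `softmax`, and `fuse_and_classify` are all hypothetical. In a real system the two feature vectors would come from the CNN backbone and the ViT encoder, and the weights would be learned.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row (dot product plus bias)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(cnn_features, vit_features, weights, bias):
    """Assumed fusion: concatenate both branches, then classify linearly."""
    fused = list(cnn_features) + list(vit_features)
    return softmax(linear(fused, weights, bias))


# Hypothetical sizes; real backbones emit hundreds or thousands of features.
CNN_DIM, VIT_DIM = 8, 6
CLASSES = ("genuine", "fake")

rng = random.Random(0)
# Random stand-ins for the local (CNN) and global (ViT) feature vectors.
cnn_features = [rng.gauss(0, 1) for _ in range(CNN_DIM)]
vit_features = [rng.gauss(0, 1) for _ in range(VIT_DIM)]
# Untrained stand-in weights for the fusion classifier head.
weights = [[rng.gauss(0, 0.1) for _ in range(CNN_DIM + VIT_DIM)]
           for _ in CLASSES]
bias = [0.0] * len(CLASSES)

probs = fuse_and_classify(cnn_features, vit_features, weights, bias)
label = CLASSES[probs.index(max(probs))]
print(dict(zip(CLASSES, probs)), "->", label)
```

Extending `CLASSES` to one entry per brand or forgery type gives the multi-class variant; the fusion function itself is unchanged because the number of weight rows sets the output size.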
