A Study on Neural Network Pruning Techniques for Efficient Model Deployment
Prasadu Gurram, Y. Sindhura, Bandari Ravi
In this study, we explore the effectiveness of various neural network pruning techniques for optimizing
model deployment in resource-constrained environments. The increasing demand for efficient neural
networks, particularly in edge devices and mobile applications, necessitates methods that reduce
model complexity without significantly compromising performance. We investigate five pruning
strategies—weight pruning, neuron pruning, structured pruning, unstructured pruning, and layer
pruning—comparing their impact on model size, inference time, and accuracy. Our experimental
results demonstrate that while all pruning methods achieve substantial reductions in model size and
computational cost, they differ in their trade-offs between efficiency and accuracy. Structured and
unstructured pruning emerge as particularly effective, offering a balance between model compactness
and performance retention, making them suitable for deployment on hardware-optimized architectures.
These findings provide valuable insights into selecting appropriate pruning strategies for specific
applications, guiding the development of more efficient neural networks in practical scenarios.
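To make the distinction between the unstructured and structured strategies concrete, the following is a minimal sketch using PyTorch's torch.nn.utils.prune utilities. The toy two-layer model, the 30% and 50% sparsity targets, and the sparsity helper are illustrative assumptions, not the configuration used in the experiments reported here.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy two-layer network standing in for a deployable model (assumption,
# not the paper's experimental architecture).
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

def sparsity(module: nn.Module) -> float:
    """Fraction of zeroed weights in a (possibly pruned) module."""
    w = module.weight
    return float((w == 0).sum()) / w.numel()

fc1, fc2 = model[0], model[2]

# Unstructured (weight) pruning: zero the 30% of individual weights with
# the smallest L1 magnitude, regardless of their position in the tensor.
prune.l1_unstructured(fc1, name="weight", amount=0.3)

# Structured pruning: remove the 50% of output neurons (rows of the
# weight matrix) with the smallest L2 norm, shrinking the layer in a
# pattern that dense hardware kernels can exploit directly.
prune.ln_structured(fc2, name="weight", amount=0.5, n=2, dim=0)

print(f"fc1 sparsity after unstructured pruning: {sparsity(fc1):.2f}")
print(f"fc2 sparsity after structured pruning:   {sparsity(fc2):.2f}")

# Make the pruning permanent before export: fold the binary masks into
# the weight tensors and drop the reparameterization hooks.
prune.remove(fc1, "weight")
prune.remove(fc2, "weight")
```

Note that unstructured pruning leaves the tensor shapes unchanged, so its size and latency benefits typically depend on sparse storage formats or sparsity-aware kernels, whereas structured pruning removes whole neurons and therefore reduces dense computation on standard hardware.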