A Study on Neural Network Pruning Techniques for Efficient Model Deployment
Prasadu Gurram, Y. Sindhura, Bandari Ravi
In this study, we explore the effectiveness of various neural network pruning techniques for optimizing
model deployment in resource-constrained environments. The increasing demand for efficient neural
networks, particularly in edge devices and mobile applications, necessitates methods that reduce
model complexity without significantly compromising performance. We investigate five pruning
strategies—weight pruning, neuron pruning, structured pruning, unstructured pruning, and layer
pruning—comparing their impact on model size, inference time, and accuracy. Our experimental
results demonstrate that while all pruning methods achieve substantial reductions in model size and
computational cost, they differ in their trade-offs between efficiency and accuracy. Structured and
unstructured pruning emerge as particularly effective, offering a balance between model compactness
and performance retention, making them suitable for deployment on hardware-optimized architectures.
These findings provide valuable insights into selecting appropriate pruning strategies for specific
applications, guiding the development of more efficient neural networks in practical scenarios.
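To make the distinction between the unstructured and structured strategies concrete, the following is a minimal sketch using PyTorch's torch.nn.utils.prune utilities. The toy two-layer model, the 30% and 50% sparsity targets, and the sparsity helper are illustrative assumptions, not the configuration used in the experiments reported here.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy two-layer network standing in for a deployable model (assumption,
# not the paper's experimental architecture).
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

def sparsity(module: nn.Module) -> float:
    """Fraction of zeroed weights in a (possibly pruned) module."""
    w = module.weight
    return float((w == 0).sum()) / w.numel()

fc1, fc2 = model[0], model[2]

# Unstructured (weight) pruning: zero the 30% of individual weights with
# the smallest L1 magnitude, regardless of their position in the tensor.
prune.l1_unstructured(fc1, name="weight", amount=0.3)

# Structured pruning: remove the 50% of output neurons (rows of the
# weight matrix) with the smallest L2 norm, shrinking the layer in a
# pattern that dense hardware kernels can exploit directly.
prune.ln_structured(fc2, name="weight", amount=0.5, n=2, dim=0)

print(f"fc1 sparsity after unstructured pruning: {sparsity(fc1):.2f}")
print(f"fc2 sparsity after structured pruning:   {sparsity(fc2):.2f}")

# Make the pruning permanent before export: fold the binary masks into
# the weight tensors and drop the reparameterization hooks.
prune.remove(fc1, "weight")
prune.remove(fc2, "weight")
```

Note that unstructured pruning leaves the tensor shapes unchanged, so its size and latency benefits typically depend on sparse storage formats or sparsity-aware kernels, whereas structured pruning removes whole neurons and therefore reduces dense computation on standard hardware.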