Neural Network Compression - The Functional Perspective (+ Extensions)

Israel Mason-Williams presented and discussed the paper Neural Network Compression: The Functional Perspective, accepted at the PML4LRS Workshop at ICLR 2024 (Mason-Williams, 2024), along with its extensions, accepted at the NeurIPS Sci4DL Workshop (Mason-Williams et al., 2024).

Abstract

Compression techniques such as knowledge distillation, pruning, and quantization reduce the computational cost of model inference and enable on-edge machine learning. The efficacy of compression methods is often evaluated through the proxies of accuracy and loss, which give only an indirect view of how similar the compressed model is to the original. This study explores the functional divergence between compressed and uncompressed models. The results indicate that quantization and pruning create models that are functionally similar to the original model. In contrast, knowledge distillation creates models that do not functionally approximate their teacher models: the distilled students are as functionally dissimilar from their teachers as independently trained models are from one another. It is therefore verified, via a functional understanding, that knowledge distillation is not a compression method. Since no knowledge is distilled from teacher to student, knowledge distillation is instead defined as a training regulariser.
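One simple way to make the notion of functional similarity concrete is to compare the predictions of two models on the same inputs, rather than their headline accuracies. The sketch below is illustrative only and is not the paper's exact metric: it measures the fraction of inputs on which two (hypothetical) models agree on the predicted class, using random logits to stand in for real model outputs.

```python
import numpy as np

def prediction_agreement(logits_a, logits_b):
    """Fraction of inputs on which two models predict the same class."""
    preds_a = np.argmax(logits_a, axis=1)
    preds_b = np.argmax(logits_b, axis=1)
    return float(np.mean(preds_a == preds_b))

rng = np.random.default_rng(0)

# Stand-in logits for an "original" model on 1000 inputs, 10 classes.
original = rng.normal(size=(1000, 10))

# A quantization-style perturbation: small noise, so the function is
# largely preserved and agreement stays near 1.0.
quantized = original + rng.normal(scale=0.01, size=original.shape)

# An independently trained model: unrelated logits, so agreement falls
# to roughly chance level (about 0.1 for 10 classes).
independent = rng.normal(size=(1000, 10))

print(prediction_agreement(original, quantized))
print(prediction_agreement(original, independent))
```

Two models can have near-identical accuracy while disagreeing on many individual inputs, which is exactly the distinction a functional comparison of this kind is meant to surface.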

References

  1. Neural Network Compression: The Functional Perspective
    Israel Mason-Williams
    In 5th Workshop on Practical ML for Limited/Low Resource Settings (PML4LRS), ICLR, 2024
  2. Knowledge Distillation: The Functional Perspective
    Israel Mason-Williams, Gabryel Mason-Williams, and Mark Sandler
    In NeurIPS 2024 Workshop on Scientific Methods for Understanding Deep Learning, 2024


