
Knowledge Distill via NST


Approach

Knowledge Transfer (KT), which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the popular solutions for compressing and accelerating deep networks. In this paper, we propose a novel knowledge transfer method by treating it as a distribution matching problem. In particular, we match the distributions of neuron selectivity patterns between the teacher and student networks. To achieve this goal, we devise a new KT loss function that minimizes the Maximum Mean Discrepancy (MMD) metric between these distributions.

  • Notations
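    Following the paper's notation: let $F^T \in \mathbb{R}^{C_T \times HW}$ and $F^S \in \mathbb{R}^{C_S \times HW}$ denote the feature maps of the teacher and the student at a chosen layer, where $H$ and $W$ are the spatial dimensions (assumed equal for the two networks) and $C_T$, $C_S$ are the numbers of channels. Each row $f^{k\cdot}$ of such a matrix is the flattened activation map of the $k$-th filter, and each column $f^{\cdot k}$ is the activation vector of one spatial position.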


  • Maximum Mean Discrepancy
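    Recall the definition of (squared) MMD, which the paper adopts as its distance between distributions. Given samples $\mathcal{X} = \{x^i\}_{i=1}^{N}$ and $\mathcal{Y} = \{y^j\}_{j=1}^{M}$ drawn from two distributions, it measures the distance between their mean embeddings under a feature map $\phi(\cdot)$:

    $$\mathcal{L}_{\mathrm{MMD}^2}(\mathcal{X}, \mathcal{Y}) = \Big\| \frac{1}{N} \sum_{i=1}^{N} \phi(x^i) - \frac{1}{M} \sum_{j=1}^{M} \phi(y^j) \Big\|_2^2$$

    Expanding the square and applying the kernel trick $k(x, y) = \phi(x)^\top \phi(y)$ yields

    $$\mathcal{L}_{\mathrm{MMD}^2}(\mathcal{X}, \mathcal{Y}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{i'=1}^{N} k(x^i, x^{i'}) + \frac{1}{M^2} \sum_{j=1}^{M} \sum_{j'=1}^{M} k(y^j, y^{j'}) - \frac{2}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} k(x^i, y^j)$$

    For a characteristic kernel, MMD is zero if and only if the two distributions coincide, so minimizing it pulls the student's distribution toward the teacher's.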

  • Neuron Selectivity Transfer
    The regions with high activations from a neuron may share some task-related similarities. To capture these similarities, the student network should also contain neurons that mimic these activation patterns.

Considering the activation at each spatial position as one feature, the flattened activation map of each filter is then a sample in the space of neuron selectivities, of dimension HW. This sample distribution reflects how a CNN interprets an input image: where does the CNN focus its attention?

Then we can define the Neuron Selectivity Transfer loss as:

$$\mathcal{L}_{\mathrm{NST}}(W_S) = \mathcal{H}(y_{\mathrm{true}}, p_S) + \frac{\lambda}{2} \mathcal{L}_{\mathrm{MMD}^2}(F_T, F_S)$$

where $\mathcal{H}$ is the cross-entropy loss, $p_S$ is the student's prediction, $\lambda$ balances the two terms, and each row of $F_T$ and $F_S$ (i.e., each filter's flattened activation map) is L2-normalized before the MMD term is computed. The kernel in the MMD term can be a linear kernel $k(x, y) = x^\top y$, a polynomial kernel $k(x, y) = (x^\top y + c)^d$ (the paper uses $d = 2$, $c = 0$), or a Gaussian kernel $k(x, y) = \exp(-\|x - y\|_2^2 / 2\sigma^2)$. Notably, with the linear kernel, NST reduces to matching the averaged normalized activation maps, which closely relates it to attention transfer.
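To make the computation concrete, below is a minimal PyTorch sketch of the MMD² term with a polynomial kernel. This is not the authors' released implementation; the function names (`poly_kernel`, `nst_mmd2`) and the per-image `(C, H, W)` input shape are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def poly_kernel(a: torch.Tensor, b: torch.Tensor, d: int = 2, c: float = 0.0) -> torch.Tensor:
    # Pairwise polynomial kernel matrix: a is (N, D), b is (M, D) -> (N, M).
    return (a @ b.t() + c) ** d


def nst_mmd2(ft: torch.Tensor, fs: torch.Tensor, kernel=poly_kernel) -> torch.Tensor:
    """Squared MMD between teacher and student neuron selectivity samples.

    ft: teacher feature map of shape (C_T, H, W)
    fs: student feature map of shape (C_S, H, W), same H and W as the teacher's
    """
    # Each filter's flattened activation map is one sample of dimension H*W;
    # L2-normalize every sample before applying the kernel, as in the paper.
    ft = F.normalize(ft.flatten(1), dim=1)  # (C_T, H*W)
    fs = F.normalize(fs.flatten(1), dim=1)  # (C_S, H*W)
    k_tt = kernel(ft, ft).mean()  # (1/N^2) * sum_{i,i'} k(x_i, x_i')
    k_ss = kernel(fs, fs).mean()  # (1/M^2) * sum_{j,j'} k(y_j, y_j')
    k_ts = kernel(ft, fs).mean()  # (1/(N*M)) * sum_{i,j} k(x_i, y_j)
    return k_tt + k_ss - 2.0 * k_ts
```

In training, this term would simply be added to the usual classification loss, e.g. `loss = F.cross_entropy(logits_s, labels) + lam / 2 * nst_mmd2(ft, fs)`, where `lam` is the weighting hyperparameter $\lambda$ above.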

Experiment


References:
Zehao Huang and Naiyan Wang. Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. arXiv:1707.01219, 2017.
