Adaptive Multi-Teacher Multi-level Knowledge Distillation (AMTML-KD)

The paper has been accepted by Neurocomputing 415 (2020): 106–113.

Authors: Yuang Liu, Wei Zhang and Jun Wang.

Links: [ pdf ] [ code ]

Requirements

PyTorch >= 1.0.0
Jupyter
visdom

Introduction

Knowledge distillation (KD) is an effective learning paradigm for improving the performance of light-weight student networks by utilizing additional supervision knowledge distilled from teacher networks. Most pioneering studies either learn from only a single teacher, neglecting the potential for a student to learn from multiple teachers simultaneously, or simply treat every teacher as equally important, and so cannot reflect how teacher importance varies across specific examples. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD), which consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights, which are leveraged for acquiring integrated soft-targets (high-level knowledge), and (ii) enabling the intermediate-level hints (intermediate-level knowledge) to be gathered from multiple teachers by the proposed multi-group hint strategy. As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD. Extensive results on publicly available datasets demonstrate that the proposed learning framework enables the student to achieve better performance than strong competitors.
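To make insight (i) concrete, here is a minimal, framework-free sketch of how instance-level teacher weights could be turned into an integrated soft-target for a single example. It is an illustration under stated assumptions, not the code from this repository: the function names, the dot-product scoring between a teacher's latent vector and an example feature, and the temperature value are all hypothetical, and in practice the latent vectors would be learned jointly with the student in PyTorch rather than fixed by hand.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def integrate_soft_targets(teacher_logits, teacher_latents, example_feature,
                           temperature=4.0):
    """Combine several teachers' predictions for ONE example.

    teacher_logits:  one list of class logits per teacher
    teacher_latents: one latent vector per teacher (fixed here; learned in practice)
    example_feature: feature vector describing the example (hypothetical input)
    """
    # ASSUMPTION: a teacher's relevance to this example is scored by a dot
    # product between its latent vector and the example feature.
    scores = [sum(a * b for a, b in zip(latent, example_feature))
              for latent in teacher_latents]
    # Normalise the scores so the weights over teachers sum to 1.
    weights = softmax(scores)
    # Temperature-softened class distribution of each teacher.
    soft_targets = [softmax(logits, temperature) for logits in teacher_logits]
    num_classes = len(teacher_logits[0])
    # Weighted average of the teachers' distributions, class by class.
    integrated = [sum(w * probs[c] for w, probs in zip(weights, soft_targets))
                  for c in range(num_classes)]
    return weights, integrated


def soft_cross_entropy(student_logits, target_probs, temperature=4.0):
    """Cross-entropy between the integrated soft-target and the softened student."""
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(target_probs, student_probs))


# Two teachers, three classes, a two-dimensional example feature (toy values).
weights, target = integrate_soft_targets(
    teacher_logits=[[2.0, 0.5, -1.0], [0.1, 1.8, 0.3]],
    teacher_latents=[[1.0, 0.0], [0.0, 1.0]],
    example_feature=[0.9, 0.1],
)
loss = soft_cross_entropy([1.0, 0.2, -0.5], target)
print(weights, target, loss)
```

Because the weights depend on the example feature, a different input can favour a different teacher, which is the contrast with averaging all teachers equally. The multi-group hint strategy of insight (ii) is not covered by this sketch.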

Citation

@article{LIU2020106,
    title = {Adaptive multi-teacher multi-level knowledge distillation},
    author = {Yuang Liu and Wei Zhang and Jun Wang},
    journal = {Neurocomputing},
    volume = {415},
    pages = {106--113},
    year = {2020},
    issn = {0925-2312},
}


