# Knowledge distillation: A good teacher is patient and consistent

*by Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov*

## Introduction

We publish all teacher models and configurations for the main experiments of the paper, as well as training logs and student models.

Please read the main [big_vision README](/README.md) to learn how to run configs, and remember that each config file contains an example invocation in its top-level comment.

## Results

We provide a [colab to read and plot the logfiles](https://colab.research.google.com/drive/1nMykzUzsfQ_uAxfj3k35DYsATnG_knPl?usp=sharing) for a few runs that we reproduced on Cloud.

### ImageNet-1k

The config file [bit_i1k.py](bit_i1k.py) reproduces our distillation runs on ImageNet-1k reported in Figures 1 and 5 (left) and in the first row of Table 1.

We release both student and teacher models:

| Model | Download link | Resolution | ImageNet top-1 acc. (paper) |
| :--- | :---: | :---: | :---: |
| BiT-R50x1 | [link](https://storage.googleapis.com/bit_models/distill/R50x1_160.npz) | 160 | 80.5 |
| BiT-R50x1 | [link](https://storage.googleapis.com/bit_models/distill/R50x1_224.npz) | 224 | 82.8 |
| BiT-R152x2 | [link](https://storage.googleapis.com/bit_models/distill/R152x2_T_224.npz) | 224 | 83.0 |
| BiT-R152x2 | [link](https://storage.googleapis.com/bit_models/distill/R152x2_T_384.npz) | 384 | 84.3 |

A sketch of how to load these checkpoints is given at the end of this README.

### Flowers/Pet/Food/Sun

The config files [bigsweep_flowers_pet.py](bigsweep_flowers_pet.py) and [bigsweep_food_sun.py](bigsweep_food_sun.py) reproduce the distillation runs on these datasets shown in Figures 3, 4, and 9-12, and in Table 4.

While our open-source release does not currently support running hyper-parameter sweeps, we still provide an example of the sweeps at the end of these configs for reference (see also the illustrative sketch at the end of this README).

### Teacher models

Links to all teacher models we used can be found in [common.py](common.py).
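
### Loading the released checkpoints

The checkpoints in the ImageNet-1k table above are plain NumPy `.npz` archives hosted on Google Cloud Storage. Below is a minimal loading sketch; the exact parameter names inside the archive depend on the BiT model definition in big_vision, so inspect `params.files` rather than relying on any particular names.

```python
import io
import urllib.request

import numpy as np

# Download the released BiT-R50x1 student checkpoint (160px resolution).
url = "https://storage.googleapis.com/bit_models/distill/R50x1_160.npz"
with urllib.request.urlopen(url) as f:
    raw = f.read()

# The checkpoint is an npz archive mapping parameter names to weight arrays.
params = np.load(io.BytesIO(raw))
for name in sorted(params.files)[:5]:
    print(name, params[name].shape)
```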
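
For orientation, the paper distills by minimizing the KL divergence between the teacher's and the student's predicted class distributions, computed on the same augmented view of each image ("consistent" teaching) over very long schedules ("patient" teaching). The following is a minimal sketch of that objective in JAX, not the repository's actual trainer code; see the configs and trainer for the real implementation.

```python
import jax
import jax.numpy as jnp

def distill_loss(student_logits, teacher_logits):
  """KL(teacher || student), averaged over the batch.

  Both inputs have shape [batch, num_classes] and must be computed on
  identical augmented crops, so teacher and student see the same inputs.
  """
  teacher_probs = jax.nn.softmax(teacher_logits, axis=-1)
  teacher_logprobs = jax.nn.log_softmax(teacher_logits, axis=-1)
  student_logprobs = jax.nn.log_softmax(student_logits, axis=-1)
  kl = jnp.sum(teacher_probs * (teacher_logprobs - student_logprobs), axis=-1)
  return jnp.mean(kl)

# Tiny smoke test with random logits.
key_s, key_t = jax.random.split(jax.random.PRNGKey(0))
s = jax.random.normal(key_s, (4, 1000))
t = jax.random.normal(key_t, (4, 1000))
print(distill_loss(s, t))
```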
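
Finally, the sweep examples at the end of [bigsweep_flowers_pet.py](bigsweep_flowers_pet.py) and [bigsweep_food_sun.py](bigsweep_food_sun.py) remain the authoritative reference for the grids we ran. Purely as an illustration of enumerating such a grid by hand, and with hypothetical value ranges that are not taken from the paper:

```python
import itertools

# Hypothetical grid; the real value ranges live in the released configs.
learning_rates = [0.001, 0.003, 0.01]
total_epochs = [100, 1_000, 10_000]

for lr, ep in itertools.product(learning_rates, total_epochs):
    # One training run per combination, e.g. passed as config overrides.
    print({"lr": lr, "total_epochs": ep})
```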