# Knowledge distillation: A good teacher is patient and consistent
*by Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov*
## Introduction
We publish all teacher models and configurations for the main experiments of
the paper, as well as training logs and student models.
Please read the main [big_vision README](/README.md) to learn how to run
configs, and remember that each config file contains an example invocation in
the top-level comment.
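For quick inspection, a config can also be loaded directly in Python. A
minimal sketch, assuming the distillation configs live under
`big_vision/configs/proj/distill/` in an installed `big_vision` package (the
import path is an assumption; the invocation in each config's top-level
comment is authoritative):

```python
import importlib

# Sketch: load the ImageNet-1k distillation config as a plain Python module.
# The module path below is an assumption about the package layout.
config_module = importlib.import_module("big_vision.configs.proj.distill.bit_i1k")
config = config_module.get_config()  # an ml_collections.ConfigDict
print(config)
```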
## Results
We provide a [colab to read and plot the logfiles](https://colab.research.google.com/drive/1nMykzUzsfQ_uAxfj3k35DYsATnG_knPl?usp=sharing)
of a few runs that we reproduced on Cloud.
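If you prefer to inspect a logfile locally, here is a minimal sketch,
assuming the metrics are written as one JSON record per line with a `step`
key; the filename and metric key below are assumptions, and the colab above
is the authoritative reader:

```python
import json
import matplotlib.pyplot as plt

# Sketch: plot a validation metric over training steps from a JSON-lines log.
# "big_vision_metrics.txt" and "val/prec@1" are assumed names.
steps, top1 = [], []
with open("big_vision_metrics.txt") as f:
    for line in f:
        record = json.loads(line)
        if "val/prec@1" in record:
            steps.append(record["step"])
            top1.append(record["val/prec@1"])

plt.plot(steps, top1)
plt.xlabel("step")
plt.ylabel("top-1 accuracy")
plt.show()
```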
### ImageNet-1k
The file [bit_i1k.py](bit_i1k.py) is the configuration that reproduces our
distillation runs on ImageNet-1k reported in Figures 1 and 5 (left) and in the
first row of Table 1.
We release both student and teacher models:
| Model | Download link | Resolution | ImageNet top-1 acc. (paper) |
| :--- | :---: | :---: | :---: |
| BiT-R50x1 | [link](https://storage.googleapis.com/bit_models/distill/R50x1_160.npz) | 160 | 80.5 |
| BiT-R50x1 | [link](https://storage.googleapis.com/bit_models/distill/R50x1_224.npz) | 224 | 82.8 |
| BiT-R152x2 | [link](https://storage.googleapis.com/bit_models/distill/R152x2_T_224.npz) | 224 | 83.0 |
| BiT-R152x2 | [link](https://storage.googleapis.com/bit_models/distill/R152x2_T_384.npz) | 384 | 84.3 |
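The checkpoints are plain NumPy `.npz` archives. A minimal sketch for
downloading and inspecting one (the parameter naming inside the archive
follows big_vision's BiT-ResNet convention, so the printed names are just
whatever the archive contains):

```python
import io

import numpy as np
import requests

# Sketch: fetch the distilled BiT-R50x1 (160px) student from the table above
# and list a few of its parameter arrays.
url = "https://storage.googleapis.com/bit_models/distill/R50x1_160.npz"
checkpoint = np.load(io.BytesIO(requests.get(url).content))
for name in list(checkpoint.files)[:5]:
    print(name, checkpoint[name].shape)
```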
### Flowers/Pet/Food/Sun
The files [bigsweep_flowers_pet.py](bigsweep_flowers_pet.py) and
[bigsweep_food_sun.py](bigsweep_food_sun.py) can be used to reproduce the
distillation runs on these datasets, shown in Figures 3, 4, 9-12, and in Table 4.
While our open-source release does not currently support running
hyper-parameter sweeps, we provide example sweep definitions at the end of
the configs for reference; a rough sketch of the idea follows.
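As an illustration only, a sweep amounts to a cross product of config
overrides, each launched as a separate training run. The hyper-parameter
names and values below are hypothetical, not the paper's grid; the real
sweep definitions are at the end of the config files:

```python
import itertools

# Sketch: enumerate a grid of config overrides. Names and values here are
# illustrative only and do not match the paper's sweeps.
grid = {
    "config.lr": [0.001, 0.003, 0.01],
    "config.wd": [1e-5, 1e-4],
    "config.total_epochs": [100, 1000],
}

for values in itertools.product(*grid.values()):
    overrides = dict(zip(grid.keys(), values))
    print(overrides)  # e.g. translate into flags for the training launcher
```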
### Teacher models
Links to all teacher models we used can be found in [common.py](common.py).