Turka
/

distilbert-base-uncased-finetuned-squad-d5716d28


	---
	language:
	- en
	license: apache-2.0
	tags:
	- question-answering
	datasets:
	- squad
	metrics:
	- squad
	thumbnail: https://github.com/karanchahal/distiller/blob/master/distiller.jpg
	---

	# DistilBERT with a second step of distillation

	## Model description

	This model replicates the "DistilBERT (D)" model from Table 2 of the [DistilBERT paper](https://arxiv.org/pdf/1910.01108.pdf). In this approach, a DistilBERT student is fine-tuned on SQuAD v1.1, but with a BERT model (also fine-tuned on SQuAD v1.1) acting as a teacher for a second step of task-specific distillation.

	In this version, the following pre-trained models were used:

	* Student: `distilbert-base-uncased`
	* Teacher: `lewtun/bert-base-uncased-finetuned-squad-v1`
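
The card does not state the exact loss used for the second distillation step. As a rough illustration only, task-specific distillation is commonly written as a weighted sum of the usual cross-entropy on the gold answer position and a temperature-softened KL divergence between teacher and student logits. The sketch below is plain Python and assumes that formulation; the function names and the `alpha` and `temperature` defaults are hypothetical and are not taken from this model's training code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the gold token position."""
    return -math.log(softmax(logits)[target_index])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, target_index,
                      alpha=0.5, temperature=2.0):
    """Assumed form: alpha * hard-label loss + (1 - alpha) * soft-label loss."""
    hard = cross_entropy(student_logits, target_index)
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    soft = kl_divergence(teacher_soft, student_soft) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft


def qa_distillation_loss(student_start, student_end, teacher_start, teacher_end,
                         start_position, end_position, alpha=0.5, temperature=2.0):
    """Average the loss over the start-position and end-position logits."""
    start = distillation_loss(student_start, teacher_start, start_position,
                              alpha, temperature)
    end = distillation_loss(student_end, teacher_end, end_position,
                            alpha, temperature)
    return (start + end) / 2
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the weighted hard-label loss remains, which is a quick sanity check for any implementation of this form.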

	## Training data

	This model was trained on the SQuAD v1.1 dataset, which can be obtained from the `datasets` library as follows:

	```python
	from datasets import load_dataset

	# Download SQuAD v1.1; the result holds "train" and "validation" splits
	squad = load_dataset("squad")
	```

	## Training procedure

	## Eval results

	| Model | Exact Match | F1 |
	|------------------|-------------|------|
	| DistilBERT paper | 79.1 | 86.9 |
	| Ours | 78.4 | 86.5 |

	The scores were calculated using the `squad` metric from `datasets`.
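
For readers unfamiliar with what the two columns measure, here is a minimal plain-Python sketch of Exact Match and F1, following the normalization used by the SQuAD v1.1 evaluation script (lowercasing, then stripping punctuation, articles, and extra whitespace). It is a simplified re-implementation for illustration, not the `datasets` metric itself, and the `squad_scores` helper is a hypothetical name.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction, ground_truth):
    """Token-level F1 between normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions, references):
    """Average the best per-question score over all references, as percentages.

    `predictions` is a list of strings; `references` is a list of lists of
    acceptable answer strings, aligned with `predictions`.
    """
    em_total = 0.0
    f1_total = 0.0
    for prediction, truths in zip(predictions, references):
        em_total += max(float(exact_match(prediction, t)) for t in truths)
        f1_total += max(f1_score(prediction, t) for t in truths)
    count = len(predictions)
    return {"exact_match": 100.0 * em_total / count, "f1": 100.0 * f1_total / count}
```

Each question is scored against its best-matching reference answer, so a prediction that agrees with any one of the annotated answers counts as correct.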

	### BibTeX entry and citation info

	```bibtex
	@misc{sanh2020distilbert,
	title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
	author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
	year={2020},
	eprint={1910.01108},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```