# Finetuned DistilBERT

This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-uncased). It was
introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found
[here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation). This model is uncased: it does
not make a difference between english and English.
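Since the model is uncased, the tokenizer lowercases input before encoding. This can be checked directly; a minimal sketch, assuming the checkpoint ships the standard uncased DistilBERT tokenizer:

```python
from transformers import AutoTokenizer

# Uncased checkpoint: "english" and "English" should encode to identical token ids
tok = AutoTokenizer.from_pretrained("carbonnnnn/T2L1DISTILBERT")
print(tok("english")["input_ids"] == tok("English")["input_ids"])  # expected: True
```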
## Model description
DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a
self-supervised fashion, using the BERT base model as a teacher. This model was further finetuned on the DBPedia Classes dataset, which can be found
[here](https://huggingface.co/datasets/DeveloperOats/DBPedia_Classes). The dataset consists of 342,782 Wikipedia articles that have been cleaned and classified into hierarchical classes.
The classification system spans three levels, with 9 classes at the first level, 70 classes at the second level,
and 219 classes at the third level.
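For reference, the dataset can be loaded and inspected from the Hub; a minimal sketch, assuming the `datasets` library and the split and column names (`text`, `l1`, `l2`, `l3`) listed on the dataset card:

```python
from datasets import load_dataset

# Load the train split of DBPedia_Classes (the column names are assumptions
# taken from the dataset card: text plus l1/l2/l3 hierarchy labels)
ds = load_dataset("DeveloperOats/DBPedia_Classes", split="train")

example = ds[0]
print(example["text"][:100])                        # cleaned Wikipedia article text
print(example["l1"], example["l2"], example["l3"])  # labels at each hierarchy level

# Distinct classes per level; the card reports 9, 70, and 219
print(len(set(ds["l1"])), len(set(ds["l2"])), len(set(ds["l3"])))
```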
## Intended uses & limitations
You can use the model to extract structured content from text and organize it into taxonomic categories.
The model outputs a numbered label (`LABEL_0` through `LABEL_8`); the index in `LABEL_i` corresponds, in order, to the nine level-1 class names: Agent, Device, Event, Place, Species, SportsSeason, TopicalConcept, UnitOfWork, Work. In the example under "How to use" below, these names are read from a label file (`TASK2/label_vals/l1.txt`).
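If the label file is not at hand, the same mapping can be written inline; a minimal sketch reproducing the list above:

```python
# Inline equivalent of TASK2/label_vals/l1.txt: LABEL_i maps to the i-th class name
ID2NAME = {
    "LABEL_0": "Agent",
    "LABEL_1": "Device",
    "LABEL_2": "Event",
    "LABEL_3": "Place",
    "LABEL_4": "Species",
    "LABEL_5": "SportsSeason",
    "LABEL_6": "TopicalConcept",
    "LABEL_7": "UnitOfWork",
    "LABEL_8": "Work",
}
print(ID2NAME["LABEL_3"])  # Place
```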
### How to use

You can use this model directly with a pipeline:
```python
from transformers import pipeline
import numpy as np

text = ("This was a masterpiece. Not completely faithful to the books, "
        "but enthralling from beginning to end. Might be my favorite of the three.")

classifier = pipeline("text-classification", model="carbonnnnn/T2L1DISTILBERT")

# Human-readable level-1 class names, one per line in the label file,
# in the same order as LABEL_0 ... LABEL_8
labeltxt = np.loadtxt("TASK2/label_vals/l1.txt", dtype="str")
labelint = ['LABEL_0', 'LABEL_1', 'LABEL_2', 'LABEL_3', 'LABEL_4',
            'LABEL_5', 'LABEL_6', 'LABEL_7', 'LABEL_8']

# The pipeline returns a label string such as 'LABEL_3';
# its index in labelint selects the matching class name
output = classifier(text)[0]['label']
print("Output is : " + str(labeltxt[labelint.index(output)]))
```
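Alternatively, the mapping can be attached to the model config so the pipeline returns readable class names directly; a sketch, assuming the checkpoint loads with `AutoModelForSequenceClassification`:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

names = ["Agent", "Device", "Event", "Place", "Species",
         "SportsSeason", "TopicalConcept", "UnitOfWork", "Work"]

model = AutoModelForSequenceClassification.from_pretrained("carbonnnnn/T2L1DISTILBERT")
tokenizer = AutoTokenizer.from_pretrained("carbonnnnn/T2L1DISTILBERT")

# Overwrite the generic LABEL_i names with the level-1 class names
model.config.id2label = {i: n for i, n in enumerate(names)}
model.config.label2id = {n: i for i, n in enumerate(names)}

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("This was a masterpiece.")[0]["label"])  # e.g. "Work"
```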
### Limitations and bias

Even if the training data used for this model could be characterized as fairly neutral, the model can still make biased
predictions. It also inherits some of
[the bias of its teacher model](https://huggingface.co/bert-base-uncased#limitations-and-bias).
## Evaluation results