This is a distilbert-base-multilingual-cased model fine-tuned with a NER (token classification) objective to tag each token according to whether it belongs to a code block or to natural-language text. The dataset of 78,210 examples was generated by randomly combining code and text blocks from other permissively licensed datasets; some examples contain only code and some only natural-language text.
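The data-generation step described above could look roughly like the following. This is only a minimal sketch of the idea, not the script actually used: the label names `TEXT` and `CODE`, the helper `build_example`, the `max_blocks` parameter, and whitespace tokenization are all assumptions made for illustration.

```python
import random

# Hypothetical label names; the model card does not state the actual tag set.
LABELS = {"text": "TEXT", "code": "CODE"}


def build_example(text_blocks, code_blocks, rng, max_blocks=4):
    """Randomly concatenate text and code blocks, labelling every token.

    Because each block's kind is drawn at random, some examples end up
    containing only code and some only text, as in the described dataset.
    """
    n_blocks = rng.randint(1, max_blocks)
    tokens, tags = [], []
    for _ in range(n_blocks):
        kind = rng.choice(["text", "code"])
        pool = text_blocks if kind == "text" else code_blocks
        words = rng.choice(pool).split()  # naive whitespace tokenization (assumption)
        tokens.extend(words)
        tags.extend([LABELS[kind]] * len(words))
    return tokens, tags


rng = random.Random(0)
tokens, tags = build_example(
    ["Here is how to print a greeting."],
    ["print('hello')", "for i in range(3): pass"],
    rng,
)
for token, tag in zip(tokens, tags):
    print(f"{tag}\t{token}")
```

In a real pipeline the word-level tags would still have to be aligned to the DistilBERT subword tokens before training.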
The model achieves the following results on the validation set:
| Metric | Value |
|---|---|
| Loss | 0.0788 |
| F1 Score | 0.8619 |
| Precision | 0.8362 |
| Recall | 0.8893 |
| Accuracy | 0.9792 |
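As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the other two. The snippet below is a small sketch; it assumes precision, recall, and F1 were all computed with the same averaging scheme, which the model card does not specify.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the validation table above.
precision, recall = 0.8362, 0.8893
f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.8619, matching the reported F1 score
```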