README.md · KnutJaegersberg/wikipedia_categories

metadata

pipeline_tag: sentence-similarity
tags:
  - feature-extraction
  - sentence-similarity
  - setfit
  - e5
license: mit
datasets:
  - KnutJaegersberg/wikipedia_categories
  - KnutJaegersberg/wikipedia_categories_labels

This English model, based on e5-large, predicts Wikipedia categories (roughly 37 labels). It was trained in a few-shot setting on concatenated article headlines from lower-level categories: for each level-2 category, 8 subcategories were used, each represented by the concatenation of its article headlines. Accuracy on the test split is 85%. This figure only indicates that training worked; performance will differ in production settings, which is why this classifier is intended for corpus exploration.
Use the wikipedia_categories_labels dataset as the key to the predicted labels.

```python
from setfit import SetFitModel

# Download the model from the Hub
model = SetFitModel.from_pretrained("KnutJaegersberg/wikipedia_categories_setfit")

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
```
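The predictions are class ids, so they need to be looked up in the label key before they are readable. The following is only a minimal sketch of that lookup, not the author's method. The helper names `build_label_lookup` and `decode_predictions`, the field names `label` and `category`, and the toy rows are all hypothetical; check the actual columns of the wikipedia_categories_labels dataset and adjust accordingly.

```python
from collections import Counter


def build_label_lookup(rows, id_field="label", name_field="category"):
    """Map integer class ids to category names.

    `id_field` and `name_field` are assumed column names, not confirmed
    against the real wikipedia_categories_labels dataset.
    """
    return {int(row[id_field]): str(row[name_field]) for row in rows}


def decode_predictions(preds, lookup):
    """Turn predicted class ids into names, marking ids missing from the key."""
    return [lookup.get(int(p), f"<unknown:{int(p)}>") for p in preds]


# Toy stand-in for the label dataset (hypothetical rows, not real labels).
toy_rows = [
    {"label": 0, "category": "Culture"},
    {"label": 1, "category": "Food and drink"},
]

# Toy stand-in for the model output `preds`.
toy_preds = [0, 1, 1]

lookup = build_label_lookup(toy_rows)
names = decode_predictions(toy_preds, lookup)
print(names)

# For corpus exploration, a simple tally of predicted categories.
print(Counter(names).most_common())
```

With the real model, `toy_preds` would be replaced by `preds` and `toy_rows` by the rows of the label dataset. The tally at the end fits the exploratory use case: it shows which categories dominate a corpus without relying on any single prediction being correct.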