---
library_name: setfit
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
metrics:
- accuracy
widget:
- text: Who was Cleopatra? She was a queen of ancient Egypt.
- text: Did you go anywhere interesting this weekend? Yes, I went to the zoo.
- text: Can robots think like humans? Not exactly, but AI can mimic some thinking
    processes.
- text: Can you name an adjective? 'Quick' is an adjective because it describes.
- text: How does the water cycle work? Water evaporates, condenses into clouds, and
    then precipitates back to the ground.
pipeline_tag: text-classification
inference: true
base_model: BAAI/bge-small-en-v1.5
---

# SetFit with BAAI/bge-small-en-v1.5

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
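
To illustrate step 1, the sketch below builds contrastive pairs from a few labeled sentences: two texts with the same label form a positive pair (target similarity 1.0), and two texts with different labels form a negative pair (target 0.0). This is a simplified illustration of the idea, not SetFit's actual sampling code; the function name `make_contrastive_pairs` and the choice of toy data are hypothetical.

```python
from itertools import combinations


def make_contrastive_pairs(examples):
    """Build (text_a, text_b, target) triples from (text, label) examples.

    Same-label pairs get target 1.0, different-label pairs get 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Toy data taken from the Model Labels table in this card.
examples = [
    ("What is 8 times 9? It's 72.", "Math"),
    ("How do you find the area of a rectangle? Multiply the length by the width.", "Math"),
    ("Who painted the Mona Lisa? Leonardo da Vinci painted it.", "Art"),
]

pairs = make_contrastive_pairs(examples)
for text_a, text_b, target in pairs:
    print(target, "|", text_a, "<->", text_b)
```

Pairs like these are what the Sentence Transformer body is fine-tuned on; the classification head in step 2 is then fitted on the embeddings of the original labeled texts.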

## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- **Maximum Sequence Length:** 512 tokens
- **Number of Classes:** 7 classes
<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels
| Label      | Examples                                                                                                                                                                                                                                                                                                                                                                            |
|:-----------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| English    | <ul><li>"Can you tell me about your favorite book? I love 'Harry Potter' because it's full of magic and adventure."</li><li>'What did you learn about poems today? We learned about rhymes and how they create a rhythm in poems.'</li><li>"Can you make a sentence using the word 'enigmatic'? The old man's smile was enigmatic, making me wonder what secrets he hid."</li></ul> |
| Math       | <ul><li>"What is 8 times 9? It's 72."</li><li>'How do you find the area of a rectangle? Multiply the length by the width.'</li><li>"What's the difference between a prime number and a composite number? A prime number has only two factors, 1 and itself, while a composite number has more than two factors."</li></ul>                                                          |
| Art        | <ul><li>'What colors do you mix to make green? Yellow and blue make green.'</li><li>'Who painted the Mona Lisa? Leonardo da Vinci painted it.'</li><li>"What's the difference between sculpture and pottery? Sculpture is the art of making figures while pottery is specifically making vessels from clay."</li></ul>                                                              |
| Science    | <ul><li>"What is photosynthesis? It's the process by which plants make their food using sunlight."</li><li>'Can you name the planets in our solar system? Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.'</li><li>"What's the difference between a solid and a liquid? A solid has a fixed shape while a liquid takes the shape of its container."</li></ul>    |
| History    | <ul><li>'Who was the first president of the United States? George Washington was the first president.'</li><li>'Can you tell me about the Egyptian pyramids? They were massive tombs built for pharaohs, the biggest is the Pyramid of Giza.'</li><li>'What was the Renaissance? It was a period of great cultural and scientific advancement in Europe.'</li></ul>                 |
| Technology | <ul><li>"What is the Internet? It's a global network of computers that can share information."</li><li>'Can you name a famous computer scientist? Alan Turing is known as one of the fathers of computer science.'</li><li>"What does 'AI' stand for? It stands for Artificial Intelligence."</li></ul>                                                                             |
| NONE       | <ul><li>'What did you have for lunch today? I had a sandwich and some fruit.'</li><li>'Do you like playing outside? Yes, I love playing soccer with my friends.'</li><li>"What's your favorite TV show? I love watching 'SpongeBob SquarePants'."</li></ul>                                                                                                                         |

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("bew/setfit-subject-model-basic")
# Run inference
preds = model("Who was Cleopatra? She was a queen of ancient Egypt.")
```
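
For more than one input, it can be convenient to group texts by their predicted label. The helper below is a minimal sketch: `group_by_label` is a hypothetical name, and `fake_predict` is a stand-in predictor used only so the example runs without downloading the model. Its outputs are not real predictions from this model.

```python
from collections import defaultdict


def group_by_label(predict, texts):
    """Run `predict` on a batch of texts and group the texts by predicted label."""
    grouped = defaultdict(list)
    for text, label in zip(texts, predict(texts)):
        grouped[str(label)].append(text)
    return dict(grouped)


# Stand-in predictor (hypothetical) so the sketch runs without the real model.
def fake_predict(texts):
    return ["History" if "Cleopatra" in text else "NONE" for text in texts]


texts = [
    "Who was Cleopatra? She was a queen of ancient Egypt.",
    "Did you go anywhere interesting this weekend? Yes, I went to the zoo.",
]
grouped = group_by_label(fake_predict, texts)
print(grouped)
```

With the real model loaded as shown earlier, `group_by_label(model.predict, texts)` should behave the same way, assuming `model.predict` accepts a list of strings and returns one label per input.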

<!--
### Downstream Use

*List how someone could finetune this model on their own dataset.*
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Set Metrics
| Training set | Min | Mean    | Max |
|:-------------|:----|:--------|:----|
| Word count   | 6   | 14.1333 | 30  |

| Label      | Training Sample Count |
|:-----------|:----------------------|
| Art        | 10                    |
| English    | 10                    |
| History    | 10                    |
| Math       | 10                    |
| NONE       | 15                    |
| Science    | 10                    |
| Technology | 10                    |

### Training Hyperparameters
- batch_size: (32, 32)
- num_epochs: (10, 10)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
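
The `loss: CosineSimilarityLoss` setting means the body is trained so that the cosine similarity of a pair's embeddings approaches the pair's target (1.0 for a positive pair, 0.0 for a negative pair). Below is a minimal plain-Python sketch of that objective, taken here as the mean squared error between cosine similarity and target. The vectors are made-up 3-dimensional examples, not real embeddings, and the function names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cosine similarity and target over (u, v, target) pairs."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 1.0),  # identical vectors, positive target: no error
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.0),  # orthogonal vectors, negative target: no error
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 0.0),  # identical vectors, negative target: error of 1
]
loss = cosine_similarity_loss(pairs)
print(loss)
```

Only the third pair contributes to the loss, so the mean over the three pairs is one third.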

### Training Results
| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0067 | 1    | 0.1987        | -               |
| 0.3333 | 50   | 0.1814        | -               |
| 0.6667 | 100  | 0.128         | -               |
| 1.0    | 150  | 0.0146        | -               |
| 1.3333 | 200  | 0.006         | -               |
| 1.6667 | 250  | 0.0037        | -               |
| 2.0    | 300  | 0.0031        | -               |
| 2.3333 | 350  | 0.0027        | -               |
| 2.6667 | 400  | 0.0024        | -               |
| 3.0    | 450  | 0.0024        | -               |
| 3.3333 | 500  | 0.002         | -               |
| 3.6667 | 550  | 0.002         | -               |
| 4.0    | 600  | 0.0017        | -               |
| 4.3333 | 650  | 0.0019        | -               |
| 4.6667 | 700  | 0.0018        | -               |
| 5.0    | 750  | 0.0014        | -               |
| 5.3333 | 800  | 0.0013        | -               |
| 5.6667 | 850  | 0.0014        | -               |
| 6.0    | 900  | 0.0014        | -               |
| 6.3333 | 950  | 0.0014        | -               |
| 6.6667 | 1000 | 0.0016        | -               |
| 7.0    | 1050 | 0.0013        | -               |
| 7.3333 | 1100 | 0.0013        | -               |
| 7.6667 | 1150 | 0.0012        | -               |
| 8.0    | 1200 | 0.0014        | -               |
| 8.3333 | 1250 | 0.001         | -               |
| 8.6667 | 1300 | 0.0012        | -               |
| 9.0    | 1350 | 0.0014        | -               |
| 9.3333 | 1400 | 0.0012        | -               |
| 9.6667 | 1450 | 0.0012        | -               |
| 10.0   | 1500 | 0.0011        | -               |
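
The step counts in this table can be cross-checked against the training set sizes and batch size reported above. The sketch below assumes that pairs are unordered and exclude self-pairs, and that the `oversampling` strategy repeats the smaller of the positive and negative pair sets until it matches the larger one. These assumptions are not stated in this card; they are an interpretation that happens to reproduce the numbers.

```python
from math import comb

# Training sample counts per label, from the Training Set Metrics section.
label_counts = {
    "Art": 10, "English": 10, "History": 10, "Math": 10,
    "NONE": 15, "Science": 10, "Technology": 10,
}
batch_size = 32  # from the Training Hyperparameters section

total = sum(label_counts.values())
# Same-label pairs within each class, and all remaining cross-label pairs.
positive_pairs = sum(comb(n, 2) for n in label_counts.values())
negative_pairs = comb(total, 2) - positive_pairs

# Assumption: oversampling repeats the smaller set until it matches the larger one.
pairs_per_epoch = 2 * max(positive_pairs, negative_pairs)
steps_per_epoch = pairs_per_epoch // batch_size

print(total, positive_pairs, negative_pairs, pairs_per_epoch, steps_per_epoch)
# 75 375 2400 4800 150
```

Under these assumptions one epoch is 150 steps, which agrees with the table: step 150 falls at epoch 1.0 and step 1500 at epoch 10.0.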

### Framework Versions
- Python: 3.10.12
- SetFit: 1.0.3
- Sentence Transformers: 2.3.1
- Transformers: 4.35.2
- PyTorch: 2.1.0+cu121
- Datasets: 2.17.0
- Tokenizers: 0.15.2

## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->