teknology committed on
Commit 0a574cb · verified · 1 Parent(s): e35cc6e

Update README.md

Files changed (1)
  1. README.md +46 -3
README.md CHANGED
@@ -1,3 +1,46 @@
- ---
- license: mit
- ---
+ ---
+ language:
+ - en
+ base_model:
+ - microsoft/deberta-v3-base
+ pipeline_tag: text-classification
+ license: mit
+ ---
+ Binary classification model for ad detection in QA systems.
+
+ ## Sample usage
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ classifier_model_path = "teknology/ad-classifier-v0.4"
+ tokenizer = AutoTokenizer.from_pretrained(classifier_model_path)
+ model = AutoModelForSequenceClassification.from_pretrained(classifier_model_path)
+ model.eval()  # inference mode: disables dropout
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model.to(device)
+ def classify(passages):
+     inputs = tokenizer(
+         passages, padding=True, truncation=True, max_length=512, return_tensors="pt"
+     )
+     inputs = {k: v.to(device) for k, v in inputs.items()}  # move tensors to the model's device
+     with torch.no_grad():
+         outputs = model(**inputs)
+         logits = outputs.logits
+     predictions = torch.argmax(logits, dim=-1)  # predicted class index per passage
+     return predictions.cpu().tolist()
+ preds = classify(["sample_text_1", "sample_text_2"])
+ ```
+
+
+ ## Version
+
+ Previous versions can be found at:
+ - v0.0: https://huggingface.co/jmvcoelho/ad-classifier-v0.0
+   Trained with the official data from Webis Generated Native Ads 2024.
+ - v0.1: https://huggingface.co/jmvcoelho/ad-classifier-v0.1
+   Trained with the v0.0 data plus new synthetic data.
+ - v0.2: https://huggingface.co/jmvcoelho/ad-classifier-v0.2
+   Similar to v0.1, but includes more diverse ad placement strategies, obtained through prompting.
+ - v0.3: Continued from v0.2, with a new synthetic dataset generated from Wikipedia articles.
+ - **v0.4**: Same training data composition as v0.3, but trained with curriculum learning on the mixed data.
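
The `classify` function in the sample usage returns only the argmax class index for each passage. If a confidence score is also wanted, one option is to apply a softmax to the logits. The following is a minimal sketch, not part of the committed README: the helper names `logits_to_probs` and `predict_with_confidence` are hypothetical, the logits are made-up values, and the code works on plain Python lists (for example `outputs.logits.cpu().tolist()`) so it runs without downloading the model. The README does not state which class index means "ad"; `model.config.id2label` is the place to check.

```python
import math

def logits_to_probs(logit_rows):
    """Convert rows of raw logits into softmax probabilities (hypothetical helper)."""
    probs = []
    for row in logit_rows:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

def predict_with_confidence(logit_rows):
    """Return (argmax class index, probability of that class) for each row."""
    results = []
    for p in logits_to_probs(logit_rows):
        idx = max(range(len(p)), key=p.__getitem__)
        results.append((idx, p[idx]))
    return results

# Made-up logits for two passages, standing in for outputs.logits.cpu().tolist().
example_logits = [[2.0, 0.0], [-1.0, 1.0]]
print(predict_with_confidence(example_logits))
```

When working directly with tensors, `torch.softmax(logits, dim=-1)` computes the same probabilities in one call; the list-based version above is only meant to make the arithmetic explicit.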