PhilipMay committed on
Commit
8431077
·
1 Parent(s): d1fdc35

add training details

  1. README.md +28 -1
README.md CHANGED
@@ -18,12 +18,39 @@ This is a [sentence-transformers](https://www.SBERT.net) model:
18
  It maps sentences & paragraphs (text) into a 1024-dimensional dense vector space.
19
  The model is intended to be used together with [SetFit](https://github.com/huggingface/setfit)
20
  to improve German few-shot text classification.
 
 
21
 
22
  This model is based on [deepset/gbert-large](https://huggingface.co/deepset/gbert-large).
23
  Many thanks to [deepset](https://www.deepset.ai/)!
24
 
25
  ## Training
26
- TODO
27
 
28
  ## Evaluation Results
29
  We use the [NLU Few-shot Benchmark - English and German](https://huggingface.co/datasets/deutsche-telekom/NLU-few-shot-benchmark-en-de)
 
18
  It maps sentences & paragraphs (text) into a 1024-dimensional dense vector space.
19
  The model is intended to be used together with [SetFit](https://github.com/huggingface/setfit)
20
  to improve German few-shot text classification.
21
+ It has a sibling model called
22
+ [deutsche-telekom/gbert-large-paraphrase-cosine](https://huggingface.co/deutsche-telekom/gbert-large-paraphrase-cosine).
23
 
24
  This model is based on [deepset/gbert-large](https://huggingface.co/deepset/gbert-large).
25
  Many thanks to [deepset](https://www.deepset.ai/)!
26
 
27
  ## Training
28
+
29
+ **Loss Function**\
30
+ We used [BatchHardSoftMarginTripletLoss](https://www.sbert.net/docs/package_reference/losses.html#batchhardsoftmargintripletloss) with Euclidean distance as the loss function:
31
+
32
+ ``` python
33
+ train_loss = losses.BatchHardSoftMarginTripletLoss(
34
+ model=model,
35
+ distance_metric=BatchHardTripletLossDistanceFunction.eucledian_distance,
36
+ )
37
+ ```
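To make the idea behind this loss more concrete, here is a minimal pure-Python sketch of a batch-hard soft-margin triplet loss with Euclidean distance. It is an illustration, not the sentence-transformers implementation: the function names `euclidean` and `batch_hard_soft_margin_loss` are hypothetical, and skipping anchors that lack a positive or a negative in the batch is an assumption made for simplicity.

``` python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_soft_margin_loss(embeddings, labels):
    """Illustrative batch-hard soft-margin triplet loss (hypothetical helper).

    For each anchor, take the farthest same-label example (hardest positive)
    and the closest different-label example (hardest negative), then apply
    the soft margin log(1 + exp(d_pos - d_neg)).
    """
    per_anchor = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [
            euclidean(anchor, e)
            for j, (e, l) in enumerate(zip(embeddings, labels))
            if j != i and l == label
        ]
        neg = [
            euclidean(anchor, e)
            for e, l in zip(embeddings, labels)
            if l != label
        ]
        # Assumption: anchors without a positive or a negative are skipped.
        if not pos or not neg:
            continue
        per_anchor.append(math.log1p(math.exp(max(pos) - min(neg))))
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


points = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
separated = batch_hard_soft_margin_loss(points, [0, 0, 1, 1])
mixed = batch_hard_soft_margin_loss(points, [0, 1, 0, 1])
```

In this toy batch, the labeling whose classes form well-separated clusters (`separated`) yields a smaller loss than the labeling that mixes nearby points across classes (`mixed`), which is the behavior the loss is meant to reward.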
38
+
39
+ **Training Data**\
40
+ The model was trained on a carefully filtered subset of
41
+ [deutsche-telekom/ger-backtrans-paraphrase](https://huggingface.co/datasets/deutsche-telekom/ger-backtrans-paraphrase).
42
+ We removed every sentence pair that meets any of the following conditions:
43
+ - `min_char_len` less than 15
44
+ - `jaccard_similarity` greater than 0.3
45
+ - `de_token_count` greater than 30
46
+ - `en_de_token_count` greater than 30
47
+ - `cos_sim` less than 0.85
48
+
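The filter rules above can be sketched as a simple predicate. This is a hedged illustration only: the column names are taken from the list above, while the helper `keep_pair`, the representation of each pair as a plain dictionary, and the sample rows are assumptions made for the example, not the authors' actual preprocessing code.

``` python
def keep_pair(row):
    """Return True if a sentence pair survives all filter rules (hypothetical helper)."""
    return (
        row["min_char_len"] >= 15             # drop pairs with min_char_len < 15
        and row["jaccard_similarity"] <= 0.3  # drop pairs with jaccard_similarity > 0.3
        and row["de_token_count"] <= 30       # drop pairs with de_token_count > 30
        and row["en_de_token_count"] <= 30    # drop pairs with en_de_token_count > 30
        and row["cos_sim"] >= 0.85            # drop pairs with cos_sim < 0.85
    )


# Made-up sample rows, for illustration only.
rows = [
    {"min_char_len": 40, "jaccard_similarity": 0.2, "de_token_count": 12,
     "en_de_token_count": 13, "cos_sim": 0.91},
    {"min_char_len": 10, "jaccard_similarity": 0.2, "de_token_count": 12,
     "en_de_token_count": 13, "cos_sim": 0.91},  # too short
    {"min_char_len": 40, "jaccard_similarity": 0.6, "de_token_count": 12,
     "en_de_token_count": 13, "cos_sim": 0.91},  # lexically too similar
    {"min_char_len": 40, "jaccard_similarity": 0.2, "de_token_count": 12,
     "en_de_token_count": 13, "cos_sim": 0.70},  # semantically too distant
]

filtered = [row for row in rows if keep_pair(row)]
```

With the Hugging Face `datasets` library, the same predicate could be passed to `Dataset.filter`, assuming the dataset exposes these columns under the same names.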
49
+ **Hyperparameters**
50
+ - learning_rate: 5.5512022294147105e-06
51
+ - num_epochs: 7
52
+ - train_batch_size: 68
53
+ - num_gpu: ???
54
 
55
  ## Evaluation Results
56
  We use the [NLU Few-shot Benchmark - English and German](https://huggingface.co/datasets/deutsche-telekom/NLU-few-shot-benchmark-en-de)