---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-small-en-v1.5
datasets: []
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
widget:
- source_sentence: We offer dual motor powertrain vehicles, which use two electric
    motors to maximize traction and performance in an all-wheel-drive configuration,
    as well as vehicle powertrain technology featuring three electric motors for further
    increased performance in certain versions of Model S and Model X, Cybertruck,
    and the Tesla Semi.
  sentences:
  - What is the purpose of The Home Depot Foundation?
  - What are the features of the company's vehicle powertrain technology?
  - Where can public access the company's SEC filings?
- source_sentence: The litigation requests a declaration that the IRA violates Janssen’s
    rights under the First Amendment and the Fifth Amendment to the Constitution.
  sentences:
  - What changes occurred in the valuation of equity warrants from 2021 to 2023?
  - What constitutional rights does Janssen claim the Inflation Reduction Act violates?
  - What was the cash paid for amounts included in the measurement of operating lease
    liabilities for the years 2021, 2022, and 2023?
- source_sentence: After-tax earnings of other energy businesses decreased $332 million
    (24.5%) in 2023 compared to 2022. The decline reflected lower earnings at Northern
    Powergrid due to unfavorable results at a natural gas exploration project, including
    the write-off of capitalized exploration costs and lower gas production volumes
    and prices, as well as from higher deferred income tax expense related to the
    enactment of the Energy Profits Levy income tax in the United Kingdom. The earnings
    decline was also attributable to lower earnings from renewable energy and retail
    services businesses. The decline in renewable energy and retail services earnings
    was primarily due to lower income tax benefits, higher operating expenses, lower
    solar and wind generation at owned projects and the impact of unfavorable changes
    in valuations of derivatives contracts, partially offset by debt extinguishment
    gains.
  sentences:
  - What were the reasons for the decline in after-tax earnings of other energy businesses
    in 2023?
  - What was the main reason for the increase in the company's valuation allowance
    during fiscal 2023?
  - What were the net purchase amounts of treasury shares for the years ended December
    31, 2022, and 2023?
- source_sentence: The Phase 3 OAKTREE trial of obeldesivir in non-hospitalized participants
    without risk factors for developing severe COVID-19 did not meet its primary endpoint
    of improvement in time to symptom alleviation. Obeldesivir was well-tolerated
    in this large study population.
  sentences:
  - How did the P&C combined ratios trend from 2021 to 2023?
  - What are some of the digital tools Walmart uses to improve associate productivity,
    engagement, and performance?
  - What was the result of the Phase 3 OAKTREE trial of obeldesivir conducted by Gilead?
- source_sentence: The issuance of preferred stock could have the effect of restricting
    dividends on the Company’s common stock, diluting the voting power of its common
    stock, impairing the liquidation rights of its common stock, or delaying or preventing
    a change in control.
  sentences:
  - What is the impact of issuing preferred stock according to the Company's description?
  - For how long did Jeffrey P. Bezos serve as President at Amazon?
  - Where in an Annual Report on Form 10-K is 'Note 13 — Commitments and Contingencies
    — Litigation and Other Legal Matters' included?
pipeline_tag: sentence-similarity
model-index:
- name: BGE small Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 384
      type: dim_384
    metrics:
    - type: cosine_accuracy@1
      value: 0.6642857142857143
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8242857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8614285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9085714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6642857142857143
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2747619047619047
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17228571428571426
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09085714285714284
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6642857142857143
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8242857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8614285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9085714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7905933695158355
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7523809523809522
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7562726267140966
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.6657142857142857
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8242857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8628571428571429
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9114285714285715
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6657142857142857
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2747619047619047
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17257142857142854
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09114285714285712
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6657142857142857
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8242857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8628571428571429
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9114285714285715
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7919632560554437
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7534053287981859
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.756861587821826
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.6528571428571428
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8071428571428572
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8485714285714285
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6528571428571428
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.26904761904761904
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16971428571428568
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6528571428571428
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8071428571428572
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8485714285714285
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.778048727585675
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7388730158730156
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7424840237912022
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.6357142857142857
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7757142857142857
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8128571428571428
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8585714285714285
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6357142857142857
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.25857142857142856
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16257142857142853
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08585714285714285
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6357142857142857
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7757142857142857
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8128571428571428
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8585714285714285
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7490553533476035
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7138038548752832
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7189504452927022
      name: Cosine Map@100
---

# BGE small Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5). It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained with a Matryoshka objective at 384, 256, 128, and 64 dimensions, and retrieval quality is reported for each of those sizes in the Evaluation section.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) <!-- at revision 5c38ec7c405ec4b44b94cc5a9bb96e735b38267a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("haophancs/bge-small-financial-matryoshka")
# Run inference
sentences = [
    'The issuance of preferred stock could have the effect of restricting dividends on the Company’s common stock, diluting the voting power of its common stock, impairing the liquidation rights of its common stock, or delaying or preventing a change in control.',
    "What is the impact of issuing preferred stock according to the Company's description?",
    'For how long did Jeffrey P. Bezos serve as President at Amazon?',
]
embeddings = model.encode(sentences)  # NumPy array, one row per sentence
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # pairwise cosine similarities
print(similarities.shape)
# torch.Size([3, 3])
```
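
Because the model was trained with `MatryoshkaLoss` at 384, 256, 128, and 64 dimensions, an embedding can be shortened by keeping only its leading components and re-normalizing. The following is a minimal, dependency-free sketch of that step on plain Python lists. The helper names `truncate_and_normalize` and `cosine` are illustrative and not part of the library, and the toy 8-dimensional vectors merely stand in for real model output.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional "embeddings" standing in for 384-dimensional model output.
query = [0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.1]
passage = [0.8, 0.2, 0.4, 0.1, 0.0, 0.3, 0.1, 0.0]

# Compare the pair at progressively smaller sizes (hypothetical sizes for the toy data).
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

With the real model, recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which is likely the more convenient route in practice.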

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `dim_384`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6643     |
| cosine_accuracy@3   | 0.8243     |
| cosine_accuracy@5   | 0.8614     |
| cosine_accuracy@10  | 0.9086     |
| cosine_precision@1  | 0.6643     |
| cosine_precision@3  | 0.2748     |
| cosine_precision@5  | 0.1723     |
| cosine_precision@10 | 0.0909     |
| cosine_recall@1     | 0.6643     |
| cosine_recall@3     | 0.8243     |
| cosine_recall@5     | 0.8614     |
| cosine_recall@10    | 0.9086     |
| cosine_ndcg@10      | 0.7906     |
| cosine_mrr@10       | 0.7524     |
| **cosine_map@100**  | **0.7563** |

#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6657     |
| cosine_accuracy@3   | 0.8243     |
| cosine_accuracy@5   | 0.8629     |
| cosine_accuracy@10  | 0.9114     |
| cosine_precision@1  | 0.6657     |
| cosine_precision@3  | 0.2748     |
| cosine_precision@5  | 0.1726     |
| cosine_precision@10 | 0.0911     |
| cosine_recall@1     | 0.6657     |
| cosine_recall@3     | 0.8243     |
| cosine_recall@5     | 0.8629     |
| cosine_recall@10    | 0.9114     |
| cosine_ndcg@10      | 0.792      |
| cosine_mrr@10       | 0.7534     |
| **cosine_map@100**  | **0.7569** |

#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6529     |
| cosine_accuracy@3   | 0.8071     |
| cosine_accuracy@5   | 0.8486     |
| cosine_accuracy@10  | 0.9        |
| cosine_precision@1  | 0.6529     |
| cosine_precision@3  | 0.269      |
| cosine_precision@5  | 0.1697     |
| cosine_precision@10 | 0.09       |
| cosine_recall@1     | 0.6529     |
| cosine_recall@3     | 0.8071     |
| cosine_recall@5     | 0.8486     |
| cosine_recall@10    | 0.9        |
| cosine_ndcg@10      | 0.778      |
| cosine_mrr@10       | 0.7389     |
| **cosine_map@100**  | **0.7425** |

#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value     |
|:--------------------|:----------|
| cosine_accuracy@1   | 0.6357    |
| cosine_accuracy@3   | 0.7757    |
| cosine_accuracy@5   | 0.8129    |
| cosine_accuracy@10  | 0.8586    |
| cosine_precision@1  | 0.6357    |
| cosine_precision@3  | 0.2586    |
| cosine_precision@5  | 0.1626    |
| cosine_precision@10 | 0.0859    |
| cosine_recall@1     | 0.6357    |
| cosine_recall@3     | 0.7757    |
| cosine_recall@5     | 0.8129    |
| cosine_recall@10    | 0.8586    |
| cosine_ndcg@10      | 0.7491    |
| cosine_mrr@10       | 0.7138    |
| **cosine_map@100**  | **0.719** |
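
The metric names above follow the usual retrieval definitions. The sketch below computes accuracy@k, precision@k, recall@k, and MRR@k for a toy ranking, assuming each query has exactly one relevant passage. That assumption is not stated in this card, but it is consistent with the tables: recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (for `dim_384`, 0.8243 / 3 ≈ 0.2748). The function name `retrieval_metrics` and the toy data are illustrative only, and this is not the evaluator's own code.

```python
def retrieval_metrics(rankings, relevant, k):
    """Average accuracy@k, precision@k, recall@k and MRR@k.

    rankings: dict query_id -> list of passage ids, best match first
    relevant: dict query_id -> the single relevant passage id (assumed)
    """
    n = len(rankings)
    hits = 0
    reciprocal_rank_sum = 0.0
    for qid, ranked in rankings.items():
        top_k = ranked[:k]
        if relevant[qid] in top_k:
            hits += 1
            # Rank is 1-based, so the first position contributes 1/1.
            reciprocal_rank_sum += 1.0 / (top_k.index(relevant[qid]) + 1)
    accuracy = hits / n
    return {
        "accuracy": accuracy,        # share of queries with a hit in the top k
        "precision": accuracy / k,   # one relevant passage out of k retrieved
        "recall": accuracy,          # the single relevant passage is found or not
        "mrr": reciprocal_rank_sum / n,
    }


# Toy rankings: q1 hits at rank 1, q2 at rank 2, q3 at rank 3, q4 misses.
rankings = {
    "q1": ["p1", "p7", "p3"],
    "q2": ["p9", "p2", "p4"],
    "q3": ["p5", "p6", "p8"],
    "q4": ["p0", "p1", "p2"],
}
relevant = {"q1": "p1", "q2": "p2", "q3": "p8", "q4": "p4"}

for k in (1, 3):
    print(k, retrieval_metrics(rankings, relevant, k))
```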

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 6,300 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:
  |         | positive                                                                           | anchor                                                                            |
  |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                            |
  | details | <ul><li>min: 9 tokens</li><li>mean: 45.74 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 20.77 tokens</li><li>max: 43 tokens</li></ul> |
* Samples:
  | positive | anchor |
  |:---------|:-------|
  | <code>The company believes that trademarks have significant value for marketing products, e-commerce, stores, and business, with the possibility of indefinite renewal as long as the trademarks are in use.</code> | <code>What are the benefits of registering trademarks for the company's business?</code> |
  | <code>The consolidated financial statements and accompanying notes listed in Part IV, Item 15(a)(1) of this Annual Report on Form 10-K are included immediately following Part IV hereof and incorporated by reference herein.</code> | <code>How are the consolidated financial statements and accompanying notes incorporated into the Annual Report on Form 10-K?</code> |
  | <code>During the year ended December 31, 2023, the Company repurchased and subsequently retired 2,029,894 shares of common stock from the open market at an average cost of $103.45 per share for a total of $210.0 million.</code> | <code>How many shares of common stock did the Company repurchase and subsequently retire during the year ended December 31, 2023?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          384,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
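
The configuration above wraps `MultipleNegativesRankingLoss` in `MatryoshkaLoss`: the ranking loss is evaluated on embeddings truncated to each of the listed dimensions, and the results are combined using the listed weights (all 1 here). The following dependency-free sketch illustrates that idea on toy vectors. It is not the library's implementation; the helper names are hypothetical, the toy dimensions (8, 4, 2) replace the real ones, and the similarity scale of 20 is an assumption, since this card does not record the value used.

```python
import math


def unit(vec):
    """Rescale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives loss (assumed scale): each anchor should score its
    own positive above every other positive in the batch. Computed as softmax
    cross-entropy over scaled cosine similarities."""
    a = [unit(v) for v in anchors]
    p = [unit(v) for v in positives]
    total = 0.0
    for i, av in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(av, pv)) for pv in p]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        short_a = [v[:dim] for v in anchors]
        short_p = [v[:dim] for v in positives]
        total += weight * ranking_loss(short_a, short_p)
    return total


# Toy batch of three (anchor, positive) pairs with 8-dimensional vectors.
anchors = [
    [1.0, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.1],
    [0.1, 1.0, 0.2, 0.0, 0.0, 0.1, 0.1, 0.3],
    [0.2, 0.1, 1.0, 0.1, 0.3, 0.0, 0.0, 0.1],
]
positives = [
    [0.9, 0.2, 0.1, 0.1, 0.0, 0.1, 0.2, 0.0],
    [0.0, 0.8, 0.1, 0.1, 0.1, 0.0, 0.2, 0.2],
    [0.1, 0.2, 0.9, 0.0, 0.2, 0.1, 0.1, 0.0],
]

print(matryoshka_loss(anchors, positives, dims=(8, 4, 2), weights=(1, 1, 1)))
```

Because the same leading components appear in every term of the sum, the early dimensions are pushed to carry most of the ranking signal, which is what makes truncated embeddings usable.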

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_384_cosine_map@100 | dim_64_cosine_map@100 |
|:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 0.8122     | 10     | 1.7741        | -                      | -                      | -                      | -                     |
| 0.9746     | 12     | -             | 0.7042                 | 0.7262                 | 0.7327                 | 0.6639                |
| 1.6244     | 20     | 0.7817        | -                      | -                      | -                      | -                     |
| 1.9492     | 24     | -             | 0.7322                 | 0.7477                 | 0.7498                 | 0.7136                |
| 2.4365     | 30     | 0.5816        | -                      | -                      | -                      | -                     |
| 2.9239     | 36     | -             | 0.7387                 | 0.7563                 | 0.7549                 | 0.7165                |
| 3.2487     | 40     | 0.5121        | -                      | -                      | -                      | -                     |
| **3.8985** | **48** | **-**         | **0.7425**             | **0.7569**             | **0.7563**             | **0.719**             |

* The bold row denotes the saved checkpoint.
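
The fractional epoch values in the log can be reproduced from the hyperparameters listed earlier. The following sketch assumes training on a single device (the card does not state the device count), so that one optimizer step consumes `gradient_accumulation_steps` batches of `per_device_train_batch_size` samples. The function name `epoch_at_step` is illustrative.

```python
import math

TRAIN_SAMPLES = 6300       # size of the training dataset
PER_DEVICE_BATCH = 32      # per_device_train_batch_size
GRAD_ACCUM_STEPS = 16      # gradient_accumulation_steps

# Batches in one pass over the data; the final batch may be partial.
batches_per_epoch = math.ceil(TRAIN_SAMPLES / PER_DEVICE_BATCH)


def epoch_at_step(step):
    """Fraction of epochs completed after `step` optimizer steps."""
    return step * GRAD_ACCUM_STEPS / batches_per_epoch


for step in (10, 12, 24, 36, 48):
    print(step, round(epoch_at_step(step), 4))
```

Under that assumption the effective batch is 32 × 16 = 512 samples per optimizer step, and the printed values match the Epoch column of the log (0.8122, 0.9746, 1.9492, 2.9239, 3.8985).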

### Framework Versions
- Python: 3.12.2
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.2.0+cu121
- Accelerate: 0.31.0
- Datasets: 2.19.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```