Sentence Similarity
Transformers
Safetensors
multilingual
nllb-llm2vec
feature-extraction
text-embedding
embeddings
information-retrieval
beir
text-classification
language-model
text-clustering
text-semantic-similarity
text-evaluation
text-reranking
natural_questions
ms_marco
fever
hotpot_qa
mteb
custom_code
fdschmidt93
committed on
docs: add info about tokenizer src_lang
README.md
CHANGED
@@ -37,6 +37,8 @@ tags:
 
 This model has only been trained on self-supervised data and not yet been fine-tuned on any downstream task! This version is expected to perform better than self-supervised adaptation in the original paper, as LoRAs are merged into the model prior to task fine-tuning. The backbone of this model is [LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse](https://huggingface.co/McGill-NLP/LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse). We use the encoder of [NLLB-600M](https://huggingface.co/facebook/nllb-200-distilled-600M).
 
+> ⚠️ Make sure that you correctly set the `src_lang` (i.e., `AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang=LANG_CODE)` for the language you are using NLLB-LLM2Vec with! You can find a list of supported languages [here](https://huggingface.co/facebook/nllb-200-distilled-600M/blob/main/special_tokens_map.json)
+
 ## Usage
 ```python
 import torch
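The `src_lang` note added in this commit can be illustrated with a minimal sketch. The `AutoTokenizer.from_pretrained(..., src_lang=...)` call is the one quoted in the README. The helper names `looks_like_nllb_code` and `load_nllb_tokenizer` are hypothetical and not part of this repository. The regular expression only checks that a code has the usual NLLB shape (three lowercase letters, an underscore, a four-letter script tag such as `deu_Latn`). It does not confirm that the language is actually supported, so the linked `special_tokens_map.json` remains the authoritative list.

```python
import re

# Assumption: NLLB language codes follow the "xxx_Xxxx" shape, e.g. "eng_Latn".
# This is a shape check only; it does not prove the language is supported.
_NLLB_CODE_SHAPE = re.compile(r"^[a-z]{3}_[A-Z][a-z]{3}$")


def looks_like_nllb_code(code: str) -> bool:
    """Return True if `code` has the usual NLLB shape (hypothetical helper)."""
    return bool(_NLLB_CODE_SHAPE.match(code))


def load_nllb_tokenizer(src_lang: str):
    """Load the NLLB tokenizer with an explicit source language (hypothetical helper)."""
    if not looks_like_nllb_code(src_lang):
        raise ValueError(
            f"{src_lang!r} does not look like an NLLB code such as 'deu_Latn'"
        )
    # Imported lazily so the shape check works without transformers installed.
    from transformers import AutoTokenizer

    # Same call as in the README note, with the language code filled in.
    return AutoTokenizer.from_pretrained(
        "facebook/nllb-200-distilled-600M", src_lang=src_lang
    )


# Example (downloads the tokenizer files on first use):
# tokenizer = load_nllb_tokenizer("deu_Latn")
# batch = tokenizer(["Guten Morgen!"], return_tensors="pt", padding=True)
```

Failing early on a malformed code such as `"de"` or `"german"` is a design choice of this sketch, made so that a wrong language setting does not go unnoticed.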