BGE-M3 Linguistic Transfer (Catalan-French)
This is a BGE-M3 model post-trained on MMARCO/v2 data: queries translated from French into Catalan, paired with French documents.
This model was fine-tuned for the ECIR 2025 paper "Improving Low-Resource Retrieval Effectiveness using Zero-Shot Linguistic Similarity Transfer". The source code for the paper can be found here.
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
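Usage
The properties above suggest the usual Sentence Transformers workflow: encode queries and documents into 1024-dimensional vectors, then rank by cosine similarity. The following is a minimal sketch, not an official example from the authors. It assumes the model loads through `SentenceTransformer` under the repository name shown in the model tree on this card. The helper names (`cosine_scores`, `rank_documents`, `demo`) and the sample sentences are hypothetical and serve only as illustration.

```python
import numpy as np

# Repository name as listed in the model tree on this card.
MODEL_ID = "andreaschari/bge-m3-lt-cafr"


def cosine_scores(query_emb, doc_emb):
    """Return the (n_queries, n_docs) cosine similarity matrix."""
    q = np.asarray(query_emb, dtype=np.float32)
    d = np.asarray(doc_emb, dtype=np.float32)
    # Normalise each row to unit length so the dot product equals the cosine.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return q @ d.T


def rank_documents(model, queries, documents):
    """Encode both sides with `model.encode` and rank documents per query."""
    scores = cosine_scores(model.encode(queries), model.encode(documents))
    # Sorting the negated scores yields document indices from best to worst.
    order = np.argsort(-scores, axis=1)
    return order, scores


def demo():
    """Run the ranking against the real model (downloads the weights)."""
    # Imported here so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    queries = ["Quina és la capital de França?"]  # illustrative Catalan query
    documents = [  # illustrative French documents
        "Paris est la capitale de la France.",
        "Le Rhône est un fleuve d'Europe.",
    ]
    order, scores = rank_documents(model, queries, documents)
    print("score matrix shape:", scores.shape)
    print("document ranking for the first query:", order[0])
```

Calling `demo()` downloads the model weights on first use. `rank_documents` accepts any object with an `encode` method that returns one vector per input text, so the ranking logic can be exercised separately from the model.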
Training Details
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.4.0.post301
- Accelerate: 0.32.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1
Model tree for andreaschari/bge-m3-lt-cafr
Base model
BAAI/bge-m3