andreaschari/colbert-xm-lt-afdt

	---
	pipeline_tag: sentence-similarity
	datasets:
	- sentence-transformers/msmarco-hard-negatives
	- unicamp-dl/mmarco
	tags:
	- colbert
	- passage-retrieval
	library_name: colbert-ai
	base_model:
	- antoinelouis/colbert-xm
	- facebook/xmod-base
	license: mit
	language:
	- multilingual
	- af
	- am
	- ar
	- az
	- be
	- bg
	- bn
	- ca
	- cs
	- cy
	- da
	- de
	- el
	- en
	- eo
	- es
	- et
	- eu
	- fa
	- fi
	- fr
	- ga
	- gl
	- gu
	- ha
	- he
	- hi
	- hr
	- hu
	- hy
	- id
	- is
	- it
	- ja
	- ka
	- kk
	- km
	- kn
	- ko
	- ku
	- ky
	- la
	- lo
	- lt
	- lv
	- mk
	- ml
	- mn
	- mr
	- ms
	- my
	- ne
	- nl
	- 'no'
	- or
	- pa
	- pl
	- ps
	- pt
	- ro
	- ru
	- sa
	- si
	- sk
	- sl
	- so
	- sq
	- sr
	- sv
	- sw
	- ta
	- te
	- th
	- tl
	- tr
	- uk
	- ur
	- uz
	- vi
	- zh
	---

	[XMOD-base](https://huggingface.co/facebook/xmod-base) fine-tuned with the [ColBERT-XM](https://huggingface.co/antoinelouis/colbert-xm) methodology on mMARCO/v2 data: Dutch queries translated into Afrikaans, paired with Dutch documents.

	In other words, this is ColBERT-XM fine-tuned on Afrikaans-Dutch mMARCO/v2 rather than on MS MARCO, which the original ColBERT-XM model used.
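
For readers new to ColBERT-style retrievers, the late-interaction (MaxSim) relevance score used by the ColBERT family can be sketched as below. This is a minimal pure-Python illustration with toy 2-dimensional vectors, not this model's code: the helper names (`dot`, `normalize`, `maxsim_score`) and the example vectors are hypothetical, and a real model would produce the token embeddings from the query and passage text.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token embedding, take the
    highest cosine similarity against any document token embedding,
    then sum those maxima over all query tokens."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Toy token embeddings (hypothetical values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 2.0]],  # matches each query token exactly
    "doc_b": [[1.0, 1.0]],              # one token, partially matches both
}

# Rank documents by descending MaxSim score.
ranked = sorted(docs, key=lambda name: maxsim_score(query, docs[name]), reverse=True)
print(ranked)
```

Because each query token is matched independently against the document tokens, a passage scores well only when it covers every part of the query, which is the property that token-level fine-tuning on Afrikaans queries and Dutch documents is meant to exploit.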

	This model was fine-tuned for the ECIR 2025 paper "Improving Low-Resource Retrieval Effectiveness using Zero-Shot Linguistic Similarity Transfer". The source code for the paper is available [here](https://github.com/andreaschari/linguistic-transfer).