answerdotai/JaColBERTv2.4

---
inference: false
datasets:
- answerdotai/MMARCO-japanese-32-scored-triplets
- unicamp-dl/mmarco
language:
- ja
pipeline_tag: sentence-similarity
tags:
- ColBERT
base_model:
- cl-tohoku/bert-base-japanese-v3
- bclavie/JaColBERT
license: mit
library_name: RAGatouille
---

Model weights for the JaColBERTv2.4 checkpoint, which is JaColBERTv2.5 before its post-training step. It uses an entirely overhauled training recipe and is trained on just 40% of the data used for JaColBERTv2.

This model largely outperforms all previous approaches, including JaColBERTv2 and multilingual models such as BGE-M3, on all datasets.

This page will be updated with the full details and the model report in the next few days.
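
Until the full details are published, the sketch below shows one way the checkpoint might be loaded, given that the metadata lists RAGatouille as the library. It assumes the `ragatouille` package is installed and that this checkpoint loads through `RAGPretrainedModel.from_pretrained` like other ColBERT checkpoints. The helper names `load_model` and `rerank_passages`, and the sample query and passages, are illustrative and not part of this model card.

```python
from typing import Any, List

# Repository id taken from this model card.
MODEL_NAME = "answerdotai/JaColBERTv2.4"


def load_model(model_name: str = MODEL_NAME) -> Any:
    """Load the checkpoint through RAGatouille (assumed to be installed)."""
    # Imported lazily so this module can be imported without RAGatouille.
    from ragatouille import RAGPretrainedModel

    return RAGPretrainedModel.from_pretrained(model_name)


def rerank_passages(query: str, passages: List[str], k: int = 3) -> Any:
    """Score passages against a query without building an index."""
    model = load_model()
    # Never request more results than there are passages.
    k = min(k, len(passages))
    return model.rerank(query=query, documents=passages, k=k)


if __name__ == "__main__":
    # Illustrative Japanese query and passages (not from the model card).
    results = rerank_passages(
        "日本の首都はどこですか？",
        [
            "東京は日本の首都です。",
            "富士山は日本で最も高い山です。",
        ],
    )
    print(results)
```

Running the script downloads the weights from the Hugging Face Hub on first use. For larger collections, building an index with RAGatouille is likely a better fit than reranking every passage.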