Evaluation Dataset
Are there any public evaluation results for these models? I was unable to find the linked paper on arXiv.
Hi! Not yet, 2501.99999 is indeed just a placeholder. I aim to publish some tangible results sometime in January, but I cannot make precise promises.
Which datasets were you hoping to find evaluations for?
I'm interested in MedSTS and any benchmark for multilingual STS such as.
STSB makes a lot of sense, but it also isn't cross-lingual, which is a bit of a shame. I wonder if I could make my own cross-lingual eval there, but unfortunately I have very limited bandwidth for this.
I'm definitely interested in MedSTS ^_^ but I would want a totally different model to achieve decent scores there. I think the scope of this model is too general for it to perform well on MedSTS.
I think STSB would be a good starting point. Even if it's not cross-lingual, at least it will give per-language scores. I understand your bandwidth is limited. I just want to say this is really awesome, and I'm really impressed by how fast you adapted everything.
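For what it's worth, per-language scores could be computed roughly like this. This is only a minimal sketch, not an existing eval script: it assumes the usual STS setup of scoring each sentence pair by the cosine similarity of its two embeddings and reporting the Spearman correlation against the gold scores. The function name `per_language_spearman` and the `(lang, emb1, emb2, gold)` row layout are hypothetical, and the embeddings are assumed to come from whichever model is being evaluated.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def per_language_spearman(rows):
    """rows: iterable of (lang, emb1, emb2, gold). Hypothetical layout.

    Returns a dict mapping each language code to its Spearman score.
    """
    by_lang = defaultdict(lambda: ([], []))
    for lang, emb1, emb2, gold in rows:
        preds, golds = by_lang[lang]
        preds.append(cosine(emb1, emb2))
        golds.append(gold)
    return {lang: spearman(p, g) for lang, (p, g) in by_lang.items()}
```

Grouping by language keeps each score comparable to a monolingual STSB number. It says nothing about cross-lingual pairs, which would need their own paired data.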