Evaluation Dataset

#5
by pythiccoder - opened

Are there any public evaluation results for these models? I was unable to find the linked paper on arXiv.

Parallia org

Hi! Not yet, 2501.99999 is indeed just a placeholder. I aim to publish some tangible results sometime in January, but I cannot make precise promises.

Which datasets were you hoping to find evaluations for?

I'm interested in MedSTS and in any benchmark for multilingual STS, such as:

https://huggingface.co/datasets/PhilipMay/stsb_multi_mt

Parallia org

STSB makes a lot of sense, but it isn't cross-lingual, which is a bit of a shame. I wonder if I could build my own cross-lingual eval from it, but unfortunately I have very limited bandwidth for this.

I'm definitely interested in MedSTS ^_^ but I would want a totally different model to achieve decent scores there. I think the scope of this model is too general for it to perform well on MedSTS.

I think STSB would be a good starting point. Even if it's not cross-lingual, it will at least give per-language scores. I understand your bandwidth is limited. I just want to say this is really awesome, and I'm really impressed by how fast you adapted everything.
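For what it's worth, here is a rough sketch of how per-language scores could be computed. It is only an illustration under assumptions, not the model authors' evaluation setup: `embed` is a hypothetical stand-in for whatever function turns a sentence into a vector, the rows are assumed to be already loaded as `(language, sentence1, sentence2, gold_score)` tuples, and Spearman correlation between cosine similarity and the gold score is used because it is a common choice for STS benchmarks.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def spearman(x, y):
    """Spearman correlation, computed as Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


def per_language_spearman(rows, embed):
    """Group rows by language and score each group separately.

    rows:  iterable of (language, sentence1, sentence2, gold_score)
    embed: hypothetical callable mapping a sentence to a vector
    """
    by_lang = defaultdict(lambda: ([], []))
    for lang, s1, s2, gold in rows:
        preds, golds = by_lang[lang]
        preds.append(cosine(embed(s1), embed(s2)))
        golds.append(gold)
    return {lang: spearman(p, g) for lang, (p, g) in by_lang.items()}
```

Since each language is scored on its own pairs, this only measures monolingual quality per language; a cross-lingual eval would need pairs whose two sentences are in different languages.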
