Evaluation Dataset

by pythiccoder - opened Jan 13

Discussion

pythiccoder

Jan 13

Are there any public evaluation results for these models? I was unable to find the linked paper on arXiv.

FremyCompany

Parallia org Jan 13

Hi! Not yet, 2501.99999 is indeed just a placeholder. I aim to publish some tangible results sometime in January, but I cannot make precise promises.

Which datasets were you hoping to find evaluations for?

pythiccoder

Jan 13

I'm interested in MedSTS and in any benchmark for multilingual STS, such as:

https://huggingface.co/datasets/PhilipMay/stsb_multi_mt

FremyCompany

Parallia org Jan 13

STSB makes a lot of sense, but it isn't cross-lingual, which is a bit of a shame. I wonder if I could build my own cross-lingual eval on top of it, but unfortunately I have very limited bandwidth for this.

I'm definitely interested in MedSTS ^_^, but I think a totally different model would be needed to achieve decent scores there. The scope of this model is too general for it to perform well on MedSTS.

pythiccoder

Jan 13

edited Jan 13

I think STSB would be a good starting point: even if it's not cross-lingual, at least it would give per-language scores. I understand your bandwidth is limited. I just want to say this is really awesome, and I'm really impressed by how fast you adapted everything.
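For reference, STS benchmarks such as STSB are commonly scored as the Spearman rank correlation between the model's cosine similarity for each sentence pair and the gold similarity score, computed separately per language to get per-language scores. The following is a minimal, stdlib-only sketch of that computation under those assumptions; the function names are hypothetical, and the toy 2-d "embeddings" and scores are made up purely to illustrate the mechanics (they are not results from any model or dataset).

```python
from collections import defaultdict
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def average_ranks(values):
    # 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))

def per_language_spearman(rows):
    """rows: iterable of (lang, emb1, emb2, gold_score) tuples (hypothetical format)."""
    preds, golds = defaultdict(list), defaultdict(list)
    for lang, e1, e2, gold in rows:
        preds[lang].append(cosine(e1, e2))
        golds[lang].append(gold)
    return {lang: spearman(preds[lang], golds[lang]) for lang in preds}

# Toy illustration only: hand-made vectors and scores, not real data.
rows = [
    ("en", [1, 0], [1, 0], 5.0),
    ("en", [1, 0], [1, 1], 3.0),
    ("en", [1, 0], [0, 1], 0.0),
    ("de", [1, 0], [1, 0], 0.0),
    ("de", [1, 0], [1, 1], 3.0),
    ("de", [1, 0], [0, 1], 5.0),
]
result = per_language_spearman(rows)
print(result)
```

In practice the embeddings would come from the model being evaluated and the pairs from each language split of the dataset; ranking with averaged ties matters because gold STS scores often repeat.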
