It seems like the model I submitted has failed.

#275
by yeontaek - opened

It seems like the model I submitted has been marked as a fail.

Could you please let me know the reason and if it's possible to resubmit?

https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/e1a9810a8e0cec8671325dc20ff32bc23d41be99

Open LLM Leaderboard org

Hi, it seems we had a small failure on our side, so I resubmitted your model. Feel free to reopen the discussion if the evaluation fails again.

SaylorTwift changed discussion status to closed

@SaylorTwift

Hello, it seems like the model you resubmitted has failed again.

Is there a problem with the model, or could it be an issue with the evaluation cluster?

I would appreciate it if you could resubmit it.

yeontaek changed discussion status to open
Open LLM Leaderboard org

Hi @yeontaek ,
Looking at the logs, your model failed the same way both times: while loading checkpoint shard 14 of 15, which caused a SIGTERM error. A single failure could have been a hardware issue, but two failures on the same checkpoint shard most likely point to a problem with your model.
Did you follow all the steps in the About section (notably, did you upload your weights as safetensors)?
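
For a quick local check before resubmitting, something like the sketch below may help. It is not what the leaderboard runs, and the thread does not establish what actually triggered the SIGTERM. It assumes the repository has been downloaded locally and that the weights are sharded safetensors files listed in `model.safetensors.index.json`. The function names `check_safetensors_file` and `check_shards` are hypothetical. The sketch only verifies that every shard named in the index exists and that its size is consistent with its own header, which would catch a missing or truncated shard but not every possible loading problem.

```python
import json
import struct
from pathlib import Path
from typing import Dict, Optional

# Index file name used for sharded safetensors checkpoints (assumed layout).
INDEX_NAME = "model.safetensors.index.json"


def check_safetensors_file(path: Path) -> Optional[str]:
    """Return a short problem description, or None if the file looks complete."""
    size = path.stat().st_size
    with path.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return "file too short to hold a header length"
        # A safetensors file starts with the JSON header length (u64, little-endian).
        (header_len,) = struct.unpack("<Q", prefix)
        if 8 + header_len > size:
            return "header length exceeds file size"
        try:
            header = json.loads(f.read(header_len))
        except ValueError:
            return "header is not valid JSON"
    if not isinstance(header, dict):
        return "header is not a JSON object"
    # Each tensor entry records [begin, end] byte offsets into the data section.
    ends = [
        entry["data_offsets"][1]
        for name, entry in header.items()
        if name != "__metadata__"
    ]
    expected = 8 + header_len + max(ends, default=0)
    if size < expected:
        return f"truncated: {size} bytes on disk, header implies {expected}"
    return None


def check_shards(model_dir: str) -> Dict[str, str]:
    """Map each problematic shard file name to a description of the problem."""
    root = Path(model_dir)
    index = json.loads((root / INDEX_NAME).read_text())
    problems: Dict[str, str] = {}
    for shard in sorted(set(index["weight_map"].values())):
        path = root / shard
        if not path.is_file():
            problems[shard] = "missing"
            continue
        problem = check_safetensors_file(path)
        if problem is not None:
            problems[shard] = problem
    return problems
```

Calling `check_shards("path/to/local/model")` returns an empty dictionary when nothing suspicious is found. A non-empty result for one of the later shards would suggest re-uploading that file.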

Open LLM Leaderboard org

Feel free to reopen once you have updated your model :)

clefourrier changed discussion status to closed
