Other language ability
Outside of the base model's training, was this trained with examples in languages other than English? To me it seems there might be an improvement in Japanese, but I'm not sure.
Actually, with more testing it seems worse in some cases? I asked it to translate a song, and when I corrected it and told it not to use romaji, it just outputted this:
[1]
[2]
[3]
And as a sanity check I tested it on HuggingChat:
Translated Japanese Lyrics:
[Verse 1]
[Verse 2]
[Chorus]
[Bridge]
[Chorus]
This reminds me of what Llama 3 (not to be confused with Llama 3.1) would annoyingly do, which 3.1 fixed. I suspected that the Japanese data was literally find-and-replaced across the whole training dataset for some reason. But when it doesn't do that, it's quite good, maybe? I'm not sure why the heck Meta thought that was a good idea, but that's beside the point; I thought 3.1 didn't do this anymore.