Phi-3-mini with a tokenizer extended with 52k-unicode-hindi.
Only the embedding layers were trained; final loss: ~1.18.
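
A minimal sketch of what "training only the embedding layers" can look like in practice: every parameter is frozen except those whose name marks it as an embedding. This is not the author's training script. The name fragments `embed_tokens` and `lm_head` are an assumption based on common Hugging Face parameter naming, and whether the output head was trained alongside the input embeddings is not stated above. `ToyModel` is a hypothetical stand-in so the example runs without PyTorch; the helper itself only relies on `named_parameters()` and `requires_grad`, which a `torch.nn.Module` also provides.

```python
from types import SimpleNamespace

# Assumption: embedding parameters are recognisable by these name fragments.
EMBEDDING_KEYS = ("embed_tokens", "lm_head")


def freeze_all_but_embeddings(model, keys=EMBEDDING_KEYS):
    """Leave requires_grad=True only on parameters whose name contains a key.

    Works with any object exposing named_parameters(), such as a
    torch.nn.Module. Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in keys)
        if param.requires_grad:
            trainable.append(name)
    return trainable


class ToyModel:
    """Hypothetical stand-in for a real model (illustration only)."""

    def __init__(self):
        self._params = {
            "model.embed_tokens.weight": SimpleNamespace(requires_grad=True),
            "model.layers.0.self_attn.qkv_proj.weight": SimpleNamespace(requires_grad=True),
            "model.layers.0.mlp.down_proj.weight": SimpleNamespace(requires_grad=True),
            "lm_head.weight": SimpleNamespace(requires_grad=True),
        }

    def named_parameters(self):
        # Mirrors the (name, parameter) pairs a torch.nn.Module yields.
        return self._params.items()


toy = ToyModel()
trainable_names = freeze_all_but_embeddings(toy)
print(trainable_names)
```

With a real model, the embedding matrix would normally first be resized to the extended vocabulary (in Hugging Face Transformers, `model.resize_token_embeddings(len(tokenizer))`) before freezing the remaining weights.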