Afrizal Hasbi Azizy committed
Commit bbb4952 · verified · 1 Parent(s): 161de81

Update README.md

Files changed (1): README.md (+1 -2)
README.md CHANGED
@@ -35,12 +35,11 @@ inference: false
 <img src="https://imgur.com/9nG5J1T.png" alt="Kancil" width="600" height="300">
 <p><em>Kancil is a fine-tuned version of Llama 3 8B using a synthetic QA dataset generated with Llama 3 70B. Version zero of Kancil is the first generative Indonesian LLM to gain functional instruction performance using solely synthetic data.</em></p>
 <p><strong><a href="https://colab.research.google.com/drive/1526QJYfk32X1CqYKX7IA_FFcIHLXbOkx?usp=sharing" style="color: blue; font-family: Tahoma;">❕Go straight to the colab demo❕</a></strong></p>
-<p><em style="color: seagreen;">Compatibility errors are fixed! The colab should work fine now.</em></p>
 </center>
 
 Selamat datang!
 
-I am ultra-overjoyed to introduce you to... the 🦌 Kancil! It's a fine-tuned version of Llama 3 8B with Tumpeng, an instruction dataset of 6.7 million words. Both the model and the dataset are openly available on Huggingface.
+I am ultra-overjoyed to introduce you to... the 🦌 Kancil! It's a fine-tuned version of Llama 3 8B with Tumpeng, an instruction dataset of 14.8 million words. Both the model and the dataset are openly available on Huggingface.
 
 📚 The dataset was synthetically generated with Llama 3 70B. A big problem with existing Indonesian instruction datasets is that they are, in reality, not-very-good translations of English datasets. Llama 3 70B can generate fluent Indonesian! (with minor caveats 😔)
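Since the README states that both the model and the Tumpeng dataset are openly available on Huggingface, a minimal usage sketch (not part of the model card itself) might look like the following. The repo ids and the example prompt are placeholders, because the diff above does not show the actual Hub paths.

```python
# Minimal sketch of loading a Kancil-style checkpoint and the Tumpeng dataset from the
# Hugging Face Hub. MODEL_ID and DATASET_ID are placeholders, not the real repo ids.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

MODEL_ID = "your-org/kancil-v0-llama3-8b"    # placeholder repo id
DATASET_ID = "your-org/tumpeng-instruct"     # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")  # needs accelerate
dataset = load_dataset(DATASET_ID, split="train")

# Generate a short answer to an Indonesian prompt ("Who is Kancil in Indonesian folklore?").
prompt = "Siapa Kancil dalam cerita rakyat Indonesia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```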
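The diff also mentions that the dataset was synthetically generated with Llama 3 70B, but the generation pipeline itself is not described here. The sketch below only illustrates the general pattern of prompting a large instruct model for Indonesian QA pairs; the model id, prompt wording, and output handling are assumptions rather than the author's actual recipe.

```python
# Illustrative sketch of synthetic QA generation with a large instruct model.
# Requires a recent transformers version with chat-style pipeline inputs,
# access to the gated Llama 3 70B Instruct weights, and substantial GPU memory.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    device_map="auto",
)

# Source passage to turn into a question-answer pair (example text, not from the dataset).
passage = "Kancil adalah tokoh cerdik dalam cerita rakyat Indonesia."

messages = [
    {"role": "system", "content": "Anda membuat pasangan tanya-jawab berbahasa Indonesia."},
    {"role": "user", "content": f"Buat satu pertanyaan dan jawabannya dari teks berikut:\n{passage}"},
]

result = generator(messages, max_new_tokens=256)
# With chat-style input, generated_text holds the conversation; the last turn is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```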