Afrizal Hasbi Azizy
committed on
Update README.md
README.md
CHANGED
@@ -35,12 +35,11 @@ inference: false
|
|
35 |
<img src="https://imgur.com/9nG5J1T.png" alt="Kancil" width="600" height="300">
|
36 |
<p><em>Kancil is a fine-tuned version of Llama 3 8B using a synthetic QA dataset generated with Llama 3 70B. Version zero of Kancil is the first generative Indonesian LLM to gain functional instruction performance using solely synthetic data.</em></p>
|
37 |
<p><strong><a href="https://colab.research.google.com/drive/1526QJYfk32X1CqYKX7IA_FFcIHLXbOkx?usp=sharing" style="color: blue; font-family: Tahoma;">❕Go straight to the colab demo❕</a></strong></p>
|
38 |
-
<p><em style="color: seagreen;">Compatibility errors are fixed! The Colab should work fine now.</em></p>
|
39 |
</center>
|
40 |
|
41 |
Selamat datang!
|
42 |
|
43 |
-
I am ultra-overjoyed to introduce you... the 🦌 Kancil! It's a fine-tuned version of Llama 3 8B with the Tumpeng, an instruction dataset of
|
44 |
|
45 |
📚 The dataset was synthetically generated with Llama 3 70B. A big problem with existing Indonesian instruction datasets is that they are, in reality, not-very-good translations of English datasets. Llama 3 70B can generate fluent Indonesian! (with minor caveats 😔)
|
46 |
|
|
|
35 |
<img src="https://imgur.com/9nG5J1T.png" alt="Kancil" width="600" height="300">
|
36 |
<p><em>Kancil is a fine-tuned version of Llama 3 8B using a synthetic QA dataset generated with Llama 3 70B. Version zero of Kancil is the first generative Indonesian LLM to gain functional instruction performance using solely synthetic data.</em></p>
|
37 |
<p><strong><a href="https://colab.research.google.com/drive/1526QJYfk32X1CqYKX7IA_FFcIHLXbOkx?usp=sharing" style="color: blue; font-family: Tahoma;">❕Go straight to the colab demo❕</a></strong></p>
|
|
|
38 |
</center>
|
39 |
|
40 |
Selamat datang!
|
41 |
|
42 |
+
I am ultra-overjoyed to introduce you to... the 🦌 Kancil! It's a version of Llama 3 8B fine-tuned on Tumpeng, an instruction dataset of 14.8 million words. Both the model and the dataset are openly available on Hugging Face.
|
43 |
|
44 |
📚 The dataset was synthetically generated with Llama 3 70B. A big problem with existing Indonesian instruction datasets is that they are, in reality, not-very-good translations of English datasets. Llama 3 70B can generate fluent Indonesian! (with minor caveats 😔)
|
45 |
|