
Huang Liang Hsun PRO

lianghsun

AI & ML interests

Founder of Twinkle AI. Focused on applying deep learning in legal and scientific domains, with expertise in NLP and model fine-tuning.

Recent Activity

updated a model about 1 hour ago
twinkle-ai/Llama-3.2-3B-F1-Instruct
replied to their post about 18 hours ago
With the arrival of Twinkle April, Twinkle AI's annual open-source celebration held every April, our community is excited to unveil its very first project: 📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin.

Unlike traditional evaluation tools such as iKala's ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time grows with more complex models, traditional tools become increasingly inefficient 😲: for example, evaluating LRMs on the https://huggingface.co/datasets/ikala/tmmluplus benchmark could take half a day without finishing.

One question we were especially curious about: does shuffling multiple-choice answer order impact model accuracy? 🤔 → See: "Changing Answer Order Can Decrease MMLU Accuracy" (arXiv:2406.19470v1).

To address these challenges, Twinkle Eval brings three key innovations to the table:
1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness

After running experiments, we observed that Twinkle Eval can speed up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under the 2️⃣3️⃣ test settings than their claimed performance, suggesting further benchmarking is needed.

This framework also comes with additional tunable parameters and detailed logging of LM behavior per question, perfect for those who want to dive deeper. 😆 If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗
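To make the three ideas in the post concrete, here is a minimal Python sketch of parallel sample evaluation, multi-round testing, and randomized answer order. It is not Twinkle Eval's actual code or API: the names `shuffle_options`, `evaluate`, `oracle`, and `always_first`, the question format, and the toy questions are all assumptions made for illustration, and the "models" are stand-in functions rather than real LLM calls.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def shuffle_options(question, rng):
    """Return a copy of the question with options reordered and the answer index remapped."""
    order = list(range(len(question["options"])))
    rng.shuffle(order)  # order[j] is the original index now shown at position j
    return {
        "prompt": question["prompt"],
        "options": [question["options"][i] for i in order],
        "answer": order.index(question["answer"]),
    }


def evaluate(questions, ask_model, rounds=3, workers=8, seed=0):
    """Run several rounds; each round reshuffles options and queries samples in parallel."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(rounds):
        shuffled = [shuffle_options(q, rng) for q in questions]
        # Threads suit this workload because a real model call is I/O-bound (an HTTP request).
        with ThreadPoolExecutor(max_workers=workers) as pool:
            predictions = list(pool.map(ask_model, shuffled))
        correct = sum(p == q["answer"] for p, q in zip(predictions, shuffled))
        accuracies.append(correct / len(questions))
    return {"mean": mean(accuracies), "stdev": pstdev(accuracies), "rounds": accuracies}


# Hypothetical toy data and stand-in "models" (assumptions, not real benchmark items).
QUESTIONS = [
    {"prompt": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "answer": 1},
    {"prompt": "Capital of France?", "options": ["Paris", "Rome", "Oslo", "Bern"], "answer": 0},
]
KNOWN = {q["prompt"]: q["options"][q["answer"]] for q in QUESTIONS}


def oracle(question):
    """Picks the correct option by content, so option order does not matter."""
    return question["options"].index(KNOWN[question["prompt"]])


def always_first(question):
    """Position-biased stand-in: always answers with the first option."""
    return 0


if __name__ == "__main__":
    print("oracle:      ", evaluate(QUESTIONS, oracle))
    print("always_first:", evaluate(QUESTIONS, always_first))
```

In this sketch a content-aware responder like `oracle` keeps a stable score across rounds, while a position-biased one like `always_first` varies from round to round. That is the kind of instability that multi-round, shuffled testing is meant to expose.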
updated a collection 1 day ago
🏎️ Formosa-1 Series

Organizations

shareAI, Hugging Face for Legal, Model Collapse, Taiwan Llama, Twinkle AI

lianghsun's activity

New activity in lianghsun/super-cot-preview 15 days ago
New activity in minyichen/HuggingFaceH4_MATH_R1 16 days ago
New activity in minyichen/tw-instruct-R1-200k 19 days ago
New activity in lianghsun/new-identity 21 days ago

Upload 3 files

1
#5 opened 21 days ago by
minyichen
New activity in lianghsun/tw-political-correctness-chat 21 days ago

Upload datasets.jsonl

#2 opened 21 days ago by
minyichen
New activity in lianghsun/new-identity 24 days ago

Upload identity.json

#4 opened 24 days ago by
minyichen

Upload 2 files

1
#2 opened 24 days ago by
minyichen
New activity in google/gemma-3-4b-it 24 days ago
New activity in lianghsun/Llama-3.2-Taiwan-3B-Instruct 3 months ago

The playground is broken

4
#2 opened 3 months ago by
metalnow

🚩 Report: Legal issue(s)

#1 opened 3 months ago by
wayne1998
New activity in lianghsun/free-gpt-4o-chat 4 months ago

free-gpt-4o-chat

#1 opened 4 months ago by
avadhuta
New activity in lianghsun/Llama-3.2-Taiwan-3B 4 months ago