Huang Liang Hsun PRO

lianghsun

https://www.lianghsun.dev

AI & ML interests

Founder of Twinkle AI. Focused on applying deep learning in legal and scientific domains, with expertise in NLP and model fine-tuning.

Recent Activity

updated a dataset about 12 hours ago

twinkle-ai/tw-reasoning-instruct-50k

updated a model about 16 hours ago

twinkle-ai/Llama-3.2-3B-F1-Instruct

replied to their post 1 day ago

With the arrival of Twinkle April, Twinkle AI’s annual open-source celebration held every April, our community is excited to unveil its very first project: 📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin.

Traditional evaluation tools such as iKala’s ievals (https://github.com/ikala-ai/ievals) can only evaluate language models (LMs) one sample at a time. Twinkle Eval is instead designed with Large Reasoning Models (LRMs) in mind. Because reasoning time grows as models become more complex, sample-by-sample tools become increasingly inefficient 😲. For example, evaluating an LRM on the https://huggingface.co/datasets/ikala/tmmluplus benchmark could run for half a day without finishing.

One question we were especially curious about: does shuffling the order of multiple-choice answers affect model accuracy? 🤔
→ See: "Changing Answer Order Can Decrease MMLU Accuracy", arXiv:2406.19470v1

To address these challenges, Twinkle Eval brings three key innovations to the table:

1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness

In our experiments, Twinkle Eval sped up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under settings 2️⃣ and 3️⃣ than their claimed performance, which suggests that further benchmarking is needed.

The framework also comes with additional tunable parameters and detailed per-question logging of LM behavior, perfect for those who want to dive deeper. 😆

If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗
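The three ideas named in the post (parallel evaluation, multiple rounds, shuffled answer order) can be illustrated with a minimal sketch. This is not Twinkle Eval's actual code or API: the functions `shuffle_options` and `evaluate`, the question dictionary layout, and the `ask_model` callable are all hypothetical names chosen for illustration. A thread pool is used on the assumption that each model call is an I/O-bound request to an inference endpoint.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def shuffle_options(question, rng):
    """Return a copy of the question with shuffled options and a remapped answer index."""
    order = list(range(len(question["options"])))
    rng.shuffle(order)  # order[j] is the old index of the option now at position j
    return {
        "prompt": question["prompt"],
        "options": [question["options"][i] for i in order],
        "answer": order.index(question["answer"]),
    }


def evaluate(questions, ask_model, rounds=3, workers=8, seed=0):
    """Run several shuffled rounds in parallel; return (mean accuracy, std deviation).

    ask_model is a hypothetical callable: it receives one question dict and
    returns the index of the option the model picked.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(rounds):
        # Randomized answer order: a fresh shuffle for every round.
        shuffled = [shuffle_options(q, rng) for q in questions]
        # Parallelized evaluation: samples are sent concurrently, not one at a time.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            predictions = list(pool.map(ask_model, shuffled))
        correct = sum(p == q["answer"] for p, q in zip(predictions, shuffled))
        accuracies.append(correct / len(shuffled))
    # Multi-round testing: report the spread across rounds as a stability signal.
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


if __name__ == "__main__":
    demo = [
        {"prompt": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "answer": 1},
        {"prompt": "3 * 3 = ?", "options": ["9", "6", "12", "3"], "answer": 0},
    ]

    def always_first(question):
        # Stand-in for a position-biased model that always picks the first option.
        return 0

    mean_acc, std_acc = evaluate(demo, always_first, rounds=5)
    print(f"mean accuracy: {mean_acc:.2f}, std: {std_acc:.2f}")
```

A model that truly knows the answer should score the same in every round, while a position-biased model (like the `always_first` stand-in) will show accuracy that varies with the shuffle, which is the kind of instability the multi-round and randomized settings are meant to surface.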

lianghsun's activity

liked a model 5 days ago

twinkle-ai/Llama-3.2-3B-F1-Instruct

Text Generation • Updated about 16 hours ago • 6 • 14

liked a dataset 5 days ago

Wellstw/tw_edu_idoms

Viewer • Updated 12 days ago • 5.83k • 62 • 1

liked a model 7 days ago

lianghsun/Marble-3B-Instruct

Text Generation • Updated 7 days ago • 2

liked 3 datasets 8 days ago

liked 2 datasets 11 days ago

SUFE-AIFLM-Lab/FinEval

Updated Aug 22, 2023 • 322 • 15

mlabonne/FineTome-100k

Viewer • Updated Jul 29, 2024 • 100k • 20.7k • 194

liked a dataset 16 days ago

lightblue/reasoning-multilingual-R1-Llama-70B-train

Viewer • Updated Jan 31 • 2.48k • 271 • 34

liked 7 datasets 17 days ago

tyouisen/aclue

Updated Jan 29, 2024 • 137 • 8

CohereForAI/Global-MMLU

Viewer • Updated 16 days ago • 602k • 18.8k • 116

tidarren/ptt-riddle

Viewer • Updated Jun 9, 2024 • 17.1k • 31 • 1

STEM-AI-mtl/City_map

Viewer • Updated Apr 6, 2024 • 605 • 37 • 4

milkshake721/2.1M-wiki-STEM

Viewer • Updated Oct 13, 2023 • 2.1M • 55 • 3

fmars/wiki_stem

Viewer • Updated Aug 18, 2023 • 676k • 117 • 5

Abzu/arxiv_stem

Viewer • Updated Aug 3, 2023 • 2.3M • 177 • 1

liked a dataset 22 days ago

yuhuanstudio/wikipedia-pretrain-zh-tw

Viewer • Updated 5 days ago • 1.47M • 168 • 4

liked a dataset 24 days ago

yuhuanstudio/twdict_pretrain

Viewer • Updated 5 days ago • 169k • 108 • 3

liked 2 datasets 29 days ago

zake7749/kyara-zh-sample-1M

Viewer • Updated Jan 15 • 1.02M • 95 • 2

wuulong/purchasing_exam_questions

Viewer • Updated 27 days ago • 3.7k • 209 • 1