Huang Liang Hsun
lianghsun
https://www.lianghsun.dev
lianghsun
lianghsunhuang
AI & ML interests
Founder of Twinkle AI. Focused on applying deep learning in legal and scientific domains, with expertise in NLP and model fine-tuning.
Recent Activity
Updated a dataset about 12 hours ago: twinkle-ai/tw-reasoning-instruct-50k
Updated a model about 16 hours ago: twinkle-ai/Llama-3.2-3B-F1-Instruct
Replied to their post 1 day ago:
With the arrival of Twinkle April, Twinkle AI’s annual open-source celebration held every April, our community is excited to unveil its very first project: 📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin.

Unlike traditional evaluation tools such as iKala’s ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time increases with more complex models, traditional tools become increasingly inefficient 😲: for example, evaluating LRMs on the https://huggingface.co/datasets/ikala/tmmluplus benchmark could take half a day without finishing.

One question we were especially curious about: does shuffling multiple-choice answer order impact model accuracy? 🤔 → See: "Change Answer Order Can Decrease MMLU Accuracy" (arXiv:2406.19470v1).

To address these challenges, Twinkle Eval brings three key innovations to the table:
1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness

After running experiments, we observed that Twinkle Eval can speed up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under the 2️⃣3️⃣ test settings than their claimed performance, suggesting further benchmarking is needed.

This framework also comes with additional tunable parameters and detailed logging of LM behavior per question, perfect for those who want to dive deeper. 😆

If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗
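The three ideas named in the post (parallel evaluation of samples, multiple rounds, and randomized answer order) can be illustrated with a minimal sketch. This is not Twinkle Eval's actual code or API: the function names `shuffle_options` and `evaluate`, the question dictionary layout, and the `oracle` stand-in model are all hypothetical and exist only for illustration. A real setup would replace `oracle` with a call to a model endpoint.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy benchmark: "answer" is the index of the correct option.
QUESTIONS = [
    {"prompt": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "answer": 1},
    {"prompt": "Capital of Japan?", "options": ["Tokyo", "Osaka", "Kyoto", "Nara"], "answer": 0},
    {"prompt": "H2O is?", "options": ["salt", "gold", "water", "air"], "answer": 2},
]
ANSWER_TEXT = {q["prompt"]: q["options"][q["answer"]] for q in QUESTIONS}


def shuffle_options(question, rng):
    """Return a copy of the question with options reordered and the answer index remapped."""
    order = list(range(len(question["options"])))
    rng.shuffle(order)
    return {
        "prompt": question["prompt"],
        "options": [question["options"][i] for i in order],
        # The correct option moved to the position where its old index now sits.
        "answer": order.index(question["answer"]),
    }


def evaluate(model, questions, rounds=3, workers=8, seed=0):
    """Run several shuffled rounds, scoring samples in parallel.

    Returns (mean accuracy, population standard deviation across rounds).
    """
    rng = random.Random(seed)
    accuracies = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            shuffled = [shuffle_options(q, rng) for q in questions]
            # pool.map issues the model calls concurrently and preserves input order.
            predictions = list(pool.map(model, shuffled))
            correct = sum(p == q["answer"] for p, q in zip(predictions, shuffled))
            accuracies.append(correct / len(shuffled))
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


def oracle(question):
    """Stand-in for a model call: picks the option whose text is the known answer."""
    return question["options"].index(ANSWER_TEXT[question["prompt"]])


if __name__ == "__main__":
    mean_acc, spread = evaluate(oracle, QUESTIONS)
    print(f"mean accuracy: {mean_acc:.2f}, std across rounds: {spread:.2f}")
```

Threads suit this sketch because a model call is typically I/O-bound (a network request), so samples can be in flight at the same time. A model that keys on answer position rather than content would show a lower mean or a larger spread across rounds, which is the kind of effect the shuffled, multi-round settings are meant to expose.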
lianghsun's activity
Liked a model 5 days ago:
twinkle-ai/Llama-3.2-3B-F1-Instruct • Text Generation • Updated about 16 hours ago • 6 • 14
Liked a dataset 5 days ago:
Wellstw/tw_edu_idoms • Updated 12 days ago • 5.83k • 62 • 1
Liked a model 7 days ago:
lianghsun/Marble-3B-Instruct • Text Generation • Updated 7 days ago • 2
Liked 3 datasets 8 days ago:
zake7749/chinese-sft-stem-zh-hant • Updated Sep 7, 2024 • 12.3k • 86 • 3
zake7749/kyara-chinese-math-sft-s0-30K • Updated Sep 7, 2024 • 30k • 144 • 4
glaiveai/reasoning-v1-20m • Updated 18 days ago • 22.2M • 9.69k • 169
Liked 2 datasets 11 days ago:
SUFE-AIFLM-Lab/FinEval • Updated Aug 22, 2023 • 322 • 15
mlabonne/FineTome-100k • Updated Jul 29, 2024 • 100k • 20.7k • 194
Liked a dataset 16 days ago:
lightblue/reasoning-multilingual-R1-Llama-70B-train • Updated Jan 31 • 2.48k • 271 • 34
Liked 7 datasets 17 days ago:
tyouisen/aclue • Updated Jan 29, 2024 • 137 • 8
CohereForAI/Global-MMLU • Updated 16 days ago • 602k • 18.8k • 116
tidarren/ptt-riddle • Updated Jun 9, 2024 • 17.1k • 31 • 1
STEM-AI-mtl/City_map • Updated Apr 6, 2024 • 605 • 37 • 4
milkshake721/2.1M-wiki-STEM • Updated Oct 13, 2023 • 2.1M • 55 • 3
fmars/wiki_stem • Updated Aug 18, 2023 • 676k • 117 • 5
Abzu/arxiv_stem • Updated Aug 3, 2023 • 2.3M • 177 • 1
Liked a dataset 22 days ago:
yuhuanstudio/wikipedia-pretrain-zh-tw • Updated 5 days ago • 1.47M • 168 • 4
Liked a dataset 24 days ago:
yuhuanstudio/twdict_pretrain • Updated 5 days ago • 169k • 108 • 3
Liked 2 datasets 29 days ago:
zake7749/kyara-zh-sample-1M • Updated Jan 15 • 1.02M • 95 • 2
wuulong/purchasing_exam_questions • Updated 27 days ago • 3.7k • 209 • 1