Huang Liang Hsun PRO

lianghsun

https://www.lianghsun.dev

AI & ML interests

Founder of Twinkle AI. Focused on applying deep learning in legal and scientific domains, with expertise in NLP and model fine-tuning.

Recent Activity

updated a dataset about 19 hours ago

twinkle-ai/tw-reasoning-instruct-50k

updated a model about 24 hours ago

twinkle-ai/Llama-3.2-3B-F1-Instruct

replied to their post 1 day ago

With the arrival of Twinkle April, Twinkle AI’s annual open-source celebration held every April, our community is excited to unveil its very first project: 📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin.

Unlike traditional evaluation tools such as iKala’s ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time increases with more complex models, traditional tools become increasingly inefficient 😲: for example, evaluating LRMs on the https://huggingface.co/datasets/ikala/tmmluplus benchmark could take half a day without finishing.

One question we were especially curious about: does shuffling multiple-choice answer order impact model accuracy? 🤔 → See: "Changing Answer Order Can Decrease MMLU Accuracy" (arXiv:2406.19470v1).

To address these challenges, Twinkle Eval brings three key innovations to the table:

1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness

After running experiments, we observed that Twinkle Eval can speed up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under the 2️⃣3️⃣ test settings than their claimed performance, suggesting that further benchmarking is needed. The framework also comes with additional tunable parameters and detailed per-question logging of LM behavior, perfect for those who want to dive deeper. 😆

If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗
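The three ideas named in the post (parallel evaluation of samples, multiple rounds, and randomized answer order) can be illustrated with a minimal Python sketch. This is not Twinkle Eval's actual code or API: the question format, the function names (`shuffle_options`, `run_round`, `evaluate`), and the stand-in `oracle_model` are all hypothetical, and a real setup would replace `oracle_model` with a call to an LLM endpoint.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical question format: prompt, list of options, index of the correct option.
QUESTIONS = [
    {"prompt": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "answer": 1},
    {"prompt": "Capital of Japan?", "options": ["Tokyo", "Osaka", "Kyoto", "Nara"], "answer": 0},
    {"prompt": "H2O is?", "options": ["Salt", "Sand", "Water", "Air"], "answer": 2},
]
# Answer key used only by the stand-in model below.
KEY = {q["prompt"]: q["options"][q["answer"]] for q in QUESTIONS}


def shuffle_options(question, rng):
    """Return a copy of the question with options shuffled and the answer index remapped."""
    order = list(range(len(question["options"])))
    rng.shuffle(order)
    return {
        "prompt": question["prompt"],
        "options": [question["options"][i] for i in order],
        # New position of the originally correct option.
        "answer": order.index(question["answer"]),
    }


def run_round(questions, ask_model, rng, max_workers=8):
    """Shuffle every question, query the model in parallel, and return accuracy."""
    shuffled = [shuffle_options(q, rng) for q in questions]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        predictions = list(pool.map(ask_model, shuffled))  # preserves input order
    correct = sum(p == q["answer"] for p, q in zip(predictions, shuffled))
    return correct / len(questions)


def evaluate(questions, ask_model, rounds=3, seed=0, max_workers=8):
    """Repeat the shuffled evaluation several times and summarize the spread."""
    rng = random.Random(seed)
    scores = [run_round(questions, ask_model, rng, max_workers) for _ in range(rounds)]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": statistics.pstdev(scores),
    }


def oracle_model(question):
    """Stand-in for an LLM call: always picks the option that matches the key."""
    return question["options"].index(KEY[question["prompt"]])
```

Calling `evaluate(QUESTIONS, oracle_model, rounds=3)` returns the per-round accuracies together with their mean and standard deviation. A model that is sensitive to option position would show a lower mean or a larger spread across rounds, which is the kind of effect the post reports under the multi-round and shuffled settings.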

lianghsun's activity

upvoted a collection 2 months ago

Granite 3.1 Language Models

Collection

A series of language models with 128K context length, trained by IBM and released under the Apache 2.0 license. • 9 items • Updated Feb 24 • 59

upvoted a collection 4 months ago

chinese-dataset

Collection

Favorites • 9 items • Updated Mar 11, 2024 • 2

upvoted a paper 4 months ago

Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs

Paper • 2410.10739 • Published Oct 14, 2024 • 2

upvoted 2 collections 5 months ago

Taide

Collection

Synthetic dataset generated by Taide • 4 items • Updated Aug 30, 2024 • 1

MobileLLM

Collection

Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 111

upvoted 2 collections 6 months ago

Taiwan-Legal-Bench

Collection

This repository offers a dataset for evaluating legal models based on Taiwan’s laws, including legal questions, provisions, and case law. • 4 items • Updated Dec 9, 2024 • 1

Llama-3.2-Taiwan-Legal-SLM

Collection

Models based on lianghsun/Llama-3.2-Taiwan-*B, fine-tuned on datasets related to the laws and judgments of Taiwan. • 3 items • Updated Dec 9, 2024 • 1