Qiyuan Zhang

DonJoey

AI & ML interests

None yet

Recent Activity

authored a paper 13 days ago

NILE: Internal Consistency Alignment in Large Language Models

Organizations

None yet

DonJoey's activity

upvoted a collection 14 days ago

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 32 items • Updated 6 days ago • 64

upvoted a paper 17 days ago

MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions

Paper • 2410.02743 • Published Oct 3, 2024 • 7

upvoted a paper 19 days ago

RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response

Paper • 2412.14922 • Published 25 days ago • 85

upvoted a paper 20 days ago

NILE: Internal Consistency Alignment in Large Language Models

Paper • 2412.16686 • Published 22 days ago • 8

upvoted a paper 23 days ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 24 days ago • 339

upvoted a paper 26 days ago

Reliable, Reproducible, and Really Fast Leaderboards with Evalica

Paper • 2412.11314 • Published 28 days ago • 2

upvoted 2 papers about 1 month ago

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25, 2024 • 41

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

Paper • 2411.16594 • Published Nov 25, 2024 • 37

upvoted a paper about 2 months ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 58

upvoted 5 papers 3 months ago

Response Tuning: Aligning Large Language Models without Instruction

Paper • 2410.02465 • Published Oct 3, 2024 • 12

Pixtral 12B

Paper • 2410.07073 • Published Oct 9, 2024 • 63

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Paper • 2410.04698 • Published Oct 7, 2024 • 13

RevisEval: Improving LLM-as-a-Judge via Response-Adapted References

Paper • 2410.05193 • Published Oct 7, 2024 • 13

Collaborative Performance Prediction for Large Language Models

Paper • 2407.01300 • Published Jul 1, 2024 • 2