lm-human-preference-details

lm-human-preference-details's activity

vwxyzjn

authored a paper 11 months ago

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Paper • 2402.03046 • Published Feb 5, 2024 • 6

vwxyzjn

authored 2 papers about 1 year ago

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 123

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Paper • 2310.00036 • Published Sep 29, 2023 • 2

vwxyzjn

updated a Space about 1 year ago

Rlhf Demo

vwxyzjn

updated 16 models about 1 year ago

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed1

Text Generation • Updated Oct 6, 2023 • 21

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed5

Text Generation • Updated Oct 6, 2023 • 21

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed3

Text Generation • Updated Oct 6, 2023 • 21

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed4

Text Generation • Updated Oct 6, 2023 • 21

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed2

Text Generation • Updated Oct 6, 2023 • 21

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed3

Text Generation • Updated Oct 6, 2023 • 23

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed5

Text Generation • Updated Oct 6, 2023 • 23

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed2

Text Generation • Updated Oct 6, 2023 • 19

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed4

Text Generation • Updated Oct 6, 2023 • 27

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2__sentiment_offline_5k.json__seed5

Text Generation • Updated Oct 6, 2023 • 23

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2__sentiment_offline_5k.json__seed2

Text Generation • Updated Oct 5, 2023 • 24

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2__sentiment_offline_5k.json__seed4

Text Generation • Updated Oct 5, 2023 • 22

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2__sentiment_offline_5k.json__seed3

Text Generation • Updated Oct 5, 2023 • 23

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2__sentiment_offline_5k.json__seed1

Text Generation • Updated Oct 5, 2023 • 22

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__descriptiveness_offline_5k.json__seed1

Text Generation • Updated Oct 5, 2023 • 21

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__descriptiveness_offline_5k.json__seed3

Text Generation • Updated Oct 5, 2023 • 20