Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress • Paper • arXiv:2408.14960 • Published Aug 27, 2024
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning • Paper • arXiv:2410.10801 • Published Oct 14, 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge • Paper • arXiv:2411.19799 • Published Nov 29, 2024
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation • Paper • arXiv:2412.03304 • Published Dec 4, 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models • Paper • arXiv:2406.03368 • Published Jun 5, 2024
Bridging the Data Provenance Gap Across Text, Speech and Video • Paper • arXiv:2412.17847 • Published Dec 19, 2024
Post: Tried my hand at simplifying the derivations of Direct Preference Optimization. I cover how one can reformulate RLHF into DPO; the idea of implicit reward modeling is chef's kiss. Blog: https://huggingface.co/blog/ariG23498/rlhf-to-dpo
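For context, the central result that an RLHF-to-DPO derivation arrives at (following the original DPO paper, Rafailov et al., 2023; the linked blog's notation may differ) is that the RLHF reward can be expressed implicitly through the policy itself:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)} + \beta \log Z(x)
$$

Here $\pi_\theta$ is the policy being trained, $\pi_{\text{ref}}$ is the frozen reference policy, $\beta$ is the KL-penalty coefficient, and $Z(x)$ is an intractable partition function. Because $\beta \log Z(x)$ depends only on the prompt $x$, it cancels in the pairwise comparison between a preferred completion $y_w$ and a dispreferred one $y_l$, leaving a simple classification loss with no explicit reward model:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]
$$

where $\sigma$ is the logistic function. This cancellation is the "implicit reward modeling" idea the post refers to.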