The infrastructure powering IBM's Gen AI model development Paper • 2407.05467 • Published Jul 7, 2024 • 2
FlexAttention for Efficient High-Resolution Vision-Language Models Paper • 2407.20228 • Published Jul 29, 2024 • 1
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 23
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 26, 2024 • 48
Autonomous Tree-search Ability of Large Language Models Paper • 2310.10686 • Published Oct 14, 2023 • 2
SALMON: Self-Alignment with Principle-Following Reward Models Paper • 2310.05910 • Published Oct 9, 2023 • 2
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention Paper • 2304.03282 • Published Apr 6, 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision Paper • 2305.03047 • Published May 4, 2023 • 1
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training Paper • 2306.17165 • Published Jun 29, 2023 • 1
Gated Linear Attention Transformers with Hardware-Efficient Training Paper • 2312.06635 • Published Dec 11, 2023 • 6
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble Paper • 2401.16635 • Published Jan 30, 2024 • 1
Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models Paper • 2401.10716 • Published Jan 19, 2024 • 1
Diversity Measurement and Subset Selection for Instruction Tuning Datasets Paper • 2402.02318 • Published Feb 4, 2024 • 2
API Pack: A Massive Multilingual Dataset for API Call Generation Paper • 2402.09615 • Published Feb 14, 2024