No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces Paper • 2502.04959 • Published 3 days ago • 7
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 3 days ago • 20
Article Fine-tuning LLMs with Singular Value Decomposition By fractalego • Jun 2, 2024 • 11
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders Paper • 2501.18052 • Published 12 days ago • 6
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published 11 days ago • 25
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Paper • 2411.13543 • Published Nov 20, 2024 • 18
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation Paper • 2410.18565 • Published Oct 24, 2024 • 46
Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference Paper • 2312.10193 • Published Dec 15, 2023 • 1
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8, 2024 • 70
Approximating Two-Layer Feedforward Networks for Efficient Transformers Paper • 2310.10837 • Published Oct 16, 2023 • 11