ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance Paper • 2504.08716 • Published 9 days ago • 9
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 11 days ago • 70
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 12 days ago • 101
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 12 days ago • 144
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 13 days ago • 163
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Paper • 2504.02605 • Published 17 days ago • 43