REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark Paper • 2406.11927 • Published Jun 17, 2024 • 11
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models Paper • 2406.12649 • Published Jun 18, 2024 • 16
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 17, 2024 • 34
🏟️ Long Code Arena Collection All the resources for our Long Code Arena benchmark! • 13 items • Updated Jun 19, 2024 • 5
Long Code Arena: a Set of Benchmarks for Long-Context Code Models Paper • 2406.11612 • Published Jun 17, 2024 • 25