Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators Paper • 2503.19877 • Published Mar 2025
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models Paper • 2410.17578 • Published Oct 23, 2024 • 1
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Paper • 2406.05761 • Published Jun 9, 2024 • 2
Knowledge Unlearning for Mitigating Privacy Risks in Language Models Paper • 2210.01504 • Published Oct 4, 2022
Gradient Ascent Post-training Enhances Language Model Generalization Paper • 2306.07052 • Published Jun 12, 2023
Can Large Language Models Infer and Disagree Like Humans? Paper • 2305.13788 • Published May 23, 2023
Stable Language Model Pre-training by Reducing Embedding Variability Paper • 2409.07787 • Published Sep 12, 2024
Diffusion Models Through a Global Lens: Are They Culturally Inclusive? Paper • 2502.08914 • Published Feb 13, 2025
When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts Paper • 2503.16826 • Published Mar 2025
Can LVLMs and Automatic Metrics Capture Underlying Preferences of Blind and Low-Vision Individuals for Navigational Aid? Paper • 2502.14883 • Published Feb 15, 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions Paper • 2503.13369 • Published Mar 2025 • 7
FedRand: Enhancing Privacy in Federated Learning with Randomized LoRA Subparameter Updates Paper • 2503.07216 • Published Mar 10, 2025 • 31
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap Paper • 2309.12382 • Published Sep 21, 2023
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis Paper • 1904.01906 • Published Apr 3, 2019
Cream: Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models Paper • 2305.15080 • Published May 24, 2023
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation Paper • 2401.06591 • Published Jan 12, 2024 • 4