Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators Paper • 2503.19877 • Published Mar 2025
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models Paper • 2410.17578 • Published Oct 23, 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Paper • 2406.05761 • Published Jun 9, 2024
Knowledge Unlearning for Mitigating Privacy Risks in Language Models Paper • 2210.01504 • Published Oct 4, 2022
Gradient Ascent Post-training Enhances Language Model Generalization Paper • 2306.07052 • Published Jun 12, 2023
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Paper • 2503.05179 • Published Mar 7, 2025
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published Feb 24, 2025
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published Jan 10, 2025
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation Paper • 2412.10424 • Published Dec 10, 2024
Revisiting In-Context Learning with Long Context Language Models Paper • 2412.16926 • Published Dec 22, 2024
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024