Zhaorun committed · Commit ab92717 · verified · 1 Parent(s): d56fc81

Update README.md

Files changed (1):
  1. README.md +27 -2
 
pinned: false
---

# :woman_judge: [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?]()

<!-- <h3 align="center"><a href="https://arxiv.org/abs/2407.04842" style="color:#9C276A">
**MJ-BENCH**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?</a></h3> -->

<h5 align="center"> If our project helps you, please consider giving us a star ⭐ 🥹🙏 </h5>

<h5 align="center">

<div align="center">
<img src="https://github.com/MJ-Bench/MJ-Bench.github.io/blob/main/static/images/dataset_overview.png" width="80%">
</div>

[![project](https://img.shields.io/badge/🥳-Project-9C276A.svg)](https://mj-bench.github.io/)
[![hf_space](https://img.shields.io/badge/🤗-Huggingface-9C276A.svg)](https://huggingface.co/MJ-Bench)
[![hf_leaderboard](https://img.shields.io/badge/🤗-Leaderboard-9C276A.svg)](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard)
[![Dataset](https://img.shields.io/badge/🤗-Dataset-9C276A.svg)](https://huggingface.co/datasets/MJ-Bench/MJ-Bench)<br>
[![arXiv](https://img.shields.io/badge/Arxiv-2407.04842-AD1C18.svg?logo=arXiv)](https://arxiv.org/abs/2407.04842)
[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FMJ-Bench%2FMJ-Bench&count_bg=%23C25AE6&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visitor&edge_flat=false)](https://hits.seeyoufarm.com)
[![GitHub issues](https://img.shields.io/github/issues/MJ-Bench/MJ-Bench?color=critical&label=Issues)](https://github.com/MJ-Bench/MJ-Bench/issues)
[![GitHub Stars](https://img.shields.io/github/stars/MJ-Bench/MJ-Bench?style=social)](https://github.com/MJ-Bench/MJ-Bench/stargazers)
<br>

</h5>

Multimodal judges play a pivotal role in Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF), providing the crucial feedback that aligns foundation models (FMs) with desired behaviors. However, these judges are themselves rarely evaluated thoroughly, which can lead to misalignment and unsafe fine-tuning outcomes.

To address this, we introduce **MJ-Bench**, a novel benchmark designed to evaluate multimodal judges using a comprehensive preference dataset. MJ-Bench assesses feedback for image generation models across four key perspectives: ***alignment***, ***safety***, ***image quality***, and ***bias***.
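
To make the evaluation setup concrete, here is a minimal, hypothetical sketch of the core measurement on MJ-Bench-style preference data: a judge assigns a scalar reward to each candidate image, and we check how often it ranks the chosen image above the rejected one. The function name and the toy reward values below are illustrative assumptions, not part of the MJ-Bench codebase.

```python
# Hypothetical sketch: scoring a multimodal judge on preference pairs.
# On preference data, a good judge should assign a higher reward to the
# chosen image than to the rejected one for the same prompt.

def preference_accuracy(pairs):
    """pairs: list of (reward_chosen, reward_rejected) floats from a judge.

    Returns the fraction of pairs where the judge prefers the chosen image.
    """
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)

# Toy rewards from a hypothetical judge (values invented for illustration):
pairs = [(0.92, 0.31), (0.40, 0.55), (0.77, 0.20), (0.66, 0.10)]
print(preference_accuracy(pairs))  # 3 of 4 pairs ranked correctly -> 0.75
```

In the benchmark itself, such an accuracy would be computed separately for each perspective (alignment, safety, image quality, bias) rather than pooled.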