Software Engineering Arena

community

AI & ML interests

Model Evaluation

Recent Activity

zhiminy  updated a dataset 5 days ago
SE-Arena/conversations
zhiminy  updated a dataset 5 days ago
SE-Arena/votes
zhiminy  updated a Space 5 days ago
SE-Arena/Software-Engineering-Arena

SE-Arena's activity

zhiminy 
posted an update 5 days ago
# 🚀 SE Arena: Evaluating Foundation Models for Software Engineering

**SE Arena** is the first open-source platform for evaluating foundation models in real-world software engineering workflows.

## What makes it unique?

- **RepoChat**: Automatically injects repository context (issues, commits, PRs) into conversations for more realistic evaluations
- **Multi-round interactions**: Tests models through iterative workflows, not just single prompts
- **Novel metrics**: Includes a "consistency score" that measures model determinism through self-play matches
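
To make the "consistency score" idea concrete, here is a minimal sketch in Python. It assumes one plausible reading of the metric: a model is paired against itself, and the score is the share of those self-play matches that voters judge a draw. The `SelfPlayMatch` record, its `outcome` values, and the `consistency_score` function are hypothetical names for illustration, not SE Arena's actual schema or code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SelfPlayMatch:
    """One match where the same model answered on both sides (hypothetical record)."""
    model: str
    outcome: str  # assumed values: "draw" or "not_draw"


def consistency_score(matches: Iterable[SelfPlayMatch], model: str) -> Optional[float]:
    """Share of a model's self-play matches judged a draw.

    Assumption: a draw means the two outputs were judged equivalent, so a
    higher share suggests more deterministic behavior. Returns None when
    the model has no self-play matches, since the score is undefined then.
    """
    own = [m for m in matches if m.model == model]
    if not own:
        return None
    draws = sum(1 for m in own if m.outcome == "draw")
    return draws / len(own)


if __name__ == "__main__":
    # Made-up records purely for illustration, not real SE Arena data.
    sample = [
        SelfPlayMatch("model-a", "draw"),
        SelfPlayMatch("model-a", "draw"),
        SelfPlayMatch("model-a", "not_draw"),
        SelfPlayMatch("model-a", "draw"),
    ]
    print(consistency_score(sample, "model-a"))
```

Returning `None` rather than `0.0` for a model with no self-play matches keeps "never tested" distinct from "never consistent".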

Try it now: SE-Arena/Software-Engineering-Arena

## Why it matters

Traditional evaluation frameworks don't capture how developers actually use models in their daily work. SE Arena creates a testing environment that mirrors real engineering workflows, helping you choose the right model for your specific software development needs.

From debugging to requirement refinement, see which models truly excel at software engineering tasks!
zhiminy 
posted an update 10 months ago
Hey everyone!

Our team just dropped something cool! 🎉 We've published a new paper on arXiv that digs into foundation model leaderboards across different platforms. We analyzed their content, operational workflows, and common issues, and from that we came up with two new concepts: Leaderboard Operations (LBOps) and leaderboard smells.

We also put together an awesome list with nearly 300 of the latest leaderboards, development tools, and publishing organizations. You can check it out here: https://github.com/SAILResearch/awesome-foundation-model-leaderboards

If you find it useful or interesting, give us a follow or drop a comment. We'd love to hear your thoughts and get your support! ✨

Link to the paper: https://arxiv.org/abs/2407.04065