xingyaoww committed on
Commit f6b86e1 · verified · 1 Parent(s): 5d12649

Update README.md

Files changed (1)
  1. README.md +4 -45
README.md CHANGED
@@ -26,7 +26,9 @@ Model/Data associated with Paper:
  </p>
 
  <p align="center">
- <a href="assets/paper.pdf">📃 Paper</a>
  •
  <a href="https://huggingface.co/SWE-Gym" >🤗 Data & Models</a>
  </p>
@@ -49,52 +51,9 @@ Our baselines achieve new open SOTA - 32%/26% on SWE-Bench Verified/Lite, with p
  ![SWE-Gym Scaling](https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/scaling.jpg)
  *SWE-Gym enables scalable improvements for software engineering agents at both training and inference time. Our current results are primarily bottlenecked by training and inference compute, rather than the size of our environment.*
 
- ## SWE-Gym Environment
-
- We create SWE-Gym, the first environment for training SWE agents, with **2.4K real tasks from 11 Python repos** & a Lite split of 234 instances. SWE-Gym combines real-world Python tasks, repository context, executable environments, and test verification to train agents for solving software engineering problems.
-
- ![SWE-Gym Repo Distribution](https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/swe-gym.jpg)
-
-
- ## SWE-Gym trains LMs as agents
-
- When fine-tuned on fewer than 500 agent-environment interaction trajectories sampled from SWE-Gym with GPT-4o and Claude 3.5 Sonnet, a 32B LM-powered OpenHands agent achieves **+14%** absolute gains on SWE-Bench Verified.
-
- ![OpenHands Performance diff before and after training](https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/oh-agent.jpg)
-
-
- ## SWE-Gym enables self-improvement
-
- SWE-Gym is also effective across agent scaffolds. With rejection sampling fine-tuning and the MoatlessTools scaffold, our 32B and 7B models achieve 20% and 10%, respectively, on SWE-Bench Lite through self-improvement.
-
- <p align="center">
- <img src="https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/ml-agent.jpg" width="80%" alt="Moatless self-improvement">
- </p>
-
-
-
- ## SWE-Gym enables inference-time scaling
-
- SWE-Gym enables inference-time scaling through verifiers trained on agent trajectories.
- These verifiers identify the most promising solutions via best-of-n selection; together with our learned agents, they achieve 32%/26% on SWE-Bench Verified/Lite, a new open SoTA.
-
-
- ![Inference Time Scaling for Moatless Agent](https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/inference-ml.jpg)
- *Inference Time Scaling for Moatless Agent*
-
- ![Inference Time Scaling for OpenHands Agent](https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/inference-oh.jpg)
- *Inference Time Scaling for OpenHands Agent*
-
-
- ## Our baselines on SWE-Gym show strong scaling trends
-
- Lastly, our ablations reveal strong scaling trends: performance is now bottlenecked by training and inference compute, rather than the size of our dataset. Pushing and improving these scaling trends further is an exciting direction for future work.
-
- ![](https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/scaling.jpg)
-
  ## Reproducing Results
 
- See [docs/OpenHands.md](docs/OpenHands.md) and [docs/MoatlessTools.md](docs/MoatlessTools.md) for instructions on reproducing our training and inference-time results for OpenHands and MoatlessTools agents.
 
  ## 📚 Citation
 
  </p>
 
  <p align="center">
+ <a href="https://github.com/SWE-Gym/SWE-Gym">💻 Code </a>
+ •
+ <a href="https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/paper.pdf">📃 Paper</a>
  •
  <a href="https://huggingface.co/SWE-Gym" >🤗 Data & Models</a>
  </p>
 
  ![SWE-Gym Scaling](https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/scaling.jpg)
  *SWE-Gym enables scalable improvements for software engineering agents at both training and inference time. Our current results are primarily bottlenecked by training and inference compute, rather than the size of our environment.*
 
  ## Reproducing Results
 
+ See [docs/OpenHands.md](https://github.com/SWE-Gym/SWE-Gym/tree/main/docs/OpenHands.md) and [docs/MoatlessTools.md](https://github.com/SWE-Gym/SWE-Gym/tree/main/docs/MoatlessTools.md) for instructions on reproducing our training and inference-time results for OpenHands and MoatlessTools agents.
 
  ## 📚 Citation