GUI-Agent

xlbqc's Collections

updated 3 days ago

A brief analysis of the approaches proposed in GUI agent papers

Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration

Paper • 2502.17110 • Published Feb 24 • 12

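Note Mobile, Alibaba, one-shot. Processes an operation video to extract key frames into a long image, which serves as the example for subsequent action reasoning. The reasoning process combines reasoning, reflection, and critique to improve the quality of the inferred actions.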
WebGames: Challenging General-Purpose Web-Browsing AI Agents

Paper • 2502.18356 • Published Feb 25 • 12

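Note benchmark, Web. Proposes a benchmark covering 40 web interaction tasks; recent models such as GPT-4o reach a success rate of around 40%. The tasks can be run locally. Each one focuses on a single, moderately complex web-page operation rather than on analyzing and satisfying vague human requests.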
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Paper • 2502.18906 • Published Feb 26 • 12

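Note RL, environment-free training, reward model. Training method: train a reward model on data annotated by GPT-4o, then use that reward model to train the action model.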
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

Paper • 2503.02268 • Published Mar 4 • 10

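Note memory, high-level actions, efficiency. During execution, frequently repeated operations are abstracted into high-level actions and recorded in memory; later runs can use these high-level actions to speed up GUI operation. The memory is stored in a Neo4j graph database.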
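The frequency-based abstraction described in the note above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `mine_high_level_actions` and `compress`, the action names, and the thresholds are all hypothetical, and a plain dict stands in for the graph database.

```python
from collections import Counter


def mine_high_level_actions(trajectories, length=2, min_count=2):
    """Return action sub-sequences of `length` steps seen at least `min_count` times."""
    counts = Counter()
    for steps in trajectories:
        # Slide a fixed-size window over each recorded trajectory.
        for i in range(len(steps) - length + 1):
            counts[tuple(steps[i:i + length])] += 1
    return {seq: n for seq, n in counts.items() if n >= min_count}


def compress(steps, memory):
    """Greedily replace remembered sub-sequences with one high-level action each."""
    out, i = [], 0
    lengths = sorted({len(seq) for seq in memory}, reverse=True)
    while i < len(steps):
        for n in lengths:
            chunk = tuple(steps[i:i + n])
            if chunk in memory:
                out.append("+".join(chunk))  # name of the merged action
                i += n
                break
        else:
            # No remembered sub-sequence starts here; keep the low-level step.
            out.append(steps[i])
            i += 1
    return out


# Hypothetical recorded trajectories (assumed action names).
trajectories = [
    ["open_app", "tap_search", "type_query", "tap_result"],
    ["open_app", "tap_search", "type_query", "scroll"],
    ["open_settings", "tap_wifi"],
]
memory = mine_high_level_actions(trajectories, length=3, min_count=2)
plan = compress(["open_app", "tap_search", "type_query", "tap_result"], memory)
```

Here the three-step prefix shared by the first two trajectories is promoted to a single high-level action, so the four-step task is planned in two steps.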
API Agents vs. GUI Agents: Divergence and Convergence

Paper • 2503.11069 • Published Mar 14 • 35

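Note API Agents, GUI Agents. Compares the strengths and weaknesses of API agents and GUI agents, and suggests that hybrid-strategy agents may be the future direction. The same logic applies to people: IT systems exist to encapsulate logic in the back end and reduce manual operations in the front end. For some highly individual needs, however, people can still combine several different operations into one workflow on their own.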
STEVE: A Step Verification Pipeline for Computer-use Agent Training

Paper • 2503.12532 • Published Mar 16 • 14

Note UI-Agent training. Describes how the data is collected and how the model is trained (LoRA, choice of base model).
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published 22 days ago • 58

Note Trains a GUI model with reinforcement learning.
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Paper • 2502.14282 • Published Feb 20 • 20

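Note GUI-Agent, PC-Agent. 1. Perception enhancement (APM): uses pywinauto and OCR to improve perception (similar to ARIA, A11y, and bounding boxes drawn on images). 2. Proposes a hierarchical planning framework: instruction, subtask, action.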
Automating the Enterprise with Foundation Models

Paper • 2405.03710 • Published May 3, 2024 • 1

Note human guidance. Generates step-by-step SOPs for the large model from human operation demonstrations.
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published 7 days ago • 24

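Note benchmark for GUI agent evaluators. A benchmark for assessing the evaluators (LLM-as-a-judge) that score how well GUI agents execute their tasks.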
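One way to picture what such a benchmark measures is to compare a judge's success verdicts with human labels on the same trajectories. The sketch below is an assumption for illustration only: the function name `judge_agreement`, the choice of metrics, and the sample labels are hypothetical and are not taken from the paper.

```python
def judge_agreement(human, judge):
    """Compare judge verdicts with human labels (True means the task succeeded)."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if h and j)        # judge and human both say success
    fp = sum(1 for h, j in pairs if not h and j)    # judge says success, human disagrees
    fn = sum(1 for h, j in pairs if h and not j)    # judge misses a real success
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": sum(1 for h, j in pairs if h == j) / len(pairs),
    }


# Hypothetical labels for five trajectories.
human_labels = [True, True, False, False, True]
judge_labels = [True, False, True, False, True]
scores = judge_agreement(human_labels, judge_labels)
```

A judge with low precision would credit agents with successes that human annotators reject, which is the kind of failure a benchmark for evaluators is meant to expose.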
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Paper • 2504.10127 • Published 4 days ago • 15

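Note LAM training. Adds a mid-training stage between the model's initial training and fine-tuning on GUI trajectories. The data for this stage mainly targets capabilities needed during GUI navigation (for example tables and charts), with some GUI trajectories mixed in to prevent gradient explosion later on. My understanding: the mid-training stage strengthens the capabilities the model needs for GUI navigation and so improves performance.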