arxiv:2502.18528

ARACNE: An LLM-Based Autonomous Shell Pentesting Agent

Published on Feb 24

Authors:

Abstract

We introduce ARACNE, a fully autonomous LLM-based pentesting agent tailored for SSH services that can execute commands on real Linux shell systems. ARACNE introduces a new agent architecture with support for multiple LLMs. Experiments show that ARACNE reaches a 60% success rate against the autonomous defender ShelLM and a 57.58% success rate on the Over The Wire Bandit CTF challenges, improving over the state of the art. When the agent succeeded, it needed fewer than 5 actions on average to accomplish its goals. The results show that using multiple LLMs is a promising approach to increasing the accuracy of the agent's actions.
