# SandboxLM
SandboxLM is a language model built on the GPT-2 architecture and fine-tuned on a carefully curated synthetic dataset. It was created to act as a security advisor for AI agents that use shell commands, helping them operate securely by identifying potentially harmful commands. SandboxLM aims to improve the safety and security of AI-driven shell operations. It was inspired by the author's need to complement a tool like AGit.
This is a preview release of SandboxLM: while immediately useful, it is not yet ready for "production". Feedback is welcome.
## Model Description
SandboxLM is built on the GPT-2 architecture, a Transformer-based language model. The model has been fine-tuned on a dataset designed to help identify and classify shell commands as either safe or potentially dangerous. This makes it suitable for security advisory tasks, particularly in environments where AI agents are used to execute shell commands.
Attention has been given to make it immediately useful:
- SandboxLM is optimized for CPU inference, as the author uses a 2019 Intel MacBook for his local work.
- SandboxLM was trained to output JSON for maximum interoperability.
- Effort was made to train it on many permutations of different shell commands to improve generalization (however, no guarantee is made).
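Since the model is meant to emit JSON, a caller may want to check that a generated verdict actually has the expected shape before trusting it. A minimal sketch, assuming the field names (`risk`, `explain`) shown in the usage example below; the helper itself is hypothetical, not part of the model:

```python
import json

# Field names are taken from the model card's usage example;
# this validation helper is an illustrative assumption.
EXPECTED_FIELDS = {"risk", "explain"}

def is_valid_verdict(raw: str) -> bool:
    """Return True if `raw` parses as a JSON object with the expected verdict fields."""
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(verdict, dict) and EXPECTED_FIELDS <= verdict.keys()

print(is_valid_verdict('{"risk": "dangerous", "explain": "Deletes the root folder."}'))  # True
print(is_valid_verdict("not json"))  # False
```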
## Use At Your Own Risk
The products/services/information provided herein are offered on an "as is" and "as available" basis, without any warranties or representations, express or implied. The user assumes all responsibility and risk for the use of these products/services/information. We do not guarantee the accuracy, completeness, or usefulness of any information provided and expressly disclaim all liability for any damages or losses arising from their use. By utilizing these products/services/information, you acknowledge and agree that you do so entirely at your own risk.
## Usage
To use this model, install the `transformers` library and load the model and tokenizer as follows:
```python
from transformers import pipeline
import json

sboxlm = pipeline('text-generation', model='sivang/sandboxlm', tokenizer='sivang/sandboxlm', temperature=0.1)

json.loads(sboxlm("command: rm -rf /")[0]['generated_text'].split("verdict:")[1])
# {'risk': 'dangerous',
#  'explain': 'Deletes the root folder on the system, rendering it unusable and all data is lost.'}
```
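The extraction step above can be wrapped in a small helper. A minimal sketch, assuming the model echoes the prompt and appends `verdict:` followed by a JSON object, as in the example above; `parse_verdict` is a hypothetical convenience function, not part of the model card's API:

```python
import json

def parse_verdict(generated_text: str) -> dict:
    """Extract the JSON verdict from SandboxLM's raw generated text.

    Assumes the raw output contains 'verdict:' followed by a JSON object,
    matching the format shown in the usage example.
    """
    _, _, tail = generated_text.partition("verdict:")
    return json.loads(tail)

# Simulated raw output in the documented format:
raw = 'command: rm -rf / verdict: {"risk": "dangerous", "explain": "Deletes the root folder."}'
print(parse_verdict(raw)["risk"])  # dangerous
```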
## Limitations
While SandboxLM performs relatively well at detecting potentially harmful shell commands (and can make surprisingly accurate predictions even when the explanation it provides seems to hallucinate!), it may not catch all edge cases or obscure security risks. It should not be relied upon solely for mission-critical systems. It is recommended to combine it with other security measures to ensure the safety of shell operations. Additionally, since it was trained on specific datasets, it may reflect any biases present in those datasets.
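One way to combine the model with another security measure, as recommended above, is to require both a "safe" verdict from the model and a pass against a static pattern denylist. A hedged sketch; the patterns and the `is_allowed` helper are illustrative assumptions, not a complete safeguard:

```python
import re

# Illustrative denylist of obviously destructive patterns; a real
# deployment would need a far more thorough policy.
DENYLIST = [
    r"\brm\s+-rf\s+/",     # recursive delete starting at root
    r"\bmkfs(\.\w+)?\b",   # filesystem formatting
    r">\s*/dev/sd",        # writing directly to a block device
]

def is_allowed(command: str, model_risk: str) -> bool:
    """Allow a command only if the model verdict is 'safe' AND no denylist pattern matches."""
    if model_risk != "safe":
        return False
    return not any(re.search(pattern, command) for pattern in DENYLIST)

print(is_allowed("ls -la", "safe"))        # True
print(is_allowed("rm -rf /", "dangerous")) # False
```

This way a false "safe" verdict from the model is still caught by the static check, and a denylist miss is still caught by the model.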
Copyright (c) Sivan Grünberg Vitakka Consulting