metadata
base_model: mistralai/Mistral-7B-Instruct-v0.2
library_name: peft
datasets:
- tumeteor/Security-TTP-Mapping
language:
- en
Model Card for rootxhacker/mistralai-7B-attack2ttp
This model is a fine-tune of Mistral-7B-Instruct-v0.2 that takes an attack scenario as input and outputs the techniques used by the attacker.
Model Details
Model Description
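The model is a LoRA adapter, loaded with the peft library on top of mistralai/Mistral-7B-Instruct-v0.2 and fine-tuned on the tumeteor/Security-TTP-Mapping dataset. Given an English description of an attack scenario, it generates the tactics, techniques and procedures (TTPs) used by the attacker.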
- Developed by: Harish Santhanalakshmi Ganesan
- Funded by: None
- Shared by: None
- Model type: LLM (LoRA adapter loaded with PEFT)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2
Model Sources
- Repository: [More Information Needed]
- Paper: [More Information Needed]
- Demo: [More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "rootxhacker/mistralai-7B-attack2ttp"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 4-bit (requires the bitsandbytes package) and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_4bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

def get_completion(query: str, model, tokenizer) -> str:
    # The prompt wording is kept verbatim, including its spelling.
    prompt_template = """
here is intruction you need to map Attack scenario with TTPs
### Question:
{query}
### Answer:
"""
    prompt = prompt_template.format(query=query)
    encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    # Move the inputs to the device the model was placed on instead of hard-coding "cuda:0".
    model_inputs = encodeds.to(model.device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
    # The decoded string contains the prompt followed by the generated answer.
    decoded = tokenizer.batch_decode(generated_ids)
    return decoded[0]

# Load the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, peft_model_id)
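Because get_completion decodes the full generated sequence, the returned string contains the prompt as well as the answer, along with special tokens such as `<s>` and `</s>`. The following is a minimal sketch of how the answer part could be separated out. The helper name `extract_answer` and the sample string are hypothetical and are not part of the original model card; the sample is a made-up illustration, not real model output.

```python
ANSWER_MARKER = "### Answer:"  # marker taken from the prompt template above
BOS_TOKEN = "<s>"              # Mistral beginning-of-sequence token
EOS_TOKEN = "</s>"             # Mistral end-of-sequence token


def extract_answer(decoded: str, marker: str = ANSWER_MARKER) -> str:
    """Return the text that follows the first answer marker (hypothetical helper)."""
    _, found, tail = decoded.partition(marker)
    if not found:
        # Marker missing: fall back to the whole decoded string.
        tail = decoded
    # Keep only the text before the first end-of-sequence token, if any.
    tail = tail.split(EOS_TOKEN, 1)[0]
    # Drop a leftover beginning-of-sequence token and surrounding whitespace.
    return tail.replace(BOS_TOKEN, "").strip()


# Made-up decoded string, shaped like the prompt template plus an answer.
sample = (
    "<s> \nhere is intruction you need to map Attack scenario with TTPs\n"
    "### Question:\nThe attacker sent an email with a malicious attachment.\n"
    "### Answer:\nSpearphishing Attachment</s>"
)
print(extract_answer(sample))
```

With the loaded model, the same helper would be applied as `extract_answer(get_completion(query, model, tokenizer))`. Since sampling is enabled in `generate`, the answer text can differ between runs.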
Training Details
Training Data
The model was fine-tuned on the tumeteor/Security-TTP-Mapping dataset: https://huggingface.co/datasets/tumeteor/Security-TTP-Mapping
Citation
@inproceedings{nguyen-srndic-neth-ttpm,
title = "Noise Contrastive Estimation-based Matching Framework for Low-resource Security Attack Pattern Recognition",
author = "Nguyen, Tu and Šrndić, Nedim and Neth, Alexander",
booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics",
month = mar,
year = "2024",
publisher = "Association for Computational Linguistics",
abstract = "Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain, described encyclopedically in textual knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP mapping, is an important and challenging task. Conventional learning approaches often target the problem in the classical multi-class or multilabel classification setting. This setting hinders the learning ability of the model due to a large number of classes (i.e., TTPs), the inevitable skewness of the label distribution and the complex hierarchical structure of the label space. We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two, thus reducing the complexity of competing solely over the large labeling space. To that end, we propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism, facilitating the learning process of the matching model despite constrained resources.",
}