metadata

base_model: mistralai/Mistral-7B-Instruct-v0.2
library_name: peft
datasets:
  - tumeteor/Security-TTP-Mapping
language:
  - en

Model Card for rootxhacker/mistralai-7B-attack2ttp

This model is built on Mistral-7B. It takes an attack scenario as input and outputs the techniques used by the attacker.

Model Details

Model Description

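This model is a LoRA adapter (trained with PEFT) for mistralai/Mistral-7B-Instruct-v0.2, fine-tuned on the tumeteor/Security-TTP-Mapping dataset. Given a textual description of an attack scenario, it generates the techniques (TTPs) used by the attacker.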

Developed by: Harish Santhanalakshmi Ganesan
Funded by: None
Shared by: None
Model type: Causal language model (LoRA adapter, PEFT)
Language(s) (NLP): English
License: Apache 2.0
Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2

Model Sources

Repository: [More Information Needed]
Paper: [More Information Needed]
Demo: [More Information Needed]


Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model


import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "rootxhacker/mistralai-7B-attack2ttp"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_4bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

def get_completion(query: str, model, tokenizer) -> str:
  # Use the device chosen by device_map='auto' rather than hardcoding "cuda:0"
  device = model.device

  prompt_template = """
  here is intruction you need to map Attack scenario with TTPs
  ### Question:
  {query}

  ### Answer:
  """
  prompt = prompt_template.format(query=query)

  encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)

  model_inputs = encodeds.to(device)


  generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
  decoded = tokenizer.batch_decode(generated_ids)
  return (decoded[0])

# Load the LoRA adapter on top of the base model

model = PeftModel.from_pretrained(model, peft_model_id)
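
Because `get_completion` decodes the full sequence, its return value contains the prompt as well as the generated answer. The helpers below are a minimal sketch of one way to post-process that string. They are not part of the original card: `extract_answer` and `extract_technique_ids` are hypothetical names, the `</s>` end-of-sequence token is assumed from the Mistral tokenizer, and it is an assumption that the model's answer mentions MITRE ATT&CK technique IDs (such as T1059 or T1059.001) at all; it may return technique names instead.

```python
import re

ANSWER_MARKER = "### Answer:"
# MITRE ATT&CK technique IDs look like T1059, sub-techniques like T1059.001
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_answer(decoded: str, eos_token: str = "</s>") -> str:
    """Return the text after the first answer marker, without the EOS token.

    Falls back to the whole string if the marker is not present.
    """
    if ANSWER_MARKER in decoded:
        decoded = decoded.partition(ANSWER_MARKER)[2]
    return decoded.replace(eos_token, "").strip()


def extract_technique_ids(answer: str) -> list[str]:
    """Collect unique technique IDs in order of first appearance."""
    return list(dict.fromkeys(TECHNIQUE_ID.findall(answer)))
```

With the model loaded as shown above, a call might look like `answer = extract_answer(get_completion(scenario, model, tokenizer))`, where `scenario` is any attack description string. Since generation uses `do_sample=True`, the answer can differ between runs.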


Training Details

Training Data

The adapter was fine-tuned on the tumeteor/Security-TTP-Mapping dataset: https://huggingface.co/datasets/tumeteor/Security-TTP-Mapping


Citation

@inproceedings{nguyen-srndic-neth-ttpm,
    title = "Noise Contrastive Estimation-based Matching Framework for Low-resource Security Attack Pattern Recognition",
    author = "Nguyen, Tu and Šrndić, Nedim and Neth, Alexander",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics",
    month = mar,
    year = "2024",
    publisher = "Association for Computational Linguistics",
    abstract = "Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain, described encyclopedically in textual knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP mapping, is an important and challenging task. Conventional learning approaches often target the problem in the classical multi-class or multilabel classification setting. This setting hinders the learning ability of the model due to a large number of classes (i.e., TTPs), the inevitable skewness of the label distribution and the complex hierarchical structure of the label space. We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two, thus reducing the complexity of competing solely over the large labeling space. To that end, we propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism, facilitating the learning process of the matching model despite constrained resources.",
}