guidobenb committed on
Commit
674c62c
1 Parent(s): 609e127

Update README.md

Files changed (1)
  1. README.md +33 -3
README.md CHANGED
@@ -33,18 +33,48 @@ It achieves the following results on the evaluation set:
33
 
34
  ## Model description
35
 
36
- More information needed
37
 
38
  ## Intended uses & limitations
39
 
40
- More information needed
41
 
42
  ## Training and evaluation data
43
 
44
- More information needed
45
 
46
  ## Training procedure
47
 
48
  ### Training hyperparameters
49
 
50
  The following hyperparameters were used during training:
 
33
 
34
  ## Model description
35
 
36
+ VERISBERTA is a language model designed to support threat intelligence analysis for critical infrastructures.
37
+ It specializes in interpreting security incident narratives, using domain-specific vocabulary learned from real incident data extracted from
38
+ the VERIS Community Database (VCDB), a public incident repository maintained by Verizon.
39
+
40
+ The model is based on DarkBERT and has been fine-tuned on VCDB data to identify key entities and terms.
41
+ VERISBERTA aims to be a useful tool for cybersecurity professionals, facilitating the collection and analysis of
42
+ threat intelligence data for critical infrastructures.
43
 
44
  ## Intended uses & limitations
45
+ The model performs named entity recognition (NER) on cybersecurity incident narratives, using the VERIS vocabulary (Vocabulary for Event Recording
46
+ and Incident Sharing) and its 4A categories (actor, asset, action and attribute). It is based on the BERT architecture and has been pre-trained on a corpus
47
+ prepared specifically for this work from narratives extracted from VCDB, which helps it capture VERIS terminology and the characteristics of this
48
+ domain. On the evaluation set, the model reaches an accuracy of 0.88.
49
+
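As an illustration of how token-level NER output maps onto the 4A categories, the sketch below groups BIO-tagged tokens into entity spans. The label names (`B-ACTOR`, `I-ASSET`, and so on), the example sentence, and the helper `group_entities` are assumptions made for illustration; the actual label set is whatever the model's configuration defines.

```python
# Hypothetical BIO label set built from the four VERIS "4A" categories.
# The real model may use different label names.
CATEGORIES = ["ACTOR", "ACTION", "ASSET", "ATTRIBUTE"]
LABELS = ["O"] + [f"{prefix}-{cat}" for cat in CATEGORIES for prefix in ("B", "I")]


def group_entities(tokens, tags):
    """Merge BIO-tagged tokens into (category, text) spans."""
    spans = []
    current_cat, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, cat = tag.partition("-")
        if prefix == "I" and cat == current_cat:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
            continue
        # Any other tag closes the open entity, if there is one.
        if current_cat is not None:
            spans.append((current_cat, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # "B-" starts a new entity; a stray "I-" is treated the same way.
            current_cat, current_tokens = cat, [token]
        else:
            current_cat, current_tokens = None, []
    if current_cat is not None:
        spans.append((current_cat, " ".join(current_tokens)))
    return spans


tokens = ["An", "external", "attacker", "used", "phishing", "against", "a", "mail", "server"]
tags = ["O", "B-ACTOR", "I-ACTOR", "O", "B-ACTION", "O", "O", "B-ASSET", "I-ASSET"]
entities = group_entities(tokens, tags)
print(entities)
```

With a real checkpoint, the tags would come from the model's per-token predictions rather than a hand-written list; the grouping step stays the same.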
50
+ ## Future lines of work
51
 
52
+ Several techniques could be explored to improve the performance of the NER model, such as more advanced text preprocessing or
53
+ combining it with other machine learning models. The VERIS vocabulary could be expanded to include new named entities relevant to the analysis of cybersecurity
54
+ incidents. The model could also be extended to new tasks, such as text classification to identify which CIA (confidentiality, integrity, availability) attributes are affected in an incident narrative, by evaluating other models available on the Hugging Face Hub that are better suited to this type of problem.
55
 
56
  ## Training and evaluation data
57
 
58
+ The VERIS Community Database (VCDB) is a free, public repository of publicly disclosed security incidents encoded in VERIS format. The dataset covers
59
+ a wide range of real-world incidents, including malware attacks, intrusions, data breaches, and denial-of-service (DoS) attacks,
60
+ which can help cyber threat intelligence (CTI) teams better understand current and emerging threats.
61
+ The VCDB can be used to analyze trends in security incidents, such as the most common types of attacks, threat actors, and
62
+ targeted sectors. It can also be used to train threat intelligence models that help identify and prevent security
63
+ incidents, which is the purpose of this work.
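To make the trend-analysis use case concrete, here is a minimal sketch that tallies action categories across VERIS-style records and pulls out the narrative text. The two records are invented and heavily simplified, and `count_categories` is a hypothetical helper; the field names (`summary`, `actor`, `action`, `asset`, `attribute`) follow the general shape of VERIS JSON, but the exact schema should be checked against the VERIS documentation.

```python
import json
from collections import Counter

# Invented, heavily simplified records in the general shape of VERIS JSON.
RAW_RECORDS = """
[
  {"summary": "Phishing email led to stolen credentials on a mail server.",
   "actor": {"external": {}},
   "action": {"social": {}, "hacking": {}},
   "asset": {"assets": [{"variety": "S - Mail"}]},
   "attribute": {"confidentiality": {}}},
  {"summary": "Ransomware encrypted a file server and halted operations.",
   "actor": {"external": {}},
   "action": {"malware": {}},
   "asset": {"assets": [{"variety": "S - File"}]},
   "attribute": {"availability": {}, "integrity": {}}}
]
"""


def count_categories(records, field):
    """Count the top-level keys under one field (e.g. actor, action, attribute)."""
    counts = Counter()
    for record in records:
        counts.update(record.get(field, {}).keys())
    return counts


records = json.loads(RAW_RECORDS)
action_counts = count_categories(records, "action")
narratives = [record["summary"] for record in records]

print(action_counts.most_common())
print(narratives[0])
```

The `summary` strings are the kind of free-text narrative a NER model would be trained on, while the structured fields supply the categories used for trend counts.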
64
 
65
  ## Training procedure
66
 
67
+ trainer = Trainer(
68
+     model,
69
+     args,
70
+     train_dataset=tokenized_datasets["train"],
71
+     eval_dataset=tokenized_datasets["test"],
72
+     data_collator=data_collator,
73
+     tokenizer=tokenizer,
74
+     compute_metrics=compute_metrics
75
+ )
76
+ trainer.train()
77
+
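The `compute_metrics` callback passed to `Trainer` is not shown in the README. As a rough sketch of how a token-level accuracy such as the reported 0.88 could be computed, the function below takes per-token logits and gold label ids and skips positions labelled `-100`, the value conventionally used to mask special tokens and sub-word continuations. It is written in plain Python for readability; the function name `token_accuracy` and the toy inputs are assumptions, not the author's code.

```python
IGNORE_INDEX = -100  # label id conventionally used to mask tokens from the loss


def token_accuracy(logits, labels, ignore_index=IGNORE_INDEX):
    """Return the fraction of non-masked tokens whose argmax matches the label.

    logits: per sentence, per token, a list of scores (one per label id)
    labels: per sentence, per token, the gold label id or ignore_index
    """
    correct = 0
    total = 0
    for sentence_logits, sentence_labels in zip(logits, labels):
        for scores, gold in zip(sentence_logits, sentence_labels):
            if gold == ignore_index:
                continue  # special token or sub-word continuation
            predicted = max(range(len(scores)), key=lambda i: scores[i])
            correct += int(predicted == gold)
            total += 1
    return correct / total if total else 0.0


# Toy batch: 2 sentences, 3 tokens each, 3 label ids.
logits = [
    [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.2, 0.2, 0.6]],
    [[0.3, 0.3, 0.4], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
]
labels = [
    [-100, 0, 2],
    [2, 1, -100],
]
print(token_accuracy(logits, labels))
```

Inside a real `compute_metrics` function, the logits and labels would come from the evaluation prediction object that `Trainer` passes in. Entity-level scores (for example via the seqeval library) are often reported alongside token accuracy, since accuracy can be inflated by the many `O` tokens.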
78
  ### Training hyperparameters
79
 
80
  The following hyperparameters were used during training: