moazx committed on
Commit
e51dfae
·
verified ·
1 Parent(s): 3e74489

Update README.md

Files changed (1)
  1. README.md +41 -75
README.md CHANGED
@@ -1,25 +1,23 @@
1
- Here’s the updated **Model Card** for your **Code Vulnerability Classifier** in the same format as the provided example:
2
-
3
  ---
4
-
5
- license: mit
6
- language:
7
- - en
8
- pipeline_tag: text-classification
9
-
10
  ---
11
 
12
- # Code Vulnerability Classifier
13
 
14
- This repository contains the fine-tuned model `moazx/Code-Vulnerability-Classifier_app` for classifying code snippets into "Vulnerable" or "Non-vulnerable" categories. The model is based on a transformer architecture and trained on the **DiverseVul** dataset, which includes 150 different Common Weakness Enumeration (CWE) types.
15
 
16
  ## Model Description
17
 
18
- The `Code-Vulnerability-Classifier` model is fine-tuned to classify code snippets into two categories:
19
- - **Vulnerable**: Code that contains potential security vulnerabilities.
20
- - **Non-vulnerable**: Code that is safe and free from vulnerabilities.
21
 
22
- The model is trained on a diverse dataset of real-world code snippets, covering a wide range of vulnerabilities such as buffer overflows, injection flaws, memory leaks, and more.
 
 
 
 
23
 
24
  ## Sample Output
25
 
@@ -27,87 +25,55 @@ Below are some examples of the model's output:
27
 
28
  ### Example 1
29
 
30
- **Input Code**:
31
- ```c
32
- static int cirrus_bitblt_videotovideo_patterncopy(CirrusVGAState * s) {
33
- return cirrus_bitblt_common_patterncopy(s, s->vram_ptr + (s->cirrus_blt_srcaddr & ~7));
34
- }
35
- ```
 
36
 
37
- **Expected Classification**: Vulnerable
38
- **Predicted Classification**: Vulnerable
39
- - Probability (Vulnerable): 0.98
40
- - Probability (Non-vulnerable): 0.02
41
 
42
  ### Example 2
43
 
44
- **Input Code**:
45
- ```c
46
- static void loongarch_cpu_synchronize_from_tb(CPUState *cs, const TranslationBlock *tb) {
47
- LoongArchCPU *cpu = LOONGARCH_CPU(cs);
48
- CPULoongArchState *env = &cpu->env;
49
- env->pc = tb->pc;
50
- }
51
- ```
 
 
52
 
53
- **Expected Classification**: Non-vulnerable
54
- **Predicted Classification**: Non-vulnerable
55
- - Probability (Vulnerable): 0.03
56
- - Probability (Non-vulnerable): 0.97
57
 
58
  ## Training and Dataset
59
 
60
- The model was trained using the **DiverseVul** dataset, which includes:
61
- - 18,945 vulnerable functions
62
- - 330,492 non-vulnerable functions
63
- - 150 different CWE types
64
 
65
- The training process involved the following steps:
66
- 1. **Data Preprocessing**: Code snippets were tokenized and truncated/padded to a fixed length.
67
- 2. **Fine-Tuning**: A pre-trained transformer model was fine-tuned on the DiverseVul dataset.
68
- 3. **Evaluation**: The model was evaluated using accuracy, precision, recall, and F1 score.
69
 
70
  ## Usage
71
 
72
- To use the `moazx/Code-Vulnerability-Classifier_app` model, you can load it using the Hugging Face `transformers` library. Below is an example of how to use the model to classify a code snippet:
 
73
 
74
  ```python
75
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
76
  import torch
77
 
78
  # Load the tokenizer and model
79
- tokenizer = AutoTokenizer.from_pretrained("moazx/Code-Vulnerability-Classifier_app")
80
- model = AutoModelForSequenceClassification.from_pretrained("moazx/Code-Vulnerability-Classifier_app")
81
-
82
- # Encode the input code
83
- input_code = """
84
- static int cirrus_bitblt_videotovideo_patterncopy(CirrusVGAState * s) {
85
- return cirrus_bitblt_common_patterncopy(s, s->vram_ptr + (s->cirrus_blt_srcaddr & ~7));
86
- }
87
- """
88
- inputs = tokenizer(input_code, truncation=True, padding='max_length', max_length=512, return_tensors="pt")
89
-
90
- # Get the model predictions
91
- with torch.no_grad():
92
- outputs = model(**inputs)
93
- probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
94
-
95
- # Print the results
96
- vulnerable_prob = probabilities[0][1].item()
97
- non_vulnerable_prob = probabilities[0][0].item()
98
- predicted_class = "Vulnerable" if vulnerable_prob > non_vulnerable_prob else "Non-vulnerable"
99
-
100
- print(f"Predicted Classification: {predicted_class}")
101
- print(f"Probability (Vulnerable): {vulnerable_prob:.2f}")
102
- print(f"Probability (Non-vulnerable): {non_vulnerable_prob:.2f}")
103
  ```
104
 
105
  ## Acknowledgements
106
 
107
- The model was trained on the **DiverseVul** dataset, which provides a comprehensive collection of vulnerable and non-vulnerable code snippets. Special thanks to the Hugging Face team for their open-source tools and the creators of the DiverseVul dataset for their contributions to the field of code vulnerability analysis.
108
-
109
- For more details on the training process and dataset, please refer to the [DiverseVul Dataset Documentation](https://github.com/example/diverse-vul).
110
-
111
- ---
112
 
113
- This model card follows the same structure as your example and provides all the necessary details about your **Code Vulnerability Classifier**. You can host it on the Hugging Face model page or include it in your project documentation.
 
 
 
1
  ---
2
+ license: mit
3
+ pipeline_tag: text-classification
 
 
 
 
4
  ---
5
 
6
+ # AraBERT-Restaurant-Sentiment
7
 
8
+ This repository contains the fine-tuned model `moazx/AraBERT-Restaurant-Sentiment` for classifying Arabic restaurant reviews into positive and negative sentiments. The model is based on the AraBERT architecture and trained on a dataset of 800 Arabic restaurant reviews, collected and labeled using ChatGPT.
9
 
10
  ## Model Description
11
 
12
+ The `AraBERT-Restaurant-Sentiment` model is fine-tuned to classify Arabic restaurant reviews into two categories:
13
+ - Positive
14
+ - Negative
15
 
16
+ The dataset used for training consists of 400 positive and 400 negative reviews, covering multiple Arabic dialects.
17
+
18
+ ## Sample Output
19
+
20
+ Below are some examples of the model's output:
21
 
22
  ## Sample Output
23
 
 
25
 
26
  ### Example 1
27
 
28
+ Input: المطعم ما عجبني، الطعم مو حلو والخدمة كانت سيئة جداً، والموظفين ما كانوا محترمين. الأسعار غالية مقارنة بالجودة. ما بنصح فيه.
29
+
30
+ Expected Classification: سلبي
31
+
32
+ Predicted Classification: سلبي
33
+
34
+ Probability (Negative): 0.98
35
 
36
+ Probability (Positive): 0.02
 
 
 
37
 
38
  ### Example 2
39
 
40
+ Input: المطعم يجنن والاكل تحفة
41
+
42
+ Expected Classification: إيجابي
43
+
44
+ Predicted Classification: إيجابي
45
+
46
+ Probability (Negative): 0.01
47
+
48
+ Probability (Positive): 0.99
49
+
50
 
 
 
 
 
51
 
52
  ## Training and Dataset
53
 
54
+ The model was trained using the notebook available at [Kaggle](https://www.kaggle.com/code/moazeldsokyx/arabert-arabic-sentiment-analysis). The training process involves the following steps:
 
 
 
55
 
56
+ 1. Data Collection: 800 reviews (400 positive, 400 negative) were collected and labeled using ChatGPT.
57
+ 2. Preprocessing: Text normalization, tokenization, and other preprocessing steps were applied.
58
+ 3. Fine-Tuning: The AraBERT model was fine-tuned on the prepared dataset.
59
+ 4. Evaluation: The model was evaluated using accuracy and other relevant metrics to ensure its performance.
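
The evaluation step above names accuracy and "other relevant metrics" without listing them. As a rough sketch only, the following shows how accuracy, precision, recall, and F1 could be computed for a two-class sentiment task. The choice of these extra metrics, the `binary_metrics` helper, and the sample labels are assumptions for illustration and are not taken from the training notebook.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels.

    `positive` is the label value treated as the positive class
    (assumed here to be 1 = positive review, 0 = negative review).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical gold labels and predictions, for illustration only.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

With a balanced dataset such as the 400/400 split described here, accuracy is a reasonable headline number, while precision and recall show whether errors are concentrated in one class.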
60
 
61
  ## Usage
62
 
63
+ To use the `moazx/AraBERT-Restaurant-Sentiment` model, you can load it using the Hugging Face `transformers` library. Below is an example of how to use the model to classify a restaurant review:
64
+
65
 
66
  ```python
67
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
68
  import torch
69
 
70
  # Load the tokenizer and model
71
+ tokenizer = AutoTokenizer.from_pretrained("moazx/AraBERT-Restaurant-Sentiment")
72
+ model = AutoModelForSequenceClassification.from_pretrained("moazx/AraBERT-Restaurant-Sentiment")
73
  ```
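
The snippet above stops after loading the tokenizer and model. A minimal sketch of the remaining inference step might look like the following. It assumes that logit index 0 corresponds to Negative and index 1 to Positive, which should be confirmed against `model.config.id2label`. The helper names `softmax`, `describe_prediction`, and `classify_review`, and the 512-token truncation limit, are illustrative choices and not part of the model card.

```python
import math

# Assumption: index 0 = Negative, index 1 = Positive.
# Verify against model.config.id2label before relying on this order.
LABELS = ("Negative", "Positive")


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits, labels=LABELS):
    """Return the most likely label and a label -> probability mapping."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


def classify_review(text, model_id="moazx/AraBERT-Restaurant-Sentiment"):
    """Download the model and classify one review (requires torch and transformers)."""
    # Heavy imports are kept local so the helpers above work without them.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Truncate long reviews; 512 is an assumed limit for a BERT-style encoder.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return describe_prediction(logits)


if __name__ == "__main__":
    label, probs = classify_review("المطعم يجنن والاكل تحفة")
    print(f"Predicted Classification: {label}")
    for name, prob in probs.items():
        print(f"Probability ({name}): {prob:.2f}")
```

Keeping the post-processing in `describe_prediction` separate from the model call makes it possible to check the label mapping on hand-written logits before downloading any weights.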
74
 
75
  ## Acknowledgements
76
 
77
+ The dataset used for training the model was collected and labeled using ChatGPT. Special thanks to the creators of AraBERT and the Hugging Face team for their continuous support and development of open-source NLP tools.
 
 
 
 
78
 
79
+ For more details on the training process and dataset, please refer to the [Kaggle Notebook](https://www.kaggle.com/code/moazeldsokyx/arabert-arabic-sentiment-analysis).