Update README.md
README.md CHANGED
@@ -1,25 +1,23 @@
-Here’s the updated **Model Card** for your **Code Vulnerability Classifier** in the same format as the provided example:
-
---
-
-
-language:
-- en
-pipeline_tag: text-classification
-
---

-# Code Vulnerability Classifier

-This repository contains the fine-tuned model `moazx/

## Model Description

-The `
-- Vulnerable
-- Non-vulnerable

-The

## Sample Output

@@ -27,87 +25,55 @@ Below are some examples of the model's output:

### Example 1

-
-
-
-
-
-

-**Expected Classification**: Vulnerable
-**Predicted Classification**: Vulnerable
-- Probability (Vulnerable): 0.98
-- Probability (Non-vulnerable): 0.02

### Example 2

-
-
-
-
-
-
-
-

-**Expected Classification**: Non-vulnerable
-**Predicted Classification**: Non-vulnerable
-- Probability (Vulnerable): 0.03
-- Probability (Non-vulnerable): 0.97

## Training and Dataset

-The model was trained using the DiverseVul dataset, which includes:
-- 18,945 vulnerable functions
-- 330,492 non-vulnerable functions
-- 150 different CWE types

-
-
-
-

## Usage

-To use the `moazx/

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained("moazx/
-model = AutoModelForSequenceClassification.from_pretrained("moazx/
-
-# Encode the input code
-input_code = """
-static int cirrus_bitblt_videotovideo_patterncopy(CirrusVGAState * s) {
-    return cirrus_bitblt_common_patterncopy(s, s->vram_ptr + (s->cirrus_blt_srcaddr & ~7));
-}
-"""
-inputs = tokenizer(input_code, truncation=True, padding='max_length', max_length=512, return_tensors="pt")
-
-# Get the model predictions
-with torch.no_grad():
-    outputs = model(**inputs)
-    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
-
-# Print the results
-vulnerable_prob = probabilities[0][1].item()
-non_vulnerable_prob = probabilities[0][0].item()
-predicted_class = "Vulnerable" if vulnerable_prob > non_vulnerable_prob else "Non-vulnerable"
-
-print(f"Predicted Classification: {predicted_class}")
-print(f"Probability (Vulnerable): {vulnerable_prob:.2f}")
-print(f"Probability (Non-vulnerable): {non_vulnerable_prob:.2f}")
```

## Acknowledgements

-The
-
-For more details on the training process and dataset, please refer to the [DiverseVul Dataset Documentation](https://github.com/example/diverse-vul).
-
----

---
+license: mit
+pipeline_tag: text-classification
---

+# AraBERT-Restaurant-Sentiment

+This repository contains the fine-tuned model `moazx/AraBERT-Restaurant-Sentiment` for classifying Arabic restaurant reviews into positive and negative sentiment. The model is based on the AraBERT architecture and was trained on a dataset of 800 Arabic restaurant reviews, collected and labeled using ChatGPT.

## Model Description

+The `AraBERT-Restaurant-Sentiment` model is fine-tuned to classify Arabic restaurant reviews into two categories:
+- Positive
+- Negative

+The dataset used for training consists of 400 positive and 400 negative reviews, covering multiple Arabic dialects.

## Sample Output

Below are some examples of the model's output:

### Example 1

+Input: المطعم ما عجبني، الطعم مو حلو والخدمة كانت سيئة جداً، والموظفين ما كانوا محترمين. الأسعار غالية مقارنة بالجودة. ما بنصح فيه.
+(Translation: "I didn't like the restaurant; the food wasn't good, the service was very bad, and the staff were disrespectful. The prices are high for the quality. I don't recommend it.")
+
+Expected Classification: سلبي (Negative)
+
+Predicted Classification: سلبي (Negative)
+
+Probability (Negative): 0.98
+Probability (Positive): 0.02

### Example 2

+Input: المطعم يجنن والاكل تحفة
+(Translation: "The restaurant is amazing and the food is fantastic.")
+
+Expected Classification: إيجابي (Positive)
+
+Predicted Classification: إيجابي (Positive)
+
+Probability (Negative): 0.01
+Probability (Positive): 0.99

## Training and Dataset

+The model was trained using the notebook available on [Kaggle](https://www.kaggle.com/code/moazeldsokyx/arabert-arabic-sentiment-analysis). The training process involves the following steps (an illustrative fine-tuning sketch follows the list):
+
+1. Data Collection: 800 reviews (400 positive, 400 negative) were collected and labeled using ChatGPT.
+2. Preprocessing: Text normalization, tokenization, and other preprocessing steps were applied.
+3. Fine-Tuning: The AraBERT model was fine-tuned on the prepared dataset.
+4. Evaluation: The model was evaluated using accuracy and other relevant metrics.
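+
+As an illustration only, a minimal sketch of such a fine-tuning setup with the Hugging Face `Trainer` (the base checkpoint, the two-review stand-in dataset, and the hyperparameters below are assumptions, not the notebook's exact values):
+
+```python
+from datasets import Dataset
+from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
+                          Trainer, TrainingArguments)
+
+# Assumed AraBERT base checkpoint; the notebook may use a different variant.
+base = "aubmindlab/bert-base-arabertv02"
+tokenizer = AutoTokenizer.from_pretrained(base)
+model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)
+
+# Tiny stand-in for the 800-review dataset (0 = negative, 1 = positive).
+data = Dataset.from_dict({
+    "text": ["المطعم يجنن والاكل تحفة", "الخدمة كانت سيئة جداً"],
+    "label": [1, 0],
+})
+data = data.map(lambda batch: tokenizer(batch["text"], truncation=True,
+                                        padding="max_length", max_length=128),
+                batched=True)
+
+# Illustrative hyperparameters, not the ones used in the notebook.
+args = TrainingArguments(output_dir="arabert-sentiment", num_train_epochs=3,
+                         per_device_train_batch_size=16)
+Trainer(model=model, args=args, train_dataset=data).train()
+```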

## Usage

+To use the `moazx/AraBERT-Restaurant-Sentiment` model, load it with the Hugging Face `transformers` library. Below is an example of how to classify a restaurant review:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("moazx/AraBERT-Restaurant-Sentiment")
+model = AutoModelForSequenceClassification.from_pretrained("moazx/AraBERT-Restaurant-Sentiment")
+
+# Encode a review and compute class probabilities
+# (assumes index 1 = positive and index 0 = negative; check model.config.id2label)
+inputs = tokenizer("المطعم يجنن والاكل تحفة", truncation=True, max_length=512, return_tensors="pt")
+with torch.no_grad():
+    probabilities = torch.nn.functional.softmax(model(**inputs).logits, dim=-1)
+
+# Print the results
+print(f"Probability (Positive): {probabilities[0][1].item():.2f}")
+print(f"Probability (Negative): {probabilities[0][0].item():.2f}")
```
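+
+For quick experiments, the same checkpoint can also be called through the `pipeline` helper (a minimal sketch; the exact label strings returned depend on the model's `id2label` config):
+
+```python
+from transformers import pipeline
+
+classifier = pipeline("text-classification", model="moazx/AraBERT-Restaurant-Sentiment")
+print(classifier("المطعم يجنن والاكل تحفة"))  # e.g. [{'label': '...', 'score': 0.99}]
+```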

## Acknowledgements

+The dataset used to train the model was collected and labeled using ChatGPT. Special thanks to the creators of AraBERT and to the Hugging Face team for their continued development of open-source NLP tools.
+
+For more details on the training process and dataset, please refer to the [Kaggle Notebook](https://www.kaggle.com/code/moazeldsokyx/arabert-arabic-sentiment-analysis).