moazx/Code-Vulnerability-Classifier_app

Text Classification

Safetensors

bert

code

moazx committed 25 days ago

Commit

e51dfae

1 Parent(s): 3e74489

Update README.md

Files changed (1)

README.md +41 -75

@@ -1,25 +1,23 @@
-Here’s the updated **Model Card** for your **Code Vulnerability Classifier** in the same format as the provided example:
 ---
-license: mit
-language:
-- en
-pipeline_tag: text-classification
 ---
-# Code Vulnerability Classifier
-This repository contains the fine-tuned model `moazx/Code-Vulnerability-Classifier_app` for classifying code snippets into "Vulnerable" or "Non-vulnerable" categories. The model is based on a transformer architecture and trained on the **DiverseVul** dataset, which includes 150 different Common Weakness Enumeration (CWE) types.
 ## Model Description
-The `Code-Vulnerability-Classifier` model is fine-tuned to classify code snippets into two categories:
-- **Vulnerable**: Code that contains potential security vulnerabilities.
-- **Non-vulnerable**: Code that is safe and free from vulnerabilities.
-The model is trained on a diverse dataset of real-world code snippets, covering a wide range of vulnerabilities such as buffer overflows, injection flaws, memory leaks, and more.
 ## Sample Output
@@ -27,87 +25,55 @@ Below are some examples of the model's output:
 ### Example 1
-**Input Code**:
-```c
-static int cirrus_bitblt_videotovideo_patterncopy(CirrusVGAState * s) {
-    return cirrus_bitblt_common_patterncopy(s, s->vram_ptr + (s->cirrus_blt_srcaddr & ~7));
-}
-```
-**Expected Classification**: Vulnerable
-**Predicted Classification**: Vulnerable
-- Probability (Vulnerable): 0.98
-- Probability (Non-vulnerable): 0.02
 ### Example 2
-**Input Code**:
-```c
-static void loongarch_cpu_synchronize_from_tb(CPUState *cs, const TranslationBlock *tb) {
-    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
-    CPULoongArchState *env = &cpu->env;
-    env->pc = tb->pc;
-}
-```
-**Expected Classification**: Non-vulnerable
-**Predicted Classification**: Non-vulnerable
-- Probability (Vulnerable): 0.03
-- Probability (Non-vulnerable): 0.97
 ## Training and Dataset
-The model was trained using the **DiverseVul** dataset, which includes:
-- 18,945 vulnerable functions
-- 330,492 non-vulnerable functions
-- 150 different CWE types
-The training process involved the following steps:
-1. **Data Preprocessing**: Code snippets were tokenized and truncated/padded to a fixed length.
-2. **Fine-Tuning**: A pre-trained transformer model was fine-tuned on the DiverseVul dataset.
-3. **Evaluation**: The model was evaluated using accuracy, precision, recall, and F1 score.
 ## Usage
-To use the `moazx/Code-Vulnerability-Classifier_app` model, you can load it using the Hugging Face `transformers` library. Below is an example of how to use the model to classify a code snippet:
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
 # Load the tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained("moazx/Code-Vulnerability-Classifier_app")
-model = AutoModelForSequenceClassification.from_pretrained("moazx/Code-Vulnerability-Classifier_app")
-# Encode the input code
-input_code = """
-static int cirrus_bitblt_videotovideo_patterncopy(CirrusVGAState * s) {
-    return cirrus_bitblt_common_patterncopy(s, s->vram_ptr + (s->cirrus_blt_srcaddr & ~7));
-}
-"""
-inputs = tokenizer(input_code, truncation=True, padding='max_length', max_length=512, return_tensors="pt")
-# Get the model predictions
-with torch.no_grad():
-    outputs = model(**inputs)
-    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
-# Print the results
-vulnerable_prob = probabilities[0][1].item()
-non_vulnerable_prob = probabilities[0][0].item()
-predicted_class = "Vulnerable" if vulnerable_prob > non_vulnerable_prob else "Non-vulnerable"
-print(f"Predicted Classification: {predicted_class}")
-print(f"Probability (Vulnerable): {vulnerable_prob:.2f}")
-print(f"Probability (Non-vulnerable): {non_vulnerable_prob:.2f}")
 ```
 ## Acknowledgements
-The model was trained on the **DiverseVul** dataset, which provides a comprehensive collection of vulnerable and non-vulnerable code snippets. Special thanks to the Hugging Face team for their open-source tools and the creators of the DiverseVul dataset for their contributions to the field of code vulnerability analysis.
-For more details on the training process and dataset, please refer to the [DiverseVul Dataset Documentation](https://github.com/example/diverse-vul).
----
-This model card follows the same structure as your example and provides all the necessary details about your **Code Vulnerability Classifier**. You can host it on the Hugging Face model page or include it in your project documentation.

 ---
+license: mit
+pipeline_tag: text-classification
 ---
+# AraBERT-Restaurant-Sentiment
+This repository contains the fine-tuned model `moazx/AraBERT-Restaurant-Sentiment`, which classifies Arabic restaurant reviews as positive or negative. The model is based on the AraBERT architecture and was trained on a dataset of 800 Arabic restaurant reviews, collected and labeled using ChatGPT.
 ## Model Description
+The `AraBERT-Restaurant-Sentiment` model is fine-tuned to classify Arabic restaurant reviews into two categories:
+- Positive
+- Negative
+The dataset used for training consists of 400 positive and 400 negative reviews, covering multiple Arabic dialects.
+## Sample Output
+Below are some examples of the model's output:
 ### Example 1
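+Input: المطعم ما عجبني، الطعم مو حلو والخدمة كانت سيئة جداً، والموظفين ما كانوا محترمين. الأسعار غالية مقارنة بالجودة. ما بنصح فيه. (English: "I didn't like the restaurant; the taste was not good, the service was very bad, and the staff were not respectful. The prices are high compared to the quality. I don't recommend it.")
+Expected Classification: سلبي (Negative)
+Predicted Classification: سلبي (Negative)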
+Probability (Negative): 0.98
+Probability (Positive): 0.02
 ### Example 2
+Input: المطعم يجنن والاكل تحفة (English: "The restaurant is amazing and the food is superb.")
+Expected Classification: إيجابي (Positive)
+Predicted Classification: إيجابي (Positive)
+Probability (Negative): 0.01
+Probability (Positive): 0.99
 ## Training and Dataset
+The model was trained using the notebook available at [Kaggle](https://www.kaggle.com/code/moazeldsokyx/arabert-arabic-sentiment-analysis). The training process involves the following steps:
+1. Data Collection: 800 reviews (400 positive, 400 negative) were collected and labeled using ChatGPT.
+2. Preprocessing: Text normalization, tokenization, and other preprocessing steps were applied.
+3. Fine-Tuning: The AraBERT model was fine-tuned on the prepared dataset.
+4. Evaluation: The model was evaluated using accuracy and other relevant metrics to assess its performance.
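
The evaluation step above names accuracy but not the other metrics. As a rough sketch only, the helper below computes accuracy together with precision, recall, and F1 for a binary task. The choice of those three extra metrics, the function name `binary_metrics`, and the convention that `1` means Positive are assumptions for illustration, not details taken from the training notebook.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: labels are integers and `positive` marks the Positive class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, not results from the actual model.
example = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

With a balanced dataset such as the 400/400 split described above, accuracy is a reasonable headline number; precision and recall still help show whether errors lean toward one class.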
 ## Usage
+To use the `moazx/AraBERT-Restaurant-Sentiment` model, load it with the Hugging Face `transformers` library. The example below loads the tokenizer and the model:
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
 # Load the tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("moazx/AraBERT-Restaurant-Sentiment")
+model = AutoModelForSequenceClassification.from_pretrained("moazx/AraBERT-Restaurant-Sentiment")
 ```
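
The snippet above stops after loading the tokenizer and model. A minimal sketch of the remaining inference step might look like the following. The label order (index 0 = Negative, index 1 = Positive), the `max_length=512` limit, and the helper names `softmax`, `interpret`, and `classify_review` are assumptions for illustration; check `model.config.id2label` on the loaded model before relying on the label mapping.

```python
import math

# Assumption: index 0 is Negative and index 1 is Positive.
# Verify against model.config.id2label for the real checkpoint.
ASSUMED_LABELS = ("Negative", "Positive")


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits, labels=ASSUMED_LABELS):
    """Return the predicted label and a label -> probability mapping."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


def classify_review(text, model_id="moazx/AraBERT-Restaurant-Sentiment"):
    """Download the model and classify one review (needs network access)."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Truncate long reviews; 512 tokens is an assumed limit.
    inputs = tokenizer(text, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return interpret(logits)
```

Calling `classify_review("المطعم يجنن والاكل تحفة")` would return a label and the two class probabilities, which can then be printed in the same form as the sample output in the model card.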
 ## Acknowledgements
+The dataset used for training the model was collected and labeled using ChatGPT. Special thanks to the creators of AraBERT and the Hugging Face team for their continuous support and development of open-source NLP tools.
+For more details on the training process and dataset, please refer to the [Kaggle Notebook](https://www.kaggle.com/code/moazeldsokyx/arabert-arabic-sentiment-analysis).