houcine-bdk committed on
Commit 27581ce (verified)
1 Parent(s): 9bbe442

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +110 -69
README.md CHANGED
@@ -1,80 +1,121 @@
  ---
  language: en
  tags:
- - pytorch
  - gpt2
- - text-generation
- - nanoGPT
  license: mit
- datasets:
- - custom
- model-index:
- - name: chatMachineProto
-   results: []
  ---

- # NanoGPT Personal Experiment

- This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model. This project was undertaken as a learning exercise to understand transformer-based language models and explore the capabilities of modern AI architectures.

  ## Model Description

- The architecture follows the original GPT-2 design principles while being more accessible and easier to understand.
-
- ### Technical Details
-
- - Base Architecture: GPT-2
- - Training Infrastructure: 8x A100 80GB GPUs
- - Parameters: ~124M (similar to GPT-2 small)
-
- ### Training Process
-
- The model underwent a multi-stage training process:
- 1. Initial training on a subset of the OpenWebText dataset
- 2. Experimentation with different hyperparameters and optimization techniques
-
- ### Features
-
- - Clean, minimal implementation of the GPT architecture
- - Efficient training utilizing modern GPU capabilities
- - Configurable generation parameters (temperature, top-k sampling)
- - Support for both direct text generation and interactive chat
-
- ## Use Cases
-
- This model is primarily an experimental project and can be used for:
- - Educational purposes to understand transformer architectures
- - Text generation experiments
- - Research into language model behavior
- - Interactive chat experiments
-
- ## Limitations
-
- As this is a personal experiment, please note:
- - The model may produce inconsistent or incorrect outputs
- - It's not intended for production use
- - Responses may be unpredictable or contain biases
- - Performance may vary significantly depending on the input
-
- ## Development Context
-
- This project was developed as part of my personal exploration into AI/ML, specifically focusing on:
- - Understanding transformer architectures
- - Learning about large-scale model training
- - Experimenting with different training approaches
- - Gaining hands-on experience with modern AI infrastructure
-
- ## Acknowledgments
-
- This project builds upon the excellent work of:
- - The original GPT-2 paper by OpenAI
- - The nanoGPT implementation by Andrej Karpathy
- - The broader open-source AI community
-
- ## Disclaimer
-
- This is a personal experimental project and should be treated as such. It's not intended for production use or as a replacement for more established language models. The primary goal was learning and experimentation.
-
- ---
-
- Feel free to explore the model and provide feedback. Remember that this is an experimental project, and results may vary significantly from more established models.

  ---
  language: en
  tags:
+ - question-answering
+ - squad
  - gpt2
+ - fine-tuned
  license: mit
  ---

+ # ChatMachine_v1: GPT-2 Fine-tuned on SQuAD

+ This model is a GPT-2 variant fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering. Given a short context passage and a question, it generates a concise answer grounded in that context.

  ## Model Description

+ - **Base Model**: GPT-2 (124M parameters)
+ - **Training Data**: Stanford Question Answering Dataset (SQuAD)
+ - **Task**: Question Answering
+ - **Framework**: PyTorch with Hugging Face Transformers
+
+ ## Training Details
+
+ The model was fine-tuned with the following settings (see the sketch after this list):
+ - Mixed precision training (bfloat16)
+ - Learning rate: 2e-5
+ - Batch size: 16
+ - Gradient accumulation steps: 8
+ - Warmup steps: 1000
+ - Weight decay: 0.1
+
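+ For reference, these hyperparameters map onto a Hugging Face `TrainingArguments` configuration roughly as sketched below. The README only states that PyTorch with Hugging Face Transformers was used, so treat this as an illustration rather than the actual training script; the dataset name `tokenized_squad` is a placeholder.
+
+ ```python
+ from transformers import GPT2LMHeadModel, Trainer, TrainingArguments
+
+ # Hypothetical setup mirroring the hyperparameters listed above.
+ model = GPT2LMHeadModel.from_pretrained("gpt2")
+
+ training_args = TrainingArguments(
+     output_dir="chatMachine_v1",
+     per_device_train_batch_size=16,   # Batch size: 16
+     gradient_accumulation_steps=8,    # Gradient accumulation steps: 8
+     learning_rate=2e-5,               # Learning rate: 2e-5
+     warmup_steps=1000,                # Warmup steps: 1000
+     weight_decay=0.1,                 # Weight decay: 0.1
+     bf16=True,                        # Mixed precision training (bfloat16)
+ )
+
+ # trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_squad)
+ # trainer.train()
+ ```
+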
+ ## Usage
+
+ ```python
+ from transformers import GPT2LMHeadModel, GPT2Tokenizer
+
+ # Load model and tokenizer
+ model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
+ tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ tokenizer.pad_token = tokenizer.eos_token
+
+ # Format your input
+ context = "Paris is the capital and largest city of France."
+ question = "What is the capital of France?"
+ input_text = f"Context: {context} Question: {question} Answer:"
+
+ # Generate answer
+ inputs = tokenizer(input_text, return_tensors="pt", padding=True)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=50,
+     temperature=0.3,
+     do_sample=True,
+     top_p=0.9,
+     num_beams=4,
+     early_stopping=True,
+     pad_token_id=tokenizer.pad_token_id,
+     eos_token_id=tokenizer.eos_token_id,
+ )
+
+ # Extract answer
+ answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
+ print(f"Answer: {answer}")
+ ```
+
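+ If the generation details are not important, the same prompt format can also be driven through the `text-generation` pipeline. This is a convenience sketch under the same assumption as above (the stock `gpt2` tokenizer paired with the fine-tuned weights):
+
+ ```python
+ from transformers import pipeline
+
+ # Build a text-generation pipeline around the fine-tuned checkpoint.
+ generator = pipeline("text-generation", model="houcine-bdk/chatMachine_v1", tokenizer="gpt2")
+
+ prompt = "Context: Paris is the capital and largest city of France. Question: What is the capital of France? Answer:"
+ result = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.3, top_p=0.9, return_full_text=False)
+ print(result[0]["generated_text"].strip())
+ ```
+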
+ ## Performance and Limitations
+
+ The model performs best with:
+ - Simple, focused questions
+ - Clear, concise context
+ - Factual questions (who, what, when, where)
+
+ Limitations:
+ - May struggle with complex, multi-part questions
+ - Performance depends on the clarity and relevance of the provided context
+ - Best suited for short, focused answers rather than lengthy explanations
+
+ ## Example Questions
+
+ ```python
+ test_cases = [
+     {
+         "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
+         "question": "Who was the first president of the United States?"
+     },
+     {
+         "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
+         "question": "How much of the body's energy does the brain use?"
+     }
+ ]
+ ```
+
+ Expected outputs:
+ - "George Washington"
+ - "20 percent"
+
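+ These cases can be run with the same loading and generation settings as in the Usage section. A minimal loop, assuming `model`, `tokenizer`, and `test_cases` from the snippets above, might look like this:
+
+ ```python
+ # Run each test case through the model using the Usage-section settings.
+ for case in test_cases:
+     prompt = f"Context: {case['context']} Question: {case['question']} Answer:"
+     inputs = tokenizer(prompt, return_tensors="pt", padding=True)
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=50,
+         temperature=0.3,
+         do_sample=True,
+         top_p=0.9,
+         num_beams=4,
+         early_stopping=True,
+         pad_token_id=tokenizer.pad_token_id,
+         eos_token_id=tokenizer.eos_token_id,
+     )
+     answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
+     print(f"Q: {case['question']}")
+     print(f"A: {answer}")
+ ```
+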
+ ## Training Infrastructure
+
+ The model was trained on an RTX 4090 GPU using:
+ - PyTorch with CUDA optimizations
+ - Mixed precision training (bfloat16)
+ - Gradient accumulation for effective batch size scaling
+
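+ In plain PyTorch, the bfloat16 + gradient accumulation combination described above usually follows the pattern below. This is a generic sketch, not the repository's actual training loop; `model`, `optimizer`, `scheduler`, and `train_loader` are placeholders passed in by the caller:
+
+ ```python
+ import torch
+
+ def train_with_accumulation(model, optimizer, scheduler, train_loader, accumulation_steps=8):
+     """Generic bfloat16 + gradient-accumulation loop (sketch, not the repo's script)."""
+     model.train()
+     optimizer.zero_grad()
+     for step, batch in enumerate(train_loader):
+         batch = {k: v.to("cuda") for k, v in batch.items()}
+         # bfloat16 autocast needs no GradScaler, unlike float16
+         with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
+             loss = model(**batch).loss / accumulation_steps
+         loss.backward()
+         if (step + 1) % accumulation_steps == 0:
+             optimizer.step()   # weight update every `accumulation_steps` micro-batches
+             scheduler.step()
+             optimizer.zero_grad()
+ ```
+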
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{chatmachine_v1,
+   author = {Houcine BDK},
+   title = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
+   year = {2024},
+   publisher = {Hugging Face},
+   journal = {Hugging Face Model Hub},
+   howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
+ }
+ ```
+
+ ## License
+
+ This model is released under the MIT License.