---
language: en
tags:
- question-answering
- squad
- gpt2
- fine-tuned
license: mit
---

# ChatMachine_v1: GPT-2 Fine-tuned on SQuAD

This model is a GPT-2 variant fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question-answering tasks. It has been trained to understand a provided context passage and generate a relevant answer to a question about it.

## Model Description

- **Base Model**: GPT-2 (124M parameters)
- **Training Data**: Stanford Question Answering Dataset (SQuAD)
- **Task**: Question Answering
- **Framework**: PyTorch with Hugging Face Transformers

## Training Details

The model was fine-tuned using:
- Mixed precision training (bfloat16)
- Learning rate: 2e-5
- Batch size: 16
- Gradient accumulation steps: 8
- Warmup steps: 1000
- Weight decay: 0.1

A sketch of how these hyperparameters map onto a training script is given in the appendix at the end of this card.

## Usage

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the fine-tuned model and the base GPT-2 tokenizer
model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no native pad token

# Format the input to match the training prompt
context = "Paris is the capital and largest city of France."
question = "What is the capital of France?"
input_text = f"Context: {context} Question: {question} Answer:"

# Generate an answer (beam search combined with nucleus sampling)
inputs = tokenizer(input_text, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.3,
    do_sample=True,
    top_p=0.9,
    num_beams=4,
    early_stopping=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Extract the text that follows "Answer:"
answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
print(f"Answer: {answer}")
```

## Performance and Limitations

The model performs best with:
- Simple, focused questions
- Clear, concise context
- Factual questions (who, what, when, where)

Limitations:
- May struggle with complex, multi-part questions
- Performance depends on the clarity and relevance of the provided context
- Best suited for short, focused answers rather than lengthy explanations

## Example Questions

```python
test_cases = [
    {
        "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
        "question": "Who was the first president of the United States?"
    },
    {
        "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
        "question": "How much of the body's energy does the brain use?"
    }
]
```

Expected outputs:
- "George Washington"
- "20 percent"

## Training Infrastructure

The model was trained on an RTX 4090 GPU using:
- PyTorch with CUDA optimizations
- Mixed precision training (bfloat16)
- Gradient accumulation for effective batch size scaling (16 × 8 = 128 sequences per optimizer step)

## Citation

If you use this model, please cite:

```bibtex
@misc{chatmachine_v1,
  author = {Houcine BDK},
  title = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
}
```

## License

This model is released under the MIT License.
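
## Appendix: Reproducing the Training Setup

For reference, here is a minimal sketch of how the hyperparameters listed under Training Details map onto the Hugging Face `Trainer` API. This is not the exact script used to train this model: the prompt formatting helper, sequence length, and output path are illustrative assumptions, chosen to match the "Context: ... Question: ... Answer:" format shown in the Usage section.

```python
# Minimal fine-tuning sketch. Hyperparameters come from the "Training Details"
# section of this card; everything else (prompt format, max_length, output_dir)
# is an assumption for illustration.
from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments
from datasets import load_dataset

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")

def format_example(example):
    # Assumed prompt format, mirroring the inference prompt in the Usage section.
    text = (
        f"Context: {example['context']} "
        f"Question: {example['question']} "
        f"Answer: {example['answers']['text'][0]}{tokenizer.eos_token}"
    )
    tokens = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    # Causal LM objective: labels are a copy of the inputs. Production scripts
    # typically mask padding positions with -100 to exclude them from the loss.
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

dataset = load_dataset("squad", split="train").map(
    format_example,
    remove_columns=["id", "title", "context", "question", "answers"],
)

# Batch size 16 with 8 gradient accumulation steps gives an effective
# batch size of 128 sequences per optimizer step.
args = TrainingArguments(
    output_dir="chatMachine_v1",       # illustrative path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    warmup_steps=1000,
    weight_decay=0.1,
    bf16=True,                         # mixed precision training (bfloat16)
)

Trainer(model=model, args=args, train_dataset=dataset).train()
```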