---
language: en
tags:
  - question-answering
  - squad
  - gpt2
  - fine-tuned
license: mit
---

# ChatMachine_v1: GPT-2 Fine-tuned on SQuAD

This model is GPT-2 fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering: given a passage of context and a question, it generates a short answer grounded in that passage.

## Model Description

- **Base Model:** GPT-2 (124M parameters)
- **Training Data:** Stanford Question Answering Dataset (SQuAD)
- **Task:** Question answering
- **Framework:** PyTorch with Hugging Face Transformers

## Training Details

The model was fine-tuned with the following hyperparameters; a configuration sketch follows the list.

- Mixed precision training (bfloat16)
- Learning rate: 2e-5
- Batch size: 16
- Gradient accumulation steps: 8 (effective batch size 16 × 8 = 128)
- Warmup steps: 1000
- Weight decay: 0.1
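The exact training script is not published with this card; the following is a minimal sketch of how these hyperparameters would map onto Hugging Face `TrainingArguments`. The `tokenized_squad` dataset and the epoch count are assumptions, not documented values.

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast, Trainer, TrainingArguments

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

args = TrainingArguments(
    output_dir="chatMachine_v1",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,  # effective batch size 16 * 8 = 128
    warmup_steps=1000,
    weight_decay=0.1,
    bf16=True,                      # bfloat16 mixed precision
    num_train_epochs=3,             # assumption: not stated on the card
)

# tokenized_squad: hypothetical dataset of SQuAD examples rendered as
# "Context: ... Question: ... Answer: ..." strings and tokenized for GPT-2.
trainer = Trainer(model=model, args=args, train_dataset=tokenized_squad)
trainer.train()
```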

## Usage

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Format your input
context = "Paris is the capital and largest city of France."
question = "What is the capital of France?"
input_text = f"Context: {context} Question: {question} Answer:"

# Generate answer
inputs = tokenizer(input_text, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.3,
    do_sample=True,
    top_p=0.9,
    num_beams=4,
    early_stopping=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Extract answer: keep only the text after the "Answer:" marker
answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
print(f"Answer: {answer}")
```
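For quick experiments, the same steps can also be wrapped in the Transformers `pipeline` API. This is a sketch, not the card's documented interface; it uses greedy decoding instead of the beam-sampling settings above.

```python
from transformers import pipeline

qa = pipeline("text-generation", model="houcine-bdk/chatMachine_v1", tokenizer="gpt2")

prompt = "Context: Paris is the capital and largest city of France. Question: What is the capital of France? Answer:"
result = qa(prompt, max_new_tokens=50, do_sample=False, return_full_text=False)
print(result[0]["generated_text"].strip())  # only the newly generated answer text
```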

## Performance and Limitations

The model performs best with:

- Simple, focused questions
- Clear, concise context
- Factual questions (who, what, when, where)

Limitations:

- May struggle with complex, multi-part questions
- Performance depends on the clarity and relevance of the provided context
- Best suited for short, focused answers rather than lengthy explanations

## Example Questions

```python
test_cases = [
    {
        "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
        "question": "Who was the first president of the United States?"
    },
    {
        "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
        "question": "How much of the body's energy does the brain use?"
    }
]
```

Expected outputs:

- "George Washington"
- "20 percent"

## Training Infrastructure

The model was trained on an RTX 4090 GPU using:

- PyTorch with CUDA optimizations
- Mixed precision training (bfloat16)
- Gradient accumulation for effective batch-size scaling (see the sketch below)
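The training loop itself is not published; the sketch below shows how bfloat16 autocast and gradient accumulation typically combine in plain PyTorch. `train_loader` is a placeholder for a tokenized SQuAD `DataLoader`, and the loop is illustrative rather than the author's actual code.

```python
import torch

def train_one_epoch(model, optimizer, train_loader, accum_steps=8):
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(train_loader):
        batch = {k: v.to("cuda") for k, v in batch.items()}
        # Forward pass runs in bfloat16; weights and optimizer state stay fp32.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            loss = model(**batch).loss / accum_steps  # scale for accumulation
        loss.backward()  # gradients accumulate across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()  # one update per effective batch (16 * 8 = 128)
            optimizer.zero_grad()
```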

## Citation

If you use this model, please cite:

```bibtex
@misc{chatmachine_v1,
  author = {Houcine BDK},
  title = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
}
```

## License

This model is released under the MIT License.