houcine-bdk committed on
Commit 27581ce (verified)
1 Parent(s): 9bbe442

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +110 -69
README.md CHANGED
@@ -1,80 +1,121 @@
  ---
  language: en
  tags:
- - pytorch
  - gpt2
- - text-generation
- - nanoGPT
  license: mit
- datasets:
- - custom
- model-index:
- - name: chatMachineProto
-   results: []
  ---

- # NanoGPT Personal Experiment

- This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model. This project was undertaken as a learning exercise to understand transformer-based language models and explore the capabilities of modern AI architectures.

  ## Model Description

- The architecture follows the original GPT-2 design principles while being more accessible and easier to understand.
-
- ### Technical Details
-
- - Base Architecture: GPT-2
- - Training Infrastructure: 8x A100 80GB GPUs
- - Parameters: ~124M (similar to GPT-2 small)
-
- ### Training Process
-
- The model underwent a multi-stage training process:
- 1. Initial training on a subset of the OpenWebText dataset
- 2. Experimentation with different hyperparameters and optimization techniques
-
- ### Features
-
- - Clean, minimal implementation of the GPT architecture
- - Efficient training utilizing modern GPU capabilities
- - Configurable generation parameters (temperature, top-k sampling)
- - Support for both direct text generation and interactive chat
-
- ## Use Cases
-
- This model is primarily an experimental project and can be used for:
- - Educational purposes to understand transformer architectures
- - Text generation experiments
- - Research into language model behavior
- - Interactive chat experiments
-
- ## Limitations
-
- As this is a personal experiment, please note:
- - The model may produce inconsistent or incorrect outputs
- - It's not intended for production use
- - Responses may be unpredictable or contain biases
- - Performance may vary significantly depending on the input
-
- ## Development Context
-
- This project was developed as part of my personal exploration into AI/ML, specifically focusing on:
- - Understanding transformer architectures
- - Learning about large-scale model training
- - Experimenting with different training approaches
- - Gaining hands-on experience with modern AI infrastructure
-
- ## Acknowledgments
-
- This project builds upon the excellent work of:
- - The original GPT-2 paper by OpenAI
- - The nanoGPT implementation by Andrej Karpathy
- - The broader open-source AI community
-
- ## Disclaimer
-
- This is a personal experimental project and should be treated as such. It's not intended for production use or as a replacement for more established language models. The primary goal was learning and experimentation.
-
- ---
-
- Feel free to explore the model and provide feedback. Remember that this is an experimental project, and results may vary significantly from more established models.

  ---
  language: en
  tags:
+ - question-answering
+ - squad
  - gpt2
+ - fine-tuned
  license: mit
  ---

+ # ChatMachine_v1: GPT-2 Fine-tuned on SQuAD

+ This model is a GPT-2 variant fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering. Given a short context passage and a question, it generates a concise answer grounded in that context.

  ## Model Description

+ - **Base Model**: GPT-2 (124M parameters)
+ - **Training Data**: Stanford Question Answering Dataset (SQuAD)
+ - **Task**: Question Answering
+ - **Framework**: PyTorch with Hugging Face Transformers
+
+ ## Training Details
+
+ The model was fine-tuned with the following settings (see the sketch after this list):
+ - Mixed precision training (bfloat16)
+ - Learning rate: 2e-5
+ - Batch size: 16
+ - Gradient accumulation steps: 8
+ - Warmup steps: 1000
+ - Weight decay: 0.1
+
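+ For reference, these hyperparameters map onto a Hugging Face `TrainingArguments` configuration roughly as sketched below. The README only states that PyTorch with Hugging Face Transformers was used, so treat this as an illustration rather than the actual training script; the dataset name `tokenized_squad` is a placeholder.
+
+ ```python
+ from transformers import GPT2LMHeadModel, Trainer, TrainingArguments
+
+ # Hypothetical setup mirroring the hyperparameters listed above.
+ model = GPT2LMHeadModel.from_pretrained("gpt2")
+
+ training_args = TrainingArguments(
+     output_dir="chatMachine_v1",
+     per_device_train_batch_size=16,   # Batch size: 16
+     gradient_accumulation_steps=8,    # Gradient accumulation steps: 8
+     learning_rate=2e-5,               # Learning rate: 2e-5
+     warmup_steps=1000,                # Warmup steps: 1000
+     weight_decay=0.1,                 # Weight decay: 0.1
+     bf16=True,                        # Mixed precision training (bfloat16)
+ )
+
+ # trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_squad)
+ # trainer.train()
+ ```
+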
+ ## Usage
+
+ ```python
+ from transformers import GPT2LMHeadModel, GPT2Tokenizer
+
+ # Load model and tokenizer
+ model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
+ tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ tokenizer.pad_token = tokenizer.eos_token
+
+ # Format your input
+ context = "Paris is the capital and largest city of France."
+ question = "What is the capital of France?"
+ input_text = f"Context: {context} Question: {question} Answer:"
+
+ # Generate answer
+ inputs = tokenizer(input_text, return_tensors="pt", padding=True)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=50,
+     temperature=0.3,
+     do_sample=True,
+     top_p=0.9,
+     num_beams=4,
+     early_stopping=True,
+     pad_token_id=tokenizer.pad_token_id,
+     eos_token_id=tokenizer.eos_token_id,
+ )
+
+ # Extract answer
+ answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
+ print(f"Answer: {answer}")
+ ```
+
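+ If the generation details are not important, the same prompt format can also be driven through the `text-generation` pipeline. This is a convenience sketch under the same assumption as above (the stock `gpt2` tokenizer paired with the fine-tuned weights):
+
+ ```python
+ from transformers import pipeline
+
+ # Build a text-generation pipeline around the fine-tuned checkpoint.
+ generator = pipeline("text-generation", model="houcine-bdk/chatMachine_v1", tokenizer="gpt2")
+
+ prompt = "Context: Paris is the capital and largest city of France. Question: What is the capital of France? Answer:"
+ result = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.3, top_p=0.9, return_full_text=False)
+ print(result[0]["generated_text"].strip())
+ ```
+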
+ ## Performance and Limitations
+
+ The model performs best with:
+ - Simple, focused questions
+ - Clear, concise context
+ - Factual questions (who, what, when, where)
+
+ Limitations:
+ - May struggle with complex, multi-part questions
+ - Performance depends on the clarity and relevance of the provided context
+ - Best suited for short, focused answers rather than lengthy explanations
+
+ ## Example Questions
+
+ ```python
+ test_cases = [
+     {
+         "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
+         "question": "Who was the first president of the United States?"
+     },
+     {
+         "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
+         "question": "How much of the body's energy does the brain use?"
+     }
+ ]
+ ```
+
+ Expected outputs:
+ - "George Washington"
+ - "20 percent"
+
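+ These cases can be run with the same loading and generation settings as in the Usage section. A minimal loop, assuming `model`, `tokenizer`, and `test_cases` from the snippets above, might look like this:
+
+ ```python
+ # Run each test case through the model using the Usage-section settings.
+ for case in test_cases:
+     prompt = f"Context: {case['context']} Question: {case['question']} Answer:"
+     inputs = tokenizer(prompt, return_tensors="pt", padding=True)
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=50,
+         temperature=0.3,
+         do_sample=True,
+         top_p=0.9,
+         num_beams=4,
+         early_stopping=True,
+         pad_token_id=tokenizer.pad_token_id,
+         eos_token_id=tokenizer.eos_token_id,
+     )
+     answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
+     print(f"Q: {case['question']}")
+     print(f"A: {answer}")
+ ```
+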
+ ## Training Infrastructure
+
+ The model was trained on an RTX 4090 GPU using:
+ - PyTorch with CUDA optimizations
+ - Mixed precision training (bfloat16)
+ - Gradient accumulation for effective batch size scaling
+
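+ In plain PyTorch, the bfloat16 + gradient accumulation combination described above usually follows the pattern below. This is a generic sketch, not the repository's actual training loop; `model`, `optimizer`, `scheduler`, and `train_loader` are placeholders passed in by the caller:
+
+ ```python
+ import torch
+
+ def train_with_accumulation(model, optimizer, scheduler, train_loader, accumulation_steps=8):
+     """Generic bfloat16 + gradient-accumulation loop (sketch, not the repo's script)."""
+     model.train()
+     optimizer.zero_grad()
+     for step, batch in enumerate(train_loader):
+         batch = {k: v.to("cuda") for k, v in batch.items()}
+         # bfloat16 autocast needs no GradScaler, unlike float16
+         with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
+             loss = model(**batch).loss / accumulation_steps
+         loss.backward()
+         if (step + 1) % accumulation_steps == 0:
+             optimizer.step()   # weight update every `accumulation_steps` micro-batches
+             scheduler.step()
+             optimizer.zero_grad()
+ ```
+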
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{chatmachine_v1,
+   author = {Houcine BDK},
+   title = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
+   year = {2024},
+   publisher = {Hugging Face},
+   journal = {Hugging Face Model Hub},
+   howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
+ }
+ ```
+
+ ## License
+
+ This model is released under the MIT License.