# GPQA Generator: Fine-tuned Gemma 2 2B for the GPQA (Graduate-Level Google-Proof Q&A Benchmark) dataset

## Model Details

- **Model Type:** Language Model
- **Base Model:** unsloth/gemma-2-2b-bnb-4bit
- **Fine-tuned by:** [Your Organization/Name]
- **License:** [Specify the license]

This model is a fine-tuned version of the Gemma 2 2B base model (`unsloth/gemma-2-2b-bnb-4bit`), tailored to the GPQA (Graduate-Level Google-Proof Q&A Benchmark) dataset. It generates graduate-level, context-rich multiple-choice questions, each with one correct answer, three incorrect answers, and an explanation.

## Intended Use

This model is designed for educational content creators, assessment developers, and researchers who need to generate complex, Google-proof multiple-choice questions across various academic disciplines.

### Primary Use Cases:

- Generating challenging assessment questions for advanced students
- Creating content for educational platforms and applications
- Assisting in the development of standardized tests
- Supporting research in question generation and educational assessment

## How to Use

### Setting Up

1. Clone the repository containing the model and scripts.
2. Install the required dependencies (`httpx`, `transformers`, etc.).

### Running the Model

1. Start the vLLM server:

```
./run_vllm_2b.sh
```

2. Generate questions using the `generate.py` script:

For a single category:

```
python generate.py --category "Your Category" --depth 4
```

To use predefined categories:

```
python generate.py --use-array --depth 4
```
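
When several categories each need their own run, the commands above can be scripted. The following is a minimal sketch, assuming `generate.py` sits in the current directory and accepts the `--category`, `--depth`, and `--use-array` flags exactly as shown above. The helper names `build_command` and `run_categories` are hypothetical and not part of the repository.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(category: Optional[str] = None, depth: int = 4) -> List[str]:
    """Build the argument list for one generate.py run.

    With a category, this mirrors `--category "..." --depth N`;
    without one, it mirrors `--use-array --depth N`.
    """
    cmd = [sys.executable, "generate.py"]
    if category is None:
        cmd.append("--use-array")
    else:
        cmd += ["--category", category]
    cmd += ["--depth", str(depth)]
    return cmd


def run_categories(categories: List[str], depth: int = 4) -> None:
    """Run generate.py once per category, stopping at the first failure.

    Assumes the vLLM server has already been started.
    """
    for category in categories:
        subprocess.run(build_command(category, depth), check=True)
```

Calling `run_categories(["Mercurial"], depth=4)` would then launch the same command as the single-category example. Passing the arguments as a list avoids shell quoting problems with category names that contain spaces.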

### Configuration

- Modify the `CATEGORIES_TO_PROCESS` list in the script to add or change the predefined categories.
- Adjust the `max_depth` parameter to control how deep subcategory exploration goes.
- The script processes categories with multiple threads. Adjust `num_threads` in `process_categories()` if needed.
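
To picture how these three settings might interact, here is an illustrative sketch. It is not the code in `generate.py`: the signatures of `process_categories` and `explore`, the `expand` callback, and the `fake_expand` stub are all hypothetical, the category values are placeholders, and treating `max_depth` as an inclusive limit counted from depth 0 is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Placeholder values; the real list lives in the script.
CATEGORIES_TO_PROCESS = ["Mercurial", "Organic Chemistry"]


def explore(category: str, max_depth: int,
            expand: Callable[[str], List[str]], depth: int = 0) -> List[Dict]:
    """Visit a category and its subcategories down to max_depth (inclusive)."""
    visited = [{"category": category, "depth": depth}]
    if depth < max_depth:
        for sub in expand(category):
            visited += explore(sub, max_depth, expand, depth + 1)
    return visited


def process_categories(categories: List[str], max_depth: int,
                       expand: Callable[[str], List[str]],
                       num_threads: int = 4) -> List[Dict]:
    """Explore each top-level category in a worker thread, keeping input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_category = list(
            pool.map(lambda c: explore(c, max_depth, expand), categories)
        )
    return [item for items in per_category for item in items]


def fake_expand(category: str) -> List[str]:
    """Stand-in for whatever produces subcategories in the real script."""
    return [f"{category} / A", f"{category} / B"]


if __name__ == "__main__":
    for item in process_categories(CATEGORIES_TO_PROCESS, 1, fake_expand, 2):
        print(item)
```

Under these assumptions the number of visited categories grows quickly with `max_depth`, so raising the depth tends to cost far more than adding threads saves.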

## Sample Output

Here's an example of the generated output:

```json
{
  "question": "A developer is working on a large project that uses Mercurial version control. They need to merge a branch containing bug fixes from another team. What is the recommended approach to avoid merging conflicts?",
  "answer": "Create a new branch from the source directory.",
  "incorrect_answer_1": "Merge the branches directly.",
  "incorrect_answer_2": "Skip the merge process altogether.",
  "incorrect_answer_3": "Use a third-party tool like Git.",
  "explanation": "Merging branches in Mercurial requires careful consideration to avoid conflicts. Here's a breakdown of the reasoning: \n1. Branch Creation: Creating a new branch allows the developer to isolate the changes from the other team without affecting the base branch.\n2. Conflict Detection: Comparing the histories of the branches helps identify potential conflicts that may arise during the merge.\n3. Conflict Resolution: Manual conflict resolution is essential to ensure the merge is successful. Mercurial provides tools like \"diff\" and \"merge\" commands for this purpose.\n4. Committing Changes: Once the merge is complete, the developer should commit the changes to their new branch.",
  "subcategories": ["Version Control", "Mercurial", "Merge Conflicts"],
  "category": "Mercurial",
  "depth": 0
}
```
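
Before a generated record goes into an assessment, it can help to check that the expected fields are present and to shuffle the options so the correct answer is not always listed first. The sketch below assumes records use the field names from the sample above. The helpers `missing_keys` and `to_multiple_choice` are hypothetical and not part of the repository, and the example record is made up for illustration.

```python
import random
from typing import Dict, List, Tuple

REQUIRED_KEYS = {
    "question", "answer",
    "incorrect_answer_1", "incorrect_answer_2", "incorrect_answer_3",
    "explanation",
}


def missing_keys(record: Dict) -> List[str]:
    """Return, sorted, the required keys that are absent or not non-empty strings."""
    return sorted(
        key for key in REQUIRED_KEYS
        if not isinstance(record.get(key), str) or not record[key].strip()
    )


def to_multiple_choice(record: Dict, rng: random.Random) -> Tuple[List[str], str]:
    """Shuffle the four options and return them with the correct letter."""
    options = [record["answer"]]
    options += [record[f"incorrect_answer_{i}"] for i in (1, 2, 3)]
    rng.shuffle(options)
    correct_letter = "ABCD"[options.index(record["answer"])]
    return options, correct_letter


# Made-up record for illustration only.
record = {
    "question": "Example question?",
    "answer": "Correct option",
    "incorrect_answer_1": "Distractor 1",
    "incorrect_answer_2": "Distractor 2",
    "incorrect_answer_3": "Distractor 3",
    "explanation": "Why the correct option is right.",
}

if not missing_keys(record):
    options, letter = to_multiple_choice(record, random.Random(0))
    for label, text in zip("ABCD", options):
        print(f"{label}. {text}")
    print("Correct:", letter)
```

A structural check like this only confirms the shape of a record. It says nothing about whether the answer is factually correct, so expert review is still needed.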

## Limitations

- The model generates questions based on its training data, which may not reflect the most current information in rapidly evolving fields.
- Although the questions are designed to be "Google-proof," how well they resist a web search may vary with the topic and with how the relevant information is presented online.
- Subject matter experts should review the quality and accuracy of generated questions before they are used in formal assessments.

## Ethical Considerations

- Users should be aware of potential biases in the generated content and review questions for fairness and inclusivity.
- The model should not be used to generate misleading or factually incorrect information.
- Respect copyright and intellectual property rights when using generated content.

## Citation

If you use this model in your research or applications, please cite it as follows:

```
[Citation information to be added]
```

## Contact

For questions, feedback, or support, please contact [Your Contact Information].