
	# GPQA Generator: Fine-tuned Gemma 2 2B for the GPQA (Graduate-Level Google-Proof Q&A Benchmark) Dataset

	## Model Details

	- Model Type: Language Model
	- Base Model: unsloth/gemma-2-2b-bnb-4bit
	- Fine-tuned by: [Your Organization/Name]
	- License: [Specify the license]

	This model is a fine-tuned version of the Gemma 2 2B base model, tailored to the GPQA (Graduate-Level Google-Proof Q&A Benchmark) dataset. It generates graduate-level, context-rich multiple-choice questions, each with one correct answer, three incorrect answers, and an explanation.

	## Intended Use

	This model is designed for educational content creators, assessment developers, and researchers who need to generate complex, Google-proof multiple-choice questions across various academic disciplines.

	### Primary Use Cases

	- Generating challenging assessment questions for advanced students
	- Creating content for educational platforms and applications
	- Assisting in the development of standardized tests
	- Supporting research in question generation and educational assessment

	## How to Use

	### Setting Up

	1. Clone the repository containing the model and scripts.
	2. Install the required dependencies (vllm, httpx, transformers, etc.).

	### Running the Model

	1. Start the vLLM server:
	```bash
	./run_vllm_2b.sh
	```
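Once the server is up, other programs talk to it over HTTP. The sketch below is a minimal client that assumes `run_vllm_2b.sh` launches vLLM's OpenAI-compatible server at its default address `http://localhost:8000` and that the served model name is `gpqa-generator-2b`; both values are hypothetical placeholders, so check the script for the real ones. It uses the standard library's `urllib.request` to stay dependency-free, though `httpx.post` would work the same way.

```python
import json
import urllib.request

# Assumptions (not confirmed by this card): vLLM's OpenAI-compatible server
# listens on localhost:8000, and MODEL_NAME matches the name it serves.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "gpqa-generator-2b"  # hypothetical served model name


def build_completion_request(prompt: str, max_tokens: int = 1024,
                             temperature: float = 0.7) -> dict:
    """Build the JSON body for the OpenAI-style /v1/completions endpoint."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, timeout: float = 120.0) -> str:
    """POST a completion request to the local server and return the generated text."""
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-style responses put the generated text in choices[0].text.
    return payload["choices"][0]["text"]
```

With the server running, `complete("...")` returns the raw generated text. The prompt template the fine-tuned model expects is not documented here, so take it from `generate.py` rather than guessing.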

	2. Generate questions using the `generate.py` script:

	For a single category:
	```bash
	python generate.py --category "Your Category" --depth 4
	```

	To use predefined categories:
	```bash
	python generate.py --use-array --depth 4
	```
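The two invocations above imply a small command-line interface: either one `--category` or the predefined list via `--use-array`, plus a `--depth` limit. A minimal `argparse` sketch of that interface follows. It is a reconstruction from the examples, not the actual `generate.py`, and the default depth of 4 is a hypothetical choice.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the generate.py flags shown above."""
    parser = argparse.ArgumentParser(
        description="Generate GPQA-style questions (illustrative sketch)."
    )
    # Exactly one source of categories: a single name or the predefined list.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--category", help="Single category to generate questions for.")
    source.add_argument("--use-array", action="store_true",
                        help="Process the predefined CATEGORIES_TO_PROCESS list.")
    # Default value is a guess; the real script may use a different one.
    parser.add_argument("--depth", type=int, default=4,
                        help="Maximum subcategory depth to explore.")
    return parser
```

The mutually exclusive group makes `argparse` reject a command that passes both `--category` and `--use-array`, or neither.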

	### Configuration

	- Modify the `CATEGORIES_TO_PROCESS` list in the script to add or change predefined categories.
	- Adjust the `max_depth` parameter to control the depth of subcategory exploration.
	- The script uses multi-threading for efficient processing. Adjust `num_threads` in `process_categories()` if needed.
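The depth limit and thread count interact as follows: each top-level category seeds a tree, each generated record lists `subcategories`, and exploration recurses until `max_depth`. The sketch below is one way to implement that, assuming a hypothetical `generate_fn(category)` that returns a dict shaped like the sample output. It uses `concurrent.futures.ThreadPoolExecutor`; threads suit this workload because each step mostly waits on HTTP calls to the vLLM server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Set

# Hypothetical signature: category name in, one question record (dict) out.
GenerateFn = Callable[[str], Dict]


def explore(category: str, generate_fn: GenerateFn, depth: int = 0,
            max_depth: int = 4, seen: Optional[Set[str]] = None) -> List[Dict]:
    """Depth-first walk: one record for `category`, then its subcategories."""
    if seen is None:
        seen = set()
    seen.add(category)
    record = dict(generate_fn(category))  # copy so the caller's dict is untouched
    record["category"] = category
    record["depth"] = depth
    results = [record]
    if depth < max_depth:
        for sub in record.get("subcategories", []):
            if sub not in seen:  # skip repeats, e.g. a category listing itself
                results.extend(explore(sub, generate_fn, depth + 1, max_depth, seen))
    return results


def process_categories(categories: List[str], generate_fn: GenerateFn,
                       max_depth: int = 4, num_threads: int = 4) -> List[Dict]:
    """Explore each top-level category in a worker thread."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(explore, c, generate_fn, 0, max_depth) for c in categories]
        # Collect in submission order so the output order is deterministic.
        return [record for future in futures for record in future.result()]
```

Each tree runs in a single thread, so its `seen` set needs no lock. Note that the number of records grows geometrically with depth: with `b` subcategories per record, one tree can produce up to 1 + b + b² + ... + b^max_depth generations, so raising `max_depth` by one can multiply the run time.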

	## Sample Output

	Here's an example of the generated output:

	```json
	{
	  "question": "A developer is working on a large project that uses Mercurial version control. They need to merge a branch containing bug fixes from another team. What is the recommended approach to avoid merging conflicts?",
	  "answer": "Create a new branch from the source directory.",
	  "incorrect_answer_1": "Merge the branches directly.",
	  "incorrect_answer_2": "Skip the merge process altogether.",
	  "incorrect_answer_3": "Use a third-party tool like Git.",
	  "explanation": "Merging branches in Mercurial requires careful consideration to avoid conflicts. Here's a breakdown of the reasoning: \n1. Branch Creation: Creating a new branch allows the developer to isolate the changes from the other team without affecting the base branch.\n2. Conflict Detection: Comparing the histories of the branches helps identify potential conflicts that may arise during the merge.\n3. Conflict Resolution: Manual conflict resolution is essential to ensure the merge is successful. Mercurial provides tools like \"diff\" and \"merge\" commands for this purpose.\n4. Committing Changes: Once the merge is complete, the developer should commit the changes to their new branch.",
	  "subcategories": ["Version Control", "Mercurial", "Merge Conflicts"],
	  "category": "Mercurial",
	  "depth": 0
	}
	```
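Because each record carries the correct answer and three distractors in fixed fields, a downstream tool has to check the record and shuffle the options before presenting them. The sketch below, using only the field names from the sample above, flags missing or mistyped fields and duplicate options, then maps the four options onto letters A to D. These are mechanical checks only; the helper names are illustrative, not part of this repository.

```python
import random
from typing import Dict, List, Optional

# Field names and types taken from the sample output.
EXPECTED_FIELDS = {
    "question": str,
    "answer": str,
    "incorrect_answer_1": str,
    "incorrect_answer_2": str,
    "incorrect_answer_3": str,
    "explanation": str,
    "subcategories": list,
    "category": str,
    "depth": int,
}
OPTION_KEYS = ["answer", "incorrect_answer_1", "incorrect_answer_2", "incorrect_answer_3"]


def validate_record(record: Dict) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in record:
            problems.append(f"missing field: {key}")
        elif not isinstance(record[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    options = [record.get(k) for k in OPTION_KEYS]
    # Only compare options when all four are present as strings.
    if all(isinstance(o, str) for o in options):
        if len({o.strip().lower() for o in options}) < len(options):
            problems.append("answer options are not all distinct")
    return problems


def to_multiple_choice(record: Dict, seed: Optional[int] = None) -> Dict:
    """Shuffle the four options into letters A-D and note the correct letter."""
    options = [record[k] for k in OPTION_KEYS]
    random.Random(seed).shuffle(options)
    letters = "ABCD"
    return {
        "question": record["question"],
        "choices": dict(zip(letters, options)),
        "correct": letters[options.index(record["answer"])],
    }
```

Passing a fixed `seed` makes the letter assignment reproducible, which helps when the same item must appear identically in several exported tests.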

	## Limitations

	- The model generates questions based on its training data, which may not always reflect the most current information in rapidly evolving fields.
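	- While designed to be "Google-proof" (hard to answer correctly even with unrestricted web search), the generated questions may not be equally resistant to search across all topics; this depends on the subject and on how much related information is available online.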
	- The quality and accuracy of generated questions should be reviewed by subject matter experts before use in formal assessments.

	## Ethical Considerations

	- Users should be aware of potential biases in the generated content and review questions for fairness and inclusivity.
	- The model should not be used to generate misleading or factually incorrect information.
	- Respect copyright and intellectual property rights when using generated content.

	## Citation

	If you use this model in your research or applications, please cite it as follows:

	```
	[Citation information to be added]
	```

	## Contact

	For questions, feedback, or support, please contact [Your Contact Information].