---
license: mit
language:
- en
base_model:
- EleutherAI/pythia-70m
- EleutherAI/pythia-70m-deduped
library_name: transformers
tags:
- mergekit
- merged-model
- pythia
- language-model
---
# πŸš€ Pythia-Hybrid-140M: Merging Efficiency & Power
## πŸ“Œ Overview
**Pythia-Hybrid-140M** is an **experimental hybrid language model** that merges the capabilities of two Pythia variants. Built using **MergeKit**, this model is designed to balance performance and efficiency while offering strong text generation capabilities.
πŸ”— **Created by**: Matteo Khan
πŸŽ“ **Affiliation**: Apprentice at TW3 Partners (Generative AI Research)
πŸ“ **License**: MIT
πŸ”— [Connect with me on LinkedIn](https://www.linkedin.com/in/matteo-khan-a10309263/)
πŸ” [Model on Hugging Face](https://huggingface.co/MatteoKhan/Pythia-Hybrid-140M)
## 🧠 Model Details
- **Model Type**: Hybrid Language Model (Merged)
- **Parent Models**:
- [Pythia-70M](https://huggingface.co/EleutherAI/pythia-70m)
- [Pythia-70M-Deduped](https://huggingface.co/EleutherAI/pythia-70m-deduped)
- **Merging Technique**: Linear Merge (MergeKit)
## 🎯 Intended Use
This model is primarily intended for **research and experimentation** in hybrid model optimization. Potential use cases include:
- βœ… Text Generation
- βœ… Conversational AI
- βœ… Creative Writing Assistance
- βœ… Exploration of Model Merging Effects
## ⚠️ Limitations & Considerations
While **Pythia-Hybrid-140M** offers enhanced capabilities, it also inherits certain limitations from its parent models:
- ❌ May generate **inaccurate or misleading** information
- ⚠️ Potential for **biased, offensive, or harmful** content
- πŸ”„ Merging may introduce **unpredictable behaviors**
- πŸ“‰ Performance may **vary across different tasks**
## πŸ”¬ Merging Process & Configuration
This is **not a newly trained model**, but rather a merge of existing models using the following configuration:
```yaml
merge_method: linear
dtype: float16
models:
  - model: "EleutherAI/pythia-70m"
    parameters:
      t: 1.0
      weight: 0.5
  - model: "EleutherAI/pythia-70m-deduped"
    parameters:
      t: 1.0
      weight: 0.5
parameters:
  normalize: true
  int8_mask: false
layers:
  - pattern: "model.*"
```
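With `normalize: true`, a linear merge reduces to a weighted average of each pair of corresponding parameter tensors. The sketch below only illustrates that arithmetic, not how MergeKit is invoked internally; it assumes both parents share the exact same architecture (which the two Pythia-70M variants do).
```python
# Illustrative sketch of a normalized linear merge (not the MergeKit implementation).
import torch
from transformers import AutoModelForCausalLM

parents = ["EleutherAI/pythia-70m", "EleutherAI/pythia-70m-deduped"]
weights = [0.5, 0.5]

state_dicts = [
    AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16).state_dict()
    for name in parents
]

total = sum(weights)
merged = {}
for key in state_dicts[0]:
    # Weighted average of corresponding tensors, with weights normalized to sum to 1.
    merged[key] = sum(
        (w / total) * sd[key].float() for w, sd in zip(weights, state_dicts)
    ).to(torch.float16)

# `merged` can then be loaded into a Pythia-70M-shaped model via load_state_dict.
```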
πŸ“Š **No formal evaluation** has been conducted yet. Users are encouraged to **benchmark and share feedback**!
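As a quick starting point, a rough perplexity check can be run with nothing but `transformers`; the snippet below is a minimal sketch, and the sample text and the choice of perplexity as the metric are illustrative assumptions rather than part of this card.
```python
# Minimal perplexity sanity check (not a formal benchmark).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MatteoKhan/Pythia-Hybrid-140M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels equal to input_ids makes the model return the average cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")
```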
## 🌍 Environmental Impact
By utilizing **model merging** rather than training from scratch, **Pythia-Hybrid-140M** significantly reduces computational and environmental costs.
## πŸš€ How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "MatteoKhan/Pythia-Hybrid-140M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Example usage
prompt = "Write a short poem about artificial intelligence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
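Continuing from the snippet above, sampling parameters can be passed to `generate` for more varied output; the values below are illustrative defaults and have not been tuned for this model.
```python
# Sampling-based generation; temperature/top_p are illustrative, not tuned for this model.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```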
**πŸ“ Pythia-70M**
```bibtex
@misc{biderman2023pythia,
  title={Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling},
  author={Biderman, Stella and Schoelkopf, Hailey and others},
  year={2023},
  eprint={2304.01373},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
πŸ“© **Feedback & Contact**: Reach out via [Hugging Face](https://huggingface.co/MatteoKhan).
πŸŽ‰ **Happy Experimenting!** πŸš€