---
license: mit
language:
- en
base_model:
- EleutherAI/pythia-70m
- EleutherAI/pythia-70m-deduped
library_name: transformers
tags:
- mergekit
- merged-model
- pythia
- language-model
---

# Pythia-Hybrid-140M: Merging Efficiency & Power

## Overview

**Pythia-Hybrid-140M** is an **experimental hybrid language model** that merges two Pythia variants. Built with **MergeKit**, it is designed to balance performance and efficiency while offering solid text-generation capabilities for its size.

**Created by**: Matteo Khan

**Affiliation**: Apprentice at TW3 Partners (Generative AI Research)

**License**: MIT

[Connect with me on LinkedIn](https://www.linkedin.com/in/matteo-khan-a10309263/)

[Model on Hugging Face](https://huggingface.co/MatteoKhan/Pythia-Hybrid-140M)

## Model Details

- **Model Type**: Hybrid Language Model (Merged)
- **Parent Models**:
  - [Pythia-70M](https://huggingface.co/EleutherAI/pythia-70m)
  - [Pythia-70M-Deduped](https://huggingface.co/EleutherAI/pythia-70m-deduped)
- **Merging Technique**: Linear Merge (MergeKit); see the conceptual sketch below
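
For intuition, a linear merge is essentially a weighted average of the parents' parameters. The snippet below is a minimal conceptual sketch of that idea under this card's 0.5/0.5 weighting; it is not MergeKit's actual implementation, and the output path is purely illustrative. It only works because both parents share the same architecture.

```python
# Conceptual sketch only: a 0.5/0.5 weighted average of two checkpoints,
# which is what a linear merge does at its core. Not MergeKit's actual code.
from transformers import AutoModelForCausalLM

m1 = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
m2 = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped")

state2 = m2.state_dict()
merged = {name: 0.5 * p + 0.5 * state2[name] for name, p in m1.state_dict().items()}

m1.load_state_dict(merged)                        # m1 now carries the averaged weights
m1.save_pretrained("./pythia-linear-merge-demo")  # illustrative output path
```

MergeKit performs the real merge with additional handling (weight normalization, dtype conversion, tokenizer copying), driven by the configuration shown later in this card.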

## Intended Use

This model is primarily intended for **research and experimentation** in hybrid model optimization. Potential use cases include:

- Text Generation
- Conversational AI
- Creative Writing Assistance
- Exploration of Model Merging Effects

## Limitations & Considerations

While **Pythia-Hybrid-140M** offers enhanced capabilities, it also inherits certain limitations from its parent models:

- May generate **inaccurate or misleading** information
- Potential for **biased, offensive, or harmful** content
- Merging may introduce **unpredictable behaviors**
- Performance may **vary across different tasks**

## Merging Process & Configuration

This is **not a newly trained model**, but rather a merge of existing models using the following configuration:

```yaml
merge_method: linear
dtype: float16
models:
  - model: "EleutherAI/pythia-70m"
    parameters:
      t: 1.0
      weight: 0.5
  - model: "EleutherAI/pythia-70m-deduped"
    parameters:
      t: 1.0
      weight: 0.5
parameters:
  normalize: true
  int8_mask: false
layers:
  - pattern: "model.*"
```
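
The card does not state how the merge was launched. As a rough reproduction sketch, the configuration above could be saved as `merge_config.yaml` (an assumed filename) and passed to MergeKit's `mergekit-yaml` command, wrapped here in Python to match the rest of this card; the output directory name is illustrative.

```python
# Reproduction sketch (assumptions: mergekit is installed via `pip install mergekit`
# and the YAML above is saved as merge_config.yaml; the output path is illustrative).
import subprocess

subprocess.run(
    ["mergekit-yaml", "merge_config.yaml", "./pythia-hybrid-140m"],
    check=True,  # raise if the merge command fails
)
```

The resulting directory can then be loaded with `AutoModelForCausalLM.from_pretrained`, as shown in the usage section below.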

**No formal evaluation** has been conducted yet. Users are encouraged to **benchmark and share feedback**!

## Environmental Impact

By utilizing **model merging** rather than training from scratch, **Pythia-Hybrid-140M** significantly reduces computational and environmental costs.

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MatteoKhan/Pythia-Hybrid-140M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example usage: generate a continuation of the prompt
prompt = "Write a short poem about artificial intelligence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)  # max_length counts prompt + new tokens
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
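
For quick experiments, the same checkpoint can also be driven through the Transformers `text-generation` pipeline. The sampling values below are illustrative starting points, not settings recommended for this model:

```python
from transformers import pipeline

# Illustrative sampling settings; tune temperature/top_p for your task.
generator = pipeline("text-generation", model="MatteoKhan/Pythia-Hybrid-140M")
result = generator(
    "Write a short poem about artificial intelligence.",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(result[0]["generated_text"])
```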

## Citation

**Pythia-70M**

```bibtex
@misc{biderman2023pythia,
  title={Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling},
  author={Stella Biderman and others},
  year={2023},
  eprint={2304.01373},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

**Feedback & Contact**: Reach out via [Hugging Face](https://huggingface.co/MatteoKhan).

**Happy Experimenting!**