---
license: mit
tags:
- mamba
- pytorch
- Text Generation
- research abstract
datasets: pt-sk/research_papers_short
metrics: CrossEntropyLoss
---

This model uses the Mamba architecture and was trained on a dataset of research abstracts.

* Optimizer: AdamW
* Learning Rate: 0.001
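
The training settings listed above could be wired up in PyTorch roughly as follows. This is a minimal sketch, not the original training script: the `nn.Linear` stand-in model, the random batch, and the use of default values for every AdamW option other than the learning rate are all assumptions made so the snippet runs on its own.

```python
import torch
from torch import nn

# Stand-in module so the snippet is self-contained (assumption);
# in practice this would be the Mamba model from this repository.
model = nn.Linear(16, 16)

# AdamW with the learning rate listed above. Betas and weight decay
# are left at the PyTorch defaults, which is an assumption.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# CrossEntropyLoss is the metric named in the model card metadata.
criterion = nn.CrossEntropyLoss()

# One illustrative optimization step on random data.
inputs = torch.randn(4, 16)
targets = torch.randint(0, 16, (4,))
loss = criterion(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```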

Import the scripts from the code folder:

```
from model import Mamba, ModelArgs
```

Load the model:

```
# Load the pretrained weights and move the model to the GPU (requires CUDA)
mamba_model = Mamba.from_pretrained("pt-sk/mamba").to("cuda")
```

Load the tokenizer:

```
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('pt-sk/mamba')
```
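
Once the model and tokenizer are loaded, text can be generated one token at a time. The helper below is a hedged sketch of greedy decoding. `greedy_generate` is a hypothetical name, and it assumes that calling the model on a tensor of token ids returns logits of shape `(batch, seq_len, vocab_size)`; that is an assumption about the forward pass of the `Mamba` class, not something this card documents.

```python
import torch


@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens=20, eos_token_id=None):
    """Append up to max_new_tokens tokens, taking the argmax at each step.

    Assumes model(input_ids) returns logits of shape
    (batch, seq_len, vocab_size); this is an assumption, not a documented API.
    """
    for _ in range(max_new_tokens):
        logits = model(input_ids)
        # Most likely next token for each sequence, shape (batch, 1)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)
        # Stop early if every sequence in the batch just emitted EOS
        if eos_token_id is not None and (next_token == eos_token_id).all():
            break
    return input_ids


# Hypothetical usage with the objects loaded above:
# input_ids = tokenizer("We propose", return_tensors="pt").input_ids.to("cuda")
# output_ids = greedy_generate(mamba_model, input_ids, max_new_tokens=50)
# print(tokenizer.decode(output_ids[0]))
```

Greedy decoding is the simplest strategy and is deterministic; sampling with a temperature would replace the `argmax` call.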

The `mamba_reserach` file contains the state dicts of the optimizer and the model.
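
A checkpoint that bundles both state dicts is typically written and restored as sketched below. The key names `model_state_dict` and `optimizer_state_dict`, the temporary file path, and the `nn.Linear` stand-in model are assumptions for illustration; inspect the keys of the actual file before relying on them.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in model and optimizer so the snippet runs on its own (assumption)
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Bundle both state dicts in one file; the key names are hypothetical
checkpoint = {
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save(checkpoint, path)

# Restore into freshly constructed objects, for example to resume training
restored_model = nn.Linear(8, 8)
restored_optimizer = torch.optim.AdamW(restored_model.parameters(), lr=1e-3)
state = torch.load(path, map_location="cpu")
restored_model.load_state_dict(state["model_state_dict"])
restored_optimizer.load_state_dict(state["optimizer_state_dict"])
```

Calling `state.keys()` on the loaded object is a quick way to check which names a given checkpoint actually uses.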