---
library_name: transformers
base_model:
  - meta-llama/Llama-3.3-70B-Instruct
tags:
  - generated_from_trainer
model-index:
  - name: 70B-L3.3-Cirrus-x1
    results: []
license: llama3.3
---

yeah my mental when things do not go well

# 70B-L3.3-Cirrus-x1

I quite liked it, after messing around. Same data composition as Freya, applied differently.

Has occasional brainfarts, which a regen fixes; the price for more creative outputs.

## Recommended Model Settings

Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.

- Prompt Format: Llama-3-Instruct
- Temperature: 1.1
- min_p: 0.05
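The settings above can be sketched as a small prompt builder plus a sampler dict. This is an illustrative sketch, not part of the model release: the helper name and example strings are my own, but the special tokens follow the standard Llama-3-Instruct template, and the sampler values are the ones recommended here.

```python
# Minimal sketch of the recommended setup. Only the special tokens and the
# sampler values (temperature 1.1, min_p 0.05) come from the card; the
# function name and example strings are illustrative.

def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama-3-Instruct prompt string."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Recommended sampler settings; pass these to your backend
# (e.g. as generation kwargs, or as sliders in a frontend).
SAMPLER_SETTINGS = {"temperature": 1.1, "min_p": 0.05}

prompt = build_llama3_prompt("You are a helpful assistant.", "Hello!")
```

Most backends accept these as generation parameters directly; if yours applies a chat template automatically, you only need the sampler dict.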
Training time in total was ~22 hours on an 8xH100 node.
Then ~3 hours were spent merging checkpoints and experimenting with the model on a 2xH200 node.

https://sao10k.carrd.co/ for contact.