- Adding `safetensors` variant of this model — #19, opened 9 months ago by SFconvertbot
- Adding Evaluation Results — #18, opened 10 months ago by leaderboard-pr-bot
- any plans for mixtral 128k? — #17, opened 10 months ago by sirus
- Transformers fix to mixed precision at long context lengths (1 reply) — #16, opened about 1 year ago by nbroad
- How much computation power (like gpus and gpu hour) you guys needed to finetune this? (1 reply) — #15, opened about 1 year ago by zohadev
- Yarn-StableLM-Epoch? — #14, opened about 1 year ago by KnutJaegersberg
- Instruction finetuning and train script, QLORA etc. — #13, opened about 1 year ago by aamir1122a
- Add widget examples — #11, opened about 1 year ago by mishig
- Using this model with Vllm (1 reply) — #10, opened about 1 year ago by haltux
- Can't deploy to any provider an inference endpoint (2 replies) — #9, opened about 1 year ago by ejkkan
- Pretraining from scratch? — #8, opened about 1 year ago by MengboZhou
- Fine-tuned with all parameters? (1 reply) — #6, opened about 1 year ago by MengboZhou
- VRAM usage for full 128k tokens (7 replies) — #5, opened about 1 year ago by Hypersniper
- sliding_window = 131072? Sliding window attention doesn't work for 128? (1 reply) — #4, opened about 1 year ago by keyishen
- smaller shards, pls — #2, opened about 1 year ago by lskywalker
- Instruct Version? (8 replies) — #1, opened about 1 year ago by mrfakename