- Adding `safetensors` variant of this model — #19, opened 9 months ago by SFconvertbot
- Adding Evaluation Results — #18, opened 10 months ago by leaderboard-pr-bot
- any plans for mixtral 128k? — #17, opened 10 months ago by sirus
- Transformers fix to mixed precision at long context lengths (1 reply) — #16, opened about 1 year ago by nbroad
- How much computation power (like gpus and gpu hour) you guys needed to finetune this? (1 reply) — #15, opened about 1 year ago by zohadev
- Yarn-StableLM-Epoch? — #14, opened about 1 year ago by KnutJaegersberg
- Instruction finetuning and train script, QLORA etc. — #13, opened about 1 year ago by aamir1122a
- Add widget examples — #11, opened about 1 year ago by mishig
- Using this model with Vllm (1 reply) — #10, opened about 1 year ago by haltux
- Can't deploy to any provider an inference endpoint (2 replies) — #9, opened about 1 year ago by ejkkan
- Pretraining from scratch? — #8, opened about 1 year ago by MengboZhou
- Fine-tuned with all parameters? (1 reply) — #6, opened about 1 year ago by MengboZhou
- VRAM usage for full 128k tokens (7 replies) — #5, opened about 1 year ago by Hypersniper
- sliding_window = 131072? Sliding window attention doesn't work for 128? (1 reply) — #4, opened about 1 year ago by keyishen
- smaller shards, pls — #2, opened about 1 year ago by lskywalker
- Instruct Version? (8 replies) — #1, opened about 1 year ago by mrfakename