Stop condition #45
opened by GemZq
The stopping condition is dictated by the model, which should learn when to output an end-of-sequence (EOS) token. However, the 8B-chinese-chat model's generation output keeps repeating. Is this a bug in my code?
from transformers import TextStreamer

# tokenizer and model are assumed to be loaded earlier in the script
streamer = TextStreamer(tokenizer)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
# Despite returning the usual output, the streamer also prints the generated text to stdout.
_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    cache_implementation="quantized",
    cache_config={"nbits": 4, "backend": "quanto"},
)
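One likely cause (an assumption, since the exact model isn't shown): with Llama-3-family chat models, the chat template ends each assistant turn with `<|eot_id|>` rather than the tokenizer's default `eos_token`, so `generate()` never sees the token it is watching for and runs until `max_new_tokens`, often degenerating into repetition. The sketch below shows the stop condition `generate()` effectively applies, plus (in comments) the commonly recommended fix of passing both terminator ids via `eos_token_id`; the token names are assumptions about this model's template.

```python
def truncate_at_stop(token_ids, eos_ids):
    """Sketch of generate()'s stop condition: decoding halts at the
    first token that is in the configured set of EOS ids."""
    for i, tok in enumerate(token_ids):
        if tok in eos_ids:
            return token_ids[:i]  # drop the stop token itself
    # No stop token was ever sampled: generation runs to max_new_tokens,
    # which is where repetition typically shows up.
    return token_ids

# If the model is a Llama-3-style chat model (assumption), pass both
# terminators so decoding can stop at the end of the assistant turn:
#
# terminators = [tokenizer.eos_token_id,
#                tokenizer.convert_tokens_to_ids("<|eot_id|>")]
# _ = model.generate(**inputs, streamer=streamer,
#                    eos_token_id=terminators, ...)
```

If the right terminators are already set, a `repetition_penalty` slightly above 1.0 in `generate()` is another common mitigation for looping output.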