Spaces:
Running
Running
moss-moon-003-base 模型的 tokenizer 中,eos token
为 <|endoftext|>
,在训练SFT模型时需要将该 token 指定为 <eom>
token.
SFT 阶段
<eoh>
: end of human<eot>
: end of thoughts<eoc>
: end of commands<eom>
: end of moss
注意
moss的
def convert_tokens_to_string(self, tokens):
"""Converts a sequence of tokens (string) in a single string."""
text = "".join(tokens)
text = bytearray([self.byte_decoder[c] for c in text]).decode("utf-8", errors=self.errors)
return text