In the tokenizer of the moss-moon-003-base model, the `eos token` is `<|endoftext|>`. When training the SFT model, this token must be set to the `<eom>` token.
## SFT stage
- `<eoh>`: end of human
- `<eot>`: end of thoughts
- `<eoc>`: end of commands
- `<eom>`: end of moss
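
As a rough illustration of how these end markers delimit a sample, the sketch below splits a string into segments at each marker and drops anything after the last one. The function name `split_segments` and the sample text are hypothetical, and the sketch assumes each segment simply ends with its marker; it is not the MOSS data pipeline.

```python
import re

# The four end markers listed above, mapped to the segment they close.
END_MARKERS = {
    "<eoh>": "human",
    "<eot>": "thoughts",
    "<eoc>": "commands",
    "<eom>": "moss",
}

# One regex that matches any of the markers literally.
_MARKER_RE = re.compile("|".join(re.escape(m) for m in END_MARKERS))


def split_segments(text):
    """Split text into (kind, content) pairs, one per end marker found.

    Text after the last marker is ignored, similar to stopping at <eom>.
    """
    segments = []
    start = 0
    for match in _MARKER_RE.finditer(text):
        content = text[start:match.start()].strip()
        segments.append((END_MARKERS[match.group()], content))
        start = match.end()
    return segments


# Hypothetical sample, not taken from the MOSS training data.
sample = "Hi<eoh>The user greets me.<eot>None<eoc>Hello!<eom>trailing text"
for kind, content in split_segments(sample):
    print(kind, "->", content)
```

Because `<eom>` closes the model's reply, using it as the end-of-sequence token during SFT lets generation stop at the end of that segment.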
## Notes
MOSS's `convert_tokens_to_string`:
```py
def convert_tokens_to_string(self, tokens):
    """Converts a sequence of tokens (string) in a single string."""
    # Concatenate the byte-level token strings.
    text = "".join(tokens)
    # Map each character back to its raw byte, then decode the bytes as UTF-8.
    text = bytearray([self.byte_decoder[c] for c in text]).decode("utf-8", errors=self.errors)
    return text
```
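
To see what the method above does, here is a minimal standalone sketch. It assumes `byte_decoder` is the inverse of the GPT-2 style byte-to-unicode table; that assumption, the helper `encode_to_byte_level`, and the module-level function form are illustrative and not taken from the MOSS source.

```python
def bytes_to_unicode():
    """GPT-2 style table mapping each byte (0-255) to a printable character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(2 ** 8):
        if b not in bs:
            # Bytes without a printable form are shifted above U+00FF.
            bs.append(b)
            cs.append(2 ** 8 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))


byte_encoder = bytes_to_unicode()
byte_decoder = {ch: b for b, ch in byte_encoder.items()}


def encode_to_byte_level(text):
    """Hypothetical helper: turn text into its byte-level character string."""
    return "".join(byte_encoder[b] for b in text.encode("utf-8"))


def convert_tokens_to_string(tokens, errors="replace"):
    """Standalone version of the method: join tokens, undo the byte mapping."""
    text = "".join(tokens)
    return bytearray([byte_decoder[c] for c in text]).decode("utf-8", errors=errors)


# Two byte-level "tokens"; a real tokenizer would produce BPE pieces instead.
tokens = [encode_to_byte_level("你好"), encode_to_byte_level(" MOSS")]
print(tokens)
print(convert_tokens_to_string(tokens))
```

A multi-byte character such as a Chinese character spans several byte-level characters, so a single token may hold only part of it; decoding works once all the tokens are joined.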
## Troubleshooting