leemeng and mkshing committed
Commit b06c3e7 • 1 parent: c5c38b1

fix tokenizer loading to decode digits (#3)


- fix tokenizer loading to decode digits (ca2257f10f9916abb571823991a97dbfffcd35b6)


Co-authored-by: Makoto Shing <[email protected]>

Files changed (1)
1. README.md +2 -2
README.md CHANGED
@@ -42,7 +42,7 @@ Then start generating text with `japanese-stablelm-base-alpha-7b` by using the f
 import torch
 from transformers import LlamaTokenizer, AutoModelForCausalLM
 
-tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1")
+tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=['▁▁'])
 
 model = AutoModelForCausalLM.from_pretrained(
     "stabilityai/japanese-stablelm-base-alpha-7b",
@@ -76,7 +76,7 @@ tokens = model.generate(
     do_sample=True,
 )
 
-out = tokenizer.decode(tokens[0], skip_special_tokens=False)
+out = tokenizer.decode(tokens[0], skip_special_tokens=True)
 print(out)
 """
 AI で科学研究を加速するには、データ駆動型文化が必要であることも明らかになってきています。研究のあらゆる側面で、データがより重要になっているのです。