AIBunCho committed
Commit f8f0425
1 Parent(s): b9445e8

Update README.md

Files changed (1)
  1. README.md +48 -0
README.md CHANGED
@@ -17,7 +17,55 @@ GPT-J-6B trained on Japanese data for two weeks on TPUs using a Japanese tokenizer
  
  ## Uses
  
+ ```
+ pip install transformers sentencepiece accelerate
+ ```
+ 
  ```python
+ from transformers import GPTJForCausalLM, AlbertTokenizer
+ import torch
+ 
+ # Use your own Hugging Face access token here; never hard-code a real
+ # token in a public README.
+ HF_TOKEN = "YOUR_HF_TOKEN"
+ 
+ tokenizer = AlbertTokenizer.from_pretrained(
+     "AIBunCho/japanese-novel-gpt-j-6b",
+     keep_accents=True,
+     remove_space=False,
+     use_auth_token=HF_TOKEN,
+ )
+ 
+ model = GPTJForCausalLM.from_pretrained(
+     "AIBunCho/japanese-novel-gpt-j-6b",
+     torch_dtype=torch.float16,  # load the weights directly in fp16
+     low_cpu_mem_usage=True,
+     use_auth_token=HF_TOKEN,
+ )
+ model.eval()
+ 
+ if torch.cuda.is_available():
+     model = model.to("cuda")
+ 
+ prompt = """
+ わたくしといふ現象は
+ """.strip()
+ 
+ # Move the inputs to wherever the model lives, so this also runs on CPU.
+ input_ids = tokenizer.encode(
+     prompt,
+     add_special_tokens=False,
+     return_tensors="pt"
+ ).to(model.device)
+ 
+ # Fixed seed for reproducibility; change it to get a different result.
+ seed = 27
+ torch.manual_seed(seed)
+ 
+ tokens = model.generate(
+     input_ids,
+     max_new_tokens=32,
+     temperature=0.6,
+     top_p=0.9,
+     repetition_penalty=1.2,
+     do_sample=True,
+     pad_token_id=tokenizer.pad_token_id,
+     bos_token_id=tokenizer.bos_token_id,
+     eos_token_id=tokenizer.eos_token_id,
+ )
+ 
+ out = tokenizer.decode(tokens[0], skip_special_tokens=True)
+ print(out)
+ # Example output: わたくしといふ現象は、その因果律を断ち切ることができるのです。
  
  ```
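
The `generate` call above combines temperature scaling with nucleus (`top_p`) sampling. As a rough illustration of what `top_p=0.9` means — a plain-Python sketch of the selection rule only, not the transformers implementation — the sampler keeps the smallest set of highest-probability tokens whose cumulative probability reaches `p`, drops the rest, and renormalizes:

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches p, then renormalize."""
    # Token indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Toy next-token distribution over a 4-token vocabulary.
# Tokens 0, 1, 2 survive the p=0.9 cutoff; token 3 is dropped.
print(top_p_filter([0.5, 0.3, 0.1, 0.1], p=0.9))
```

A lower `p` prunes more of the distribution's tail, trading diversity for safer continuations; `temperature=0.6` sharpens the distribution before this cutoff is applied.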
71