recruit-jp/japanese-typo-detector-roberta-base

Token Classification

keisuke-kiryu committed on Nov 17, 2023

Commit 7665c5f · 1 parent: ba37e17

Update README.md

Files changed (1)

README.md +50 -20

README.md CHANGED

@@ -22,27 +22,57 @@ widget:
 # How to use the model
   ```python
-    from transformers import AutoTokenizer,AutoModelForTokenClassification
-    model_name('recruit-jp/japanese-typo-detector-roberta-base')
-    tokenizer = AutoTokenizer.from_pretrained(model_name)
-    model = AutoModelForTokenClassification.from_pretrained(model_name)
-    device = "cuda:0" if torch.cuda.is_available() else "cpu"
-    model = model.to(device)
-    in_text = "これは日本語の誤植を検出する真相学習モデルです。"
-    test_inputs = tokenizer(in_text, return_tensors='pt').get('input_ids')
-    test_outputs = model(test_inputs.to(torch.device(device)))
-    for chara, logit in zip(["[CLS]"] + list(in_text) + ["[SEP]"], test_outputs.logits.squeeze().tolist()):
-    err_type_ind = np.argmax(logit)
-    err_name = model.config.id2label[err_type_ind]
-    err_desc = f"★誤字(err_index={err_type_ind}, err_name={err_name})" if err_type_ind > 0 else f""
-    print(f"{chara} : {err_desc}")
   ```
 # Training data

 # How to use the model
+## Sample code
   ```python
+  from transformers import AutoTokenizer, AutoModelForTokenClassification
+  import torch
+  import numpy as np
+  model_name = 'recruit-jp/japanese-typo-detector-roberta-base'
+  tokenizer = AutoTokenizer.from_pretrained(model_name)
+  model = AutoModelForTokenClassification.from_pretrained(model_name)
+  device = "cuda:0" if torch.cuda.is_available() else "cpu"  # use the GPU when one is available
+  model = model.to(device)
+  in_text = "これは日本語の誤植を検出する真相学習モデルです。"  # contains a typo: 真相学習 instead of 深層学習 ("deep learning")
+  test_inputs = tokenizer(in_text, return_tensors='pt').get('input_ids')
+  test_outputs = model(test_inputs.to(torch.device(device)))
+  for chara, logit in zip(list(in_text), test_outputs.logits.squeeze().tolist()[1:-1]):  # [1:-1] skips the [CLS] and [SEP] positions
+      err_type_ind = np.argmax(logit)  # label index 0 means "no error"; any other index is an error type
+      err_name = model.config.id2label[err_type_ind]
+      err_desc = f"Detected!(err_index={err_type_ind}, err_name={err_name})" if err_type_ind > 0 else ""
+      print(f"{chara} : {err_desc}")
+  ```
+## Example output of the sample code
+  ```
+  こ :
+  れ :
+  は :
+  日 :
+  本 :
+  語 :
+  の :
+  誤 :
+  植 :
+  を :
+  検 :
+  出 :
+  す :
+  る :
+  真 : Detected!(err_index=4, err_name=kanji-conversion_a)
+  相 : Detected!(err_index=4, err_name=kanji-conversion_a)
+  学 :
+  習 :
+  モ :
+  デ :
+  ル :
+  で :
+  す :
+  。 :
   ```
 # Training data
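In the example output, 真 and 相 are flagged one character at a time, although together they form a single mis-converted word. The following is a minimal sketch of a post-processing step that merges adjacent characters sharing the same non-zero label into one span. It is not part of the model card. The helper names `merge_error_spans` and `argmax` are hypothetical. Like the sample code, it assumes that once the `[CLS]` and `[SEP]` positions are dropped there is exactly one logit row per character, and that label index 0 means "no error".

```python
from typing import Dict, List, Sequence, Tuple

# (start, end, surface, label): character offsets into the text, end exclusive.
Span = Tuple[int, int, str, str]


def argmax(row: Sequence[float]) -> int:
    """Index of the highest score; ties go to the lowest index, as with np.argmax."""
    return max(range(len(row)), key=lambda i: row[i])


def merge_error_spans(
    text: str,
    char_logits: Sequence[Sequence[float]],
    id2label: Dict[int, str],
) -> List[Span]:
    """Group adjacent characters that share the same non-zero predicted label.

    Hypothetical helper: assumes one logit row per character ([CLS] and [SEP]
    already removed) and that label index 0 means "no error".
    """
    if len(text) != len(char_logits):
        raise ValueError("expected exactly one row of logits per character")
    spans: List[Span] = []
    for i, (ch, row) in enumerate(zip(text, char_logits)):
        label_id = argmax(row)
        if label_id == 0:
            continue  # this character is not flagged as an error
        label = id2label[label_id]
        if spans and spans[-1][1] == i and spans[-1][3] == label:
            # The previous span ends at this character and has the same label: extend it.
            start, _, surface, _ = spans[-1]
            spans[-1] = (start, i + 1, surface + ch, label)
        else:
            spans.append((i, i + 1, ch, label))
    return spans


if __name__ == "__main__":
    # Toy scores, NOT real model output. The width of 5 is made up; index 4 is
    # the "kanji-conversion_a" label shown in the example output.
    toy_id2label = {4: "kanji-conversion_a"}
    toy_logits = [
        [0.1, 0.0, 0.0, 0.0, 2.0],  # 真 -> label 4
        [0.2, 0.0, 0.0, 0.0, 1.5],  # 相 -> label 4
        [3.0, 0.0, 0.0, 0.0, 0.1],  # 学 -> label 0
        [3.0, 0.0, 0.0, 0.0, 0.1],  # 習 -> label 0
    ]
    print(merge_error_spans("真相学習", toy_logits, toy_id2label))
    # [(0, 2, '真相', 'kanji-conversion_a')]
```

With the sample code, the corresponding call would presumably be `merge_error_spans(in_text, test_outputs.logits.squeeze().tolist()[1:-1], model.config.id2label)`. Given the per-character output shown, that call would be expected to return one span covering 真相. Merging only runs of identical labels keeps two different error types that happen to be adjacent as separate spans.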