---
license: mit
language: ja
tags:
  - luke
  - sentiment-analysis
  - japanese
  - pytorch
---

# This model is based on Luke-japanese-base-lite
This model is a fine-tuned version of studio-ousia/Luke-japanese-base-lite.
It uses LUKE to classify whether a Japanese sentence is positive or negative.
The training data consists of text from Natsume Souseki's works (Kokoro, Bocchan, Sanshiro, and others), labeled as positive or negative using the Japanese Sentiment Polarity Dictionary
( http://www.cl.ecei.tohoku.ac.jp/Open_Resources-Japanese_Sentiment_Polarity_Dictionary.html ).
Given this training data, the model is expected to be more accurate on literary-style Japanese than on colloquial Japanese.
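
The exact labeling procedure used for the training data is not described here. Purely as an illustration, dictionary-based polarity labeling can be sketched as summing per-word scores and taking the sign. The tiny lexicon, the `label_sentence` helper, and the pre-tokenized input below are all hypothetical; the real dictionary is far larger, and real Japanese text would first need a morphological analyzer to split it into words.

```python
# Hypothetical miniature lexicon: word -> polarity score (+1 positive, -1 negative).
# The real Japanese Sentiment Polarity Dictionary is much larger and is not reproduced here.
TOY_LEXICON = {
    "嬉しい": 1,   # happy
    "美しい": 1,   # beautiful
    "悲しい": -1,  # sad
    "淋しい": -1,  # lonely
}


def label_sentence(tokens, lexicon=TOY_LEXICON):
    """Return 1 (positive), 0 (negative), or None when the score is zero."""
    # Words missing from the lexicon contribute nothing to the score.
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None  # no lexicon hits, or positive and negative words cancel out


# Sentences are given as already-split token lists (an assumption of this sketch).
print(label_sentence(["私", "は", "嬉しい"]))
print(label_sentence(["私", "は", "淋しい", "人間", "です"]))
print(label_sentence(["先生", "は", "黙っ", "て", "い", "た"]))
```

The label indices follow the model's output convention, where index 0 is negative and index 1 is positive. Sentences that score zero would likely be left out of a training set, though whether that was done here is not stated.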

# What is LUKE? [1]
LUKE (Language Understanding with Knowledge-based Embeddings) is a new pre-trained contextualized representation of words and entities based on transformer. LUKE treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. LUKE adopts an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer, and considers the types of tokens (words or entities) when computing attention scores.

LUKE achieves state-of-the-art results on five popular NLP benchmarks including SQuAD v1.1 (extractive question answering), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), TACRED (relation classification), and Open Entity (entity typing).

luke-japanese is the Japanese version of LUKE, a knowledge-enhanced pre-trained Transformer model of words and entities. LUKE treats words and entities as independent tokens and outputs contextualized representations of them.

# How to use
Step 0: Install Python and PyTorch, and update transformers (versions that are too old do not include MLukeTokenizer).

Step 1: Download "My_luke_model_pn.pth".

Step 2: Set the path of the directory containing "My_luke_model_pn.pth" in "Mymodel_luke_pn.py", then run "Mymodel_luke_pn.py".

The output is "pre.logits", a tensor of the form tensor[[x, y]].
Applying softmax to its first row, "num = SOFTMAX(pre.logits[0])", gives the class probabilities:
num[0] is the probability that the text is negative, and num[1] is the probability that it is positive.
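
To make the softmax step concrete, here is a minimal standalone sketch in plain Python (the script itself uses PyTorch's nn.Softmax). The `softmax` helper and the logit values are made up for illustration and are not actual model output.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up values standing in for pre.logits[0] = [x, y].
example_logits = [-1.0, 2.0]
num = softmax(example_logits)

# num[0] is the negative probability, num[1] the positive probability.
label = "positive" if num[1] > 0.5 else "negative"
print(num, label)
```

Because there are only two classes, checking num[1] > 0.5 is equivalent to checking that the positive logit is larger than the negative one.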


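```python
import torch
from torch import nn
from transformers import MLukeTokenizer

tokenizer = MLukeTokenizer.from_pretrained('studio-ousia/luke-japanese-base-lite')

# The .pth file stores the whole pickled model, not just a state dict.
# Recent PyTorch versions default to weights_only=True, so pass weights_only=False
# (the argument exists in PyTorch 1.13+). Only load files you trust.
model = torch.load('C:\\[directory containing My_luke_model_pn.pth]\\My_luke_model_pn.pth', weights_only=False)
model.eval()  # disable dropout for inference

text = input()

encoded_dict = tokenizer(
    text,
    return_attention_mask=True,  # create the attention mask
    return_tensors='pt',         # return PyTorch tensors
)

with torch.no_grad():  # no gradients are needed for inference
    pre = model(encoded_dict['input_ids'], token_type_ids=None, attention_mask=encoded_dict['attention_mask'])

SOFTMAX = nn.Softmax(dim=0)
num = SOFTMAX(pre.logits[0])  # num[0]: negative, num[1]: positive

print(num[1].item())  # probability that the text is positive
if num[1] > 0.5:
    print('positive')
else:
    print('negative')
```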

# Acknowledgments
I would like to thank Mr. Yamada (@ikuyamada), the developer of LUKE, and Studio Ousia (@StudioOusia).

# Citation
[1]@inproceedings{yamada2020luke,
  title={LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention},
  author={Ikuya Yamada and Akari Asai and Hiroyuki Shindo and Hideaki Takeda and Yuji Matsumoto},
  booktitle={EMNLP},
  year={2020}
}