File size: 2,314 Bytes
0a064fd |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 |
# Turkish QNLI Model
I fine-tuned Turkish-Bert-Model for Question-Answering problem with Turkish version of SQuAD; TQuAD
https://huggingface.co/dbmdz/bert-base-turkish-uncased
# Data: TQuAD
I used following TQuAD data set
https://github.com/TQuad/turkish-nlp-qa-dataset
I convert the dataset into transformers glue data format of QNLI by the following script
SQuAD -> QNLI
```
import argparse
import collections
import json
import numpy as np
import os
import re
import string
import sys
ff="dev-v0.1.json"
ff="train-v0.1.json"
dataset=json.load(open(ff))
i=0
for article in dataset['data']:
title= article['title']
for p in article['paragraphs']:
context= p['context']
for qa in p['qas']:
answer= qa['answers'][0]['text']
all_other_answers= list(set([e['answers'][0]['text'] for e in p['qas']]))
all_other_answers.remove(answer)
i=i+1
print(i,qa['question'].replace(";",":") , answer.replace(";",":"),"entailment", sep="\t")
for other in all_other_answers:
i=i+1
print(i,qa['question'].replace(";",":") , other.replace(";",":"),"not_entailment" ,sep="\t")
```
Under QNLI folder there are dev and test test
Training data looks like
> 613 II.Friedrich’in bilginler arasındaki en önemli şahsiyet olarak belirttiği kişi kimdir? filozof, kimyacı, astrolog ve çevirmen not_entailment
> 614 II.Friedrich’in bilginler arasındaki en önemli şahsiyet olarak belirttiği kişi kimdir? kişisel eğilimi ve özel temaslar nedeniyle not_entailment
> 615 Michael Scotus’un mesleği nedir? filozof, kimyacı, astrolog ve çevirmen entailment
> 616 Michael Scotus’un mesleği nedir? Palermo’ya not_entailment
# Training
Training the model with following environment
```
export GLUE_DIR=./glue/glue_dataTR/QNLI
export TASK_NAME=QNLI
```
```
python3 run_glue.py \
--model_type bert \
--model_name_or_path dbmdz/bert-base-turkish-uncased\
--task_name $TASK_NAME \
--do_train \
--do_eval \
--data_dir $GLUE_DIR \
--max_seq_length 128 \
--per_gpu_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/$TASK_NAME/
```
# Evaluation Results
==
| acc | 0.9124060613527165
| loss| 0.21582801340189717
==
> See all my model
> https://huggingface.co/savasy
|