---
language: hu
thumbnail: 
tags:
- question-answering
- bert
widget:
- text: "Melyik folyó szeli ketté Budapestet?"
  context: "Magyarország fővárosát, Budapestet a Duna folyó szeli ketté. A XIX. században épült Lánchíd a dimbes-dombos budai oldalt köti össze a sík Pesttel. A Várdomb oldalában futó siklóval juthatunk fel a budai Óvárosba, ahol a Budapesti Történeti Múzeum egészen a római időkig visszavezetve mutatja be a városi életet. A Szentháromság tér ad otthont a XIII. századi Mátyás-templomnak és a Halászbástya lőtornyainak, amelyekből messzire ellátva gyönyörködhetünk a városban."
- text: "Mivel juthatunk fel az Óvárosba?"
  context: "Magyarország fővárosát, Budapestet a Duna folyó szeli ketté. A XIX. században épült Lánchíd a dimbes-dombos budai oldalt köti össze a sík Pesttel. A Várdomb oldalában futó siklóval juthatunk fel a budai Óvárosba, ahol a Budapesti Történeti Múzeum egészen a római időkig visszavezetve mutatja be a városi életet. A Szentháromság tér ad otthont a XIII. századi Mátyás-templomnak és a Halászbástya lőtornyainak, amelyekből messzire ellátva gyönyörködhetünk a városban."
---

## Model description

huBERT base model (cased) fine-tuned on a Hungarian translation of SQuAD v2 (NEW!)

- huBERT model and tokenizer: https://huggingface.co/SZTAKI-HLT/hubert-base-cc
- Hungarian SQuAD v2 dataset: the SQuAD v2 dataset machine-translated into Hungarian with the Google Translate API

<p> <i> "SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering." </i> [1] </p>
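
To make the answerable/unanswerable distinction concrete, here is a minimal sketch of two question records in the SQuAD 2.0 JSON layout (`answers`, `answer_start`, `is_impossible`). The sample records, the second question, the ids, and the helpers `is_answerable` and `span_matches` are illustrative assumptions; they are not entries or tools from the translated dataset.

```python
# Illustrative records in the SQuAD 2.0 layout; they are NOT taken from the
# machine-translated dataset used to fine-tune this model.
CONTEXT = "Máté vagyok és Budapesten élek már több mint 4 éve."
# English: "I am Máté and I have lived in Budapest for more than 4 years."

records = [
    {
        "id": "example-1",  # hypothetical id
        "question": "Hol lakik Máté?",  # "Where does Máté live?"
        "is_impossible": False,
        "answers": [
            {"text": "Budapesten", "answer_start": CONTEXT.index("Budapesten")}
        ],
    },
    {
        "id": "example-2",  # hypothetical id
        "question": "Hány éves Máté?",  # "How old is Máté?" (not stated in the context)
        "is_impossible": True,
        "answers": [],
    },
]


def is_answerable(record: dict) -> bool:
    """Return True when the record carries at least one gold answer span."""
    return not record["is_impossible"] and len(record["answers"]) > 0


def span_matches(record: dict, context: str) -> bool:
    """Check that every answer_start offset really points at the answer text."""
    return all(
        context[a["answer_start"]:a["answer_start"] + len(a["text"])] == a["text"]
        for a in record["answers"]
    )


for record in records:
    print(record["id"], is_answerable(record), span_matches(record, CONTEXT))
```

A span check like `span_matches` can be a useful sanity test on translated data, where character offsets do not carry over from the English original.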

## Model in action
- Fast usage with pipelines:

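```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
qa_pipeline = pipeline(
    "question-answering",
    model="mcsabai/huBert-fine-tuned-hungarian-squadv2",
    tokenizer="mcsabai/huBert-fine-tuned-hungarian-squadv2",
    topk=1,  # return only the most likely answer
    handle_impossible_answer=True,  # allow "no answer" as a result
)

# Context: "I am Máté and I have lived in Budapest for more than 4 years."
# Question: "Where does Máté live?"
predictions = qa_pipeline({
    'context': "Máté vagyok és Budapesten élek már több mint 4 éve.",
    'question': "Hol lakik Máté?"
})
print(predictions)
# output:
# {'score': 0.9892364144325256, 'start': 16, 'end': 26, 'answer': 'Budapesten'}
```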
Two important parameters:
- <p> <b> topk </b> (int, optional, defaults to 1): The number of answers to return, chosen in order of likelihood. Fewer than topk answers are returned if the context does not contain enough candidates. Recent transformers releases name this parameter <b> top_k </b>. </p>
- <p> <b> handle_impossible_answer </b> (bool, optional, defaults to False): Whether to accept "impossible" (no answer) as a valid result. </p>
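
Both parameters change the shape of what the pipeline returns, so a small post-processing step can help. The sketch below is plain Python and does not load the model. It assumes that the pipeline yields a single dict when one answer is returned and a list of dicts otherwise, and that an "impossible" result is reported as an empty `answer` string; both assumptions are worth checking against the installed transformers version. The helper name `pick_answer`, the `min_score` threshold, and the second sample prediction are hypothetical.

```python
from typing import Optional, Union


def pick_answer(
    predictions: Union[dict, list],
    min_score: float = 0.0,
) -> Optional[str]:
    """Return the best answer text, or None when the model abstains.

    Assumption: an empty ``answer`` string means "no answer in the context".
    ``min_score`` is a hypothetical extra threshold, not a pipeline option.
    """
    # Normalise the two possible shapes (one dict or a list of dicts).
    candidates = [predictions] if isinstance(predictions, dict) else list(predictions)
    if not candidates:
        return None

    # Keep the highest-scoring candidate.
    best = max(candidates, key=lambda p: p["score"])

    # Treat an empty answer or a low score as "no answer".
    if not best["answer"].strip() or best["score"] < min_score:
        return None
    return best["answer"]


# Prediction copied from the example output above.
answered = {'score': 0.9892364144325256, 'start': 16, 'end': 26, 'answer': 'Budapesten'}
# Hypothetical prediction for an unanswerable question.
abstained = {'score': 0.97, 'start': 0, 'end': 0, 'answer': ''}

print(pick_answer(answered))   # Budapesten
print(pick_answer(abstained))  # None
```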


[1] https://rajpurkar.github.io/SQuAD-explorer/