Model Details

This model was originally developed as part of the 1st place solution for the AI Tinkerer's Hackathon in Kuala Lumpur for an LLM-as-a-Judge use case.

It is a fine-tune of mesolitica/malaysian-debertav2-base. We use DeBERTa (Decoding-enhanced BERT with disentangled attention) for a natural language inference (NLI) task. In our case, NLI is the task of determining whether a "hypothesis" is true (entailment) or false (contradiction) given a question-statement pair. DeBERTa was selected for its state-of-the-art (SOTA) performance compared with other models such as BERT and RoBERTa.
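
To make the framing concrete, here is a minimal sketch of how a BoolQ-style record could be turned into a labelled question-statement pair. The field names (question, passage, answer) follow the original BoolQ schema and are an assumption for the Boolq-Malay dataset; the helper name boolq_to_nli is hypothetical and is not taken from the training notebook.

```python
from typing import Dict, Union

Record = Dict[str, Union[str, bool]]


def boolq_to_nli(record: Record) -> Dict[str, str]:
    """Map a yes/no QA record to a labelled text pair.

    Assumption: the record has 'question', 'passage' and a boolean 'answer',
    as in the original BoolQ dataset. A True answer is treated as
    'entailment' and a False answer as 'contradiction'.
    """
    return {
        "text": str(record["question"]),      # first sequence: the question
        "text_pair": str(record["passage"]),  # second sequence: the statement
        "label": "entailment" if record["answer"] else "contradiction",
    }


example = {
    "question": "Betul ke kerajaan naikkan gaji minimum?",
    "passage": "Kerajaan bersetuju untuk menaikkan kadar gaji minimum.",
    "answer": True,
}
pair = boolq_to_nli(example)
print(pair["label"])
```

The question goes first and the statement second, matching the order used in the Usage section.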

Training Details

Overall, using only the Boolq-Malay dataset (which comprises both Malay and English versions of the original BoolQ dataset), we obtain the following results:

  • No. of Epochs: 10
  • Accuracy: 66%
  • F1-Score: 65%
  • Recall: 65%
  • Precision: 66%
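
For readers less familiar with these metrics, the sketch below shows how accuracy, precision, recall and F1 relate to each other for a two-class task. It treats "entailment" as the positive class, which is an assumption; the notebook may average the metrics differently (for example macro or weighted averaging), and the labels in the example are made up for illustration only.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "entailment",
) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels, not taken from the evaluation set.
y_true = ["entailment", "entailment", "entailment",
          "contradiction", "contradiction", "contradiction"]
y_pred = ["entailment", "entailment", "contradiction",
          "entailment", "contradiction", "contradiction"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```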

In the future, we can do the following to garner better results:

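  • Increase gradient_accumulation_steps to work around limited GPU memory, or increase batch_size if we have access to a larger GPU. Accumulating gradients over several small batches gives a larger effective batch size while avoiding the out-of-memory (OOM) errors that a larger per-step batch would cause on a small GPU.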
  • Given more compute resources, we can also increase our patience variable and train for more than 10 epochs.
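
The two ideas above can be sketched in plain Python. This is an illustration only, not the code from the training notebook: the function names are hypothetical, the toy model is a single-weight linear fit, and the validation scores are invented.

```python
from typing import Sequence


def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def accumulated_gradient(xs: Sequence[float], ys: Sequence[float],
                         w: float, micro_batch_size: int) -> float:
    """Gradient of the mean squared error of y ~ w * x with respect to w,
    accumulated micro-batch by micro-batch instead of in one large batch."""
    n = len(xs)
    grad = 0.0
    for start in range(0, n, micro_batch_size):
        xb = xs[start:start + micro_batch_size]
        yb = ys[start:start + micro_batch_size]
        # Each micro-batch adds its share, scaled by the full batch size,
        # so the total matches the gradient of one large batch.
        grad += sum(2 * (w * x - y) * x for x, y in zip(xb, yb)) / n
    return grad


def stopping_epoch(val_scores: Sequence[float], patience: int) -> int:
    """1-based epoch at which early stopping triggers, or the last epoch
    if the score keeps improving often enough."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_scores)


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
full = accumulated_gradient(xs, ys, w=1.5, micro_batch_size=4)
accumulated = accumulated_gradient(xs, ys, w=1.5, micro_batch_size=2)
print(effective_batch_size(8, 4), full, accumulated)

scores = [0.60, 0.64, 0.63, 0.66, 0.65, 0.65]  # hypothetical validation scores
print(stopping_epoch(scores, patience=1), stopping_epoch(scores, patience=2))
```

With a small patience, training stops at the first dip in the validation score; a larger patience lets it continue past the dip, which is why raising it only pays off when more epochs are affordable.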

The training notebook can be found here: https://github.com/wanadzhar913/aitinkerers-hackathon-supa-team-werecooked/blob/master/notebooks-finetuning-models/02_finetune_v1_malaysian_debertav2_base.ipynb

Usage

from transformers import (
    AutoConfig,
    AutoTokenizer,
    DebertaV2ForSequenceClassification,
    pipeline,
)

model_id = 'wanadzhar913/malaysian-debertav2-finetune-on-boolq'

# Load the configuration, tokenizer and fine-tuned weights from the Hub
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = DebertaV2ForSequenceClassification.from_pretrained(model_id, config=config)

pipe = pipeline(
    "text-classification",
    tokenizer=tokenizer,
    model=model,
    padding=True,
    device=0,  # first CUDA GPU; use device=-1 to run on CPU
)

# https://www.astroawani.com/berita-malaysia/belanjawan-2025-gaji-minimum-ditingkatkan-kepada-rm1-700-sebulan-492383
article = """
KUALA LUMPUR: Kerajaan bersetuju untuk menaikkan kadar gaji minimum daripada RM1,500 sebulan kepada RM1,700, berkuat kuasa 1 Februari 2025.
Perdana Menteri Datuk Seri Anwar Ibrahim sewaktu membentangkan Belanjawan 2025 Malaysia MADANI di Dewan Rakyat pada Jumaat berkata,
penstrukturan ekonomi hanya dianggap berjaya apabila rakyat meraih gaji dan upah yang bermakna untuk menjalani hidup dengan lebih selesa.
"""

pipe([('Betul ke kerajaan naikkan gaji minimum?', article)])
# [{'label': 'entailment', 'score': 0.8098661303520203}]

pipe([('Did the government top up minimum wage?', article)])
# [{'label': 'entailment', 'score': 0.9928961396217346}]

pipe([('Government naikkan gaji minimum', article)])
# [{'label': 'entailment', 'score': 0.7880232334136963}]
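
For an LLM-as-a-Judge setup, the pipeline output usually has to be reduced to a yes/no decision. A minimal sketch of one way to do that is shown below; the helper name to_verdict and the threshold value are hypothetical choices, not part of the model or of the transformers library, and the threshold should be tuned on held-out data.

```python
from typing import Any, Dict, Optional


def to_verdict(prediction: Dict[str, Any],
               threshold: float = 0.5) -> Optional[bool]:
    """Turn one pipeline prediction into True, False, or None.

    Returns None when the score is below the threshold, so that
    low-confidence cases can be routed to a human or another check.
    """
    if prediction["score"] < threshold:
        return None
    return prediction["label"] == "entailment"


# Uses the first prediction shown above as sample input.
sample = {'label': 'entailment', 'score': 0.8098661303520203}
verdict = to_verdict(sample, threshold=0.75)
print(verdict)
```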

Model size: 111M params (Safetensors, tensor type F32)