indo-sbert-nli-similarity-step-1
A Sentence-BERT-based model fine-tuned for Indonesian Natural Language Inference (NLI) using a sentence-similarity approach.
Model Details
This model is a fine-tuned version of firqaaa/indo-sentence-bert-base for Natural Language Inference (NLI) in Indonesian. It takes a similarity-based approach: the premise and hypothesis are embedded separately, and the cosine similarity between the two embeddings is thresholded to classify the pair as entailment, neutral, or contradiction.
Training Data
The model was fine-tuned on the afaji/indonli dataset, which contains Indonesian premise-hypothesis pairs labeled with entailment, neutral, or contradiction.
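To inspect the data, the dataset can be loaded with the datasets library. A minimal sketch; the split and field names below are assumptions taken from the public dataset card and should be verified:
from datasets import load_dataset

# Load the IndoNLI dataset from the Hugging Face Hub
dataset = load_dataset("afaji/indonli")

# Split and field names are assumptions based on the dataset card
print(dataset)  # expected splits: train / validation / test_lay / test_expert
example = dataset["train"][0]
print(example["premise"], example["hypothesis"], example["label"])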
Evaluation Results
- Validation: loss 0.1249, accuracy 0.5831, Pearson 0.5690
- Test (Lay): loss 0.1365, accuracy 0.5638, Pearson 0.5261
- Test (Expert): loss 0.1742, accuracy 0.4578, Pearson 0.3038
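The model card does not state how accuracy and Pearson are computed here; one plausible reading is that accuracy comes from thresholding the cosine similarity into the three labels (as in the Usage example below) and Pearson correlates the raw similarity with a numeric encoding of the gold label. A minimal sketch under those assumptions; the label-to-score mapping is hypothetical:
import numpy as np
from scipy.stats import pearsonr

# Hypothetical mapping of gold NLI labels to target similarity scores
label_to_score = {"contradiction": 0.0, "neutral": 0.5, "entailment": 1.0}

def evaluate(similarities, gold_labels):
    # Pearson correlation between predicted similarities and encoded gold labels
    gold_scores = np.array([label_to_score[label] for label in gold_labels])
    pearson, _ = pearsonr(np.asarray(similarities), gold_scores)
    # Accuracy using the same thresholds as the Usage example below
    predictions = [
        "entailment" if s >= 0.7 else "contradiction" if s <= 0.3 else "neutral"
        for s in similarities
    ]
    accuracy = np.mean([p == g for p, g in zip(predictions, gold_labels)])
    return accuracy, pearson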
Usage
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F
# Load model and tokenizer
model = AutoModel.from_pretrained("fabhiansan/indo-sbert-nli-similarity")
tokenizer = AutoTokenizer.from_pretrained("fabhiansan/indo-sbert-nli-similarity")
# Function for mean pooling
def mean_pooling(token_embeddings, attention_mask):
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
# Example NLI inputs
premise = "Keindahan alam yang terdapat di Gunung Batu Jonggol ini dapat Anda manfaatkan sebagai objek fotografi yang cantik."
hypothesis = "Keindahan alam tidak dapat difoto."
# Encode inputs
encoded_premise = tokenizer(premise, padding=True, truncation=True, return_tensors="pt")
encoded_hypothesis = tokenizer(hypothesis, padding=True, truncation=True, return_tensors="pt")
# Get embeddings
with torch.no_grad():
    outputs_premise = model(**encoded_premise)
    outputs_hypothesis = model(**encoded_hypothesis)
# Mean pooling
embedding_premise = mean_pooling(outputs_premise.last_hidden_state, encoded_premise["attention_mask"])
embedding_hypothesis = mean_pooling(outputs_hypothesis.last_hidden_state, encoded_hypothesis["attention_mask"])
# Normalize embeddings
embedding_premise = F.normalize(embedding_premise, p=2, dim=1)
embedding_hypothesis = F.normalize(embedding_hypothesis, p=2, dim=1)
# Compute similarity
similarity = F.cosine_similarity(embedding_premise, embedding_hypothesis).item()
# Convert similarity to NLI label
if similarity >= 0.7:
    label = "entailment"
elif similarity <= 0.3:
    label = "contradiction"
else:
    label = "neutral"
print(f"Premise: {premise}")
print(f"Hypothesis: {hypothesis}")
print(f"Similarity: {similarity:.4f}")
print(f"NLI Label: {label}")
Limitations and Biases
- The model is trained specifically for the Indonesian language and may not perform well on other languages or code-switched text.
- Performance may vary on domain-specific texts that differ significantly from the training data.
- Like all language models, this model may reflect biases present in the training data.
Citation
If you use this model in your research, please cite:
@misc{fabhiansan2025indonli,
  author = {Fabhiansan},
  title = {Fine-tuned SBERT for Indonesian Natural Language Inference},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/indo-sbert-nli-similarity-step-1}}
}
And also cite the original SBERT and Indo-SBERT works:
@inproceedings{reimers-2019-sentence-bert,
  title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
  author = "Reimers, Nils and Gurevych, Iryna",
  booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
  month = "11",
  year = "2019",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/1908.10084",
}
@misc{arasyi2022indo,
  author = {Arasyi, Firqa},
  title = {indo-sentence-bert: Sentence Transformer for Bahasa Indonesia with Multiple Negative Ranking Loss},
  year = {2022},
  month = {9},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/firqaaa/indo-sentence-bert-base}}
}