This model is a fine-tuned version of the FastText KM model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.
Task: Sentiment analysis (binary classification).
Languages Supported: Khmer.
Intended Use Cases:
- Analyzing customer reviews.
- Social media sentiment detection.
Limitations:
- Performance may degrade on languages or domains not present in the training data.
- Does not handle sarcasm or highly ambiguous inputs well.
The model was evaluated on a test set of 400 samples, achieving the following performance:
Test Accuracy: 81%
Precision: 81%
Recall: 81%
F1 Score: 81%
Confusion Matrix:
Predicted\Actual | Negative | Positive |
---|---|---|
Negative | 165 | 44 |
Positive | 31 | 160 |
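The reported scores can be cross-checked against the confusion matrix above. The following is a minimal sketch, assuming the rows are predicted labels and the columns are actual labels, as the table header suggests, and assuming the reported precision, recall and F1 are macro averages over the two classes (the model card does not say which averaging was used). The helper name `precision_recall_f1` is illustrative only.

```python
# Counts copied from the confusion matrix above
# (assumed layout: rows = predicted, columns = actual).
tn, fn = 165, 44   # predicted Negative: actual Negative, actual Positive
fp, tp = 31, 160   # predicted Positive: actual Negative, actual Positive

total = tn + fn + fp + tp
accuracy = (tp + tn) / total


def precision_recall_f1(correct, wrongly_predicted, missed):
    """Per-class precision, recall and F1 from raw counts."""
    precision = correct / (correct + wrongly_predicted)
    recall = correct / (correct + missed)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


positive = precision_recall_f1(tp, fp, fn)
negative = precision_recall_f1(tn, fn, fp)

# Macro average: unweighted mean of the per-class scores (an assumption).
macro = tuple((p + n) / 2 for p, n in zip(positive, negative))

print(f"samples:   {total}")
print(f"accuracy:  {accuracy:.2%}")
print(f"precision: {macro[0]:.2%}")
print(f"recall:    {macro[1]:.2%}")
print(f"f1:        {macro[2]:.2%}")
```

Under these assumptions the counts sum to the 400 test samples and every metric rounds to the reported 81%.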
The model supports a maximum sequence length of 512 tokens.
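One way to stay within this limit is to truncate the token list before joining it into the string passed to the model. The sketch below assumes the limit counts word-level tokens such as those produced by the tokenizer in the How to Use section, which the model card does not confirm. `MAX_TOKENS` and `prepare_tokens` are hypothetical names. The helper also splits on whitespace, which removes newline characters: fastText's `predict` processes one line at a time and raises an error if the text contains a newline.

```python
MAX_TOKENS = 512  # limit stated above; assumed to count word-level tokens


def prepare_tokens(tokens, max_tokens=MAX_TOKENS):
    """Join tokens into a single line containing at most max_tokens tokens."""
    # str.split() with no arguments splits on any whitespace (including
    # newlines) and drops empty pieces, so whitespace-only tokens vanish.
    cleaned = [piece for token in tokens for piece in token.split()]
    return " ".join(cleaned[:max_tokens])


# Works on any list of string tokens, for example a tokenizer's output.
print(prepare_tokens(["a", "b\n", " ", "c"]))
```

Truncation simply discards everything after the first 512 tokens. For long reviews, splitting the text into chunks and classifying each one may be a better fit.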
How to Use
```python
from huggingface_hub import hf_hub_download
import fasttext
from khmernltk import word_tokenize

# Download the model file from the Hugging Face Hub and load it
model = fasttext.load_model(
    hf_hub_download("tykea/khmer-fasttext-sentiment-analysis", "model.bin")
)


def predict(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Join tokens back into a single string
    tokenized_text = ' '.join(tokens)
    # Make predictions
    predictions = model.predict(tokenized_text)
    # Map labels to human-readable format
    label_mapping = {
        '__label__0': 'negative',
        '__label__1': 'positive'
    }
    # Get the predicted label
    predicted_label = predictions[0][0]
    # Map the predicted label
    human_readable_label = label_mapping.get(predicted_label, 'unknown')
    return human_readable_label


# Replace the placeholder with the Khmer text to classify
predict('<Khmer sentence>')
```
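fastText's `predict` returns a pair: a tuple of labels and a matching array of probabilities. The confidence can therefore be reported alongside the label. A minimal sketch, assuming the same `__label__0` / `__label__1` mapping as in the snippet above; `describe_prediction` is a hypothetical helper, and the sample input is made up to show the shape of the data, not real model output.

```python
LABEL_MAPPING = {"__label__0": "negative", "__label__1": "positive"}


def describe_prediction(prediction):
    """Convert a fastText (labels, probabilities) pair to (label, confidence)."""
    labels, probabilities = prediction
    label = LABEL_MAPPING.get(labels[0], "unknown")
    return label, float(probabilities[0])


# Illustrative input only: mimics the shape of model.predict(...) output.
example = (("__label__1",), [0.93])
print(describe_prediction(example))
```

With a loaded model this would be called as `describe_prediction(model.predict(tokenized_text))`. Low-confidence results could then be routed to manual review, which may help with the sarcastic or ambiguous inputs listed under the limitations.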
Base model: facebook/fasttext-km-vectors