This model is a binary classifier developed to analyze comment authorship patterns on Korean news articles. For further details, refer to our paper in Journalism, "News comment sections and online echo chambers: The ideological alignment between partisan news stories and their user comments".

  • This model is a BERT-based classifier that labels Korean user-generated comments with one of two classes, liberal or conservative.
  • This model was trained on approximately 37,000 user-generated comments collected from NAVER's news portal. The dataset was collected in 2019, so comments about more recent political topics may not be classified correctly.
  • This model is fine-tuned from ETRI's KorBERT.

How to use

  • The model requires a modified version of the transformers BertTokenizer class, provided in the file KorBertTokenizer.py (see the download snippet after the usage example for one way to fetch it).
  • Usage example:
from KorBertTokenizer import KorBertTokenizer
from transformers import BertForSequenceClassification
import torch

# Load the modified tokenizer and the fine-tuned binary classifier from the Hub
tokenizer = KorBertTokenizer.from_pretrained('conviette/korPolBERT')
model = BertForSequenceClassification.from_pretrained('conviette/korPolBERT')

def classify(text):
    # Tokenize a single comment; pad/truncate to a fixed length of 70 tokens
    inputs = tokenizer(text, padding='max_length', truncation=True, max_length=70, return_tensors='pt')

    with torch.no_grad():
        logits = model(**inputs).logits
        # Pick the higher-scoring class and map its id back to the label name
        predicted_class_id = logits.argmax().item()
        return model.config.id2label[predicted_class_id]


# Example comments (rough English glosses):
#   1. "The leftists are ruining the country's economy and national security."
#   2. "You conservatives sold the country off to Japan, didn't you?"
input_strings = ['์ขŒํŒŒ๊ฐ€ ๋‚˜๋ผ ๊ฒฝ์ œ ์•ˆ๋ณด ๋ง์•„๋จน๋Š”๋‹ค',
                 '์ˆ˜๊ผด๋“ค์€ ๋‚˜๋ผ ์ผ๋ณธํ•œํ…Œ ํŒ”์•„๋จน์—ˆ๋ƒ']

for input_string in input_strings:
    print('===\nInput text: {}\nPredicted label: {}\n==='.format(input_string, classify(input_string)))
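
If KorBertTokenizer.py is not already on your Python path, one way to obtain it, assuming the file is hosted alongside the weights in the conviette/korPolBERT repository, is to download it with huggingface_hub:

from huggingface_hub import hf_hub_download

# Assumes KorBertTokenizer.py is stored in the model repository;
# adjust repo_id/filename if it lives elsewhere.
tokenizer_script = hf_hub_download(repo_id='conviette/korPolBERT',
                                   filename='KorBertTokenizer.py')
print('Downloaded tokenizer helper to', tokenizer_script)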

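For scoring many comments at once, the same model can be called on a padded batch. The sketch below reuses the tokenizer, model, and input_strings defined above and assumes the modified tokenizer accepts a list of texts like the standard BertTokenizer does:

def classify_batch(texts):
    # Tokenize all comments into one padded batch of 70-token sequences
    inputs = tokenizer(texts, padding='max_length', truncation=True, max_length=70, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    # argmax over the class dimension yields one predicted class id per comment
    predicted_ids = logits.argmax(dim=-1).tolist()
    return [model.config.id2label[i] for i in predicted_ids]

print(classify_batch(input_strings))
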
Model performance
